The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. AAAI Press 【DBLP Link】
【Paper Link】 【Pages】:3-11
【Authors】: Avinash Balakrishnan ; Djallel Bouneffouf ; Nicholas Mattei ; Francesca Rossi
【Abstract】: AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose to adopt a novel extension to the classical contextual multi-armed bandit setting and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavior constraints without significantly degrading its overall reward performance.
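【Code Sketch】: A minimal illustration (not the authors' implementation) of the decision rule the abstract describes: Thompson-sample a linear reward model per arm, then restrict the argmax to arms the learned behavioral constraints permit. The name bcts_step, the Gaussian posteriors, and the boolean allowed mask standing in for the learned constrained policy are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def bcts_step(contexts, mu, Sigma, allowed):
    # Thompson-sample a coefficient vector per arm from its Gaussian posterior.
    sampled = np.array([rng.multivariate_normal(mu[a], Sigma[a])
                        for a in range(len(mu))])
    scores = np.einsum('ad,ad->a', sampled, contexts)  # estimated rewards
    scores[~allowed] = -np.inf                         # mask arms the constraints forbid
    return int(np.argmax(scores))

# Toy usage: 3 arms with 4-dim contexts; arm 1 is forbidden by the constraints.
d, n_arms = 4, 3
contexts = rng.normal(size=(n_arms, d))
mu = np.zeros((n_arms, d))
Sigma = np.stack([np.eye(d)] * n_arms)
allowed = np.array([True, False, True])
print(bcts_step(contexts, mu, Sigma, allowed))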
【Keywords】:
【Paper Link】 【Pages】:12-19
【Authors】: Sambaran Bandyopadhyay ; N. Lokesh ; M. Narasimha Murty
【Abstract】: Attributed network embedding has received much interest from the research community as most of the networks come with some content in each node, which is also known as node attributes. Existing attributed network approaches work well when the network is consistent in structure and attributes, and nodes behave as expected. But real-world networks often have anomalous nodes. Typically these outliers, being relatively unexplainable, affect the embeddings of other nodes in the network. Thus all the downstream network mining tasks fail miserably in the presence of such outliers. Hence an integrated approach to detect anomalies and reduce their overall effect on the network embedding is required. Towards this end, we propose an unsupervised outlier aware network embedding algorithm (ONE) for attributed networks, which minimizes the effect of the outlier nodes, and hence generates robust network embeddings. We align and jointly optimize the loss functions coming from structure and attributes of the network. To the best of our knowledge, this is the first generic network embedding approach which incorporates the effect of outliers for an attributed network without any supervision. We experimented on publicly available real networks and manually planted different types of outliers to check the performance of the proposed algorithm. Results demonstrate the superiority of our approach to detect the network outliers compared to the state-of-the-art approaches. We also consider different downstream machine learning applications on networks to show the efficiency of ONE as a generic network embedding technique. The source code is made available at https://github.com/sambaranban/ONE.
【Keywords】:
【Paper Link】 【Pages】:20-28
【Authors】: Umanga Bista ; Alexander Patrick Mathews ; Minjeong Shin ; Aditya Krishna Menon ; Lexing Xie
【Abstract】: This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
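【Code Sketch】: A minimal sketch of the maximum mean discrepancy (MMD) quantity the objectives build on, assuming RBF kernels and toy Gaussian document embeddings; a comparative summary S of group A would be chosen to keep mmd2(S, A) small while keeping mmd2(S, B) large. The combined score below is an illustrative stand-in, not the paper's exact objective.

import numpy as np

def rbf(X, Y, gamma=0.5):
    # Pairwise RBF kernel values between rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.5):
    # Biased empirical estimate of squared MMD between samples X and Y.
    return rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean() - 2 * rbf(X, Y, gamma).mean()

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, size=(50, 8))  # embedded documents of group A
B = rng.normal(2.0, 1.0, size=(50, 8))  # embedded documents of group B
S = A[:5]                               # candidate summary for group A
print(mmd2(S, A) - mmd2(S, B))          # lower = representative of A, distinct from B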
【Keywords】:
【Paper Link】 【Pages】:29-36
【Authors】: Jiaoyan Chen ; Ernesto Jiménez-Ruiz ; Ian Horrocks ; Charles A. Sutton
【Abstract】: Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column names or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBpedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:37-44
【Authors】: Jin Chen ; Defu Lian ; Kai Zheng
【Abstract】: One-class collaborative filtering (OCCF) problems are vital in many applications of recommender systems, such as news and music recommendation, but suffer from data sparsity and a lack of negative examples. To address this problem, state-of-the-art methods assign smaller weights to unobserved samples and perform low-rank approximation. However, the ground-truth ratings of unobserved samples are usually set to zero but ill-defined. In this paper, we propose a ranking-based implicit regularizer and provide a new general framework for OCCF, to avert the ground-truth ratings of unobserved samples. We then exploit it to regularize a ranking-based loss function and design efficient optimization algorithms to learn model parameters. Finally, we evaluate them on three real-world datasets. The results show that the proposed regularizer significantly improves ranking-based algorithms and that the proposed framework outperforms the state-of-the-art OCCF algorithms.
【Keywords】:
【Paper Link】 【Pages】:45-52
【Authors】: Long Chen ; Ziyu Guan ; Wei Zhao ; Wanqing Zhao ; Xiaopeng Wang ; Zhou Zhao ; Huan Sun
【Abstract】: Online shopping has become a part of our daily routine, but it still cannot offer as intuitive an experience as store shopping. Nowadays, most e-commerce websites offer a Question Answering (QA) system that allows users to consult other users who have purchased the product. However, users still need to wait patiently for others’ replies. In this paper, we investigate how to provide a quick response to the asker by identifying plausible answers from product reviews. By analyzing the similarity and discrepancy between explicit answers and reviews that can serve as answers, a novel multi-task deep learning method with carefully designed attention mechanisms is developed. The method can well exploit large amounts of user-generated QA data and a few manually labeled review data to address the problem. Experiments on data collected from Amazon demonstrate its effectiveness and superiority over competitive baselines.
【Keywords】:
【Paper Link】 【Pages】:53-60
【Authors】: Xu Chen ; Yongfeng Zhang ; Zheng Qin
【Abstract】: Providing explanations in a recommender system is receiving increasing attention in both industry and research communities. Most existing explainable recommender models regard user preferences as invariant and generate static explanations. However, in real scenarios, a user’s preference is always dynamic, and she may be interested in different product features at different states. The mismatch between the explanation and the user preference may degrade customers’ satisfaction, confidence and trust in the recommender system. To fill this gap, in this paper, we build a novel Dynamic Explainable Recommender (called DER) for more accurate user modeling and explanations. Specifically, we design a time-aware gated recurrent unit (GRU) to model user dynamic preferences, and profile an item by its review information based on a sentence-level convolutional neural network (CNN). By attentively learning the important review information according to the user’s current state, we are not only able to improve the recommendation performance, but can also provide explanations tailored to the user’s current preferences. We conduct extensive experiments to demonstrate the superiority of our model for improving recommendation performance. To evaluate the explainability of our model, we first present examples to provide an intuitive analysis of the highlighted review information, and then conduct crowd-sourcing based evaluations to quantitatively verify our model’s superiority.
【Keywords】:
【Paper Link】 【Pages】:61-68
【Authors】: Zhi-Hong Deng ; Ling Huang ; Chang-Dong Wang ; Jian-Huang Lai ; Philip S. Yu
【Abstract】: In general, recommendation can be viewed as a matching problem, i.e., match proper items for proper users. However, due to the huge semantic gap between users and items, it’s almost impossible to directly match users and items in their initial representation spaces. To solve this problem, many methods have been studied, which can be generally categorized into two types, i.e., representation learning-based CF methods and matching function learning-based CF methods. Representation learning-based CF methods try to map users and items into a common representation space. In this case, the higher similarity between a user and an item in that space implies they match better. Matching function learning-based CF methods try to directly learn the complex matching function that maps user-item pairs to matching scores. Although both methods are well developed, they suffer from two fundamental flaws, i.e., the limited expressiveness of dot product and the weakness in capturing low-rank relations respectively. To this end, we propose a general framework named DeepCF, short for Deep Collaborative Filtering, to combine the strengths of the two types of methods and overcome such flaws. Extensive experiments on four publicly available datasets demonstrate the effectiveness of the proposed DeepCF framework.
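【Code Sketch】: A hedged sketch of the two families DeepCF aims to combine: a representation-learning branch (element-wise product of user/item vectors, generalizing the dot product) and a matching-function-learning branch (an MLP over the concatenated pair), fused into one matching score. The names and the single-hidden-layer fusion are assumptions for illustration, not the paper's architecture.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dual_branch_score(u, v, W, b, w_out):
    rep = u * v                                   # representation-learning branch
    match = relu(W @ np.concatenate([u, v]) + b)  # matching-function branch
    fused = np.concatenate([rep, match])
    return 1.0 / (1.0 + np.exp(-w_out @ fused))   # matching probability

rng = np.random.default_rng(2)
d, h = 16, 8
u, v = rng.normal(size=d), rng.normal(size=d)
W, b = rng.normal(size=(h, 2 * d)) * 0.1, np.zeros(h)
w_out = rng.normal(size=d + h) * 0.1
print(dual_branch_score(u, v, W, b, w_out))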
【Keywords】:
【Paper Link】 【Pages】:69-76
【Authors】: Haoyu Dong ; Shijie Liu ; Shi Han ; Zhouyu Fu ; Dongmei Zhang
【Abstract】: Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges. Automatic table detection is a key enabling technique and an initial step in spreadsheet data intelligence. However, the detection task is challenged by the diversity of table structures and table layouts on the spreadsheet. Considering the analogy between a spreadsheet as a cell matrix and an image as a pixel matrix, and encouraged by the successful application of Convolutional Neural Networks (CNN) in computer vision, we have developed TableSense, a novel end-to-end framework for spreadsheet table detection. First, we devise an effective cell featurization scheme to better leverage the rich information in each cell; second, we develop an enhanced convolutional neural network model for table detection to meet the domain-specific requirement of precise table boundary detection; third, we propose an effective uncertainty metric to guide an active learning based smart sampling algorithm, which enables the efficient build-up of a training dataset with 22,176 tables on 10,220 sheets with broad coverage of diverse table structures and layouts. Our evaluation shows that TableSense is highly effective, with 91.3% recall and 86.5% precision under the EoB-2 metric, a significant improvement over both the detection algorithms used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision.
【Keywords】:
【Paper Link】 【Pages】:77-85
【Authors】: Tiansi Dong ; Zhigang Wang ; Juanzi Li ; Christian Bauckhage ; Armin B. Cremers
【Abstract】: A Triple in a knowledge graph takes the form (head, relation, tail). Triple Classification is used to determine the truth value of an unknown Triple. This is a hard task for 1-to-N relations using the vector-based embedding approach. We propose a new region-based embedding approach using fine-grained type chains. A novel geometric process is presented to extend the vectors of pre-trained entities into n-balls (n-dimensional balls) under the condition that head balls shall contain their tail balls. Our algorithm achieves zero energy cost and therefore serves as a case study of perfectly imposing tree structures into vector space. An unknown Triple (h,r,x) will be predicted as true when x’s n-ball is located in the r-subspace of h’s n-ball, following the same construction as the known tails of h. The experiments are based on large datasets derived from the benchmark datasets WN11, FB13, and WN18. Our results show that the performance of the new method is related to the length of the type chain and the quality of pre-trained entity embeddings, and that long chains with well-trained entity embeddings outperform other methods in the literature. Source codes and datasets are located at https://github.com/GnodIsNait/mushroom.
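【Code Sketch】: The containment condition behind the n-ball construction is simple to state: a head ball contains a tail ball iff the distance between their centers plus the tail radius does not exceed the head radius; zero energy cost then corresponds to all such constraints holding exactly. A minimal check with illustrative values:

import numpy as np

def contains(center_h, r_h, center_t, r_t):
    # Head n-ball contains tail n-ball iff ||c_h - c_t|| + r_t <= r_h.
    return np.linalg.norm(center_h - center_t) + r_t <= r_h

print(contains(np.zeros(3), 2.0, np.array([0.5, 0.0, 0.0]), 1.0))  # True
print(contains(np.zeros(3), 2.0, np.array([1.5, 0.0, 0.0]), 1.0))  # False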
【Keywords】:
【Paper Link】 【Pages】:86-93
【Authors】: Zi-Yi Dou ; Zhaopeng Tu ; Xing Wang ; Longyue Wang ; Shuming Shi ; Tong Zhang
【Abstract】: With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most of the previous methods combine layers in a static fashion in that their aggregation strategy is independent of specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm learns the probability of a part (individual layer representations) assigned to a whole (aggregated representations) in an iterative way and combines parts accordingly. We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely-used WMT14 English⇒German and WMT17 Chinese⇒English translation datasets. Experimental results across language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
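【Code Sketch】: A minimal sketch of routing-by-agreement over stacked layer representations (the “parts”), iteratively refining assignment probabilities by agreement with the aggregated representation (the “whole”). The squash-like scaling and the three iterations are illustrative choices; the paper's exact update is not reproduced here.

import numpy as np

def route_layers(layer_states, n_iter=3):
    L = layer_states.shape[0]
    b = np.zeros(L)                             # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum()         # softmax assignment probabilities
        s = (c[:, None] * layer_states).sum(0)  # weighted combination of parts
        v = s / (1.0 + np.linalg.norm(s))       # squash-like scaling of the whole
        b = b + layer_states @ v                # update logits by part-whole agreement
    return v

rng = np.random.default_rng(3)
states = rng.normal(size=(6, 32))               # 6 stacked layers, 32-dim hidden states
print(route_layers(states).shape)               # (32,)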
【Keywords】:
【Paper Link】 【Pages】:94-101
【Authors】: Wenjing Fu ; Zhaohui Peng ; Senzhang Wang ; Yang Xu ; Jin Li
【Abstract】: As one promising way to solve the challenging issues of data sparsity and cold start in recommender systems, cross-domain recommendation has gained increasing research interest recently. Cross-domain recommendation aims to improve the recommendation performance by means of transferring explicit or implicit feedback from the auxiliary domain to the target domain. Although the side information of review texts and item contents has been proven to be useful in recommendation, most existing works only use one kind of side information and cannot deeply fuse this side information with ratings. In this paper, we propose a Review and Content based Deep Fusion Model named RC-DFM for cross-domain recommendation. We first extend Stacked Denoising Autoencoders (SDAE) to effectively fuse review texts and item contents with the rating matrix in both auxiliary and target domains. Through this way, the learned latent factors of users and items in both domains preserve more semantic information for recommendation. Then we utilize a multi-layer perceptron to transfer user latent factors between the two domains to address the data sparsity and cold start issues. Experimental results on real datasets demonstrate the superior performance of RC-DFM compared with state-of-the-art recommendation methods.
【Keywords】:
【Paper Link】 【Pages】:102-109
【Authors】: Xiaolong Gong ; Linpeng Huang ; Fuwei Wang
【Abstract】: Real web datasets are often associated with multiple views, such as long and short commentaries, user preferences, and so on. However, with the rapid growth of user-generated texts, each view of the dataset has a large feature space, which leads to computational challenges during the matrix decomposition process. In this paper, we propose a novel multi-view clustering algorithm based on non-negative matrix factorization that uses a feature sampling strategy to reduce the complexity of the iteration process. In particular, our method exploits unsupervised semantic information in the learning process to capture the intrinsic similarity through a graph regularization. Moreover, we use the Hilbert-Schmidt Independence Criterion (HSIC) to explore the unsupervised semantic diversity information among multi-view contents of one web item. The overall objective is to minimize the loss function of multi-view non-negative matrix factorization combined with an intra-semantic similarity graph regularizer and an inter-semantic diversity term. Compared with some state-of-the-art methods, we demonstrate the effectiveness of our proposed method on a large real-world dataset, Doucom, and three other smaller datasets.
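【Code Sketch】: The Hilbert-Schmidt Independence Criterion used as the inter-semantic diversity term has a simple empirical estimate from centered kernel matrices; a small value indicates that two view representations carry near-independent information. A minimal sketch, with linear kernels as an illustrative choice:

import numpy as np

def hsic(K, L):
    # Empirical HSIC: trace(K H L H) / (n - 1)^2, with H the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(4)
X, Y = rng.normal(size=(20, 5)), rng.normal(size=(20, 7))  # two views of 20 items
print(hsic(X @ X.T, Y @ Y.T))  # near 0 for independent views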
【Keywords】:
【Paper Link】 【Pages】:110-117
【Authors】: Tao Gui ; Liang Zhu ; Qi Zhang ; Minlong Peng ; Xu Zhou ; Keyu Ding ; Zhigang Chen
【Abstract】: The advent of social media has presented a promising new opportunity for the early detection of depression. To do so effectively, there are two challenges to overcome. The first is that textual and visual information must be jointly considered to make accurate inferences about depression. The second challenge is that due to the variety of content types posted by users, it is difficult to extract many of the relevant indicator texts and images. In this work, we propose the use of a novel cooperative multi-agent model to address these challenges. From the historical posts of users, the proposed method can automatically select related indicator texts and images. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods by a large margin (over 30% error reduction). In several experiments and examples, we also verify that the selected posts can successfully indicate user depression, and our model obtains robust performance in realistic scenarios.
【Keywords】:
【Paper Link】 【Pages】:118-125
【Authors】: Jun Guo ; Jiahui Ye
【Abstract】: Clustering on multi-view data has attracted much attention in the past decades. Most previous studies assume that each instance appears in all views, or that there is at least one view containing all instances. However, real-world data often suffers from missing instances in each view, leading to the research problem of partial multi-view clustering. To address this issue, this paper proposes a simple yet effective Anchor-based Partial Multi-view Clustering (APMC) method, which utilizes anchors to reconstruct instance-to-instance relationships for clustering. APMC is conceptually simple and easy to implement in practice; moreover, it has clear intuitions and non-trivial empirical guarantees. Specifically, APMC first integrates intra- and inter-view similarities through anchors. Then, spectral clustering is performed on the fused similarities to obtain a unified clustering result. Compared with existing partial multi-view clustering methods, APMC has three notable advantages: 1) it can capture more non-linear relations among instances with the help of kernel-based similarities; 2) it has a much lower time complexity by virtue of a non-iterative scheme; 3) it can inherently handle data with negative entries as well as be extended to more than two views. Finally, we extensively evaluate the proposed method on five benchmark datasets. Experimental results demonstrate the superiority of APMC over state-of-the-art approaches.
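【Code Sketch】: A minimal sketch of the anchor idea, assuming RBF instance-to-anchor similarities: each view fills rows of an instance-by-anchor matrix Z only for the instances it actually contains, and a fused instance-to-instance similarity for the downstream spectral clustering step (omitted here) is recovered as Z Z^T. The anchor choice and the additive fusion are illustrative assumptions.

import numpy as np

def anchor_similarity(X, anchors, gamma=1.0):
    # RBF similarities between view instances (rows of X) and anchors.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(5)
n, m = 8, 3
anchors = rng.normal(size=(m, 4))
view1 = rng.normal(size=(n, 4))            # all 8 instances observed in view 1
idx2 = np.array([0, 1, 2, 4, 6])           # view 2 misses instances 3, 5, 7
view2 = rng.normal(size=(len(idx2), 4))

Z = np.zeros((n, m))
Z += anchor_similarity(view1, anchors)       # view-1 contribution
Z[idx2] += anchor_similarity(view2, anchors) # view-2 contribution, present rows only
S = Z @ Z.T                                  # fused instance-to-instance similarity
print(S.shape)                               # (8, 8)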
【Keywords】:
【Paper Link】 【Pages】:126-133
【Authors】: Zhizhong Han ; Mingyang Shang ; Xiyang Wang ; Yu-Shen Liu ; Matthias Zwicker
【Abstract】: Jointly learning representations of 3D shapes and text is crucial to support tasks such as cross-modal retrieval or shape captioning. A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y2Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y2Seq2Seq bridges the semantic meaning embedded in the two modalities by two coupled “Y”-like sequence-to-sequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y2Seq2Seq outperforms the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:134-141
【Authors】: Shizhu He ; Kang Liu ; Weiting An
【Abstract】: Customers ask questions, and customer service staff answer those questions; this is the basic service pattern of customer service (CS). A CS session is typically a multi-round conversation. However, there are no explicit corresponding relations among conversational utterances. This paper focuses on obtaining explicit alignments of question and answer utterances in CS. This is not only an important task in dialogue analysis, but also a way to obtain valuable training data for learning dialogue systems. In this work, we propose end-to-end models for aligning question (Q) and answer (A) utterances in CS conversations with recurrent pointer networks (RPN). On the one hand, RPN-based alignment models are able to model the conversational contexts and the mutual influence of different Q-A alignments. On the other hand, they are able to address the issue of empty and multiple alignments for some utterances in a unified manner. We construct a dataset from an in-house online CS. The experimental results demonstrate that the proposed models are effective at learning the alignments of question and answer utterances.
【Keywords】:
【Paper Link】 【Pages】:142-151
【Authors】: Ryu Iida ; Canasai Kruengkrai ; Ryo Ishida ; Kentaro Torisawa ; Jong-Hoon Oh ; Julien Kloetzer
【Abstract】: This paper proposes a novel method for generating compact answers to open-domain why-questions, such as the following answer, “Because deep learning technologies were introduced,” to the question, “Why did Google’s machine translation service improve so drastically?” Although many works have dealt with why-question answering, most have focused on retrieving as answers relatively long text passages that consist of several sentences. Because of their length, such passages are not appropriate to be read aloud by spoken dialog systems and smart speakers; hence, we need to create a method that generates compact answers. We developed a novel neural summarizer for this compact answer generation task. It combines a recurrent neural network-based encoder-decoder model with stacked convolutional neural networks and was designed to effectively exploit background knowledge, in this case a set of causal relations (e.g., “[Microsoft’s machine translation has made great progress over the last few years]effect since [it started to use deep learning.]cause”) that was extracted from a large web data archive (4 billion web pages). Our experimental results show that our method achieved significantly better ROUGE F-scores than existing encoder-decoder models and their variations that were augmented with query-attention and memory networks, which are used to exploit the background knowledge.
【Keywords】:
【Paper Link】 【Pages】:152-159
【Authors】: Di Jin ; Ziyang Liu ; Weihao Li ; Dongxiao He ; Weixiong Zhang
【Abstract】: Community detection is a fundamental problem in network science with various applications. The problem has attracted much attention and many approaches have been proposed. Among the existing approaches are the latest methods based on Graph Convolutional Networks (GCN) and on statistical modeling of Markov Random Fields (MRF). Here, we propose to integrate the techniques of GCN and MRF to solve the problem of semi-supervised community detection in attributed networks with semantic information. Our new method takes advantage of the salient features of GCN and MRF and exploits both network topology and node semantic information in a complete end-to-end deep network architecture. Our extensive experiments demonstrate the superior performance of the new method over state-of-the-art methods and its scalability on several large benchmark problems.
【Keywords】:
【Paper Link】 【Pages】:160-167
【Authors】: Di Jin ; Xinxin You ; Weihao Li ; Dongxiao He ; Peng Cui ; Françoise Fogelman-Soulié ; Tanmoy Chakraborty
【Abstract】: Recent research on community detection focuses on learning representations of nodes using different network embedding methods, and then feeding them as normal features to clustering algorithms. However, we find that though one may obtain good results by direct clustering based on such network embedding features, there is ample room for improvement. More seriously, in many real networks, some statistically significant nodes which play pivotal roles are often divided into incorrect communities by network embedding methods. This is because while some distance measures are used to capture the spatial relationship between nodes by embedding, the nodes, after mapping to feature vectors, are essentially no longer coupled, losing important structural information. To address this problem, we propose a general Markov Random Field (MRF) framework to incorporate coupling in network embedding, which allows better detection of network communities. By smartly utilizing properties of MRF, the new framework not only preserves the advantages of network embedding (e.g. low complexity, high parallelizability and applicability for traditional machine learning), but also alleviates its core drawback of inadequate representations of dependencies by restoring the missing coupling relationships. Experiments on real networks show that our new approach improves the accuracy of existing embedding methods (e.g. Node2Vec, DeepWalk and MNMF), and corrects most wrongly-divided statistically significant nodes, which makes network embedding essentially suitable for real community detection applications. The new approach also outperforms other state-of-the-art conventional community detection methods.
【Keywords】:
【Paper Link】 【Pages】:168-175
【Authors】: Ricky Laishram ; Jeremy D. Wendt ; Sucheta Soundarajan
【Abstract】: We examine the problem of crawling the community structure of a multiplex network containing multiple layers of edge relationships. While there has been a great deal of work examining community structure in general, and some work on the problem of sampling a network to preserve its community structure, to the best of our knowledge, this is the first work to consider this problem on multiplex networks. We consider the specific case in which the layers of a multiplex network have different query (collection) costs and reliabilities, and a data collector is interested in identifying the community structure of the most expensive layer. We propose MultiComSample (MCS), a novel algorithm for crawling a multiplex network. MCS uses multiple levels of multi-armed bandits to determine the best layers, communities and node roles for selecting nodes to query. We test MCS against six baseline algorithms on real-world multiplex networks and achieve large gains in performance. For example, after consuming a budget equivalent to sampling 20% of the nodes in the expensive layer, we observe that MCS outperforms the best baseline by up to 49%.
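【Code Sketch】: A hedged sketch of one bandit level in MCS, with a standard UCB rule standing in for the paper's multi-armed bandit machinery: a layer's observed usefulness is traded off against how rarely it has been queried. Layer query costs, which MCS also accounts for, are omitted here for brevity.

import numpy as np

def ucb_pick(pulls, rewards, t, c=1.0):
    # Mean observed reward per layer plus an exploration bonus.
    means = rewards / np.maximum(pulls, 1)
    bonus = c * np.sqrt(np.log(t + 1) / np.maximum(pulls, 1))
    return int(np.argmax(means + bonus))

pulls = np.array([5.0, 2.0, 1.0])    # how often each layer was queried
rewards = np.array([3.0, 1.5, 0.2])  # e.g., new community information gained
print(ucb_pick(pulls, rewards, t=8))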
【Keywords】:
【Paper Link】 【Pages】:176-183
【Authors】: Chao Li ; Cheng Deng ; Lei Wang ; De Xie ; Xianglong Liu
【Abstract】: In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where the outer-cycle network is used to learn a powerful common representation, and the inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.
【Keywords】:
【Paper Link】 【Pages】:184-191
【Authors】: Xiaoming Li ; Hui Fang ; Jie Zhang
【Abstract】: The task of user ranking in signed networks, aiming to predict potential friends and enemies for each user, has attracted increasing attention in numerous applications. Existing approaches are mainly extended from heuristics of the traditional models in unsigned networks. They suffer from two limitations: (1) they mainly focus on global rankings and thus cannot provide effective personalized ranking results, and (2) they make the relatively unrealistic assumption that each user treats her neighbors’ social strengths indifferently. To address these two issues, we propose a supervised method based on random walk to learn social strengths between each user and her neighbors, in which the random walk is more likely to visit “potential friends” and less likely to visit “potential enemies”. We learn the personalized social strengths by optimizing a particularly designed loss function oriented on ranking. We further present a fast ranking method based on the local structure among each seed node and a certain set of candidates. It greatly simplifies the proposed ranking model while maintaining the performance. Experimental results demonstrate the superiority of our approach over the state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:192-199
【Authors】: Zeyu Li ; Jyun-Yu Jiang ; Yizhou Sun ; Wei Wang
【Abstract】: Question Routing (QR) on Community-based Question Answering (CQA) websites aims at recommending answerers that have high probabilities of providing the “accepted answers” to new questions. The existing question routing algorithms simply predict the ranking of users based on query content. As a consequence, the question raiser information is ignored. On the other hand, they lack learnable scoring functions to explicitly compute ranking scores. To tackle these challenges, we propose NeRank that (1) jointly learns representations of question content, question raiser, and question answerers by a heterogeneous information network embedding algorithm and a long short-term memory (LSTM) model. The embeddings of the three types of entities are unified in the same latent space, and (2) conducts question routing for personalized queries, i.e., queries with two entities (question content, question raiser), by a convolutional scoring function taking the learned embeddings of all three types of entities as input. Using the scores, NeRank routes new questions to high-ranking answerers that are skillful in the question domain and have similar backgrounds to the question raiser. Experimental results show that NeRank significantly outperforms competitive baseline question routing models that ignore the raiser information in three ranking metrics. In addition, NeRank converges within several thousand iterations and is insensitive to parameter changes, which demonstrates its effectiveness, scalability, and robustness.
【Keywords】:
【Paper Link】 【Pages】:200-207
【Authors】: Dongliang Liao ; Jin Xu ; Gongfu Li ; Weijie Huang ; Weiqing Liu ; Jing Li
【Abstract】: Predicting the popularity of online articles sheds light on many applications such as recommendation, advertising and information retrieval. However, several technical challenges must be addressed to develop the best predictive capability. (1) Popularity fluctuates under the impact of external factors, which are unpredictable and hard to capture. (2) Content and meta-data features, which largely determine online content popularity, are usually multi-modal and non-trivial to model. (3) Besides, it remains to be determined how to integrate temporal process and content feature modeling for popularity prediction in different lifecycle stages of online articles. In this paper, we propose a Deep Fusion of Temporal process and Content features (DFTC) method to tackle them. For modeling the temporal popularity process, we adopt the recurrent neural network and convolutional neural network. For multi-modal content features, we exploit the hierarchical attention network and embedding technique. Finally, a temporal attention fusion is employed for dynamically integrating all these parts. Using datasets collected from WeChat, we show that the proposed model significantly outperforms state-of-the-art approaches on popularity prediction.
【Keywords】:
【Paper Link】 【Pages】:208-215
【Authors】: Chenghao Liu ; Xin Wang ; Tao Lu ; Wenwu Zhu ; Jianling Sun ; Steven C. H. Hoi
【Abstract】: Social recommendation, which aims at improving the performance of traditional recommender systems by considering social information, has attracted a broad range of interest. As one of the most widely used methods, matrix factorization typically uses continuous vectors to represent user/item latent features. However, the large volume of user/item latent features results in expensive storage and computation cost, particularly on terminal user devices where the computation resources are very limited. Thus, when taking extra social information into account, precisely extracting the K most relevant items for a given user from massive candidates tends to consume even more time and memory, which imposes formidable challenges for efficient and accurate recommendations. A promising way is to simply binarize the latent features (obtained in the training phase) and then compute the relevance score through Hamming distance. However, such a two-stage hashing based learning procedure is not capable of preserving the original data geometry in the real-value space and may result in a severe quantization loss. To address these issues, this work proposes a novel discrete social recommendation (DSR) method which learns binary codes in a unified framework for users and items, considering social information. We further put balanced and uncorrelated constraints on the objective to ensure the learned binary codes are informative yet compact, and finally develop an efficient optimization algorithm to estimate the model parameters. Extensive experiments on three real-world datasets demonstrate that DSR runs nearly 5 times faster and consumes only 1/37 of the memory of its real-valued competitor, at the cost of almost no loss in accuracy.
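【Code Sketch】: Once latent features are binarized, relevance scoring reduces to Hamming distance, i.e., XOR-and-popcount instead of floating-point dot products, which is where the speed and memory savings come from. A minimal top-K retrieval sketch (sizes and layout are illustrative):

import numpy as np

def hamming_topk(user_code, item_codes, k=3):
    # Hamming distance = number of differing bits per item code.
    dists = (item_codes != user_code).sum(axis=1)
    return np.argsort(dists)[:k]  # indices of the k closest items

rng = np.random.default_rng(6)
bits = 32
user = rng.integers(0, 2, size=bits)           # binary user code
items = rng.integers(0, 2, size=(1000, bits))  # binary item codes
print(hamming_topk(user, items))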
【Keywords】:
【Paper Link】 【Pages】:216-223
【Authors】: Jiaqi Ma ; Zhe Zhao ; Jilin Chen ; Ang Li ; Lichan Hong ; Ed H. Chi
【Abstract】: Machine learning applications, such as object detection and content recommendation, often require training a single model to predict multiple targets at the same time. Multi-task learning through neural networks became popular recently, because it not only helps improve the accuracy of many prediction tasks when they are related, but also saves computation cost by sharing model architectures and low-level representations. The latter is critical for real-time large-scale machine learning systems. However, classic multi-task neural networks may degenerate significantly in accuracy when tasks are less related. Previous works (Misra et al. 2016; Yang and Hospedales 2016; Ma et al. 2018) showed that having more flexible architectures in multi-task models, either manually-tuned or soft-parameter-sharing structures like gating networks, helps improve the prediction accuracy. However, manual tuning is not scalable, and the previous soft-parameter-sharing models are either not flexible enough or computationally expensive. In this work, we propose a novel framework called SubNetwork Routing (SNR) to achieve more flexible parameter sharing while maintaining the computational advantage of the classic multi-task neural-network model. SNR modularizes the shared low-level hidden layers into multiple layers of sub-networks, and controls the connection of sub-networks with learnable latent variables to achieve flexible parameter sharing. We demonstrate the effectiveness of our approach on a large-scale dataset YouTube8M. We show that the proposed method improves the accuracy of multi-task models while maintaining their computation efficiency.
【Keywords】:
【Paper Link】 【Pages】:224-231
【Authors】: Fandong Meng ; Jinchao Zhang
【Abstract】: Past years have witnessed rapid developments in Neural Machine Translation (NMT). Most recently, with advanced modeling and training techniques, the RNN-based NMT (RNMT) has shown its potential strength, even compared with the well-known Transformer (self-attentional) model. Although the RNMT model can possess very deep architectures through stacking layers, the transition depth between consecutive hidden states along the sequential axis is still shallow. In this paper, we further enhance the RNN-based NMT through increasing the transition depth between consecutive hidden states and build a novel Deep Transition RNN-based Architecture for Neural Machine Translation, named DTMT. This model enhances the hidden-to-hidden transition with multiple non-linear transformations, as well as maintains a linear transformation path throughout this deep transition by the well-designed linear transformation mechanism to alleviate the gradient vanishing problem. Experiments show that with the specially designed deep transition modules, our DTMT can achieve remarkable improvements on translation quality. Experimental results on the Chinese⇒English translation task show that DTMT can outperform the Transformer model by +2.09 BLEU points and achieve the best results ever reported on the same dataset. On WMT14 English⇒German and English⇒French translation tasks, DTMT shows superior quality to the state-of-the-art NMT systems, including the Transformer and the RNMT+.
【Keywords】:
【Paper Link】 【Pages】:232-240
【Authors】: Jinfeng Rao ; Wei Yang ; Yuhao Zhang ; Ferhan Türe ; Jimmy Lin
【Abstract】: Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have mostly been applied to “standard” ad hoc retrieval tasks over web pages and newswire articles. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model specifically designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches between the query, the social media post, and URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011–2014 show that our model significantly outperforms prior feature-based as well as existing neural ranking models. To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models. Our code and data are publicly available.
【Keywords】:
【Paper Link】 【Pages】:241-248
【Authors】: Shuo Ren ; Zhirui Zhang ; Shujie Liu ; Ming Zhou ; Shuai Ma
【Abstract】: Without real bilingual corpora available, unsupervised Neural Machine Translation (NMT) typically requires pseudo parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noises and errors that will be accumulated and reinforced in the subsequent training process, leading to bad translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. Then SMT and NMT models are optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect caused by errors in the iterative back-translation process can be alleviated in time by SMT filtering noise from its phrase tables; meanwhile, (2) NMT can compensate for the deficiency of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms the strong baseline and achieves new state-of-the-art unsupervised machine translation performance.
【Keywords】:
【Paper Link】 【Pages】:249-256
【Authors】: Jiaming Shen ; Ruiliang Lyu ; Xiang Ren ; Michelle Vanni ; Brian M. Sadler ; Jiawei Han
【Abstract】: Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either ranks terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holistic semantics of a set and suffer from the error propagation issue. Here we propose a new framework, named SynSetMine, that efficiently generates entity synonym sets from a given vocabulary, using example sets from external knowledge bases as distant supervision. SynSetMine consists of two novel modules: (1) a set-instance classifier that jointly learns how to represent a permutation invariant synonym set and whether to include a new instance (i.e., a term) into the set, and (2) a set generation algorithm that enumerates the vocabulary only once and applies the learned set-instance classifier to detect all entity synonym sets in it. Experiments on three real datasets from different domains demonstrate both effectiveness and efficiency of SynSetMine for mining entity synonym sets.
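【Code Sketch】: A hedged sketch of the one-pass set generation loop: scan the vocabulary once and, for each term, either add it to the best-matching existing set (when the set-instance classifier is confident enough) or start a new singleton set. The callable set_instance_prob stands in for the learned classifier; the toy first-letter classifier below is purely illustrative.

def generate_synsets(vocab, set_instance_prob, threshold=0.5):
    synsets = []
    for term in vocab:
        if synsets:
            probs = [set_instance_prob(s, term) for s in synsets]
            best = max(range(len(synsets)), key=lambda i: probs[i])
            if probs[best] >= threshold:
                synsets[best].append(term)  # confident: join the best set
                continue
        synsets.append([term])              # otherwise start a new set
    return synsets

# Toy classifier: terms "belong together" iff they share a first letter.
prob = lambda s, t: 1.0 if s[0][0] == t[0] else 0.0
print(generate_synsets(["apple", "ant", "bee", "bat", "axe"], prob))
# [['apple', 'ant', 'axe'], ['bee', 'bat']]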
【Keywords】:
【Paper Link】 【Pages】:257-264
【Authors】: Atanu R. Sinha ; Deepali Jain ; Nikhil Sheoran ; Sopan Khosla ; Reshmi Sasidharan
【Abstract】: The ‘old-world’ instrument, the survey, remains a tool of choice for firms to obtain ratings of the satisfaction and experience that customers realize while interacting online with firms. While avenues for surveys have evolved from emails and links to pop-ups while browsing, the deficiencies persist. These include: reliance on ratings of very few respondents to infer about all customers’ online interactions; failing to capture a customer’s interactions over time since the rating is a one-time snapshot; and inability to tie back customers’ ratings to specific interactions because ratings provided relate to all interactions. To overcome these deficiencies we extract proxy ratings from clickstream data, typically collected for every customer’s online interactions, by developing an approach based on Reinforcement Learning (RL). We introduce a new way to interpret values generated by the value function of RL as proxy ratings. Our approach does not need any survey data for training. Yet, on validation against actual survey data, proxy ratings yield reasonable performance results. Additionally, we offer a new way to draw insights from values of the value function, which allows associating specific interactions to their proxy ratings. We introduce two new metrics to represent ratings - one customer-level and the other aggregate-level for click actions across customers. Both are defined around the proportion of all pairwise, successive actions that show an increase in proxy ratings. This intuitive customer-level metric enables gauging the dynamics of ratings over time and is a better predictor of purchase than customer ratings from surveys. The aggregate-level metric allows pinpointing actions that help or hurt experience. In sum, proxy ratings computed unobtrusively from clickstream, for every action, for each customer, and for every session can offer an interpretable and more insightful alternative to surveys.
【Keywords】:
【Paper Link】 【Pages】:265-272
【Authors】: Jiankai Sun ; Bortik Bandyopadhyay ; Armin Bashizade ; Jiongqian Liang ; P. Sadayappan ; Srinivasan Parthasarathy
【Abstract】: Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the characteristic of asymmetric transitivity. Maintaining such properties, while reducing the graph to a lower dimensional vector embedding space, has been the focus of much recent research. In this paper, we tackle the challenge of directed graph embedding with asymmetric transitivity preservation and then leverage the proposed embedding method to solve a fundamental task in CQAs: how to appropriately route and assign newly posted questions to users with the suitable expertise and interest in CQAs. The technique incorporates graph hierarchy and reachability information naturally by relying on a nonlinear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology leverages a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. Extensive experiments show that our framework consistently and significantly outperforms the state-of-the-art baselines on three diverse real-world tasks: link prediction, question difficulty estimation, and expert finding in online forums like Stack Exchange. Particularly, our framework can support inductive embedding learning for newly posted questions (unseen nodes during training), and therefore can properly route and assign these kinds of questions to experts in CQAs.
【Keywords】:
【Paper Link】 【Pages】:273-280
【Authors】: Xiaoxiao Sun ; Liyi Chen ; Jufeng Yang
【Abstract】: Fine-grained classification focuses on recognizing the subordinate categories of a field, which requires a large number of labeled images, yet labeling such images is expensive. Utilizing web data has been an attractive option to meet the demands of training data for convolutional neural networks (CNNs), especially when the well-labeled data is not enough. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has been conventionally addressed by reducing the noise level of web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:281-288
【Authors】: Kunihiro Takeoka ; Masafumi Oyamada ; Shinji Nakadai ; Takeshi Okadome
【Abstract】: Given a large amount of table data, how can we find the tables that contain the contents we want? A naive search fails when the column names are ambiguous, such as if columns containing stock price information are named “Close” in one table and named “P” in another table. One way of dealing with this problem that has been gaining attention is the semantic annotation of table data columns by using canonical knowledge. While previous studies successfully dealt with this problem for specific types of table data such as web tables, it remains open for various other types of table data: (1) most approaches do not handle table data with numerical values, and (2) their predictive performance is not satisfactory. This paper presents a novel approach for table data annotation that combines a latent probabilistic model with multi-label classifiers. It features three advantages over previous approaches due to using highly predictive multi-label classifiers in the probabilistic computation of semantic annotation. (1) It is more versatile due to using multi-label classifiers in the probabilistic model, which enables various types of data such as numerical values to be supported. (2) It is more accurate due to the multi-label classifiers and probabilistic model working together to improve predictive performance. (3) It is more efficient due to potential functions based on multi-label classifiers reducing the computational cost for annotation. Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository).
【Keywords】:
【Paper Link】 【Pages】:289-296
【Authors】: Zhiwen Tang ; Grace Hui Yang
【Abstract】: Most neural Information Retrieval (Neu-IR) models derive query-to-document ranking scores based on term-level matching. Inspired by TileBars, a classical term distribution visualization method, in this paper, we propose a novel Neu-IR model that handles query-to-document matching at the subtopic and higher levels. Our system first splits the documents into topical segments, “visualizes” the matchings between the query and the segments, and then feeds an interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final ranking scores. DeepTileBars models the relevance signals occurring at different granularities in a document’s topic hierarchy. It better captures the discourse structure of a document and thus the matching patterns. Although its design and implementation are light-weight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
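【Code Sketch】: A minimal sketch of a TileBars-style interaction matrix: one row per query term, one column per topical segment, each entry counting term occurrences in that segment. DeepTileBars would feed a richer version of such a grid into its neural model; the simple count entries here are an illustrative assumption.

import numpy as np

def tilebar_matrix(query_terms, segments):
    # M[i, j] = occurrences of query term i in topical segment j.
    M = np.zeros((len(query_terms), len(segments)))
    for i, term in enumerate(query_terms):
        for j, seg in enumerate(segments):
            M[i, j] = seg.split().count(term)
    return M

segments = ["neural ranking models for retrieval",
            "tilebars visualize term distribution",
            "ranking with topical segments"]
print(tilebar_matrix(["ranking", "term"], segments))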
【Keywords】:
【Paper Link】 【Pages】:297-304
【Authors】: Bayu Distiawan Trisedya ; Jianzhong Qi ; Rui Zhang
【Abstract】: The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models are proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity to enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
【Keywords】:
【Paper Link】 【Pages】:305-312
【Authors】: Quoc-Tuan Truong ; Hady W. Lauw
【Abstract】: Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, assessing product quality. Traditionally, the sentiment analysis task primarily relies on textual content. Fueled by the rise of mobile phones that are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos in addition to textual content. A question arises whether the visual component could be useful for sentiment analysis as well. In this work, we propose Visual Aspect Attention Network or VistaNet, leveraging both textual and visual components. We observe that in many cases, with respect to sentiment detection, images play a supporting role to text, highlighting the salient aspects of an entity, rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment for pointing out the important sentences of a document using attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention, vis-à-vis visual features or textual attention.
【Keywords】:
【Paper Link】 【Pages】:313-320
【Authors】: Chun-Hsiang Wang ; Kang-Chun Fan ; Chuan-Ju Wang ; Ming-Feng Tsai
【Abstract】: Customer reviews on platforms such as TripAdvisor and Amazon provide rich information about the ways that people convey sentiment on certain domains. Given these kinds of user reviews, this paper proposes UGSD, a representation learning framework for constructing domain-specific sentiment dictionaries from online customer reviews, in which we leverage the relationship between user-generated reviews and the ratings of the reviews to associate the reviewer sentiment with certain entities. The proposed framework has the following three main advantages. First, no additional annotations of words or external dictionaries are needed for the proposed framework; the only resources needed are the review texts and entity ratings. Second, the framework is applicable across a variety of user-generated content from different domains to construct domain-specific sentiment dictionaries. Finally, each word in the constructed dictionary is associated with a low-dimensional dense representation and a degree of relatedness to a certain rating, which enable us to obtain more fine-grained dictionaries and enhance the application scalability of the constructed dictionaries as the word representations can be adopted for various tasks or applications, such as entity ranking and dictionary expansion. The experimental results on three real-world datasets show that the framework is effective in constructing high-quality domain-specific sentiment dictionaries from customer reviews.
【Keywords】:
【Paper Link】 【Pages】:321-328
【Authors】: Yingkui Wang ; Di Jin ; Katarzyna Musial ; Jianwu Dang
【Abstract】: Network contents, including node contents and edge contents, can be utilized for community detection in social networks. Thus, the topic of each community can be extracted as its semantic information. A plethora of models integrating topic models and network topologies have been proposed. However, a key problem remains unresolved: the semantic division of a community. Since the definition of community is based on topology, a community might involve several topics. To ach
【Keywords】:
【Paper Link】 【Pages】:329-337
【Authors】: Zhuo Wang ; Weiping Wang ; Chaokun Wang ; Xiaoyan Gu ; Bo Li ; Dan Meng
【Abstract】: As a major kind of query-dependent community detection, community search finds a densely connected subgraph containing a set of query nodes. As density is the major consideration of community search, most methods of community search often find a dense subgraph with many vertices far from the query nodes, which are not very related to the query nodes. Motivated by this, a new problem called community focusing (CF) is studied. It finds a community where the members are close and densely connected to the query nodes. A distance-sensitive dense subgraph structure called β-attention-core is proposed to remove the vertices loosely connected to or far from the query nodes, and a combinational density is designed to guarantee the density of a subgraph. Then CF is formalized as finding a subgraph with the largest combinational density among the β-attention-core subgraphs containing the query nodes with the largest β. Thereafter, effective methods are devised for CF. Furthermore, a speed-up strategy is developed to make the methods scalable to large networks. Extensive experimental results on real and synthetic networks demonstrate the performance of our methods.
【Keywords】:
【Paper Link】 【Pages】:338-345
【Authors】: Hong Wen ; Jing Zhang ; Quan Lin ; Keping Yang ; Pipei Huang
【Abstract】: Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms. Generally speaking, two essential modules named “Click-Through Rate Prediction” (CTR) and “Conversion Rate Prediction” (CVR) are included, where the CVR module is a crucial factor that directly affects the final purchasing volume. However, CVR prediction is very challenging due to its inherent sparseness. In this paper, we tackle this problem by proposing multi-Level Deep Cascade Trees (ldcTree), a novel decision tree ensemble approach. It leverages deep cascade structures by stacking Gradient Boosting Decision Trees (GBDT) to effectively learn feature representations. In addition, we propose to utilize the cross-entropy in each tree of the preceding GBDT as the input feature representation for the next-level GBDT, which has a clear interpretation: a traversal from root to leaf nodes in the next-level GBDT corresponds to the combination of certain traversals in the preceding GBDT. The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability. Moreover, inspired by ensemble learning, we propose an Ensemble ldcTree (E-ldcTree) to encourage model diversity and further enhance the representation ability. Finally, we propose an improved feature learning method based on E-ldcTree (F-EldcTree) to make adequate use of weakly and strongly correlated features identified by pre-trained GBDT models. Experimental results on an off-line dataset and an online deployment demonstrate the effectiveness of the proposed methods.
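The sketch below illustrates the cascading idea with a two-stage toy: the per-sample cross-entropy of a first GBDT is appended as an input feature for a second one. It uses the model-level cross-entropy for brevity, whereas the paper derives per-tree cross-entropies; sizes and hyperparameters are arbitrary.

```python
# Sketch of cascading GBDT stages where the per-sample cross-entropy of
# the previous stage becomes an input feature for the next, loosely
# following the ldcTree idea; not the authors' code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stage1 = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
p = stage1.predict_proba(X)[:, 1].clip(1e-6, 1 - 1e-6)
xent = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # per-sample cross-entropy

X2 = np.column_stack([X, xent])                    # augment features for stage 2
stage2 = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X2, y)
print(stage2.score(X2, y))
```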
【Keywords】:
【Paper Link】 【Pages】:346-353
【Authors】: Shu Wu ; Yuyuan Tang ; Yanqiao Zhu ; Liang Wang ; Xing Xie ; Tieniu Tan
【Abstract】: The problem of session-based recommendation aims to predict user actions based on anonymous sessions. Previous methods model a session as a sequence and estimate user representations, in addition to item representations, to make recommendations. Though they have achieved promising results, they are insufficient to obtain accurate user vectors in sessions and neglect complex transitions of items. To obtain accurate item embeddings and take complex transitions of items into account, we propose a novel method, Session-based Recommendation with Graph Neural Networks, SR-GNN for brevity. In the proposed method, session sequences are modeled as graph-structured data. Based on the session graph, a GNN can capture complex transitions of items, which are difficult to reveal with conventional sequential methods. Each session is then represented as the composition of the global preference and the current interest of that session using an attention network. Extensive experiments conducted on two real datasets show that SR-GNN consistently outperforms state-of-the-art session-based recommendation methods.
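A toy construction of the session graph such a model would consume: unique items become nodes, consecutive clicks become directed edges, and the adjacency is row-normalized. This sketches the data preparation only, not the GNN itself or the authors' implementation.

```python
# Sketch of turning an anonymous click session into a normalized
# directed item graph of the kind a session-graph GNN consumes.
import numpy as np

session = [3, 7, 3, 5, 7]              # anonymous click sequence (item ids)
items = sorted(set(session))           # unique items become graph nodes
index = {it: i for i, it in enumerate(items)}

n = len(items)
A = np.zeros((n, n))
for a, b in zip(session, session[1:]): # one edge per consecutive click
    A[index[a], index[b]] += 1.0

out_deg = A.sum(axis=1, keepdims=True)
A_out = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)
print(items)
print(A_out)                           # row-normalized outgoing adjacency
```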
【Keywords】:
【Paper Link】 【Pages】:354-362
【Authors】: Jing Xiao ; Liang Liao ; Qiegen Liu ; Ruimin Hu
【Abstract】: Convolutional neural networks (CNNs) have shown their potential in filling large missing areas with plausible contents. To address the blurriness issue common in CNN-based inpainting, a typical approach is to conduct texture refinement on the initially completed images by replacing each neural patch in the predicted region with the closest one in the known region. However, such processing might introduce undesired content change in the predicted region, especially when the desired content does not exist in the known region. To avoid generating such incorrect content, in this paper we propose a content inference and style imitation network (CISI-net), which explicitly separates the image data into a content code and a style code. Content inference is realized by performing inference in the latent space, inferring a content code for the corrupted image that is similar to the one from the original image. This produces more detailed content than a similar inference procedure in the pixel domain, because the content distribution has lower dimensionality than that of the entire image. The style code, on the other hand, represents the rendering of the content, which is consistent over the entire image. The style code is then integrated with the inferred content code to generate the complete image. Experiments on multiple datasets, including structural and natural images, demonstrate that our proposed approach outperforms existing ones in terms of content accuracy as well as texture details.
【Keywords】:
【Paper Link】 【Pages】:363-370
【Authors】: Haitao Xiong ; Hongfu Liu ; Bineng Zhong ; Yun Fu
【Abstract】: Label distribution learning methods effectively address the label ambiguity problem and have achieved great success in image emotion analysis. However, these methods ignore the structured and sparse information naturally contained in the annotations of emotions. For example, emotions can be grouped and ordered according to their polarities and degrees. Meanwhile, emotions vary in intensity, which is reflected in different levels of sparse annotations. Motivated by these observations, we present a convolutional neural network based framework called Structured and Sparse annotations for image emotion Distribution Learning (SSDL) to tackle these two challenges. In order to utilize structured annotations, the Earth Mover's Distance is employed to calculate the minimal cost required to transform one distribution into another over ordered emotions and emotion groups. Combined with Kullback-Leibler divergence, we design the loss to penalize mispredictions according to the dissimilarities of the same and of different emotions simultaneously. Moreover, in order to handle sparse annotations, sparse regularization based on emotional intensity is adopted. Through the combined loss and sparse regularization, SSDL can effectively leverage structured and sparse annotations for predicting emotion distributions. Experimental results demonstrate that our proposed SSDL significantly outperforms the state-of-the-art methods.
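For ordered classes, the Earth Mover's Distance reduces to a difference of cumulative distributions, which is what the short loss sketch below computes; it is an illustration of the mechanism, not the authors' exact loss, and all tensors are toy values.

```python
# Sketch of an Earth Mover's Distance loss for distributions over ordered
# emotion labels: for 1-D ordered classes EMD reduces to a difference of
# cumulative distribution functions.
import torch

def emd_loss(pred, target):
    """pred, target: (batch, num_classes) probability distributions
    over classes ordered, e.g., by emotion polarity/degree."""
    cdf_pred = torch.cumsum(pred, dim=1)
    cdf_target = torch.cumsum(target, dim=1)
    return torch.mean(torch.sum((cdf_pred - cdf_target) ** 2, dim=1))

pred = torch.softmax(torch.randn(4, 8), dim=1)
target = torch.softmax(torch.randn(4, 8), dim=1)
print(emd_loss(pred, target))
```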
【Keywords】:
【Paper Link】 【Pages】:371-378
【Authors】: Nan Xu ; Wenji Mao ; Guandan Chen
【Abstract】: As a fundamental task of sentiment analysis, aspect-level sentiment analysis aims to identify the sentiment polarity of a specific aspect in the context. Previous work on aspect-level sentiment analysis is text-based. With the prevalence of multimodal user-generated content (e.g. text and image) on the Internet, multimodal sentiment analysis has attracted increasing research attention in recent years. In the context of aspect-level sentiment analysis, multimodal data are often more important than text-only data, and exhibit various correlations, including the impact the aspect has on both text and image as well as the interactions between text and image. However, no related work has so far been carried out at the intersection of aspect-level and multimodal sentiment analysis. To fill this gap, we are among the first to put forward the new task, aspect-based multimodal sentiment analysis, and propose a novel Multi-Interactive Memory Network (MIMN) model for this task. Our model includes two interactive memory networks to supervise the textual and visual information with the given aspect, and learns not only the interactive influences between cross-modality data but also the self-influences within single-modality data. We provide a new publicly available multimodal aspect-level sentiment dataset to evaluate our model, and the experimental results demonstrate the effectiveness of our proposed model for this new task.
【Keywords】:
【Paper Link】 【Pages】:379-386
【Authors】: Peng Xu ; Zhaohong Deng ; Kup-Sze Choi ; Longbing Cao ; Shitong Wang
【Abstract】: Multi-view clustering has received much attention recently. Most existing multi-view clustering methods focus only on one-sided clustering. As co-occurring data elements involve the counts of sample-feature co-occurrences, it is more efficient to conduct two-sided clustering along the samples and features simultaneously. To take advantage of two-sided clustering for co-occurrences in the multi-view setting, a two-sided multi-view clustering method is proposed, i.e., multi-view information-theoretic co-clustering (MV-ITCC). The proposed method realizes two-sided clustering for co-occurring multi-view data under an information-theoretic formulation. More specifically, it exploits the agreement and disagreement among views by sharing a common clustering result along the sample dimension and keeping the clustering results of each view specific along the feature dimension. In addition, a maximum-entropy mechanism is adopted to control the importance of different views, which strikes the right balance in leveraging the agreement and disagreement. Extensive experiments are conducted on text and image multi-view datasets. The results clearly demonstrate the superiority of the proposed method.
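The maximum-entropy view-weighting mechanism can be illustrated in a few lines: minimizing the weighted sum of per-view losses minus an entropy term over the simplex yields a softmax of negative losses, so no single view dominates. The losses and the temperature below are made up.

```python
# Sketch of maximum-entropy view weighting: the entropy-regularized
# minimizer over the simplex is a softmax of negative per-view losses.
import numpy as np

def view_weights(losses, lam=1.0):
    """losses: per-view clustering losses; lam controls entropy strength."""
    z = np.exp(-np.asarray(losses) / lam)
    return z / z.sum()

print(view_weights([0.9, 1.1, 3.0], lam=0.5))  # poor views get small weight
```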
【Keywords】:
【Paper Link】 【Pages】:387-394
【Authors】: Baosong Yang ; Jian Li ; Derek F. Wong ; Lidia S. Chao ; Xing Wang ; Zhaopeng Tu
【Abstract】: The self-attention model has shown its flexibility in parallel computation and its effectiveness in modeling both long- and short-term dependencies. However, it calculates the dependencies between representations without considering contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks by capturing the richness of context. To maintain the simplicity and flexibility of the self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage the internal representations that embed both global and deep contexts, thus avoiding reliance on external resources. Experimental results on WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conduct extensive analyses to quantify how the context vectors participate in the self-attention model.
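A minimal sketch of contextualizing the query/key transformations, here using the mean of the layer input as a global context summary added into both projections. The additive form and all sizes are assumptions for illustration, not the authors' exact design.

```python
# Sketch of context-aware query/key transforms in self-attention:
# a global context vector (mean of the layer input) is broadcast
# into every position's query and key.
import torch
import torch.nn as nn

class ContextualSelfAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(d, d) for _ in range(3))
        self.cq = nn.Linear(d, d)   # context contribution to queries
        self.ck = nn.Linear(d, d)   # context contribution to keys

    def forward(self, x):           # x: (seq, d)
        c = x.mean(dim=0)           # global context summary
        Q = self.q(x) + self.cq(c)  # contextualized queries
        K = self.k(x) + self.ck(c)  # contextualized keys
        attn = torch.softmax(Q @ K.T / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.v(x)

x = torch.randn(5, 64)
print(ContextualSelfAttention(64)(x).shape)
```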
【Keywords】:
【Paper Link】 【Pages】:395-402
【Authors】: Xiao Yang ; Madian Khabsa ; Miaosen Wang ; Wei Wang ; Ahmed Hassan Awadallah ; Daniel Kifer ; C. Lee Giles
【Abstract】: Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem, and present an adversarial training framework to alleviate the label imbalance issue. We employ a generative model to iteratively sample a subset of challenging negative samples to fool our classification model. Both models are alternately optimized using the REINFORCE algorithm. The proposed method differs substantially from previous ones, where negative samples in the training set are used directly or uniformly down-sampled. Further, we propose Multi-scale Matching, which explicitly inspects the correlation between words and n-grams at different levels of granularity. We evaluate the proposed method on the SemEval 2016 and SemEval 2017 datasets, achieving state-of-the-art or comparable performance.
【Keywords】:
【Paper Link】 【Pages】:403-410
【Authors】: Xun Yang ; Yunshan Ma ; Lizi Liao ; Meng Wang ; Tat-Seng Chua
【Abstract】: Identifying mix-and-match relationships between fashion items is an urgent task in a fashion e-commerce recommender system. It can significantly enhance user experience and satisfaction. However, due to the challenges of inferring the rich yet complicated set of compatibility patterns in a large e-commerce corpus of fashion items, this task is still underexplored. Inspired by recent advances in multi-relational knowledge representation learning and deep neural networks, this paper proposes a novel Translation-based Neural Fashion Compatibility Modeling (TransNFCM) framework, which jointly optimizes fashion item embeddings and category-specific complementary relations in a unified space in an end-to-end manner. TransNFCM places items in a unified embedding space where a category-specific relation (category-comp-category) is modeled as a vector translation operating on the embeddings of compatible items from the corresponding categories. In this way, we not only capture the specific notion of compatibility conditioned on a specific pair of complementary categories, but also preserve the global notion of compatibility. We also design a deep fashion item encoder which exploits the complementary nature of visual and textual features to represent fashion products. To the best of our knowledge, this is the first work that uses category-specific complementary relations to model category-aware compatibility between items in a translation-based embedding space. Extensive experiments demonstrate the effectiveness of TransNFCM over the state of the art on two real-world datasets.
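The translation-based scoring itself is compact: a category-pair relation vector translates one item embedding toward its compatible counterpart, and compatibility is the negative distance after translation. The vectors below are random stand-ins for learned embeddings.

```python
# Sketch of translation-based compatibility scoring in the spirit of
# TransNFCM; toy vectors, not trained embeddings.
import numpy as np

rng = np.random.default_rng(1)
top, shoes = rng.normal(size=16), rng.normal(size=16)
rel_top_shoes = shoes - top + 0.1 * rng.normal(size=16)  # stand-in for a learned relation

def compatibility(head, relation, tail):
    """Higher (less negative) score means a better mix-and-match pair."""
    return -np.linalg.norm(head + relation - tail)

print(compatibility(top, rel_top_shoes, shoes))                 # compatible pair
print(compatibility(top, rel_top_shoes, rng.normal(size=16)))   # random item
```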
【Keywords】:
【Paper Link】 【Pages】:411-418
【Abstract】: Data imbalance is a key limiting factor for Learning to Rank (LTR) models in information retrieval. Resampling methods and ensemble methods cannot handle the imbalance problem well since none of them incorporate more informative data into the training procedure of LTR models. We propose a data generation model based on Adversarial Autoencoder (AAE) for tackling the data imbalance in LTR via informative data augmentation. This model can be utilized for handling two types of data imbalance, namely, imbalance regarding relevance levels for a particular query and imbalance regarding the amount of relevance judgements in different queries. In the proposed model, relevance information is disentangled from the latent representations in this AAE-based model in order to reconstruct data with specific relevance levels. The semantic information of queries, derived from word embeddings, is incorporated in the adversarial training stage for regularizing the distribution of the latent representation. Two informative data augmentation strategies suitable for LTR are designed utilizing the proposed data generation model. Experiments on benchmark LTR datasets demonstrate that our proposed framework can significantly improve the performance of LTR models.
【Keywords】:
【Paper Link】 【Pages】:419-426
【Authors】: Yujin Yuan ; Liyuan Liu ; Siliang Tang ; Zhongfei Zhang ; Yueting Zhuang ; Shiliang Pu ; Fei Wu ; Xiang Ren
【Abstract】: Distant supervision leverages knowledge bases to automatically label instances, allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise, and may result in poor performance under vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training of distant supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
【Keywords】:
【Paper Link】 【Pages】:427-434
【Authors】: Qi Zeng ; Liangchen Luo ; Wenhao Huang ; Yang Tang
【Abstract】: Extracting valuable facts or informative summaries from multi-dimensional tables, i.e. insight mining, is an important task in data analysis and business intelligence. However, ranking the importance of insights remains a challenging and unexplored task. The main challenge is that explicitly scoring an insight or giving it a rank requires a thorough understanding of the tables and considerable manual effort, which leads to a lack of available training data for the insight ranking problem. In this paper, we propose an insight ranking model that consists of two parts: a neural ranking model that explores data characteristics, such as header semantics and statistical features of the data, and a memory network model that introduces table structure and context information into the ranking process. We also build a dataset with text assistance. Experimental results show that our approach largely improves ranking precision as measured by multiple evaluation metrics.
【Keywords】:
【Paper Link】 【Pages】:435-442
【Authors】: Jing Zhang ; Bowen Hao ; Bo Chen ; Cuiping Li ; Hong Chen ; Jimeng Sun
【Abstract】: The proliferation of massive open online courses (MOOCs) demands an effective way of personalized course recommendation. Recent attention-based recommendation models can distinguish the effects of different historical courses when recommending different target courses. However, when a user is interested in many different courses, the attention mechanism performs poorly, as the effects of the contributing courses are diluted by the diverse history. To address this challenge, we propose a hierarchical reinforcement learning algorithm to revise the user profiles and tune the course recommendation model on the revised profiles. We systematically evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users and 458,454 user enrollment behaviors, collected from XuetangX, one of the largest MOOC platforms in China. Experimental results show that the proposed model significantly outperforms state-of-the-art recommendation models (improving HR@10 by 5.02% to 18.95%).
【Keywords】:
【Paper Link】 【Pages】:443-450
【Authors】: Zhirui Zhang ; Shuangzhi Wu ; Shujie Liu ; Mu Li ; Ming Zhou ; Tong Xu
【Abstract】: Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed back as inputs to the model and can be quickly amplified, harming subsequent sequence generation. To address this issue, we propose a novel model regularization method for NMT training, which aims to improve the agreement between translations generated by left-to-right (L2R) and right-to-left (R2L) NMT decoders. This goal is achieved by introducing two Kullback-Leibler divergence regularization terms into the NMT training objective to reduce the mismatch between the output probabilities of the L2R and R2L models. In addition, we employ a joint training strategy that allows the L2R and R2L models to improve each other in an interactive update process. Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks.
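A sketch of such an agreement regularizer: symmetrized KL divergence between the L2R and R2L output distributions added to the usual cross-entropy. Real NMT training would align the R2L targets by reversing them; the tensors and the weight lam here are toy choices.

```python
# Sketch of an L2R/R2L agreement regularizer: cross-entropy plus a
# symmetrized KL term between the two decoders' output distributions.
import torch
import torch.nn.functional as F

def agreement_loss(logits_l2r, logits_r2l, targets, lam=0.1):
    ce = F.cross_entropy(logits_l2r, targets)
    p = F.log_softmax(logits_l2r, dim=-1)
    q = F.log_softmax(logits_r2l, dim=-1)
    kl = F.kl_div(p, q, log_target=True, reduction="batchmean") \
       + F.kl_div(q, p, log_target=True, reduction="batchmean")
    return ce + lam * kl

logits_a, logits_b = torch.randn(8, 100), torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
print(agreement_loss(logits_a, logits_b, targets))
```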
【Keywords】:
【Paper Link】 【Pages】:451-458
【Authors】: Yang Zhao ; Jiajun Zhang ; Chengqing Zong ; Zhongjun He ; Hua Wu
【Abstract】: Neural Machine Translation (NMT) has drawn much attention due to its promising translation performance in recent years. However, the under-translation problem remains a big challenge. In this paper, we focus on the under-translation problem and attempt to find out what kinds of source words are more likely to be ignored. Through analysis, we observe that a source word with a large translation entropy is more likely to be dropped. To address this problem, we propose a coarse-to-fine framework. In the coarse-grained phase, we introduce a simple strategy to reduce the entropy of high-entropy words by constructing pseudo target sentences. In the fine-grained phase, we propose three methods, including a pre-training method, a multi-task method and a two-pass method, to encourage the neural model to correctly translate these high-entropy words. Experimental results on various translation tasks show that our method can significantly improve translation quality and substantially reduce the under-translation cases of high-entropy words.
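Translation entropy is straightforward to compute from a lexical translation table p(target | source), as the sketch below shows; the table entries are invented for illustration, and a real table would be estimated from aligned bilingual data.

```python
# Sketch of the translation entropy of a source word given a lexical
# translation table; high entropy flags words prone to under-translation.
import math

lex = {
    "bank": {"Bank": 0.4, "Ufer": 0.35, "Geldinstitut": 0.25},  # ambiguous
    "cat":  {"Katze": 0.97, "Kater": 0.03},                     # unambiguous
}

def translation_entropy(word):
    return -sum(p * math.log(p) for p in lex[word].values())

for w in lex:
    print(w, round(translation_entropy(w), 3))
```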
【Keywords】:
【Paper Link】 【Pages】:459-466
【Authors】: Yu-Hang Zhou ; Chen Liang ; Nan Li ; Cheng Yang ; Shenghuo Zhu ; Rong Jin
【Abstract】: Recently, online matching problems have attracted much attention due to their emerging applications in Internet advertising. Most existing online matching methods adopt either an adversarial or a stochastic user arrival assumption, both of which have significant limitations. The adversarial model does not exploit existing knowledge of the user sequence, and thus can be pessimistic in practice. On the other hand, the stochastic model assumes that users are drawn from a stationary distribution, which may not be true in real applications. In this paper, we consider a novel user arrival model in which users are drawn from a drifting distribution, a hybrid of the adversarial and stochastic cases, and propose a new approach, RDLA, to deal with this assumption. Instead of maximizing empirical total revenue on the revealed users, RDLA leverages distributionally robust optimization techniques to learn dual variables via a worst-case consideration over an ambiguity set on the underlying user distribution. Experiments on a real-world dataset exhibit the superiority of our approach.
【Keywords】:
【Paper Link】 【Pages】:468-475
【Authors】: Sheila Alemany ; Jonathan Beltran ; Adrián Pérez ; Sam Ganzfried
【Abstract】: Hurricanes are cyclones originating over tropical and subtropical waters that circulate about a defined center, with closed wind speeds exceeding 75 mph. At landfall, hurricanes can result in severe disasters. The accuracy of predicting their trajectory paths is critical to reducing economic loss and saving human lives. Given the complexity and nonlinearity of weather data, a recurrent neural network (RNN) could be beneficial in modeling hurricane behavior. We propose the application of a fully connected RNN to predict the trajectory of hurricanes. We employed the RNN over a fine grid to reduce typical truncation errors. We utilized latitude, longitude, wind speed, and pressure data publicly provided by the National Hurricane Center (NHC) to predict the trajectory of a hurricane at 6-hour intervals. Results show that the proposed technique is competitive with methods currently employed by the NHC and can predict up to approximately 120 hours of hurricane path.
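A minimal sketch of a recurrent model over 6-hour track observations (latitude, longitude, wind speed, pressure) predicting the next position. The architecture sizes and the single-layer RNN are illustrative assumptions, not the grid-based model from the paper.

```python
# Sketch of a recurrent model over 6-hour hurricane track observations
# predicting the next (latitude, longitude) fix.
import torch
import torch.nn as nn

class TrackRNN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.RNN(input_size=4, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # next (latitude, longitude)

    def forward(self, obs):                # obs: (batch, steps, 4)
        out, _ = self.rnn(obs)
        return self.head(out[:, -1])       # predict from the last hidden state

model = TrackRNN()
six_hourly = torch.randn(1, 8, 4)          # 48 hours of toy history
print(model(six_hourly))                   # predicted next position
```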
【Keywords】:
【Paper Link】 【Pages】:476-484
【Authors】: Johan Bjorck ; Brendan H. Rappazzo ; Di Chen ; Richard Bernstein ; Peter H. Wrege ; Carla P. Gomes
【Abstract】: In this work, we consider applying machine learning to the analysis and compression of audio signals in the context of monitoring elephants in sub-Saharan Africa. Earth's biodiversity is increasingly under threat from sources of anthropogenic change (e.g. resource extraction, land use change, and climate change), and surveying animal populations is critical for developing conservation strategies. However, manually monitoring tropical forests or deep oceans is intractable. For species that communicate acoustically, researchers have argued for placing audio recorders in the habitats as a cost-effective and non-invasive method, a strategy known as passive acoustic monitoring (PAM). In collaboration with conservation efforts, we construct a large labeled dataset of passive acoustic recordings of the African Forest Elephant via crowdsourcing, comprising thousands of hours of recordings in the wild. Using state-of-the-art techniques in artificial intelligence, we improve upon previously proposed methods for passive acoustic monitoring for classification and segmentation. In real-time detection of elephant calls, network bandwidth quickly becomes a bottleneck and efficient ways to compress the data are needed. Most audio compression schemes are aimed at human listeners and are unsuitable for low-frequency elephant calls. To remedy this, we provide a novel end-to-end differentiable method for compression of audio signals that can be adapted to acoustic monitoring of any species and dramatically improves over naive coding strategies.
【Keywords】:
【Paper Link】 【Pages】:485-492
【Authors】: Cen Chen ; Kenli Li ; Sin G. Teo ; Xiaofeng Zou ; Kang Wang ; Jie Wang ; Zeng Zeng
【Abstract】: Traffic prediction is of great importance to traffic management and public safety, yet very challenging, as it is affected by many complex factors, such as the spatial dependency of complicated road networks and temporal dynamics, among others. The uncertainty and complexity of traffic states make prediction a difficult task. In the literature, many research works have applied deep learning methods to traffic prediction problems by combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), where CNNs are utilized for spatial dependency and RNNs for temporal dynamics. However, such combinations cannot capture the connectivity and globality of traffic networks. In this paper, we first propose residual recurrent graph neural networks (Res-RGNN) that can capture graph-based spatial dependencies and temporal dynamics jointly. Due to gradient vanishing, RNNs struggle to capture periodic temporal correlations. Hence, we further introduce a novel hop scheme into Res-RGNN to exploit periodic temporal dependencies. Based on Res-RGNN and hop Res-RGNN, we finally propose a novel end-to-end framework of multiple Res-RGNNs, referred to as “MRes-RGNN”, for traffic prediction. Experimental results on two traffic datasets demonstrate that the proposed MRes-RGNN significantly outperforms state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:493-500
【Authors】: Di Chen ; Carla P. Gomes
【Abstract】: Citizen science projects are successful at gathering rich datasets for various applications. However, the data collected by citizen scientists are often biased — in particular, aligned more with the citizens’ preferences than with scientific objectives. We propose the Shift Compensation Network (SCN), an end-to-end learning scheme which learns the shift from the scientific objectives to the biased data while compensating for the shift by re-weighting the training data. Applied to bird observational data from the citizen science project eBird, we demonstrate how SCN quantifies the data distribution shift and outperforms supervised learning models that do not address the data bias. Compared with competing models in the context of covariate shift, we further demonstrate the advantage of SCN in both its effectiveness and its capability of handling massive high-dimensional data.
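One standard way to implement the underlying covariate-shift correction is sketched below: a discriminator separates biased samples from target-design samples, and its odds ratio estimates the density ratio used as training weights. This illustrates the re-weighting idea the paper builds on, not SCN's end-to-end scheme.

```python
# Sketch of discriminator-based shift correction: a classifier separates
# biased (citizen-science) samples from objective-design samples, and its
# odds become sample weights for downstream training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_biased = rng.normal(loc=1.0, size=(500, 3))  # where citizens tend to observe
x_target = rng.normal(loc=0.0, size=(500, 3))  # where science needs coverage

X = np.vstack([x_biased, x_target])
z = np.r_[np.zeros(500), np.ones(500)]         # 1 = target distribution
disc = LogisticRegression().fit(X, z)

p = disc.predict_proba(x_biased)[:, 1]
weights = p / (1 - p)                          # density ratio p_target / p_biased
print(weights.mean())                          # pass as sample_weight downstream
```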
【Keywords】:
【Paper Link】 【Pages】:501-508
【Authors】: Gianlorenzo D'Angelo ; Martin Olsen ; Lorenzo Severini
【Abstract】: Centrality metrics are among the main tools in social network analysis. Being central in a network brings several benefits to a user: central users are highly influential and play key roles within the network. Therefore, the optimization problem of increasing the centrality of a network user has recently received considerable attention. Given a network and a target user v, the centrality maximization problem consists in creating k new links incident to v in such a way that the centrality of v is maximized, according to some centrality metric. Most of the algorithms proposed in the literature are based on showing that a given centrality metric is monotone and submodular with respect to link addition. However, this property does not hold for several shortest-path based centrality metrics if the links are undirected. In this paper we study the centrality maximization problem in undirected networks for one of the most important shortest-path based centrality measures, the coverage centrality. We provide several hardness and approximation results. We first show that the problem cannot be approximated within a factor greater than 1 − 1/e, unless P = NP, and, under the stronger gap-ETH hypothesis, the problem cannot be approximated within a factor better than 1/n^{o(1)}, where n is the number of users. We then propose two greedy approximation algorithms, and show that, by suitably combining them, we can guarantee an approximation factor of Ω(1/√n). We experimentally compare the solutions provided by our approximation algorithm with optimal solutions computed by means of an exact IP formulation. We show that our algorithm produces solutions that are very close to the optimum.
【Keywords】:
【Paper Link】 【Pages】:509-516
【Authors】: Christos Dimitrakakis ; Yang Liu ; David C. Parkes ; Goran Radanovic
【Abstract】: We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairness introduced in (Kleinberg, Mullainathan, and Raghavan 2016), we show how a Bayesian perspective can lead to well-performing and fair decision rules even under high uncertainty.
【Keywords】:
【Paper Link】 【Pages】:517-524
【Authors】: Wenzheng Feng ; Jie Tang ; Tracy Xiao Liu
【Abstract】: Massive open online courses (MOOCs) have developed rapidly in recent years and have attracted millions of online users. However, a central challenge is the extremely high dropout rate: recent reports show that the completion rate in MOOCs is below 5% (Onah, Sinclair, and Boyatt 2014; Kizilcec, Piech, and Schneider 2013; Seaton et al. 2014). What are the major factors that cause users to drop out? What are the major motivations for users to study in MOOCs? In this paper, employing a dataset from XuetangX, one of the largest MOOC platforms in China, we conduct a systematic study of the dropout problem in MOOCs. We found that users' learning behavior can be clustered into several distinct categories. Our statistics also reveal a high correlation between dropouts from different courses and a strong influence among friends' dropout behaviors. Based on the gained insights, we propose a Context-aware Feature Interaction Network (CFIN) to model and predict users' dropout behavior. CFIN utilizes a context-smoothing technique to smooth feature values across different contexts, and uses an attention mechanism to combine user and course information into the modeling framework. Experiments on two large datasets show that the proposed method achieves better performance than several state-of-the-art methods. The proposed model has been deployed in a production system to help improve user retention.
【Keywords】:
【Paper Link】 【Pages】:525-532
【Authors】: Meir Friedenberg ; Joseph Y. Halpern
【Abstract】: We provide a formal definition of blameworthiness in settings where multiple agents can collaborate to avoid a negative outcome. We first provide a method for ascribing blameworthiness to groups relative to an epistemic state (a distribution over causal models that describe how the outcome might arise). We then show how we can go from an ascription of blameworthiness for groups to an ascription of blameworthiness for individuals using a standard notion from cooperative game theory, the Shapley value. We believe that getting a good notion of blameworthiness in a group setting will be critical for designing autonomous agents that behave in a moral manner.
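The group-to-individual step can be made concrete with a direct Shapley computation over a small group, averaging each agent's marginal contribution over all orderings; the group blame function below is a made-up stand-in for the paper's epistemic-state construction.

```python
# Sketch of dividing a group-level blameworthiness score among individuals
# with the Shapley value, enumerating permutations of a small group.
from itertools import permutations

agents = ["a", "b", "c"]

def group_blame(coalition):
    """Toy blame of a coalition; the paper derives this from a
    distribution over causal models, not a formula like this."""
    return min(1.0, 0.4 * len(coalition))

def shapley(agent):
    total = 0.0
    perms = list(permutations(agents))
    for order in perms:
        before = list(order[: order.index(agent)])
        total += group_blame(before + [agent]) - group_blame(before)
    return total / len(perms)

print({a: round(shapley(a), 3) for a in agents})
```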
【Keywords】:
【Paper Link】 【Pages】:533-540
【Authors】: Serge Gaspers ; Kamran Najeebullah
【Abstract】: The inverse geodesic length (IGL) is a well-known and widely used measure of network performance. It equals the sum of the inverse distances over all pairs of vertices. In network analysis, the IGL of a network is often used to assess and evaluate how well heuristics perform in strengthening or weakening a network. We consider the edge-deletion problem MINIGL-ED. Formally, given a graph G, a budget k, and a target inverse geodesic length T, the question is whether there exists a subset of edges X with |X| ≤ k such that the inverse geodesic length of G − X is at most T. In this paper, we design algorithms and study the complexity of MINIGL-ED. We show that it is NP-complete and cannot be solved in subexponential time, even when restricted to bipartite or split graphs, assuming the Exponential Time Hypothesis. In terms of parameterized complexity, we consider the problem with respect to various parameters. We show that MINIGL-ED is fixed-parameter tractable for the parameters T and vertex cover by modeling the problem as an integer quadratic program. We also provide FPT algorithms parameterized by twin cover and by neighborhood diversity, each combined with the deletion budget k. On the negative side, we show that MINIGL-ED is W[1]-hard parameterized by tree-width.
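The quantity itself and the effect of a budget-k edge deletion are easy to compute by brute force on small graphs, as sketched below with networkx; this only illustrates the objective MINIGL-ED optimizes, since exhaustive search over size-k edge subsets is feasible only for small graphs and budgets, hence the paper's complexity analysis.

```python
# Sketch of the inverse geodesic length (IGL) and a brute-force search
# for the best budget-k edge deletion; small graphs only.
import itertools
import networkx as nx

def igl(G):
    # Unreachable pairs contribute 1/infinity = 0 (they simply don't
    # appear in the length dicts); each pair is counted twice, hence /2.
    return sum(1.0 / d
               for s, lengths in nx.all_pairs_shortest_path_length(G)
               for t, d in lengths.items() if d > 0) / 2

G = nx.karate_club_graph()
print("IGL before:", round(igl(G), 2))

k, best = 2, None
for X in itertools.combinations(G.edges, k):   # exhaustive budget-k deletion
    H = G.copy()
    H.remove_edges_from(X)
    val = igl(H)
    if best is None or val < best[0]:
        best = (val, X)
print("best IGL after deleting", best[1], "=", round(best[0], 2))
```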
【Keywords】:
【Paper Link】 【Pages】:541-548
【Authors】: Partha Ghosh ; Arpan Losalka ; Michael J. Black
【Abstract】: The susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success so far. Two distinct categories of samples to which deep neural networks are vulnerable, “adversarial samples” and “fooling samples”, have so far been tackled separately, due to the difficulty posed when they are considered together. In this work, we show how one can defend against both under a unified framework. Our model has the form of a variational autoencoder with a Gaussian mixture prior on the latent variable, such that each mixture component corresponds to a single class. We show how selective classification can be performed using this model, thereby causing the adversarial objective to entail a conflict. The proposed method leads to the rejection of adversarial samples instead of misclassification, while maintaining high precision and recall on test data. It also inherently provides a way of learning a selective classifier in a semi-supervised scenario, which can similarly resist adversarial attacks. We further show how one can reclassify the detected adversarial samples by iterative optimization.
【Keywords】:
【Paper Link】 【Pages】:549-556
【Authors】: Paul Gölz ; Ariel D. Procaccia
【Abstract】: Migration presents sweeping societal challenges that have recently attracted significant attention from the scientific community. One of the prominent approaches that have been suggested employs optimization and machine learning to match migrants to localities in a way that maximizes the expected number of migrants who find employment. However, it relies on a strong additivity assumption that, we argue, does not hold in practice, due to competition effects; we propose to enhance the data-driven approach by explicitly optimizing for these effects. Specifically, we cast our problem as the maximization of an approximately submodular function subject to matroid constraints, and prove that the worst-case guarantees given by the classic greedy algorithm extend to this setting. We then present three different models for competition effects, and show that they all give rise to submodular objectives. Finally, we demonstrate via simulations that our approach leads to significant gains across the board.
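A toy sketch of the greedy algorithm under a partition-matroid (capacity) constraint with a saturating objective, so that each additional migrant placed in a locality yields a diminishing gain, mimicking a competition effect. The capacities, probabilities, and the saturation form are all invented for illustration; the paper's models and guarantees differ in detail.

```python
# Sketch of greedy maximization under locality capacities with a
# saturating (competition-affected) employment objective.
import math

capacity = {"A": 2, "B": 3}          # open slots per locality
base_prob = {"A": 0.8, "B": 0.5}     # employment prob. of the first migrant

def locality_value(n, loc):
    """Expected employment in a locality: saturates as migrants compete."""
    return sum(base_prob[loc] * math.exp(-i / capacity[loc]) for i in range(n))

def total_value(counts):
    return sum(locality_value(n, loc) for loc, n in counts.items())

counts = {loc: 0 for loc in capacity}
for migrant in range(4):             # greedily place each migrant
    feasible = [l for l in capacity if counts[l] < capacity[l]]
    gains = {l: total_value({**counts, l: counts[l] + 1}) - total_value(counts)
             for l in feasible}
    best = max(gains, key=gains.get)
    counts[best] += 1
    print(f"migrant {migrant} -> {best} (marginal gain {gains[best]:.3f})")
print(counts)
```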
【Keywords】:
【Paper Link】 【Pages】:557-564
【Authors】: Tomer Golany ; Kira Radinsky
【Abstract】: The Electrocardiogram (ECG) is performed routinely by medical personnel to identify structural, functional and electrical cardiac events. Many attempts have been made to automate this task using machine learning algorithms, including classic supervised learning algorithms and deep neural networks, reaching state-of-the-art performance. The ECG signal conveys the specific electrical cardiac activity of each subject, so extreme variations are observed between patients. These variations are challenging for deep learning algorithms and impede generalization. In this work, we propose a semi-supervised approach for patient-specific ECG classification. We propose a generative model that learns to synthesize patient-specific ECG signals, which can then be used as additional training data to improve a patient-specific classifier's performance. Empirical results show that the generated signals significantly improve ECG classification in a patient-specific setting.
【Keywords】:
【Paper Link】 【Pages】:565-572
【Authors】: Josiah P. Hanna ; Guni Sharon ; Stephen D. Boyles ; Peter Stone
【Abstract】: This paper examines the impact of tolls on social welfare in the context of a transportation network in which only a portion of the agents are subject to tolls. More specifically, this paper addresses the question: which subset of agents provides the most system benefit if they are compliant with an approximate marginal-cost tolling scheme? Since previous work suggests this problem is NP-hard, we examine a heuristic approach. Our experimental results on three real-world traffic scenarios suggest that evaluating the marginal impact of a given agent serves as a particularly strong heuristic for selecting an agent to be compliant. Using this heuristic to select 7.6% of the agents to be compliant achieved an increase of up to 10.9% in social welfare over not tolling at all. The presented heuristic approach and conclusions can help practitioners target specific agents to participate in an opt-in tolling scheme.
【Keywords】:
【Paper Link】 【Pages】:573-581
【Authors】: Léo Hemamou ; Ghazi Felhi ; Vincent Vandenbussche ; Jean-Claude Martin ; Chloé Clavel
【Abstract】: New technologies are drastically changing recruitment techniques. Some research projects aim at designing interactive systems that help candidates practice job interviews. Other studies aim at the automatic detection of social signals (e.g. smile, turn of speech, etc.) in videos of job interviews. These studies are limited both in the number of interviews they process and by the fact that they only analyze simulated job interviews (e.g. students pretending to apply for a fake position). Asynchronous video interviewing tools have become mature products on the human resources market, and thus a popular step in the recruitment process. As part of a project to help recruiters, we collected a corpus of more than 7,000 candidates taking asynchronous video job interviews for real positions, recording videos of themselves answering a set of questions. We propose a new hierarchical attention model called HireNet that aims at predicting the hirability of candidates as evaluated by recruiters. In HireNet, an interview is considered as a sequence of questions and answers containing salient social signals. Two contextual sources of information are modeled in HireNet: the words contained in the question and in the job position. Our model achieves better F1-scores than previous approaches for each modality (verbal content, audio and video). Results from early and late multimodal fusion suggest that more sophisticated fusion schemes are needed to improve on the monomodal results. Finally, some examples of moments captured by the attention mechanisms suggest that our model could potentially be used to help find key moments in an asynchronous job interview.
【Keywords】:
【Paper Link】 【Pages】:582-589
【Authors】: Hao Huang ; Qian Yan ; Ting Gan ; Di Niu ; Wei Lu ; Yunjun Gao
【Abstract】: To learn the underlying parent-child influence relationships between nodes in a diffusion network, most existing approaches require timestamps that pinpoint the exact time when node infections occur in historical diffusion processes. In many real-world diffusion processes like the spread of epidemics, monitoring such infection temporal information is often expensive and difficult. In this work, we study how to carry out diffusion network inference without infection timestamps, using only the final infection statuses of nodes in each historical diffusion process, which are more readily accessible in practice. Our main result is a probabilistic model that can find for each node an appropriate number of most probable parent nodes, who are most likely to have generated the historical infection results of the node. Extensive experiments on both synthetic and real-world networks are conducted, and the results verify the effectiveness and efficiency of our approach.
【Keywords】:
【Paper Link】 【Pages】:590-597
【Authors】: Jeremy Irvin ; Pranav Rajpurkar ; Michael Ko ; Yifan Yu ; Silviana Ciurea-Ilcus ; Chris Chute ; Henrik Marklund ; Behzad Haghgoo ; Robyn L. Ball ; Katie S. Shpanskaya ; Jayne Seekins ; David A. Mong ; Safwan S. Halabi ; Jesse K. Sandberg ; Ricky Jones ; David B. Larson ; Curtis P. Langlotz ; Bhavik N. Patel ; Matthew P. Lungren ; Andrew Y. Ng
【Abstract】: Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.
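The uncertainty labels can be handled in a multi-label loss with simple policies, as sketched below for three approaches the paper compares (ignoring, or remapping to 0 or 1, labels coded -1); the tensors are toys and this is not the authors' training code.

```python
# Sketch of handling uncertain labels (coded -1) in a multi-label BCE
# loss: U-Ignore masks them out, U-Zeros/U-Ones remap them.
import torch
import torch.nn.functional as F

def masked_bce(logits, labels, policy="U-Ignore"):
    """labels: (batch, 14) in {1, 0, -1 (uncertain)}."""
    y = labels.clone().float()
    if policy == "U-Zeros":
        y[y == -1] = 0.0
    elif policy == "U-Ones":
        y[y == -1] = 1.0
    mask = (y >= 0).float()          # only U-Ignore leaves -1 to be masked
    loss = F.binary_cross_entropy_with_logits(
        logits, y.clamp(min=0.0), reduction="none")
    return (loss * mask).sum() / mask.sum()

logits = torch.randn(2, 14)
labels = torch.tensor([[1, -1, 0] + [0] * 11, [0, 1, -1] + [1] * 11])
print(masked_bce(logits, labels, "U-Ignore"))
```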
【Keywords】:
【Paper Link】 【Pages】:598-605
【Authors】: Songlei Jian ; Liang Hu ; Longbing Cao ; Kai Lu ; Hang Gao
【Abstract】: The formation of a complex network is highly driven by multi-aspect node influences and interactions, reflected in network structures and the content embodied in network nodes. Limited work has jointly modeled all these aspects; typical approaches focus on topological structures but overlook the heterogeneous interactions behind node linkage and the contributions of node content to the interactive heterogeneities. Here, we propose a multi-aspect interaction and influence-unified evolutionary coupled system (MAI-ECS) for network representation, involving node content and linkage-based network structure. MAI-ECS jointly and iteratively learns two systems: a multi-aspect interaction learning system to capture heterogeneous hidden interactions between nodes, and an influence propagation system to capture multi-aspect node influences and their propagation between nodes. MAI-ECS couples, unifies and optimizes the two systems toward an effective representation of explicit node content and network structure, and of implicit node interactions and influences. MAI-ECS shows superior performance in node classification and link prediction in comparison with state-of-the-art methods on two real-world datasets. Further, we demonstrate the semantic interpretability of the results generated by MAI-ECS.
【Keywords】:
【Paper Link】 【Pages】:606-613
【Authors】: Muhammad Raza Khan ; Joshua E. Blumenstock
【Abstract】: With the rapid expansion of mobile phone networks in developing countries, large-scale graph machine learning has gained sudden relevance in the study of global poverty. Recent applications range from humanitarian response and poverty estimation to urban planning and epidemic containment. Yet the vast majority of computational tools and algorithms used in these applications do not account for the multi-view nature of social networks: people are related in myriad ways, but most graph learning models treat relations as binary. In this paper, we develop a graph-based convolutional network for learning on multi-view networks. We show that this method outperforms state-of-the-art semi-supervised learning algorithms on three different prediction tasks using mobile phone datasets from three different developing countries. We also show that, while designed specifically for use in poverty research, the algorithm also outperforms existing benchmarks on a broader set of learning tasks on multi-view networks, including node labelling in citation networks.
【Keywords】:
【Paper Link】 【Pages】:614-621
【Authors】: Arash Khodadadi ; Daniel J. McDonald
【Abstract】: Trends in terrestrial temperature variability are perhaps more relevant for species viability than trends in mean temperature. In this paper, we develop methodology for estimating such trends using multi-resolution climate data from polar orbiting weather satellites. We derive two novel algorithms for computation that are tailored for dense, gridded observations over both space and time. We evaluate our methods with a simulation that mimics these data’s features and on a large, publicly available, global temperature dataset with the eventual goal of tracking trends in cloud reflectance temperature variability.
【Keywords】:
【Paper Link】 【Pages】:622-629
【Authors】: Amanda Kube ; Sanmay Das ; Patrick J. Fowler
【Abstract】: Modern statistical and machine learning methods are increasingly capable of modeling individual or personalized treatment effects. These predictions could be used to allocate different interventions across populations based on individual characteristics. In many domains, like social services, the availability of different possible interventions can be severely resource limited. This paper considers possible improvements to the allocation of such services in the context of homelessness service provision in a major metropolitan area. Using data from the homeless system, we use a counterfactual approach to show potential for substantial benefits in terms of reducing the number of families who experience repeat episodes of homelessness by choosing optimal allocations (based on predicted outcomes) to a fixed number of beds in different types of homelessness service facilities. Such changes in the allocation mechanism would not be without tradeoffs, however; a significant fraction of households are predicted to have a higher probability of re-entry in the optimal allocation than in the original one. We discuss the efficiency, equity and fairness issues that arise and consider potential implications for policy.
【Keywords】:
【Paper Link】 【Pages】:630-638
【Authors】: Sawan Kumar ; Varsha Sreenivasan ; Partha P. Talukdar ; Franco Pestilli ; Devarajan Sridharan
【Abstract】: Diffusion imaging and tractography enable mapping structural connections in the human brain, in vivo. Linear Fascicle Evaluation (LiFE) is a state-of-the-art approach for pruning spurious connections in the estimated structural connectome, by optimizing its fit to the measured diffusion data. Yet, LiFE imposes heavy demands on computing time, precluding its use in analyses of large connectome databases. Here, we introduce a GPU-based implementation of LiFE that achieves 50-100x speedups over conventional CPU-based implementations for connectome sizes of up to several million fibers. Briefly, the algorithm accelerates generalized matrix multiplications on a compressed tensor through efficient GPU kernels, while ensuring favorable memory access patterns. Leveraging these speedups, we advance LiFE's algorithm by imposing a regularization constraint on the estimated fiber weights during connectome pruning. Our regularized, accelerated LiFE algorithm (“ReAl-LiFE”) estimates sparser connectomes that also provide more accurate fits to the underlying diffusion signal. We demonstrate the utility of our approach by classifying pathological signatures of structural connectivity in patients with Alzheimer's Disease (AD). We estimated million-fiber whole-brain connectomes, followed by pruning with ReAl-LiFE, for 90 individuals (45 AD patients and 45 healthy controls). Linear classifiers based on support vector machines achieved over 80% accuracy in classifying AD patients from healthy controls based on their ReAl-LiFE pruned structural connectomes alone. Moreover, classification based on the ReAl-LiFE pruned connectome outperformed both the unpruned connectome and the LiFE pruned connectome in terms of accuracy. We propose our GPU-accelerated approach as a widely relevant tool for non-negative least-squares optimization across many domains.
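The optimization template behind this kind of regularized connectome pruning, non-negative least squares with a sparsity penalty solved by projected gradient descent, can be sketched in a few lines; the matrix sizes, penalty weight, and step size below are toy choices, not the paper's GPU implementation.

```python
# Sketch of L1-regularized non-negative least squares by projected
# gradient descent, the template behind pruning fiber weights.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))          # stand-in for the compressed model matrix
x_true = np.maximum(rng.normal(size=50), 0)
b = A @ x_true + 0.01 * rng.normal(size=200)

def reg_nnls(A, b, lam=0.1, iters=2000):
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + lam     # L1 penalty is linear on x >= 0
        x = np.maximum(x - lr * grad, 0)   # project onto the non-negative orthant
    return x

x = reg_nnls(A, b)
print("nonzeros:", (x > 1e-6).sum(),
      "residual:", round(np.linalg.norm(A @ x - b), 3))
```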
【Keywords】:
【Paper Link】 【Pages】:639-646
【Authors】: Chenchen Li ; Xiang Yan ; Xiaotie Deng ; Yuan Qi ; Wei Chu ; Le Song ; Junlong Qiao ; Jianshan He ; Junwu Xiong
【Abstract】: Current Internet market makers are facing an intensely competitive environment, where personalized price reductions or discounted coupons are provided by their peers to attract more customers. Much investment is spent trying to catch up with competitors, but participants in such a price-cut war are often incapable of winning due to their lack of information about others' strategies or customers' preferences. We formalize the problem as a stochastic game with imperfect and incomplete information and develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables of the current market environment, representing the preferences of customers and the strategies of competitors. Tests on simulated experiments and on an open real-world dataset show that, by subsuming all available market information about the market maker's competitors, our model exhibits a significant improvement in understanding the market environment and finding the best response strategies in the Internet price war. Our work marks the first successful learning method to infer latent information in a price-war environment via LDA modeling, and sets an example for related competitive applications to follow.
【Keywords】:
【Paper Link】 【Pages】:647-654
【Authors】: Mike Li ; Elija Perrier ; Chang Xu
【Abstract】: Geographic information systems (GIS) research is widely used within the social and physical sciences and plays a crucial role in the development and implementation of economic, education, environment and transportation policy by governments. While machine learning methods have been applied to GIS datasets, the uptake of powerful deep learning CNN methodologies has been limited, in part due to challenges posed by the complex and often poorly structured nature of the data. In this paper, we demonstrate the utility of GCNNs for GIS analysis via a multi-graph hierarchical spatial-filter GCNN network model, predicting election outcomes from socio-economic features drawn from the 2016 Australian Census. We report a marked improvement in performance accuracy of hierarchical GCNNs over benchmark generalised linear models and standard GCNNs, especially in semi-supervised tasks. These results indicate the widespread potential for GIS-GCNN research methods to enrich socio-economic GIS analysis, aiding the social sciences and policy development.
【Keywords】:
【Paper Link】 【Pages】:655-662
【Authors】: Shuailong Liang ; Olivia Nicol ; Yue Zhang
【Abstract】: Blame games tend to follow major disruptions, be they financial crises, natural disasters or terrorist attacks. Studying how the blame game evolves and shapes the dominant crisis narratives is of great significance, as sense-making processes can affect regulatory outcomes, social hierarchies, and cultural norms. However, it takes tremendous time and effort for social scientists to manually examine each relevant news article and extract the blame ties (A blames B). In this study, we define a new task, Blame Tie Extraction, and construct a new dataset related to the United States financial crisis (2007-2010) from The New York Times, The Wall Street Journal and USA Today. We build a Bi-directional Long Short-Term Memory (BiLSTM) network over the contexts in which the entities appear, and it learns to automatically extract such blame ties at the document level. Leveraging large unsupervised models such as GloVe and ELMo, our best model achieves an F1 score of 70% on the test set for blame tie extraction, making it a useful tool for social scientists to extract blame ties more efficiently.
【Keywords】:
【Paper Link】 【Pages】:663-670
【Authors】: Matt Olfat ; Anil Aswani
【Abstract】: Though there is a growing literature on fairness for supervised learning, incorporating fairness into unsupervised learning has been less well-studied. This paper studies fairness in the context of principal component analysis (PCA). We first define fairness for dimensionality reduction, and our definition can be interpreted as saying a reduction is fair if information about a protected class (e.g., race or gender) cannot be inferred from the dimensionality-reduced data points. Next, we develop convex optimization formulations that can improve the fairness (with respect to our definition) of PCA and kernel PCA. These formulations are semidefinite programs, and we demonstrate their effectiveness using several datasets. We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates.
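A much simpler surrogate conveys the intuition: before reducing dimension, project out the direction that best separates the protected groups so the reduced data carries less information about the protected class. The sketch below does exactly that; it is not the paper's semidefinite program, only an illustration of the goal.

```python
# Sketch of the fair-dimensionality-reduction intuition: remove the
# group-separating direction, then run standard PCA.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
z = rng.integers(0, 2, size=300)            # protected attribute
X[z == 1] += np.array([2.0, 0, 0, 0, 0])    # group difference along one axis

d = X[z == 1].mean(0) - X[z == 0].mean(0)   # group-separating direction
d /= np.linalg.norm(d)
X_fair = X - np.outer(X @ d, d)             # project that direction out

# Standard PCA on the debiased data.
Xc = X_fair - X_fair.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_red = Xc @ Vt[:2].T
print(X_red.shape)
```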
【Keywords】:
【Paper Link】 【Pages】:671-678
【Authors】: Victor R. Martinez ; Krishna Somandepalli ; Karan Singla ; Anil Ramakrishna ; Yalda T. Uhls ; Shrikanth S. Narayanan
【Abstract】: Violent content in movies can influence viewers' perception of society. For example, frequent depictions of certain demographics as perpetrators or victims of abuse can shape stereotyped attitudes. In this work, we propose to characterize aspects of violent content in movies solely from the language used in the scripts. This makes our method applicable to a movie in the earlier stages of content creation, even before it is produced, and is complementary to previous works, which rely on audio or video post-production. Our approach is based on a broad range of features designed to capture lexical, semantic, sentiment and abusive-language characteristics. We use these features to learn a vector representation for (1) the complete movie and (2) each act in the movie. The former representation is used to train a movie-level classification model, and the latter to train deep-learning sequence classifiers that make use of context. We tested our models on a dataset of 732 Hollywood scripts annotated by experts for violent content. Our performance evaluation suggests that linguistic features are a good indicator of violent content. Furthermore, our ablation studies show that semantic and sentiment features are the most important predictors of violence in this data. To date, we are the first to show that the language used in movie scripts is a strong indicator of violent content. This offers novel computational tools to assist in creating awareness of storytelling.
【Keywords】:
【Paper Link】 【Pages】:679-686
【Authors】: Bowen Pan ; Shangfei Wang ; Qisheng Jiang
【Abstract】: The inherent connections among aesthetic attributes and aesthetics are crucial for image aesthetic assessment, but have not been thoroughly explored yet. In this paper, we propose a novel image aesthetic assessment method assisted by attributes at both the representation level and the label level. The attributes are used as privileged information, which is only required during training. Specifically, we first propose a multi-task deep convolutional rating network to learn the aesthetic score and attributes simultaneously. The attributes are exploited to construct better feature representations for aesthetic assessment through multi-task learning. After that, we introduce a discriminator to distinguish the predicted attributes and aesthetics of the multi-task deep network from the ground truth label distribution embedded in the training data. The multi-task deep network aims to output aesthetic scores and attributes as close to the ground truth labels as possible. Thus the deep network and the discriminator compete with each other. Through adversarial learning, the attributes are exploited to enforce the distribution of the predicted attributes and aesthetics to converge to the ground truth label distribution. Experimental results on two benchmark databases demonstrate the superiority of the proposed method over state-of-the-art work.
【Keywords】:
【Paper Link】 【Pages】:687-694
【Authors】: Hae Won Park ; Ishaan Grover ; Samuel Spaulding ; Louis Gomez ; Cynthia Breazeal
【Abstract】: Personalized education technologies capable of delivering adaptive interventions could play an important role in addressing the needs of diverse young learners at a critical time of school readiness. We present an innovative personalized social robot learning companion system that utilizes children’s verbal and nonverbal affective cues to modulate their engagement and maximize their long-term learning gains. We propose an affective reinforcement learning approach to train a personalized policy for each student during an educational activity where a child and a robot tell stories to each other. Using the personalized policy, the robot selects stories that are optimized for each child’s engagement and linguistic skill progression. We recruited 67 bilingual and English language learners between the ages of 4 and 6 to participate in a between-subjects study to evaluate our system. Over a three-month deployment in schools, a unique storytelling policy was trained to deliver a personalized story curriculum for each child in the Personalized group. We compared their engagement and learning outcomes to a Non-personalized group with a fixed curriculum robot, and a baseline group that had no robot intervention. In the Personalized condition, our results show that the affective policy successfully personalized to each child to boost their engagement and outcomes with respect to learning and retaining more target words as well as using more target syntax structures as compared to children in the other groups.
【Keywords】:
【Paper Link】 【Pages】:695-701
【Authors】: Hanan Rosemarin ; Ariel Rosenfeld ; Sarit Kraus
【Abstract】: Emergency Departments (EDs) provide an imperative source of medical care. Central to the ED workflow is patient-caregiver scheduling, directed at getting the right patient to the right caregiver at the right time. Unfortunately, common ED scheduling practices are based on ad-hoc heuristics which may not be aligned with the ED's complex and partially conflicting objectives. In this paper, we propose a novel online deep-learning scheduling approach for the automatic assignment and scheduling of medical personnel to arriving patients. Our approach allows for the optimization of explicit, hospital-specific multi-variate objectives and takes advantage of available data, without altering the existing workflow of the ED. In an extensive empirical evaluation, using real-world data, we show that our approach can significantly improve an ED's performance metrics.
【Keywords】:
【Paper Link】 【Pages】:702-709
【Authors】: Tim G. J. Rudner ; Marc Rußwurm ; Jakub Fil ; Ramona Pelich ; Benjamin Bischke ; Veronika Kopacková ; Piotr Bilinski
【Abstract】: We propose a novel approach for rapid segmentation of flooded buildings by fusing multiresolution, multisensor, and multitemporal satellite imagery in a convolutional neural network. Our model significantly expedites the generation of satellite imagery-based flood maps, crucial for first responders and local authorities in the early stages of flood events. By incorporating multitemporal satellite imagery, our model allows for rapid and accurate post-disaster damage assessment and can be used by governments to better coordinate medium- and long-term financial assistance programs for affected areas. The network consists of multiple streams of encoder-decoder architectures that extract spatiotemporal information from medium-resolution images and spatial information from high-resolution images before fusing the resulting representations into a single medium-resolution segmentation map of flooded buildings. We compare our model to state-of-the-art methods for building footprint segmentation as well as to alternative fusion approaches for the segmentation of flooded buildings and find that our model performs best on both tasks. We also demonstrate that our model produces highly accurate segmentation maps of flooded buildings using only publicly available medium-resolution data instead of significantly more detailed but sparsely available very high-resolution data. We release the first open-source dataset of fully preprocessed and labeled multiresolution, multispectral, and multitemporal satellite images of disaster sites along with our source code.
【Keywords】:
【Paper Link】 【Pages】:710-717
【Authors】: Tavpritesh Sethi ; Anant Mittal ; Shubham Maheshwari ; Samarth Chugh
【Abstract】: Life-expectancy is a complex outcome driven by genetic, socio-demographic, environmental and geographic factors. Increasing socio-economic and health disparities in the United States are propagating the longevity-gap, making it a cause for concern. Earlier studies have probed individual factors, but an integrated picture revealing quantifiable actions has been missing. There is a growing concern about a further widening of healthcare inequality caused by Artificial Intelligence (AI) due to differential access to AI-driven services. Hence, it is imperative to explore and exploit the potential of AI for illuminating biases and enabling transparent policy decisions for positive social and health impact. In this work, we reveal actionable interventions for decreasing the longevity-gap in the United States by analyzing a county-level data resource containing healthcare, socio-economic, behavioral, education and demographic features. We learn an ensemble-averaged structure, draw inferences using the joint probability distribution and extend it to a Bayesian Decision Network for identifying policy actions. We draw quantitative estimates for the impact of diversity, preventive-care quality and stable families within the unified framework of our decision network. Finally, we make this analysis and dashboard available as an interactive web-application, enabling users and policy-makers to validate our reported findings and to explore the impact of factors beyond those reported in this work.
【Keywords】:
【Paper Link】 【Pages】:718-725
【Authors】: Jakub Sliwinski ; Martin Strobel ; Yair Zick
【Abstract】: We study the following problem: given a labeled dataset and a specific datapoint x, how did the i-th feature influence the classification for x? We identify a family of numerical influence measures — functions that, given a datapoint x, assign a numeric value φ_i(x) to every feature i, corresponding to how altering i’s value would influence the outcome for x. This family, which we term monotone influence measures (MIM), is uniquely derived from a set of desirable properties, or axioms. The MIM family constitutes a provably sound methodology for measuring feature influence in classification domains; the values generated by MIM are based on the dataset alone, and do not make any queries to the classifier. While this requirement naturally limits the scope of our framework, we demonstrate its effectiveness on data.
【Keywords】:
【Paper Link】 【Pages】:726-733
【Authors】: Abby Stylianou ; Hong Xuan ; Maya Shende ; Jonathan Brandt ; Richard Souvenir ; Robert Pless
【Abstract】: Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often the victim), and the similarity of objects (e.g., furniture, art, bedding) across different hotel rooms. To support efforts towards this hotel recognition task, we have curated a dataset of over 1 million annotated hotel room images from 50,000 hotels. These images include professionally captured photographs from travel websites and crowd-sourced images from a mobile application, which are more similar to the types of images analyzed in real-world investigations. We present a baseline approach based on a standard network architecture and a collection of data-augmentation approaches tuned to this problem domain.
【Keywords】:
【Paper Link】 【Pages】:734-741
【Authors】: Ruangsak Trakunphutthirak ; Yen Cheung ; Vincent C. S. Lee
【Abstract】: Educational data mining provides a way to predict student academic performance. Psychometric factors like time management are among the major issues affecting Thai students’ academic performance. Current data sources used to predict students’ performance are limited to manually collected data or data from a single unit of study, which cannot be generalised to indicate overall academic performance. This study uses an additional data source, a university log file, to predict academic performance. It investigates the browsing categories and the Internet access activities of students with respect to their time management during their studies. A single source of data is insufficient to identify those students who are at risk of failing in their academic studies. Furthermore, there is a paucity of recent empirical studies in this area to provide insights into the relationship between students’ academic performance and their Internet access activities. To contribute to this area of research, we employed two datasets, web-browsing categories and Internet access activity types, to select the best outcomes, and compared different weights in the time and frequency domains. We found that the random forest technique provides the best outcome on these datasets for identifying those students who are at risk of failure. We also found that data from their Internet access activities yields more accurate outcomes than data from browsing categories alone. The combination of the two datasets reveals a better picture of students’ Internet usage and thus identifies students who are academically at risk of failure. Further work involves collecting more Internet access log file data, analysing it over a longer period and relating the period of data collection to events during the academic year.
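【Code Sketch】: An illustrative sketch of fusing the two feature views with a random forest, as the study does conceptually; the toy feature matrices and at-risk labels below are hypothetical placeholders, not the paper's data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X_categories = rng.random((200, 10))   # per-student web-browsing category usage (toy)
X_activities = rng.random((200, 6))    # per-student access-activity type usage (toy)
y_at_risk = rng.integers(0, 2, 200)    # 1 = academically at risk (toy labels)

X_combined = np.hstack([X_categories, X_activities])  # fuse the two views
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X_combined, y_at_risk, cv=5).mean())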
【Keywords】:
【Paper Link】 【Pages】:742-749
【Authors】: Chun-Chen Tu ; Pai-Shun Ting ; Pin-Yu Chen ; Sijia Liu ; Huan Zhang ; Jinfeng Yi ; Cho-Jui Hsieh ; Shin-Ming Cheng
【Abstract】: Recent studies have shown that adversarial examples in state-of-the-art image classifiers trained by deep neural networks (DNN) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, which is short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks towards efficient black-box attacks: (i) an adaptive random gradient estimation strategy to balance query counts and distortion, and (ii) an autoencoder that is either trained offline with unlabeled data or a bilinear resizing operation for attack acceleration. Experimental results suggest that, by applying AutoZOOM to a state-of-the-art black-box attack (ZOO), a significant reduction in model queries can be achieved without sacrificing the attack success rate and the visual quality of the resulting adversarial examples. In particular, when compared to the standard ZOO method, AutoZOOM can consistently reduce the mean query counts in finding successful adversarial examples (or reaching the same distortion level) by at least 93% on MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
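【Code Sketch】: The scaled random (zeroth-order) gradient estimator that this family of black-box attacks builds on, in simplified one-sided form; the stand-in loss f and all parameters are placeholders, not AutoZOOM's full autoencoder-accelerated method.

# Averages d * (f(x + beta*u) - f(x)) / beta * u over random unit vectors u,
# which approximates the true gradient for small beta.
import numpy as np

def zoo_grad_estimate(f, x, beta=1e-2, n_queries=20, rng=np.random.default_rng(0)):
    d, g = x.size, np.zeros_like(x)
    fx = f(x)
    for _ in range(n_queries):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += d * (f(x + beta * u) - fx) / beta * u   # one-sided estimate
    return g / n_queries

f = lambda x: np.sum(x ** 2)          # stand-in for the black-box attack loss
x = np.array([1.0, -2.0, 0.5])
print(zoo_grad_estimate(f, x))        # approximates the true gradient 2x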
【Keywords】:
【Paper Link】 【Pages】:750-757
【Authors】: Jill-Jênn Vie ; Hisashi Kashima
【Abstract】: Knowledge tracing is a sequence prediction problem where the goal is to predict the outcomes of students over questions as they are interacting with a learning platform. By tracking the evolution of the knowledge of some student, one can optimize instruction. Existing methods are either based on temporal latent variable models, or factor analysis with temporal features. We here show that factorization machines (FMs), a model for regression or classification, encompass several existing models in the educational literature as special cases, notably the additive factor model, the performance factor model, and multidimensional item response theory. We show, using several real datasets of tens of thousands of users and items, that FMs can estimate student knowledge accurately and quickly even when student data is sparsely observed, and can handle side information such as multiple knowledge components and the number of attempts at the item or skill level. Our approach allows us to fit student models of higher dimension than existing models, and provides a testbed for trying new combinations of features in order to improve existing models.
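【Code Sketch】: The standard factorization machine prediction equation in its O(kn) form, which underlies the models discussed; the weights below are random placeholders rather than fitted student-model parameters.

# y(x) = w0 + <w, x> + 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
import numpy as np

def fm_predict(x, w0, w, V):
    linear = w0 + w @ x
    s = V.T @ x                         # (k,) per-factor sums
    s2 = (V ** 2).T @ (x ** 2)          # (k,) per-factor squared sums
    return linear + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, k = 8, 3                             # n sparse features, k latent factors
x = np.zeros(n)
x[[0, 4]] = 1.0                         # e.g., one-hot student and item/skill
print(fm_predict(x, 0.1, rng.standard_normal(n), 0.1 * rng.standard_normal((n, k))))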
【Keywords】:
【Paper Link】 【Pages】:758-765
【Authors】: Chaokun Wang ; Junchao Zhu
【Abstract】: Community search is an important problem in network analysis, which has attracted much attention in recent years. It starts with some given nodes, pays more attention to local network structures, and gets personalized resultant communities quickly. In this paper, we argue that there are many real scenarios where some nodes are not allowed to appear in the community. Then, we introduce a new concept called forbidden nodes and present a new problem of forbidden nodes aware community search to describe these scenarios. To address the above problem, three methods are proposed, i.e., k-core based FORTE (Forbidden nOdes awaRe communiTy sEarch), k-truss based FORTE and CW based FORTE, where the effects of both forbidden nodes and query nodes are thoroughly considered for each node in the resultant community. The former two methods are able to make use of popular community structures, while the latter is based on a new metric called weighted conductance. The extensive experiments conducted on real data sets demonstrate the effectiveness of the proposed methods.
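【Code Sketch】: A simplified stand-in for k-core style community search with forbidden nodes: forbidden nodes are excluded up front and low-degree nodes are peeled. The paper's FORTE methods additionally weigh the effects of query and forbidden nodes; this sketch only illustrates the core-peeling idea.

from collections import defaultdict

def k_core_without(edges, k, forbidden):
    adj = defaultdict(set)
    for u, v in edges:
        if u not in forbidden and v not in forbidden:
            adj[u].add(v)
            adj[v].add(u)          # drop forbidden nodes up front
    changed = True
    while changed:                 # peel nodes of degree < k until stable
        changed = False
        for node in [n for n, nb in adj.items() if len(nb) < k]:
            if node not in adj:    # may already have been peeled this pass
                continue
            for nb in adj.pop(node):
                if nb in adj:
                    adj[nb].discard(node)
            changed = True
    return set(adj)

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
print(k_core_without(edges, k=2, forbidden={4}))   # {1, 2, 3}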
【Keywords】:
【Paper Link】 【Pages】:766-773
【Authors】: Hao Wang ; Chengzhi Mao ; Hao He ; Mingmin Zhao ; Tommi S. Jaakkola ; Dina Katabi
【Abstract】: We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is a common problem in healthcare since variables of interest often differ for different patients. Existing methods including Bayesian networks and structured prediction either do not incorporate high-dimensional signals or fail to model conditional dependencies among variables. To address these issues, we propose bidirectional inference networks (BIN), which stitch together multiple probabilistic neural networks, each modeling a conditional dependency. Predictions are then made via iteratively updating variables using backpropagation (BP) to maximize the corresponding posterior probability. Furthermore, we extend BIN to composite BIN (CBIN), which involves the iterative prediction process in the training stage and improves both accuracy and computational efficiency by adaptively smoothing the optimization landscape. Experiments on synthetic and real-world datasets (a sleep study and a dermatology dataset) show that CBIN is a single model that can achieve state-of-the-art performance and obtain better accuracy in most inference tasks than multiple models each specifically trained for a different task.
【Keywords】:
【Paper Link】 【Pages】:774-781
【Abstract】: Over 100 million packages are delivered every day in China due to the fast development of e-commerce. Precisely estimating the time of a package's arrival (ETA) is significantly important for improving customer experience and raising the efficiency of package dispatching. Existing methods mainly focus on predicting the time from an origin to a destination. However, in the package delivery problem, one trip contains multiple destinations, and the delivery times for all destinations should be predictable at any time. Furthermore, the ETA is affected by many factors, especially the sequence of the latest route, the regularity of the delivery pattern and the sequence of packages to be delivered, which are difficult to learn by traditional models. This paper proposes a novel spatial-temporal sequential neural network model (DeepETA) to take full advantage of the above factors. DeepETA is an end-to-end network that mainly consists of three parts. First, the spatial encoding and the recurrent cells are proposed to capture the spatial-temporal and sequential features of the latest delivery route. Then, two attention-based layers are designed to indicate the most probable ETA from historically frequent and related delivery routes, based on the similarity of the latest route and the future destinations. Finally, a fully connected layer is utilized to jointly learn the delivery time. Experiments on a real logistics dataset demonstrate that the proposed approach outperforms existing methods.
【Keywords】:
【Paper Link】 【Pages】:782-790
【Authors】: Mike Wu ; Milan Mosse ; Noah Goodman ; Chris Piech
【Abstract】: In modern computer science education, massive open online courses (MOOCs) log thousands of hours of data about how students solve coding challenges. Being so rich in data, these platforms have garnered the interest of the machine learning community, with many new algorithms attempting to autonomously provide feedback to help future students learn. But what about those first hundred thousand students? In most educational contexts (i.e., classrooms), assignments do not have enough historical data for supervised learning. In this paper, we introduce a human-in-the-loop “rubric sampling” approach to tackle the “zero shot” feedback challenge. We are able to provide autonomous feedback for the first students working on an introductory programming assignment with accuracy that substantially outperforms data-hungry algorithms and approaches human-level fidelity. Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student’s solution and can articulate a student’s misconceptions in the language of the instructor. Deep learning inference enables rubric sampling to further improve as more assignment-specific student data is acquired. We demonstrate our results on a novel dataset from Code.org, the world’s largest programming education platform.
【Keywords】:
【Paper Link】 【Pages】:791-800
【Authors】: Seunghyun Yoon ; Kunwoo Park ; Joongbo Shin ; Hongjun Lim ; Seungpil Won ; Meeyoung Cha ; Kyomin Jung
【Abstract】: Some news headlines mislead readers with overrated or false information, and identifying them in advance will better assist readers in choosing proper news stories to consume. This research introduces a million-scale dataset of news headline and body text pairs with incongruity labels, which can uniquely be utilized for detecting news stories with misleading headlines. On this dataset, we develop two neural networks with hierarchical architectures that model a complex textual representation of news articles and measure the incongruity between the headline and the body text. We also present a data augmentation method that dramatically reduces the text input size a model handles by independently investigating each paragraph of news stories, which further boosts the performance. Our experiments and qualitative evaluations demonstrate that the proposed methods outperform existing approaches and efficiently detect news stories with misleading headlines in the real world.
【Keywords】:
【Paper Link】 【Pages】:801-808
【Authors】: Shuaitao Zhang ; Yuliang Liu ; Lianwen Jin ; Yaoxiong Huang ; Songxuan Lai
【Abstract】: A new method is proposed for removing text from natural images. The challenge is to first accurately localize text at the stroke level and then replace it with a visually plausible background. Unlike previous methods that require image patches to erase scene text, our method, namely ensconce network (EnsNet), can operate end-to-end on a single image without any prior knowledge. The overall structure is an end-to-end trainable FCN-ResNet-18 network with a conditional generative adversarial network (cGAN). The feature of the former is first enhanced by a novel lateral connection structure and then refined by four carefully designed losses: multiscale regression loss and content loss, which capture the global discrepancy of different level features; texture loss and total variation loss, which primarily target filling the text region and preserving the reality of the background. The latter is a novel local-sensitive GAN, which attentively assesses the local consistency of the text-erased regions. Both qualitative and quantitative sensitivity experiments on synthetic images and the ICDAR 2013 dataset demonstrate that each component of EnsNet is essential to achieving good performance. Moreover, our EnsNet significantly outperforms previous state-of-the-art methods in terms of all metrics. In addition, a qualitative experiment conducted on the SBMNet dataset further demonstrates that the proposed method can also perform well on general object (such as pedestrian) removal tasks. EnsNet is extremely fast and can run at 333 fps on an i5-8600 CPU.
【Keywords】:
【Paper Link】 【Pages】:809-816
【Authors】: Rongchang Zhao ; Wangmin Liao ; Beiji Zou ; Zailiang Chen ; Shuo Li
【Abstract】: Evidence identification, optic disc segmentation and automated glaucoma diagnosis are the most clinically significant tasks for clinicians to assess fundus images. However, delivering the three tasks simultaneously is extremely challenging due to the high variability of fundus structure and lack of datasets with complete annotations. In this paper, we propose an innovative Weakly-Supervised Multi-Task Learning method (WSMTL) for accurate evidence identification, optic disc segmentation and automated glaucoma diagnosis. The WSMTL method only uses weak-label data with binary diagnostic labels (normal/glaucoma) for training, while obtaining a pixel-level segmentation mask and diagnosis for testing. The WSMTL is constituted by a skip and densely connected CNN to capture multi-scale discriminative representation of fundus structure; a well-designed pyramid integration structure to generate a high-resolution evidence map for evidence identification, in which pixels with higher value represent higher confidence to highlight the abnormalities; a constrained clustering branch for optic disc segmentation; and a fully-connected discriminator for automated glaucoma diagnosis. Experimental results show that our proposed WSMTL effectively and simultaneously delivers evidence identification, optic disc segmentation (89.6% TP Dice), and accurate glaucoma diagnosis (92.4% AUC). This endows our WSMTL with great potential for the effective clinical assessment of glaucoma.
【Keywords】:
【Paper Link】 【Pages】:817-824
【Authors】: Sendong Zhao ; Ting Liu ; Sicheng Zhao ; Fei Wang
【Abstract】: State-of-the-art studies have demonstrated the superiority of joint modeling over pipeline implementation for medical named entity recognition and normalization due to the mutual benefits between the two processes. To exploit these benefits in a more sophisticated way, we propose a novel deep neural multi-task learning framework with explicit feedback strategies to jointly model recognition and normalization. On one hand, our method benefits from the general representations of both tasks provided by multi-task learning. On the other hand, our method successfully converts hierarchical tasks into a parallel multi-task setting while maintaining the mutual supports between tasks. Both of these aspects improve the model performance. Experimental results demonstrate that our method performs significantly better than state-of-the-art approaches on two publicly available medical literature datasets.
【Keywords】:
【Paper Link】 【Pages】:825-832
【Authors】: Ruobing Zheng ; Ze Luo ; Baoping Yan
【Abstract】: Characterizing wildlife habitat is one of the main topics in animal ecology. Locational data obtained from radio tracking and field observation are widely used in habitat analysis. However, such sampling methods are costly and laborious, and insufficient relocations often prevent scientists from conducting large-range and long-term research. In this paper, we innovatively exploit image-to-image translation technology to expand the range of wildlife habitat analysis. We propose a novel approach for implementing time-series image-to-image translation via metric embedding. A siamese neural network is used to learn a Euclidean temporal embedding from the image space. This embedding produces temporal vectors which bring time information into the adversarial network. The well-trained framework can effectively map probabilistic habitat models from remote sensing imagery, helping scientists get rid of the persistent dependence on animal relocations. We illustrate our approach in a real-world application for mapping the habitats of Bar-headed Geese at the Qinghai Lake breeding ground. We compare our model against several baselines and achieve promising results.
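【Code Sketch】: A loose sketch of a siamese temporal embedding trained with a margin-based loss so that embedding distance tracks temporal distance; the toy encoder, the 0.5 threshold, and the contrastive loss form are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 32))  # toy image encoder

def temporal_contrastive(x1, x2, dt, margin=1.0):
    d = torch.norm(encoder(x1) - encoder(x2), dim=1)
    close = (dt < 0.5).float()                # temporally close pairs
    return (close * d ** 2 +                  # pull close-in-time pairs together
            (1 - close) * torch.clamp(margin - d, min=0) ** 2).mean()

x1, x2 = torch.randn(8, 1, 16, 16), torch.randn(8, 1, 16, 16)
dt = torch.rand(8)                            # normalized time differences
print(temporal_contrastive(x1, x2, dt).item())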
【Keywords】:
【Paper Link】 【Pages】:834-841
【Authors】: Karan Aggarwal ; Shafiq R. Joty ; Luis Fernández-Luque ; Jaideep Srivastava
【Abstract】: Sufficient physical activity and restful sleep play a major role in the prevention and cure of many chronic conditions. Being able to proactively screen and monitor such chronic conditions would be a big step forward for overall health. The rapid increase in the popularity of wearable devices provides a significant new source, making it possible to track the user’s lifestyle in real time. In this paper, we propose a novel unsupervised representation learning technique called activity2vec that learns and “summarizes” the discrete-valued activity time-series. It learns the representations with three components: (i) the co-occurrence and magnitude of the activity levels in a time-segment, (ii) the neighboring context of the time-segment, and (iii) promoting subject-invariance with adversarial training. We evaluate our method on four disorder prediction tasks using linear classifiers. Empirical evaluation demonstrates that our proposed method scales and performs better than many strong baselines. The adversarial regime helps improve the generalizability of our representations by promoting subject-invariant features. We also show that using the representations at the level of a day works best, since human activity is structured in terms of daily routines.
【Keywords】:
【Paper Link】 【Pages】:842-849
【Authors】: Jacob Baldwin ; Ryan Burnham ; Andrew Meyer ; Robert Dora ; Robert Wright
【Abstract】: Deep learning based automatic feature extraction methods have radically transformed speaker identification and facial recognition. Current approaches are typically specialized for individual domains, such as Deep Vectors (D-Vectors) for speaker identification. We provide two distinct contributions: a generalized framework for biometric verification inspired by D-Vectors and novel models that outperform current state-of-the-art approaches. Our approach supports substitution of various feature extraction models and improves the robustness of verification tests across domains. We demonstrate the framework and models for two different behavioral biometric verification problems: keystroke and mobile gait. We present a comprehensive empirical analysis comparing our framework to the state-of-the-art in both domains. Our models perform verification with higher accuracy using orders of magnitude less data than state-of-the-art approaches in both domains. We believe that the combination of high accuracy and practical data requirements will enable application of behavioral biometric models outside of the laboratory in support of much-needed improvements to cyber security.
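【Code Sketch】: A minimal embedding-based verification flow of the D-Vector-inspired kind: enroll a user's mean embedding, then accept a probe by cosine similarity. The embed function and threshold here are hypothetical placeholders for any feature extractor the framework would plug in.

import numpy as np

def embed(sample):                 # stand-in feature extractor (hypothetical)
    return sample / np.linalg.norm(sample)

def enroll(samples):
    template = np.mean([embed(s) for s in samples], axis=0)
    return template / np.linalg.norm(template)

def verify(template, probe, threshold=0.8):
    return float(embed(probe) @ template) >= threshold   # cosine similarity test

rng = np.random.default_rng(0)
user = rng.standard_normal(16)
template = enroll([user + 0.1 * rng.standard_normal(16) for _ in range(5)])
print(verify(template, user + 0.1 * rng.standard_normal(16)))   # likely True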
【Keywords】:
【Paper Link】 【Pages】:850-857
【Authors】: Gissella Bejarano ; David DeFazio ; Arti Ramesh
【Abstract】: Thoroughly understanding how energy consumption is disaggregated into individual appliances can help reduce household expenses, integrate renewable sources of energy, and lead to efficient use of energy. In this work, we propose a deep latent generative model based on variational recurrent neural networks (VRNNs) for energy disaggregation. Our model jointly disaggregates the aggregated energy signal into individual appliance signals, achieving superior performance when compared to the state-of-the-art models for energy disaggregation, yielding a 29% and 41% performance improvement on two energy datasets, respectively, without explicitly encoding temporal/contextual information or heuristics. Our model also achieves better prediction performance on low-power appliances, paving the way for a more nuanced disaggregation model. The structured output prediction in our model helps in accurately discerning which appliance(s) contribute to the aggregated power consumption, thus providing a more useful and meaningful disaggregation model.
【Keywords】:
【Paper Link】 【Pages】:858-864
【Authors】: Fiammetta Caccavale ; Anders Søgaard
【Abstract】: One dimension of modernist poetry is introducing entities in surprising contexts, such as wheelbarrow in Bob Dylan’s feel like falling in love with the first woman I meet/ putting her in a wheelbarrow. This paper considers the problem of teaching a neural language model to select poetic entities, based on local context windows. We do so by fine-tuning and evaluating language models on the poetry of American modernists, both on seen and unseen poets, and across a range of experimental designs. We also compare the performance of our poetic language model to human, professional poets. Our main finding is that, perhaps surprisingly, modernist poetry differs most from ordinary language when entities are concrete, like wheelbarrow, and while our fine-tuning strategy successfully adapts to poetic language in general, outperforming professional poets, the biggest error reduction is observed with concrete entities.
【Keywords】:
【Paper Link】 【Pages】:865-872
【Authors】: Cheng Chen ; Qi Dou ; Hao Chen ; Jing Qin ; Pheng-Ann Heng
【Abstract】: This paper presents a novel unsupervised domain adaptation framework, called Synergistic Image and Feature Adaptation (SIFA), to effectively tackle the problem of domain shift. Domain adaptation has become an important and hot topic in recent studies on deep learning, aiming to recover the performance degradation observed when applying neural networks to new testing domains. Our proposed SIFA is an elegant learning paradigm that presents a synergistic fusion of adaptations from both the image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features towards the segmentation task. The feature encoder layers are shared by both perspectives to grasp their mutual benefits during the end-to-end learning procedure. Without using any annotation from the target domain, the learning of our unified model is guided by adversarial losses, with multiple discriminators employed from various aspects. We have extensively validated our method with a challenging application of cross-modality medical image segmentation of cardiac structures. Experimental results demonstrate that our SIFA model recovers the degraded performance from 17.2% to 73.0%, and outperforms the state-of-the-art methods by a significant margin.
【Keywords】:
【Paper Link】 【Pages】:873-880
【Authors】: Lisi Chen ; Shuo Shang
【Abstract】: Massive amounts of spatio-temporal data that contain location and text content are being generated by location-based social media. These spatio-temporal messages cover a wide range of topics. It is of great significance to discover local trending topics based on users’ location-based and topic-based requirements. We develop a region-based message exploration mechanism that retrieves spatio-temporal message clusters from a stream of spatio-temporal messages based on users’ preferences regarding message topic and message spatial distribution. Additionally, we propose a region summarization algorithm that finds a subset of representative messages in a cluster to summarize the topics and the spatial attributes of messages in the cluster. We evaluate the efficacy and efficiency of our proposal on two real-world datasets, and the results demonstrate that our solution achieves high efficiency and effectiveness compared with baselines.
【Keywords】:
【Paper Link】 【Pages】:881-889
【Authors】: Michael Dann ; Fabio Zambetta ; John Thangarajah
【Abstract】: Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.
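【Code Sketch】: A generic subgoal-bonus reward-shaping pattern (not the paper's exact intrinsic reward scheme): a one-time bonus is paid when a derived subgoal is first reached, so exploration is rewarded even when the environment reward is sparse. The subgoal tests below are hypothetical.

def shaped_reward(env_reward, state, subgoals, visited, bonus=0.1):
    r = env_reward
    for g in subgoals:
        if g(state) and g not in visited:   # each subgoal pays out once
            visited.add(g)
            r += bonus
    return r

reached_key = lambda s: s.get("has_key", False)     # hypothetical subgoal tests
reached_door = lambda s: s.get("at_door", False)
visited = set()
print(shaped_reward(0.0, {"has_key": True}, [reached_key, reached_door], visited))  # 0.1
print(shaped_reward(0.0, {"has_key": True}, [reached_key, reached_door], visited))  # 0.0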
【Keywords】:
【Paper Link】 【Pages】:890-897
【Authors】: Zulong Diao ; Xin Wang ; Dafang Zhang ; Yingru Liu ; Kun Xie ; Shaoyao He
【Abstract】: Graph convolutional neural networks (GCNNs) have become an increasingly active field of research. A GCNN models the spatial dependencies of nodes in a graph with a pre-defined Laplacian matrix based on node distances. However, in many application scenarios, spatial dependencies change over time, and the use of a fixed Laplacian matrix cannot capture the change. To track the spatial dependencies among traffic data, we propose a dynamic spatio-temporal GCNN for accurate traffic forecasting. The core of our deep learning framework is tracking the change of the Laplacian matrix with a dynamic Laplacian matrix estimator. To enable timely learning with a low complexity, we creatively incorporate tensor decomposition into the deep learning framework, where real-time traffic data are decomposed into a global component that is stable and depends on the long-term temporal-spatial traffic relationship and a local component that captures the traffic fluctuations. We propose a novel design to estimate the dynamic Laplacian matrix of the graph with the above two components based on our theoretical derivation, and introduce our design basis. The forecasting performance is evaluated with two real-time traffic datasets. Experiment results demonstrate that our network can achieve up to 25% accuracy improvement.
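【Code Sketch】: For reference, the fixed symmetric normalized Laplacian, L = I - D^{-1/2} A D^{-1/2}, that such a model replaces with a dynamic estimate; the global/local split of the adjacency below only gestures at the tensor-decomposition idea and is a toy assumption.

import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5             # guard against isolated nodes
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

A_global = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # stable part
A_local = 0.1 * np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])          # fluctuation
print(normalized_laplacian(A_global + A_local))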
【Keywords】:
【Paper Link】 【Pages】:898-905
【Authors】: Wei Feng ; Wentao Liu ; Tong Li ; Jing Peng ; Chen Qian ; Xiaolin Hu
【Abstract】: Human-object interaction (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human actions and the localization of their interacted objects provide guidance for pose estimation. In this paper, we propose a turbo learning framework to perform HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks, i.e., a pose-aware HOI recognition module and an HOI-guided pose estimation module. Then, these two modules form a closed loop to utilize the complementary information iteratively, which can be trained in an end-to-end manner. The proposed method achieves state-of-the-art performance on two public benchmarks, the Verbs in COCO (V-COCO) and HICO-DET datasets.
【Keywords】:
【Paper Link】 【Pages】:906-913
【Authors】: Yanjie Fu ; Pengyang Wang ; Jiadi Du ; Le Wu ; Xiaolin Li
【Abstract】: Urban regions are places where people live, work, consume, and entertain. In this study, we investigate the problem of learning an embedding space for regions. Studying the representations of regions can help us to better understand the patterns, structures, and dynamics of cities, support urban planning, and, ultimately, make our cities more livable and sustainable. While some efforts have been made to learn the embeddings of regions, existing methods can be improved by incorporating locality-constrained spatial autocorrelations into an encode-decode framework. Such an embedding strategy is capable of taking into account both intra-region structural information and inter-region spatial autocorrelations. To this end, we propose to learn the representations of regions via a new embedding strategy with awareness of locality-constrained spatial autocorrelations. Specifically, we first construct multi-view (i.e., distance and mobility connectivity) POI-POI networks to represent regions. In addition, we introduce two properties into region embedding: (i) spatial autocorrelations: a global similarity between regions; (ii) top-k locality: spatial autocorrelations locally and approximately reside on the top k most autocorrelated regions. We propose a new encoder-decoder based formulation that preserves the two properties while remaining efficient. As an application, we exploit the learned embeddings to predict the mobile check-in popularity of regions. Finally, extensive experiments with real-world urban region data demonstrate the effectiveness and efficiency of our method.
【Keywords】:
【Paper Link】 【Pages】:914-921
【Authors】: Susobhan Ghosh ; Easwar Subramanian ; Sanjay P. Bhat ; Sujit Gujar ; Praveen Paruchuri
【Abstract】: A smart grid is an efficient and sustainable energy system that integrates diverse generation entities, distributed storage capacity, and smart appliances and buildings. A smart grid brings new kinds of participants in the energy market served by it, whose effect on the grid can only be determined through high fidelity simulations. Power TAC offers one such simulation platform using real-world weather data and complex state-of-the-art customer models. In Power TAC, autonomous energy brokers compete to make profits across tariff, wholesale and balancing markets while maintaining the stability of the grid. In this paper, we design an autonomous broker VidyutVanika, the runner-up in the 2018 Power TAC competition. VidyutVanika relies on reinforcement learning (RL) in the tariff market and dynamic programming in the wholesale market to solve modified versions of known Markov Decision Process (MDP) formulations in the respective markets. The novelty lies in defining the reward functions for MDPs, solving these MDPs, and the application of these solutions to real actions in the market. Unlike previous participating agents, VidyutVanika uses a neural network to predict the energy consumption of various customers using weather data. We use several heuristic ideas to bridge the gap between the restricted action spaces of the MDPs and the much more extensive action space available to VidyutVanika. These heuristics allow VidyutVanika to convert near-optimal fixed tariffs to time-of-use tariffs aimed at mitigating transmission capacity fees, spread out its orders across several auctions in the wholesale market to procure energy at a lower price, more accurately estimate parameters required for implementing the MDP solution in the wholesale market, and account for wholesale procurement costs while optimizing tariffs. We use Power TAC 2018 tournament data and controlled experiments to analyze the performance of VidyutVanika, and illustrate the efficacy of the above strategies.
【Keywords】:
【Paper Link】 【Pages】:922-929
【Authors】: Shengnan Guo ; Youfang Lin ; Ning Feng ; Chao Song ; Huaiyu Wan
【Abstract】: Forecasting the traffic flows is a critical issue for researchers and practitioners in the field of transportation. However, it is very challenging since the traffic flows usually show high nonlinearities and complex patterns. Most existing traffic flow prediction methods, lacking the ability to model the dynamic spatial-temporal correlations of traffic data, cannot yield satisfactory prediction results. In this paper, we propose a novel attention-based spatial-temporal graph convolutional network (ASTGCN) model to solve the traffic flow forecasting problem. ASTGCN mainly consists of three independent components to respectively model three temporal properties of traffic flows, i.e., recent, daily-periodic and weekly-periodic dependencies. More specifically, each component contains two major parts: 1) the spatial-temporal attention mechanism to effectively capture the dynamic spatial-temporal correlations in traffic data; 2) the spatial-temporal convolution, which simultaneously employs graph convolutions to capture the spatial patterns and common standard convolutions to describe the temporal features. The outputs of the three components are fused with weights to generate the final prediction results. Experiments on two real-world datasets from the Caltrans Performance Measurement System (PeMS) demonstrate that the proposed ASTGCN model outperforms the state-of-the-art baselines.
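【Code Sketch】: One plain spatial graph-convolution step of the kind ASTGCN builds on (without its attention mechanisms or Chebyshev polynomials): H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The sensor graph, features, and weights below are toy placeholders.

import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(len(A))                 # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # propagate, transform, ReLU

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # 3 road sensors
H = rng.standard_normal((3, 4))               # per-node traffic features
print(gcn_layer(A, H, rng.standard_normal((4, 8))).shape)     # (3, 8)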
【Keywords】:
【Paper Link】 【Pages】:930-937
【Authors】: Rahul Gupta ; Aditya Kanade ; Shirish K. Shevade
【Abstract】: Novice programmers often struggle with the formal syntax of programming languages. In the traditional classroom setting, they can make progress with the help of real time feedback from their instructors, which is often impossible to get in the massive open online course (MOOC) setting. Syntactic error repair techniques have huge potential to assist them at scale. Towards this, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without either supervision or any prior knowledge of the formal syntax of the programming language. We evaluate our technique on a publicly available dataset containing 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 1699 (24.4%) programs completely and 1310 (18.8%) programs partially, outperforming DeepFix, a state-of-the-art syntactic error repair technique, which uses a fully supervised neural machine translation approach.
【Keywords】:
【Paper Link】 【Pages】:938-945
【Authors】: Yu Hao ; Xien Liu ; Ji Wu ; Ping Lv
【Abstract】: Despite the great success of word embedding, sentence embedding remains a not-well-solved problem. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding producing module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor. This module is called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is examined on two Medical Question Answering (MedicalQA) datasets which are collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieved significant improvements compared to competitive baseline approaches. Additionally, a series of controlled experiments are also conducted to illustrate that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embedding, and that the two kinds of scoring strategies are highly complementary to each other for question answering problems.
【Keywords】:
【Paper Link】 【Pages】:946-953
【Authors】: Binbin Hu ; Zhiqiang Zhang ; Chuan Shi ; Jun Zhou ; Xiaolong Li ; Yuan Qi
【Abstract】: As one of the major frauds in financial services, cash-out fraud occurs when users pursue cash gains through illegal or insincere means. Conventional solutions for cash-out user detection perform subtle feature engineering for each user and then apply a classifier, such as GBDT or a Neural Network. However, users in financial services have rich interaction relations, which are seldom fully exploited by conventional solutions. In this paper, with the real datasets in Ant Credit Pay of Ant Financial Services Group, we first study the cash-out user detection problem and propose a novel hierarchical attention mechanism based cash-out user detection model, called HACUD. Specifically, we model different types of objects and their rich attributes and interaction relations in the scenario of credit payment service with an Attributed Heterogeneous Information Network (AHIN). The HACUD model enhances the feature representation of objects through meta-path based neighbors, exploiting different aspects of structure information in the AHIN. Furthermore, a hierarchical attention mechanism is elaborately designed to model user's preferences towards attributes and meta-paths. Experimental results on two real datasets show that HACUD outperforms the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:954-961
【Authors】: Shiyu Huang ; Hang Su ; Jun Zhu ; Ting Chen
【Abstract】: Deep reinforcement learning (DRL) has achieved surpassing human performance on Atari games, using raw pixels and rewards to learn everything. However, first-person-shooter (FPS) games in 3D environments contain higher levels of human concepts (enemy, weapon, spatial structure, etc.) and a large action space. In this paper, we explore a novel method which can plan on temporally-extended action sequences, which we refer to as Combo-Actions, to compress the action space. We further train a deep recurrent Q-learning network model as a high-level controller, called the supervisory network, to manage the Combo-Actions. Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. Extensive experiments show that our method is efficient in the training process and outperforms previous state-of-the-art approaches by a large margin. Ablation study experiments also indicate that our method can boost the performance of the FPS agent in a reasonable way.
【Keywords】:
【Paper Link】 【Pages】:962-969
【Authors】: Steve T. K. Jan ; Joseph Messou ; Yen-Chen Lin ; Jia-Bin Huang ; Gang Wang
【Abstract】: While deep learning models have achieved unprecedented success in various domains, there is also a growing concern of adversarial attacks against related applications. Recent results show that by adding a small amount of perturbations to an image (imperceptible to humans), the resulting adversarial examples can force a classifier to make targeted mistakes. So far, most existing works focus on crafting adversarial examples in the digital domain, while limited efforts have been devoted to understanding the physical domain attacks. In this work, we explore the feasibility of generating robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. To validate our method, we conduct a large-scale physical-domain experiment, which involves manually taking more than 3000 physical domain photos. The results show that our method outperforms existing ones by a large margin and demonstrates a high level of robustness and transferability.
【Keywords】:
【Paper Link】 【Pages】:970-977
【Authors】: Jan Karwowski ; Jacek Mandziuk ; Adam Zychowski ; Filip Grajek ; Bo An
【Abstract】: This paper introduces a new type of Security Games (SG) played on a plane with targets moving along predefined straight-line trajectories, and its respective Mixed Integer Linear Programming (MILP) formulation. Three approaches for solving the game are proposed and experimentally evaluated: application of an MILP solver to finding exact solutions for small-size games, an MILP-based extension of a recently published zero-sum SG approach to the case of general-sum games for finding approximate solutions of medium-size games, and the use of a Memetic Algorithm (MA) for medium-size and large-size game instances, which are beyond MILP's scalability. Utilization of MA is, to the best of our knowledge, a new idea in the field of SG. The novelty of the proposed solution lies specifically in an efficient chromosome-based game encoding and dedicated local improvement heuristics. In the vast majority of test cases with known equilibrium profiles, the method leads to optimal solutions with high stability and approximately linear time scalability. Another advantage is an iteration-based construction of the system, which makes the approach essentially an anytime method. This property is of paramount importance in the case of restrictive time limits, which could hinder the possibility of calculating an exact solution. On a general note, we believe that MA-based methods may offer a viable alternative to MILP solvers for complex games that require the application of approximate solving methods.
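【Code Sketch】: A generic memetic-algorithm skeleton: a genetic loop whose offspring are refined by local search before selection. The paper's chromosome-based game encoding and improvement heuristics are domain-specific and not reproduced; the toy objective below is purely illustrative.

import random

def memetic_search(init, fitness, mutate, local_improve,
                   pop_size=20, generations=50, rng=random.Random(0)):
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # offspring = mutated parents, each refined by a local-search step
        children = [local_improve(mutate(rng.choice(pop), rng)) for _ in range(pop_size)]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Toy instance: maximize -(x - 3)^2 over the reals.
best = memetic_search(
    init=lambda r: r.uniform(-10, 10),
    fitness=lambda x: -(x - 3) ** 2,
    mutate=lambda x, r: x + r.gauss(0, 1),
    local_improve=lambda x: min((x - 0.1, x, x + 0.1), key=lambda v: (v - 3) ** 2),
)
print(round(best, 2))   # close to 3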
【Keywords】:
【Paper Link】 【Pages】:978-985
【Authors】: Hoon Kim ; Kangwook Lee ; Gyeongjo Hwang ; Changho Suh
【Abstract】: Developing a computer vision-based algorithm for identifying dangerous vehicles requires a large amount of labeled accident data, which is difficult to collect in the real world. To tackle this challenge, we first develop a synthetic data generator built on top of a driving simulator. We then observe that the synthetic labels that are generated based on simulation results are very noisy, resulting in poor classification performance. In order to improve the quality of synthetic labels, we propose a new label adaptation technique that first extracts internal states of vehicles from the underlying driving simulator, and then refines labels by predicting future paths of vehicles based on a well-studied motion model. Via real-data experiments, we show that our dangerous vehicle classifier can reduce the missed detection rate by at least 18.5% compared with those trained with real data when time-to-collision is between 1.6s and 1.8s.
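【Code Sketch】: An illustrative time-to-collision (TTC) computation under a constant-velocity motion model, the kind of quantity such label refinement can be built on; the paper's motion model is more detailed, and the values below are hypothetical.

def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    closing = ego_speed_mps - lead_speed_mps   # positive when the gap shrinks
    return gap_m / closing if closing > 0 else float("inf")

ttc = time_to_collision(gap_m=25.0, ego_speed_mps=30.0, lead_speed_mps=15.0)
print(ttc, "-> dangerous" if ttc < 1.8 else "-> safe")   # ~1.67 s: dangerous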
【Keywords】:
【Paper Link】 【Pages】:986-995
【Authors】: John Krumm ; Eric Horvitz
【Abstract】: Taking speed reports from vehicles is a proven, inexpensive way to infer traffic conditions. However, due to concerns about privacy and bandwidth, not every vehicle occupant may want to transmit data about their location and speed in real time. We show how to drastically reduce the number of transmissions in two ways, both based on a Markov random field for modeling traffic speed and flow. First, we show that only a small number of vehicles need to report from each location. We give a simple, probabilistic method that lets a group of vehicles decide on which subset will transmit a report, preserving privacy by coordinating without any communication. The second approach computes the potential value of any location’s speed report, emphasizing those reports that will most affect the overall speed inferences, and omitting those that contribute little value. Both methods significantly reduce the amount of communication necessary for accurate speed inferences on a road network.
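【Code Sketch】: Our reading of the coordination-free reporting rule, as a sketch: if each vehicle at a location independently reports with probability p = m/n̂, roughly m reports arrive with no vehicle-to-vehicle communication. The function names and the count estimate n_estimated are assumptions, not the paper's exact scheme.

import random

def should_report(m_target, n_estimated, rng=random.Random()):
    p = min(1.0, m_target / max(n_estimated, 1))   # per-vehicle report probability
    return rng.random() < p                        # decided locally, no coordination

rng = random.Random(0)
reports = sum(should_report(m_target=3, n_estimated=50, rng=rng) for _ in range(50))
print(reports)   # around 3 in expectation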
【Keywords】:
【Paper Link】 【Pages】:996-1003
【Authors】: Chaozhuo Li ; Senzhang Wang ; Yukun Wang ; Philip S. Yu ; Yanbo Liang ; Yun Liu ; Zhoujun Li
【Abstract】: Nowadays, it is common for one natural person to join multiple social networks to enjoy different kinds of services. Linking identical users across multiple social networks, also known as social network alignment, is an important problem of great research challenges. Existing methods usually link social identities on the pairwise sample level, which may lead to undesirable performance when the number of available annotations is limited. Motivated by the isomorphism information, in this paper we consider all the identities in a social network as a whole and perform social network alignment from the distribution level. The insight is that we aim to learn a projection function to not only minimize the distance between the distributions of user identities in two social networks, but also incorporate the available annotations as the learning guidance. We propose three models SNNAu, SNNAb and SNNAo to learn the projection function under the weakly-supervised adversarial learning framework. Empirically, we evaluate the proposed models over multiple datasets, and the results demonstrate the superiority of our proposals.
【Keywords】:
【Paper Link】 【Pages】:1004-1011
【Authors】: Youru Li ; Zhenfeng Zhu ; Deqiang Kong ; Meixiang Xu ; Yao Zhao
【Abstract】: Bike-sharing systems, aiming to meet the public’s need for ”last mile” transportation, have become popular in recent years. With an accurate demand prediction model, shared bikes, though limited in number, can be effectively utilized whenever and wherever there are travel demands. Although some deep learning methods, especially long short-term memory neural networks (LSTMs), can improve the performance of traditional demand prediction methods based only on temporal representations, such improvement is limited due to a lack of mining of complex spatial-temporal relations. To address this issue, we propose a novel model named STG2Vec to learn representations from a heterogeneous spatial-temporal graph. Specifically, we developed an event-flow serializing method to encode the evolution of the dynamic heterogeneous graph into a special language pattern, such as a word sequence in a corpus. Furthermore, a dynamic attention-based graph embedding model is introduced to obtain an importance-aware vectorized representation of the event flow. Additionally, together with other multi-source information such as geographical position, historical transition patterns and weather, the representation learned by STG2Vec can be fed into LSTMs for temporal modeling. Experimental results on the Citi-Bike electronic usage records dataset in New York City illustrate that the proposed model achieves competitive prediction performance compared with its variants and other baseline models.
【Keywords】:
【Paper Link】 【Pages】:1012-1019
【Authors】: Zhongnian Li ; Tao Zhang ; Peng Wan ; Daoqiang Zhang
【Abstract】: Generative Adversarial Networks (GANs) are powerful tools for reconstructing Compressed Sensing Magnetic Resonance Imaging (CS-MRI). However most recent works lack exploration of structure information of MRI images that is crucial for clinical diagnosis. To tackle this problem, we propose the Structure-Enhanced GAN (SEGAN) that aims at restoring structure information at both local and global scale. SEGAN defines a new structure regularization called Patch Correlation Regularization (PCR) which allows for efficient extraction of structure information. In addition, to further enhance the ability to uncover structure information, we propose a novel generator SU-Net by incorporating multiple-scale convolution filters into each layer. Besides, we theoretically analyze the convergence of stochastic factors contained in training process. Experimental results show that SEGAN is able to learn target structure information and achieves state-of-the-art performance for CS-MRI reconstruction.
【Keywords】:
【Paper Link】 【Pages】:1020-1027
【Authors】: Ziqian Lin ; Jie Feng ; Ziyang Lu ; Yong Li ; Depeng Jin
【Abstract】: Crowd flow prediction is of great importance in a wide range of applications, from urban planning and traffic control to public safety. It aims to predict the inflow (the traffic of crowds entering a region in a given time interval) and outflow (the traffic of crowds leaving a region for other places) of each region in the city, given historical flow data. In this paper, we propose DeepSTN+, a deep learning-based convolutional model, to predict crowd flows in the metropolis. First, DeepSTN+ employs the ConvPlus structure to model the long-range spatial dependence among crowd flows in different regions. Further, POI distributions and time factors are combined to express the effect of location attributes, introducing prior knowledge of crowd movements. Finally, we propose an effective fusion mechanism to stabilize the training process, which further improves the performance. Extensive experimental results based on two real-life datasets demonstrate the superiority of our model: DeepSTN+ reduces the error of crowd flow prediction by approximately 8%–13% compared with state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:1028-1035
【Authors】: Aishan Liu ; Xianglong Liu ; Jiaxin Fan ; Yuqing Ma ; Anlan Zhang ; Huiyuan Xie ; Dacheng Tao
【Abstract】: Deep neural networks (DNNs) are vulnerable to adversarial examples, where inputs with imperceptible perturbations mislead DNNs to incorrect results. Recently, the adversarial patch, with noise confined to a small and localized region, has emerged owing to its easy accessibility in the real world. However, existing attack strategies are still far from generating visually natural patches with strong attacking ability, since they often ignore the perceptual sensitivity of the attacked network to the adversarial patch, including both the correlations with the image context and the visual attention. To address this problem, this paper proposes a perceptual-sensitive generative adversarial network (PS-GAN) that can simultaneously enhance the visual fidelity and the attacking ability of the adversarial patch. To improve visual fidelity, we treat patch generation as a patch-to-patch translation via an adversarial process, taking any type of seed patch as input and outputting an adversarial patch with high perceptual correlation with the attacked image. To further enhance the attacking ability, an attention mechanism coupled with adversarial generation is introduced to predict the critical attacking areas for placing the patches, which helps produce more realistic and aggressive patches. Extensive experiments under semi-whitebox and black-box settings on two large-scale datasets, GTSRB and ImageNet, demonstrate that the proposed PS-GAN outperforms state-of-the-art adversarial patch attack methods.
【Keywords】:
【Paper Link】 【Pages】:1036-1043
【Authors】: Hao Liu ; Ting Li ; Renjun Hu ; Yanjie Fu ; Jingjing Gu ; Hui Xiong
【Abstract】: Multi-modal transportation recommendation aims to recommend a travel plan that considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect among these modes. The successful development of multi-modal transportation recommendation systems can help satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To fill this gap, in this paper we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully-constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding method to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multi-modal transportation recommendation. Indeed, our method has been deployed in one of the largest navigation apps to serve hundreds of millions of users, and extensive experimental results with real-world map query data demonstrate the enhanced performance of the proposed method for multi-modal transportation recommendation.
【Keywords】:
【Paper Link】 【Pages】:1044-1051
【Authors】: Xiao Liu ; Xiaoting Li ; Rupesh Prajapati ; Dinghao Wu
【Abstract】: Compilers are among the most fundamental programming tools for building software. However, production compilers remain buggy. Fuzz testing is often leveraged, with newly generated or mutated inputs, to find new bugs or security vulnerabilities. In this paper, we propose a grammar-based fuzzing tool called DEEPFUZZ. Based on a generative Sequence-to-Sequence model, DEEPFUZZ automatically and continuously generates well-formed C programs. We use this set of new C programs to fuzz off-the-shelf C compilers, e.g., GCC and Clang/LLVM. We present a detailed case study analyzing the success rate and coverage improvement of the generated C programs for fuzz testing. We analyze the performance of DEEPFUZZ with three types of sampling methods as well as three types of generation strategies. DEEPFUZZ improves testing efficacy with respect to line, function, and branch coverage. In our preliminary study, we found and reported 8 bugs in GCC, all of which are actively being addressed by developers.
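The generation strategies analyzed in the paper include stochastic sampling from the model's output distribution. Below is a hedged sketch of such sampling with a character-level recurrent model; the `CharLM` module and its interface are hypothetical stand-ins, since the paper's actual model is a sequence-to-sequence network trained on well-formed C programs.

```python
# Temperature sampling from a character-level language model, in the spirit
# of a "sample" generation strategy for producing fuzzing inputs.
import torch
import torch.nn.functional as F

class CharLM(torch.nn.Module):
    def __init__(self, vocab=128, hidden=256):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, hidden)
        self.rnn = torch.nn.LSTM(hidden, hidden, batch_first=True)
        self.out = torch.nn.Linear(hidden, vocab)

    def step(self, ch, state):
        h, state = self.rnn(self.emb(ch).unsqueeze(1), state)
        return self.out(h.squeeze(1)), state

def generate(model, prefix, n_chars=200, temperature=0.8):
    """Continue `prefix` by sampling one character at a time."""
    state, logits = None, None
    for c in prefix:                      # warm up on the seed text
        logits, state = model.step(torch.tensor([ord(c) % 128]), state)
    chars = []
    for _ in range(n_chars):
        probs = F.softmax(logits / temperature, dim=-1)
        ch = torch.multinomial(probs, 1).squeeze(1)
        chars.append(chr(int(ch)))
        logits, state = model.step(ch, state)
    return prefix + "".join(chars)

seed = "int main() {"   # generated programs would then be fed to GCC/Clang
print(generate(CharLM(), seed))
```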
【Keywords】:
【Paper Link】 【Pages】:1052-1060
【Authors】: Chengqiang Lu ; Qi Liu ; Chao Wang ; Zhenya Huang ; Peize Lin ; Lixin He
【Abstract】: Predicting molecular properties (e.g., atomization energy) is an essential issue in quantum chemistry, one that could accelerate research progress in areas such as drug design and substance discovery. Traditional studies based on density functional theory (DFT) in physics have proven time-consuming for predicting properties of large numbers of molecules. Recently, machine learning methods, which incorporate rule-based information, have also shown potential for this issue. However, the complex inherent quantum interactions of molecules are still largely underexplored by existing solutions. In this paper, we propose a generalizable and transferable Multilevel Graph Convolutional neural Network (MGCN) for molecular property prediction. Specifically, we represent each molecule as a graph to preserve its internal structure. Moreover, the well-designed hierarchical graph neural network directly extracts features from the conformation and spatial information, followed by multilevel interactions. As a consequence, the multilevel overall representations can be utilized to make the prediction. Extensive experiments on datasets of both equilibrium and off-equilibrium molecules demonstrate the effectiveness of our model. Furthermore, the detailed results also show that MGCN is generalizable and transferable for the prediction.
【Keywords】:
【Paper Link】 【Pages】:1061-1068
【Authors】: Chien-Yu Lu ; Min-Xin Xue ; Chia-Che Chang ; Che-Rung Lee ; Li Su
【Abstract】: Style transfer of polyphonic music recordings is a challenging task when considering the modeling of diverse, imaginative, and reasonable music pieces in a style different from their original one. To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method that does not require parallel data. To characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuances in instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope are combined with the widely-used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also utilized to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer, with improved sound quality and the ability for users to manipulate the output.
【Keywords】:
【Paper Link】 【Pages】:1069-1076
【Authors】: Tianle Ma ; Aidong Zhang
【Abstract】: While deep learning has achieved great success in computer vision and many other fields, it currently does not work very well on patient genomic data with the “big p, small N” problem (i.e., a relatively small number of samples with high-dimensional features). In order to make deep learning work with a small amount of training data, we have to design new models that facilitate few-shot learning. Here we present the Affinity Network Model (AffinityNet), a data-efficient deep learning model that can learn from a limited number of training examples and generalize well. The backbone of the AffinityNet model consists of stacked k-Nearest-Neighbor (kNN) attention pooling layers. The kNN attention pooling layer is a generalization of the Graph Attention Model (GAM) and can be applied not only to graphs but also to any set of objects, regardless of whether a graph is given. As a new deep learning module, kNN attention pooling layers can be plugged into any neural network model, just like convolutional layers. As a simple special case of the kNN attention pooling layer, the feature attention layer can directly select important features that are useful for classification tasks. Experiments on both synthetic data and cancer genomic data from TCGA projects show that our AffinityNet model has better generalization power than conventional neural network models when training data is scarce.
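A minimal sketch of a kNN attention pooling layer as the abstract describes it: each object attends over its k nearest neighbors in feature space and pools their transformed features, with no graph required. The dot-product attention and shapes are illustrative assumptions, not the authors' exact formulation.

```python
# kNN attention pooling: find neighbors by feature distance, weight them by
# attention scores, and pool. Works on any set of objects.
import torch
import torch.nn.functional as F

class KNNAttentionPooling(torch.nn.Module):
    def __init__(self, dim, k=5):
        super().__init__()
        self.k = k
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x):                  # x: (N, dim), no graph needed
        h = self.proj(x)
        dist = torch.cdist(x, x)           # (N, N) pairwise distances
        idx = dist.topk(self.k + 1, largest=False).indices[:, 1:]  # drop self
        neigh = h[idx]                     # (N, k, dim) neighbor features
        att = F.softmax((h.unsqueeze(1) * neigh).sum(-1), dim=-1)  # (N, k)
        return (att.unsqueeze(-1) * neigh).sum(1)  # attention-weighted pool

x = torch.randn(100, 32)                   # e.g. 100 samples, 32 features
print(KNNAttentionPooling(32)(x).shape)    # torch.Size([100, 32])
```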
【Keywords】:
【Paper Link】 【Pages】:1077-1084
【Authors】: Duncan C. McElfresh ; Hoda Bidkhori ; John P. Dickerson
【Abstract】: In barter exchanges, participants directly trade their endowed goods in a constrained economic setting without money. Transactions in barter exchanges are often facilitated via a central clearinghouse that must match participants even in the face of uncertainty over participants and over the existence and quality of potential trades. Leveraging robust combinatorial optimization techniques, we address uncertainty in kidney exchange, a real-world barter market where patients swap (in)compatible paired donors. We provide two scalable robust methods to handle two distinct types of uncertainty in kidney exchange: over the quality and over the existence of a potential match. The latter case directly addresses a weakness of stochastic-optimization-based methods for the kidney exchange clearing problem, all of which necessarily require explicit estimates of the probability of a transaction existing, a still-unsolved problem in this nascent market. We also propose a novel, scalable kidney exchange formulation that eliminates the need for an exponential-time constraint generation process in competing formulations, maintains provable optimality, and serves as a subsolver for our robust approach. For each type of uncertainty we demonstrate the benefits of robustness on real data from a large, fielded kidney exchange in the United States. We conclude by drawing parallels between robustness and notions of fairness in the kidney exchange setting.
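For intuition about the clearing problem underlying these methods, the toy sketch below enumerates short cycles (length at most 3, as is standard in kidney exchange) in a compatibility digraph and brute-forces the best set of vertex-disjoint cycles. Real clearinghouses solve this with integer programming; this exhaustive search is only an illustration with unit weights.

```python
# Toy kidney-exchange clearing: pick vertex-disjoint cycles of length <= 3
# in a compatibility digraph to maximize the number of matched patients.
from itertools import combinations

def short_cycles(arcs, n):
    cyc = []
    for i, j in combinations(range(n), 2):          # 2-cycles (paired swaps)
        if (i, j) in arcs and (j, i) in arcs:
            cyc.append((i, j))
    for i in range(n):                               # 3-cycles, i as anchor
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3 and i < j and i < k:
                    if (i, j) in arcs and (j, k) in arcs and (k, i) in arcs:
                        cyc.append((i, j, k))
    return cyc

def best_cover(cycles):
    """Best vertex-disjoint cycle set by exhaustive search (toy sizes only)."""
    best, best_val = (), 0
    for r in range(len(cycles) + 1):
        for subset in combinations(cycles, r):
            used = [v for c in subset for v in c]
            if len(used) == len(set(used)) and len(used) > best_val:
                best, best_val = subset, len(used)
    return best, best_val

arcs = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 1), (3, 4), (4, 3)}
print(best_cover(short_cycles(arcs, 5)))  # two 2-cycles match 4 patients
```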
【Keywords】:
【Paper Link】 【Pages】:1085-1092
【Authors】: Dong Nie ; Li Wang ; Lei Xiang ; Sihang Zhou ; Ehsan Adeli ; Dinggang Shen
【Abstract】: Medical image segmentation is a key step in various applications, such as image-guided radiation therapy and diagnosis. Recently, deep neural networks have provided promising solutions for automatic image segmentation; however, they often perform well only on regular (i.e., easy-to-segment) samples, since datasets are dominated by easy and regular samples. For medical images, due to huge inter-subject variation or disease-specific effects on subjects, there exist several difficult-to-segment cases that are often overlooked by previous works. To address this challenge, we propose a difficulty-aware deep segmentation network with confidence learning for end-to-end segmentation. The proposed framework makes two main contributions: 1) Besides the segmentation network, we propose a fully convolutional adversarial network for confidence learning, which provides voxel-wise and region-wise confidence information to the segmentation network. We relax adversarial learning to confidence learning by decreasing the priority of adversarial learning, so as to avoid the training imbalance between generator and discriminator. 2) We propose a difficulty-aware attention mechanism to properly handle hard samples or hard regions considering structural information, which may overcome the shortcomings of focal loss. We further propose a fusion module to selectively fuse the concatenated feature maps in encoder-decoder architectures. Experimental results on clinical and challenge datasets show that our proposed network achieves state-of-the-art segmentation accuracy. Further analysis also indicates that each individual component of our proposed network contributes to the overall performance improvement.
【Keywords】:
【Paper Link】 【Pages】:1093-1101
【Authors】: Yuhao Niu ; Lin Gu ; Feng Lu ; Feifan Lv ; Zongji Wang ; Imari Sato ; Zijian Zhang ; Yangyan Xiao ; Xunzhang Dai ; Tingting Cheng
【Abstract】: Though deep learning has shown strong performance in classifying the label and severity stage of certain diseases, most models give little evidence of how they make their predictions. Here, we propose to exploit the interpretability of deep learning applications in medical diagnosis. Inspired by Koch’s Postulates, a well-known strategy in medical research for identifying the properties of pathogens, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptoms and features encoded in this descriptor, we propose a GAN-based method to synthesize pathological retinal images given the descriptor and a binary vessel segmentation. Besides, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of 5 licensed ophthalmologists, our synthesized images carry symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images are both qualitatively and quantitatively superior to those of existing methods.
【Keywords】:
【Paper Link】 【Pages】:1102-1109
【Authors】: Galia Nordon ; Gideon Koren ; Varda Shalev ; Benny Kimelfeld ; Uri Shalit ; Kira Radinsky
【Abstract】: Large repositories of medical data, such as Electronic Medical Record (EMR) data, are recognized as promising sources for knowledge discovery. Effective analysis of such repositories often necessitates a thorough understanding of dependencies in the data. For example, if patient age is ignored, one might wrongly conclude a causal relationship between cataract and hypertension. Such confounding variables are often identified by causal graphs, where variables are connected by causal relationships. Current approaches to automatically building such graphs are based on text analysis of the medical literature; yet the result is typically a large graph of low precision. There are statistical methods for constructing causal graphs from observational data, but they are less suitable for dealing with a large number of covariates, which is the case for EMR data. Consequently, confounding variables are often identified by medical domain experts via a manual, expensive, and time-consuming process. We present a novel approach for automatically constructing causal graphs between medical conditions. The first part is a novel graph-based method to better capture causal relationships implied by the medical literature, especially in the presence of multiple causal factors. Yet even after using these advanced text-analysis methods, the text data still contains many weak or uncertain causal connections. Therefore, we construct a second graph for these terms based on an EMR repository of over 1.5M patients. We combine the two graphs, leaving only edges that have both medical-text-based and observational evidence. We examine several strategies to carry out our approach, and compare the precision of the resulting graphs using medical experts. Our results show a significant improvement in precision for all of our methods compared to the state of the art.
【Keywords】:
【Paper Link】 【Pages】:1110-1117
【Authors】: Bidisha Samanta ; Abir De ; Gourhari Jana ; Pratim Kumar Chattaraj ; Niloy Ganguly ; Manuel Gomez Rodriguez
【Abstract】: Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with varying numbers of nodes and edges. In this paper, we propose NeVAE, a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. In addition, by using masking, the decoder is able to guarantee a set of valid properties in the generated molecules. Experiments reveal that our model can discover plausible, diverse, and novel molecules more effectively than several state-of-the-art methods. Moreover, by utilizing Bayesian optimization over the continuous latent representations of the molecules our model finds, we can also discover molecules that maximize certain desirable properties more effectively than the alternatives.
【Keywords】:
【Paper Link】 【Pages】:1118-1125
【Authors】: Patrick Schwab ; Walter Karlen
【Abstract】: Parkinson’s disease is a neurodegenerative disease that can affect a person’s movement, speech, dexterity, and cognition. Clinicians primarily diagnose Parkinson’s disease by performing a clinical assessment of symptoms. However, misdiagnoses are common. One factor that contributes to misdiagnoses is that the symptoms of Parkinson’s disease may not be prominent at the time the clinical assessment is performed. Here, we present a machine-learning approach to distinguishing between people with and without Parkinson’s disease using long-term data from smartphone-based walking, voice, tapping, and memory tests. We demonstrate that our attentive deep-learning models achieve significant improvements in predictive performance over strong baselines (area under the receiver operating characteristic curve = 0.85) on data from a cohort of 1853 participants. We also show that our models identify meaningful features in the input data. Our results confirm that smartphone data collected over extended periods of time could potentially be used in the future as a digital biomarker for the diagnosis of Parkinson’s disease.
【Keywords】:
【Paper Link】 【Pages】:1126-1133
【Authors】: Junyuan Shang ; Cao Xiao ; Tengfei Ma ; Hongyan Li ; Jimeng Sun
【Abstract】: Recent progress in deep learning is revolutionizing the healthcare domain, including providing solutions for medication recommendation, especially recommending medication combinations for patients with complex health conditions. Existing approaches either do not customize based on patient health history, or ignore existing knowledge on drug-drug interactions (DDI) that might lead to adverse outcomes. To fill this gap, we propose the Graph Augmented Memory Networks (GAMENet) model, which integrates the drug-drug interaction knowledge graph via a memory module implemented as a graph convolutional network, and models longitudinal patient records as the query. It is trained end-to-end to provide safe and personalized recommendations of medication combinations. We demonstrate the effectiveness and safety of GAMENet by comparing it with several state-of-the-art methods on real EHR data. GAMENet outperformed all baselines in all effectiveness measures, and also achieved a 3.60% DDI rate reduction relative to existing EHR data.
【Keywords】:
【Paper Link】 【Pages】:1134-1141
【Authors】: Weiwei Shen ; Bin Wang ; Jian Pu ; Jun Wang
【Abstract】: As a competitive alternative to the Markowitz mean-variance portfolio, the Kelly growth optimal portfolio has drawn considerable attention in investment science. While the growth optimal portfolio is theoretically guaranteed to dominate any other portfolio with probability 1 in the long run, in practice it tends to be highly risky in the short term. Moreover, empirical analyses and performance enhancement studies under practical settings are surprisingly scarce. In particular, how to handle the challenging but realistic condition of insufficient training data has barely been investigated. To fill these voids, and especially to grapple with the difficulty posed by small samples, in this paper we propose a growth optimal portfolio strategy equipped with ensemble learning. We synergistically leverage the bootstrap aggregating algorithm and the random subspace method in portfolio construction to mitigate estimation error. We analyze the behavior and hyperparameter selection of the proposed strategy by simulation, and then corroborate its effectiveness by comparing its out-of-sample performance with those of 10 competing strategies on four datasets. Experimental results clearly confirm that the new strategy is superior across extensive evaluation criteria.
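Below is a hedged sketch of the two ensemble ingredients named in the abstract: fit Kelly (log-wealth-maximizing) weights on bootstrap resamples restricted to random subsets of assets, then average the resulting weights. The optimizer setup, constraints, and toy data are generic assumptions rather than the paper's exact procedure.

```python
# Bagging + random subspace around the Kelly growth-optimal portfolio.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, n = 250, 8
R = 1 + 0.001 + 0.02 * rng.standard_normal((T, n))  # toy gross returns

def kelly_weights(returns):
    """Maximize mean log wealth growth over the long-only simplex."""
    m = returns.shape[1]
    obj = lambda w: -np.mean(np.log(returns @ w + 1e-12))
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1},)
    res = minimize(obj, np.full(m, 1 / m), bounds=[(0, 1)] * m,
                   constraints=cons)
    return res.x

def ensemble_kelly(R, n_boot=50, sub=5):
    w = np.zeros(R.shape[1])
    for _ in range(n_boot):
        rows = rng.integers(0, R.shape[0], R.shape[0])     # bootstrap sample
        cols = rng.choice(R.shape[1], sub, replace=False)  # random subspace
        w[cols] += kelly_weights(R[np.ix_(rows, cols)])
    return w / w.sum()   # average (then renormalize) the fitted weights

print(ensemble_kelly(R).round(3))
```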
【Keywords】:
【Paper Link】 【Pages】:1142-1149
【Authors】: Masamichi Shimosaka ; Yuta Hayakawa ; Kota Tsubouchi
【Abstract】: With the wide use of smartphones equipped with Global Positioning System (GPS) sensors, the analysis of population from GPS traces has been actively explored in the last decade. We propose a brand-new population prediction model to capture population trends at fine-grained points of interest (POIs) densely distributed over large areas, and to understand the relationship of each POI in terms of spatiality preservation. We propose a new framework, called Spatiality Preservable Factorized Regression (SPFR), to realize this model. The SPFR is inspired by the success of the recently proposed bilinear Poisson regression, the concept of multi-task learning with a factorization approach, and graph proximity regularization. Because the proposed model is written simply in terms of optimization, it achieves scalability. The results of our empirical evaluation, which used a massive dataset of GPS logs in the Tokyo region comprising over 32M logs, show that our model is comparable to the state-of-the-art methods in capturing population trends across meshes while retaining spatial preservation in finer mesh areas.
【Keywords】:
【Paper Link】 【Pages】:1150-1157
【Authors】: Changho Shin ; Sunghwan Joo ; Jaeryun Yim ; Hyoseop Lee ; Taesup Moon ; Wonjong Rhee
【Abstract】: Non-intrusive load monitoring (NILM), also known as energy disaggregation, is a blind source separation problem in which a household’s aggregate electricity consumption is broken down into the electricity usage of individual appliances. In this way, the cost and trouble of installing many measurement devices over numerous household appliances can be avoided, and only one device needs to be installed. The problem has been well known since Hart’s seminal paper in 1992, and recently significant performance improvements have been achieved by adopting deep networks. In this work, we focus on the idea that appliances have on/off states, and we develop a deep network for further performance improvements. Specifically, we propose a subtask gated network that combines the main regression network with an on/off classification subtask network. Unlike typical multi-task learning algorithms, where multiple tasks simply share network parameters to take advantage of the relevance among tasks, the subtask gated network multiplies the main network’s regression output by the subtask’s classification probability. When standby power is additionally learned, the proposed solution surpasses the state-of-the-art performance for most of the benchmark cases. The subtask gated network can be very effective for any problem that inherently has on/off states.
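The gating idea itself is compact enough to sketch: the final appliance estimate is the regression branch's output multiplied by the on/off branch's probability. The layers and shapes below are illustrative assumptions, not the paper's architecture.

```python
# Subtask gating: regression output * on/off probability.
import torch

class SubtaskGatedNet(torch.nn.Module):
    def __init__(self, window=99, hidden=64):
        super().__init__()
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(window, hidden), torch.nn.ReLU())
        self.regress = torch.nn.Linear(hidden, 1)   # appliance power branch
        self.onoff = torch.nn.Linear(hidden, 1)     # on/off subtask branch

    def forward(self, agg):
        h = self.shared(agg)
        power = torch.relu(self.regress(h))          # regression output
        p_on = torch.sigmoid(self.onoff(h))          # P(appliance is on)
        return power * p_on, p_on                    # gated estimate

net = SubtaskGatedNet()
agg = torch.randn(32, 99)          # windows of aggregate consumption (toy)
pred, p_on = net(agg)
# Training would combine a regression loss on `pred` with a binary
# cross-entropy loss on `p_on` against ground-truth on/off labels.
```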
【Keywords】:
【Paper Link】 【Pages】:1158-1165
【Authors】: Christopher Solinas ; Douglas Rebstock ; Michael Buro
【Abstract】: In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds to the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms, with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given the move history. In particular, we use predictions about the locations of individual cards, made by a deep neural network trained on data from human gameplay, in order to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.
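A hedged sketch of the sampling step: each unseen card is dealt to a hidden hand with probability proportional to a per-card location prediction, while respecting hand sizes. The probability table here is random; in the paper it comes from a deep network trained on human gameplay, and Skat additionally involves the two-card skat, which this simplified two-hand version omits.

```python
# Sample one "world" for PIMC evaluation from per-card location predictions.
import numpy as np

rng = np.random.default_rng(1)
unseen = list(range(20))                  # cards not visible to the player
hand_sizes = {"opp1": 10, "opp2": 10}
p = rng.dirichlet(np.ones(2), size=len(unseen))  # P(card -> hand), per card

def sample_world(unseen, hand_sizes, p):
    hands = {h: [] for h in hand_sizes}
    remaining = dict(hand_sizes)
    names = list(hands)
    for i, card in enumerate(unseen):
        # Zero out full hands, renormalize, and sample a destination.
        w = np.array([p[i][j] if remaining[names[j]] > 0 else 0.0
                      for j in range(len(names))])
        j = rng.choice(len(names), p=w / w.sum())
        hands[names[j]].append(card)
        remaining[names[j]] -= 1
    return hands

world = sample_world(unseen, hand_sizes, p)   # evaluate this world with PIMC
print({h: len(c) for h, c in world.items()})  # {'opp1': 10, 'opp2': 10}
```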
【Keywords】:
【Paper Link】 【Pages】:1166-1173
【Authors】: Mingfei Teng ; Hengshu Zhu ; Chuanren Liu ; Chen Zhu ; Hui Xiong
【Abstract】: Talent turnover often costs businesses a large amount of time, money, and performance. Therefore, employee turnover prediction is critical for proactive talent management. Existing approaches to turnover prediction are mainly based on profiling employees and their working environments, while the important contagious effect of employee turnover has been largely ignored. To this end, in this paper, we propose a contagious effect heterogeneous neural network (CEHNN) for turnover prediction that integrates employee profiles, environmental factors, and, more importantly, the influence of co-workers’ turnover behavior. Moreover, a global attention mechanism is designed to evaluate the heterogeneous impact on potential turnover behavior. This attention mechanism improves the interpretability of turnover prediction and provides actionable insights for talent retention. Finally, we conduct extensive experiments and case studies on a real-world dataset from a large company to validate the effectiveness of the contagious effect for turnover prediction.
【Keywords】:
【Paper Link】 【Pages】:1174-1181
【Authors】: Bryan Wang ; Yi-Hsuan Yang
【Abstract】: Music creation typically consists of two parts: composing the musical score, and then performing the score with instruments to make sounds. While recent work has made much progress in automatic music generation in the symbolic domain, few attempts have been made to build an AI model that can render realistic music audio from musical scores. Directly synthesizing audio with sound sample libraries often leads to mechanical and deadpan results, since musical scores do not contain performance-level information, such as subtle changes in timing and dynamics. Moreover, while the task may sound like a text-to-speech synthesis problem, there are fundamental differences, since music audio has rich polyphonic sounds. To build such an AI performer, we propose in this paper a deep convolutional model that learns in an end-to-end manner the score-to-audio mapping between a symbolic representation of music called pianorolls and an audio representation of music called spectrograms. The model consists of two subnets: the ContourNet, which uses a U-Net structure to learn the correspondence between pianorolls and spectrograms and gives an initial result; and the TextureNet, which further uses a multi-band residual network to refine the result by adding the spectral texture of overtones and timbre. We train the model to generate music clips of the violin, cello, and flute with a dataset of moderate size. We also present the results of a user study showing that our model achieves a higher mean opinion score (MOS) in naturalness and emotional expressivity than a WaveNet-based model and two off-the-shelf synthesizers. Our source code is available at https://github.com/bwang514/PerformanceNet
【Keywords】:
【Paper Link】 【Pages】:1182-1189
【Authors】: Di Wang ; Jinhui Xu
【Abstract】: In this paper, we study the Differentially Private Empirical Risk Minimization (DP-ERM) problem with non-convex loss functions and give several upper bounds for the utility in different settings. We first consider the problem in low-dimensional space. For DP-ERM with a non-smooth regularizer, we generalize an existing work by measuring the utility using the ℓ2 norm of the projected gradient. Also, we extend the error bound measurement, for the first time, from empirical risk to population risk by using the expected ℓ2 norm of the gradient. We then investigate the problem in high-dimensional space, and show that by measuring the utility with the Frank-Wolfe gap, it is possible to bound the utility by the Gaussian width of the constraint set, instead of the dimensionality p of the underlying space. We further demonstrate that the advantages of this result can also be achieved by measuring the ℓ2 norm of the projected gradient. A somewhat surprising discovery is that although the two kinds of measurements are quite different, their induced utility upper bounds are asymptotically the same under some assumptions. We also show that the utility of some special non-convex loss functions can be reduced to a level (i.e., depending only on log p) similar to that of convex loss functions. Finally, we test our proposed algorithms on both synthetic and real-world datasets, and the experimental results confirm our theoretical analysis.
【Keywords】:
【Paper Link】 【Pages】:1190-1197
【Authors】: Ji Wang ; Weidong Bao ; Lichao Sun ; Xiaomin Zhu ; Bokai Cao ; Philip S. Yu
【Abstract】: The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn are associated with computational expense far surpassing mobile devices’ capacity. What is worse, app service providers need to collect and utilize a large volume of users’ data, which contains sensitive information, to build sophisticated DNN models. Directly deploying these models on public mobile devices presents a prohibitive privacy risk. To benefit from on-device deep learning without the capacity and privacy concerns, we design a private model compression framework, RONA. Following the knowledge distillation paradigm, we jointly use hint learning, distillation learning, and self learning to train a compact and fast neural network. The knowledge distilled from the cumbersome model is adaptively bounded and carefully perturbed to enforce differential privacy. We further propose an elegant query sample selection method to reduce the number of queries and control the privacy loss. A series of empirical evaluations, as well as an implementation on an Android mobile device, show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. For example, on SVHN, when a meaningful (9.83, 10^-6)-differential privacy is guaranteed, the compact model trained by RONA obtains a 20× compression ratio and a 19× speed-up with merely 0.97% accuracy loss.
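The flavor of the privacy mechanism can be sketched as follows: the teacher's knowledge is norm-bounded and perturbed with Gaussian noise before the student distills from it. The clipping bound, noise scale, models, and loss below are illustrative assumptions; the paper's privacy accounting and query sample selection are considerably more involved.

```python
# Hedged sketch: distill from clipped, noise-perturbed teacher logits.
import torch
import torch.nn.functional as F

def private_teacher_logits(teacher, x, clip=5.0, sigma=1.0):
    with torch.no_grad():
        z = teacher(x)
        norm = z.norm(dim=1, keepdim=True).clamp(min=1e-6)
        z = z * (clip / norm).clamp(max=1.0)        # bound per-sample norm
        return z + sigma * torch.randn_like(z)      # Gaussian perturbation

teacher = torch.nn.Linear(20, 10)                   # stand-in cumbersome model
student = torch.nn.Linear(20, 10)                   # compact model to train
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(64, 20)                             # one query batch
target = F.softmax(private_teacher_logits(teacher, x), dim=1)
loss = F.kl_div(F.log_softmax(student(x), dim=1), target,
                reduction="batchmean")              # distillation loss
loss.backward()
opt.step()
```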
【Keywords】:
【Paper Link】 【Pages】:1198-1205
【Authors】: Mingliang Wang ; Jiashuang Huang ; Mingxia Liu ; Daoqiang Zhang
【Abstract】: Brain network analysis can help reveal the pathological basis of neurological disorders and facilitate automated diagnosis of brain diseases by exploring connectivity patterns in the human brain. Effectively representing the brain network has always been a fundamental task of computer-aided brain network analysis. Previous studies typically utilize human-engineered features to represent brain connectivity networks, but these features may not be well coordinated with subsequent classifiers. Besides, brain networks are often equipped with multiple hubs (i.e., nodes occupying a central position in the overall organization of a network), which provide essential clues for describing connectivity patterns. However, existing studies often fail to explore such hubs from brain connectivity networks. To address these two issues, we propose a Connectivity Network analysis method with discriminative Hub Detection (CNHD) for brain disease diagnosis using functional magnetic resonance imaging (fMRI) data. Specifically, we incorporate both feature extraction of brain networks and network-based classification into a unified model, while discriminative hubs are automatically identified from data via ℓ1-norm and ℓ2,1-norm regularizers. The proposed CNHD method is evaluated on three real-world schizophrenia datasets with fMRI scans. Experimental results demonstrate that our method not only outperforms several state-of-the-art approaches in disease diagnosis, but is also effective in automatically identifying disease-related network hubs in the human brain.
【Keywords】:
【Paper Link】 【Pages】:1206-1213
【Authors】: Bin Wu
【Abstract】: The next challenge for game AI lies in Real Time Strategy (RTS) games. RTS games provide partially observable gaming environments, where agents interact with one another in an action space much larger than that of Go. Mastering RTS games requires both strong macro strategies and delicate micro-level execution. Recently, great progress has been made in micro-level execution, while complete solutions for macro strategies are still lacking. In this paper, we propose a novel learning-based Hierarchical Macro Strategy model for mastering MOBA games, a sub-genre of RTS games. Trained with the Hierarchical Macro Strategy model, agents explicitly make macro strategy decisions and further guide their micro-level execution. Moreover, each agent makes independent strategy decisions while simultaneously communicating with its allies through a novel imitated cross-agent communication mechanism. We perform comprehensive evaluations on a popular 5v5 Multiplayer Online Battle Arena (MOBA) game. Our 5-AI team achieves a 48% winning rate against human player teams ranked in the top 1% of the player ranking system.
【Keywords】:
【Paper Link】 【Pages】:1214-1221
【Authors】: Bingzhe Wu ; Xiaolu Zhang ; Shiwan Zhao ; Lingxi Xie ; Caihong Zeng ; Zhihong Liu ; Guangyu Sun
【Abstract】: Pathological glomerulus classification plays a key role in the diagnosis of nephropathy. As the differences between subcategories are subtle, doctors often refer to slides from different staining methods to make decisions. However, creating correspondence across various stains is labor-intensive, bringing major difficulties in collecting data and training a vision-based algorithm to assist nephropathy diagnosis. This paper provides an alternative solution for integrating multi-stained visual cues for glomerulus classification. Our approach, named generator-to-classifier (G2C), is a two-stage framework. Given an input image from a specified stain, several generators are first applied to estimate its appearance under other staining methods, and a classifier follows to combine visual cues from different stains for prediction (whether it is pathological, or which type of pathology it has). We optimize these two stages in a joint manner. To provide a reasonable initialization, we pre-train the generators on an unlabeled reference set under an unpaired image-to-image translation task, and then fine-tune them together with the classifier. We conduct experiments on a glomerulus type classification dataset collected by ourselves (there are no publicly available datasets for this purpose). Although joint optimization slightly harms the authenticity of the generated patches, it boosts classification performance, suggesting that more effective visual cues are extracted in an automatic way. We also transfer our model to a public dataset for breast cancer classification, and it outperforms the state of the art significantly.
【Keywords】:
【Paper Link】 【Pages】:1222-1229
【Authors】: I-Chen Wu ; Ti-Rong Wu ; An-Jen Liu ; Hung Guei ; Tinghan Wei
【Abstract】: This paper proposes an approach to strength adjustment for MCTS-based game-playing programs. In this approach, we use a softmax policy with a strength index z to choose moves. Most importantly, we filter out low-quality moves by excluding those with a lower simulation count than a pre-defined threshold ratio of the maximum simulation count. We perform a theoretical analysis, reaching the result that the adjusted policy is guaranteed to choose moves exceeding a lower bound in strength when a threshold ratio is used. The approach is applied to the Go program ELF OpenGo. The experimental results show that z is highly correlated with empirical strength; namely, given a threshold ratio of 0.1, z is linearly related to the Elo rating with a regression error of 47.95 Elo for −2 ≤ z ≤ 2. Meanwhile, the covered strength range is about 800 Elo over the interval z ∈ [−2, 2]. Given the ease of strength adjustment using z, we present two methods to adjust strength and predict opponents’ strengths dynamically. To our knowledge, this result is state-of-the-art in terms of the range of strengths in Elo rating while maintaining a controllable relationship between the strength and a strength index.
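The move-filtering step can be sketched directly from the abstract; the exact functional form of the paper's softmax policy is not stated here, so weighting the surviving moves by (N_i / N_max)^z is one plausible reading and should be treated as an assumption.

```python
# Hedged sketch: filter moves below a threshold ratio of the maximum MCTS
# simulation count, then sample from a z-controlled policy over the rest.
import numpy as np

def adjusted_policy(counts, z=0.0, ratio=0.1):
    counts = np.asarray(counts, dtype=float)
    keep = counts >= ratio * counts.max()        # drop low-quality moves
    w = np.where(keep, (counts / counts.max()) ** z, 0.0)
    return w / w.sum()

counts = [900, 450, 90, 5]                       # per-move simulation counts
for z in (-2, 0, 2):
    print(z, adjusted_policy(counts, z).round(3))
# z = 2 is close to greedy, z = -2 favors the weakest surviving move, and
# the move with 5 simulations is excluded at ratio 0.1 for any z.
```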
【Keywords】:
【Paper Link】 【Pages】:1230-1237
【Authors】: Kui Xu ; Zhe Wang ; Jianping Shi ; Hongsheng Li ; Qiangfeng Cliff Zhang
【Abstract】: Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies. Methods have evolved from manual construction by structural biologists to automated 6D translation-rotation searching, which is extremely compute-intensive. In this paper, we propose a learning-based method and formulate this problem as a vision-inspired 3D detection and pose estimation task. We develop a deep learning framework for amino acid determination in a 3D Cryo-EM density volume. We also design a sequence-guided Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form the molecular structure. This framework achieves 91% coverage on our newly proposed dataset and takes only a few minutes for a typical structure with a thousand amino acids. Our method is hundreds of times faster and several times more accurate than existing automated solutions, without any human intervention.
【Keywords】:
【Paper Link】 【Pages】:1238-1245
【Authors】: Shuai Yang ; Jiaying Liu ; Wenjing Wang ; Zongming Guo
【Abstract】: Text effects transfer technology automatically makes text dramatically more impressive. However, previous style transfer methods either study models for general styles, which cannot handle the highly-structured text effects along the glyph, or require manual design of subtle matching criteria for text effects. In this paper, we focus on using the powerful representation abilities of deep neural features for text effects transfer. For this purpose, we propose a novel Texture Effects Transfer GAN (TET-GAN), which consists of a stylization subnetwork and a destylization subnetwork. The key idea is to train our network to accomplish both the objective of style transfer and style removal, so that it learns to disentangle and recombine the content and style features of text effects images. To support the training of our network, we propose a new text effects dataset with as many as 64 professionally designed styles on 837 characters. We show that the disentangled feature representations enable us to transfer or remove all these styles on arbitrary glyphs using one network. Furthermore, the flexible network design empowers TET-GAN to efficiently extend to a new text style via one-shot learning, where only one example is required. We demonstrate the superiority of the proposed method in generating high-quality stylized text over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:1246-1253
【Authors】: Kejing Yin ; Dong Qian ; William K. Cheung ; Benjamin C. M. Fung ; Jonathan Poon
【Abstract】: Non-negative Tensor Factorization (NTF) has been shown effective at discovering clinically relevant and interpretable phenotypes from Electronic Health Records (EHR). Existing NTF-based computational phenotyping models aggregate data over the observation window, so the learned phenotypes are mixtures of disease states appearing at different times. We argue that by separating the clinical events happening at different times in the input tensor, the temporal dynamics and the disease progression within the observation window can be modeled, and the learned phenotypes will correspond to more specific disease states. Yet how to construct the tensor for data samples with different temporal lengths and properly capture the temporal relationships specific to each individual data sample remains an open challenge. In this paper, we propose a novel Collective Non-negative Tensor Factorization (CNTF) model in which each patient is represented by a temporal tensor, and all of the temporal tensors are factorized collectively with the phenotype definitions shared across all patients. The proposed CNTF model is also flexible enough to incorporate non-temporal data modalities and RNN-based temporal regularization. We validate the proposed model using the MIMIC-III dataset, and the empirical results show that the learned phenotypes are clinically interpretable. Moreover, the proposed CNTF model outperforms the state-of-the-art computational phenotyping models on the mortality prediction task.
【Keywords】:
【Paper Link】 【Pages】:1254-1261
【Authors】: Chi Zhang ; Yixin Zhu ; Song-Chun Zhu
【Abstract】: An unprecedented boom has been witnessed in the research area of artistic style transfer ever since Gatys et al. introduced the neural method. One of the remaining challenges is to balance a trade-off among three critical aspects: speed, flexibility, and quality. (i) The vanilla optimization-based algorithm produces impressive results for arbitrary styles, but is unsatisfyingly slow due to its iterative nature; (ii) the fast approximation methods based on feed-forward neural networks generate satisfactory artistic effects but are bound to only a limited number of styles; and (iii) feature-matching methods like AdaIN achieve arbitrary style transfer in real time but at the cost of compromised quality. We find it considerably difficult to balance this trade-off well using merely a single feed-forward step, and instead ask whether there exists an algorithm that could adapt quickly to any style while the adapted model maintains high efficiency and good image quality. Motivated by this idea, we propose a novel method, coined MetaStyle, which formulates neural style transfer as a bilevel optimization problem and combines learning with only a few post-processing update steps to adapt to a fast approximation model with satisfying artistic effects, comparable to the optimization-based methods, for an arbitrary style. The qualitative and quantitative analysis in the experiments demonstrates that the proposed approach achieves high-quality arbitrary artistic style transfer effectively, with a good trade-off among speed, flexibility, and quality.
【Keywords】:
【Paper Link】 【Pages】:1262-1269
【Authors】: Youzhi Zhang ; Qingyu Guo ; Bo An ; Long Tran-Thanh ; Nicholas R. Jennings
【Abstract】: Most violent crimes happen in urban and suburban areas. With emerging tracking techniques, law enforcement officers can have real-time location information about escaping criminals and dynamically adjust the security resource allocation to interdict them. Unfortunately, existing work on urban network security games largely ignores such information. This paper addresses this omission. First, we show that ignoring the real-time information can cause an arbitrarily large loss of efficiency. To mitigate this loss, we propose a novel NEtwork purSuiT game (NEST) model that captures the interaction between an escaping adversary and a defender with multiple resources and real-time information available. Second, solving NEST is proven to be NP-hard. Third, after transforming the non-convex program of solving NEST into a linear program, we propose an incremental strategy generation algorithm, including: (i) novel pruning techniques in our best response oracle; and (ii) novel techniques for mapping strategies between subgames and adding multiple best response strategies in one iteration, to solve extremely large problems. Finally, extensive experiments show the effectiveness of our approach, which scales up to realistic problem sizes with hundreds of nodes on networks, including the real network of Manhattan.
【Keywords】:
【Paper Link】 【Pages】:1270-1277
【Authors】: Ji Zhao ; Dan Peng ; Chuhan Wu ; Huan Chen ; Meiyu Yu ; Wanji Zheng ; Li Ma ; Hua Chai ; Jieping Ye ; Xiaohu Qie
【Abstract】: Point-of-interest (POI) retrieval, which searches for relevant destination locations, plays a significant role in on-demand ride-hailing services. Existing solutions to POI retrieval mainly retrieve and rank POIs based on their semantic similarity scores. Although intuitive, quantifying the relevance of a Query-POI pair by single-field semantic similarity is subject to inherent limitations. In this paper, we propose a novel Query-POI relevance model for effective POI retrieval in on-demand ride-hailing services. Different from existing relevance models, we capture and represent multi-field, local and global semantic features of a Query-POI pair to measure semantic similarity. Besides, we observe a hidden correlation between origin and destination locations in ride-hailing scenarios, and propose two location embeddings to characterize this correlation. By incorporating the geographic correlation with the semantic similarity, our model achieves better performance in POI ranking. Experimental results on two real-world click-through datasets demonstrate the improvements of our model over state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:1278-1285
【Authors】: Panpan Zheng ; Shuhan Yuan ; Xintao Wu
【Abstract】: Many online platforms have deployed anti-fraud systems to detect and prevent fraudulent activities. However, there is usually a gap between the time a user commits a fraudulent action and the time the user is suspended by the platform. How to detect fraudsters in time is a challenging problem. Most existing approaches adopt classifiers to predict fraudsters given their activity sequences along time. The main drawback of classification models is that the prediction results between consecutive timestamps are often inconsistent. In this paper, we propose a survival-analysis-based fraud early detection model, SAFE, which maps dynamic user activities to survival probabilities that are guaranteed to be monotonically decreasing along time. SAFE adopts a recurrent neural network (RNN) to handle user activity sequences and directly outputs hazard values at each timestamp; the survival probability derived from the hazard values is then used to achieve consistent predictions. Because we only observe the user suspension time instead of the fraudulent activity time in the training data, we revise the loss function of the regular survival model to achieve fraud early detection. Experimental results on two real-world datasets demonstrate that SAFE outperforms both the survival analysis model and the recurrent neural network model alone, as well as state-of-the-art fraud early detection approaches.
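The monotonicity guarantee is easy to see in code: if an RNN emits non-negative hazards at each timestamp, then the survival probability derived from the accumulated hazard is non-increasing by construction. Layer sizes below are illustrative assumptions.

```python
# Hazard-to-survival sketch: S(t) = exp(-sum of hazards up to t).
import torch

class HazardRNN(torch.nn.Module):
    def __init__(self, n_feat=16, hidden=32):
        super().__init__()
        self.rnn = torch.nn.GRU(n_feat, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, T, n_feat)
        h, _ = self.rnn(x)
        hazard = torch.nn.functional.softplus(self.head(h)).squeeze(-1)
        survival = torch.exp(-hazard.cumsum(dim=1))   # monotone in t
        return hazard, survival

x = torch.randn(4, 10, 16)                      # toy activity sequences
hazard, survival = HazardRNN()(x)
assert (survival[:, 1:] <= survival[:, :-1] + 1e-6).all()  # non-increasing
```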
【Keywords】:
【Paper Link】 【Pages】:1286-1293
【Authors】: Panpan Zheng ; Shuhan Yuan ; Xintao Wu ; Jun Li ; Aidong Lu
【Abstract】: Many online applications, such as online social networks or knowledge bases, are often attacked by malicious users who commit various types of actions, such as vandalism on Wikipedia or fraudulent reviews on eBay. Currently, most fraud detection approaches require a training dataset that contains records of both benign and malicious users. However, in practice, there are often no or very few records of malicious users. In this paper, we develop one-class adversarial nets (OCAN) for fraud detection with only benign users as training data. OCAN first uses an LSTM-Autoencoder to learn representations of benign users from their sequences of online activities. It then detects malicious users by training a discriminator of a complementary GAN model that is different from the regular GAN model. Experimental results show that OCAN outperforms state-of-the-art one-class classification models and achieves comparable performance with the latest multi-source LSTM model that requires both benign and malicious users in the training phase.
【Keywords】:
【Paper Link】 【Pages】:1294-1301
【Authors】: Zefang Zong ; Jie Feng ; Kechun Liu ; Hongzhi Shi ; Yong Li
【Abstract】: Dynamic, high-resolution data on human population distribution is of great importance for a wide spectrum of activities and real-life applications, but is too difficult and expensive to obtain directly. Therefore, generating fine-scaled population distributions from coarse population data is of great significance. However, there are three major challenges: 1) the complexity of the spatial relations between high- and low-resolution populations; 2) the dependence of population distributions on other external information; and 3) the difficulty of retrieving temporal distribution patterns. In this paper, we first propose the idea of generating dynamic population distributions in full time series, and then we design dynamic population mapping via a deep neural network (DeepDPM), a model that describes both spatial and temporal patterns using coarse data and point-of-interest information. In DeepDPM, we utilize a super-resolution convolutional neural network (SRCNN)-based model to directly map coarse data into higher-resolution data, and a time-embedded long short-term memory model to effectively capture periodicity and smooth the finer-scaled results from the static SRCNN model. We perform extensive experiments on a real-life mobile dataset collected from Shanghai. Our results demonstrate that DeepDPM outperforms previous state-of-the-art methods and a suite of frequently used data-mining approaches. Moreover, DeepDPM breaks through the temporal limitation of previous works, so that dynamic predictions for all-day time slots can be obtained.
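The SRCNN backbone referenced in the abstract can be sketched as below, adapted to population grids: bicubic upsampling followed by the three-layer convolutional refinement (9-1-5 kernels in the original SRCNN paper). Channel counts follow the original SRCNN and, along with the scale factor, are assumptions here.

```python
# SRCNN-style coarse-to-fine mapping for population grids (hedged sketch).
import torch

class SRCNN(torch.nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(channels, 64, 9, padding=4), torch.nn.ReLU(),
            torch.nn.Conv2d(64, 32, 1), torch.nn.ReLU(),
            torch.nn.Conv2d(32, channels, 5, padding=2))

    def forward(self, coarse):
        # Bicubic upsampling to the fine grid, then convolutional refinement.
        up = torch.nn.functional.interpolate(
            coarse, scale_factor=4, mode="bicubic", align_corners=False)
        return torch.relu(self.net(up))          # populations are non-negative

coarse = torch.rand(1, 1, 16, 16)                # coarse population mesh (toy)
print(SRCNN()(coarse).shape)                     # torch.Size([1, 1, 64, 64])
```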
【Keywords】:
【Paper Link】 【Pages】:1303-1310
【Authors】: Xiaolin Wu ; Xi Zhang ; Xiao Shu
【Abstract】: Subitizing, or the sense of small natural numbers, is an innate cognitive function of humans and primates; it responds to visual stimuli prior to the development of any symbolic skills, language, or arithmetic. Given the successes of deep learning (DL) in tasks of visual intelligence, and given the primitivity of number sense, a tantalizing question is whether DL can comprehend numbers and perform subitizing. Somewhat disappointingly, extensive experiments in the style of cognitive psychology demonstrate that example-driven black-box DL cannot see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. The failure is apparently due to the learning method, not the CNN computational machinery itself. A recurrent neural network capable of subitizing does exist, which we construct by encoding a mechanism of mathematical morphology into the CNN convolutional kernels. We also investigate, using subitizing as a test bed, ways to aid black-box DL with cognitive priors derived from human insight. Our findings are mixed and interesting, pointing to both a cognitive deficit of pure DL and some measured successes of boosting DL with predetermined cognitive implements. This case study of DL in cognitive computing is meaningful, for visual numerosity represents a minimal level of human intelligence.
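The morphological intuition behind counting can be illustrated without the authors' recurrent network: the number of objects in a binary image equals its number of connected components, which classical morphology-style labelling extracts regardless of object appearance. The blobs below are arbitrary examples.

```python
# Counting objects as connected components, the classical counterpart of the
# morphology mechanism the abstract encodes into convolutional kernels.
import numpy as np
from scipy import ndimage

img = np.zeros((12, 12), dtype=int)
img[1:4, 1:4] = 1          # three blobs of different shapes and sizes
img[2:5, 7:11] = 1
img[8:11, 3:5] = 1

labels, count = ndimage.label(img)   # label connected components
print(count)                          # 3, independent of blob appearance
```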
【Keywords】:
【Paper Link】 【Pages】:1311-1318
【Authors】: Yujie Wu ; Lei Deng ; Guoqi Li ; Jun Zhu ; Yuan Xie ; Luping Shi
【Abstract】: Spiking neural networks (SNNs), which enable energy-efficient implementation on emerging neuromorphic hardware, are gaining increasing attention. Yet SNNs have not shown performance competitive with artificial neural networks (ANNs), due to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) we propose a neuron normalization technique to adjust neural selectivity and develop a direct learning algorithm for deep SNNs; (2) by narrowing the rate coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version, we present a PyTorch-based implementation method for training large-scale SNNs. In this way, we are able to train deep SNNs with a tens-of-times speedup. As a result, we achieve significantly better accuracy than reported works on neuromorphic datasets (N-MNIST and DVS-CIFAR10), and accuracy comparable to existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). To the best of our knowledge, this is the first work to demonstrate direct training of deep SNNs with high performance on CIFAR10, and the efficient implementation provides a new way to explore the potential of SNNs.
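The explicitly iterative LIF neuron mentioned in the abstract can be sketched step by step: decay, integrate, fire, reset. Constants below are illustrative; in actual SNN training the non-differentiable threshold step is handled with a surrogate gradient, which this forward-only sketch omits.

```python
# Iterative leaky integrate-and-fire dynamics (forward pass only).
import torch

def lif_forward(inputs, tau=0.5, v_th=1.0):
    """inputs: (T, batch, n) input currents; returns (T, batch, n) spikes."""
    T, batch, n = inputs.shape
    u = torch.zeros(batch, n)               # membrane potentials
    spikes = []
    for t in range(T):
        u = tau * u + inputs[t]             # leaky integration
        s = (u >= v_th).float()             # fire when threshold is crossed
        u = u * (1 - s)                     # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)

rates = torch.rand(1, 4) * 0.6
spk = lif_forward(torch.rand(100, 1, 4) * rates)  # noisy rate-coded input
print(spk.mean(dim=0))                             # per-neuron firing rates
```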
【Keywords】:
【Paper Link】 【Pages】:1319-1326
【Authors】: Lei Zhang ; Shengyuan Zhou ; Tian Zhi ; Zidong Du ; Yunji Chen
【Abstract】: Continuous-valued deep convolutional networks (DNNs) can be converted into accurate rate-coding-based spiking neural networks (SNNs). However, the substantial computational and energy costs caused by multiple spikes limit their use in mobile and embedded applications. Recent works have shown that the newly emerged temporal-coding-based SNNs converted from DNNs can reduce the computational load effectively. In this paper, we propose a novel method to convert DNNs to temporal-coding SNNs, called TDSNN. Building on the characteristics of the leaky integrate-and-fire (LIF) neural model, we put forward a new coding principle, Reverse Coding, and design a novel Ticking Neuron mechanism. According to our evaluation, our proposed method achieves a 42% reduction in total operations on average in large networks compared with DNNs, with no more than 0.5% accuracy loss. The evaluation shows that TDSNN may prove to be one of the key enablers for making the adoption of SNNs widespread.
【Keywords】:
【Paper Link】 【Pages】:1327-1334
【Authors】: Malu Zhang ; Jibin Wu ; Yansong Chua ; Xiaoling Luo ; Zihan Pan ; Dan Liu ; Haizhou Li
【Abstract】: One of the long-standing questions in biology and machine learning is how neural networks may learn important features from input activities with delayed feedback, commonly known as the temporal credit-assignment problem. Aggregate-label learning was proposed to resolve this problem by matching the spike count of a neuron with the magnitude of a feedback signal. However, the existing threshold-driven aggregate-label learning algorithms are computationally intensive, resulting in relatively low learning efficiency and hence limiting their usability in practical applications. In order to address these limitations, we propose a novel membrane-potential-driven aggregate-label learning algorithm, namely MPD-AL. With this algorithm, the most easily modifiable time instant is identified from the membrane potential traces of the neuron, and synaptic adaptation is guided by the presynaptic neurons’ contributions at this time instant. The experimental results demonstrate that the proposed algorithm enables neurons to generate the desired number of spikes, and to detect useful clues embedded within unrelated spiking activities and background noise, with better learning efficiency than the state-of-the-art TDP1 and Multi-Spike Tempotron algorithms. Furthermore, we propose a data-driven dynamic decoding scheme for practical classification tasks in which the aggregate labels are hard to define. This scheme effectively improves the classification accuracy of aggregate-label learning algorithms, as demonstrated on a speech recognition task.
【Keywords】:
【Paper Link】 【Pages】:1336-1343
【Authors】: Kezhen Chen ; Irina Rabkina ; Matthew D. McLure ; Kenneth D. Forbus
【Abstract】: Deep learning systems can perform well on some image recognition tasks. However, they have serious limitations, including requiring far more training data than humans do and being fooled by adversarial examples. By contrast, analogical learning over relational representations tends to be far more data-efficient, requiring only human-like amounts of training data. This paper introduces an approach that combines automatically constructed qualitative visual representations with analogical learning to tackle a hard computer vision problem, object recognition from sketches. Results from the MNIST dataset and a novel dataset, the Coloring Book Objects dataset, are provided. Comparison to existing approaches indicates that analogical generalization can be used to identify sketched objects from these datasets with several orders of magnitude fewer examples than deep learning systems require.
【Keywords】:
【Paper Link】 【Pages】:1344-1351
【Authors】: Qiuyuan Huang ; Li Deng ; Dapeng Oliver Wu ; Chang Liu ; Xiaodong He
【Abstract】: This paper proposes a novel neural architecture — Attentive Tensor Product Learning (ATPL) — to represent grammatical structures of natural language in deep learning models. ATPL exploits Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science, to integrate deep learning with explicit natural language structures and rules. The key ideas of ATPL are: 1) unsupervised learning of role-unbinding vectors of words via the TPR-based deep neural network; 2) the use of attention modules to compute TPR; and 3) the integration of TPR with typical deep learning architectures including long short-term memory and feedforward neural networks. The novelty of our approach lies in its ability to extract the grammatical structure of a sentence by using role-unbinding vectors, which are obtained in an unsupervised manner. Our ATPL approach is applied to 1) image captioning, 2) part of speech (POS) tagging, and 3) constituency parsing of a natural language sentence. The experimental results demonstrate the effectiveness of the proposed approach in all three natural language processing tasks.
【Keywords】:
【Paper Link】 【Pages】:1352-1359
【Authors】: Matthew Riemer ; Tim Klinger ; Djallel Bouneffouf ; Michele Franceschini
【Abstract】: Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k ≪ n.
【Keywords】:
【Paper Link】 【Pages】:1360-1367
【Authors】: Abhishek Sharma ; Keith M. Goolsbey
【Abstract】: Cognitive systems must reason with large bodies of general knowledge to perform complex tasks in the real world. However, due to the intractability of reasoning in large, expressive knowledge bases (KBs), many AI systems have limited reasoning capabilities. Successful cognitive systems have used a variety of machine learning and axiom selection methods to improve inference. In this paper, we describe a search heuristic that uses a Monte-Carlo simulation technique to choose inference steps. We test the efficacy of this approach on a very large and expressive KB, Cyc. Experimental results on hundreds of queries show that this method is highly effective in reducing inference time and improving question-answering (Q/A) performance.
【Keywords】:
【Paper Link】 【Pages】:1368-1375
【Authors】: Di Wang ; Ah-Hwee Tan ; Chunyan Miao ; Ahmed A. Moustafa
【Abstract】: Neurocomputational modelling of long-term memory is a core topic in computational cognitive neuroscience, and is essential for building self-regulating brain-like AI systems. In this paper, we study how people generally lose their memories and emulate various memory loss phenomena using a neurocomputational autobiographical memory model. Specifically, based on prior neurocognitive and neuropsychology studies, we identify three neural processes, namely overload, decay and inhibition, which lead to memory loss in memory formation, storage and retrieval, respectively. For model validation, we collect a memory dataset comprising more than one thousand life events and emulate the three key memory loss processes with model parameters learnt from memory recall behavioural patterns found in human subjects of different age groups. The emulation results show high correlation with human memory recall performance across the life span, even on a second population not used for learning. To the best of our knowledge, this paper is the first research work on quantitative evaluations of autobiographical memory loss using a neurocomputational model.
【Keywords】:
【Paper Link】 【Pages】:1377-1384
【Authors】: Jiaxu Cui ; Bo Yang ; Xia Hu
【Abstract】: Attributed graphs, which contain rich contextual features beyond just network structure, are ubiquitous and have been observed to benefit various network analytics applications. Graph structure optimization, aiming to find the optimal graphs in terms of some specific measures, has become an effective computational tool in complex network analysis. However, traditional model-free methods suffer from the expensive computational cost of evaluating graphs, while existing vectorial Bayesian optimization methods cannot be directly applied to attributed graphs and have scalability issues due to the use of Gaussian processes (GPs). To bridge the gap, in this paper, we propose a novel scalable Deep Graph Bayesian Optimization (DGBO) method on attributed graphs. The proposed DGBO avoids the cubic complexity of GPs by adopting a deep graph neural network to surrogate black-box functions, and can scale linearly with the number of observations. Intensive experiments are conducted on both artificial and real-world problems, including molecular discovery and urban road network design, and demonstrate the effectiveness of the DGBO compared with the state-of-the-art.
【Keywords】:
【Paper Link】 【Pages】:1385-1392
【Authors】: Sijie He ; Xinyan Li ; Vidyashankar Sivakumar ; Arindam Banerjee
【Abstract】: An important family of problems in climate science focuses on finding predictive relationships between various climate variables. In this paper, we consider the problem of predicting monthly deseasonalized land temperature at different locations worldwide based on sea surface temperature (SST). Contrary to the popular belief in a trade-off between (a) simple, interpretable, but inaccurate models and (b) complex, accurate, but uninterpretable models, we introduce a weighted Lasso model for the problem which yields interpretable results while being highly accurate. Covariate weights in the regularization of weighted Lasso are pre-determined, and proportional to the spatial distance of the covariate (sea surface location) from the target (land location). We establish finite sample estimation error bounds for weighted Lasso, and illustrate its superior empirical performance and interpretability over complex models such as deep neural networks (Deep nets) and gradient boosted trees (GBT). We also present a detailed empirical analysis of what went wrong with Deep nets here, which may serve as a helpful guideline for applying Deep nets to small-sample scientific problems.
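A weighted Lasso of this form reduces to a standard Lasso after rescaling each column by its weight, which makes it easy to prototype. The sketch below assumes synthetic data and placeholder distances (the paper's SST covariates and exact weighting scheme are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 monthly samples, 50 "sea surface" covariates.
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Pre-determined weights proportional to (hypothetical) spatial distances
# of each covariate location from the target land location.
dist = rng.uniform(1.0, 10.0, size=p)       # placeholder distances
w = dist / dist.mean()

# min ||y - X b||^2 + lam * sum_j w_j |b_j| is equivalent to a standard
# Lasso on the rescaled columns X_j / w_j, recovering b_j = b'_j / w_j.
model = Lasso(alpha=0.05).fit(X / w, y)
beta_hat = model.coef_ / w
print(np.nonzero(beta_hat)[0])   # far-away covariates are penalized harder
```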
【Keywords】:
【Paper Link】 【Pages】:1393-1400
【Authors】: Ling Pan ; Qingpeng Cai ; Zhixuan Fang ; Pingzhong Tang ; Longbo Huang
【Abstract】: Bike sharing provides an environmentally friendly way of traveling and is booming all over the world. Yet, due to the high similarity of user travel patterns, the bike imbalance problem constantly occurs, especially in dockless bike sharing systems, significantly impacting service quality and company revenue. Thus, it has become a critical task for bike sharing operators to resolve such imbalance efficiently. In this paper, we propose a novel deep reinforcement learning framework for incentivizing users to rebalance such systems. We model the problem as a Markov decision process and take both spatial and temporal features into consideration. We develop a novel deep reinforcement learning algorithm called Hierarchical Reinforcement Pricing (HRP), which builds upon the Deep Deterministic Policy Gradient algorithm. Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module. We conduct extensive experiments to evaluate HRP, based on a dataset from Mobike, a major Chinese dockless bike sharing company. Results show that HRP performs close to the 24-timeslot look-ahead optimization, and outperforms state-of-the-art methods in both service level and bike distribution. It also transfers well when applied to unseen areas.
【Keywords】:
【Paper Link】 【Pages】:1401-1408
【Authors】: Yufei Wang ; Zheyuan Ryan Shi ; Lantao Yu ; Yi Wu ; Rohit Singh ; Lucas Joppa ; Fei Fang
【Abstract】: Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints, and agents' subsequent actions upon receiving that information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model, GSG-I, which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploiting the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games.
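For intuition, here is a skeleton of the double oracle loop that DeDOL builds upon, with the zero-sum restricted game solved by linear programming and the best-response oracles left abstract (in DeDOL they are trained deep Q-networks; everything below is our own illustrative sketch, and strategies are assumed to be hashable labels):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Row player's maximin mixed strategy for a zero-sum payoff matrix."""
    m, n = payoff.shape
    # Variables (p_1..p_m, v): maximize v s.t. payoff^T p >= v, sum(p) = 1.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], -res.fun

def double_oracle(u, defender_br, attacker_br, d0, a0, iters=20):
    """u(d, a): defender payoff; *_br: best-response oracles."""
    D, A = [d0], [a0]
    for _ in range(iters):
        M = np.array([[u(d, a) for a in A] for d in D])
        p, _ = solve_zero_sum(M)        # defender's equilibrium mix over D
        q, _ = solve_zero_sum(-M.T)     # attacker's equilibrium mix over A
        d_new, a_new = defender_br(A, q), attacker_br(D, p)
        if d_new in D and a_new in A:   # no improving response: converged
            return p, q, D, A
        if d_new not in D:
            D.append(d_new)
        if a_new not in A:
            A.append(a_new)
    return p, q, D, A
```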
【Keywords】:
【Paper Link】 【Pages】:1409-1416
【Authors】: Chuxu Zhang ; Dongjin Song ; Yuncong Chen ; Xinyang Feng ; Cristian Lumezanu ; Wei Cheng ; Jingchao Ni ; Bo Zong ; Haifeng Chen ; Nitesh V. Chawla
【Abstract】: Nowadays, multivariate time series data are increasingly collected in various real-world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal statuses at certain time steps and pinpointing the root causes. Building such a system, however, is challenging, since it requires not only capturing the temporal dependency within each time series, but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses at different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
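The signature-matrix construction is easy to picture: for each window length, form the normalized inner products between every pair of sensors over the most recent window. A minimal numpy sketch (the window lengths are illustrative, not the paper's settings):

```python
import numpy as np

def signature_matrices(X, t, windows=(10, 30, 60)):
    """Multi-scale signature matrices at time step t.

    X: (n_sensors, T) multivariate time series. For each window length w,
    entry (i, j) is the inner product of sensors i and j over the last w
    steps, normalized by w, capturing pairwise inter-sensor correlation.
    """
    mats = []
    for w in windows:
        seg = X[:, t - w:t]             # (n_sensors, w) recent window
        mats.append(seg @ seg.T / w)    # (n_sensors, n_sensors)
    return np.stack(mats)               # (n_scales, n_sensors, n_sensors)

X = np.random.randn(30, 2000)                   # 30 sensors, 2000 steps
print(signature_matrices(X, t=1000).shape)      # (3, 30, 30)
```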
【Keywords】:
【Paper Link】 【Pages】:1418-1426
【Authors】: Sina Aghaei ; Mohammad Javad Azizi ; Phebe Vayanos
【Abstract】: In recent years, automated data-driven decision-making systems have enjoyed tremendous success in a variety of fields (e.g., to make product recommendations, or to guide the production of entertainment). More recently, these algorithms are increasingly being used to assist socially sensitive decision-making (e.g., to decide whom to admit into a degree program or to prioritize individuals for public housing). Yet, these automated tools may result in discriminatory decision-making in the sense that they may treat individuals unfairly or unequally based on membership in a category or a minority, resulting in disparate treatment or disparate impact and violating both moral and ethical standards. This may happen when the training dataset is itself biased (e.g., if individuals belonging to a particular group have historically been discriminated against). However, it may also happen when the training dataset is unbiased, if the errors made by the system affect individuals belonging to a category or minority differently (e.g., if misclassification rates for Blacks are higher than for Whites). In this paper, we unify the definitions of unfairness across classification and regression. We propose a versatile mixed-integer optimization framework for learning optimal and fair decision trees, and variants thereof, to prevent disparate treatment and/or disparate impact as appropriate. This translates to a flexible schema for designing fair and interpretable policies suitable for socially sensitive decision-making. We conduct extensive computational studies that show that our framework improves the state-of-the-art in the field (which typically relies on heuristics) to yield non-discriminatory decisions at lower cost to overall accuracy.
【Keywords】:
【Paper Link】 【Pages】:1427-1434
【Authors】: Daniel Anderson ; Gregor Hendel ; Pierre Le Bodic ; Merlin Viernickel
【Abstract】: We propose a simple and general online method to measure the search progress within the Branch-and-Bound algorithm, from which we estimate the size of the remaining search tree. We then show how this information can help solvers algorithmically at runtime by designing a restart strategy for Mixed-Integer Programming (MIP) solvers that decides whether to restart the search based on the current estimate of the number of remaining nodes in the tree. We refer to this type of algorithm as clairvoyant. Our clairvoyant restart strategy outperforms a state-of-the-art solver on a large set of publicly available MIP benchmark instances. It is implemented in the MIP solver SCIP and will be available in future releases.
【Keywords】:
【Paper Link】 【Pages】:1435-1442
【Authors】: Curtis Bright ; Dragomir Z. Ðokovic ; Ilias Kotsireas ; Vijay Ganesh
【Abstract】: We enumerate all circulant good matrices with odd orders divisible by 3 up to order 70. As a consequence of this we find a previously overlooked set of good matrices of order 27 and a new set of good matrices of order 57. We also find that circulant good matrices do not exist in the orders 51, 63, and 69, thereby finding three new counterexamples to the conjecture that such matrices exist in all odd orders. Additionally, we prove a new relationship between the entries of good matrices and exploit this relationship in our enumeration algorithm. Our method applies the SAT+CAS paradigm of combining computer algebra functionality with modern SAT solvers to efficiently search large spaces which are specified by both algebraic and logical constraints.
【Keywords】:
【Paper Link】 【Pages】:1443-1451
【Authors】: Quentin Cappart ; Emmanuel Goutierre ; David Bergman ; Louis-Martin Rousseau
【Abstract】: Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In the last decade, decision diagrams (DDs) have brought a new perspective on obtaining upper and lower bounds that can be significantly better than classical bounding mechanisms, such as linear relaxations. It is well known that the quality of the bounds achieved through this flexible bounding method is highly reliant on the ordering of variables chosen for building the diagram, and finding an ordering that optimizes standard metrics is an NP-hard problem. In this paper, we propose an innovative and generic approach based on deep reinforcement learning for obtaining an ordering for tightening the bounds obtained with relaxed and restricted DDs. We apply the approach to both the Maximum Independent Set Problem and the Maximum Cut Problem. Experimental results on synthetic instances show that the deep reinforcement learning approach, by achieving tighter objective function bounds, generally outperforms ordering methods commonly used in the literature when the distribution of instances is known. To the best knowledge of the authors, this is the first paper to apply machine learning to directly improve relaxation bounds obtained by general-purpose bounding mechanisms for combinatorial optimization problems.
【Keywords】:
【Paper Link】 【Pages】:1452-1459
【Authors】: Alexander Diedrich ; Alexander Maier ; Oliver Niggemann
【Abstract】: Currently, detecting and isolating faults in hybrid systems is often done manually with the help of human operators. In this paper we present a novel model-based diagnosis approach for automatically diagnosing hybrid systems. The approach has two parts: First, dynamic system behaviour is modelled through well-known state space models using differential equations. Second, from the state space models we calculate Boolean residuals through an observer pattern. The novelty lies in implementing the observer pattern through the use of a symbolic system description specified in satisfiability modulo the theory of linear arithmetic. With this, we create a static situation for the diagnosis algorithm and decouple modelling and diagnosis. Evaluating the system description generates one Boolean residual for each component. These residuals constitute the fault symptoms. To find the minimum cardinality diagnosis from these symptoms we employ Reiter's diagnosis lattice. For the experimental evaluation we use a simulation of the Tennessee Eastman process and a simulation of a four-tank model. We show that the presented approach is able to identify all injected faults.
【Keywords】:
【Paper Link】 【Pages】:1460-1467
【Authors】: Hu Ding ; Mingquan Ye
【Abstract】: In the real world, many problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research has focused on the alignment of 2D or 3D patterns, especially in the field of computer vision. Recently, the alignment of geometric patterns in high dimension has found several novel applications and attracted more and more attention. However, the research is still rather limited in terms of algorithms. To the best of our knowledge, most existing approaches for high-dimensional alignment are just simple extensions of their counterparts for 2D and 3D cases, and often suffer from issues such as high complexity. In this paper, we propose an effective framework to compress high-dimensional geometric patterns while approximately preserving the alignment quality. As a consequence, existing alignment approaches can be applied to the compressed geometric patterns, and the time complexity is significantly reduced. Our idea is inspired by the observation that high-dimensional data often have a low intrinsic dimension. We adopt the widely used notion of "doubling dimension" to measure the extent of our compression and the resulting approximation. Finally, we test our method on both random and real datasets; the experimental results reveal that running the alignment algorithm on compressed patterns achieves similar quality, compared with the results on the original patterns, while the running times (including the time cost of compression) are substantially lower.
【Keywords】:
【Paper Link】 【Pages】:1468-1476
【Authors】: Aritra Dutta ; Filip Hanzely ; Peter Richtárik
【Abstract】: Robust principal component analysis (RPCA) is a well-studied problem whose goal is to decompose a matrix into the sum of low-rank and sparse components. In this paper, we propose a nonconvex feasibility reformulation of the RPCA problem and apply an alternating projection method to solve it. To the best of our knowledge, this is the first paper proposing a method that solves the RPCA problem without considering any objective function, convex relaxation, or surrogate convex constraints. We demonstrate through extensive numerical experiments on a variety of applications, including shadow removal, background estimation, face detection, and galaxy evolution, that our approach matches and often significantly outperforms the current state-of-the-art in various ways.
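The feasibility viewpoint is easy to sketch: alternate projections between the set of matrices of rank at most r (truncated SVD) and the set of matrices with at most s nonzeros (keep the largest-magnitude entries). The toy below is our own illustration of this general idea, not the paper's exact algorithm or stopping rule:

```python
import numpy as np

def rpca_alt_proj(M, rank, card, iters=100):
    """Alternating projections for the feasibility form of RPCA:
    find L (rank <= rank) and S (at most card nonzeros) with M ~ L + S.
    """
    L = np.zeros_like(M)
    for _ in range(iters):
        # Project M - L onto the sparsity constraint set.
        R = M - L
        thresh = np.partition(np.abs(R).ravel(), -card)[-card]
        S = R * (np.abs(R) >= thresh)
        # Project M - S onto the set of matrices of rank <= rank.
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return L, S

# Toy check: a rank-2 matrix plus 20 sparse corruptions.
rng = np.random.default_rng(1)
M0 = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
S0 = np.zeros_like(M0)
idx = rng.choice(M0.size, 20, replace=False)
S0.ravel()[idx] = 5.0
L, S = rpca_alt_proj(M0 + S0, rank=2, card=20)
print(np.linalg.norm(M0 - L) / np.linalg.norm(M0))  # relative error of L
```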
【Keywords】:
【Paper Link】 【Pages】:1477-1484
【Authors】: Eduard Eiben ; Robert Ganian ; Dusan Knop ; Sebastian Ordyniak
【Abstract】: We study the parameterized complexity of Integer Quadratic Programming under two kinds of restrictions: explicit restrictions on the domain or coefficients, and structural restrictions on variable interactions. We argue that both kinds of restrictions are necessary to achieve tractability for Integer Quadratic Programming, and obtain four new algorithms for the problem that are tuned to possible explicit restrictions of instances that we may wish to solve. The presented algorithms are exact, deterministic, and complemented by appropriate lower bounds.
【Keywords】:
【Paper Link】 【Pages】:1485-1494
【Authors】: Takuro Fukunaga ; Takuya Konishi ; Sumio Fujita ; Ken-ichi Kawarabayashi
【Abstract】: We formulate a new stochastic submodular maximization problem by introducing performance-dependent costs of items. In this problem, we consider selecting items for the case where the performance of each item (i.e., how much an item contributes to the objective function) is decided randomly, and the cost of an item depends on its performance. The goal of the problem is to maximize the objective function subject to a budget constraint on the costs of the selected items. We present an adaptive algorithm for this problem with a theoretical guarantee that its expected objective value is at least (1 − 1/e^(1/4))/2 times the maximum value attained by any adaptive algorithm. We verify the performance of the algorithm through numerical experiments.
【Keywords】:
【Paper Link】 【Pages】:1495-1502
【Authors】: Amin Hosseininasab ; Willem-Jan van Hoeve ; André A. Ciré
【Abstract】: Constraint-based sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram (MDD) representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency.
【Keywords】:
【Paper Link】 【Pages】:1503-1510
【Authors】: Feihu Huang ; Bin Gu ; Zhouyuan Huo ; Songcan Chen ; Heng Huang
【Abstract】: The proximal gradient method has been playing an important role in solving many machine learning tasks, especially nonsmooth problems. However, in some machine learning problems such as the bandit model and the black-box learning problem, the proximal gradient method can fail because the explicit gradients of these problems are difficult or infeasible to obtain. The gradient-free (zeroth-order) method can address these problems because only the objective function values are required in the optimization. Recently, the first zeroth-order proximal stochastic algorithm was proposed to solve nonconvex nonsmooth problems. However, its convergence rate is O(1/√T) for nonconvex problems, which is significantly slower than the best convergence rate O(1/T) of zeroth-order stochastic algorithms, where T is the iteration number. To fill this gap, in this paper we propose a class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, denoted ZO-ProxSVRG and ZO-ProxSAGA, respectively. In the theoretical analysis, we address the main challenge that an unbiased estimate of the true gradient does not hold in the zeroth-order case, which was required in previous theoretical analyses of both SVRG and SAGA. Moreover, we prove that both the ZO-ProxSVRG and ZO-ProxSAGA algorithms have O(1/T) convergence rates. Finally, the experimental results verify that our algorithms have a faster convergence rate than the existing zeroth-order proximal stochastic algorithm.
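The core building block is simple: replace the true gradient with a two-point random estimate and follow it with the proximal operator of the nonsmooth term. A minimal sketch for an l1-regularized black-box objective (the smoothing parameter, stepsize, and Gaussian estimator below are standard illustrative choices, not necessarily the paper's):

```python
import numpy as np

def zo_prox_step(f, x, lam, step, mu=1e-4, rng=None):
    """One zeroth-order proximal step for min_x f(x) + lam * ||x||_1.

    The gradient of f is replaced by the two-point random estimate
    g = (f(x + mu*u) - f(x)) / mu * u with u ~ N(0, I), followed by the
    proximal (soft-thresholding) operator of the l1 term.
    """
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.size)
    g = (f(x + mu * u) - f(x)) / mu * u   # gradient estimate, no grad(f)
    z = x - step * g
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

# Toy usage: black-box least squares with an l1 penalty.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
x = np.zeros(20)
for _ in range(2000):
    x = zo_prox_step(f, x, lam=0.5, step=1e-3, rng=rng)
```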
【Keywords】:
【Paper Link】 【Pages】:1511-1519
【Authors】: Alexey Ignatiev ; Nina Narodytska ; Joao Marques-Silva
【Abstract】: The growing range of applications of Machine Learning (ML) in a multitude of settings motivates the need to compute small explanations for the predictions made. Small explanations are generally accepted as easier for human decision makers to understand. Most earlier work on computing explanations is based on heuristic approaches, providing no guarantees of quality in terms of how close such solutions are to cardinality- or subset-minimal explanations. This paper develops a constraint-agnostic solution for computing explanations for any ML model. The proposed solution exploits abductive reasoning, and imposes the requirement that the ML model can be represented as sets of constraints using some target constraint reasoning system for which the decision problem can be answered with some oracle. The experimental results, obtained on well-known datasets, validate the scalability of the proposed approach as well as the quality of the computed solutions.
【Keywords】:
【Paper Link】 【Pages】:1520-1527
【Authors】: Yoichi Iwata ; Takuto Shigemura
【Abstract】: Steiner tree is a classical NP-hard problem that has been extensively studied both theoretically and empirically. In theory, the fastest approach for inputs with a small number of terminals uses dynamic programming, but in practice, state-of-the-art solvers are based on the branch-and-cut method. In this paper, we present a novel separator-based pruning technique for speeding up a theoretically fast DP algorithm. Our empirical evaluation shows that our pruned DP algorithm is quite effective on real-world instances admitting small separators, scales to more than a hundred terminals, and is competitive with a branch-and-cut solver.
【Keywords】:
【Paper Link】 【Pages】:1528-1535
【Authors】: Ehsan Kazemi ; Liqiang Wang
【Abstract】: Nonconvex and nonsmooth problems have recently attracted considerable attention in machine learning. However, developing efficient methods for nonconvex and nonsmooth optimization problems with certain performance guarantees remains a challenge. Proximal coordinate descent (PCD) has been widely used for solving optimization problems, but knowledge of PCD methods in the nonconvex setting is very limited. On the other hand, asynchronous proximal coordinate descent (APCD) has recently received much attention for solving large-scale problems. However, accelerated variants of APCD algorithms are rarely studied. In this paper, we extend the APCD method to an accelerated algorithm (AAPCD) for nonsmooth and nonconvex problems that satisfy the sufficient descent property, by comparing the function values at the proximal update and a linearly extrapolated point using a delay-aware momentum value. To the best of our knowledge, we are the first to provide stochastic and deterministic accelerated extensions of APCD algorithms for general nonconvex and nonsmooth problems, ensuring that for both bounded and unbounded delays every limit point is a critical point. By leveraging the Kurdyka-Łojasiewicz property, we show linear and sublinear convergence rates for the deterministic AAPCD with bounded delays. Numerical results demonstrate the practical speed advantage of our algorithm.
【Keywords】:
【Paper Link】 【Pages】:1536-1543
【Authors】: Jean-Marie Lagniez ; Pierre Marquis
【Abstract】: We present a recursive algorithm for projected model counting, i.e., the problem of determining the number of models ‖∃X.Σ‖ of a propositional formula Σ after eliminating from it a given set X of variables. Based on a "standard" model counter, our algorithm projMC takes advantage of a disjunctive decomposition scheme of ∃X.Σ for computing ‖∃X.Σ‖. It also looks for disjoint components in its input to improve the computation. Our experiments show that in many cases projMC is significantly more efficient than the previous algorithms for projected model counting from the literature.
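The quantity being computed is easy to pin down by brute force: ‖∃X.Σ‖ counts assignments to the remaining variables that extend to at least one full model of Σ. The enumeration sketch below only fixes the semantics (projMC itself works by disjunctive decomposition over a model counter, not enumeration):

```python
import itertools

def projected_count(formula, all_vars, X):
    """||exists X. Sigma||: the number of assignments to the variables
    outside X that extend to at least one full model of Sigma.
    formula: a predicate over a {var: bool} dictionary.
    """
    Y = [v for v in all_vars if v not in X]
    count = 0
    for y_assign in itertools.product([False, True], repeat=len(Y)):
        env = dict(zip(Y, y_assign))
        if any(formula({**env, **dict(zip(X, x_assign))})
               for x_assign in itertools.product([False, True],
                                                 repeat=len(X))):
            count += 1
    return count

# Sigma = (a or b) and (b or c), projecting out X = {b}.
f = lambda e: (e["a"] or e["b"]) and (e["b"] or e["c"])
print(projected_count(f, ["a", "b", "c"], ["b"]))  # 4
```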
【Keywords】:
【Paper Link】 【Pages】:1544-1551
【Authors】: Liping Li ; Wei Xu ; Tianyi Chen ; Georgios B. Giannakis ; Qing Ling
【Abstract】: In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated into the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution, with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and a complexity reduction compared to the state-of-the-art alternatives.
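To see why such a regularization term robustifies aggregation, note that with an l1-norm penalty each worker enters the master's subgradient update only through a sign vector, so its per-coordinate influence is bounded. A sketch of a master-side update under our own reading of this idea (the l1 choice and the notation below are ours, not necessarily the paper's exact formulation):

```python
import numpy as np

def rsa_master_step(x0, worker_iterates, grad0, lam, lr):
    """One master update of l1-norm-regularized robust aggregation.

    worker_iterates: the x_i vectors received from all workers; Byzantine
    workers may send arbitrary vectors, but each worker enters only via
    sign(x0 - x_i), so its influence is at most lam * lr per coordinate.
    grad0: (sub)gradient of the loss term kept at the master.
    """
    penalty = sum(np.sign(x0 - xi) for xi in worker_iterates)
    return x0 - lr * (grad0(x0) + lam * penalty)
```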
【Keywords】:
【Paper Link】 【Pages】:1552-1559
【Authors】: Jingchang Liu ; Linli Xu ; Junliang Guo ; Xin Sheng
【Abstract】: We focus on empirical risk minimization with a composite regulariser, which has been widely applied in various machine learning tasks to introduce important structural information regarding the problem or data. In general, it is challenging to calculate the proximal operator of a composite regulariser. Recently, the proximal average (PA), which involves a feasible proximal operator calculation, was proposed to approximate composite regularisers. Augmented with the prevailing variance-reducing (VR) stochastic methods (e.g. SVRG, SAGA), PA-based algorithms achieve better performance. However, existing works require a fixed stepsize, which needs to be rather small to ensure that the PA approximation is sufficiently accurate. In the meantime, the smaller stepsize incurs many more iterations for convergence. In this paper, we propose two fast PA-based VR stochastic methods – APA-SVRG and APA-SAGA. By initializing the stepsize with a much larger value and adaptively decreasing it, both of the proposed methods are proved to enjoy an O(n log(1/ε) + m0/ε) iteration complexity to achieve accurate solutions, where m0 is the initial number of inner iterations and n is the number of samples. Moreover, experimental results demonstrate the superiority of the proposed algorithms.
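The proximal average idea itself fits in a few lines: when a regulariser is a weighted sum of simple terms whose individual proximal operators are cheap, approximate the (hard) prox of the sum by the weighted average of the (easy) individual proxes. The approximation error grows with the stepsize, which is why the adaptive stepsize schedule above matters. A sketch with two standard proxes (our own illustrative choices):

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sq(v, t):
    """Proximal operator of t * (1/2) * ||.||^2 (a ridge-type term)."""
    return v / (1.0 + t)

def prox_average(v, step, proxes, weights):
    """Approximate the prox of sum_k w_k g_k by averaging the proxes."""
    return sum(w * p(v, step) for p, w in zip(proxes, weights))

v = np.random.randn(10)
print(prox_average(v, 0.1, [prox_l1, prox_sq], [0.5, 0.5]))
```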
【Keywords】:
【Paper Link】 【Pages】:1560-1567
【Authors】: Shengcai Liu ; Ke Tang ; Xin Yao
【Abstract】: Exploiting parallelism is becoming more and more important in designing efficient solvers for computationally hard problems. However, manually building parallel solvers typically requires considerable domain knowledge and plenty of human effort. As an alternative, automatic construction of parallel portfolios (ACPP) aims at automatically building effective parallel portfolios based on a given problem instance set and a given rich configuration space. One promising way to solve the ACPP problem is to explicitly group the instances into different subsets and promote a component solver to handle each of them. This paper investigates solving ACPP from this perspective, and especially studies how to obtain a good instance grouping. The experimental results on two widely studied problem domains, the Boolean satisfiability problem (SAT) and the traveling salesman problem (TSP), show that the parallel portfolios constructed by the proposed method achieve consistently superior performance to those constructed by the state-of-the-art ACPP methods, and can even rival sophisticated hand-designed parallel solvers.
【Keywords】:
【Paper Link】 【Pages】:1568-1575
【Authors】: Igor Molybog ; Javad Lavaei
【Abstract】: In this paper, we study the semidefinite affine rank feasibility problem, which consists in finding a positive semidefinite matrix of a given rank from its linear measurements. We consider semidefinite programming relaxations of the problem with different objective functions and study their properties. In particular, we propose an analytical bound on the number of relaxations that are sufficient to solve in order to obtain a solution of a generic instance of the semidefinite affine rank feasibility problem or prove that there is no solution. This is followed by a heuristic algorithm based on semidefinite relaxation and an experimental demonstration of its performance on a large sample of synthetic data.
【Keywords】:
【Paper Link】 【Pages】:1576-1583
【Authors】: Jarrid Rector-Brooks ; Jun-Kun Wang ; Barzan Mozafari
【Abstract】: We revisit Frank-Wolfe (FW) optimization under strongly convex constraint sets. We provide a faster convergence rate for FW without line search, showing that a previously overlooked variant of FW is indeed faster than the standard variant. With line search, we show that FW can converge to the global optimum even for smooth functions that are not convex, but are quasi-convex and locally Lipschitz. We also show that, for the general case of (smooth) non-convex functions, FW with line search converges with high probability to a stationary point at a rate of O(1/t), as long as the constraint set is strongly convex – one of the fastest convergence rates in non-convex optimization.
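For concreteness, here is the FW template without line search over an l2 ball, one of the simplest strongly convex constraint sets; the closed-form linear minimization oracle and the open-loop step 2/(t+2) are standard textbook choices, not necessarily the variant analyzed in the paper:

```python
import numpy as np

def frank_wolfe_ball(grad, x0, radius=1.0, iters=200):
    """Frank-Wolfe without line search over an l2 ball of given radius.

    The linear minimization oracle over the ball has the closed form
    -radius * g / ||g||, so each iteration costs one gradient evaluation.
    """
    x = x0.copy()
    for t in range(iters):
        g = grad(x)
        s = -radius * g / (np.linalg.norm(g) + 1e-12)  # LMO solution
        gamma = 2.0 / (t + 2.0)                        # open-loop stepsize
        x = (1 - gamma) * x + gamma * s
    return x

# Minimize ||x - b||^2 over the unit ball, with b outside the ball.
b = np.array([2.0, 0.0])
x = frank_wolfe_ball(lambda x: 2 * (x - b), np.zeros(2))
print(x)   # close to the boundary point (1, 0)
```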
【Keywords】:
【Paper Link】 【Pages】:1584-1591
【Authors】: Christoph Scholl ; Jie-Hong Roland Jiang ; Ralf Wimmer ; Aile Ge-Ernst
【Abstract】: Dependency quantified Boolean formulas (DQBFs) are a powerful formalism, which subsumes quantified Boolean formulas (QBFs) and allows an explicit specification of dependencies of existential variables on universal variables. This enables a succinct encoding of decision problems in the NEXPTIME complexity class. As solving general DQBFs is NEXPTIME-complete, in contrast to the PSPACE-completeness of QBF solving, characterizing DQBF subclasses of lower computational complexity allows their effective solving and is of practical importance. Recently a DQBF proof calculus based on a notion of fork extension, in addition to resolution and universal reduction, was proposed by Rabe in 2017. We show that this calculus is in fact incomplete for general DQBFs, but complete for a subclass of DQBFs where any two existential variables have either identical or disjoint dependency sets over the universal variables. We further characterize this DQBF subclass to be Σ3P-complete in the polynomial-time hierarchy. Essentially using fork extension, a DQBF in this subclass can be converted to an equisatisfiable 3QBF with only a linear increase in formula size. We exploit this conversion for effective solving of this DQBF subclass and point out its potential as a general strategy for DQBF quantifier localization. Experimental results show that the method outperforms state-of-the-art DQBF solvers on a number of benchmarks, including the 2018 DQBF evaluation benchmarks.
【Keywords】:
【Paper Link】 【Pages】:1592-1599
【Authors】: Mate Soos ; Kuldeep S. Meel
【Abstract】: Given a Boolean formula φ, the problem of model counting, also referred to as #SAT, is to compute the number of solutions of φ. Model counting is a fundamental problem in artificial intelligence with a wide range of applications, including probabilistic reasoning, decision making under uncertainty, quantified information flow, and the like. Motivated by the success of SAT solvers, there has been a surge of interest in the design of hashing-based techniques for approximate model counting over the past decade. We profiled the state-of-the-art approximate model counter ApproxMC2 and observed that over 99.99% of its time is consumed by the underlying SAT solver, CryptoMiniSat. This observation motivated us to ask: Can we design an efficient underlying CNF-XOR SAT solver that can take advantage of the structure of hashing-based algorithms, and would this lead to an efficient approximate model counter? The primary contribution of this paper is an affirmative answer to the above question. We present a novel architecture, called BIRD, to handle CNF-XOR formulas arising from hashing-based techniques. The resulting hashing-based approximate model counter, called ApproxMC3, employs the BIRD framework in its underlying SAT solver, CryptoMiniSat. To the best of our knowledge, we conducted the most comprehensive performance evaluation of counting algorithms, involving 1896 benchmarks with computational effort totaling 86400 computational hours. Our experimental evaluation demonstrates significant runtime performance improvement for ApproxMC3 over ApproxMC2. In particular, ApproxMC3 solves 648 more benchmarks than ApproxMC2, the state-of-the-art approximate model counter, and for all the formulas where both ApproxMC2 and ApproxMC3 did not time out and took more than 1 second, the mean speedup is 284.40 – more than two orders of magnitude. Erratum: This research is supported in part by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: [AISG-RP-2018-005])
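The hashing idea behind such counters can be illustrated in miniature: conjoin m random XOR (parity) constraints to cut the solution space by roughly 2^m, count the survivors, and scale back up. The brute-force toy below conveys the principle only; ApproxMC3 does the counting inside the BIRD-enabled CNF-XOR solver rather than by enumeration, and chooses m and the number of repetitions far more carefully:

```python
import itertools, random

def count_with_xor_hash(clauses, n_vars, m, trials=31, seed=0):
    """Toy hashing-based approximate model counting.

    clauses: CNF as lists of signed DIMACS-style literals. Each trial adds
    m random XOR (parity) constraints, counts surviving models by brute
    force, and scales by 2^m; the median over trials is returned.
    """
    rng = random.Random(seed)

    def sat(assign):
        return all(any(assign[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)

    estimates = []
    for _ in range(trials):
        xors = [(rng.sample(range(n_vars), rng.randint(1, n_vars)),
                 rng.randint(0, 1)) for _ in range(m)]
        cnt = sum(
            1 for assign in itertools.product([False, True], repeat=n_vars)
            if sat(assign)
            and all(sum(assign[v] for v in vs) % 2 == b for vs, b in xors))
        estimates.append(cnt * 2 ** m)
    return sorted(estimates)[len(estimates) // 2]

# (x1 or x2) and (not x1 or x3) has 4 models over 3 variables.
print(count_with_xor_hash([[1, 2], [-1, 3]], n_vars=3, m=1))
```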
【Keywords】:
【Paper Link】 【Pages】:1600-1607
【Authors】: Sabine Storandt ; Stefan Funke
【Abstract】: In this paper, we study a problem from the realm of multicriteria decision making in which the goal is to select from a given set S of d-dimensional objects a minimum-sized subset S0 with bounded regret. Here, regret measures the unhappiness of users who would like to select their favorite object from the set S but now can only select their favorite object from the subset S0. Previous work focused on bounding the maximum regret, which is determined by the most unhappy user. We propose to consider instead the average regret, which is determined by the sum of the (un)happiness of all possible users. We show that this regret measure comes with desirable properties such as supermodularity, which allows us to construct approximation algorithms. Furthermore, we introduce the regret minimizing permutation problem and discuss extensions of our algorithms to the recently proposed k-regret measure. Our theoretical results are accompanied by experiments on a variety of inputs with d up to 7.
【Keywords】:
【Paper Link】 【Pages】:1608-1616
【Authors】: Miguel Terra-Neves ; Nuno Machado ; Inês Lynce ; Vasco M. Manquinho
【Abstract】: Current Maximum Satisfiability (MaxSAT) algorithms based on successive calls to a powerful Satisfiability (SAT) solver are now able to solve real-world instances in many application domains. Moreover, replacing the SAT solver with a Satisfiability Modulo Theories (SMT) solver enables effective MaxSMT algorithms. However, MaxSMT has seldom been used in debugging multi-threaded software. Multi-threaded programs are usually non-deterministic due to the huge number of possible thread operation schedules, which makes them much harder to debug than sequential programs. A recent approach to isolate the root cause of concurrency bugs in multi-threaded software is to produce a report that shows the differences between a failing and a non-failing execution. However, since they rely solely on heuristics, these reports can be unnecessarily large. Hence, reports may contain operations that are not relevant to the bug's occurrence. This paper proposes the use of MaxSMT for the generation of minimal reports for multi-threaded software with concurrency bugs. The proposed techniques report situations that the existing techniques are not able to identify. Experimental results show that using MaxSMT can significantly improve the accuracy of the generated reports and, consequently, their usefulness in debugging the root cause of concurrency bugs.
【Keywords】:
【Paper Link】 【Pages】:1617-1624
【Authors】: Pratibha Vellanki ; Santu Rana ; Sunil Gupta ; David Rubin de Celis Leal ; Alessandra Sutti ; Murray Height ; Svetha Venkatesh
【Abstract】: Real-world experiments are expensive, and thus it is important to reach a target in a minimum number of experiments. Experimental processes often involve control variables that change over time. Such problems can be formulated as a functional optimisation problem. We develop a novel Bayesian optimisation framework for such functional optimisation of expensive black-box processes. We represent the control function using the Bernstein polynomial basis and optimise in the coefficient space. We derive the theory and practice required to dynamically adjust the polynomial degree, and show how prior information about shape can be integrated. We demonstrate the effectiveness of our approach for short polymer fibre design and for optimising learning rate schedules for deep networks.
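Representing the control function in the Bernstein basis keeps the search space low-dimensional and shape-friendly. A small sketch of evaluating such a control function from its coefficients (the degree and coefficient values are illustrative; in the paper's framework the coefficients are what the Bayesian optimiser proposes):

```python
import numpy as np
from math import comb

def bernstein_control(coeffs, t):
    """Control function u(t) on [0, 1] in the Bernstein basis.

    coeffs: the d+1 coefficients being optimised; the polynomial degree d
    is their count minus one and can itself be adjusted dynamically.
    Bernstein coefficients also bound the function's range, which makes
    shape priors easy to encode.
    """
    d = len(coeffs) - 1
    t = np.asarray(t, dtype=float)
    basis = np.stack([comb(d, k) * t**k * (1 - t)**(d - k)
                      for k in range(d + 1)])
    return np.asarray(coeffs) @ basis

ts = np.linspace(0, 1, 5)
print(bernstein_control([0.1, 0.8, 0.3], ts))   # a quadratic schedule
```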
【Keywords】:
【Paper Link】 【Pages】:1625-1632
【Authors】: Sicco Verwer ; Yingqian Zhang
【Abstract】: We provide a new formulation for the problem of learning the optimal classification tree of a given depth as a binary linear program. A limitation of previously proposed Mathematical Optimization formulations is that they create constraints and variables for every row in the training data. As a result, the running time of the existing Integer Linear programming (ILP) formulations increases dramatically with the size of data. In our new binary formulation, we aim to circumvent this problem by making the formulation size largely independent from the training data size. We show experimentally that our formulation achieves better performance than existing formulations on both small and large problem instances within shorter running time.
【Keywords】:
【Paper Link】 【Pages】:1633-1640
【Authors】: Pengfei Wang ; Risheng Liu ; Nenggan Zheng ; Zhefeng Gong
【Abstract】: In machine learning research, many emerging applications can be (re)formulated as composition optimization problems with nonsmooth regularization penalties. To solve such problems, the traditional stochastic gradient descent (SGD) algorithm and its variants either have a low convergence rate or are computationally expensive. Recently, several stochastic composition gradient algorithms have been proposed; however, these methods are still inefficient and not scalable to large-scale composition optimization problem instances. To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation and variance reduction methods. We prove that the algorithm admits the fastest convergence rate for both strongly convex and general nonconvex cases. Furthermore, we analyze the query complexity of the proposed algorithm and prove that linear speedup is achievable when we increase the number of processors. Finally, we evaluate our algorithm Async-ProxSCVR on two representative composition optimization problems, including value function evaluation in reinforcement learning and the sparse mean-variance optimization problem. Experimental results show that the algorithm achieves significant speedups and is much faster than existing compared methods.
【Keywords】:
【Paper Link】 【Pages】:1641-1649
【Authors】: Po-Wei Wang ; J. Zico Kolter
【Abstract】: This paper proposes a new algorithm for solving MAX2SAT problems based on combining search methods with semidefinite programming approaches. Semidefinite programming techniques are well known as a theoretical tool for approximating maximum satisfiability problems, but their application has traditionally been very limited by their speed and randomized nature. Our approach overcomes this difficulty by using a recent approach to low-rank semidefinite programming, specialized to work in an incremental fashion suitable for use in an exact search algorithm. The method can be used within both complete and incomplete solvers, and we demonstrate it on a variety of problems from recent competitions. Our experiments show that the approach is faster (sometimes by orders of magnitude) than existing state-of-the-art complete and incomplete solvers, representing a substantial advance in search methods specialized for MAX2SAT problems.
【Keywords】:
【Paper Link】 【Pages】:1650-1657
【Authors】: Yiyang Wang ; Risheng Liu ; Long Ma ; Xiaoliang Song
【Abstract】: In this paper, we propose a realizable framework, TECU, which embeds task-specific strategies into the update schemes of coordinate descent, for optimizing multivariate non-convex problems with coupled objective functions. On the one hand, TECU can improve algorithmic efficiency by embedding productive numerical algorithms for optimizing univariate sub-problems with nice properties. On the other hand, it increases the likelihood of obtaining desired results by embedding advanced techniques in the optimization of realistic tasks. Integrating both numerical algorithms and advanced techniques, TECU is proposed as a unified framework for solving a class of non-convex problems. Although the task-embedded strategies bring inaccuracies to the sub-problem optimizations, we provide a realizable criterion to control the errors while ensuring robust performance, supported by rigorous theoretical analyses. By respectively embedding ADMM and a residual-type CNN in our algorithmic framework, the experimental results verify both the efficiency and effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems.
【Keywords】:
【Paper Link】 【Pages】:1658-1665
【Authors】: Bryan Wilder ; Bistra Dilkina ; Milind Tambe
【Abstract】: Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decision-focused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model's utility in optimization, and our method's ability to specify the true goal as the model's training objective yields substantial dividends across a range of decision problems.
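To make the idea concrete, here is a deliberately small decision-focused training loop for a budgeted selection task: a linear model predicts item values, and the discrete top-k decision is relaxed to a temperature-controlled softmax so that the achieved (ground-truth) decision value is differentiable in the model parameters. This is our own minimal sketch of the relaxation idea, not the paper's LP or submodular instantiation:

```python
import torch

torch.manual_seed(0)
n_items, k = 20, 5
w_true = torch.randn(8)
feats = torch.randn(200, n_items, 8)
values = feats @ w_true                 # true (unobserved) item values

model = torch.nn.Linear(8, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(300):
    pred = model(feats).squeeze(-1)     # predicted item values
    # Continuous relaxation of top-k selection: a low-temperature softmax
    # spreads the budget k over items, keeping the decision differentiable.
    sel = (k * torch.softmax(pred / 0.1, dim=-1)).clamp(max=1.0)
    decision_value = (sel * values).sum(-1).mean()
    loss = -decision_value              # train on decision quality directly
    opt.zero_grad()
    loss.backward()
    opt.step()
```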
【Keywords】:
【Paper Link】 【Pages】:1666-1673
【Authors】: Jiacheng Wu ; Jian-Xun Wang ; Shawn C. Shadden
【Abstract】: Using observation data to estimate unknown parameters in computational models is broadly important. This task is often challenging because solutions are non-unique due to the complexity of the model and limited observation data. However, the parameters or states of the model are often known to satisfy additional constraints beyond the model itself. Thus, we propose an approach to improve parameter estimation in such inverse problems by incorporating constraints in a Bayesian inference framework. Constraints are imposed by constructing a likelihood function based on the fitness of the solution to the constraints. The posterior distribution of the parameters conditioned on (1) the observed data and (2) satisfaction of the constraints is obtained, and the estimate of the parameters is given by the maximum a posteriori estimate or the posterior mean. Both equality and inequality constraints can be considered by this framework, and the strictness of the constraints can be controlled by a constraint uncertainty denoting a confidence in its correctness. Furthermore, we extend this framework to an approximate Bayesian inference framework in terms of the ensemble Kalman filter method, where the constraint is imposed by re-weighting the ensemble members based on the likelihood function. A synthetic model is presented to demonstrate the effectiveness of the proposed method, and in both the exact Bayesian inference and ensemble Kalman filter scenarios, numerical simulations show that imposing constraints using the method presented improves identification of the true parameter solution among multiple local minima.
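The ensemble variant can be pictured as importance weighting: each ensemble member is scored by a constraint likelihood and the ensemble is resampled accordingly, with the likelihood width encoding confidence in the constraint. A toy sketch under our own choices of violation measure and Gaussian score (not the paper's exact scheme):

```python
import numpy as np

def constraint_reweight(ensemble, violation, sigma=0.1, rng=None):
    """Re-weight and resample ensemble members by a constraint likelihood.

    violation(m): how far member m is from satisfying the constraint
    (zero if satisfied); sigma: constraint uncertainty, i.e. how strictly
    the constraint is enforced. Members are scored by a Gaussian
    likelihood exp(-violation^2 / (2 sigma^2)) and resampled.
    """
    rng = rng or np.random.default_rng()
    v = np.array([violation(m) for m in ensemble])
    w = np.exp(-v ** 2 / (2 * sigma ** 2))
    w /= w.sum()
    idx = rng.choice(len(ensemble), size=len(ensemble), p=w)
    return [ensemble[i] for i in idx]

# Toy: parameters known to satisfy theta >= 0; negative members are
# down-weighted in proportion to their violation.
ens = list(np.random.randn(100))
ens2 = constraint_reweight(ens, lambda th: max(0.0, -th))
print(min(ens), min(ens2))
```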
【Keywords】:
【Paper Link】 【Pages】:1674-1681
【Authors】: Qiao Xiang ; Haitao Yu ; James Aspnes ; Franck Le ; Linghe Kong ; Yang Richard Yang
【Abstract】: Network resource reservation systems are being developed and deployed, driven by the demand for and substantial benefits of providing performance predictability for modern distributed applications. However, existing systems suffer limitations: they either are inefficient in finding the optimal resource reservation, or cause private information (e.g., from the network infrastructure) to be exposed (e.g., to the user). In this paper, we design BoxOpt, a novel system that leverages efficient oracle construction techniques in optimization and learning theory to automatically and swiftly learn the optimal resource reservations without exchanging any private information between the network and the user. We implement a prototype of BoxOpt and demonstrate its efficiency and efficacy via extensive experiments using a real network topology and traces. Results show that (1) BoxOpt has a 100% correctness ratio, and (2) for 95% of requests, BoxOpt learns the optimal resource reservation within 13 seconds.
【Keywords】:
【Paper Link】 【Pages】:1682-1689
【Authors】: Xiaoyong Yuan ; Zheng Feng ; Matthew Norton ; Xiaolin Li
【Abstract】: Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of Neural Network training compared to conventional BN. In general, we show that the mean and standard deviation are not always the most appropriate choices for the centering and scaling procedure within the BN transformation, particularly if ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and statistics for centering, choices which naturally arise from the theory of generalized deviation measures and risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more than with conventional BN, often with an improved error rate as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.
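A minimal forward pass conveys the shape of the GBN transformation: swap the mean for another centering statistic and the standard deviation for another deviation measure. The median/upper-semideviation pair below is one example of a risk-theoretic choice, used here purely for illustration and not asserted to be the paper's recommended pair:

```python
import torch

def generalized_bn(x, center="median", eps=1e-5):
    """Generalized Batch Normalization forward pass (illustrative sketch).

    x: (batch, features). Centers on the batch median (or mean) and
    scales by the upper semideviation, i.e. the root-mean-square of the
    positive part of the centered activations.
    """
    c = x.median(dim=0).values if center == "median" else x.mean(dim=0)
    shifted = x - c
    upper_dev = torch.sqrt((shifted.clamp(min=0) ** 2).mean(dim=0) + eps)
    return shifted / upper_dev

x = torch.randn(128, 16) ** 3        # heavy-tailed activations
print(generalized_bn(x).shape)       # torch.Size([128, 16])
```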
【Keywords】:
【Paper Link】 【Pages】:1691-1698
【Authors】: Raluca D. Gaina ; Simon M. Lucas ; Diego Pérez-Liébana
【Abstract】: One of the issues general AI game players are required to deal with is the different reward systems in the variety of games they are expected to be able to play at a high level. Some games may present plentiful rewards which the agents can use to guide their search for the best solution, whereas others feature sparse reward landscapes that provide little information to the agents. The work presented in this paper focuses on the latter case, which most agents struggle with. Thus, modifications are proposed for two algorithms, Monte Carlo Tree Search and Rolling Horizon Evolutionary Algorithms, aiming at improving performance in this type of game while maintaining the overall win rate across those where rewards are plentiful. Results show that longer rollouts and individual lengths, either fixed or responsive to changes in fitness landscape features, lead to a performance boost in the games during testing without being detrimental to non-sparse reward scenarios.
【Keywords】:
【Paper Link】 【Pages】:1699-1706
【Authors】: Jakub Kowalski ; Maksymilian Mika ; Jakub Sutowicz ; Marek Szykula
【Abstract】: We propose a new General Game Playing (GGP) language called Regular Boardgames (RBG), which is based on the theory of regular languages. The objective of RBG is to join key properties such as expressiveness, efficiency, and naturalness of description in one GGP formalism, compensating for certain drawbacks of the existing languages. This often makes RBG more suitable for various research and practical developments in GGP. While intended mostly for describing board games, RBG is universal for the class of all finite deterministic turn-based games with perfect information. We establish the foundations of RBG, and analyze it theoretically and experimentally, focusing on the efficiency of reasoning. Regular Boardgames is the first GGP language that allows efficient encoding and playing of games with complex rules and large branching factors (e.g. amazons, arimaa, large chess variants, go, international checkers, paper soccer).
【Keywords】:
【Paper Link】 【Pages】:1707-1714
【Authors】: Yining Lang ; Wei Liang ; Yujia Wang ; Lap-Fai Yu
【Abstract】: Synthesizing 3D faces that give certain personality impressions is commonly needed in computer games, animations, and virtual world applications for producing realistic virtual characters. In this paper, we propose a novel approach to synthesize 3D faces based on personality impression for creating virtual characters. Our approach consists of two major steps. In the first step, we train classifiers, using deep convolutional neural networks on a dataset of images with personality impression annotations, that are capable of predicting the personality impression of a face. In the second step, given a 3D face and a desired personality impression type as user inputs, our approach optimizes the facial details against the trained classifiers, so as to synthesize a face which gives the desired personality impression. We demonstrate our approach for synthesizing 3D faces giving desired personality impressions on a variety of 3D face models. Perceptual studies show that the perceived personality impressions of the synthesized faces agree with the target personality impressions specified for synthesizing the faces.
【Keywords】:
【Paper Link】 【Pages】:1715-1722
【Authors】: Juntao Li ; Lidong Bing ; Lisong Qiu ; Dongmin Chen ; Dongyan Zhao ; Rui Yan
【Abstract】: Automatic story generation is a challenging task, which involves automatically composing a sequence of sentences or words with a consistent topic and novel wordings. Although much attention has been paid to this task and promising progress has been made, there still exists a noticeable gap between generated stories and those created by humans, especially in terms of thematic consistency and wording novelty. To fill this gap, we propose a cache-augmented conditional variational autoencoder for story generation, where the cache module helps improve thematic consistency while the conditional variational autoencoder part is used for generating stories with less common words by using a continuous latent variable. For combining the cache module and the autoencoder part, we further introduce an effective gate mechanism. Experimental results on ROCStories and WritingPrompts indicate that our proposed model can generate stories with thematic consistency and wording novelty, and outperforms existing models under both automatic metrics and human evaluations.
【Keywords】:
【Paper Link】 【Pages】:1724-1731
【Authors】: Colleen Alkalay-Houlihan ; Nisarg Shah
【Abstract】: Bitcoin, a cryptocurrency built on the blockchain data structure, has generated significant academic and commercial interest. Contrary to prior expectations, recent research has shown that participants of the protocol (the so-called “miners”) are not always incentivized to follow the protocol. We study the game induced by one such attack – the pool block withholding attack – in which mining pools (groups of miners) attack other mining pools. We focus on the case of two pools attacking each other, with potentially other mining power in the system. We show that this game always admits a pure Nash equilibrium, and its pure price of anarchy, which intuitively measures how much computational power can be wasted due to attacks in an equilibrium, is at most 3. We conjecture, and prove in special cases, that it is in fact at most 2. Our simulations provide compelling evidence for this conjecture, and show that players can quickly converge to the equilibrium by following best response strategies.
【Keywords】:
【Paper Link】 【Pages】:1732-1739
【Authors】: Eshwar Ram Arunachaleswaran ; Siddharth Barman ; Nidhi Rathi
【Abstract】: We study classic fair-division problems in a partial information setting. This paper respectively addresses fair division of rent, cake, and indivisible goods among agents with cardinal preferences. We will show that, for all of these settings and under appropriate valuations, a fair (or an approximately fair) division among n agents can be efficiently computed using only the valuations of n − 1 agents. The nth (secretive) agent can make an arbitrary selection after the division has been proposed and, irrespective of her choice, the computed division will admit an overall fair allocation. For the rent-division setting we prove that well-behaved utilities of n − 1 agents suffice to find a rent division among n rooms such that, for every possible room selection of the secretive agent, there exists an allocation (of the remaining n − 1 rooms among the n − 1 agents) which ensures overall envy-freeness (fairness). We complement this existential result by developing a polynomial-time algorithm for the case of quasilinear utilities. In this partial information setting, we also develop efficient algorithms to compute allocations that are envy-free up to one good (EF1) and ε-approximate envy-free. These two notions of fairness are applicable in the context of indivisible goods and divisible goods (cake cutting), respectively. One of the main technical contributions of this paper is the development of novel connections between different fair-division paradigms, e.g., we use our existential results for envy-free rent-division to develop an efficient EF1 algorithm.
【Keywords】:
【Paper Link】 【Pages】:1740-1747
【Authors】: Haris Aziz ; Péter Biró ; Ronald de Haan ; Baharak Rastegari
【Abstract】: The assignment problem is one of the most well-studied settings in multi-agent resource allocation. Aziz, de Haan, and Rastegari (2017) considered this problem with the additional feature that agents’ preferences involve uncertainty. In particular, they considered two uncertainty models neither of which is necessarily compact. In this paper, we focus on three uncertain preferences models whose size is polynomial in the number of agents and items. We consider several interesting computational questions with regard to Pareto optimal assignments. We also present some general characterization and algorithmic results that apply to large classes of uncertainty models.
【Keywords】:
【Paper Link】 【Pages】:1748-1755
【Authors】: Siddharth Barman ; Sanath Kumar Krishnamurthy
【Abstract】: We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with additive valuations (Eisenberg and Gale 1959; Orlin 2010), such equilibria, in general, assign goods fractionally to agents. Hence, Fisher markets are not directly applicable in the context of indivisible goods. In this work we show that one can always bypass this hurdle and, up to a bounded change in agents’ budgets, obtain markets that admit an integral equilibrium. We refer to such markets as pure markets and show that, for any given Fisher market (with additive valuations), one can efficiently compute a “near-by” pure market with an accompanying integral equilibrium. Our work on pure markets leads to novel algorithmic results for fair division of indivisible goods. Prior work in discrete fair division has shown that, under additive valuations, there always exist allocations that simultaneously achieve the seemingly incompatible properties of fairness and efficiency (Caragiannis et al. 2016); here fairness refers to envy-freeness up to one good (EF1) and efficiency corresponds to Pareto efficiency. However, polynomial-time algorithms are not known for finding such allocations. Considering relaxations of proportionality and EF1, respectively, as our notions of fairness, we show that fair and Pareto efficient allocations can be computed in strongly polynomial time.
【Keywords】:
【Paper Link】 【Pages】:1756-1763
【Authors】: Nathanaël Barrot ; Kazunori Ota ; Yuko Sakurai ; Makoto Yokoo
【Abstract】: We study hedonic games under friends appreciation, where each agent considers other agents friends, enemies, or unknown agents. Although existing work assumed that unknown agents have no impact on an agent’s preference, it may be that her preference depends on the number of unknown agents in her coalition. We extend the existing preference, friends appreciation, by proposing two alternative attitudes toward unknown agents, extraversion and introversion, depending on whether unknown agents have a slightly positive or negative impact on preference. When each agent prefers coalitions with more unknown agents, we show that both core stable outcomes and individually stable outcomes may not exist. We also prove that deciding the existence of the core and the existence of an individually stable coalition structure are respectively NP^NP-complete and NP-complete.
【Keywords】:
【Paper Link】 【Pages】:1764-1771
【Authors】: Dorothea Baumeister ; Tobias Hogrebe ; Lisa Rey
【Abstract】: The bribery problem in elections asks whether an external agent can make some distinguished candidate win or prevent her from winning, by bribing some of the voters. This problem was studied with respect to the weighted swap distance between two votes by Elkind et al. (2009). We generalize this definition by introducing a bound on the distance between the original and the bribed votes. The distance measures we consider include a restriction of the weighted swap distance and variants of the footrule distance, which capture some real-world models of influence an external agent may have on the voters. We study constructive and destructive variants of distance bribery for scoring rules and obtain polynomial-time algorithms as well as NP-hardness results. For the case of element-weighted swap and element-weighted footrule distances, we give a complete dichotomy result for the class of pure scoring rules.
【Keywords】:
【Paper Link】 【Pages】:1772-1779
【Authors】: Omer Ben-Porat ; Gregory Goren ; Itay Rosenberg ; Moshe Tennenholtz
【Abstract】: Recommendation systems are extremely popular tools for matching users and contents. However, when content providers are strategic, the basic principle of matching users to the closest content, where both users and contents are modeled as points in some semantic space, may yield low social welfare. This is due to the fact that content providers are strategic and optimize their offered content to be recommended to as many users as possible. Motivated by modern applications, we propose the widely studied framework of facility location games to study recommendation systems with strategic content providers. Our conceptual contribution is the introduction of a mediator to facility location models, in the pursuit of better social welfare. We aim at designing mediators that a) induce a game with high social welfare in equilibrium, and b) intervene as little as possible. In service of the latter, we introduce the notion of intervention cost, which quantifies how much damage a mediator may cause to the social welfare when an off-equilibrium profile is adopted. As a case study in high-welfare low-intervention mediator design, we consider the one-dimensional segment as the user domain. We propose a mediator that implements the socially optimal strategy profile as the unique equilibrium profile, and show a tight bound on its intervention cost. Ultimately, we consider some extensions, and highlight open questions for the general agenda.
【Keywords】:
【Paper Link】 【Pages】:1780-1787
【Authors】: Omer Ben-Porat ; Itay Rosenberg ; Moshe Tennenholtz
【Abstract】: We consider a game-theoretic model of information retrieval with strategic authors. We examine two different utility schemes: authors who aim at maximizing exposure and authors who want to maximize active selection of their content (i.e., the number of clicks). We introduce the study of author learning dynamics in such contexts. We prove that under the probability ranking principle (PRP), which forms the basis of the current state-of-the-art ranking methods, any better-response learning dynamics converges to a pure Nash equilibrium. We also show that other ranking methods induce a strategic environment under which such a convergence may not occur.
【Keywords】:
【Paper Link】 【Pages】:1788-1795
【Authors】: Gerdus Benadè ; Ariel D. Procaccia ; Mingda Qiao
【Abstract】: Work on implicit utilitarian voting advocates the design of preference aggregation methods that maximize utilitarian social welfare with respect to latent utility functions, based only on observed rankings of the alternatives. This approach has been successfully deployed in order to help people choose a single alternative or a subset of alternatives, but it has previously been unclear how to apply the same approach to the design of social welfare functions, where the desired output is a ranking. We propose to address this problem by assuming that voters’ utilities for rankings are induced by unknown weights and unknown utility functions, which, moreover, have a combinatorial (subadditive) structure. Despite the extreme lack of information about voters’ preferences, we show that it is possible to choose rankings such that the worst-case gap between their social welfare and that of the optimal ranking, called distortion, is no larger (up to polylogarithmic factors) than the distortion associated with much simpler problems. Through experiments, we identify practical methods that achieve near-optimal social welfare on average.
【Keywords】:
【Paper Link】 【Pages】:1796-1803
【Authors】: Daan Bloembergen ; Davide Grossi ; Martin Lackner
【Abstract】: Liquid democracy is a proxy voting method where proxies are delegable. We propose and study a game-theoretic model of liquid democracy to address the following question: when is it rational for a voter to delegate her vote? We study the existence of pure-strategy Nash equilibria in this model, and how group accuracy is affected by them. We complement these theoretical results by means of agent-based simulations to study the effects of delegations on the group’s accuracy on variously structured social networks.
【Keywords】:
【Paper Link】 【Pages】:1804-1811
【Authors】: Allan Borodin ; Omer Lev ; Nisarg Shah ; Tyrone Strangway
【Abstract】: Much of the social choice literature examines direct voting systems, in which voters submit their ranked preferences over candidates and a voting rule picks a winner. Real-world elections and decision-making processes are often more complex and involve multiple stages. For instance, one popular voting system filters candidates through primaries: first, voters affiliated with each political party vote over candidates of their own party and the voting rule picks a candidate from each party, which then compete in a general election. We present a model to analyze such multi-stage elections, and conduct the first quantitative comparison (to the best of our knowledge) of the direct and primary voting systems with two political parties in terms of the quality of the elected candidate. Our main result is that every voting rule is guaranteed to perform almost as well (i.e., within a constant factor) under the primary system as under the direct system. Surprisingly, the converse does not hold: we show settings in which there exist voting rules that perform significantly better under the primary system than under the direct system.
【Keywords】:
【Paper Link】 【Pages】:1812-1819
【Authors】: Simina Brânzei ; Aris Filos-Ratsikas
【Abstract】: In a multi-unit market, a seller brings multiple units of a good and tries to sell them to a set of buyers that have monetary endowments. While a Walrasian equilibrium does not always exist in this model, natural relaxations of the concept that retain its desirable fairness properties do exist. We study the dynamics of (Walrasian) envy-free pricing mechanisms in this environment, showing that for any such pricing mechanism, the best response dynamic starting from truth-telling converges to a pure Nash equilibrium with small loss in revenue and welfare. Moreover, we generalize these bounds to capture all the (reasonable) Nash equilibria for a large class of (monotone) pricing mechanisms. We also identify a natural mechanism, which selects the minimum Walrasian envy-free price, in which for n=2 buyers the best response dynamic converges from any starting profile. We conjecture convergence of the mechanism for any number of buyers and provide simulation results to support our conjecture.
【Keywords】:
【Paper Link】 【Pages】:1820-1828
【Authors】: Gianluca Brero ; Sébastien Lahaie ; Sven Seuken
【Abstract】: Iterative combinatorial auctions (CAs) are often used in multi-billion-dollar domains like spectrum auctions, and speed of convergence is one of the crucial factors behind the choice of a specific design for practical applications. To achieve fast convergence, current CAs require careful tuning of the price update rule to balance convergence speed and allocative efficiency. Brero and Lahaie (2018) recently introduced a Bayesian iterative auction design for settings with single-minded bidders. The Bayesian approach allowed them to incorporate prior knowledge into the price update algorithm, reducing the number of rounds to convergence with minimal parameter tuning. In this paper, we generalize their work to settings with no restrictions on bidder valuations. We introduce a new Bayesian CA design for this general setting which uses Monte Carlo Expectation Maximization to update prices at each round of the auction. We evaluate our approach via simulations on CATS instances. Our results show that our Bayesian CA outperforms even a highly optimized benchmark in terms of clearing percentage and convergence speed.
【Keywords】:
【Paper Link】 【Pages】:1829-1836
【Authors】: Noam Brown ; Tuomas Sandholm
【Abstract】: Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games. In this paper we introduce novel CFR variants that 1) discount regrets from earlier iterations in various ways (in some cases differently for positive and negative regrets), 2) reweight iterations in various ways to obtain the output strategies, 3) use a non-standard regret minimizer and/or 4) leverage “optimistic regret matching”. They lead to dramatically improved performance in many settings. For one, we introduce a variant that outperforms CFR+, the prior state-of-the-art algorithm, in every game tested, including large-scale realistic settings. CFR+ is a formidable benchmark: no other algorithm has been able to outperform it. Finally, we show that, unlike CFR+, many of the important new variants are compatible with modern imperfect-information game pruning techniques and one is also compatible with sampling in the game tree.
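A minimal sketch of the discounting idea described above, assuming the standard regret-matching update at a single decision point. The weighting scheme and the parameter values below are illustrative stand-ins, not the paper's exact variants.

```python
import numpy as np

def regret_matching(regrets):
    # Standard regret matching: play actions in proportion to positive regret.
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def discounted_update(regrets, instant_regret, t, alpha=1.5, beta=0.5):
    # Discount accumulated regrets before adding this iteration's regret,
    # weighting positive and negative regrets differently (values illustrative).
    pos_w = t**alpha / (t**alpha + 1.0)
    neg_w = t**beta / (t**beta + 1.0)
    return regrets * np.where(regrets > 0, pos_w, neg_w) + instant_regret
```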
【Keywords】:
【Paper Link】 【Pages】:1837-1844
【Authors】: Sofia Ceppi ; Ian A. Kash ; Rafael M. Frongillo
【Abstract】: Recent work shows that we can use partial verification instead of money to implement truthful mechanisms. In this paper we develop tools to answer the following question. Given an allocation rule that can be made truthful with payments, what is the minimal verification needed to make it truthful without them? Our techniques leverage the geometric relationship between the type space and the set of possible allocations.
【Keywords】:
【Paper Link】 【Pages】:1845-1852
【Authors】: Yiling Chen ; Yang Liu ; Juntao Wang
【Abstract】: Wagering mechanisms are one-shot betting mechanisms that elicit agents’ predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown the incompatibility of several desirable theoretical properties. In particular, Pareto optimality (no profitable side bet before allocation) cannot be achieved together with weak incentive compatibility, weak budget balance and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) one simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) another family of randomized wagering mechanisms, named surrogate wagering mechanisms, which are robust to noisy ground truth. Surrogate wagering mechanisms are inspired by an idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad set of randomized wagering mechanisms satisfy all desirable theoretical properties.
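One way to picture a "lottery-type" randomization of a deterministic wagering rule (an illustration under our own assumptions, not necessarily the paper's exact construction): take the payouts the deterministic rule would make and award the entire pot to a single agent drawn with probability proportional to her payout, which preserves every agent's expected payment whenever the payouts exactly exhaust the pot.

```python
import random

def lottery_implementation(payouts):
    # Illustrative randomization: assuming the deterministic payouts exactly
    # exhaust the pot, giving the whole pot to one agent drawn in proportion
    # to her payout leaves every agent's expected payment unchanged.
    pot = sum(payouts.values())
    agents = list(payouts)
    winner = random.choices(agents, weights=[payouts[a] for a in agents])[0]
    return {a: (pot if a == winner else 0.0) for a in agents}

print(lottery_implementation({"alice": 3.0, "bob": 1.0}))  # pot of 4 to one agent
```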
【Keywords】:
【Paper Link】 【Pages】:1853-1860
【Authors】: Vincent Conitzer ; Rupert Freeman ; Nisarg Shah ; Jennifer Wortman Vaughan
【Abstract】: We consider the problem of fairly dividing a collection of indivisible goods among a set of players. Much of the existing literature on fair division focuses on notions of individual fairness. For instance, envy-freeness requires that no player prefer the set of goods allocated to another player to her own allocation. We observe that an algorithm satisfying such individual fairness notions can still treat groups of players unfairly, with one group desiring the goods allocated to another. Our main contribution is a notion of group fairness, which implies most existing notions of individual fairness. Group fairness (like individual fairness) cannot be satisfied exactly with indivisible goods. Thus, we introduce two “up to one good” style relaxations. We show that, somewhat surprisingly, certain local optima of the Nash welfare function satisfy both relaxations and can be computed in pseudo-polynomial time by local search. Our experiments reveal faster computation and stronger fairness guarantees in practice.
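The local-search idea above can be pictured roughly as follows, a toy hill-climb on the Nash welfare objective (the product of agents' utilities) with additive values. The paper's guarantees concern specific local optima; this generic dynamic is only for intuition and assumes at least as many goods as agents.

```python
import random
from math import prod

def nash_welfare(alloc, values):
    return prod(sum(values[i][g] for g in bundle) for i, bundle in enumerate(alloc))

def local_search(values):
    # Toy hill-climb: move one good at a time while the product of utilities
    # improves. Assumes at least as many goods as agents.
    n, m = len(values), len(values[0])
    alloc = [set(range(i, m, n)) for i in range(n)]  # round-robin start
    best, improved = nash_welfare(alloc, values), True
    while improved:
        improved = False
        for g in range(m):
            src = next(i for i, b in enumerate(alloc) if g in b)
            for dst in range(n):
                if dst == src:
                    continue
                alloc[src].remove(g); alloc[dst].add(g)
                value = nash_welfare(alloc, values)
                if value > best:
                    best, improved = value, True
                    break
                alloc[dst].remove(g); alloc[src].add(g)  # undo the move
    return alloc

values = [[random.randint(1, 10) for _ in range(6)] for _ in range(3)]
print(local_search(values))
```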
【Keywords】:
【Paper Link】 【Pages】:1861-1868
【Authors】: Trevor Davis ; Kevin Waugh ; Michael Bowling
【Abstract】: Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
【Keywords】:
【Paper Link】 【Pages】:1869-1876
【Authors】: Ilias Diakonikolas ; Chrystalla Pavlou
【Abstract】: Weighted voting games are a family of cooperative games, typically used to model voting situations where a number of agents (players) vote against or for a proposal. In such games, a proposal is accepted if an appropriately weighted sum of the votes exceeds a prespecified threshold. As the influence of a player over the voting outcome is not in general proportional to her assigned weight, various power indices have been proposed to measure each player’s influence. The inverse power index problem is the problem of designing a weighted voting game that achieves a set of target influences according to a predefined power index. In this work, we study the computational complexity of the inverse problem when the power index belongs to the class of semivalues. We prove that the inverse problem is computationally intractable for a broad family of semivalues, including all regular semivalues. As a special case of our general result, we establish computational hardness of the inverse problem for the Banzhaf indices and the Shapley values, arguably the most popular power indices.
【Keywords】:
【Paper Link】 【Pages】:1877-1884
【Authors】: John P. Dickerson ; Karthik Abinav Sankararaman ; Aravind Srinivasan ; Pan Xu
【Abstract】: In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while the vertices on the other side arrive online. When a vertex arrives, an irrevocable and immediate decision must be made by the algorithm: either match it to an available vertex or drop it. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance—modeled via total weight—of the matching. However, in many real-world problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function)—as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on real-world and synthetic datasets to validate our algorithms.
【Keywords】:
【Paper Link】 【Pages】:1885-1892
【Authors】: Hossein Esfandiari ; Mohammad Taghi Hajiaghayi ; Brendan Lucier ; Michael Mitzenmacher
【Abstract】: We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
【Keywords】:
【Paper Link】 【Pages】:1893-1900
【Authors】: Brandon Fain ; Ashish Goel ; Kamesh Munagala ; Nina Prabhu
【Abstract】: We study social choice mechanisms in an implicit utilitarian framework with a metric constraint, where the goal is to minimize Distortion, the worst case social cost of an ordinal mechanism relative to underlying cardinal utilities. We consider two additional desiderata: Constant sample complexity and Squared Distortion. Constant sample complexity means that the mechanism (potentially randomized) only uses a constant number of ordinal queries regardless of the number of voters and alternatives. Squared Distortion is a measure of variance of the Distortion of a randomized mechanism. Our primary contribution is the first social choice mechanism with constant sample complexity and constant Squared Distortion (which also implies constant Distortion). We call the mechanism Random Referee, because it uses a random agent to compare two alternatives that are the favorites of two other random agents. We prove that the use of a comparison query is necessary: no mechanism that only elicits the top-k preferred alternatives of voters (for constant k) can have Squared Distortion that is sublinear in the number of alternatives. We also prove that unlike any top-k only mechanism, the Distortion of Random Referee meaningfully improves on benign metric spaces, using the Euclidean plane as a canonical example. Finally, among top-1 only mechanisms, we introduce Random Oligarchy. The mechanism asks just 3 queries and is essentially optimal among the class of such mechanisms with respect to Distortion. In summary, we demonstrate the surprising power of constant sample complexity mechanisms generally, and just three random voters in particular, to provide some of the best known results in the implicit utilitarian framework.
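The Random Referee mechanism as described above is simple enough to sketch directly. The oracles `top` and `prefer` below stand in for the mechanism's three ordinal queries, and the one-dimensional metric instance is only a toy example of our own.

```python
import random

def random_referee(voters, top, prefer):
    # Two random agents name their favorite alternatives; a third random
    # agent (the referee) picks between the two.
    a_voter, b_voter, referee = random.sample(voters, 3)
    a, b = top(a_voter), top(b_voter)
    return a if a == b or prefer(referee, a, b) else b

# toy metric instance: voters and alternatives as points on a line
positions = {v: random.random() for v in range(100)}
alt_pos = {"w": 0.1, "x": 0.4, "y": 0.6, "z": 0.9}
top = lambda v: min(alt_pos, key=lambda a: abs(positions[v] - alt_pos[a]))
prefer = lambda v, a, b: abs(positions[v] - alt_pos[a]) <= abs(positions[v] - alt_pos[b])
print(random_referee(list(positions), top, prefer))
```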
【Keywords】:
【Paper Link】 【Pages】:1901-1908
【Authors】: Piotr Faliszewski ; Pasin Manurangsi ; Krzysztof Sornat
【Abstract】: In the SHIFT-BRIBERY problem we are given an election, a preferred candidate, and the costs of shifting this preferred candidate up the voters’ preference orders. The goal is to find such a set of shifts that ensures that the preferred candidate wins the election. We give the first polynomial-time approximation scheme for the case of positional scoring rules, and for the Copeland rule we show strong inapproximability results.
【Keywords】:
【Paper Link】 【Pages】:1909-1916
【Authors】: Piotr Faliszewski ; Piotr Skowron ; Arkadii Slinko ; Stanislaw Szufa ; Nimrod Talmon
【Abstract】: We introduce the ELECTION ISOMORPHISM problem and a family of its approximate variants, which we refer to as d-ISOMORPHISM DISTANCE (d-ID) problems (where d is a metric between preference orders). We show that ELECTION ISOMORPHISM is polynomial-time solvable, and that the d-ISOMORPHISM DISTANCE problems generalize various classic rank-aggregation methods (e.g., those of Kemeny and Litvak). We establish the complexity of our problems (including their inapproximability) and provide initial experiments regarding the ability to solve them in practice.
【Keywords】:
【Paper Link】 【Pages】:1917-1925
【Authors】: Gabriele Farina ; Christian Kroer ; Tuomas Sandholm
【Abstract】: Regret minimization is a powerful tool for solving large-scale extensive-form games. State-of-the-art methods rely on minimizing regret locally at each decision point. In this work we derive a new framework for regret minimization on sequential decision problems and extensive-form games with general compact convex sets at each decision point and general convex losses, as opposed to prior work which has been for simplex decision points and linear losses. We call our framework laminar regret decomposition. It generalizes the CFR algorithm to this more general setting. Furthermore, our framework enables a new proof of CFR even in the known setting, which is derived from a perspective of decomposing polytope regret, thereby leading to an arguably simpler interpretation of the algorithm. Our generalization to convex compact sets and convex losses allows us to develop new algorithms for several problems: regularized sequential decision making, regularized Nash equilibria in zero-sum extensive-form games, and computing approximate extensive-form perfect equilibria. Our generalization also leads to the first regret-minimization algorithm for computing reduced-normal-form quantal response equilibria based on minimizing local regrets. Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games. Our algorithms for (quadratically) regularized equilibrium finding are orders of magnitude faster than the fastest algorithms for Nash equilibrium finding; this suggests regret-minimization algorithms based on decreasing regularization for Nash equilibrium finding as future work. Finally we show that our framework enables a new kind of scalable opponent exploitation approach.
【Keywords】:
【Paper Link】 【Pages】:1926-1932
【Authors】: Michail Fasoulakis ; Evangelos Markakis
【Abstract】: We focus on the problem of computing approximate Nash equilibria in bimatrix games. In particular, we consider the notion of approximate well-supported equilibria, which is one of the standard approaches for approximating equilibria. It is already known that one can compute an ε-well-supported Nash equilibrium in time n^(O(log n/ε^2)), for any ε > 0, in games with n pure strategies per player. Such a running time is referred to as quasi-polynomial. Regarding faster algorithms, it has remained an open problem for many years if we can have better running times for small values of the approximation parameter, and it is only known that we can compute in polynomial-time a 0.6528-well-supported Nash equilibrium. In this paper, we investigate this question further and propose a much better quasi-polynomial time algorithm that computes a (1/2 + ε)-well-supported Nash equilibrium in time n^(O(log log n^(1/ε)/ε^2)), for any ε > 0. Our algorithm is based on appropriately combining sampling arguments, support enumeration, and solutions to systems of linear inequalities.
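For context on the "sampling arguments" mentioned above, the standard step (going back to Lipton, Markakis, and Mehta) replaces a mixed strategy by the empirical distribution of k i.i.d. samples from it. The sketch below shows only this generic step, not the paper's full algorithm.

```python
import numpy as np

def k_uniform_approximation(strategy, k, rng=None):
    # Draw k pure strategies i.i.d. from a mixed strategy and return the
    # empirical distribution: a k-uniform strategy that is close to the
    # original with high probability when k is on the order of log(n)/eps^2.
    rng = rng or np.random.default_rng()
    n = len(strategy)
    draws = rng.choice(n, size=k, p=strategy)
    return np.bincount(draws, minlength=n) / k

print(k_uniform_approximation(np.array([0.5, 0.3, 0.2]), k=50))
```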
【Keywords】:
【Paper Link】 【Pages】:1933-1940
【Authors】: Zack Fitzsimmons ; Edith Hemaspaandra ; Alexander Hoover ; David E. Narváez
【Abstract】: It is important to understand how the outcome of an election can be modified by an agent with control over the structure of the election. Electoral control has been studied for many election systems, but for all these systems the winner problem is in P, and so control is in NP. There are election systems, such as Kemeny, that have many desirable properties, but whose winner problems are not in NP. Thus for such systems control is not in NP, and in fact we show that it is typically complete for Σ_2^p (i.e., NP^NP, the second level of the polynomial hierarchy). This is a very high level of complexity. Approaches that perform quite well for solving NP problems do not necessarily work for Σ_2^p-complete problems. However, answer set programming is suited to express problems in Σ_2^p, and we present an encoding for Kemeny control.
【Keywords】:
【Paper Link】 【Pages】:1941-1948
【Authors】: Till Fluschnik ; Piotr Skowron ; Mervin Triphaus ; Kai Wilker
【Abstract】: We study the following multiagent variant of the knapsack problem. We are given a set of items, a set of voters, and a value of the budget; each item is endowed with a cost and each voter assigns to each item a certain value. The goal is to select a subset of items with the total cost not exceeding the budget, in a way that is consistent with the voters’ preferences. Since the preferences of the voters over the items can vary significantly, we need a way of aggregating these preferences, in order to select the socially best valid knapsack. We study three approaches to aggregating voters’ preferences, which are motivated by the literature on multiwinner elections and fair allocation. This way we introduce the concepts of individually best, diverse, and fair knapsack. We study the computational complexity (including parameterized complexity, and complexity under restricted domains) of the aforementioned multiagent variants of knapsack.
【Keywords】:
【Paper Link】 【Pages】:1949-1956
【Authors】: Dimitris Fotakis ; Kyriakos Lotidis ; Chara Podimata
【Abstract】: We study incentive compatible mechanisms for Combinatorial Auctions where the bidders have submodular (or XOS) valuations and are budget-constrained. Our objective is to maximize the liquid welfare, a notion of efficiency for budget-constrained bidders introduced by Dobzinski and Paes Leme (2014). We show that some of the known truthful mechanisms that best-approximate the social welfare for Combinatorial Auctions with submodular bidders through demand query oracles can be adapted, so that they retain truthfulness and achieve asymptotically the same approximation guarantees for the liquid welfare. More specifically, for the problem of optimizing the liquid welfare in Combinatorial Auctions with submodular bidders, we obtain a universally truthful randomized O(log m)-approximate mechanism, where m is the number of items, by adapting the mechanism of Krysta and Vöcking (2012). Additionally, motivated by large market assumptions often used in mechanism design, we introduce a notion of competitive markets and show that in such markets, liquid welfare can be approximated within a constant factor by a randomized universally truthful mechanism. Finally, in the Bayesian setting, we obtain a truthful O(1)-approximate mechanism for the case where bidder valuations are generated as independent samples from a known distribution, by adapting the results of Feldman, Gravin and Lucier (2014).
【Keywords】:
【Paper Link】 【Pages】:1957-1964
【Authors】: Rupert Freeman ; David M. Pennock ; Jennifer Wortman Vaughan
【Abstract】: We draw a surprising and direct mathematical equivalence between the class of allocation mechanisms for divisible goods studied in the context of fair division and the class of weakly budget-balanced wagering mechanisms designed for eliciting probabilities. The equivalence rests on the intuition that wagering is an allocation of financial securities among bettors, with a bettor’s value for each security proportional to her belief about the likelihood of a future event. The equivalence leads to theoretical advances and new practical approaches for both fair division and wagering. Known wagering mechanisms based on proper scoring rules yield fair allocation mechanisms with desirable properties, including the first strictly incentive compatible fair-division mechanism. At the same time, allocation mechanisms make for novel wagering rules, including one that requires only ordinal uncertainty judgments and one that outperforms existing rules in a range of simulations.
【Keywords】:
【Paper Link】 【Pages】:1965-1972
【Authors】: Eric J. Friedman ; Vasilis Gkatzelis ; Christos-Alexandros Psomas ; Scott Shenker
【Abstract】: A cache memory unit needs to be shared among n strategic agents. Each agent has different preferences over the files to be brought into memory. The goal is to design a mechanism that elicits these preferences in a truthful manner and outputs a fair and efficient memory allocation. A trivially truthful and fair solution would isolate each agent to a 1/n fraction of the memory. However, this could be very inefficient if the agents have similar preferences and, thus, there is room for cooperation. On the other hand, if the agents are not isolated, unless the mechanism is carefully designed, they have incentives to misreport their preferences and free ride on the files that others bring into memory. In this paper we explore the power and limitations of truthful mechanisms in this setting. We demonstrate that mechanisms blocking agents from accessing parts of the memory can achieve improved efficiency guarantees, despite the inherent inefficiencies of blocking.
【Keywords】:
【Paper Link】 【Pages】:1973-1980
【Authors】: Matthias Gerstgrasser ; Paul W. Goldberg ; Bart de Keijzer ; Philip Lazos ; Alexander Skopalik
【Abstract】: We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: Valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular. Furthermore, we present some approximation results about the performance of certain such mechanisms, in terms of social welfare: For increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(e − 1)-approximation mechanism, matching the best known bounds for the single-item setting.
【Keywords】:
【Paper Link】 【Pages】:1981-1988
【Authors】: Mohammad Ghodsi ; Mohamad Latifian ; Masoud Seddighin
【Abstract】: In Spatial Voting Theory, distortion is a measure of how good the winner is. It is proved that no deterministic voting mechanism can guarantee a distortion better than 3, even for simple metrics such as a line. In this study, we wish to answer the following question: how does the distortion value change if we allow less motivated agents to abstain from the election? We consider an election with two candidates and suggest an abstention model, which is a more general form of the abstention model proposed by Kirchgässner (2003). We define the concepts of the expected winner and the expected distortion to evaluate the distortion of an election in our model. Our results fully characterize the distortion value and provide a rather complete picture of the model.
【Keywords】:
【Paper Link】 【Pages】:1989-1995
【Authors】: Gagan Goel ; Vahab S. Mirrokni ; Renato Paes Leme
【Abstract】: We consider auction settings in which agents have limited access to monetary resources but are able to make payments larger than their available resources by taking loans with a certain interest rate. This setting is a strict generalization of budget constrained utility functions (which corresponds to infinite interest rates). Our main result is an incentive compatible and Pareto-efficient auction for a divisible multi-unit setting with 2 players who are able to borrow money with the same interest rate. The auction is an ascending price clock auction that bears some similarities to the clinching auction but at the same time is a considerable departure from this framework: allocated goods can be de-allocated in the future and given to other agents, and prices for previously allocated goods can be raised.
【Keywords】:
【Paper Link】 【Pages】:1996-2003
【Authors】: Naman Goel ; Boi Faltings
【Abstract】: An important class of game-theoretic incentive mechanisms for eliciting effort from a crowd are the peer based mechanisms, in which workers are paid by matching their answers with one another. The other classic mechanism is to have the workers solve some gold standard tasks and pay them according to their accuracy on gold tasks. This mechanism ensures stronger incentive compatibility than the peer based mechanisms but assigning gold tasks to all workers becomes inefficient at large scale. We propose a novel mechanism that assigns gold tasks to only a few workers and exploits transitivity to derive accuracy of the rest of the workers from their peers’ accuracy. We show that the resulting mechanism ensures a dominant notion of incentive compatibility and fairness.
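The transitivity idea can be made concrete for binary tasks under an independence assumption (our illustration, not necessarily the paper's exact model): if worker i's accuracy a_i is measured on gold tasks and workers i and j agree on a fraction q of shared tasks, then q = a_i·a_j + (1 − a_i)(1 − a_j), which can be inverted to estimate a_j without assigning j any gold tasks.

```python
def peer_accuracy(a_i, agreement_rate):
    # Invert q = a_i*a_j + (1 - a_i)*(1 - a_j) for a_j; requires a_i != 0.5.
    return (agreement_rate + a_i - 1.0) / (2.0 * a_i - 1.0)

# a worker scoring 0.9 on gold tasks agrees with a peer 78% of the time:
print(peer_accuracy(0.9, 0.78))  # estimated peer accuracy: 0.85
```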
【Keywords】:
【Paper Link】 【Pages】:2004-2011
【Authors】: Sreenivas Gollapudi ; Kostas Kollias ; Debmalya Panigrahi
【Abstract】: In recent years, a range of online applications have facilitated resource sharing among users, resulting in a significant increase in resource utilization. In all such applications, sharing one’s resources or skills with other agents increases social welfare. In general, each agent will look for other agents whose available resources complement hers, thereby forming natural sharing groups. In this paper, we study settings where a large population self-organizes into sharing groups. In many cases, centralized optimization approaches for creating an optimal partition of the user population are infeasible because either the central authority does not have the necessary information to compute an optimal partition, or it does not have the power to enforce a partition. Instead, the central authority puts in place an incentive structure in the form of a utility sharing method, before letting the participants form the sharing groups by themselves. We first analyze a simple equal-sharing method, which is the one most typically encountered in practice and show that it can lead to highly inefficient equilibria. We then propose a Shapley-sharing method and show that it significantly improves overall social welfare.
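To make the Shapley-sharing proposal concrete, here is the generic exact Shapley computation by enumerating orderings, feasible only for small sharing groups. The coalition value function is a toy stand-in of our own, not the paper's utility model.

```python
from itertools import permutations

def shapley_values(players, value):
    # Exact Shapley values: average each player's marginal contribution over
    # all join orders. Exponential in the group size, so small groups only.
    shares = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            shares[p] += value(frozenset(coalition)) - before
    return {p: s / len(orders) for p, s in shares.items()}

# toy group utility: number of distinct resources the members bring
resources = {"a": {"car"}, "b": {"car", "tools"}, "c": {"tools"}}
value = lambda S: float(len(set().union(*(resources[p] for p in S)))) if S else 0.0
print(shapley_values(list(resources), value))  # {'a': 0.5, 'b': 1.0, 'c': 0.5}
```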
【Keywords】:
【Paper Link】 【Pages】:2012-2019
【Authors】: Joachim Gudmundsson ; Sampson Wong
【Abstract】: The yolk is an important concept in spatial voting games: the yolk center generalises the equilibrium and the yolk radius bounds the uncovered set. We present near-linear time algorithms for computing the yolk in the plane. To the best of our knowledge our algorithm is the first that does not precompute median lines, and hence is able to break the best known upper bound of O(n^(4/3)) on the number of limiting median lines. We avoid this requirement by carefully applying Megiddo’s parametric search technique, which is a powerful framework that could lead to faster algorithms for other spatial voting problems.
【Keywords】:
【Paper Link】 【Pages】:2020-2028
【Authors】: Qingyu Guo ; Jiarui Gan ; Fei Fang ; Long Tran-Thanh ; Milind Tambe ; Bo An
【Abstract】: Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader, and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy this, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of the SSE can be higher than its utility guarantee. Second, inspired by the analysis of the leader’s payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which attains the highest utility guarantee and always exists. Third, we show the conditions under which the ISE coincides with the SSE, and the fact that in the general case, the SSE can be much worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results, as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing the ISE, with which our experiments unveil the empirical advantage of the ISE over the SSE.
【Keywords】:
【Paper Link】 【Pages】:2029-2036
【Authors】: Karel Horák ; Branislav Bosanský
【Abstract】: In many real-world problems, there is a dynamic interaction between competitive agents. Partially observable stochastic games (POSGs) are among the most general formal models that capture such dynamic scenarios. The model captures stochastic events, partial information of players about the environment, and the scenario does not have a fixed horizon. Solving POSGs in the most general setting is intractable. Therefore, the research has been focused on subclasses of POSGs that have a value of the game and admit designing (approximate) optimal algorithms. We propose such a subclass for two-player zero-sum games with discounted-sum objective function—POSGs with public observations (PO-POSGs)—where each player is able to reconstruct beliefs of the other player over the unobserved states. Our results include: (1) theoretical analysis of PO-POSGs and their value functions showing convexity (concavity) in beliefs of maximizing (minimizing) player, (2) a novel algorithm for approximating the value of the game, and (3) a practical demonstration of scalability of our algorithm. Experimental results show that our algorithm can closely approximate the value of non-trivial games with hundreds of states.
【Keywords】:
【Paper Link】 【Pages】:2037-2044
【Authors】: Sen Huang ; Mingyu Xiao
【Abstract】: The HOUSING MARKET problem is a widely studied resource allocation problem. In this problem, each agent can only receive a single object and has preferences over all objects. Starting from an initial endowment, we want to reach a certain assignment via a sequence of rational trades. We consider the problem of whether an object is reachable for a given agent under a social network, where a trade between two agents is allowed if they are neighbors in the network and no participant has a deficit from the trade. Assuming that the preferences of the agents are strict (no ties allowed), this problem is polynomially solvable in a star-network and NP-complete in a tree-network. It was left as a challenging open problem whether the problem is polynomially solvable when the network is a path. We answer this open problem positively by giving a polynomial-time algorithm. Furthermore, we show that the problem on a path becomes NP-hard when the preferences of the agents are weak (ties are allowed).
【Keywords】:
【Paper Link】 【Pages】:2045-2052
【Authors】: Ayumi Igarashi ; Dominik Peters
【Abstract】: We study the problem of allocating indivisible items to agents with additive valuations, under the additional constraint that bundles must be connected in an underlying item graph. Previous work has considered the existence and complexity of fair allocations. We study the problem of finding an allocation that is Pareto-optimal. While it is easy to find an efficient allocation when the underlying graph is a path or a star, the problem is NP-hard for many other graph topologies, even for trees of bounded pathwidth or of maximum degree 3. We show that on a path, there are instances where no Pareto-optimal allocation satisfies envy-freeness up to one good, and that it is NP-hard to decide whether such an allocation exists, even for binary valuations. We also show that, for a path, it is NP-hard to find a Pareto-optimal allocation that satisfies maximin share, but show that a moving-knife algorithm can find such an allocation when agents have binary valuations that have a non-nested interval structure.
【Keywords】:
【Paper Link】 【Pages】:2053-2060
【Authors】: Ayumi Igarashi ; Jakub Sliwinski ; Yair Zick
【Abstract】: A community needs to be partitioned into disjoint groups; each community member has an underlying preference over the groups that they would want to be a member of. We are interested in finding a stable community structure: one where no subset of members S wants to deviate from the current structure. We model this setting as a hedonic game, where players are connected by an underlying interaction network, and can only consider joining groups that are connected subgraphs of the underlying graph. We analyze the relation between network structure, and one’s capability to infer statistically stable (also known as PAC stable) player partitions from data. We show that when the interaction network is a forest, one can efficiently infer PAC stable coalition structures. Furthermore, when the underlying interaction graph is not a forest, efficient PAC stabilizability is no longer achievable. Thus, our results completely characterize when one can leverage the underlying graph structure in order to compute PAC stable outcomes for hedonic games. Finally, given an unknown underlying interaction network, we show that it is NP-hard to decide whether there exists a forest consistent with data samples from the network.
【Keywords】:
【Paper Link】 【Pages】:2061-2068
【Authors】: Batya Kenig ; Benny Kimelfeld
【Abstract】: We study the complexity of estimating the probability of an outcome in an election over probabilistic votes. The focus is on voting rules expressed as positional scoring rules, and two models of probabilistic voters: the uniform distribution over the completions of a partial voting profile (consisting of a partial ordering of the candidates by each voter), and the Repeated Insertion Model (RIM) over the candidates, including the special case of the Mallows distribution. Past research has established that, while exact inference of the probability of winning is computationally hard (#P-hard), an additive polynomial-time approximation (additive FPRAS) is attained by sampling and averaging. There is often, though, a need for multiplicative approximation guarantees that are crucial for important measures such as conditional probabilities. Unfortunately, a multiplicative approximation of the probability of winning cannot be efficient (under conventional complexity assumptions) since it is already NP-complete to determine whether this probability is nonzero. Contrastingly, we devise multiplicative polynomial-time approximations (multiplicative FPRAS) for the probability of the complement event, namely, losing the election.
【Keywords】:
【Paper Link】 【Pages】:2069-2076
【Authors】: Omer Lev ; Yoad Lewenberg
【Abstract】: District-based manipulation, or gerrymandering, is usually taken to refer to agents who are in fixed locations, with an external division imposed upon them. However, in many real-world settings, there is an external, fixed division – an organizational chart of a company, or markets for a particular product. In these cases, agents may wish to move around (“reverse gerrymandering”), as each of them tries to maximize their influence across the company’s subunits, or resources are “working” to be allocated to areas where they will be most needed. In this paper we explore an iterative dynamic in this setting, finding that, in some particular cases, this decentralized system reaches a stable equilibrium, though in general, the setting may end up in a cycle. We further examine how this decentralized process affects the social welfare of the system.
【Keywords】:
【Paper Link】 【Pages】:2077-2084
【Authors】: Omer Lev ; Reshef Meir ; Svetlana Obraztsova ; Maria Polukarov
【Abstract】: Decision making under uncertainty is a key component of many AI settings, and in particular of voting scenarios where strategic agents are trying to reach a joint decision. The common approach to handling uncertainty is maximizing expected utility, which requires a cardinal utility function as well as detailed probabilistic information. However, often such probabilities are not easy to estimate or apply. To this end, we present a framework that allows for “shades of gray” of likelihood without probabilities. Specifically, we create a hierarchy of sets of world states based on a prospective poll, with inner sets containing more likely outcomes. This hierarchy of likelihoods allows us to define what we term ordinally-dominated strategies. We use this approach to justify various known voting heuristics as bounded-rational strategies.
【Keywords】:
【Paper Link】 【Pages】:2085-2092
【Abstract】: Enforcing cooperation among substantial agents is one of the main objectives for multi-agent systems. However, due to the existence of inherent social dilemmas in many scenarios, the free-rider problem may arise during agents’ long-run interactions, and things become even more severe when self-interested agents work in collusion with each other to get extra benefits. It is commonly accepted that in such social dilemmas, there exists no simple strategy for an agent whereby she can simultaneously manipulate the utility of each of her opponents and further promote mutual cooperation among all agents. Here, we show that such strategies do exist. Under the conventional repeated public goods game, we identify them and find that, when confronted with such strategies, a single opponent can maximize his utility only via global cooperation and any colluding alliance cannot get the upper hand. Since full cooperation is individually optimal for any single opponent, a stable cooperation among all players can be achieved. Moreover, we experimentally show that these strategies can still promote cooperation even when the opponents are both self-learning and collusive.
【Keywords】:
【Paper Link】 【Pages】:2093-2100
【Authors】: Zhuoshu Li ; Sanmay Das
【Abstract】: We consider the problem of designing the information environment for revenue maximization in a sealed-bid second price auction with two bidders. Much of the prior literature has focused on signal design in settings where bidders are symmetrically informed, or on the design of optimal mechanisms under fixed information structures. We study common- and interdependent-value settings where the mechanism is fixed (a second-price auction), but the auctioneer controls the signal structure for bidders. We show that in a standard common-value auction setting, there is no benefit to the auctioneer in terms of expected revenue from sharing information with the bidders, although there are effects on the distribution of revenues. In an interdependent-value model with mixed private- and common-value components, however, we show that asymmetric, information-revealing signals can increase revenue.
【Keywords】:
【Paper Link】 【Pages】:2101-2108
【Authors】: Ilan Lobel ; Renato Paes Leme
【Abstract】: We consider a firm that sells products that arrive over time to a buyer. We study this problem under a notion we call positive commitment, where the seller is allowed to make binding positive promises to the buyer about items arriving in the future, but is not allowed to commit not to make further offers to the buyer in the future. We model this problem as a dynamic game where the seller chooses a mechanism at each period subject to a sequential rationality constraint, and characterize the perfect Bayesian equilibrium of this dynamic game. We prove the equilibrium is efficient and that the seller’s revenue is a function of the buyer’s ex ante utility under a no commitment model. In particular, all goods are sold in advance to the buyer at what we call the positive commitment price.
【Keywords】:
【Paper Link】 【Pages】:2109-2116
【Authors】: Pasin Manurangsi ; Warut Suksompong
【Abstract】: We consider a fair division setting in which m indivisible items are to be allocated among n agents, where the agents have additive utilities and the agents’ utilities for individual items are independently sampled from a distribution. Previous work has shown that an envy-free allocation is likely to exist when m = Ω(n log n) but not when m = n + o(n), and left open the question of determining where the phase transition from non-existence to existence occurs. We show that, surprisingly, there is in fact no universal point of transition; instead, the transition is governed by the divisibility relation between m and n. On the one hand, if m is divisible by n, an envy-free allocation exists with high probability as long as m ≥ 2n. On the other hand, if m is not “almost” divisible by n, an envy-free allocation is unlikely to exist even when m = Θ(n log n / log log n).
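The divisibility effect is easy to probe empirically. Below is a small illustrative Python experiment (our own, not from the paper): it samples i.i.d. uniform utilities and brute-forces all n^m allocations to estimate how often an envy-free allocation exists, contrasting a divisible case (m = 2n) with m = n + 1. The function names and the uniform distribution are assumptions made for illustration.

```python
import itertools
import random

def envy_free_exists(utils, n_agents):
    """Brute-force check: does some allocation of the m items to the
    n agents leave no agent envying another agent's bundle?
    utils[i][j] is agent i's (additive) value for item j."""
    m = len(utils[0])
    for assignment in itertools.product(range(n_agents), repeat=m):
        # value[i][k] = agent i's value for agent k's bundle
        value = [[0.0] * n_agents for _ in range(n_agents)]
        for item, owner in enumerate(assignment):
            for i in range(n_agents):
                value[i][owner] += utils[i][item]
        if all(value[i][i] >= value[i][k]
               for i in range(n_agents) for k in range(n_agents)):
            return True
    return False

def ef_probability(n, m, trials=200, seed=0):
    """Estimate Pr[an envy-free allocation exists] under U(0,1) utilities."""
    rng = random.Random(seed)
    hits = sum(envy_free_exists([[rng.random() for _ in range(m)]
                                 for _ in range(n)], n)
               for _ in range(trials))
    return hits / trials

# Divisible case (m = 2n) versus the m = n + 1 regime.
print(ef_probability(n=3, m=6), ef_probability(n=3, m=4))
```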
【Keywords】:
【Paper Link】 【Pages】:2117-2124
【Authors】: Alberto Marchesi ; Gabriele Farina ; Christian Kroer ; Nicola Gatti ; Tuomas Sandholm
【Abstract】: Equilibrium refinements are important in extensive-form (i.e., tree-form) games, where they amend weaknesses of the Nash equilibrium concept by requiring sequential rationality and other beneficial properties. One of the most attractive refinement concepts is quasi-perfect equilibrium. While quasi-perfection has been studied in extensive-form games, it is poorly understood in Stackelberg settings (that is, settings where a leader can commit to a strategy), which are important for modeling, for example, security games. In this paper, we introduce the axiomatic definition of quasi-perfect Stackelberg equilibrium. We develop a broad class of game perturbation schemes that lead to them in the limit. Our class of perturbation schemes strictly generalizes prior perturbation schemes introduced for the computation of (non-Stackelberg) quasi-perfect equilibria. Based on our perturbation schemes, we develop a branch-and-bound algorithm for computing a quasi-perfect Stackelberg equilibrium. It leverages a perturbed variant of the linear program for computing a Stackelberg extensive-form correlated equilibrium. Experiments show that our algorithm can be used to find an approximate quasi-perfect Stackelberg equilibrium in games with thousands of nodes.
【Keywords】:
【Paper Link】 【Pages】:2125-2132
【Authors】: Vahab S. Mirrokni ; Renato Paes Leme ; Pingzhong Tang ; Song Zuo
【Abstract】: We are interested in the setting where a seller sells sequentially arriving items, one per period, via a dynamic auction. At the beginning of each period, each buyer draws a private valuation for the item to be sold in that period, and this valuation is independent across buyers and periods. The auction can be dynamic in the sense that the auction at period t can be conditional on the bids in that period and all previous periods, subject to certain appropriately defined incentive compatible and individually rational conditions. Perhaps not surprisingly, revenue-optimal dynamic auctions are computationally hard to find, and the existing literature that aims to approximate the optimal auctions is based on solving complex dynamic programs. The structural interpretability of the optimal dynamic auctions remains largely open. In this paper, we show that any optimal dynamic auction is a virtual welfare maximizer subject to some monotone allocation constraints. In particular, the explicit definition of the virtual value function arises naturally from the primal-dual analysis by relaxing the monotone constraints. We further develop an ironing technique that gets rid of the monotone allocation constraints. Quite different from Myerson’s ironing approach, our technique is more technically involved due to the interdependence of the virtual value functions across buyers. We nevertheless show that ironing can be done approximately and efficiently, which in turn leads to a Fully Polynomial Time Approximation Scheme for the optimal dynamic auction.
【Keywords】:
【Paper Link】 【Pages】:2133-2140
【Authors】: Thanh Hong Nguyen ; Yongzhao Wang ; Arunesh Sinha ; Michael P. Wellman
【Abstract】: Allocating resources to defend targets from attack is often complicated by uncertainty about the attacker’s capabilities, objectives, or other underlying characteristics. In a repeated interaction setting, the defender can collect attack data over time to reduce this uncertainty and learn an effective defense. However, a clever attacker can manipulate the attack data to mislead the defender, influencing the learning process toward its own benefit. We investigate strategic deception on the part of an attacker with private type information, who interacts repeatedly with a defender. We present a detailed computation and analysis of both players’ optimal strategies given that the attacker may play deceptively. Computational experiments illuminate conditions conducive to strategic deception and quantify the benefits to the attacker. By taking the attacker’s deception capacity into account, the defender can significantly mitigate loss from misleading attack actions.
【Keywords】:
【Paper Link】 【Pages】:2141-2148
【Authors】: Hoon Oh ; Ariel D. Procaccia ; Warut Suksompong
【Abstract】: We investigate the query complexity of the fair allocation of indivisible goods. For two agents with arbitrary monotonic valuations, we design an algorithm that computes an allocation satisfying envy-freeness up to one good (EF1), a relaxation of envy-freeness, using a logarithmic number of queries. We show that the logarithmic query complexity bound also holds for three agents with additive valuations. These results suggest that it is possible to fairly allocate goods in practice even when the number of goods is extremely large. By contrast, we prove that computing an allocation satisfying envy-freeness and another of its relaxations, envy-freeness up to any good (EFX), requires a linear number of queries even when there are only two agents with identical additive valuations.
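For intuition about EF1 (the paper’s contribution is the query model for arbitrary monotonic valuations; this is a simpler, fully-informed setting), here is a sketch of the classic round-robin procedure, which guarantees EF1 for additive valuations, together with a checker that encodes the EF1 definition. All helper names are our own.

```python
def round_robin(valuations):
    """Agents take turns picking their favorite remaining item. For
    additive valuations the result is EF1. valuations[i][j] is agent
    i's value for item j."""
    n, m = len(valuations), len(valuations[0])
    remaining, bundles = set(range(m)), [[] for _ in range(n)]
    for turn in range(m):
        agent = turn % n
        pick = max(remaining, key=lambda j: valuations[agent][j])
        bundles[agent].append(pick)
        remaining.remove(pick)
    return bundles

def is_ef1(valuations, bundles):
    """EF1: every agent i stops envying any other bundle after removing
    that bundle's single best item (from i's perspective)."""
    for i, own in enumerate(bundles):
        v_own = sum(valuations[i][j] for j in own)
        for k, other in enumerate(bundles):
            if i == k or not other:
                continue
            v_other = sum(valuations[i][j] for j in other)
            if v_own < v_other - max(valuations[i][j] for j in other):
                return False
    return True
```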
【Keywords】:
【Paper Link】 【Pages】:2149-2156
【Authors】: Binghui Peng ; Weiran Shen ; Pingzhong Tang ; Song Zuo
【Abstract】: Over the past decades, various theories and algorithms have been developed under the framework of Stackelberg games, and part of these innovations have been fielded in scenarios such as national security defense and wildlife protection. However, one of the remaining difficulties in the literature is that most theoretical works assume full information about the payoff matrices, while in applications the leader often has no prior knowledge about the follower’s payoff matrix, but may gain information about the follower’s utility function through repeated interactions. In this paper, we study the problem of learning the optimal leader strategy in Stackelberg (security) games and develop novel algorithms as well as new hardness results.
【Keywords】:
【Paper Link】 【Pages】:2157-2164
【Authors】: Martin Schmid ; Neil Burch ; Marc Lanctot ; Matej Moravcik ; Rudolf Kadlec ; Michael Bowling
【Abstract】: Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
【Keywords】:
【Paper Link】 【Pages】:2165-2172
【Authors】: Sujoy Sikdar ; Sibel Adali ; Lirong Xia
【Abstract】: We extend the Top-Trading-Cycles (TTC) mechanism to select strict core allocations for housing markets with multiple types of items, where each agent may be endowed and allocated with multiple items of each type. In doing so, we advance the state of the art in mechanism design for housing markets along two dimensions: First, our setting is more general than multi-type housing markets (Moulin 1995; Sikdar, Adali, and Xia 2017) and the setting of Fujita et al. (2015). Further, we introduce housing markets with acceptable bundles (HMABs) as a more general setting where each agent may have arbitrary sets of acceptable bundles. Second, our extension of TTC is strict core selecting under the weaker restriction on preferences of CMI-trees, which we introduce as a new domain restriction on preferences that generalizes commonly-studied languages in previous works.
【Keywords】:
【Paper Link】 【Pages】:2173-2180
【Authors】: Samuel Sokota ; Caleb Ho ; Bryce Wiedenbeck
【Abstract】: We present a novel approach for identifying approximate role-symmetric Nash equilibria in large simulation-based games. Our method uses neural networks to learn a mapping from mixed-strategy profiles to deviation payoffs—the expected values of playing pure-strategy deviations from those profiles. This learning can generalize from data about a tiny fraction of a game’s outcomes, permitting tractable analysis of exponentially large normal-form games. We give a procedure for iteratively refining the learned model with new data produced by sampling in the neighborhood of each candidate Nash equilibrium. Relative to the existing state of the art, deviation payoff learning dramatically simplifies the task of computing equilibria and more effectively addresses player asymmetries. We demonstrate empirically that deviation payoff learning identifies better approximate equilibria than previous methods and can handle more difficult settings, including games with many more players, strategies, and roles.
【Keywords】:
【Paper Link】 【Pages】:2181-2188
【Authors】: Nimrod Talmon ; Piotr Faliszewski
【Abstract】: We define and study a general framework for approval-based budgeting methods and compare certain methods within this framework by their axiomatic and computational properties. Furthermore, we visualize their behavior on certain Euclidean distributions and analyze them experimentally.
【Keywords】:
【Paper Link】 【Pages】:2189-2196
【Authors】: Jun Wang ; Sujoy Sikdar ; Tyler Shepherd ; Zhibing Zhao ; Chunheng Jiang ; Lirong Xia
【Abstract】: STV and ranked pairs (RP) are two well-studied voting rules for group decision-making. They proceed in multiple rounds, and are affected by how ties are broken in each round. However, the literature is surprisingly vague about how ties should be broken. We propose the first algorithms for computing the set of alternatives that are winners under some tiebreaking mechanism under STV and RP, which is also known as parallel-universes tiebreaking (PUT). Unfortunately, PUT-winners are NP-complete to compute under STV and RP, and standard search algorithms from AI do not apply. We propose multiple DFS-based algorithms along with pruning strategies, heuristics, sampling and machine learning to prioritize search direction to significantly improve the performance. We also propose novel ILP formulations for PUT-winners under STV and RP, respectively. Experiments on synthetic and real-world data show that our algorithms are overall faster than ILP.
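The PUT semantics for STV can be made concrete with a brute-force recursion, exponential in the worst case and thus unlike the pruned DFS and ILP approaches the paper develops; it is a sketch of the definition only, with our own function names.

```python
from functools import lru_cache

def put_stv_winners(profile, candidates):
    """PUT-STV by exhaustive search: whenever several candidates tie
    for the lowest plurality score, branch into one 'universe' per
    possible elimination; a candidate is a PUT-winner iff it wins in
    some universe. profile is a list of rankings (tuples of candidates,
    most preferred first)."""
    @lru_cache(maxsize=None)
    def winners(remaining):
        if len(remaining) == 1:
            return remaining
        scores = {c: 0 for c in remaining}
        for ranking in profile:
            scores[next(c for c in ranking if c in remaining)] += 1
        low = min(scores.values())
        result = frozenset()
        for c in remaining:
            if scores[c] == low:          # one universe per tied loser
                result |= winners(remaining - {c})
        return result
    return set(winners(frozenset(candidates)))

# Three voters, three candidates; every tie-break order is explored.
profile = [("a", "b", "c"), ("b", "a", "c"), ("c", "b", "a")]
print(put_stv_winners(profile, {"a", "b", "c"}))  # -> {'a', 'b'}
```

In this tiny profile all three candidates tie in the first round, and the eventual winner depends on which one is eliminated, which is exactly why the PUT-winner set contains more than one alternative.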
【Keywords】:
【Paper Link】 【Pages】:2197-2204
【Authors】: Tomasz Was ; Talal Rahwan ; Oskar Skibski
【Abstract】: We propose a new centrality measure, called the Random Walk Decay centrality. While most centralities in the literature are based on the notion of shortest paths, this new centrality measure stems from the random walk on the network. We provide an axiomatic characterization and show that the new centrality is closely related to PageRank. In more detail, we show that replacing only one axiom, called Lack of Self-Impact, with another one, called Edge Swap, results in the new axiomatization of PageRank. Finally, we argue that Lack of Self-Impact is desirable in various settings and explain why violating Edge Swap may be beneficial and may contribute to promoting diversity in the centrality measure.
【Keywords】:
【Paper Link】 【Pages】:2205-2212
【Authors】: Anaëlle Wilczynski
【Abstract】: This article deals with strategic voting under incomplete information. We propose a descriptive model, inspired by political elections, where the information about the vote intentions of the electorate comes from public opinion polls and a social network, modeled as a graph over the voters. The voters are assumed to be confident in the poll and they update the communicated results with the information they get from their relatives in the social network. We consider an iterative voting model based on this behavior and study the associated “poll-confident” dynamics. In this context, we ask the question of manipulation by the polling institute.
【Keywords】:
【Paper Link】 【Pages】:2213-2220
【Authors】: Bryan Wilder ; Yevgeniy Vorobeychik
【Abstract】: The integrity of democratic elections depends on voters’ access to accurate information. However, modern media environments, which are dominated by social media, provide malicious actors with unprecedented ability to manipulate elections via misinformation, such as fake news. We study a zero-sum game between an attacker, who attempts to subvert an election by propagating a fake news story or other misinformation over a set of advertising channels, and a defender who attempts to limit the attacker’s impact. Computing an equilibrium in this game is challenging as even the pure strategy sets of players are exponential. Nevertheless, we give provable polynomial-time approximation algorithms for computing the defender’s minimax optimal strategy across a range of settings, encompassing different population structures as well as models of the information available to each player. Experimental results confirm that our algorithms provide near-optimal defender strategies and showcase variations in the difficulty of defending elections depending on the resources and knowledge available to the defender.
【Keywords】:
【Paper Link】 【Pages】:2221-2228
【Authors】: Pan Xu ; Yexuan Shi ; Hao Cheng ; John P. Dickerson ; Karthik Abinav Sankararaman ; Aravind Srinivasan ; Yongxin Tong ; Leonidas Tsepenekas
【Abstract】: Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type “conflict” with each other in some sense due to the diversity requirement for each online buyer. The conflicts inherent among certain offline agents raise significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 2016). Our model can capture not only a broad class of conflict constraints on the offline side (which are even allowed to be sensitive to each online agent), but also a general arrival pattern for the online side (which is allowed to change over the online phase). We propose an efficient linear programming (LP) based online algorithm and prove theoretically that it has nearly-optimal online performance. Additionally, we propose two LP-based heuristics and test them against two natural baselines on both real and synthetic datasets. Our LP-based heuristics experimentally dominate the baseline algorithms, aligning with our theoretical predictions and supporting our unified approach.
【Keywords】:
【Paper Link】 【Pages】:2229-2236
【Authors】: Hanrui Zhang ; Yu Cheng ; Vincent Conitzer
【Abstract】: In the societal tradeoffs problem, each agent perceives certain quantitative tradeoffs between pairs of activities, and the goal is to aggregate these tradeoffs across agents. This is a problem in social choice; specifically, it is a type of quantitative judgment aggregation problem. A natural rule for this problem was axiomatized by Conitzer et al. [AAAI 2016]; they also provided several algorithms for computing the outcomes of this rule. In this paper, we present a significantly improved algorithm and evaluate it experimentally. Our algorithm is based on a tight connection to minimum-cost flow that we exhibit. We also show that our algorithm cannot be improved without breakthroughs on min-cost flow.
【Keywords】:
【Paper Link】 【Pages】:2237-2244
【Authors】: Hanrui Zhang ; Vincent Conitzer
【Abstract】: Specifying the objective function that an AI system should pursue can be challenging. Especially when the decisions to be made by the system have a moral component, input from multiple stakeholders is often required. We consider approaches that query them about their judgments in individual examples, and then aggregate these judgments into a general policy. We propose a formal learning-theoretic framework for this setting. We then give general results on how to translate classical results from PAC learning into results in our framework. Subsequently, we show that in some settings, better results can be obtained by working directly in our framework. Finally, we discuss how our model can be extended in a variety of ways for future research.
【Keywords】:
【Paper Link】 【Pages】:2245-2252
【Authors】: Boming Zhao ; Pan Xu ; Yexuan Shi ; Yongxin Tong ; Zimu Zhou ; Yuxiang Zeng
【Abstract】: A central issue in on-demand taxi dispatching platforms is task assignment, which designs matching policies among dynamically arrived drivers (workers) and passengers (tasks). Previous matching policies maximize the profit of the platform without considering the preferences of workers and tasks (e.g., workers may prefer high-rewarding tasks while tasks may prefer nearby workers). Such ignorance of preferences impairs user experience and will decrease the profit of the platform in the long run. To address this problem, we propose preference-aware task assignment using online stable matching. Specifically, we define a new model, Online Stable Matching under Known Identical Independent Distributions (OSM-KIID). It not only maximizes the expected total profits (OBJ-1), but also tries to satisfy the preferences among workers and tasks by minimizing the expected total number of blocking pairs (OBJ-2). The model also features a practical arrival assumption validated on a real-world dataset. Furthermore, we present a linear program based online algorithm LP-ALG, which achieves an online ratio of at least 1−1/e on OBJ-1 and has at most 0.6·|E| blocking pairs in expectation, where |E| is the total number of edges in the compatible graph. We also show that a natural Greedy can have arbitrarily bad performance on OBJ-1 while maintaining around 0.5·|E| blocking pairs. Evaluations on both synthetic and real datasets confirm our theoretical analysis and demonstrate that LP-ALG strictly dominates all the baselines on both objectives when tasks notably outnumber workers.
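To make OBJ-2 concrete, here is a small helper (our own illustration, not the paper’s code) that counts blocking pairs of a worker-task matching under score-based preferences; the dict-based interface is an assumption.

```python
def blocking_pairs(matching, w_pref, t_pref):
    """Count blocking pairs of a worker-task matching (OBJ-2). A pair
    (w, t) blocks if w and t are not matched to each other and each
    strictly prefers the other to its current partner (or has none).
    w_pref[w][t] and t_pref[t][w] are preference scores (higher is
    better) over compatible partners; matching maps each worker to a
    task or to None."""
    task_of = dict(matching)
    worker_of = {t: w for w, t in matching.items() if t is not None}
    count = 0
    for w, scores in w_pref.items():
        for t, s_wt in scores.items():
            if task_of.get(w) == t:
                continue
            cur_t, cur_w = task_of.get(w), worker_of.get(t)
            w_gains = cur_t is None or s_wt > scores[cur_t]
            t_gains = cur_w is None or t_pref[t][w] > t_pref[t][cur_w]
            count += w_gains and t_gains
    return count
```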
【Keywords】:
【Paper Link】 【Pages】:2253-2260
【Authors】: Tianhang Zheng ; Changyou Chen ; Kui Ren
【Abstract】: Recent work on adversarial attack has shown that the Projected Gradient Descent (PGD) adversary is a universal first-order adversary, and a classifier adversarially trained by PGD is robust against a wide range of first-order attacks. It is worth noting that the original objective of an attack/defense model relies on a data distribution p(x), typically in the form of risk maximization/minimization, e.g., max/min E_{p(x)} L(x), with p(x) some unknown data distribution and L(·) a loss function. However, since PGD generates attack samples independently for each data sample based on L(·), the procedure does not necessarily lead to good generalization in terms of risk optimization. In this paper, we achieve this goal by proposing distributionally adversarial attack (DAA), a framework to solve for an optimal adversarial-data distribution, a perturbed distribution that satisfies the L∞ constraint but deviates from the original data distribution so as to increase the generalization risk maximally. Algorithmically, DAA performs optimization on the space of potential data distributions, which introduces direct dependency between all data points when generating adversarial samples. DAA is evaluated by attacking state-of-the-art defense models, including the adversarially-trained models provided by MIT MadryLab. Notably, DAA ranks first on MadryLab’s white-box leaderboards, reducing the accuracy of their secret MNIST model to 88.56% (with L∞ perturbations of ε = 0.3) and the accuracy of their secret CIFAR model to 44.71% (with L∞ perturbations of ε = 8.0). Code for the experiments is released on https://github.com/tianzheng4/Distributionally-Adversarial-Attack.
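For contrast with DAA’s distribution-level optimization, the following is a minimal PyTorch sketch of the standard per-example L∞ PGD baseline the paper builds on (not DAA itself); model and loss_fn are placeholders, e.g., an nn.Module and cross-entropy, and inputs are assumed to live in [0, 1].

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=0.3, alpha=0.01, steps=40):
    """Per-example L-infinity PGD: ascend the loss with signed gradient
    steps, projecting back into the eps-ball around the clean input x
    after every step."""
    # Random start inside the eps-ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()                # ascent step
        # Project onto the L-infinity ball and the valid input range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```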
【Keywords】:
【Paper Link】 【Pages】:2262-2271
【Authors】: Junwen Ding ; Zhipeng Lü ; Chu-Min Li ; Liji Shen ; Liping Xu ; Fred W. Glover
【Abstract】: Population-based evolutionary algorithms usually manage a large number of individuals to maintain the diversity of the search, which is complex and time-consuming. In this paper, we propose an evolutionary algorithm using only two individuals, called master-apprentice evolutionary algorithm (MAE), for solving the flexible job shop scheduling problem (FJSP). To ensure the diversity and the quality of the evolution, MAE integrates a tabu search procedure, a recombination operator based on path relinking using a novel distance definition, and an effective individual updating strategy, taking into account the multiple complex constraints of FJSP. Experiments on 313 widely-used public instances show that MAE improves the previous best known results for 47 instances and matches the best known results on all except 3 of the remaining instances while consuming the same computational time as current state-of-the-art metaheuristics. MAE additionally establishes solution quality records for 10 hard instances whose previous best values were established by a well-known industrial solver and a state-of-the-art exact method.
【Keywords】:
【Paper Link】 【Pages】:2272-2279
【Authors】: Tobias Friedrich ; Andreas Göbel ; Frank Neumann ; Francesco Quinzan ; Ralf Rothenberger
【Abstract】: We investigate the performance of a deterministic GREEDY algorithm for the problem of maximizing functions under a partition matroid constraint. We consider non-monotone submodular functions and monotone subadditive functions. Even though constrained maximization problems of monotone submodular functions have been extensively studied, little is known about greedy maximization of non-monotone submodular functions or monotone subadditive functions. We give approximation guarantees for GREEDY on these problems, in terms of the curvature. We find that this simple heuristic yields a strong approximation guarantee on a broad class of functions. We discuss the applicability of our results to three real-world problems: maximizing the determinant function of a positive semidefinite matrix (and related problems such as the maximum entropy sampling problem), the constrained maximum cut problem on directed graphs, and combinatorial auction games. We conclude that GREEDY is well-suited to approach these problems. Overall, we present evidence to support the idea that, when dealing with constrained maximization problems with bounded curvature, one need not search for (approximate) monotonicity to get good approximate solutions.
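A minimal sketch of deterministic GREEDY under a partition matroid constraint (parts with per-part capacities); the interface with a ground set, a part_of map, capacities, and a set-function oracle f is our own framing of the setting described above.

```python
def greedy_partition_matroid(ground, part_of, capacity, f):
    """Deterministic GREEDY under a partition matroid: a set is
    independent iff it takes at most capacity[p] elements from each
    part p. Repeatedly add the feasible element with the largest
    marginal gain f(S + e) - f(S); stop when no feasible element
    strictly improves f (sensible for non-monotone objectives)."""
    selected, used = set(), {p: 0 for p in capacity}
    while True:
        base = f(selected)
        best_gain, best_e = 0.0, None
        for e in ground - selected:
            if used[part_of[e]] >= capacity[part_of[e]]:
                continue                      # part is already full
            gain = f(selected | {e}) - base
            if gain > best_gain:
                best_gain, best_e = gain, e
        if best_e is None:
            return selected
        selected.add(best_e)
        used[part_of[best_e]] += 1
```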
【Keywords】:
【Paper Link】 【Pages】:2280-2287
【Authors】: Baokun He ; Swair Shah ; Crystal Maung ; Gordon Arnold ; Guihong Wan ; Haim Schweitzer
【Abstract】: The following are two classical approaches to dimensionality reduction: 1. Approximating the data with a small number of features that exist in the data (feature selection). 2. Approximating the data with a small number of arbitrary features (feature extraction). We study a generalization that approximates the data with both selected and extracted features. We show that an optimal solution to this hybrid problem involves a combinatorial search, and cannot be trivially obtained even if one can optimally solve the separate problems of selection and extraction. Our approach, which gives optimal and approximate solutions, uses a “best-first” heuristic search. The algorithm comes with both an a priori and an a posteriori optimality guarantee similar to those that can be obtained for the classical weighted A* algorithm. Experimental results show the effectiveness of the proposed approach.
【Keywords】:
【Paper Link】 【Pages】:2288-2295
【Authors】: Robert C. Holte ; Sandra Zilles
【Abstract】: Edelkamp et al. (2005) proved that A*, given an admissible heuristic, is guaranteed to return an optimal solution in any cost algebra, not just in the traditional shortest path setting. In this paper, we investigate cost-algebraic A*’s optimal efficiency: in the cost-algebraic setting, under what conditions is A* guaranteed to expand the fewest possible states? In the traditional setting, this question was examined in detail by Dechter & Pearl (1985). They identified five different situations in which A* was optimally efficient. We show that three of them continue to hold in the cost-algebraic setting, but that one does not. We also show that one of them is false: it does not hold even in the traditional setting. We introduce an alternative that does hold in the cost-algebraic setting. Finally, we show that a well-known result due to Nilsson does not hold in the general cost-algebraic setting but does hold in a slightly less general setting.
【Keywords】:
【Paper Link】 【Pages】:2296-2303
【Authors】: Zhengxin Huang ; Yuren Zhou ; Zefeng Chen ; Xiaoyu He
【Abstract】: Decomposition-based multiobjective evolutionary algorithms (MOEAs) are a class of popular methods for solving multiobjective optimization problems (MOPs), and have been widely studied in numerical experiments and successfully applied in practice. However, we know little about these algorithms from the theoretical aspect. In this paper, we present a running time analysis of a simple MOEA with crossover based on the MOEA/D framework (MOEA/D-C) on four discrete optimization problems. Our rigorous theoretical analysis shows that MOEA/D-C can obtain a set of Pareto optimal solutions covering the Pareto front of these problems in expected running time apparently lower than the variant without crossover. Moreover, MOEA/D-C only needs to decompose an MOP into a few scalar optimization subproblems according to several simple weight vectors. This result suggests that the use of crossover in decomposition-based MOEAs can simplify the setting of weight vectors for different problems and make the algorithm more efficient. This study theoretically explains why some decomposition-based MOEAs work well in computational experiments and provides insights into the design of MOEAs for MOPs in future research.
【Keywords】:
【Paper Link】 【Pages】:2304-2313
【Authors】: Ken Kobayashi ; Naoki Hamada ; Akiyoshi Sannai ; Akinori Tanaka ; Kenichi Bannai ; Masashi Sugiyama
【Abstract】: Multi-objective optimization problems require simultaneously optimizing two or more objective functions. Many studies have reported that the solution set of an M-objective optimization problem often forms an (M − 1)-dimensional topological simplex (a curved line for M = 2, a curved triangle for M = 3, a curved tetrahedron for M = 4, etc.). Since the dimensionality of the solution set increases as the number of objectives grows, an exponentially large sample size is needed to cover the solution set. To reduce the required sample size, this paper proposes a Bézier simplex model and its fitting algorithm. These techniques can exploit the simplex structure of the solution set and decompose a high-dimensional surface fitting task into a sequence of low-dimensional ones. An approximation theorem of Bézier simplices is proven. Numerical experiments with synthetic and real-world optimization problems demonstrate that the proposed method achieves an accurate approximation of high-dimensional solution sets with small samples. In practice, such an approximation will be conducted in the post-optimization process and enable a better trade-off analysis.
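To illustrate the model itself (fitting, the paper’s actual contribution, would regress the control points from sampled Pareto-optimal solutions), here is a sketch of evaluating a Bézier simplex via multivariate Bernstein polynomials; the names and the degree-2, three-objective example are our own.

```python
from math import factorial

def bernstein(d, t):
    """Bernstein basis value: multinomial(|d|; d) * prod_i t_i^{d_i}."""
    coef = factorial(sum(d))
    for di in d:
        coef //= factorial(di)
    weight = 1.0
    for di, ti in zip(d, t):
        weight *= ti ** di
    return coef * weight

def bezier_simplex(t, control):
    """Evaluate a Bezier simplex at barycentric coordinates t
    (t_i >= 0, sum(t) == 1). control maps multi-indices (tuples of
    nonnegative ints summing to the degree) to control points."""
    dim = len(next(iter(control.values())))
    out = [0.0] * dim
    for d, p in control.items():
        b = bernstein(d, t)
        for i in range(dim):
            out[i] += b * p[i]
    return out

# Degree-2 Bezier triangle for M = 3 objectives: 6 control points.
control = {
    (2, 0, 0): (1.0, 0.0, 0.0), (0, 2, 0): (0.0, 1.0, 0.0),
    (0, 0, 2): (0.0, 0.0, 1.0), (1, 1, 0): (0.6, 0.6, 0.0),
    (1, 0, 1): (0.6, 0.0, 0.6), (0, 1, 1): (0.0, 0.6, 0.6),
}
print(bezier_simplex((1/3, 1/3, 1/3), control))
```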
【Keywords】:
【Paper Link】 【Pages】:2314-2321
【Authors】: Juho Lauri ; Sourav Dutta
【Abstract】: We propose a simple, powerful, and flexible machine learning framework for (i) reducing the search space of computationally difficult enumeration variants of subset problems and (ii) augmenting existing state-of-the-art solvers with informative cues arising from the input distribution. We instantiate our framework for the problem of listing all maximum cliques in a graph, a central problem in network analysis, data mining, and computational biology. We demonstrate the practicality of our approach on real-world networks with millions of vertices and edges by not only retaining all optimal solutions, but also aggressively pruning the input instance size resulting in several fold speedups of state-of-the-art algorithms. Finally, we explore the limits of scalability and robustness of our proposed framework, suggesting that supervised learning is viable for tackling NP-hard problems in practice.
【Keywords】:
【Paper Link】 【Pages】:2322-2329
【Authors】: Andrei Lissovoi ; Pietro Simone Oliveto ; John Alasdair Warwicker
【Abstract】: Selection hyper-heuristics are automated algorithm selection methodologies that choose between different heuristics during the optimisation process. Recently, selection hyper-heuristics choosing between a collection of elitist randomised local search heuristics with different neighbourhood sizes have been shown to optimise a standard unimodal benchmark function from evolutionary computation in the optimal expected runtime achievable with the available low-level heuristics. In this paper we extend our understanding to the domain of multimodal optimisation by considering a hyper-heuristic from the literature that can switch between elitist and non-elitist heuristics during the run. We first identify the range of parameters that allow the hyper-heuristic to hillclimb efficiently and prove that it can optimise a standard hillclimbing benchmark function in the best expected asymptotic time achievable by unbiased mutation-based randomised search heuristics. Afterwards, we use standard multimodal benchmark functions to highlight function characteristics where the hyper-heuristic is efficient by swiftly escaping local optima and ones where it is not. For a function class called CLIFF_d, where a new gradient of increasing fitness can be identified after escaping local optima, the hyper-heuristic is extremely efficient while a wide range of established elitist and non-elitist algorithms are not, including the well-studied Metropolis algorithm. We complete the picture with an analysis of another standard benchmark function called JUMP_d as an example to highlight problem characteristics where the hyper-heuristic is inefficient. Yet, it still outperforms the well-established non-elitist Metropolis algorithm.
【Keywords】:
【Paper Link】 【Pages】:2330-2337
【Authors】: Julian R. H. Mariño ; Rubens O. Moraes ; Cláudio Toledo ; Levi H. S. Lelis
【Abstract】: A key challenge for planning systems in real-time multiagent domains is to search in large action spaces to decide an agent’s next action. Previous works showed that handcrafted action abstractions allow planning systems to focus their search on a subset of promising actions. In this paper we show that the problem of generating action abstractions can be cast as a problem of selecting a subset of pure strategies from a pool of options. We model the selection of a subset of pure strategies as a two-player game in which the strategy set of the players is the powerset of the pool of options; we call this game the subset selection game. We then present an evolutionary algorithm for solving such a game. Empirical results on small matches of µRTS show that our evolutionary approach is able to converge to a Nash equilibrium for the subset selection game. Also, results on larger matches show that search algorithms using action abstractions derived by our evolutionary approach are able to substantially outperform all state-of-the-art planning systems tested.
【Keywords】:
【Paper Link】 【Pages】:2338-2345
【Authors】: Andrew Mitchell ; Wheeler Ruml ; Fabian Spaniol ; Jörg Hoffmann ; Marek Petrik
【Abstract】: In real-time planning, an agent must select the next action to take within a fixed time bound. Many popular real-time heuristic search methods approach this by expanding nodes using time-limited A* and selecting the action leading toward the frontier node with the lowest f value. In this paper, we reconsider real-time planning as a problem of decision-making under uncertainty. We propose treating heuristic values as uncertain evidence and we explore several backup methods for aggregating this evidence. We then propose a novel lookahead strategy that expands nodes to minimize risk, the expected regret in case a non-optimal action is chosen. We evaluate these methods in a simple synthetic benchmark and the sliding tile puzzle and find that they outperform previous methods. This work illustrates how uncertainty can arise even when solving deterministic planning problems, due to the inherent ignorance of time-limited search algorithms about those portions of the state space that they have not computed, and how an agent can benefit from explicitly metareasoning about this uncertainty.
【Keywords】:
【Paper Link】 【Pages】:2346-2353
【Authors】: Frank Neumann ; Andrew M. Sutton
【Abstract】: We study the ability of a simple mutation-only evolutionary algorithm to solve propositional satisfiability formulas with inherent community structure. We show that the community structure translates to good fitness-distance correlation properties, which implies that the objective function provides a strong signal in the search space for evolutionary algorithms to locate a satisfying assignment efficiently. We prove that when the formula clusters into communities of size s ∈ ω(log n) ∩ O(n^{ε/(2ε+2)}) for some constant 0 < ε < 1, the evolutionary algorithm locates a satisfying assignment in polynomial time.
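Fitness-distance correlation (FDC) itself is simple to compute; below is a sketch with our own helper names, assuming non-constant fitness and distance values. For SAT one would take fitness as the number of satisfied clauses and distance as the Hamming distance to a fixed satisfying assignment; the demo uses a OneMax-style toy where fitness is a perfect proximity signal.

```python
import random

def fitness_distance_correlation(samples, fitness, distance):
    """Pearson correlation between fitness values and distances to the
    optimum over a set of sampled assignments. Assumes both quantities
    vary across the samples (non-zero variance)."""
    fs = [fitness(x) for x in samples]
    ds = [distance(x) for x in samples]
    n = len(samples)
    mf, md = sum(fs) / n, sum(ds) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fs, ds)) / n
    var_f = sum((f - mf) ** 2 for f in fs) / n
    var_d = sum((d - md) ** 2 for d in ds) / n
    return cov / (var_f * var_d) ** 0.5

rng = random.Random(1)
n = 30
samples = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(200)]
fdc = fitness_distance_correlation(
    samples,
    fitness=lambda x: sum(x),          # e.g., number of satisfied clauses
    distance=lambda x: n - sum(x),     # Hamming distance to the optimum
)
print(fdc)  # -1.0 here: higher fitness always means closer to the optimum
```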
【Keywords】:
【Paper Link】 【Pages】:2354-2361
【Authors】: Vahid Roostapour ; Aneta Neumann ; Frank Neumann ; Tobias Friedrich
【Abstract】: In this paper, we consider the subset selection problem for function f with constraint bound B which changes over time. We point out that adaptive variants of greedy approaches commonly used in the area of submodular optimization are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a φ = (α_f/2)(1 − 1/e^{α_f})-approximation, where α_f is the submodularity ratio of f, for each possible constraint bound b ≤ B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms.
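A rough sketch of a POMC-style Pareto optimization loop as described above (a GSEMO-style reconstruction based on the abstract, not the authors’ code); f and cost are user-supplied oracles over 0/1 tuples, and all names are our assumptions.

```python
import random

def pomc(f, cost, n, budget, iterations, seed=0):
    """Treat constrained subset selection as a bi-objective problem
    (maximize f, minimize cost), keep an archive of mutually
    non-dominated bitstrings, and mutate a random archive member each
    iteration. The final archive holds a solution for every bound
    b <= budget."""
    rng = random.Random(seed)

    def objectives(x):
        c = cost(x)
        value = f(x) if c <= budget else float("-inf")  # kill infeasible
        return (value, -c)

    def weakly_dominates(a, b):
        return all(ai >= bi for ai, bi in zip(a, b))

    archive = [tuple([0] * n)]                 # start from the empty set
    for _ in range(iterations):
        parent = rng.choice(archive)
        # Standard bit mutation: flip each bit with probability 1/n.
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        oc = objectives(child)
        if not any(weakly_dominates(objectives(p), oc) for p in archive):
            archive = [p for p in archive
                       if not weakly_dominates(oc, objectives(p))] + [child]
    return archive
```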
【Keywords】:
【Paper Link】 【Pages】:2362-2370
【Authors】: Christopher D. Rosin
【Abstract】: Inductive program synthesis, from input/output examples, can provide an opportunity to automatically create programs from scratch without presupposing the algorithmic form of the solution. For induction of general programs with loops (as opposed to loop-free programs, or synthesis for domain-specific languages), the state of the art is at the level of introductory programming assignments. Most problems that require algorithmic subtlety, such as fast sorting, have remained out of reach without the benefit of significant problem-specific background knowledge. A key challenge is to identify cues that are available to guide search towards correct looping programs. We present MAKESPEARE, a simple delayed-acceptance hillclimbing method that synthesizes low-level looping programs from input/output examples. During search, delayed acceptance bypasses small gains to identify significantly-improved stepping stone programs that tend to generalize and enable further progress. The method performs well on a set of established benchmarks, and succeeds on the previously unsolved “Collatz Numbers” program synthesis problem. Additional benchmarks include the problem of rapidly sorting integer arrays, in which we observe the emergence of comb sort (a Shell sort variant that is empirically fast). MAKESPEARE has also synthesized a record-setting program on one of the puzzles from the TIS-100 assembly language programming game.
【Keywords】:
【Paper Link】 【Pages】:2371-2378
【Authors】: Shahaf S. Shperberg ; Andrew Coles ; Bence Cserna ; Erez Karpas ; Wheeler Ruml ; Solomon Eyal Shimony
【Abstract】: Making plans that depend on external events can be tricky. For example, an agent considering a partial plan that involves taking a bus must recognize that this partial plan is only viable if completed and selected for execution in time for the agent to arrive at the bus stop. This setting raises the thorny problem of allocating the agent’s planning effort across multiple open search nodes, each of which has an expiration time and an expected completion effort in addition to the usual estimated plan cost. This paper formalizes this metareasoning problem, studies its theoretical properties, and presents several algorithms for solving it. Our theoretical results include a surprising connection to job scheduling, as well as to deliberation scheduling in time-dependent planning. Our empirical results indicate that our algorithms are effective in practice. This work advances our understanding of how heuristic search planners might address realistic problem settings.
【Keywords】:
【Paper Link】 【Pages】:2379-2386
【Authors】: Shahaf S. Shperberg ; Ariel Felner ; Nathan R. Sturtevant ; Solomon Eyal Shimony ; Avi Hayoun
【Abstract】: NBS is a non-parametric bidirectional search algorithm proven to perform at most twice the number of node expansions required to verify the optimality of a solution. We introduce new variants of NBS that are aimed at finding all optimal solutions. We then introduce an algorithmic framework that includes NBS as a special case. Finally, we introduce DVCBS, a new algorithm in this framework that aims to further reduce the number of expansions. Unlike NBS, DVCBS does not have any worst-case bound guarantees, but in practice it outperforms NBS in verifying the optimality of solutions.
【Keywords】:
【Paper Link】 【Pages】:2387-2394
【Authors】: Markus Spies ; Marco Todescato ; Hannes Becker ; Patrick Kesper ; Nicolai Waniek ; Meng Guo
【Abstract】: A wide range of discrete planning problems can be solved optimally using graph search algorithms. However, optimal search quickly becomes infeasible with increased complexity of a problem. In such a case, heuristics that guide the planning process towards the goal state can increase performance considerably. Unfortunately, heuristics are often unavailable or need manual and time-consuming engineering. Building upon recent results on applying deep learning to learn generalized reactive policies, we propose to learn heuristics by imitation learning. After learning heuristics based on optimal examples, they are used to guide a classical search algorithm to solve unseen tasks. However, directly applying learned heuristics in search algorithms such as A∗ breaks optimality guarantees, since learned heuristics are not necessarily admissible. Therefore, we (i) propose a novel method that utilizes learned heuristics to guide Focal Search, a variant of A∗ with guarantees on bounded suboptimality; (ii) compare the complexity and performance of jointly learning individual policies for multiple robots with an approach that learns one policy for all robots; (iii) thoroughly examine how learned policies generalize to previously unseen environments and demonstrate considerably improved performance in a simulated complex dynamic coverage problem.
【Keywords】:
【Paper Link】 【Pages】:2395-2402
【Authors】: Thomas Weise ; Zijun Wu ; Markus Wagner
【Abstract】: A commonly used strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. Building on the recent success of BET-AND-RUN approaches for restarted local search solvers, we introduce a more generic version that makes use of performance prediction. It is our goal to obtain the best possible results within a given time budget t using a given black-box optimization algorithm. If no prior knowledge about problem features and algorithm behavior is available, the question arises of how to use the time budget most efficiently. We first start k ≥ 1 independent runs of the algorithm during an initialization budget t1 < t, pause these runs, then apply a decision maker D to choose 1 ≤ m < k runs from them (consuming t2 ≥ 0 time units in doing so), and then continue these runs for the remaining t3 = t − t1 − t2 time units. In previous BET-AND-RUN strategies, the decision maker D = currentBest would simply select the run with the best-so-far results at negligible time cost. We propose using more advanced methods to discriminate between “good” and “bad” sample runs, with the goal of increasing the correlation of the chosen run with the a-posteriori best one. In over 157 million experiments, we test different approaches to predict which run may yield the best results if granted the remaining budget. We show (1) that the currentBest method is indeed a very reliable and robust baseline approach, and (2) that our approach can yield better results than the previous methods.
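The protocol is easy to express generically. Below is a sketch of the BET-AND-RUN skeleton with a pluggable decision maker D, choosing m = 1 run; the run interface (make_run, .step(), .best_score) is an assumption of ours, and the decision maker’s own time cost t2 is ignored for simplicity.

```python
import random

def bet_and_run(make_run, k, t1, t3, decision_maker=None, seed=0):
    """Start k independent runs for t1 steps each, let a decision maker
    pick one (default: currentBest, i.e., the run with the best score
    so far), then continue only that run for the remaining t3 steps.
    make_run(seed) returns an object with .step() and .best_score
    (lower is better)."""
    rng = random.Random(seed)
    runs = [make_run(rng.random()) for _ in range(k)]
    for run in runs:                        # phase 1: initialization budget
        for _ in range(t1):
            run.step()
    if decision_maker is None:              # currentBest baseline
        chosen = min(runs, key=lambda r: r.best_score)
    else:                                   # e.g., a performance predictor
        chosen = decision_maker(runs)
    for _ in range(t3):                     # phase 2: remaining budget
        chosen.step()
    return chosen.best_score
```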
【Keywords】:
【Paper Link】 【Pages】:2403-2410
【Authors】: Aimin Zhou ; Jinyuan Zhang ; Jianyong Sun ; Guixu Zhang
【Abstract】: In evolutionary optimization, preselection is an efficient operator to improve search efficiency; it aims to filter unpromising candidate solutions before fitness evaluation. Most existing preselection operators rely on fitness values, surrogate models, or classification models. Basically, classification-based preselection regards the preselection as a classification procedure, i.e., differentiating promising and unpromising candidate solutions. However, the difference between the promising and unpromising classes becomes fuzzy as the run progresses, since all the remaining candidate solutions are likely to be promising ones. Facing this challenge, this paper proposes a fuzzy classification based preselection (FCPS) scheme, which utilizes the membership function to measure the quality of candidate solutions. The proposed FCPS scheme is applied to two state-of-the-art evolutionary algorithms on a test suite. The experimental results show the potential of FCPS in improving algorithm performance.
【Keywords】:
【Paper Link】 【Pages】:2412-2420
【Authors】: Tameem Adel ; Isabel Valera ; Zoubin Ghahramani ; Adrian Weller
【Abstract】: There is currently a great expansion of the impact of machine learning algorithms on our lives, prompting the need for objectives other than pure performance, including fairness. Fairness here means that the outcome of an automated decision-making system should not discriminate between subgroups characterized by sensitive attributes such as gender or race. Given any existing differentiable classifier, we make only slight adjustments to the architecture, including adding a new hidden layer, in order to enable the concurrent adversarial optimization for fairness and accuracy. Our framework provides one way to quantify the tradeoff between fairness and accuracy, while also leading to strong empirical performance.
【Keywords】:
【Paper Link】 【Pages】:2421-2428
【Authors】: Shani Alkoby ; Zihe Wang ; David Sarne ; Pingzhong Tang
【Abstract】: Information plays a key role in many decision situations. The rapid advancement in communication technologies makes information providers more accessible, and various information providing platforms can be found nowadays, most of which are strategic in the sense that their goal is to maximize the providers’ expected profit. In this paper, we consider the common problem of a strategic information provider offering prospective buyers information which can disambiguate uncertainties the buyers have, which can be valuable for their decision making. Unlike prior work, we do not limit the information provider’s strategy to price setting but rather enable her flexibility over the way information is sold, specifically enabling querying about specific outcomes and the elimination of a subset of non-true world states alongside the traditional approach of disclosing the true world state. We prove that for the case where the buyer is self-interested (and the information provider does not know the true world state beforehand) all three methods (i.e., disclosing the true world-state value, offering to check a specific value, and eliminating a random value) are equivalent, yielding the same expected profit to the information provider. For the case where buyers are human subjects, using an extensive set of experiments we show that the methods result in substantially different outcomes. Furthermore, using standard machine learning techniques the information provider can rather accurately predict the performance of the different methods for new problem settings, hence substantially increasing profit.
【Keywords】:
【Paper Link】 【Pages】:2429-2437
【Authors】: Gagan Bansal ; Besmira Nushi ; Ece Kamar ; Daniel S. Weld ; Walter S. Lasecki ; Eric Horvitz
【Abstract】: AI systems are being deployed to support human decision making in high-stakes domains such as healthcare and criminal justice. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to behavioral changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences. We show that updates that increase AI performance may actually hurt team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes classification tasks show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff across different datasets, enabling more compatible yet accurate updates.
【Keywords】:
【Paper Link】 【Pages】:2438-2445
【Authors】: Alvaro H. C. Correia ; Freddy Lécué
【Abstract】: Feature selection is a crucial step in the conception of Machine Learning models, which is often performed via data-driven approaches that overlook the possibility of tapping into the human decision-making of the model’s designers and users. We present a human-in-the-loop framework that interacts with domain experts by collecting their feedback regarding the variables (over a few samples) they evaluate as the most relevant for the task at hand. Such information can be modeled via Reinforcement Learning to derive a per-example feature selection method that tries to minimize the model’s loss function by focusing on the most pertinent variables from a human perspective. We report results on a proof-of-concept image classification dataset and on a real-world risk classification task in which the model successfully incorporated feedback from experts to improve its accuracy.
【Keywords】:
【Paper Link】 【Pages】:2446-2453
【Authors】: Gil Einziger ; Maayan Goldstein ; Yaniv Sa'ar ; Itai Segall
【Abstract】: Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VERIGB, a tool for quantifying the robustness of gradient boosted models. VERIGB encodes the model and the robustness property as an SMT formula, which enables state-of-the-art verification tools to prove the model’s robustness. We extensively evaluate VERIGB on publicly available datasets and demonstrate a capability for verifying large models. Finally, we show that some model configurations tend to be inherently more robust than others.
【Keywords】:
【Paper Link】 【Pages】:2454-2461
【Authors】: Andrew Forney ; Elias Bareinboim
【Abstract】: Randomized clinical trials (RCTs) like those conducted by the FDA provide medical practitioners with average effects of treatments, and are generally more desirable than observational studies due to their control of unobserved confounders (UCs), viz., latent factors that influence both treatment and recovery. However, recent results from causal inference have shown that randomization results in a subsequent loss of information about the UCs, which may impede treatment efficacy if left uncontrolled in practice (Bareinboim, Forney, and Pearl 2015). Our paper presents a novel experimental design that can be noninvasively layered atop past and future RCTs to not only expose the presence of UCs in a system, but also reveal patient- and practitioner-specific treatment effects in order to improve decision-making. Applications are given to personalized medicine, second opinions in diagnosis, and employing offline results in online recommender systems.
【Keywords】:
【Paper Link】 【Pages】:2462-2470
【Authors】: Vinicius G. Goecks ; Gregory M. Gremillion ; Vernon J. Lawhern ; John Valasek ; Nicholas R. Waytowich
【Abstract】: This paper investigates how to utilize different forms of human interaction to safely train autonomous systems in real-time by learning from both human demonstrations and interventions. We implement two components of the Cycle-of-Learning for Autonomous Systems, which is our framework for combining multiple modalities of human interaction. The current effort employs human demonstrations to teach a desired behavior via imitation learning, then leverages intervention data to correct for undesired behaviors produced by the imitation learner, teaching novel tasks to an autonomous agent safely after only minutes of training. We demonstrate this method in an autonomous perching task using a quadrotor with continuous roll, pitch, yaw, and throttle commands and imagery captured from a downward-facing camera in a high-fidelity simulated environment. Our method improves task completion performance for the same amount of human interaction when compared to learning from demonstrations alone, while also requiring on average 32% less data to achieve that performance. This provides evidence that combining multiple modes of human interaction can increase both the training speed and overall performance of policies for autonomous systems.
【Keywords】:
【Paper Link】 【Pages】:2471-2478
【Authors】: Mingxuan Jing ; Xiaojian Ma ; Wen-bing Huang ; Fuchun Sun ; Huaping Liu
【Abstract】: The goal of task transfer in reinforcement learning is migrating the action policy of an agent to the target task from the source task. Given their successes on robotic action planning, current methods mostly rely on two requirements: exactly-relevant expert demonstrations or an explicitly-coded cost function on the target task, both of which, however, are inconvenient to obtain in practice. In this paper, we relax these two strong conditions by developing a novel task transfer framework where the expert preference is applied as a guidance. In particular, we alternate the following two steps: Firstly, experts apply pre-defined preference rules to select related expert demonstrations for the target task. Secondly, based on the selection result, we learn the target cost function and trajectory distribution simultaneously via enhanced Adversarial MaxEnt IRL and generate more trajectories from the learned target distribution for the next preference selection. Theoretical analyses of the distribution learning and the convergence of the proposed algorithm are provided. Extensive simulations on several benchmarks have been conducted for further verifying the effectiveness of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:2479-2487
【Authors】: Anagha Kulkarni ; Siddharth Srivastava ; Subbarao Kambhampati
【Abstract】: Users of AI systems may rely upon them to produce plans for achieving desired objectives. Such AI systems should be able to compute obfuscated plans whose execution in adversarial situations protects privacy, as well as legible plans which are easy for team members to understand in cooperative situations. We develop a unified framework that addresses these dual problems by computing plans with a desired level of comprehensibility from the point of view of a partially informed observer. For adversarial settings, our approach produces obfuscated plans with observations that are consistent with at least k goals from a set of decoy goals. By slightly varying our framework, we present an approach for producing legible plans in cooperative settings such that the observation sequence projected by the plan is consistent with at most j goals from a set of confounding goals. In addition, we show how the observability of the observer can be controlled to either obfuscate or convey the actions in a plan when the goal is known to the observer. We present theoretical results on the complexity analysis of our approach. We also present an empirical evaluation to show the feasibility and usefulness of our approaches using IPC domains.
【Keywords】:
【Paper Link】 【Pages】:2488-2495
【Authors】: Dongze Lian ; Ziheng Zhang ; Weixin Luo ; Lina Hu ; Minye Wu ; Zechao Li ; Jingyi Yu ; Shenghua Gao
【Abstract】: This paper tackles RGBD based gaze estimation with Convolutional Neural Networks (CNNs). Specifically, we propose to decompose gaze point estimation into eyeball pose, head pose, and 3D eye position estimation. Compared with RGB image-based gaze tracking, having the depth modality helps to facilitate head pose estimation and 3D eye position estimation. The captured depth image, however, usually contains noise and black holes which noticeably hamper gaze tracking. Thus we propose a CNN-based multi-task learning framework to simultaneously refine depth images and predict gaze points. We utilize a generator network for depth image generation with a Generative Adversarial Network (GAN), where the generator network is partially shared by both the gaze tracking network and the GAN-based depth synthesis. By optimizing the whole network simultaneously, depth image synthesis improves gaze point estimation and vice versa. Since the only existing RGBD dataset (EYEDIAP) is too small, we build a large-scale RGBD gaze tracking dataset for performance evaluation. As far as we know, it is the largest RGBD gaze dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the EYEDIAP dataset.
【Keywords】:
【Paper Link】 【Pages】:2496-2505
【Authors】: Yuzuru Okajima ; Kunihiko Sadamasa
【Abstract】: Deep neural networks achieve high predictive accuracy by learning latent representations of complex data. However, the reasoning behind their decisions is difficult for humans to understand. On the other hand, rule-based approaches are able to justify the decisions by showing the decision rules leading to them, but they have relatively low accuracy. To improve the interpretability of neural networks, several techniques provide post-hoc explanations of decisions made by neural networks, but they cannot guarantee that the decisions are always explained in a simple form like decision rules because their explanations are generated after the decisions are made by neural networks. In this paper, to balance the accuracy of neural networks and the interpretability of decision rules, we propose a hybrid technique called rule-constrained networks, namely, neural networks that make decisions by selecting decision rules from a given ruleset. Because the networks are forced to make decisions based on decision rules, it is guaranteed that every decision is supported by a decision rule. Furthermore, we propose a technique to jointly optimize the neural network and the ruleset from which the network selects rules. The log likelihood of correct classifications is maximized under a model with hyperparameters about the ruleset size and the prior probabilities of rules being selected. This feature makes it possible to limit the ruleset size or prioritize human-made rules over automatically acquired rules for promoting the interpretability of the output. Experiments on datasets of time-series and sentiment classification showed rule-constrained networks achieved accuracy as high as that achieved by original neural networks and significantly higher than that achieved by existing rule-based models, while presenting decision rules supporting the decisions.
【Keywords】:
【Paper Link】 【Pages】:2506-2513
【Authors】: Linsen Song ; Jie Cao ; Lingxiao Song ; Yibo Hu ; Ran He
【Abstract】: Face completion is a challenging generation task because it requires generating visually pleasing new pixels that are semantically consistent with the unmasked face region. This paper proposes a geometry-aware Face Completion and Editing NETwork (FCENet) by systematically studying facial geometry from the unmasked region. Firstly, a facial geometry estimator is learned to estimate facial landmark heatmaps and parsing maps from the unmasked face image. Then, an encoder-decoder generator serves to complete the face image and disentangle its mask areas conditioned on both the masked face image and the estimated facial geometry images. Besides, since manually labeled masks exhibit a low-rank property, a low-rank regularization term is imposed on the disentangled masks, enforcing our completion network to handle occlusion areas of various shapes and sizes. Furthermore, our network can generate diverse results from the same masked input by modifying the estimated facial geometry, which provides a flexible means to edit the completed face appearance. Extensive experimental results qualitatively and quantitatively demonstrate that our network is able to generate visually pleasing face completion results and edit face attributes as well.
【Keywords】:
【Paper Link】 【Pages】:2514-2521
【Authors】: Nicholay Topin ; Manuela Veloso
【Abstract】: Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|^2 · |tr samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
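A minimal sketch of the Markov-chain construction over abstract states, assuming an abstraction function is already available (the paper derives its abstraction from the learned value function and feature importance, which is not reproduced here; all names below are illustrative):

```python
from collections import Counter, defaultdict

def build_abstract_policy_graph(transitions, abstract):
    """Estimate a Markov chain over abstract states from observed
    transitions. `transitions` is an iterable of (state, next_state)
    pairs gathered while following the policy; `abstract` maps a
    concrete state to its abstract state."""
    counts = defaultdict(Counter)
    for s, s_next in transitions:
        counts[abstract(s)][abstract(s_next)] += 1
    # Normalise outgoing counts into transition probabilities.
    return {a: {b: n / sum(succ.values()) for b, n in succ.items()}
            for a, succ in counts.items()}

# Toy usage: abstract a 2-D grid state by whether it crossed x >= 1.
trace = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 2))]
print(build_abstract_policy_graph(trace, lambda s: s[0] >= 1))
```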
【Keywords】:
【Paper Link】 【Pages】:2522-2530
【Authors】: Vaibhav V. Unhelkar ; Julie A. Shah
【Abstract】: Artificial agents that interact with other (human or artificial) agents require models in order to reason about those other agents’ behavior. In addition to the predictive utility of these models, maintaining a model that is aligned with an agent’s true generative model of behavior is critical for effective human-agent interaction. In applications wherein observations and partial specification of the agent’s behavior are available, achieving model alignment is challenging for a variety of reasons. For one, the agent’s decision factors are often not completely known; further, prior approaches that rely upon observations of agents’ behavior alone can fail to recover the true model, since multiple models can explain observed behavior equally well. To achieve better model alignment, we provide a novel approach capable of learning aligned models that conform to partial knowledge of the agent’s behavior. Central to our approach are a factored model of behavior (AMM), along with Bayesian nonparametric priors, and an inference approach capable of incorporating partial specifications as constraints for model learning. We evaluate our approach in experiments and demonstrate improvements in metrics of model alignment.
【Keywords】:
【Paper Link】 【Pages】:2531-2538
【Authors】: Loïs Vanhée ; Laurent Jeanpierre ; Abdel-Illah Mouaddib
【Abstract】: This paper introduces Advice-MDPs, an expansion of Markov Decision Processes for generating policies that take into consideration advice on the desirability, undesirability, and prohibition of certain states and actions. Advice-MDPs enable the design of semi-autonomous systems (systems that require operator support for handling at least certain situations) that can efficiently handle unexpected complex environments. Operators, through advising, can augment the planning model to cover unexpected real-world irregularities. This advising can swiftly augment the degree of autonomy of the system, so it can work without subsequent human intervention. This paper details the Advice-MDP formalism, a fast Advice-MDP resolution algorithm, and its applicability to real-world tasks, via the design of a professional-class semi-autonomous robot system ready to be deployed in a wide range of unexpected environments and capable of efficiently integrating operator advice.
【Keywords】:
【Paper Link】 【Pages】:2539-2546
【Authors】: Sandareka Wickramanayake ; Wynne Hsu ; Mong-Li Lee
【Abstract】: Explaining the decisions of a Deep Learning Network is imperative to safeguard end-user trust. Such explanations must be intuitive, descriptive, and faithfully explain why a model makes its decisions. In this work, we propose a framework called FLEX (Faithful Linguistic EXplanations) that generates post-hoc linguistic justifications to rationalize the decision of a Convolutional Neural Network. FLEX explains a model’s decision in terms of features that are responsible for the decision. We derive a novel way to associate such features to words, and introduce a new decision-relevance metric that measures the faithfulness of an explanation to a model’s reasoning. Experiment results on two benchmark datasets demonstrate that the proposed framework can generate discriminative and faithful explanations compared to state-of-the-art explanation generators. We also show how FLEX can generate explanations for images of unseen classes as well as automatically annotate objects in images.
【Keywords】:
【Paper Link】 【Pages】:2547-2554
【Authors】: Ziyu Yao ; Xiujun Li ; Jianfeng Gao ; Brian M. Sadler ; Huan Sun
【Abstract】: Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via a multi-turn dialogue, on an important type of programs called “If-Then recipes.” We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing performance with minimal questions to the user. Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents.
【Keywords】:
【Paper Link】 【Pages】:2556-2563
【Authors】: Abhay M. S. Aradhya ; Aditya Joglekar ; Sundaram Suresh ; Mahardhika Pratama
【Abstract】: Analysis of resting-state functional Magnetic Resonance Imaging (rs-fMRI) data has been a challenging problem due to high homogeneity, large intra-class variability, limited samples, and differences in acquisition technologies and techniques. These issues are predominant in the case of Attention Deficit Hyperactivity Disorder (ADHD). In this paper, we propose a new Deep Transformation Method (DTM) that extracts the discriminant latent feature space from rs-fMRI and projects it into the subsequent layer for classification of rs-fMRI data. The hidden transformation layer in DTM projects the original rs-fMRI data into a new space using the learning policy and extracts the spatio-temporal correlations of the functional activities as a latent feature space. The subsequent convolution and decision layers transform the latent feature space into high-level features and provide accurate classification. The performance of DTM has been evaluated using the ADHD200 rs-fMRI benchmark data with cross-validation. The results show that the proposed DTM achieves a mean classification accuracy of 70.36%, an improvement of 8.25% over state-of-the-art methodologies. The improvement is due to concurrent analysis of the spatio-temporal correlations between the different regions of the brain, and the method can be easily extended to study other cognitive disorders using rs-fMRI. Further, brain network analysis has been conducted to identify differences in functional activities and the corresponding regions behind cognitive symptoms in ADHD.
【Keywords】:
【Paper Link】 【Pages】:2564-2571
【Authors】: Nan Cao ; Xin Yan ; Yang Shi ; Chaoran Chen
【Abstract】: Sketch drawings have played an important role in assisting humans in communication and creative design since ancient times. This has motivated the development of artificial intelligence (AI) techniques for automatically generating sketches based on user input. Sketch-RNN, a sequence-to-sequence variational autoencoder (VAE) model, was developed for this purpose and is known as a state-of-the-art technique. However, it suffers from limitations, including the generation of low-quality results and its inability to support multi-class generation. To address these issues, we introduce AI-Sketcher, a deep generative model for generating high-quality multi-class sketches. Our model improves drawing quality by employing a CNN-based autoencoder to capture the positional information of each stroke at the pixel level. It also introduces an influence layer to more precisely guide the generation of each stroke by directly referring to the training data. To support multi-class sketch generation, we provide a conditional vector that helps differentiate sketches of various classes. The proposed technique was evaluated on two large-scale sketch datasets, and the results demonstrate its power in generating high-quality sketches.
【Keywords】:
【Paper Link】 【Pages】:2572-2579
【Authors】: Lin Chen ; Lei Xu ; Shouhuai Xu ; Zhimin Gao ; Weidong Shi
【Abstract】: Bribery in elections (and computational social choice in general) is an important problem that has received a considerable amount of attention. In the classic bribery problem, the briber (or attacker) bribes some voters in an attempt to make the briber’s designated candidate win an election. In this paper, we introduce a novel variant of the bribery problem, “Election with Bribed Voter Uncertainty” or BVU for short, accommodating the uncertainty that the vote of a bribed voter may or may not be counted. This uncertainty occurs either because a bribed voter may not cast its vote for fear of being caught, or because a bribed voter is indeed caught and its vote is therefore discarded. As a first step towards ultimately understanding and addressing this important problem, we show that it does not admit any multiplicative O(1)-approximation algorithm modulo standard complexity assumptions. We further show that there is an approximation algorithm that returns a solution with an additive-ε error in FPT time for any fixed ε.
【Keywords】:
【Paper Link】 【Pages】:2580-2587
【Authors】: Xiao Guo ; Jongmoo Choi
【Abstract】: Human motion prediction from motion capture data is a classical problem in computer vision, and conventional methods take the holistic human body as input. These methods ignore the fact that, in various human activities, different body components (limbs and the torso) have distinctive characteristics in terms of their moving patterns. In this paper, we argue that local representations of different body components should be learned separately and, based on this idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time-step, local structure representations of the input (human body) are obtained via SkelNet’s branches of component-specific layers, and then a shared layer uses the local spatial representations to predict the future human pose. Our SkelNet is the first to use local structure representations for predicting human motion. For short-term human motion prediction, we propose a second network, named Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a Recurrent Neural Network, which have advantages in learning spatial and temporal dependencies for predicting human motion, respectively, and a feed-forward network that outputs the final estimation. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset, and the code is publicly available.
【Keywords】:
【Paper Link】 【Pages】:2588-2595
【Authors】: Yaman Kumar ; Rohit Jain ; Khwaja Mohd. Salik ; Rajiv Ratn Shah ; Yifang Yin ; Roger Zimmermann
【Abstract】: Lipreading has many potential applications, such as in surveillance and video conferencing. Despite this, most work on building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems with making lipreading a text-based classification task, such as its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading-to-audio system, namely Lipper, which models lipreading as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary, and speaker-independent settings. Further, we compare the delay values of Lipper with those of other speechreading systems in order to show the real-time nature of the audio produced. We also perform a user study on the produced audio in order to assess its level of comprehensibility.
【Keywords】:
【Paper Link】 【Pages】:2596-2603
【Authors】: Keting Lu ; Shiqi Zhang ; Xiaoping Chen
【Abstract】: Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues and the scarcity of successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but the vanilla HER is inapplicable to dialogue learning due to the implicit goals. In this work, we develop two complex HER methods providing different trade-offs between complexity and performance and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods outperform existing experience replay methods (as applied to deep Q-networks) in terms of learning rate.
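For readers unfamiliar with HER, the following sketch shows the vanilla relabeling mechanism the abstract starts from (the paper's contribution, adapting HER to dialogues with implicit goals, is not attempted here; the interface is assumed for illustration):

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Vanilla hindsight experience replay ("future" strategy): besides
    each original transition, store k copies whose goal is replaced by a
    state actually reached later in the same episode, so that a failed
    episode still yields useful reward signal. `episode` is a list of
    (state, action, next_state, goal) tuples, and `reward_fn(next_state,
    goal)` recomputes the reward for a substituted goal."""
    replay = []
    for t, (s, a, s_next, g) in enumerate(episode):
        replay.append((s, a, reward_fn(s_next, g), s_next, g))
        for _ in range(k):
            # Pretend a later achieved state was the goal all along.
            _, _, achieved, _ = random.choice(episode[t:])
            replay.append((s, a, reward_fn(s_next, achieved), s_next, achieved))
    return replay
```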
【Keywords】:
【Paper Link】 【Pages】:2604-2611
【Authors】: Rafael R. Padovani ; Lucas N. Ferreira ; Levi H. S. Lelis
【Abstract】: System accuracy is a crucial factor influencing user experience in intelligent interactive systems. Although accuracy is known to be important, little is known about the role of the system’s error distribution in user experience. In this paper we study, in the context of background music selection for tabletop games, how the error distribution of an intelligent system affects the user’s perceived experience. In particular, we show that supervised learning algorithms that solely optimize for prediction accuracy can make the system “indecisive”; that is, they can make the system’s errors sparsely distributed throughout the game session. We hypothesize that sparsely distributed errors can harm the users’ perceived experience and that it is preferable to use a model that is somewhat inaccurate but decisive rather than one that is accurate but often indecisive. In order to test our hypothesis, we introduce an ensemble approach with a restrictive voting rule that, instead of erring sparsely through time, errs consistently for a period of time. A user study in which people watched videos of Dungeons and Dragons sessions supports our hypothesis.
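The abstract does not spell out the voting rule, so the fragment below is only one plausible reading of “restrictive voting”, sketched under assumed names: the system keeps emitting its current label and switches only when a large quorum of ensemble members agrees on a different one, which concentrates errors into consistent stretches instead of scattering them through the session.

```python
from collections import Counter

def decisive_predict(ensemble, inputs, quorum=0.8):
    """Emit one label per input, but only switch away from the current
    label when at least `quorum` of the ensemble votes for another one."""
    current, outputs = None, []
    for x in inputs:
        votes = Counter(model(x) for model in ensemble)
        label, n = votes.most_common(1)[0]
        if current is None or (label != current and n >= quorum * len(ensemble)):
            current = label
        outputs.append(current)
    return outputs
```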
【Keywords】:
【Paper Link】 【Pages】:2612-2619
【Authors】: Hermann Schichl ; Meinolf Sellmann
【Abstract】: We consider the task of aggregating scores provided by experts, each of whom has scored only a subset of all objects to be rated. Since experts see only a subset of the objects, they lack global information on the overall quality of all objects, as well as on the global range in quality. The only reliable information we get from experts is therefore the relative scores over the objects that each of them has scored. We propose several variants of a new aggregation framework that takes this into account by computing consensual affine transformations of each expert’s scores to reach a globally balanced view. Numerical comparisons with other aggregation methods, such as rank-based methods, Kemeny-Young scoring, and a maximum likelihood estimator, show that the new method gives significantly better results in practice. Moreover, the computation is practically affordable and scales well even to larger numbers of experts and objects.
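The core idea, mapping each expert's scores onto a common scale by an affine transformation, can be sketched with a simple alternating least-squares scheme. This is an illustrative reconstruction rather than the authors' algorithm (which has several variants): the consensus is initialised as the per-object mean, each expert's scores are fitted to it by least squares, and the consensus is re-estimated from the transformed scores.

```python
import numpy as np
from collections import defaultdict

def affine_consensus(scores, iters=50):
    """`scores[e]` is a dict {object_id: raw_score} for expert e, each
    covering only a subset of objects (assumed: every expert scored at
    least two objects). Returns a consensus score per object."""
    objects = {o for d in scores for o in d}
    consensus = {o: np.mean([d[o] for d in scores if o in d]) for o in objects}
    for _ in range(iters):
        pooled = defaultdict(list)
        for d in scores:
            s = np.array([d[o] for o in d], dtype=float)
            c = np.array([consensus[o] for o in d])
            # Least-squares fit of consensus ~ a * raw_score + b.
            A = np.vstack([s, np.ones_like(s)]).T
            a, b = np.linalg.lstsq(A, c, rcond=None)[0]
            for o in d:
                pooled[o].append(a * d[o] + b)
        consensus = {o: float(np.mean(v)) for o, v in pooled.items()}
    return consensus
```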
【Keywords】:
【Paper Link】 【Pages】:2620-2627
【Authors】: Sicheng Zhao ; Chuang Lin ; Pengfei Xu ; Sendong Zhao ; Yuchen Guo ; Ravi Krishna ; Guiguang Ding ; Kurt Keutzer
【Abstract】: Deep neural networks excel at learning from large-scale labeled training data, but cannot generalize the learned knowledge well to new domains or datasets. Domain adaptation studies how to transfer models trained on one labeled source domain to another sparsely labeled or unlabeled target domain. In this paper, we investigate the unsupervised domain adaptation (UDA) problem in image emotion classification. Specifically, we develop a novel cycle-consistent adversarial model, termed CycleEmotionGAN, by enforcing emotional semantic consistency while adapting images cycle-consistently. By alternately optimizing the CycleGAN loss, the emotional semantic consistency loss, and the target classification loss, CycleEmotionGAN can adapt source domain images to have similar distributions to the target domain without using aligned image pairs. Simultaneously, the annotation information of the source images is preserved. Extensive experiments are conducted on the ArtPhoto and FI datasets, and the results demonstrate that CycleEmotionGAN significantly outperforms the state-of-the-art UDA approaches.
【Keywords】:
【Paper Link】 【Pages】:2629-2636
【Authors】: Yan Zhao ; Jinfu Xia ; Guanfeng Liu ; Han Su ; Defu Lian ; Shuo Shang ; Kai Zheng
【Abstract】: With the ubiquity of smart devices, Spatial Crowdsourcing (SC) has emerged as a new transformative platform that engages mobile users to perform spatio-temporal tasks by physically traveling to specified locations. Various SC techniques have thus been studied for performance optimization, among which one of the major challenges is how to assign workers the tasks that they are really interested in and willing to perform. In this paper, we propose a novel preference-aware spatial task assignment system based on workers’ temporal preferences, which consists of two components: History-based Context-aware Tensor Decomposition (HCTD) for modeling workers’ temporal preferences, and preference-aware task assignment. We model worker preferences with a three-dimensional tensor (worker-task-time). By supplementing the missing entries of the tensor through HCTD, with the assistance of historical data and two other context matrices, we recover worker preferences for different categories of tasks in different time slots. Several preference-aware task assignment algorithms are then devised, aiming to maximize the total number of task assignments at every time instance, in which we give higher priority to the workers who are more interested in the tasks. We conduct extensive experiments using a real dataset, verifying the practicability of our proposed methods.
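To make the tensor-completion component concrete, here is a minimal sketch of plain CP factorization fitted to observed worker-task-time entries by stochastic gradient descent. It shows only the core of what the abstract describes; HCTD additionally exploits historical data and two context matrices, which are omitted here, and all names are illustrative.

```python
import numpy as np

def cp_sgd(entries, shape, rank=8, lr=0.01, epochs=100):
    """Fit a rank-`rank` CP decomposition to the observed entries of a
    worker x task x time preference tensor. `entries` is a list of
    ((worker, task, time), value) observations; a missing entry is then
    predicted as sum(W[i] * T[j] * S[k])."""
    rng = np.random.default_rng(0)
    W, T, S = (rng.random((n, rank)) * 0.1 for n in shape)
    for _ in range(epochs):
        for (i, j, k), v in entries:
            err = np.sum(W[i] * T[j] * S[k]) - v
            gw, gt, gs = err * T[j] * S[k], err * W[i] * S[k], err * W[i] * T[j]
            W[i] -= lr * gw
            T[j] -= lr * gt
            S[k] -= lr * gs
    return W, T, S
```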
【Keywords】:
【Paper Link】 【Pages】:2638-2645
【Authors】: Erman Acar ; Massimo Benerecetti ; Fabio Mogavero
【Abstract】: In the design of complex systems, model-checking and satisfiability arise as two prominent decision problems. While model-checking requires the designed system to be provided in advance, satisfiability allows one to check whether such a system even exists. With very few exceptions, the second problem turns out to be harder than the first one from a complexity-theoretic standpoint. In this paper, we investigate the connection between the two problems for a non-trivial fragment of Strategy Logic (SL, for short). SL extends LTL with first-order quantification over strategies, thus allowing one to explicitly reason about the strategic abilities of agents in a multi-agent system. Satisfiability for the full logic is known to be highly undecidable, while model-checking is non-elementary. The SL fragment we consider is obtained by preventing strategic quantification within the scope of temporal operators. The resulting logic is quite powerful, still allowing one to express important game-theoretic properties of multi-agent systems, such as the existence of Nash and immune equilibria, as well as to formalize the rational synthesis problem. We show that satisfiability for this fragment is PSPACE-complete, while its model-checking complexity is 2EXPTIME-hard. The result is obtained by means of an elegant encoding of the problem into the satisfiability of conjunctive-binding first-order logic, a recently discovered decidable fragment of first-order logic.
【Keywords】:
【Paper Link】 【Pages】:2646-2653
【Authors】: Natasha Alechina ; Tomás Brázdil ; Giuseppe De Giacomo ; Paolo Felli ; Brian Logan ; Moshe Y. Vardi
【Abstract】: There has recently been increasing interest in using reactive synthesis techniques to automate the production of manufacturing process plans. Previous work has assumed that the set of manufacturing resources is known and fixed in advance. In this paper, we consider the more general problem of whether a controller can be synthesized given sufficient resources. In the unbounded setting, only the types of available manufacturing resources are given, and we want to know whether it is possible to manufacture a product using only resources of those types, and, if so, how many resources of each type are needed. We model manufacturing processes and facilities as transducers (automata with output), and show that the unbounded orchestration problem is decidable and the (Pareto) optimal set of resources necessary to manufacture a product is computable for uni-transducers. For multi-transducers, however, the problem is undecidable.
【Keywords】:
【Paper Link】 【Pages】:2654-2661
【Authors】: Medina Andresel ; Yazmin Ibáñez-García ; Magdalena Ortiz ; Mantas Simkus
【Abstract】: We advocate the use of ontologies for relaxing and restraining queries, so that they retrieve either more or fewer answers, enabling the exploration of a given dataset. We propose a set of rewriting rules to relax and restrain conjunctive queries (CQs) over datasets mediated by an ontology written in a dialect of DL-Lite with complex role inclusions (CRIs). The addition of CRIs enables the representation of knowledge about data involving ordered hierarchies of categories, in the style of multi-dimensional data models. Although CRIs in general destroy the first-order rewritability of CQs, we identify settings in which CQs remain rewritable.
【Keywords】:
【Paper Link】 【Pages】:2662-2669
【Authors】: Alexander Bagnall ; Gordon Stewart
【Abstract】: We present MLCERT, a novel system for doing practical mechanized proof of the generalization of learning procedures, bounding expected error in terms of training or test error. MLCERT is mechanized in that we prove generalization bounds inside the theorem prover Coq; thus the bounds are machine-checked by Coq’s proof checker. MLCERT is practical in that we extract learning procedures defined in Coq to executable code; thus procedures with proved generalization bounds can be trained and deployed in real systems. MLCERT is well documented and open source; thus we expect it to be usable even by those without Coq expertise. To validate MLCERT, which is compatible with external tools such as TensorFlow, we use it to prove generalization bounds on neural networks trained using TensorFlow on the extended MNIST dataset.
【Keywords】:
【Paper Link】 【Pages】:2670-2677
【Authors】: Ringo Baumann ; Gerhard Brewka
【Abstract】: This paper continues the rather recent line of research on the dynamics of non-monotonic formalisms. In particular, we consider semantic changes in Dung’s abstract argumentation formalism. One of the most studied problems in this context is the so-called enforcing problem, which is concerned with manipulating argumentation frameworks (AFs) such that a certain desired set of arguments becomes an extension. Here we study the inverse problem, namely the extension removal problem: is it possible – and if so, how – to modify a given argumentation framework in such a way that certain undesired extensions are no longer generated? Analogously to the well-known AGM paradigm, we develop an axiomatic approach to the removal problem, i.e., a certain set of axioms will determine suitable manipulations. Although contraction (that is, the elimination of a particular belief) is conceptually quite different from extension removal, there are surprisingly deep connections between the two: it turns out that postulates for removal can be directly obtained as reformulations of the AGM contraction postulates. We prove a series of formal results including conditional and unconditional existence and semantic uniqueness of removal operators as well as various impossibility results – and show possible ways out.
【Keywords】:
【Paper Link】 【Pages】:2678-2685
【Authors】: Sander Beckers ; Joseph Y. Halpern
【Abstract】: We consider a sequence of successively more restrictive definitions of abstraction for causal models, starting with a notion introduced by Rubenstein et al. (2017) called exact transformation that applies to probabilistic causal models, moving to a notion of uniform transformation that applies to deterministic causal models and does not allow differences to be hidden by the “right” choice of distribution, and then to abstraction, where the interventions of interest are determined by the map from low-level states to high-level states, and strong abstraction, which takes more seriously all potential interventions in a model, not just the allowed interventions. We show that procedures for combining micro-variables into macro-variables are instances of our notion of strong abstraction, as are all the examples considered by Rubenstein et al.
【Keywords】:
【Paper Link】 【Pages】:2686-2693
【Authors】: Bart Bogaerts
【Abstract】: Weighted abstract dialectical frameworks (wADFs) were recently introduced, extending abstract dialectical frameworks to incorporate degrees of acceptance. In this paper, we propose a different view on wADFs: we develop semantics for wADFs based on approximation fixpoint theory, an abstract algebraic theory designed to capture the semantics of various non-monotonic reasoning formalisms. Our formalism deviates from the original definition in some basic assumptions, the most fundamental being that we assume an ordering on acceptance degrees. We discuss the impact of the differences, the relationship between the two versions of the formalism, and the advantages each of the approaches offers. We furthermore study the complexity of various semantics.
【Keywords】:
【Paper Link】 【Pages】:2694-2702
【Authors】: Jori Bomanson ; Tomi Janhunen ; Antonius Weinzierl
【Abstract】: Answer-Set Programming (ASP) is an expressive rule-based knowledge-representation formalism. Lazy grounding is a solving technique that avoids the well-known grounding bottleneck of traditional ASP evaluation but is restricted to normal rules, severely limiting its expressive power. In this work, we introduce a framework to handle aggregates by normalizing them on demand during lazy grounding, hence relieving the restrictions of lazy grounding significantly. We term our approach lazy normalization and demonstrate its feasibility for different types of aggregates. Asymptotic behavior is analyzed and correctness of the presented lazy normalizations is shown. Benchmark results indicate that lazy normalization can bring up to exponential gains in space and time as well as enable ASP to be used in new application areas.
【Keywords】:
【Paper Link】 【Pages】:2703-2710
【Authors】: Blai Bonet ; Guillem Francès ; Hector Geffner
【Abstract】: Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as mappings of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the actions in the instances themselves, which are not necessarily common to other instances, but abstract actions that are defined on a set of common features. The formulation assumes that the features and the abstract actions are given. In this work, we address this limitation by showing how to learn them automatically. The resulting account of generalized planning combines learning and planning in a novel way: a learner, based on a Max SAT formulation, yields the features and abstract actions from sampled state transitions, and a FOND planner uses this information, suitably transformed, to produce the general plans. Correctness guarantees are given and experimental results on several domains are reported.
【Keywords】:
【Paper Link】 【Pages】:2711-2718
【Authors】: Stefan Borgwardt ; Ismail Ilkan Ceylan ; Thomas Lukasiewicz
【Abstract】: Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.
【Keywords】:
【Paper Link】 【Pages】:2719-2726
【Authors】: Camille Bourgaux ; Ana Ozaki
【Abstract】: Attributed description logic is a recently proposed formalism, targeted for graph-based representation formats, which enriches description logic concepts and roles with finite sets of attribute-value pairs, called annotations. One of the most important uses of annotations is to record provenance information. In this work, we first investigate the complexity of satisfiability and query answering for attributed DL-Lite_R ontologies. We then propose a new semantics, based on provenance semirings, for integrating provenance information with query answering. Finally, we establish complexity results for satisfiability and query answering under this semantics.
【Keywords】:
【Paper Link】 【Pages】:2727-2735
【Authors】: Andreas Bunte ; Benno Stein ; Oliver Niggemann
【Abstract】: This paper introduces a novel approach to Model-Based Diagnosis (MBD) for hybrid technical systems. Unlike existing approaches which normally rely on qualitative diagnosis models expressed in logic, our approach applies a learned quantitative model that is used to derive residuals. Based on these residuals a diagnosis model is generated and used for a root cause identification. The new solution has several advantages such as the easy integration of new machine learning algorithms into MBD, a seamless integration of qualitative models, and a significant speed-up of the diagnosis runtime. The paper at hand formally defines the new approach, outlines its advantages and drawbacks, and presents an evaluation with real-world use cases.
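As one concrete reading of the residual-based pipeline, the following sketch (all names are ours, not the paper's API) compares observations against a learned quantitative model: signals whose residual exceeds a threshold become symptoms, and the symptom set is matched against known fault signatures to identify root cause candidates.

```python
def diagnose(observed, predict, thresholds, fault_signatures):
    """`observed` maps signal name -> measured value, `predict(signal)`
    is the learned model's expected value, and `fault_signatures` maps a
    root cause to the set of signals it would disturb."""
    residuals = {s: abs(observed[s] - predict(s)) for s in observed}
    symptoms = {s for s, r in residuals.items() if r > thresholds[s]}
    # A root cause is a candidate if its signature covers every symptom.
    return [fault for fault, signature in fault_signatures.items()
            if symptoms and symptoms <= signature]
```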
【Keywords】:
【Paper Link】 【Pages】:2736-2743
【Authors】: David Carral ; Larry González ; Patrick Koopmann
【Abstract】: Ontology-based access to large datasets has recently gained a lot of attention. To access data efficiently, one approach is to rewrite the ontology into Datalog, and then use powerful Datalog engines to compute implicit entailments. Existing rewriting techniques support Description Logics (DLs) from ELH to Horn-SHIQ. We go one step further and present one such data-independent rewriting technique for Horn-SRIQ⊓, the extension of Horn-SHIQ that supports role chain axioms, an expressive feature prominently used in many real-world ontologies. We evaluated our rewriting technique on a large known corpus of ontologies. Our experiments show that the resulting rewritings are of moderate size, and that our approach is more efficient than state-of-the-art DL reasoners when reasoning with data-intensive ontologies.
【Keywords】:
【Paper Link】 【Pages】:2744-2751
【Authors】: Juan D. Correa ; Jin Tian ; Elias Bareinboim
【Abstract】: Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout the data-driven sciences since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection biases. The former stems from the systematic bias introduced during the treatment assignment, while the latter comes from the systematic bias during the collection of units into the sample. In this paper, we consider the problem of identifiability of causal effects when both confounding and selection biases are simultaneously present. We first investigate the problem of identifiability when all the available data is biased. We prove that the algorithm proposed by [Bareinboim and Tian, 2015] is, in fact, complete, namely, whenever the algorithm returns a failure condition, no identifiability claim about the causal relation can be made by any other method. We then generalize this setting to when, in addition to the biased data, another piece of external data is available, without bias. It may be the case that a subset of the covariates could be measured without bias (e.g., from census). We examine the problem of identifiability when a combination of biased and unbiased data is available. We propose a new algorithm that subsumes the current state-of-the-art method based on the back-door criterion.
【Keywords】:
【Paper Link】 【Pages】:2752-2759
【Authors】: Kristijonas Cyras ; Dimitrios Letsios ; Ruth Misener ; Francesca Toni
【Abstract】: Mathematical optimization offers highly-effective tools for finding solutions for problems with well-defined goals, notably scheduling. However, optimization solvers are often unexplainable black boxes whose solutions are inaccessible to users and which users cannot interact with. We define a novel paradigm using argumentation to empower the interaction between optimization solvers and users, supported by tractable explanations which certify or refute solutions. A solution can be from a solver or of interest to a user (in the context of ‘what-if’ scenarios). Specifically, we define argumentative and natural language explanations for why a schedule is (not) feasible, (not) efficient or (not) satisfying fixed user decisions, based on models of the fundamental makespan scheduling problem in terms of abstract argumentation frameworks (AFs). We define three types of AFs, whose stable extensions are in one-to-one correspondence with schedules that are feasible, efficient and satisfying fixed decisions, respectively. We extract the argumentative explanations from these AFs and the natural language explanations from the argumentative ones.
【Keywords】:
【Paper Link】 【Pages】:2760-2767
【Authors】: Daniel de Leng ; Fredrik Heintz
【Abstract】: Stream reasoning can be defined as incremental reasoning over incrementally-available information. The formula progression procedure for Metric Temporal Logic (MTL) makes use of syntactic formula rewritings to incrementally evaluate formulas against incrementally-available states. Progression however assumes complete state information, which can be problematic when not all state information is available or can be observed, such as in qualitative spatial reasoning tasks or in robotics applications. In those cases, there may be uncertainty as to which state out of a set of possible states represents the ‘true’ state. The main contribution of this paper is therefore an extension of the progression procedure that efficiently keeps track of all consistent hypotheses. The resulting procedure is flexible, allowing a trade-off between faster but approximate and slower but precise progression under uncertainty. The proposed approach is empirically evaluated by considering the time and space requirements, as well as the impact of permitting varying degrees of uncertainty.
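Since the paper builds on formula progression, a compact illustration may help. The sketch below implements classic progression for a propositional LTL core against complete states (metric intervals and the paper's hypothesis tracking are omitted); under incomplete information one would, in the simplest reading, progress the formula through every state consistent with the observation and keep the distinct outcomes as hypotheses.

```python
TRUE, FALSE = ('true',), ('false',)

def _and(f, g):
    if FALSE in (f, g): return FALSE
    return g if f == TRUE else f if g == TRUE else ('and', f, g)

def _or(f, g):
    if TRUE in (f, g): return TRUE
    return g if f == FALSE else f if g == FALSE else ('or', f, g)

def progress(phi, state):
    """Progress a formula through one complete state (a set of atoms).
    Formulas are tuples: ('atom', p), ('not', f), ('and', f, g),
    ('or', f, g), ('next', f), ('until', f, g). Returns the residual
    formula that the remainder of the trace must satisfy."""
    op = phi[0]
    if op in ('true', 'false'):
        return phi
    if op == 'atom':
        return TRUE if phi[1] in state else FALSE
    if op == 'not':
        f = progress(phi[1], state)
        return FALSE if f == TRUE else TRUE if f == FALSE else ('not', f)
    if op == 'and':
        return _and(progress(phi[1], state), progress(phi[2], state))
    if op == 'or':
        return _or(progress(phi[1], state), progress(phi[2], state))
    if op == 'next':
        return phi[1]
    if op == 'until':  # f U g == g or (f and X(f U g))
        return _or(progress(phi[2], state), _and(progress(phi[1], state), phi))
    raise ValueError(op)

# 'a U b' stays 'a U b' after a state where a holds but b does not.
assert progress(('until', ('atom', 'a'), ('atom', 'b')), {'a'}) == \
    ('until', ('atom', 'a'), ('atom', 'b'))
```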
【Keywords】:
【Paper Link】 【Pages】:2768-2775
【Authors】: Warren Del-Pinto ; Renate A. Schmidt
【Abstract】: Abductive reasoning generates explanatory hypotheses for new observations using prior knowledge. This paper investigates the use of forgetting, also known as uniform interpolation, to perform ABox abduction in description logic (ALC) ontologies. Non-abducibles are specified by a forgetting signature which can contain concept, but not role, symbols. The resulting hypotheses are semantically minimal and consist of a disjunction of ABox axioms. These disjuncts are each independent explanations, and are not redundant with respect to the background ontology or the other disjuncts, representing a form of hypothesis space. The observations and hypotheses handled by the method can contain atomic or complex ALC concepts, excluding role assertions, and are not restricted to Horn clauses. Two approaches to redundancy elimination are explored in practice: full and approximate. Using a prototype implementation, experiments were performed over a corpus of real-world ontologies to investigate the practicality of both approaches across several settings.
【Keywords】:
【Paper Link】 【Pages】:2776-2783
【Authors】: Heshan Du ; Natasha Alechina
【Abstract】: Several qualitative spatial logics used in reasoning about geospatial data have a sound and complete axiomatisation over metric spaces. It has been open whether the same axiomatisation is also sound and complete for 2D Euclidean spaces. We answer this question negatively by showing that the axiomatisations presented in (Du et al. 2013; Du and Alechina 2016) are not complete for 2D Euclidean spaces and, moreover, the logics are not finitely axiomatisable.
【Keywords】:
【Paper Link】 【Pages】:2784-2791
【Authors】: Jianfeng Du ; Jeff Z. Pan ; Sylvia Wang ; Kunxun Qi ; Yuming Shen ; Yu Deng
【Abstract】: This paper proposes a validation mechanism for newly added triples in a growing knowledge graph. Given a logical theory, a knowledge graph, a text corpus, and a new triple to be validated, this mechanism computes a sorted list of explanations for the new triple to facilitate the validation of it, where an explanation, called an abductive text evidence, is a set of pairs of the form (triple, window) where appending the set of triples on the left to the knowledge graph enforces entailment of the new triple under the logical theory, while every sentence window on the right which is contained in the text corpus explains to some degree why the triple on the left is true. From the angle of practice, a special class of abductive text evidences called TEP-based abductive text evidence is proposed, which is constructed from explanation patterns seen before in the knowledge graph. Accordingly, a method for computing the complete set of TEP-based abductive text evidences is proposed. Moreover, a method for sorting abductive text evidences based on distantly supervised learning is proposed. To evaluate the proposed validation mechanism, four knowledge graphs with logical theories are constructed from the four great classical masterpieces of Chinese literature. Experimental results on these datasets demonstrate the efficiency and effectiveness of the proposed mechanism.
【Keywords】:
【Paper Link】 【Pages】:2792-2800
【Authors】: Phan Minh Dung ; Phan Minh Thang ; Tran Cao Son
【Abstract】: We study defeasible knowledge bases with conditional preferences (DKBs). A DKB consists of a set of undisputed facts and a rule-based system that contains different types of rules: strict, defeasible, and preference. A major challenge in defining the semantics of DKBs lies in determining how conditional preferences interact with the attack relations between arguments, represented by rebuts and undercuts. We introduce the notions of preference attack relations, as sets of attacks between preference arguments and the rebuts or undercuts among arguments, as well as preference attack relation assignments, which map knowledge bases to preference attack relations. We present five rational properties (referred to as regular properties), namely the inconsistency-resolving, effective-rebuts, context-independence, attack-monotonicity, and link-orientation properties, generalizing the properties of the same names for the case of unconditional preferences. Preference attack relation assignments are defined as regular if they satisfy all regular properties. We show that the set of regular assignments forms a complete lower semilattice whose least element is referred to as the canonical preference attack relation assignment. The canonical attack relation assignment represents the semantics of preferences in defeasible knowledge bases as, intuitively, it could be viewed as being uniquely identified by the regular properties together with the principle of minimal removal of undesired attacks. We also present the normal preference attack relation assignment as an approximation of the canonical one.
【Keywords】:
【Paper Link】 【Pages】:2801-2808
【Authors】: Wolfgang Dvorák ; Stefan Woltran
【Abstract】: Abstract argumentation frameworks have been introduced by Dung as part of an argumentation process, where arguments and conflicts are derived from a given knowledge base. It is solely this relation between arguments that is then used in order to identify acceptable sets of arguments. A final step concerns the acceptance status of particular statements by reviewing the actual contents of the acceptable arguments. Complexity analysis of abstract argumentation so far has neglected this final step and is concerned with argument names instead of their contents, i.e. their claims. As we outline in this paper, this is not only a slight deviation but can lead to different complexity results. We, therefore, give a comprehensive complexity analysis of abstract argumentation under a claim-centric view and analyse the four main decision problems under seven popular semantics. In addition, we also address the complexity of common sub-classes and introduce novel parameterisations – which exploit the nature of claims explicitly – along with fixed-parameter tractability results.
【Keywords】:
【Paper Link】 【Pages】:2809-2816
【Authors】: Wolfgang Faber ; Michael Morak ; Stefan Woltran
【Abstract】: Epistemic Logic Programs (ELPs), that is, Answer Set Programming (ASP) extended with epistemic operators, have received renewed interest in recent years, which led to a flurry of new research, as well as efficient solvers. An important question is under which conditions a sub-program can be replaced by another one without changing the meaning, in any context. This problem is known as strong equivalence, and is well-studied for ASP. For ELPs, this question has been approached by embedding them into epistemic extensions of equilibrium logics. In this paper, we consider a simpler, more direct characterization that is directly applicable to the language used in state-of-the-art ELP solvers. This also allows us to give tight complexity bounds, showing that strong equivalence for ELPs remains coNP-complete, as for ASP. We further use our results to provide syntactic characterizations for tautological rules and rule subsumption for ELPs.
【Keywords】:
【Paper Link】 【Pages】:2817-2826
【Authors】: Liangda Fang ; Kewen Wang ; Zhe Wang ; Ximing Wen
【Abstract】: Modal logics are primary formalisms for multi-agent systems, but major reasoning tasks in such logics are intractable, which impedes applications of multi-agent modal logics such as automatic planning. One technique for tackling the intractability is to identify a fragment, called a normal form, of multi-agent logics such that it is expressive but tractable for reasoning tasks such as entailment checking, bounded conjunction transformation, and forgetting. For instance, DNF of propositional logic is tractable for these reasoning tasks. In this paper, we first introduce a notion of logical separability and then define a novel disjunctive normal form SDNF for the multi-agent logic Kn, which overcomes some shortcomings of existing approaches. In particular, we show that every modal formula in Kn can be equivalently cast as a formula in SDNF, that major reasoning tasks tractable in propositional DNF are also tractable in SDNF, and, moreover, that formulas in SDNF enjoy the property of logical separability. To demonstrate the usefulness of our approach, we apply SDNF to multi-agent epistemic planning. Finally, we extend these results to three more complex multi-agent logics Dn, K45n, and KD45n.
【Keywords】:
【Paper Link】 【Pages】:2827-2834
【Authors】: Johannes Klaus Fichte ; Markus Hecher ; Arne Meier
【Abstract】: In this paper, we consider counting and projected model counting of extensions in abstract argumentation for various semantics. When asking for projected counts, we are interested in counting the number of extensions of a given argumentation framework, where multiple extensions that are identical when restricted to the projected arguments count as only one projected extension. We establish classical complexity results and parameterized complexity results when the problems are parameterized by the treewidth of the undirected argumentation graph. To obtain upper bounds for counting projected extensions, we introduce novel algorithms that exploit small treewidth of the undirected argumentation graph of the input instance by dynamic programming (DP). Our algorithms run in time double or triple exponential in the treewidth, depending on the considered semantics. Finally, we take the exponential time hypothesis (ETH) into account and establish lower bounds for bounded-treewidth algorithms for counting extensions and projected extensions.
【Keywords】:
【Paper Link】 【Pages】:2835-2842
【Authors】: Tian Gao ; Jie Chen ; Vijil Chenthamarakshan ; Michael Witbrock
【Abstract】: Consider a general machine learning setting where the output is a set of labels or sequences. This output set is unordered and its size varies with the input. Whereas multi-label classification methods seem a natural first resort, they are not readily applicable to set-valued outputs because of the growth rate of the output space, and because conventional sequence generation does not reflect sets’ order-free nature. In this paper, we propose a unified framework—sequential set generation (SSG)—that can handle output sets of labels and sequences. SSG is a meta-algorithm that leverages any probabilistic learning method for label or sequence prediction, but employs a proper regularization such that a new label or sequence is generated repeatedly until the full set is produced. Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences. We perform experiments with both benchmark and synthetic datasets and demonstrate SSG’s strong performance over baseline methods.
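At inference time, the meta-algorithm reduces to a simple decoding loop. The rendering below is illustrative under an assumed interface (`model.next_element` is our invention, and the regularization the abstract mentions belongs in training, which is not shown): elements are generated one at a time, conditioned on the partial set rather than on a sequence, until a stop symbol closes the set.

```python
def generate_set(model, x, stop, max_size=50):
    """Sequentially decode an unordered output set: repeatedly ask the
    underlying probabilistic model for the next element given the input
    and the partial set, until the stop symbol appears."""
    out = set()
    for _ in range(max_size):
        y = model.next_element(x, frozenset(out))
        if y == stop:
            break
        out.add(y)  # set semantics: order and duplicates are irrelevant
    return out
```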
【Keywords】:
【Paper Link】 【Pages】:2843-2850
【Authors】: Ricardo Gonçalves ; Tomi Janhunen ; Matthias Knorr ; João Leite ; Stefan Woltran
【Abstract】: Modular programming facilitates the creation and reuse of large software, and has recently gathered considerable interest in the context of Answer Set Programming (ASP). In this setting, forgetting, or the elimination of middle variables no longer deemed relevant, is of importance as it allows one to, e.g., simplify a program, make it more declarative, or even hide some of its parts without affecting the consequences for those parts that are relevant. While forgetting in the context of ASP has been extensively studied, its known limitations make it unsuitable to be used in Modular ASP. In this paper, we present a novel class of forgetting operators and show that such operators can always be successfully applied in Modular ASP to forget all kinds of atoms – input, output and hidden – overcoming the impossibility results that exist for general ASP. Additionally, we investigate conditions under which this class of operators preserves the module theorem in Modular ASP, thus ensuring that answer sets of modules can still be composed, and how the module theorem can always be preserved if we further allow the reconfiguration of modules.
【Keywords】:
【Paper Link】 【Pages】:2851-2858
【Authors】: Joseph Y. Halpern ; Evan Piermont
【Abstract】: We develop a modal logic to capture partial awareness. The logic has three building blocks: objects, properties, and concepts. Properties are unary predicates on objects; concepts are Boolean combinations of properties. We take an agent to be partially aware of a concept if she is aware of the concept without being aware of the properties that define it. The logic allows for quantification over objects and properties, so that the agent can reason about her own unawareness. We then apply the logic to contracts, which we view as syntactic objects that dictate outcomes based on the truth of formulas. We show that when agents are unaware of some relevant properties, referencing concepts that agents are only partially aware of can improve welfare.
【Keywords】:
【Paper Link】 【Pages】:2859-2866
【Authors】: Pan Hu ; Boris Motik ; Ian Horrocks
【Abstract】: The seminaïve algorithm can be used to materialise all consequences of a datalog program, and it also forms the basis for algorithms that incrementally update a materialisation as the input facts change. Certain (combinations of) rules, however, can be handled much more efficiently using custom algorithms. To integrate such algorithms into a general reasoning approach that can handle arbitrary rules, we propose a modular framework for computing and maintaining a materialisation. We split a datalog program into modules that can be handled using specialised algorithms, and we handle the remaining rules using the seminaïve algorithm. We also present two algorithms for computing the transitive and the symmetric–transitive closure of a relation that can be used within our framework. Finally, we show empirically that our framework can handle arbitrary datalog programs while outperforming existing approaches, often by orders of magnitude.
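To illustrate the baseline the framework builds on, here is a minimal sketch of seminaïve evaluation for the transitive-closure program tc(x,z) :- edge(x,z) and tc(x,z) :- edge(x,y), tc(y,z): each round joins the base relation only with the facts newly derived in the previous round (the delta), not with the whole relation. The paper's point is that dedicated closure algorithms can outperform even this, so the sketch shows what those custom algorithms are compared against.

```python
def transitive_closure(edges):
    """Seminaive evaluation: `delta` holds only the facts derived in the
    previous round, so each join touches far fewer pairs than rejoining
    the full `tc` relation every iteration."""
    tc = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in edges for (y2, z) in delta if y == y2}
        delta = derived - tc  # keep only genuinely new facts
        tc |= delta
    return tc

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```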
【Keywords】:
【Paper Link】 【Pages】:2867-2875
【Authors】: Xuanxiang Huang ; Kehang Fang ; Liangda Fang ; Qingliang Chen ; Zhao-Rong Lai ; Linfeng Wei
【Abstract】: In this paper, we present a novel data structure for compact representation and effective manipulations of Boolean functions, called Bi-Kronecker Functional Decision Diagrams (BKFDDs). BKFDDs integrate the classical expansions (the Shannon and Davio expansions) and their bi-versions. Thus, BKFDDs are the generalizations of existing decision diagrams: BDDs, FDDs, KFDDs and BBDDs. Interestingly, under certain conditions, it is sufficient to consider the above expansions (the classical expansions and their bi-versions). By imposing reduction and ordering rules, BKFDDs are compact and canonical forms of Boolean functions. The experimental results demonstrate that BKFDDs outperform other existing decision diagrams in terms of sizes.
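For reference, the classical expansions the abstract builds on are standard and can be stated as follows, writing f_0 and f_1 for the cofactors of f under x = 0 and x = 1 (the bi-versions, which couple two variables as in BBDDs, are not reproduced here):

```latex
% Classical decomposition types on a variable x,
% with cofactors f_0 = f|_{x=0} and f_1 = f|_{x=1}:
\begin{align*}
  \text{Shannon:}        \quad f &= \overline{x}\, f_0 \lor x\, f_1 \\
  \text{positive Davio:} \quad f &= f_0 \oplus x\,(f_0 \oplus f_1) \\
  \text{negative Davio:} \quad f &= f_1 \oplus \overline{x}\,(f_0 \oplus f_1)
\end{align*}
```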
【Keywords】:
【Paper Link】 【Pages】:2876-2885
【Authors】: Saravanan Kandasamy ; Arnab Bhattacharyya ; Vasant G. Honavar
【Abstract】: Eliciting causal effects from interventions and observations is one of the central concerns of science, and increasingly, artificial intelligence. We provide an algorithm that, given a causal graph G, determines MIC(G), a minimum intervention cover of G, i.e., a minimum set of interventions that suffices for identifying every causal effect that is identifiable in a causal model characterized by G. We establish the completeness of do-calculus for computing MIC(G). MIC(G) effectively offers an efficient compilation of all of the information obtainable from all possible interventions in a causal model characterized by G. Minimum intervention cover finds applications in a variety of contexts including counterfactual inference, and generalizing causal effects across experimental settings. We analyze the computational complexity of minimum intervention cover and identify some special cases of practical interest in which MIC(G) can be computed in time that is polynomial in the size of G.
【Keywords】:
【Paper Link】 【Pages】:2886-2894
【Authors】: Phokion G. Kolaitis ; Lucian Popa ; Kun Qian
【Abstract】: In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
【Keywords】:
【Paper Link】 【Pages】:2895-2902
【Authors】: Fanshuang Kong ; Richong Zhang ; Yongyi Mao ; Ting Deng
【Abstract】: Embedding-based models for knowledge base completion have demonstrated great success and attracted significant research interest. In this work, we observe that existing embedding models all have their loss functions decomposed into atomic loss functions, each on a triple or a postulated edge in the knowledge graph. Such an approach essentially implies that, conditioned on the embeddings of the triple, whether the triple is factual is independent of the structure of the knowledge graph. Although arguably the embeddings of the entities and relation in the triple contain certain structural information about the knowledge base, we believe that the global information contained in the embeddings of the triple can be insufficient, and such an assumption is overly optimistic in heterogeneous knowledge bases. Motivated by this understanding, in this work we propose a new embedding model in which we discard the assumption that the embeddings of the entities and relation in a triple are a sufficient statistic for the triple’s factual existence. More specifically, the proposed model assumes that whether a triple is factual depends not only on the embedding of the triple but also on the embeddings of the entities and relations in a larger graph neighbourhood. In this model, attention mechanisms are constructed to select the relevant information in the graph neighbourhood so that irrelevant signals in the neighbourhood are suppressed. Termed locality-expanded neural embedding with attention (LENA), this model is tested on four standard datasets and compared with several state-of-the-art models for knowledge base completion. Extensive experiments suggest that LENA outperforms the existing models in virtually every metric.
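To make the neighbourhood-attention idea tangible, here is a purely illustrative numpy fragment, not LENA's actual scoring function (which the abstract does not specify): a TransE-style triple score is combined with an attention-weighted summary of neighbour embeddings, so neighbours aligned with the triple's translation residual dominate and irrelevant ones are suppressed.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def neighbourhood_score(h, r, t, neighbour_embs):
    """`h`, `r`, `t` are embeddings of head, relation, and tail; the
    neighbour embeddings summarise the triple's graph neighbourhood.
    Higher (less negative) scores indicate more plausible triples."""
    query = h + r - t                      # TransE-style residual
    att = softmax(np.array([n @ query for n in neighbour_embs]))
    context = sum(a * n for a, n in zip(att, neighbour_embs))
    return -np.linalg.norm(query) - np.linalg.norm(query - context)
```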
【Keywords】:
【Paper Link】 【Pages】:2903-2910
【Authors】: Patrick Koopmann
【Abstract】: We investigate ontology-based query answering for data that are both temporal and probabilistic, which might occur in contexts such as stream reasoning or situation recognition with uncertain data. We present a framework that allows us to represent temporal probabilistic data, and introduce a query language with which complex temporal and probabilistic patterns can be described. Specifically, this language combines conjunctive queries with operators from linear time logic as well as probability operators. We analyse the complexity of evaluating queries in this language in various settings. While in some cases combining the temporal and the probabilistic dimension in such a way comes at the cost of increased complexity, we also determine cases for which this increase can be avoided.
【Keywords】:
【Paper Link】 【Pages】:2911-2918
【Authors】: Nikhil Krishnaswamy ; Scott Friedman ; James Pustejovsky
【Abstract】: Many modern machine learning approaches require vast amounts of training data to learn new concepts; conversely, human learning often requires few examples—sometimes only one—from which the learner can abstract structural concepts. We present a novel approach to introducing new spatial structures to an AI agent, combining deep learning over qualitative spatial relations with various heuristic search algorithms. The agent extracts spatial relations from a sparse set of noisy examples of block-based structures, and trains convolutional and sequential models of those relation sets. To create novel examples of similar structures, the agent begins placing blocks on a virtual table, uses a CNN to predict the most similar complete example structure after each placement, an LSTM to predict the most likely set of remaining moves needed to complete it, and recommends one using heuristic search. We verify that the agent learned the concept by observing its virtual block-building activities, wherein it ranks each potential subsequent action toward building its learned concept. We empirically assess this approach with human participants’ ratings of the block structures. Initial results and qualitative evaluations of structures generated by the trained agent show where it has generalized concepts from the training data, which heuristics perform best within the search space, and how we might improve learning and execution.
【Keywords】:
【Paper Link】 【Pages】:2919-2928
【Authors】: Mark Law ; Alessandra Russo ; Elisa Bertino ; Krysia Broda ; Jorge Lobo
【Abstract】: In this paper we introduce an extension of context-free grammars called answer set grammars (ASGs). These grammars allow annotations on production rules, written in the language of Answer Set Programming (ASP), which can express context-sensitive constraints. We investigate the complexity of various classes of ASG with respect to two decision problems: deciding whether a given string belongs to the language of an ASG and deciding whether the language of an ASG is non-empty. Specifically, we show that the complexity of these decision problems can be lowered by restricting the subset of the ASP language used in the annotations. To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, we propose a learning task for inducing the annotations of an ASG. We characterise the complexity of this task and present an algorithm for solving it. An evaluation of a (prototype) implementation is also discussed.
【Keywords】:
【Paper Link】 【Pages】:2929-2937
【Authors】: Tiep Le ; Tran Cao Son ; Enrico Pontelli
【Abstract】: This paper proposes Multi-context System for Optimization Problems (MCS-OP) by introducing conditional cost-assignment bridge rules to Multi-context Systems (MCS). This novel feature facilitates the definition of a preorder among equilibria, based on the total incurred cost of applied bridge rules. As an application of MCS-OP, the paper describes how MCS-OP can be used in modeling Distributed Constraint Optimization Problems (DCOP), a prominent class of distributed optimization problems that is frequently employed in multi-agent system (MAS) research. The paper shows, by means of an example, that MCS-OP is more expressive than DCOP, and hence, could potentially be useful in modeling distributed optimization problems which cannot be easily dealt with using DCOPs. It also contains a complexity analysis of MCS-OP.
【Keywords】:
【Paper Link】 【Pages】:2938-2945
【Authors】: Tuomo Lehtonen ; Johannes Peter Wallner ; Matti Järvisalo
【Abstract】: Focusing on assumption-based argumentation (ABA) as a central structured formalism to AI argumentation, we propose a new approach to reasoning in ABA with and without preferences. While previous approaches apply either specialized algorithms or translate ABA reasoning to reasoning over abstract argumentation frameworks, we develop a direct approach by encoding ABA reasoning tasks in answer set programming. This significantly improves on the empirical performance of current ABA reasoning systems. We also give new complexity results for reasoning in ABA+, suggesting that the integration of preferential information into ABA results in increased problem complexity for several central argumentation semantics.
【Keywords】:
【Paper Link】 【Pages】:2946-2953
【Authors】: Jianwen Li ; Kristin Y. Rozier ; Geguang Pu ; Yueling Zhang ; Moshe Y. Vardi
【Abstract】: We present a SAT-based framework for LTLf (Linear Temporal Logic on Finite Traces) satisfiability checking. We use propositional SAT-solving techniques to construct a transition system for the input LTLf formula; satisfiability checking is then reduced to a path-search problem over this transition system. Furthermore, we introduce CDLSC (Conflict-Driven LTLf Satisfiability Checking), a novel algorithm that leverages information produced by propositional SAT solvers from both satisfiability and unsatisfiability results. Experimental evaluations show that CDLSC outperforms all other existing approaches for LTLf satisfiability checking, by demonstrating an approximate four-fold speed-up compared to the second-best solver.
【Keywords】:
【Paper Link】 【Pages】:2954-2961
【Abstract】: Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings independent from the corpus, while such information plays an important role in expressing the exact meanings of words for parataxis languages like Chinese. In this paper, after constructing the Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting the structured rational knowledge into distributed representation at morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template to create the instances as pseudo-sentences merely from the pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical information and tackle the data sparseness problem, the instance proliferation technique is applied based on similarity to expand the collection of pseudo-sentences. The distributed representation for morphemes can then be trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge.
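The pseudo-sentence idea can be sketched roughly as below, assuming a toy hand-made morpheme lexicon (the entries are invented for illustration) and the gensim (>=4) implementation of word2vec; the paper's actual template, instance proliferation, and Chinese word-formation ontology are not reproduced:

    from gensim.models import Word2Vec

    # Hypothetical lexicon: each morpheme maps to related morphemes drawn from
    # word-formation knowledge (names and structure are illustrative only).
    lexicon = {
        "water": ["river", "lake", "flow"],
        "river": ["water", "bank", "flow"],
        "lake":  ["water", "shore"],
    }

    # Template: turn each piece of morpheme knowledge into a pseudo-sentence.
    pseudo_sentences = [[m] + related for m, related in lexicon.items()]

    # Train distributed morpheme representations on the pseudo-sentences alone.
    model = Word2Vec(sentences=pseudo_sentences, vector_size=50,
                     window=5, min_count=1, sg=1, epochs=200)
    print(model.wv.most_similar("water"))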
【Keywords】:
【Paper Link】 【Pages】:2962-2969
【Authors】: Thomas Lukasiewicz ; Enrico Malizia ; Andrius Vaicenavicius
【Abstract】: Querying inconsistent ontological knowledge bases is an important problem in practice, for which several inconsistency-tolerant query answering semantics have been proposed, including query answering relative to all repairs, relative to the intersection of repairs, and relative to the intersection of closed repairs. In these semantics, one assumes that the input database is erroneous, and the notion of repair describes a maximally consistent subset of the input database, where different notions of maximality (such as subset and cardinality maximality) are considered. In this paper, we give a precise picture of the computational complexity of inconsistency-tolerant (Boolean conjunctive) query answering in a wide range of Datalog± languages under the cardinality-based versions of the above three repair semantics.
【Keywords】:
【Paper Link】 【Pages】:2970-2977
【Authors】: Daoming Lyu ; Fangkai Yang ; Bo Liu ; Steven Gustafson
【Abstract】: Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approaches and helps RL practitioners understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options. This framework features a planner – controller – meta-controller architecture, whose three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:2978-2985
【Authors】: Ke Ma ; Qianqian Xu ; Zhiyong Yang ; Xiaochun Cao
【Abstract】: In the absence of prior knowledge, ordinal embedding methods obtain new representations for items in a low-dimensional Euclidean space via a set of quadruple-wise comparisons. These ordinal comparisons often come from human annotators, and sufficient comparisons induce the success of classical approaches. However, collecting a large number of labeled data is known to be a hard task, and most of the existing work pays little attention to the generalization ability with insufficient samples. Meanwhile, recent progress in large margin theory discloses that rather than just maximizing the minimum margin, both the margin mean and variance, which characterize the margin distribution, are more crucial to the overall generalization performance. To address the issue of insufficient training samples, we propose a margin distribution learning paradigm for ordinal embedding, entitled Distributional Margin based Ordinal Embedding (DMOE). Precisely, we first define the margin for the ordinal embedding problem. Secondly, we formulate a concise objective function which avoids maximizing margin mean and minimizing margin variance directly but exhibits a similar effect. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to seek the optimal solution of DMOE effectively. Experimental studies on both simulated and real-world datasets are provided to show the effectiveness of the proposed algorithm.
【Keywords】:
【Paper Link】 【Pages】:2986-2994
【Authors】: Takanori Maehara ; Yuma Inoue
【Abstract】: Permutation is a fundamental combinatorial object appearing in various areas of mathematics, computer science, and artificial intelligence. In some applications, a subset of a permutation group must be maintained efficiently. In this study, we develop a new data structure, called the group decision diagram (GDD), to maintain a set of permutations. This data structure combines the zero-suppressed binary decision diagram with the computable subgroup chain of the permutation group. The data structure enables efficient operations, such as membership testing, set operations (e.g., union, intersection, and difference), and Cartesian product. Our experiments demonstrate that the data structure is 20–300 times faster than the existing methods when the permutation group is considerably smaller than the symmetric group, or when only subsets constructed by a few operations over generators are maintained.
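For contrast with the proposed GDD, a naive baseline supporting the same operations can be written directly over Python sets of permutation tuples; the decision-diagram representation replaces this explicit enumeration:

    from itertools import product

    # Naive baseline for maintaining a set of permutations, each a tuple
    # mapping position i to value p[i]; the GDD supports the same operations
    # without enumerating the permutations explicitly.
    A = {(1, 0, 2), (0, 1, 2)}
    B = {(0, 1, 2), (2, 1, 0)}

    print((1, 0, 2) in A)        # membership testing
    print(A | B, A & B, A - B)   # union, intersection, difference

    def compose(p, q):
        """Composition: apply q first, then p."""
        return tuple(p[q[i]] for i in range(len(q)))

    # Cartesian-product-style combination of two sets of permutations.
    print({compose(p, q) for p, q in product(A, B)})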
【Keywords】:
【Paper Link】 【Pages】:2995-3002
【Authors】: Stephanie McIntyre ; Alexander Borgida ; David Toman ; Grant E. Weddell
【Abstract】: Standard reasoning problems are complete for EXPTIME in common feature-based description logics—ones in which all roles are restricted to being functions. We show how to control conjunctions on left-hand-sides of subsumptions and use this restriction to develop a parameter-tractable algorithm for reasoning about knowledge base consistency. We then show how the resulting logic can simulate partial features, and present algorithms for efficient query answering in that setting.
【Keywords】:
【Paper Link】 【Pages】:3003-3010
【Authors】: Arindam Mitra ; Peter Clark ; Oyvind Tafjord ; Chitta Baral
【Abstract】: While in recent years machine learning (ML) based approaches have been popular for developing end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions. Proposed alternatives involve translating the question and the natural language text to a logical representation and then using logical reasoning. However, this alternative falters when the size of the text grows. To address this, we propose an approach that performs logical reasoning over premises written in natural language text. The proposed method uses recent features of Answer Set Programming (ASP) to call external NLP modules (which may be based on ML) that perform simple textual entailment. To test our approach, we develop a corpus based on life cycle questions and show that our system achieves up to 18% performance gain when compared to standard MCQ solvers.
【Keywords】:
【Paper Link】 【Pages】:3011-3018
【Authors】: Pavel Naumov ; Jia Tao
【Abstract】: There are multiple notions of coalitional responsibility. The focus of this paper is on the blameworthiness defined through the principle of alternative possibilities: a coalition is blamable for a statement if the statement is true, but the coalition had a strategy to prevent it. The main technical result is a sound and complete bimodal logical system that describes properties of blameworthiness in one-shot games.
【Keywords】:
【Paper Link】 【Pages】:3019-3026
【Authors】: Jandson S. Ribeiro ; Abhaya Nayak ; Renata Wassermann
【Abstract】: Belief change and non-monotonic reasoning are arguably different perspectives on the same phenomenon, namely, jettisoning of currently held beliefs in response to some incompatible evidence. Investigations in this area typically assume, among other things, that the underlying (background) logic is compact, that is, whatever can be inferred from a set of sentences X can be inferred from a finite subset of X. Recent research in the field shows that this compactness assumption can be dispensed with without inflicting much damage on the AGM paradigm of belief change. In this paper we investigate the impact of such relaxation on non-monotonic logics instead. In particular, we show that, when compactness is not guaranteed, while the bridge from the AGM paradigm of belief change to expectation logics remains unaffected, the “return trip” from expectation logics to the AGM paradigm is no longer guaranteed. We finally explore the conditions under which such a guarantee can be given.
【Keywords】:
【Paper Link】 【Pages】:3027-3035
【Authors】: Maarten Sap ; Ronan Le Bras ; Emily Allaway ; Chandra Bhagavatula ; Nicholas Lourie ; Hannah Rashkin ; Brendan Roof ; Noah A. Smith ; Yejin Choi
【Abstract】: We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., “if X pays Y a compliment, then Y will likely return the compliment”). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.
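A tiny illustration of the typed if-then structure, with the event and inferences invented in the style of the resource; the relation names (e.g., xIntent, oReact, oWant) follow the if-then types described in the paper:

    # ATOMIC-style knowledge as a plain mapping from (event, relation type)
    # to textual inferences; all strings here are illustrative examples.
    atomic = {
        ("PersonX pays PersonY a compliment", "xIntent"): ["to be nice"],
        ("PersonX pays PersonY a compliment", "oReact"):  ["flattered"],
        ("PersonX pays PersonY a compliment", "oWant"):   ["to return the compliment"],
    }

    def infer(event, relation):
        return atomic.get((event, relation), [])

    print(infer("PersonX pays PersonY a compliment", "oWant"))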
【Keywords】:
【Paper Link】 【Pages】:3036-3043
【Authors】: Md. Kamruzzaman Sarker ; Pascal Hitzler
【Abstract】: Concept Induction refers to the problem of creating complex Description Logic class descriptions (i.e., TBox axioms) from instance examples (i.e., ABox data). In this paper we look particularly at the case where both a set of positive and a set of negative instances are given, and complex class expressions are sought under which the positive but not the negative examples fall. Concept induction has found applications in ontology engineering, but existing algorithms have fundamental performance issues in some scenarios, mainly because a high number of invocations of an external Description Logic reasoner is usually required. In this paper we present a new algorithm for this problem which drastically reduces the number of reasoner invocations needed. While this comes at the expense of a more limited traversal of the search space, we show that our approach improves execution times by up to several orders of magnitude, while output correctness, measured as the amount of correct coverage of the input instances, remains reasonably high in many cases. Our approach should thus provide a strong alternative to existing systems, in particular in settings where other systems are prohibitively slow.
【Keywords】:
【Paper Link】 【Pages】:3044-3051
【Authors】: Haseeb Shah ; Johannes Villmow ; Adrian Ulges ; Ulrich Schwanecke ; Faisal Shafait
【Abstract】: We present a novel extension to embedding-based knowledge graph completion models which enables them to perform open-world link prediction, i.e. to predict facts for entities unseen in training based on their textual description. Our model combines a regular link prediction model learned from a knowledge graph with word embeddings learned from a textual corpus. After training both independently, we learn a transformation to map the embeddings of an entity’s name and description to the graph-based embedding space. In experiments on several datasets including FB20k, DBPedia50k and our new dataset FB15k-237-OWE, we demonstrate competitive results. In particular, our approach exploits the full knowledge graph structure even when textual descriptions are scarce, does not require joint training on graph and text, and can be applied to any embedding-based link prediction model, such as TransE, ComplEx and DistMult.
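The transformation step can be sketched as a linear least-squares map from text embeddings to pretrained graph embeddings; the shapes and data below are random stand-ins, and a plain linear map is only one variant of the transformation the paper learns:

    import numpy as np

    # Hypothetical setup: 1000 entities seen in both spaces, 300-d text
    # embeddings (e.g., averaged word vectors of name + description), and
    # 100-d graph embeddings from a pretrained link predictor.
    rng = np.random.default_rng(0)
    text_emb = rng.normal(size=(1000, 300))
    graph_emb = rng.normal(size=(1000, 100))

    # Learn a linear map W minimising ||text_emb @ W - graph_emb||^2.
    W, *_ = np.linalg.lstsq(text_emb, graph_emb, rcond=None)

    # An unseen entity is scored by projecting its text embedding into the
    # graph space and reusing the frozen link-prediction model there.
    unseen = rng.normal(size=(1, 300))
    projected = unseen @ W     # 100-d vector usable by the graph-based scorer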
【Keywords】:
【Paper Link】 【Pages】:3052-3059
【Authors】: Farhad Shakerin ; Gopal Gupta
【Abstract】: We present a heuristic-based algorithm to induce nonmonotonic logic programs that explain the behavior of XGBoost-trained classifiers. We use a technique based on the LIME approach to locally select the most important features contributing to the classification decision. Then, in order to explain the model’s global behavior, we propose the LIME-FOLD algorithm—a heuristic-based inductive logic programming (ILP) algorithm capable of learning nonmonotonic logic programs—which we apply to a transformed dataset produced by LIME. Our proposed approach is agnostic to the choice of the ILP algorithm. Our experiments with UCI standard benchmarks suggest a significant improvement in terms of classification evaluation metrics. Meanwhile, the number of induced rules decreases dramatically compared to ALEPH, a state-of-the-art ILP system.
【Keywords】:
【Paper Link】 【Pages】:3060-3067
【Authors】: Chao Shang ; Yun Tang ; Jing Huang ; Jinbo Bi ; Xiaodong He ; Bowen Zhou
【Abstract】: Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult, etc., to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and is scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embeddings by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Network (SACN) that combines the benefits of GCN and ConvE. SACN consists of an encoder of a weighted graph convolutional network (WGCN), and a decoder of a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes and edge relation types. It has learnable weights that adapt the amount of information from neighbors used in local aggregation, leading to more accurate embeddings of graph nodes. Node attributes in the graph are represented as additional nodes in the WGCN. The decoder Conv-TransE enables the state-of-the-art ConvE to be translational between entities and relations while keeping the same link prediction performance as ConvE. We demonstrate the effectiveness of the proposed SACN on the standard FB15k-237 and WN18RR datasets, and it gives about 10% relative improvement over the state-of-the-art ConvE in terms of HITS@1, HITS@3 and HITS@10.
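A minimal numpy sketch of a WGCN-style layer, assuming one adjacency matrix per relation type and a learnable scalar weight per relation; the actual SACN encoder, its normalization, and the Conv-TransE decoder are not reproduced:

    import numpy as np

    def wgcn_layer(H, adj_per_relation, alpha, W):
        """One weighted-GCN step: neighbours are aggregated per relation type,
        each relation's adjacency scaled by a learnable weight alpha[t]."""
        agg = sum(alpha[t] * adj_per_relation[t] @ H
                  for t in range(len(adj_per_relation)))
        return np.tanh((agg + H) @ W)     # self term plus linear transform

    # Toy graph: 4 nodes, 2 relation types, 8-d input and output features.
    rng = np.random.default_rng(0)
    A = [rng.integers(0, 2, size=(4, 4)).astype(float) for _ in range(2)]
    H = rng.normal(size=(4, 8))
    out = wgcn_layer(H, A, alpha=np.array([0.5, 1.5]), W=rng.normal(size=(8, 8)))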
【Keywords】:
【Paper Link】 【Pages】:3068-3075
【Authors】: Chengchao Shen ; Xinchao Wang ; Jie Song ; Li Sun ; Mingli Song
【Abstract】: With the rapid development of deep learning, an unprecedentedly large number of trained deep network models have become available online. Reusing such trained models can significantly reduce the cost of training new models from scratch, which may not even be feasible otherwise, as the annotations used for training the original networks are often unavailable to the public. We propose in this paper to study a new model-reusing task, which we term knowledge amalgamation. Given multiple trained teacher networks, each of which specializes in a different classification problem, the goal of knowledge amalgamation is to learn a lightweight student model capable of handling the comprehensive classification. We assume no annotations other than the outputs from the teacher models are available, and thus focus on extracting and amalgamating knowledge from the multiple teachers. To this end, we propose a pilot two-step strategy for the knowledge amalgamation task: learning first the compact feature representations from teachers and then the network parameters in a layer-wise manner so as to build the student model. We apply this approach to four public datasets and obtain very encouraging results: even without any human annotation, the obtained student model is competent to handle the comprehensive classification task and in most cases outperforms the teachers in their individual sub-tasks.
【Keywords】:
【Paper Link】 【Pages】:3076-3083
【Authors】: Marlo Souza ; Álvaro F. Moreira ; Renata Vieira
【Abstract】: AGM’s belief revision is one of the main paradigms in the study of belief change operations. In this context, belief bases (prioritised bases) have been largely used to specify the agent’s belief state - whether representing the agent’s ‘explicit beliefs’ or as a computational model for her belief state. While the connection of iterated AGM-like operations and their encoding in dynamic epistemic logics have been studied before, few works considered how well-known postulates from iterated belief revision theory can be characterised by means of belief bases and their counterpart in dynamic epistemic logic. This work investigates how priority graphs, a syntactic representation of preference relations deeply connected to prioritised bases, can be used to characterise belief change operators, focusing on well-known postulates of Iterated Belief Change. We provide syntactic representations of belief change operators in a dynamic context, as well as new negative results regarding the possibility of representing an iterated belief revision operation using transformations on priority graphs.
【Keywords】:
【Paper Link】 【Pages】:3084-3091
【Authors】: Roni Stern ; Brendan Juba
【Abstract】: Model-based diagnosis (MBD) is difficult to use in practice because it requires a model of the diagnosed system, which is often very hard to obtain. We explore theoretically how observing the system when it is in a normal state can provide information about the system that is sufficient to learn a partial system model that allows automated diagnosis. We analyze the number of observations needed to learn a model capable of finding faulty components in most cases. Then, we explore how knowing the system topology can help us to learn a useful model from the normal observations for settings in which many of the internal system variables cannot be observed. Unlike other data-driven methods, our learned model is safe, in the sense that subsystems identified as faulty are guaranteed to truly be faulty.
【Keywords】:
【Paper Link】 【Pages】:3092-3099
【Authors】: Przemyslaw Andrzej Walega ; Mark Kaminski ; Bernardo Cuenca Grau
【Abstract】: We study stream reasoning in datalogMTL—an extension of Datalog with metric temporal operators. We propose a sound and complete stream reasoning algorithm that is applicable to a fragment datalogMTLFP of datalogMTL, in which propagation of derived information towards past time points is precluded. Memory consumption in our algorithm depends both on the properties of the rule set and the input data stream; in particular, it depends on the distances between timestamps occurring in data. This is undesirable since these distances can be very small, in which case the algorithm may require large amounts of memory. To address this issue, we propose a second algorithm, where the size of the required memory becomes independent of the timestamps in the data at the expense of disallowing punctual intervals in the rule set. Finally, we provide tight bounds on the data complexity of standard query answering in datalogMTLFP without punctual intervals in rules, which yield a new PSPACE lower bound on the data complexity of full datalogMTL.
【Keywords】:
【Paper Link】 【Pages】:3100-3107
【Authors】: Jun Yuan ; Neng Gao ; Ji Xiang
【Abstract】: Embedding knowledge graphs (KGs) into continuous vector space is an essential problem in knowledge extraction. Current models continue to improve embeddings by focusing on discriminating relation-specific information from entities with increasingly complex feature engineering. We note that they ignore the inherent relevance between relations and try to learn a unique discriminative parameter set for each relation. Thus, these models potentially suffer from high time complexity and large numbers of parameters, preventing them from being efficiently applied to real-world KGs. In this paper, we follow the idea of parameter sharing to simultaneously learn more expressive features, reduce the number of parameters, and avoid complex feature engineering. Based on the gate structure from LSTM, we propose a novel model, TransGate, and develop a shared discriminate mechanism, resulting in almost the same space complexity as indiscriminate models. Furthermore, to develop a more effective and scalable model, we reconstruct the gate with weight vectors, giving our method time complexity comparable to that of indiscriminate models. We conduct extensive experiments on link prediction and triplet classification. Experiments show that TransGate not only outperforms state-of-the-art baselines, but also greatly reduces the number of parameters. For example, TransGate outperforms ConvE and R-GCN with 6x and 17x fewer parameters, respectively. These results indicate that parameter sharing is a superior way to further optimize embeddings and that TransGate finds a better trade-off between complexity and expressivity.
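The weight-vector gate can be sketched as below; the parameter names are illustrative and the exact parameterization in the paper may differ, but the point is that the gate parameters are elementwise vectors shared across relations rather than per-relation matrices:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gated_score(h, r, t, params):
        """TransE-style score on relation-gated entity embeddings; all gate
        parameters are elementwise vectors (no per-relation matrices)."""
        h_gate = sigmoid(params["v_h"] * h + params["w_h"] * r + params["b_h"])
        t_gate = sigmoid(params["v_t"] * t + params["w_t"] * r + params["b_t"])
        h_r, t_r = h * h_gate, t * t_gate       # discriminated representations
        return -np.linalg.norm(h_r + r - t_r)   # higher means more plausible

    rng = np.random.default_rng(0)
    d = 16
    params = {k: rng.normal(size=d)
              for k in ["v_h", "w_h", "b_h", "v_t", "w_t", "b_t"]}
    h, r, t = (rng.normal(size=d) for _ in range(3))
    print(gated_score(h, r, t, params))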
【Keywords】:
【Paper Link】 【Pages】:3108-3115
【Authors】: Hao Zhang ; Shuigeng Zhou ; Chuanxu Yan ; Jihong Guan ; Xin Wang
【Abstract】: This paper addresses two important issues in causality inference. One is how to reduce redundant conditional independence (CI) tests, which heavily impact the efficiency and accuracy of existing constraint-based methods. The other is how to construct the true causal graph from a set of Markov equivalence classes returned by these methods. For the first issue, we design a recursive decomposition approach where the original data (a set of variables) is first decomposed into three small subsets, each of which is then recursively decomposed into three smaller subsets until none of the subsets can be decomposed further. Consequently, redundant CI tests can be reduced by inferring causality from these subsets. The advantage of this decomposition scheme lies in two aspects: 1) it requires only low-order CI tests, and 2) it does not violate d-separation. Thus, the complete causality can be reconstructed by merging all the partial results of the subsets. For the second issue, we employ a regression-based conditional independence test to check CIs in linear non-Gaussian additive noise cases, which can identify more causal directions by x − E(x|Z) ⊥ Z (or y − E(y|Z) ⊥ Z). Therefore, causal direction learning is no longer limited by the number of returned V-structures and the consistent propagation. Extensive experiments show that the proposed method can not only substantially reduce redundant CI tests but also effectively distinguish the equivalence classes, and thus it is superior to state-of-the-art constraint-based methods in causality inference.
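The direction-finding idea can be sketched for the bivariate linear case: regress each variable on the other and check which direction leaves a residual independent of the regressor. The sketch below substitutes a crude heteroscedasticity proxy (correlation of the squared residual with the absolute regressor) for a proper independence test, so it is only a rough illustration of the principle, not the paper's test:

    import numpy as np
    from scipy import stats

    def residual(target, regressor):
        slope, intercept = np.polyfit(regressor, target, 1)
        return target - (slope * regressor + intercept)   # target - E(target|regressor)

    def infer_direction(x, y):
        dep_xy = abs(stats.pearsonr(residual(y, x) ** 2, np.abs(x))[0])
        dep_yx = abs(stats.pearsonr(residual(x, y) ** 2, np.abs(y))[0])
        # The causal direction is the one whose residual looks independent.
        return "x -> y" if dep_xy < dep_yx else "y -> x"

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 5000)               # non-Gaussian cause
    y = 2.0 * x + rng.uniform(-1, 1, 5000)     # additive non-Gaussian noise
    print(infer_direction(x, y))               # expected: x -> y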
【Keywords】:
【Paper Link】 【Pages】:3116-3124
【Authors】: Yizheng Zhao ; Ghadah Alghamdi ; Renate A. Schmidt ; Hao Feng ; Giorgos Stoilos ; Damir Juric ; Mohammad Khodadadi
【Abstract】: This paper explores how the logical difference between two ontologies can be tracked using a forgetting-based or uniform interpolation (UI)-based approach. The idea is that rather than computing all entailments of one ontology not entailed by the other ontology, which would be computationally infeasible, only the strongest entailments not entailed in the other ontology are computed. To overcome drawbacks of existing forgetting/uniform interpolation tools we introduce a new forgetting method designed for the task of computing the logical difference between different versions of large-scale ontologies. The method is sound and terminating, and can compute uniform interpolants for ALC-ontologies as large as SNOMED CT and NCIt. Our evaluation shows that the method can achieve considerably better success rates (>90%) and provides a feasible approach to computing the logical difference in large-scale ontologies, as a case study on different versions of SNOMED CT and NCIt ontologies shows.
【Keywords】:
【Paper Link】 【Pages】:3125-3132
【Authors】: Zili Zhou ; Shaowu Liu ; Guandong Xu ; Wu Zhang
【Abstract】: Multi-relation embedding is a popular approach to knowledge base completion that learns embedding representations of entities and relations to compute the plausibility of a missing triplet. The effectiveness of the embedding approach depends on the sparsity of the KB and fails for infrequent entities that appear only a few times. This paper addresses this issue by proposing a new model exploiting entity-independent transitive relation patterns, namely Transitive Relation Embedding (TRE). The TRE model alleviates the sparsity problem for prediction on infrequent entities while enjoying the generalisation power of embedding. Experiments on three public datasets against seven baselines showed the merits of TRE in terms of knowledge base completion accuracy as well as computational complexity.
【Keywords】:
【Paper Link】 【Pages】:3134-3142
【Authors】: David Abel ; Dilip Arumugam ; Kavosh Asadi ; Yuu Jinnai ; Michael L. Littman ; Lawson L. S. Wong
【Abstract】: State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal trade-off between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.
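Since the abstract builds on the classic Blahut-Arimoto algorithm, a plain rate-distortion version of that iteration is sketched below; the abstraction-specific distortion of the paper is not modeled, and the toy source and distortion matrix are arbitrary:

    import numpy as np

    def blahut_arimoto(p_x, d, beta, iters=200):
        """Classic Blahut-Arimoto iteration for the rate-distortion trade-off:
        given a source p(x), distortion matrix d[x, x_hat] and trade-off beta,
        alternate between the encoder q(x_hat|x) and the marginal q(x_hat)."""
        n, m = d.shape
        q_xhat = np.full(m, 1.0 / m)                  # initial code marginal
        for _ in range(iters):
            # Encoder update: q(x_hat|x) proportional to q(x_hat)*exp(-beta*d).
            enc = q_xhat[None, :] * np.exp(-beta * d)
            enc /= enc.sum(axis=1, keepdims=True)
            # Marginal update: q(x_hat) = sum_x p(x) q(x_hat|x).
            q_xhat = p_x @ enc
        return enc, q_xhat

    # Toy source with 4 symbols, 2 codewords, Hamming-style distortion.
    p_x = np.array([0.4, 0.3, 0.2, 0.1])
    d = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])
    enc, q = blahut_arimoto(p_x, d, beta=5.0)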
【Keywords】:
【Paper Link】 【Pages】:3143-3150
【Authors】: Karim T. Abou-Moustafa ; Csaba Szepesvári
【Abstract】: There is accumulating evidence in the literature that stability of learning algorithms is a key characteristic that permits a learning algorithm to generalize. Despite various insightful results in this direction, there seems to be an overlooked dichotomy in the type of stability-based generalization bounds we have in the literature. On one hand, the literature seems to suggest that exponential generalization bounds for the estimated risk, which are optimal, can only be obtained through stringent, distribution-independent and computationally intractable notions of stability such as uniform stability. On the other hand, it seems that weaker notions of stability such as hypothesis stability, although distribution dependent and more amenable to computation, can only yield polynomial generalization bounds for the estimated risk, which are suboptimal. In this paper, we address the gap between these two regimes of results. In particular, the main question we address here is whether it is possible to derive exponential generalization bounds for the estimated risk using a notion of stability that is computationally tractable and distribution dependent, but weaker than uniform stability. Using recent advances in concentration inequalities, and using a notion of stability that is weaker than uniform stability but distribution dependent and amenable to computation, we derive an exponential tail bound for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of the deleted estimate. Interestingly, we note that our final bound has similarities to previous exponential generalization bounds for the deleted estimate, in particular, the result of Bousquet and Elisseeff (2002) for the regression case.
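For reference, the deleted (leave-one-out) estimate that the bound concerns can be computed directly; the ridge learner below is just a stand-in for "a general learning rule":

    import numpy as np

    def deleted_estimate(X, y, fit, loss):
        """Leave-one-out ('deleted') risk estimate: train on all but the i-th
        example, evaluate on the held-out example, and average over i."""
        n = len(y)
        errs = []
        for i in range(n):
            mask = np.arange(n) != i
            h = fit(X[mask], y[mask])
            errs.append(loss(h(X[i:i + 1]), y[i:i + 1]))
        return float(np.mean(errs))

    def fit_ridge(X, y, lam=1.0):
        """Ridge regression as a simple, stable learning rule."""
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        return lambda Xt: Xt @ w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
    sq = lambda pred, target: float(np.mean((pred - target) ** 2))
    print(deleted_estimate(X, y, fit_ridge, sq))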
【Keywords】:
【Paper Link】 【Pages】:3151-3158
【Authors】: Arpit Agarwal ; Katharina Muelling ; Katerina Fragkiadaki
【Abstract】: We propose an exploration method that incorporates lookahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learnt and are unrolled forward during lookahead search. Policy search benefits from temporal abstraction during exploration, though the policy itself operates over low-level primitive actions, and thus the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration.
【Keywords】:
【Paper Link】 【Pages】:3159-3166
【Authors】: Rami Al-Rfou' ; Dokook Choe ; Noah Constant ; Mandy Guo ; Llion Jones
【Abstract】: LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model (Vaswani et al. 2017) with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
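The auxiliary-loss trick can be sketched in PyTorch as below, with a prediction head after every block and the intermediate losses down-weighted; the depth, widths and the 0.5 weight are illustrative, not the paper's 64-layer configuration or schedule, and the paper's auxiliary losses at intermediate sequence positions are omitted:

    import torch
    import torch.nn as nn

    class DeepCharLM(nn.Module):
        """Character LM with an auxiliary softmax head after every block,
        as a minimal sketch of intermediate-layer losses."""
        def __init__(self, vocab=256, dim=128, n_layers=4, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.blocks = nn.ModuleList(
                nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
                for _ in range(n_layers))
            self.heads = nn.ModuleList(nn.Linear(dim, vocab)
                                       for _ in range(n_layers))

        def forward(self, x, targets):
            T = x.size(1)
            mask = nn.Transformer.generate_square_subsequent_mask(T)  # causal
            h, losses = self.embed(x), []
            for block, head in zip(self.blocks, self.heads):
                h = block(h, src_mask=mask)
                logits = head(h)                     # predict next character
                losses.append(nn.functional.cross_entropy(
                    logits.reshape(-1, logits.size(-1)), targets.reshape(-1)))
            # Final-layer loss plus down-weighted intermediate losses.
            return losses[-1] + 0.5 * sum(losses[:-1])

    x = torch.randint(0, 256, (2, 16))
    targets = torch.randint(0, 256, (2, 16))   # next-character targets
    loss = DeepCharLM()(x, targets)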
【Keywords】:
【Paper Link】 【Pages】:3167-3174
【Authors】: Scott Alfeld ; Ara Vartanian ; Lucas Newman-Johnson ; Benjamin I. P. Rubinstein
【Abstract】: While machine learning systems are known to be vulnerable to data-manipulation attacks at both training and deployment time, little is known about how to adapt attacks when the defender transforms data prior to model estimation. We consider the setting where the defender Bob first transforms the data then learns a model from the result; Alice, the attacker, perturbs Bob’s input data prior to him transforming it. We develop a general-purpose “plug and play” framework for gradient-based attacks based on matrix differentials, focusing on ordinary least-squares linear regression. This allows learning algorithms and data transformations to be paired and composed arbitrarily: attacks can be adapted through the use of the chain rule—analogous to backpropagation on neural network parameters—to compositional learning maps. Best-response attacks can be computed through matrix multiplications from a library of attack matrices for transformations and learners. Our treatment of linear regression extends state-of-the-art attacks at training time by permitting the attacker to affect both features and targets optimally and simultaneously. We explore several transformations broadly used across machine learning, with a driving motivation for our work being autoregressive modeling. There, Bob transforms a univariate time series into a matrix of observations and a vector of target values which can then be fed into standard learners. Under this learning reduction, a perturbation from Alice to a single value of the time series affects features of several data points along with target values.
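The chain-rule idea can be sketched with autograd standing in for the paper's closed-form matrix differentials: differentiate Alice's objective through Bob's transformation and a (ridge-stabilised) least-squares fit. The row-normalisation transform, attack target and perturbation budget below are invented for illustration, and only the features are attacked here:

    import torch

    torch.manual_seed(0)
    raw = torch.randn(40, 3)                       # Bob's clean training data
    y = raw @ torch.tensor([1.0, -1.0, 0.5]) + 0.05 * torch.randn(40)
    X_test = torch.randn(10, 3)
    y_target = torch.zeros(10)                     # what Alice wants predicted

    delta = torch.zeros_like(raw, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=0.1)
    for _ in range(100):
        X = raw + delta                            # Alice's perturbation
        X = X / X.norm(dim=1, keepdim=True)        # Bob's transform (row norm)
        XtX = X.T @ X + 1e-6 * torch.eye(X.shape[1])
        w = torch.linalg.solve(XtX, X.T @ y)       # Bob's least-squares fit
        attack_loss = ((X_test @ w - y_target) ** 2).mean()
        opt.zero_grad()
        attack_loss.backward()                     # chain rule through transform + fit
        opt.step()
        with torch.no_grad():
            delta.clamp_(-0.2, 0.2)                # keep the perturbation small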
【Keywords】:
【Paper Link】 【Pages】:3175-3182
【Authors】: Abdul Fatir Ansari ; Harold Soh
【Abstract】: We address the problem of unsupervised disentanglement of latent representations learnt via deep generative models. In contrast to current approaches that operate on the evidence lower bound (ELBO), we argue that statistical independence in the latent space of VAEs can be enforced in a principled hierarchical Bayesian manner. To this effect, we augment the standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the latent code. By tuning the IW parameters, we are able to encourage (or discourage) independence in the learnt latent dimensions. Extensive experimental results on a range of datasets (2DShapes, 3DChairs, 3DFaces and CelebA) show that our approach outperforms the β-VAE and is competitive with the state-of-the-art FactorVAE. Our approach achieves significantly better disentanglement and reconstruction on a new dataset (CorrelatedEllipses) which introduces correlations between the factors of variation.
【Keywords】:
【Paper Link】 【Pages】:3183-3190
【Authors】: Chidubem Arachie ; Bert Huang
【Abstract】: We consider the task of training classifiers without labels. We propose a weakly supervised method—adversarial label learning—that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier’s error rate using projected primal-dual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning.
【Keywords】:
【Paper Link】 【Pages】:3191-3198
【Authors】: Mohammadreza Armandpour ; Patrick Ding ; Jianhua Huang ; Xia Hu
【Abstract】: Many recent network embedding algorithms use negative sampling (NS) to approximate a variant of the computationally expensive Skip-Gram neural network architecture (SGA) objective. In this paper, we provide theoretical arguments that reveal how NS can fail to properly estimate the SGA objective, and why it is not a suitable candidate for the network embedding problem as a distinct objective. We show NS can learn undesirable embeddings, as the result of the “Popular Neighbor Problem.” We use the theory to develop a new method “R-NS” that alleviates the problems of NS by using a more intelligent negative sampling scheme and careful penalization of the embeddings. R-NS is scalable to large-scale networks, and we empirically demonstrate the superiority of R-NS over NS for multi-label classification on a variety of real-world networks including social networks and language networks.
【Keywords】:
【Paper Link】 【Pages】:3199-3206
【Authors】: Kyohei Atarashi ; Subhransu Maji ; Satoshi Oyama
【Abstract】: Although kernel methods efficiently use feature combinations without computing them directly, they do not scale well with the size of the training dataset. Factorization machines (FMs) and related models, on the other hand, enable feature combinations efficiently, but their optimization generally requires solving a non-convex problem. We present random feature maps for the itemset kernel, which uses feature combinations, and includes the ANOVA kernel, the all-subsets kernel, and the standard dot product. Linear models using one of our proposed maps can be used as an alternative to kernel methods and FMs, resulting in better scalability during both training and evaluation. We also present theoretical results for a proposed map, discuss the relationship between factorization machines and linear models using a proposed map for the ANOVA kernel, and relate the proposed feature maps to prior work. Furthermore, we show that the maps can be calculated more efficiently by using a signed circulant matrix projection technique. Finally, we demonstrate the effectiveness of using the proposed maps for real-world datasets.
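For concreteness, the degree-2 ANOVA kernel targeted by one of the proposed maps can be computed exactly as follows; the random feature construction itself is not reproduced, this only shows the kernel being approximated and a standard pairwise identity:

    import numpy as np

    def anova2_naive(x, y):
        """Degree-2 ANOVA kernel: sum over all feature pairs i < j."""
        d = len(x)
        return sum(x[i] * y[i] * x[j] * y[j]
                   for i in range(d) for j in range(i + 1, d))

    def anova2_fast(x, y):
        """Same kernel via ((sum z)^2 - sum z^2) / 2 with z = x * y."""
        z = x * y
        return (z.sum() ** 2 - (z ** 2).sum()) / 2.0

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=6), rng.normal(size=6)
    assert np.isclose(anova2_naive(x, y), anova2_fast(x, y))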
【Keywords】:
【Paper Link】 【Pages】:3207-3214
【Authors】: Georgia Avarikioti ; Alain Ryser ; Yuyi Wang ; Roger Wattenhofer
【Abstract】: Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called r-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the runtime of approximating r-nets in high-dimensional spaces with ℓ1 and ℓ2 metrics. These algorithms are also used to improve a framework that provides approximate solutions to other high-dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., approximate kth-nearest neighbor distance, approximate Min-Max clustering, and approximate k-center clustering. In addition, we build an algorithm that approximates greedy permutations in time Õ((dn + n^(2−α)) · log Φ), where Φ is the spread of the input. This algorithm is used to approximate k-center with the same time complexity.
【Keywords】:
【Paper Link】 【Pages】:3215-3223
【Authors】: Wissam J. Baddar ; Yong Man Ro
【Abstract】: Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g., classifying facial expressions). In fact, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
【Keywords】:
【Paper Link】 【Pages】:3224-3231
【Authors】: Christopher Bartley ; Wei Liu ; Mark Reynolds
【Abstract】: One of the factors hindering the use of classification models in decision making is that their predictions may contradict expectations. In domains such as finance and medicine, the ability to include knowledge of monotone (nondecreasing) relationships is sought after to increase accuracy and user satisfaction. Random Forest being one of the most successful classifiers, attempts have been made to incorporate such knowledge into it. Ideally a solution would (a) maximise accuracy; (b) have low complexity and scale well; (c) guarantee global monotonicity; and (d) cater for multi-class problems. This paper first reviews the state-of-the-art from both the literature and statistical libraries, and identifies opportunities for improvement. A new rule-based method is then proposed, with a maximal-accuracy variant and a faster approximate variant. Simulated and real datasets are then used to perform the most comprehensive ordinal classification benchmarking in the monotone forest literature. The proposed approaches are shown to reduce the bias induced by monotonisation and thereby improve accuracy.
【Keywords】:
【Paper Link】 【Pages】:3232-3239
【Authors】: Ege Beyazit ; Jeevithan Alagurajah ; Xindong Wu
【Abstract】: We study the problem of online learning with varying feature spaces. The problem is challenging because, unlike traditional online learning problems, varying feature spaces can introduce new features or stop having some features without following a pattern. Other existing methods such as online streaming feature selection (Wu et al. 2013), online learning from trapezoidal data streams (Zhang et al. 2016), and learning with feature evolvable streams (Hou, Zhang, and Zhou 2017) are not capable of learning from arbitrarily varying feature spaces because they make assumptions about the feature space dynamics. In this paper, we propose a novel online learning algorithm OLVF to learn from data with arbitrarily varying feature spaces. The OLVF algorithm learns to classify the feature spaces and the instances from feature spaces simultaneously. To classify an instance, the algorithm dynamically projects the instance classifier and the training instance onto their shared feature subspace. The feature space classifier predicts the projection confidences for a given feature space. The instance classifier is updated by following the empirical risk minimization principle, and the strength of the constraints is scaled by the projection confidences. Afterwards, a feature sparsity method is applied to reduce the model complexity. Experiments on 10 datasets with varying feature spaces have been conducted to demonstrate the performance of the proposed OLVF algorithm. Moreover, experiments with trapezoidal data streams on the same datasets have been conducted to show that OLVF performs better than the state-of-the-art learning algorithm (Zhang et al. 2016).
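The shared-subspace projection can be sketched with a dictionary-keyed linear model; the perceptron-style update below is a simplification, and OLVF's feature-space classifier and confidence scaling are omitted:

    # Minimal sketch of prediction under varying feature spaces: the model is
    # a dict keyed by feature name, and each instance is scored only on the
    # shared (intersection) subspace.
    def predict(weights, instance):
        shared = weights.keys() & instance.keys()
        score = sum(weights[f] * instance[f] for f in shared)
        return 1 if score >= 0 else -1

    def update(weights, instance, label, lr=0.1):
        if predict(weights, instance) != label:        # perceptron-style update
            for f, v in instance.items():              # unseen features are added
                weights[f] = weights.get(f, 0.0) + lr * label * v

    weights = {}
    stream = [({"a": 1.0, "b": -0.5}, 1), ({"b": 2.0, "c": 1.0}, -1)]
    for inst, lab in stream:
        update(weights, inst, lab)
    print(weights)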
【Keywords】:
【Paper Link】 【Pages】:3240-3247
【Authors】: Akhilan Boopathy ; Tsui-Wei Weng ; Pin-Yu Chen ; Sijia Liu ; Luca Daniel
【Abstract】: Verifying the robustness of neural network classifiers has attracted great interest and attention due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding the minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound on the minimum distortion as a provable robustness guarantee is possible. However, most previous works only focused on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general – we can handle various architectures including convolutional layers, max-pooling layers, batch normalization layers, residual blocks, as well as general activation functions; our approach is efficient – by exploiting the special structure of convolutional layers, we achieve up to 17 and 11 times speed-up compared to the state-of-the-art certification algorithms (e.g. Fast-Lin, CROWN) and 366 times speed-up compared to the dual-LP approach, while our algorithm obtains similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms, e.g. Fast-Lin and CROWN. We demonstrate through extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed.
【Keywords】:
【Paper Link】 【Pages】:3248-3255
【Authors】: Cory J. Butz ; Jhonatan de S. Oliveira ; André E. dos Santos ; André L. Teixeira
【Abstract】: We give conditions under which convolutional neural networks (CNNs) define valid sum-product networks (SPNs). One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors, but can also suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, which we call deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. One salient feature of DCSPNs is that they are a rigorous probabilistic model. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and most probable explanation (MPE) inference. This allows an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a similar role to the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. Our preliminary results on image sampling are encouraging, since the DCSPN sampled images exhibit variability. Experiments on image completion show that DCSPNs significantly outperform competing methods by achieving several state-of-the-art mean squared error (MSE) scores in both left-completion and bottom-completion in benchmark datasets.
【Keywords】:
【Paper Link】 【Pages】:3256-3263
【Authors】: Xu Cai ; Yang Wu ; Guanbin Li ; Ziliang Chen ; Liang Lin
【Abstract】: FRAME (Filters, Random fields, And Maximum Entropy) is an energy-based descriptive model that synthesizes visual realism by capturing mutual patterns from structural input signals. Maximum likelihood estimation (MLE) is applied by default, yet it conventionally causes unstable training energy that wrecks the generated structures, a phenomenon that has remained unexplained. In this paper, we provide a new theoretical insight to analyze FRAME from the perspective of particle physics, ascribing this phenomenon to a KL-vanishing issue. In order to stabilize the energy dissipation, we propose an alternative Wasserstein distance in discrete time, based on the conclusion that the Jordan-Kinderlehrer-Otto (JKO) discrete flow approximates the KL discrete flow when the time step size tends to 0. Moreover, this metric still maintains the model’s statistical consistency. Quantitative and qualitative experiments have been conducted on several widely used datasets. The empirical studies evidence the effectiveness and superiority of our method.
【Keywords】:
【Paper Link】 【Pages】:3264-3271
【Authors】: Junyu Cao ; Wei Sun
【Abstract】: Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture the multiple interactions taking place between the platform and its user: upon receiving a message, a user decides on one of three actions: accept the message, skip and receive the next message, or abandon the platform. Based on user feedback, the platform dynamically learns users’ abandonment distribution and their valuations of messages to determine the length of the sequence and the order of the messages, while maximizing the cumulative payoff over a horizon of length T. We refer to this online learning task as the sequential choice bandit problem. For the offline combinatorial optimization problem, we give a polynomial-time algorithm. For the online problem, we propose an algorithm that balances exploration and exploitation, and characterize its regret bound. Lastly, we demonstrate how to extend the model with user contexts to incorporate personalization.
【Keywords】:
【Paper Link】 【Pages】:3272-3279
【Authors】: Miriam Cha ; Youngjune L. Gwon ; H. T. Kung
【Abstract】: We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching the text input. A key to our training methods is how to form positive and negative training examples with respect to the class label of a given image. Instead of selecting random training examples, we perform negative sampling based on the semantic distance from a positive example in the class. We evaluate our approach using the Oxford-102 flower dataset, adopting the inception score and multi-scale structural similarity index (MS-SSIM) metrics to assess discriminability and diversity of the generated images. The empirical results indicate greater diversity in the generated images, especially when we gradually select more negative training examples closer to a positive example in the semantic space.
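The negative-sampling rule can be sketched as selecting, for each positive example, the nearest examples from other classes in a semantic embedding space; the embeddings and labels below are random stand-ins for the text/image representations used in the paper:

    import numpy as np

    def pick_negatives(pos_idx, embeddings, labels, k=4):
        """Choose negatives from *other* classes that are closest to the
        positive example in the semantic (embedding) space."""
        dists = np.linalg.norm(embeddings - embeddings[pos_idx], axis=1)
        other = np.flatnonzero(labels != labels[pos_idx])
        return other[np.argsort(dists[other])[:k]]   # hardest negatives first

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 32))                 # stand-in embeddings
    labels = rng.integers(0, 5, size=100)
    print(pick_negatives(0, emb, labels))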
【Keywords】:
【Paper Link】 【Pages】:3280-3287
【Authors】: Sarath Chandar ; Chinnadhurai Sankar ; Eugene Vorontsov ; Samira Ebrahimi Kahou ; Yoshua Bengio
【Abstract】: Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training, as the sequence length increases. Gradients can be attenuated by transition operators and are attenuated or dropped by activation functions. Canonical architectures like LSTM alleviate this issue by skipping information through a memory mechanism. We propose a new recurrent architecture (Non-saturating Recurrent Unit; NRU) that relies on a memory mechanism but forgoes both saturating activation functions and saturating gates, in order to further alleviate vanishing gradients. In a series of synthetic and real world tasks, we demonstrate that the proposed model is the only model that performs among the top 2 models across all tasks with and without long-term dependencies, when compared against a range of other architectures.
【Keywords】:
【Paper Link】 【Pages】:3288-3295
【Authors】: Xiaobin Chang ; Yongxin Yang ; Tao Xiang ; Timothy M. Hospedales
【Abstract】: In this paper, a unified approach to transfer learning is presented that addresses several source and target domain label-space and annotation assumptions with a single model. It is particularly effective in handling a challenging case, where source and target label-spaces are disjoint, and outperforms alternatives in both unsupervised and semi-supervised settings. The key ingredient is a common representation termed the Common Factorised Space. It is shared between source and target domains, and trained with an unsupervised factorisation loss and a graph-based loss. With a wide range of experiments, we demonstrate the flexibility, relevance and efficacy of our method, both in the challenging cases with disjoint label spaces, and in the more conventional cases such as unsupervised domain adaptation, where the source and target domains share the same label-sets.
【Keywords】:
【Paper Link】 【Pages】:3296-3303
【Authors】: Chao Chen ; Zhihong Chen ; Boyuan Jiang ; Xinyu Jin
【Abstract】: Recently, considerable effort has been devoted to deep domain adaptation in the computer vision and machine learning communities. However, most existing work concentrates only on learning a shared feature representation by minimizing the distribution discrepancy across different domains. Since all domain alignment approaches can only reduce, but not remove, the domain shift, target domain samples distributed near the edge of the clusters, or far from their corresponding class centers, are easily misclassified by the hyperplane learned from the source domain. To alleviate this issue, we propose joint domain alignment and discriminative feature learning, which benefits both domain alignment and final classification. Specifically, an instance-based discriminative feature learning method and a center-based discriminative feature learning method are proposed, both of which guarantee domain-invariant features with better intra-class compactness and inter-class separability. Extensive experiments show that learning discriminative features in the shared feature space can significantly boost the performance of deep domain adaptation methods.
【Keywords】:
【Paper Link】 【Pages】:3304-3311
【Authors】: Chen Chen ; Haobo Wang ; Weiwei Liu ; Xingyuan Zhao ; Tianlei Hu ; Gang Chen
【Abstract】: Label embedding has been widely used as a method to exploit label dependency with dimension reduction in multi-label classification tasks. However, existing embedding methods attempt to extract label correlations directly, and thus they can easily be trapped by complex label hierarchies. To tackle this issue, we propose a novel Two-Stage Label Embedding (TSLE) paradigm that involves a Neural Factorization Machine (NFM) to jointly project features and labels into a latent space. In the encoding phase, we introduce a Twin Encoding Network (TEN) that digs out pairwise feature and label interactions in the first stage and then efficiently learns higher-order correlations with deep neural networks (DNNs) in the second stage. After the codewords are obtained, a set of hidden layers is applied to recover the output labels in the decoding phase. Moreover, we develop a novel learning model by leveraging a max-margin encoding loss and a label-correlation-aware decoding loss, and we adopt mini-batch Adam to optimize our learning model. Lastly, we also provide a kernel insight to better understand our proposed TSLE. Extensive experiments on various real-world datasets demonstrate that our proposed model significantly outperforms other state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:3312-3320
【Authors】: Haokun Chen ; Xinyi Dai ; Han Cai ; Weinan Zhang ; Xuejian Wang ; Ruiming Tang ; Yuzhou Zhang ; Yong Yu
【Abstract】: Reinforcement learning (RL) has recently been introduced to interactive recommender systems (IRS) because of its nature of learning from dynamic interactions and planning for long-run performance. As an IRS always has thousands of items to recommend (i.e., thousands of actions), most existing RL-based methods fail to handle such a large discrete action space and thus become inefficient. The existing work that tries to deal with the large discrete action space problem by utilizing the deep deterministic policy gradient framework suffers from the inconsistency between the continuous action representation (the output of the actor network) and the real discrete action. To avoid such inconsistency and achieve high efficiency and recommendation effectiveness, in this paper, we propose a Tree-structured Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree. Extensive experiments on carefully-designed environments based on two real-world datasets demonstrate that our model provides superior recommendation performance and significant efficiency improvement over state-of-the-art methods.
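The root-to-leaf formulation can be sketched in a few lines; the tree encoding and the per-node policy parameterization below are our own illustrative choices, not the TPGR implementation.

```python
# Sketch of tree-structured action selection in the spirit of TPGR: items
# sit at the leaves of a balanced tree and picking an item is a root-to-leaf
# walk, with one small policy per internal node. The data layout here is an
# illustrative assumption, not the authors' code.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pick_item(state, children, node_policies, rng):
    """Walk from the root to a leaf and return the leaf's id.

    children[node]: list of child node ids (empty list for an item leaf).
    node_policies[node]: weight matrix mapping the state to child logits.
    """
    node = 0                                        # root
    while children[node]:                           # until a leaf is reached
        probs = softmax(node_policies[node] @ state)
        node = children[node][rng.choice(len(probs), p=probs)]
    return node                                     # leaf id, mapped to an item
```

With branching factor d over N items, a walk touches about log_d N nodes, so selecting an item costs O(d log N) policy evaluations rather than O(N), which is where the claimed efficiency gain comes from.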
【Keywords】:
【Paper Link】 【Pages】:3321-3328
【Authors】: Kaixuan Chen ; Lina Yao ; Dalin Zhang ; Xiaojun Chang ; Guodong Long ; Sen Wang
【Abstract】: Semi-supervised learning is crucial for alleviating labelling burdens in people-centric sensing. However, human-generated data inherently suffer from distribution shift in semi-supervised learning due to the diverse biological conditions and behavior patterns of humans. To address this problem, we propose a generic distributionally robust model for semi-supervised learning on distributionally shifted data. Considering both the discrepancy and the consistency between the labeled data and the unlabeled data, we learn latent features that reduce person-specific discrepancy and preserve task-specific consistency. We evaluate our model on a variety of people-centric recognition tasks on real-world datasets, including intention recognition, activity recognition, muscular movement recognition and gesture recognition. The experimental results demonstrate that the proposed model outperforms the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:3329-3336
【Authors】: Shangyu Chen ; Wenya Wang ; Sinno Jialin Pan
【Abstract】: The advancement of deep models poses great challenges to real-world deployment because of the limited computational ability and storage space on edge devices. To solve this problem, existing works have made progress in pruning or quantizing deep models. However, most existing methods rely heavily on a supervised training process to achieve satisfactory performance, requiring large amounts of labeled training data, which may not be practical for real deployment. In this paper, we propose a novel layer-wise quantization method for deep neural networks, which only requires limited training data (1% of the original dataset). Specifically, we formulate parameter quantization for each layer as a discrete optimization problem, and solve it using the Alternating Direction Method of Multipliers (ADMM), which gives an efficient closed-form solution. We prove that the final performance drop after quantization is bounded by a linear combination of the reconstruction errors incurred at each layer. Based on this theorem, we propose an algorithm to quantize a deep neural network layer by layer with an additional weight update step to minimize the final error. Extensive experiments on benchmark deep models are conducted to demonstrate the effectiveness of our proposed method using 1% of the CIFAR10 and ImageNet datasets. Code is available at: https://github.com/csyhhu/L-DNQ
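The shape of the per-layer problem can be illustrated with a generic ADMM loop over a nearest-level projection; this is a sketch under our own assumptions about the objective, and it does not reproduce the paper's closed-form ADMM solution or its error-compensation step.

```python
# Sketch of layer-wise quantization as a discrete optimization, solved with
# a generic ADMM loop (nearest-level projection). Shows the problem shape
# only; the paper's closed-form solution and analysis are not reproduced.
import numpy as np

def project_to_codebook(w, codebook):
    """Snap each weight to its nearest allowed quantization level."""
    idx = np.argmin(np.abs(w[..., None] - codebook), axis=-1)
    return codebook[idx]

def admm_quantize(W, X, codebook, rho=1.0, iters=50):
    """Approximately solve  min_Q ||X W - X Q||^2  s.t.  Q in codebook.

    W: (d_in, d_out) layer weights; X: (n, d_in) activations from the small
    calibration set; codebook: 1-D NumPy array of allowed levels.
    """
    Q = project_to_codebook(W, codebook)
    U = np.zeros_like(W)                             # scaled dual variable
    A = X.T @ X + rho * np.eye(W.shape[0])
    for _ in range(iters):
        # continuous update: ridge-like least squares (closed form)
        Wc = np.linalg.solve(A, X.T @ (X @ W) + rho * (Q - U))
        # discrete update: projection onto the quantization levels
        Q = project_to_codebook(Wc + U, codebook)
        U = U + Wc - Q                               # dual update
    return Q
```

Quantizing layer by layer against activations from a small calibration set mirrors the abstract's setting of using only 1% of the data.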
【Keywords】:
【Paper Link】 【Pages】:3337-3346
【Authors】: Sheng-Wei Chen ; Chun-Nan Chou ; Edward Y. Chang
【Abstract】: For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method including: 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN where the activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear equation where the coefficient matrix is Kronecker-factored, symmetric and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
【Keywords】:
【Paper Link】 【Pages】:3347-3354
【Authors】: Shuo Chen ; Chen Gong ; Jian Yang ; Ying Tai ; Le Hui ; Jun Li
【Abstract】: The central problem for most existing metric learning methods is to find a suitable projection matrix on the differences of all pairs of data points. However, a single unified projection matrix can hardly characterize all data similarities accurately, as practical data are usually very complicated, and simply adopting one global projection matrix might ignore important local patterns hidden in the dataset. To address this issue, this paper proposes a novel method dubbed “Data-Adaptive Metric Learning” (DAML), which constructs a data-adaptive projection matrix for each data pair by selectively combining a set of learned candidate matrices. As a result, every data pair can obtain a specific projection matrix, enabling the proposed DAML to flexibly fit the training data and produce discriminative projection results. The model of DAML is formulated as an optimization problem which jointly learns candidate projection matrices and their sparse combination for every data pair. Nevertheless, the over-fitting problem may occur due to the large number of parameters to be learned. To tackle this issue, we adopt the Total Variation (TV) regularizer to align the scales of the data embeddings produced by all candidate projection matrices, so that the generated metrics of these learned candidates are generally comparable. Furthermore, we extend the basic linear DAML model to a kernelized version (denoted “KDAML”) to handle non-linear cases, and the Iterative Shrinkage-Thresholding Algorithm (ISTA) is employed to solve the optimization model. Intensive experimental results on various applications including retrieval, classification, and verification clearly demonstrate the superiority of our algorithm over other state-of-the-art metric learning methodologies.
【Keywords】:
【Paper Link】 【Pages】:3355-3362
【Authors】: Weijie Chen ; Yuan Zhang ; Di Xie ; Shiliang Pu
【Abstract】: Neuron pruning is an efficient method to compress a network into a slimmer one, reducing computational cost and storage overhead. Most state-of-the-art results are obtained in a layer-by-layer optimization mode, which discards the unimportant input neurons and uses the surviving ones to reconstruct output neurons that approximate the original ones. However, an unnoticed problem arises: the information loss accumulates as the number of layers increases, since the surviving neurons no longer encode the entire information as before. A better alternative is to propagate the entire useful information to reconstruct the pruned layer instead of directly discarding the less important neurons. To this end, we propose a novel Layer Decomposition-Recomposition Framework (LDRF) for neuron pruning, by which each layer's output information is recovered in an embedding space and then propagated to reconstruct the following pruned layers with useful information preserved. We mainly conduct our experiments on the ILSVRC-12 benchmark with VGG-16 and ResNet-50. Notably, our results before end-to-end fine-tuning are significantly superior owing to the information-preserving property of our proposed framework. With end-to-end fine-tuning, we achieve state-of-the-art results of 5.13× and 3× speed-up with only 0.5% and 0.65% top-5 accuracy drop respectively, outperforming existing neuron pruning methods.
【Keywords】:
【Paper Link】 【Pages】:3363-3370
【Authors】: Xuelu Chen ; Muhao Chen ; Weijia Shi ; Yizhou Sun ; Carlo Zaniolo
【Abstract】: Embedding models for deterministic Knowledge Graphs (KG) have been extensively studied, with the purpose of capturing latent semantic relations between entities and incorporating the structured knowledge they contain into machine learning. However, many KGs model uncertain knowledge, typically representing the inherent uncertainty of relation facts with a confidence score, and embedding such uncertain knowledge represents an unresolved challenge. Capturing uncertain knowledge will benefit many knowledge-driven applications such as question answering and semantic search by providing a more natural characterization of the knowledge. In this paper, we propose a novel uncertain KG embedding model, UKGE, which aims to preserve both structural and uncertainty information of relation facts in the embedding space. Unlike previous models that characterize relation facts with binary classification techniques, UKGE learns embeddings according to the confidence scores of uncertain relation facts. To further enhance the precision of UKGE, we also introduce probabilistic soft logic to infer confidence scores for unseen relation facts during training. We propose and evaluate two variants of UKGE based on different confidence score modeling strategies. Experiments are conducted on three real-world uncertain KGs via three tasks, i.e. confidence prediction, relation fact ranking, and relation fact classification. UKGE shows effectiveness in capturing uncertain knowledge by achieving promising results, and it consistently outperforms baselines on these tasks.
【Keywords】:
【Paper Link】 【Pages】:3371-3378
【Authors】: Zitai Chen ; Chuan Chen ; Zibin Zheng ; Yi Zhu
【Abstract】: Clustering on multilayer networks has been shown to be a promising approach to enhance clustering accuracy. Various multilayer network clustering algorithms assume that all networks derive from a latent clustering structure, and jointly learn the compatible and complementary information from different networks to excavate one shared underlying structure. However, such an assumption conflicts with many emerging real-life applications due to the existence of noisy/irrelevant networks. To address this issue, we propose Centroid-based Multilayer Network Clustering (CMNC), a novel approach which can divide irrelevant relationships into different network groups and uncover the cluster structure in each group simultaneously. The multilayer network is represented within a unified tensor framework for simultaneously capturing multiple types of relationships between a set of entities. By imposing the rank-(Lr,Lr,1) block term decomposition with nonnegativity, we obtain well-grounded interpretations of the multiple clustering results based on graph cut theory. Numerically, we transform this tensor decomposition problem into an unconstrained optimization problem, which can be solved efficiently under the nonlinear least squares (NLS) framework. Extensive experimental results on synthetic and real-world datasets show the effectiveness and robustness of our method against noise and irrelevant data.
【Keywords】:
【Paper Link】 【Pages】:3379-3386
【Authors】: Zitian Chen ; Yanwei Fu ; Kaiyu Chen ; Yu-Gang Jiang
【Abstract】: Given one or a few training instances of novel classes, the one-shot learning task requires that a classifier generalize to these novel classes. Directly training a one-shot classifier may suffer from the scarcity of training instances. Previous one-shot learning works investigate meta-learning or metric-based algorithms; in contrast, this paper proposes a Self-Training Jigsaw Augmentation (Self-Jig) method for one-shot learning. Particularly, we solve one-shot learning by directly augmenting the training images through leveraging the vast pool of unlabeled instances. Precisely, our proposed Self-Jig algorithm can synthesize new images from the labeled probe and unlabeled gallery images. The labels of gallery images are predicted to guide the augmentation process, which can be taken as a self-training scheme. Intrinsically, we argue that this provides a very useful way of directly generating massive amounts of training images for novel classes. Extensive experiments and an ablation study not only evaluate the efficacy but also reveal the insights of the proposed Self-Jig method.
【Keywords】:
【Paper Link】 【Pages】:3387-3395
【Authors】: Richard Cheng ; Gábor Orosz ; Richard M. Murray ; Joel W. Burdick
【Abstract】: Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
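In the simplest scalar-control case, a CBF-based safety filter reduces to a closed-form clamp; the following toy sketch (our construction, not the RL-CBF synthesis with its GP dynamics model) shows the mechanism of minimally modifying the RL action.

```python
# Toy illustration of a CBF safety filter for scalar control: the RL action
# is minimally modified so that the barrier condition h_dot >= -alpha * h(x)
# holds for control-affine dynamics x_dot = f(x) + g(x) * u. For scalar u
# the usual QP reduces to a closed-form clamp; this is our simplification,
# not the paper's GP-based RL-CBF algorithm.
def cbf_filter(u_rl, f, g, h, dh, alpha=1.0):
    """Return the action closest to u_rl that satisfies the CBF constraint.

    f, g: dynamics terms evaluated at the current state;
    h: barrier value (the state is safe iff h >= 0); dh: dh/dx at the state.
    """
    # Constraint, affine in u:  dh * (f + g * u) >= -alpha * h
    a = dh * g
    b = -alpha * h - dh * f
    if a == 0.0:
        return u_rl                    # u does not affect the constraint here
    bound = b / a
    return max(u_rl, bound) if a > 0 else min(u_rl, bound)
```

In higher dimensions the same projection becomes a small quadratic program solved at every step, with the learned GP model supplying f, g, and their uncertainty.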
【Keywords】:
【Paper Link】 【Pages】:3396-3403
【Authors】: Daeyoung Choi ; Wonjong Rhee
【Abstract】: Statistical characteristics of deep network representations, such as sparsity and correlation, are known to be relevant to the performance and interpretability of deep learning. When a statistical characteristic is desired, often an adequate regularizer can be designed and applied during the training phase. Typically, such a regularizer aims to manipulate a statistical characteristic over all classes together. For classification tasks, however, it might be advantageous to enforce the desired characteristic per class such that different classes can be better distinguished. Motivated by this idea, we design two class-wise regularizers that explicitly utilize class information: the class-wise Covariance Regularizer (cw-CR) and the class-wise Variance Regularizer (cw-VR). cw-CR aims to reduce the covariance of representations calculated from same-class samples to encourage feature independence. cw-VR is similar, but targets variance instead of covariance to improve feature compactness. For the sake of completeness, their counterparts without class information, the Covariance Regularizer (CR) and Variance Regularizer (VR), are considered together. The four regularizers are conceptually simple and computationally very efficient, and visualization shows that the regularizers indeed perform distinct representation shaping. In terms of classification performance, significant improvements over the baseline and L1/L2 weight regularization methods were found for 21 out of 22 tasks over popular benchmark datasets. In particular, cw-VR achieved the best performance for 13 tasks, including with ResNet-32/110.
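Reading the abstract literally, cw-VR penalizes the within-class variance of representations; a minimal NumPy rendering of that penalty (our reading, not the authors' code; in training this would be a differentiable term added to the loss) is:

```python
# Minimal rendering of a class-wise variance penalty in the spirit of cw-VR:
# for each class, penalize the variance of that class's representations.
import numpy as np

def class_wise_variance(reps, labels):
    """reps: (n, d) layer activations; labels: (n,) integer class ids."""
    classes = np.unique(labels)
    penalty = 0.0
    for c in classes:
        r = reps[labels == c]                          # same-class representations
        penalty += ((r - r.mean(axis=0)) ** 2).mean()  # within-class variance
    return penalty / len(classes)
```

cw-CR would instead penalize the off-diagonal entries of each class's covariance matrix, targeting feature independence rather than compactness.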
【Keywords】:
【Paper Link】 【Pages】:3404-3411
【Authors】: Andrew Cohen ; Xingye Qiao ; Lei Yu ; Elliot Way ; Xiangrong Tong
【Abstract】: We address the challenge of effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse exploration (DE) via conjugate policies. DE learns and deploys a set of conjugate policies which can be conveniently generated as a byproduct of conjugate gradient descent. We provide both theoretical and empirical results showing the effectiveness of DE at achieving exploration, improving policy performance, and the advantage of DE over exploration by random policy perturbations.
【Keywords】:
【Paper Link】 【Pages】:3412-3420
【Authors】: Eric Crawford ; Joelle Pineau
【Abstract】: There are many reasons to expect an ability to reason in terms of objects to be a crucial skill for any generally intelligent agent. Indeed, recent machine learning literature is replete with examples of the benefits of object-like representations: generalization, transfer to new tasks, and interpretability, among others. However, in order to reason in terms of objects, agents need a way of discovering and detecting objects in the visual world - a task which we call unsupervised object detection. This task has received significantly less attention in the literature than its supervised counterpart, especially in the case of large images containing many objects. In the current work, we develop a neural network architecture that effectively addresses this large-image, many-object setting. In particular, we combine ideas from Attend, Infer, Repeat (AIR), which performs unsupervised object detection but does not scale well, with recent developments in supervised object detection. We replace AIR’s core recurrent network with a convolutional (and thus spatially invariant) network, and make use of an object-specification scheme that describes the location of objects with respect to local grid cells rather than the image as a whole. Through a series of experiments, we demonstrate a number of features of our architecture: that, unlike AIR, it is able to discover and detect objects in large, many-object scenes; that it has a significant ability to generalize to images that are larger and contain more objects than images encountered during training; and that it is able to discover and detect objects with enough accuracy to facilitate non-trivial downstream processing.
【Keywords】:
【Paper Link】 【Pages】:3421-3428
【Authors】: Giovanni Da San Martino ; Alessandro Sperduti ; Fabio Aiolli ; Alessandro Moschitti
【Abstract】: Kernel methods are popular and effective techniques for learning on structured data, such as trees and graphs. One of their major drawbacks is the computational cost of making a prediction on an example, which manifests in the classification phase for batch kernel methods, and especially in online learning algorithms. In this paper, we analyze how to speed up the prediction when the kernel function is an instance of the Mapping Kernels, a general framework for specifying kernels for structured data which extends the popular convolution kernel framework. We theoretically study the general model, derive various optimization strategies and show how to apply them to popular kernels for structured data. Additionally, we provide reliable empirical evidence on a semantic role labeling task, a natural language classification task highly dependent on syntactic trees. The results show that our faster approach can clearly improve on standard kernel-based SVMs, which cannot run on very large datasets.
【Keywords】:
【Paper Link】 【Pages】:3429-3436
【Authors】: Songmin Dai ; Xiaoqiang Li ; Lu Wang ; Pin Wu ; Weiqin Tong ; Yimin Chen
【Abstract】: An instance with a bad mask might make a composite image that uses it look fake. This encourages us to learn segmentation by generating realistic composite images. To achieve this, we propose a novel framework that exploits a newly proposed prior, called the independence prior, based on Generative Adversarial Networks (GANs). The generator produces an image with multiple category-specific instance providers, a layout module and a composition module. First, each provider independently outputs a category-specific instance image with a soft mask. Then the provided instances' poses are corrected by the layout module. Lastly, the composition module combines these instances into a final image. Training with an adversarial loss and a penalty on mask area, each provider learns a mask that is as small as possible but large enough to cover a complete category-specific instance. Weakly supervised semantic segmentation methods widely use grouping cues that model the association between image parts, which are either artificially designed, learned with costly segmentation labels, or only modeled on local pairs. Unlike them, our method automatically models the dependence between any parts and learns instance segmentation. We apply our framework in two cases: (1) foreground segmentation on category-specific images with box-level annotation, and (2) unsupervised learning of instance appearances and masks with only one image of a homogeneous object cluster (HOC). We obtain appealing results in both tasks, which shows that the independence prior is useful for instance segmentation and that it is possible to learn instance masks with only one image in an unsupervised manner.
【Keywords】:
【Paper Link】 【Pages】:3437-3444
【Authors】: Sumanth Dathathri ; Sicun Gao ; Richard M. Murray
【Abstract】: Neural networks in real-world applications have to satisfy critical properties such as safety and reliability. The analysis of such properties typically requires extracting information through computing pre-images of the network transformations, but it is well-known that explicit computation of pre-images is intractable. We introduce new methods for computing compact symbolic abstractions of pre-images by computing their overapproximations and underapproximations through all layers. The abstraction of pre-images enables formal analysis and knowledge extraction without affecting standard learning algorithms. We use inverse abstractions to automatically extract simple control laws and compact representations for pre-images corresponding to unsafe outputs. We illustrate that the extracted abstractions are interpretable and can be used for analyzing complex properties.
【Keywords】:
【Paper Link】 【Pages】:3445-3453
【Authors】: Maria Dimakopoulou ; Zhengyuan Zhou ; Susan Athey ; Guido Imbens
【Abstract】: Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make them less prone to problems of estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match state-of-the-art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data.
【Keywords】:
【Paper Link】 【Pages】:3454-3461
【Authors】: Lizhong Ding ; Zhi Liu ; Yu Li ; Shizhong Liao ; Yong Liu ; Peng Yang ; Ge Yu ; Ling Shao ; Xin Gao
【Abstract】: We propose a framework for analyzing and comparing distributions without imposing any parametric assumptions via empirical likelihood methods. Our framework is used to study two fundamental statistical test problems: the two-sample test and the goodness-of-fit test. For the two-sample test, we need to determine whether two groups of samples are from different distributions; for the goodness-of-fit test, we examine how likely it is that a set of samples is generated from a known target distribution. Specifically, we propose empirical likelihood ratio (ELR) statistics for the two-sample test and the goodness-of-fit test, both of which are of linear time complexity and show higher power (i.e., the probability of correctly rejecting the null hypothesis) than the existing linear statistics for high-dimensional data. We prove the nonparametric Wilks’ theorems for the ELR statistics, which illustrate that the limiting distributions of the proposed ELR statistics are chi-square distributions. With these limiting distributions, we can avoid bootstraps or simulations to determine the threshold for rejecting the null hypothesis, which makes the ELR statistics more efficient than the recently proposed linear statistic, finite set Stein discrepancy (FSSD). We also prove the consistency of the ELR statistics, which guarantees that the test power goes to 1 as the number of samples goes to infinity. In addition, we experimentally demonstrate and theoretically analyze that FSSD has poor performance or even fails to test for high-dimensional data. Finally, we conduct a series of experiments to evaluate the performance of our ELR statistics as compared to state-of-the-art linear statistics.
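For orientation, the nonparametric Wilks phenomenon the abstract refers to takes the following classical (Owen-style) form when testing a mean; the paper's ELR statistics are different objects but obey the same kind of chi-square limit.

```latex
% Classical empirical likelihood ratio for a mean and its Wilks-type limit
% (Owen-style orientation only; the paper's ELR statistics differ).
\mathcal{R}(\mu) = \max\Big\{ \prod_{i=1}^{n} n p_i \;:\;
  \sum_{i} p_i x_i = \mu,\; \sum_{i} p_i = 1,\; p_i \ge 0 \Big\},
\qquad
-2 \log \mathcal{R}(\mu_0) \;\xrightarrow{\,d\,}\; \chi^2_d .
```

The chi-square limit is what lets the rejection threshold be read off a fixed distribution instead of being estimated by bootstrap or simulation, which is the efficiency advantage claimed over FSSD.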
【Keywords】:
【Paper Link】 【Pages】:3462-3469
【Authors】: Lizhong Ding ; Yong Liu ; Shizhong Liao ; Yu Li ; Peng Yang ; Yijie Pan ; Chao Huang ; Ling Shao ; Xin Gao
【Abstract】: Kernel selection is fundamental to the generalization performance of kernel-based learning algorithms. Approximate kernel selection is an efficient kernel selection approach that exploits the convergence property of the kernel selection criteria and the computational virtue of kernel matrix approximation. The convergence property is measured by the notion of approximate consistency. For the existing Nyström approximations, whose sampling distributions are independent of the specific learning task at hand, it is difficult to establish the strong approximate consistency. They mainly focus on the quality of the low-rank matrix approximation, rather than the performance of the kernel selection criterion used in conjunction with the approximate matrix. In this paper, we propose a novel Nyström approximate kernel selection algorithm by customizing a criterion-driven adaptive sampling distribution for the Nyström approximation, which adaptively reduces the error between the approximate and accurate criteria. We theoretically derive the strong approximate consistency of the proposed Nyström approximate kernel selection algorithm. Finally, we empirically evaluate the approximate consistency of our algorithm as compared to state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:3470-3477
【Authors】: Thang Doan ; João Monteiro ; Isabela Albuquerque ; Bogdan Mazoure ; Audrey Durand ; Joelle Pineau ; R. Devon Hjelm
【Abstract】: Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a generally coarse-grained view of the mode map, which forces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage.
【Keywords】:
【Paper Link】 【Pages】:3478-3485
【Authors】: Bo Dong ; Yang Gao ; Swarup Chandra ; Latifur Khan
【Abstract】: In supervised learning, availability of sufficient labeled data is of prime importance. Unfortunately, they are sparingly available in many real-world applications. Particularly when performing classification over a non-stationary data stream, unavailability of sufficient labeled data undermines the classifier's long-term performance by limiting its adaptability to changes in data distribution over time. Recently, studies in such settings have appealed to transfer learning techniques over a data stream while detecting drifts in data distribution over time. Here, the data stream is represented by two independent non-stationary streams, one containing labeled data instances (called the source stream) whose distribution is biased compared to that of the unlabeled data instances (called the target stream). The task of label prediction under this representation is called Multistream Classification, where instances in the two streams occur independently. While these studies have addressed various challenges in the multistream setting, they still suffer from large computational overhead, mainly due to the frequent bias correction and drift adaptation methods employed. In this paper, we focus on utilizing an alternative bias correction technique, called relative density-ratio estimation, which is known to be computationally faster. Importantly, we propose a novel mechanism to automatically learn an appropriate mixture of relative density that adapts to changes in the multistream setting over time. We theoretically study its properties and empirically demonstrate its superior performance, within a multistream framework called MSCRDR, on benchmark datasets by comparing with other competing methods.
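Relative density-ratio estimation is commonly realized in the RuLSIF style, fitting the α-relative ratio by regularized least squares with a closed-form solve; the sketch below is our simplified construction of that general family, not the MSCRDR code.

```python
# Sketch of relative density-ratio estimation in the RuLSIF style:
# estimate r_alpha(x) = p(x) / (alpha * p(x) + (1 - alpha) * q(x)) by
# regularized least squares over Gaussian basis functions. Simplified
# illustration of the technique family referenced above.
import numpy as np

def rulsif_fit(x_num, x_den, alpha=0.5, sigma=1.0, lam=1e-3):
    """x_num: samples from the numerator p (e.g. target stream);
    x_den: samples from the denominator q (e.g. source stream)."""
    centers = x_num[: min(100, len(x_num))]          # kernel centers
    def phi(x):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2.0 * sigma ** 2))
    Kn, Kd = phi(x_num), phi(x_den)
    H = alpha * (Kn.T @ Kn) / len(x_num) \
        + (1.0 - alpha) * (Kd.T @ Kd) / len(x_den)
    h = Kn.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(x) @ theta                  # estimated relative ratio
```

The closed-form solve is what makes this family cheap relative to iterative bias-correction schemes; the paper's adaptive mixture mechanism sits on top of such an estimator.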
【Keywords】:
【Paper Link】 【Pages】:3486-3493
【Authors】: Qi Dong ; Xiatian Zhu ; Shaogang Gong
【Abstract】: The objective learning formulation is essential to the success of convolutional neural networks. In this work, we thoroughly analyse the standard learning objective functions for multi-class classification CNNs: softmax regression (SR) for the single-label scenario and logistic regression (LR) for the multi-label scenario. Our analyses lead to the idea of exploiting LR for single-label classification learning, and then to the discovery of the negative class distraction problem in LR. To address this problem, we develop two novel LR-based objective functions that not only generalise the conventional LR but, importantly, turn out to be competitive alternatives to SR in single-label classification. Extensive comparative evaluations demonstrate the model learning advantages of the proposed LR functions over the commonly adopted SR in single-label coarse-grained object categorisation and cross-class fine-grained person instance identification tasks. We also show the performance superiority of our method on clothing attribute classification in comparison to the vanilla LR function. The code has been made publicly available.
【Keywords】:
【Paper Link】 【Pages】:3494-3501
【Authors】: Yonathan Efroni ; Gal Dalal ; Bruno Scherrer ; Shie Mannor
【Abstract】: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g., in AlphaZero (Silver et al. 2017b)). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves, while the information obtained at the root is not leveraged other than for updating the policy. Here, we question the potency of this approach. Namely, the latter procedure is non-contractive in general, and its convergence is not guaranteed. Our proposed enhancement is straightforward and simple: use the return from the optimal tree path to back up the values at the descendants of the root. This leads to a γ^h-contracting procedure, where γ is the discount factor and h is the tree depth. To establish our results, we first introduce a notion called multiple-step greedy consistency. We then provide convergence rates for two algorithmic instantiations of the above enhancement in the presence of noise injected into both the tree search stage and the value estimation stage.
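The γ^h contraction claim is easiest to see through the standard h-step lookahead Bellman operator (standard RL notation, stated here for orientation; the paper's operators also account for the injected noise).

```latex
% h-step (tree-depth h) lookahead Bellman operator and its contraction
% property (standard notation; the paper's noisy setting adds more detail).
(T^{h} v)(s) = \max_{\pi}\;
  \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{h-1} \gamma^{t} r(s_t, a_t)
  + \gamma^{h} v(s_h) \,\middle|\, s_0 = s \right],
\qquad
\lVert T^{h} v - T^{h} u \rVert_{\infty} \le \gamma^{h} \lVert v - u \rVert_{\infty}.
```

Backing up the return of the optimal tree path realizes this operator at the root's descendants, which is why the resulting procedure contracts at rate γ^h rather than being non-contractive.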
【Keywords】:
【Paper Link】 【Pages】:3502-3509
【Authors】: Qing En ; Lijuan Duan ; Zhaoxiang Zhang ; Xiang Bai ; Yundong Zhang
【Abstract】: We explore a principled method to address the weakly supervised detection problem. Many deep learning methods solve weakly supervised detection by mining various object proposal or pooling strategies, which may cause redundancy and generate coarse locations. To overcome this limitation, we propose a novel human-like active searching strategy that recurrently ignores the background and discovers class-specific objects by erasing undesired pixels from the image. The proposed detector acts as an agent, providing guidance to erase unremarkable regions and eventually concentrating the attention on the foreground. The proposed agents, which are composed of a deep Q-network and are trained by the Q-learning algorithm, analyze the contents of the image features to infer the localization action according to the learned policy. To the best of our knowledge, this is the first attempt to apply reinforcement learning to weakly supervised localization with only image-level labels. The proposed method is validated on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets. The experimental results show that the proposed method is capable of locating a single object within 5 steps and has great significance for research on weakly supervised localization with a human-like mechanism.
【Keywords】:
【Paper Link】 【Pages】:3510-3517
【Authors】: Lijie Fan ; Wenbing Huang ; Chuang Gan ; Junzhou Huang ; Boqing Gong
【Abstract】: The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip. In this paper, for the sake of both furthering this exploration and our own interest in a realistic application, we study image-to-video translation and particularly focus on videos of facial expressions. This problem challenges deep neural networks with an additional temporal dimension compared to image-to-image translation. Moreover, its single input image fails most existing video generation methods that rely on recurrent models. We propose a user-controllable approach so as to generate video clips of various lengths from a single face image. The lengths and types of the expressions are controlled by users. To this end, we design a novel neural network architecture that can incorporate the user input into its skip connections and propose several improvements to the adversarial training method for the neural network. Experiments and user studies verify the effectiveness of our approach. Especially, we would like to highlight that even for face images in the wild (downloaded from the Web and the authors' own photos), our model can generate high-quality facial expression videos of which about 50% are labeled as real by Amazon Mechanical Turk workers.
【Keywords】:
【Paper Link】 【Pages】:3518-3525
【Authors】: Jun-Peng Fang ; Min-Ling Zhang
【Abstract】: In partial multi-label learning (PML), each training example is associated with multiple candidate labels which are only partially valid. The task of PML naturally arises in learning scenarios with inaccurate supervision, and the goal is to induce a multi-label predictor which can assign a set of proper labels to an unseen instance. When learning from PML training examples, the training procedure is prone to being misled by the false positive labels concealed in the candidate label set. In light of this major difficulty, a novel two-stage PML approach is proposed which works by eliciting credible labels from the candidate label set for model induction. In this way, most false positive labels are expected to be excluded from the training procedure. Specifically, in the first stage, the labeling confidence of each candidate label for each PML training example is estimated via iterative label propagation. In the second stage, by utilizing credible labels with high labeling confidence, a multi-label predictor is induced via pairwise label ranking with virtual label splitting or maximum a posteriori (MAP) reasoning. Extensive experiments on synthetic as well as real-world data sets clearly validate the effectiveness of credible label elicitation in learning from PML examples.
【Keywords】:
【Paper Link】 【Pages】:3526-3533
【Authors】: Bahare Fatemi ; Siamak Ravanbakhsh ; David Poole
【Abstract】: Knowledge graphs are used to represent relational information in terms of triples. To enable learning about domains, embedding models, such as tensor factorization models, can be used to make predictions of new triples. Often there is background taxonomic information (in terms of subclasses and subproperties) that should also be taken into account. We show that existing fully expressive (a.k.a. universal) models cannot provably respect subclass and subproperty information. We show that minimal modifications to an existing knowledge graph completion method enable injection of taxonomic information. Moreover, we prove that our model is fully expressive, assuming a lower bound on the size of the embeddings. Experimental results on public knowledge graphs show that despite its simplicity our approach is surprisingly effective.
【Keywords】:
【Paper Link】 【Pages】:3534-3541
【Authors】: Chao Feng ; Chao Qian ; Ke Tang
【Abstract】: Dimensionality reduction is often employed to deal with data that have a huge number of features, and can generally be divided into two categories: feature transformation and feature selection. Owing to its interpretability, its efficiency during inference and the abundance of unlabeled data, unsupervised feature selection has attracted much attention. In this paper, we consider its natural formulation, column subset selection (CSS), which is to minimize the reconstruction error of a data matrix by selecting a subset of features. We propose an anytime randomized iterative approach, POCSS, which minimizes the reconstruction error and the number of selected features simultaneously. Its approximation guarantee is well bounded. Empirical results exhibit the superior performance of POCSS over state-of-the-art algorithms.
【Keywords】:
【Paper Link】 【Pages】:3542-3549
【Abstract】: Partial label learning deals with the problem where each training instance is assigned a set of candidate labels, only one of which is correct. This paper provides the first attempt to leverage the idea of self-training for dealing with partially labeled examples. Specifically, we propose a unified formulation with proper constraints to train the desired model and perform pseudo-labeling jointly. For pseudo-labeling, unlike traditional self-training that manually differentiates the ground-truth label with sufficiently high confidence, we introduce a maximum infinity norm regularization on the modeling outputs to automatically achieve this desideratum, which results in a convex-concave optimization problem. We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems. By proposing an upper-bound surrogate objective function, we turn to solving only one QP problem, improving the optimization efficiency. Extensive experiments on synthesized and real-world datasets demonstrate that the proposed approach significantly outperforms state-of-the-art partial label learning approaches.
【Keywords】:
【Paper Link】 【Pages】:3550-3557
【Authors】: Lei Feng ; Bo An ; Shuo He
【Abstract】: It is well-known that exploiting label correlations is crucially important to multi-label learning. Most of the existing approaches take label correlations as prior knowledge, which may not correctly characterize the real relationships among labels. Besides, label correlations are normally used to regularize the hypothesis space, while the final predictions are not explicitly correlated. In this paper, we suggest that for each individual label, the final prediction involves the collaboration between its own prediction and the predictions of other labels. Based on this assumption, we first propose a novel method to learn the label correlations via sparse reconstruction in the label space. Then, by seamlessly integrating the learned label correlations into model training, we propose a novel multi-label learning approach that aims to explicitly account for the correlated predictions of labels while training the desired model simultaneously. Extensive experimental results show that our approach outperforms the state-of-the-art counterparts.
【Keywords】:
【Paper Link】 【Pages】:3558-3565
【Authors】: Yifan Feng ; Haoxuan You ; Zizhao Zhang ; Rongrong Ji ; Yue Gao
【Abstract】: In this paper, we present a hypergraph neural network (HGNN) framework for data representation learning, which can encode high-order data correlation in a hypergraph structure. Confronting the challenges of learning representations for complex data in real practice, we propose to incorporate such data structure in a hypergraph, which is more flexible for data modeling, especially when dealing with complex data. In this method, a hyperedge convolution operation is designed to handle the data correlation during representation learning. In this way, the traditional hypergraph learning procedure can be conducted efficiently using hyperedge convolution operations. HGNN is able to learn the hidden layer representation considering the high-order data structure, making it a general framework that accounts for complex data correlations. We have conducted experiments on citation network classification and visual object recognition tasks and compared HGNN with graph convolutional networks and other traditional methods. Experimental results demonstrate that the proposed HGNN method outperforms recent state-of-the-art methods. The results also reveal that the proposed HGNN is superior when dealing with multi-modal data compared with existing methods.
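A standard spectral-style hyperedge convolution consistent with this description (stated for orientation; the notation is ours and the exact operator should be checked against the paper) is:

```latex
% Spectral-style hyperedge convolution consistent with the description above.
% H: vertex-hyperedge incidence matrix, W: diagonal hyperedge weights,
% D_v / D_e: vertex / hyperedge degree matrices, Theta: learnable filter.
% Stated for orientation; verify against the paper's exact operator.
X^{(l+1)} = \sigma\!\left( D_v^{-1/2}\, H\, W\, D_e^{-1}\, H^{\top}\,
            D_v^{-1/2}\, X^{(l)}\, \Theta^{(l)} \right)
```

Reading right to left: node features are gathered onto hyperedges (H^⊤), reweighted and normalized, then scattered back to nodes (H), so each node aggregates from every node sharing any hyperedge with it; this is how high-order correlations enter in a single layer.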
【Keywords】:
【Paper Link】 【Pages】:3566-3573
【Authors】: Vasilii Feofanov ; Emilie Devijver ; Massih-Reza Amini
【Abstract】: In this paper, we propose a transductive bound over the risk of the majority vote classifier learned with partially labeled data for multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator, and it involves the margin distribution of the classifier over each class and a bound over the risk of the associated Gibbs classifier. When this latter bound is tight and the errors of the majority vote classifier per class are concentrated in a low-margin zone, we prove that the bound over the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to the subset of unlabeled training examples whose associated class margin is above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold, to the extension of TSVM to multi-class classification, and to a graph-based semi-supervised algorithm.
【Keywords】:
【Paper Link】 【Pages】:3574-3581
【Authors】: Stanislav Fort ; Adam Scherlis
【Abstract】: We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, H, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of H, and 2) a large value of Tr(H)/||H|| at a well-defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the ReLU and tanh non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as Tr(H)/||H||, number of positive eigenvalues of H, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations.
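One way to probe the curvature quantity Tr(H)/||H|| discussed above is Hutchinson's trace estimator with finite-difference Hessian-vector products; the sketch below is our own construction for illustration, not the authors' code.

```python
# Estimate Tr(H), H the Hessian of a loss, with Hutchinson's estimator and
# Hessian-vector products by central differences of the gradient.
import numpy as np

def hutchinson_trace(grad_fn, theta, n_probes=100, eps=1e-4, seed=0):
    """Estimate Tr(H) at parameters theta (a 1-D array).

    grad_fn(theta) -> gradient vector of the loss at theta.
    Uses E[v^T H v] = Tr(H) with Rademacher probes v.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape)
        hv = (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)
        total += v @ hv                   # v^T H v, unbiased for Tr(H)
    return total / n_probes
```

Sweeping such an estimate over configuration-space radii (e.g. along ||θ||) is one way to reproduce the kind of radial profile in which the Goldilocks shell shows up.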
【Keywords】:
【Paper Link】 【Pages】:3582-3589
【Authors】: Vincent François-Lavet ; Yoshua Bengio ; Doina Precup ; Joelle Pineau
【Abstract】: In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.
【Keywords】:
【Paper Link】 【Pages】:3590-3597
【Authors】: Yasuhiro Fujiwara ; Sekitoshi Kanai ; Junya Arai ; Yasutoshi Ida ; Naonori Ueda
【Abstract】: One-class SVM is a popular method for one-class classification, but it incurs a high computation cost. This paper proposes Quix, an efficient training algorithm for one-class SVM. It prunes unnecessary data points before applying the SVM solver by computing upper and lower bounds on a parameter that determines the hyperplane. Since we can efficiently check the optimality of the hyperplane using the bounds, it guarantees classification results identical to the original approach. Experiments show that it is up to 6800 times faster than existing approaches without degrading optimality.
【Keywords】:
【Paper Link】 【Pages】:3598-3605
【Authors】: Ryosuke Furuta ; Naoto Inoue ; Toshihiko Yamasaki
【Abstract】: This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL to image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves the performance by considering not only the future states of its own pixel but also those of the neighbor pixels. The proposed method can be applied to image processing tasks that require pixel-wise manipulations, where deep RL has never been applied. We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance, compared with the state-of-the-art methods based on supervised learning.
【Keywords】:
【Paper Link】 【Pages】:3606-3613
【Authors】: Futoshi Futami ; Zhenghang Cui ; Issei Sato ; Masashi Sugiyama
【Abstract】: In Bayesian inference, posterior distributions are difficult to obtain analytically for complex models such as neural networks. Variational inference usually uses a parametric distribution for approximation, from which we can easily draw samples. Recently, discrete approximation by particles has attracted attention because of its high expressive ability. An example is Stein variational gradient descent (SVGD), which iteratively optimizes particles. Although SVGD has been shown to be computationally efficient empirically, its theoretical properties have not yet been clarified, and no finite-sample bound on the convergence rate is known. Another example is the Stein points (SP) method, which minimizes kernelized Stein discrepancy directly. Although a finite-sample bound is assured theoretically, SP is computationally inefficient empirically, especially in high-dimensional problems. In this paper, we propose a novel method named maximum mean discrepancy minimization by the Frank-Wolfe algorithm (MMD-FW), which minimizes MMD in a greedy way by the FW algorithm. Our method is computationally efficient empirically and we show that its finite-sample convergence bound is of linear order in finite dimensions.
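The flavor of greedy MMD minimization can be conveyed by its kernel-herding special case, where each Frank-Wolfe step is a single argmax over a candidate pool; this sketch is our own uniform-weight simplification, and the paper's algorithm, step sizes, and guarantees differ in detail.

```python
# Greedy (Frank-Wolfe-flavored) MMD minimization sketch: repeatedly add the
# candidate point that most reduces the squared MMD to a target sample,
# with uniform weights -- the kernel-herding special case.
import numpy as np

def rbf(a, b, gamma=1.0):
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def greedy_mmd(target, pool, n_select, gamma=1.0):
    """target: (m, d) samples to match; pool: (n, d) candidate particles."""
    mu = rbf(pool, target, gamma).mean(axis=1)       # E_y k(x, y) per candidate
    chosen = []
    for t in range(n_select):
        if chosen:
            self_term = rbf(pool, pool[chosen], gamma).sum(axis=1) / (t + 1)
        else:
            self_term = np.zeros(len(pool))
        # Frank-Wolfe vertex: candidate maximizing the negative gradient
        chosen.append(int(np.argmax(mu - self_term)))
    return pool[chosen]
```

Each iteration is a single argmax over the pool, which is the source of the empirical efficiency relative to jointly optimizing particle positions.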
【Keywords】:
【Paper Link】 【Pages】:3614-3621
【Authors】: Jinyang Gao ; Junjie Yao ; Yingxia Shao
【Abstract】: In this paper, we focus on delivering reliable learning results for high-stakes applications such as self-driving, financial investment and clinical diagnosis, where the accuracy of predictions is considered a more crucial requirement than giving predictions for all query samples. We adopt the learning-with-reject-option framework, where the learning model only predicts those samples for which it is confident of giving the correct answer. However, for most prevailing deep learning predictors, the confidence estimated by the models themselves is far from reflecting the real generalization performance. To model the reliability of prediction concisely, we propose an exploratory solution called GALVE (Generative Adversarial Learning with Variance Expansion), which adopts generative adversarial learning to implicitly measure the region where the model achieves good generalization performance. By applying GALVE to measure the reliability of predictions, we achieved an error rate less than half of that obtained by straightforwardly measuring confidence on the CIFAR10 and SVHN computer vision tasks.
【Keywords】:
【Paper Link】 【Pages】:3622-3629
【Authors】: Jingyue Gao ; Xiting Wang ; Yasha Wang ; Xing Xie
【Abstract】: Recommender systems have been playing an increasingly important role in our daily life due to the explosive growth of information. Accuracy and explainability are two core aspects when we evaluate a recommendation model and have become one of the fundamental trade-offs in machine learning. In this paper, we propose to alleviate the trade-off between accuracy and explainability by developing an explainable deep model that combines the advantages of deep learning-based models and existing explainable methods. The basic idea is to build an initial network based on an explainable deep hierarchy (e.g., Microsoft Concept Graph) and improve the model accuracy by optimizing key variables in the hierarchy (e.g., node importance and relevance). To ensure accurate rating prediction, we propose an attentive multi-view learning framework. The framework enables us to handle sparse and noisy data by co-regularizing among different feature levels and combining predictions attentively. To mine readable explanations from the hierarchy, we formulate personalized explanation generation as a constrained tree node selection problem and propose a dynamic programming algorithm to solve it. Experimental results show that our model outperforms state-of-the-art methods in terms of both accuracy and explainability.
【Keywords】:
【Paper Link】 【Pages】:3630-3637
【Authors】: Tingran Gao ; Shahab Asoodeh ; Yi Huang ; James Evans
【Abstract】: Inspired by recent interest in developing machine learning and data mining algorithms on hypergraphs, we investigate in this paper the semi-supervised learning algorithm of propagating "soft labels" (e.g. probability distributions, class membership scores) over hypergraphs, by means of optimal transportation. Borrowing insights from Wasserstein propagation on graphs [Solomon et al. 2014], we re-formulate the label propagation procedure as a message-passing algorithm, which renders itself naturally to a generalization applicable to hypergraphs through Wasserstein barycenters. Furthermore, in a PAC learning framework, we provide generalization error bounds for propagating one-dimensional distributions on graphs and hypergraphs using the 2-Wasserstein distance, by establishing the algorithmic stability of the proposed semi-supervised learning algorithm. These theoretical results also shed new light on a deeper understanding of Wasserstein propagation on graphs.
【Keywords】:
【Paper Link】 【Pages】:3638-3646
【Authors】: Yuyang Gao ; Liang Zhao ; Lingfei Wu ; Yanfang Ye ; Hui Xiong ; Chaowei Yang
【Abstract】: Due to the potentially significant benefits for society, forecasting spatio-temporal societal events is currently attracting considerable attention from researchers. Beyond merely predicting the occurrence of future events, practitioners are now looking for information about specific subtypes of future events in order to allocate appropriate amounts and types of resources to manage such events and any associated social risks. However, forecasting event subtypes is far more complex than merely extending binary prediction to cover multiple classes, as 1) different locations require different models to handle their characteristic event subtype patterns due to spatial heterogeneity; 2) historically, many locations have only experienced an incomplete set of event subtypes, thus limiting the local model's ability to predict previously "unseen" subtypes; and 3) the subtle discrepancy among different event subtypes requires more discriminative and profound representations of societal events. In order to address all these challenges concurrently, we propose a Spatial Incomplete Multi-task Deep leArning (SIMDA) framework that is capable of effectively forecasting the subtypes of future events. The new framework formulates spatial locations into tasks to handle spatial heterogeneity in event subtypes, and learns a joint deep representation of subtypes across tasks. Furthermore, based on the "first law of geography", spatially close tasks share similar event subtype patterns, such that adjacent tasks can share knowledge with each other effectively. Optimizing the proposed model amounts to solving a new nonconvex and strongly coupled problem, so we propose a new algorithm based on the Alternating Direction Method of Multipliers (ADMM) that can decompose the complex problem into subproblems that can be solved efficiently. Extensive experiments on six real-world datasets demonstrate the effectiveness and efficiency of the proposed model.
【Keywords】:
【Paper Link】 【Pages】:3647-3655
【Authors】: Carles Gelada ; Marc G. Bellemare
【Abstract】: In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pioneered by Hallak et al. (2017). Under this method, online updates to the value function are reweighted to avoid divergence issues typical of off-policy learning. While Hallak et al.’s solution is appealing, it cannot easily be transferred to nonlinear function approximation. First, it requires a projection step onto the probability simplex; second, even though the operator describing the expected behavior of the off-policy learning algorithm is convergent, it is not known to be a contraction mapping, and hence, may be more unstable in practice. We address these two issues by introducing a discount factor into COP-TD. We analyze the behavior of discounted COP-TD and find it better behaved from a theoretical perspective. We also propose an alternative soft normalization penalty that can be minimized online and obviates the need for an explicit projection step. We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty. Finally, we perform a more extensive evaluation of discounted COP-TD in 5 games of the Atari domain, where we find performance gains for our approach.
【Keywords】:
【Paper Link】 【Pages】:3656-3663
【Authors】: Xu Geng ; Yaguang Li ; Leye Wang ; Lingyu Zhang ; Qiang Yang ; Jieping Ye ; Yan Liu
【Abstract】: Region-level demand forecasting is an essential task in ride-hailing services. Accurate ride-hailing demand forecasting can guide vehicle dispatching, improve vehicle utilization, reduce wait times, and mitigate traffic congestion. This task is challenging due to the complicated spatiotemporal dependencies among regions. Existing approaches mainly focus on modeling the Euclidean correlations among spatially adjacent regions, while we observe that non-Euclidean pair-wise correlations among possibly distant regions are also critical for accurate forecasting. In this paper, we propose the spatiotemporal multi-graph convolution network (ST-MGCN), a novel deep learning model for ride-hailing demand forecasting. We first encode the non-Euclidean pair-wise correlations among regions into multiple graphs and then explicitly model these correlations using multi-graph convolution. To utilize the global contextual information in modeling the temporal correlation, we further propose the contextual gated recurrent neural network, which augments the recurrent neural network with a context-aware gating mechanism to re-weight different historical observations. We evaluate the proposed model on two real-world large-scale ride-hailing demand datasets and observe a consistent improvement of more than 10% over state-of-the-art baselines.
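A minimal numpy sketch of the multi-graph convolution idea: each graph encodes one type of pair-wise region correlation, and the per-graph convolutions are aggregated. This is an illustrative reading of the abstract, not the paper's implementation; the example graph types, normalization, and ReLU are assumptions.

    import numpy as np

    def normalize_adj(A):
        """Row-normalize an adjacency matrix with self-loops: D^{-1}(A + I)."""
        A_hat = A + np.eye(A.shape[0])
        return A_hat / A_hat.sum(axis=1, keepdims=True)

    def multi_graph_conv(X, graphs, weights):
        """X: (n_regions, d_in) region features; graphs: list of
        (n_regions, n_regions) adjacencies, one per correlation type
        (e.g. spatial neighborhood, functional similarity, connectivity);
        weights: one (d_in, d_out) parameter matrix per graph."""
        out = sum(normalize_adj(A) @ X @ W for A, W in zip(graphs, weights))
        return np.maximum(out, 0.0)  # ReLU over the summed convolutions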
【Keywords】:
【Paper Link】 【Pages】:3664-3671
【Authors】: AmirEmad Ghassami ; Saber Salehkaleybar ; Negar Kiyavash ; Kun Zhang
【Abstract】: A directed acyclic graph (DAG) is the most common graphical model for representing causal relationships among a set of variables. When restricted to using only observational data, the structure of the ground truth DAG is identifiable only up to Markov equivalence, based on conditional independence relations among the variables. Therefore, the number of DAGs equivalent to the ground truth DAG is an indicator of the causal complexity of the underlying structure–roughly speaking, it shows how many interventions or how much additional information is further needed to recover the underlying DAG. In this paper, we propose a new technique for counting the number of DAGs in a Markov equivalence class. Our approach is based on the clique tree representation of chordal graphs. We show that in the case of bounded degree graphs, the proposed algorithm is polynomial time. We further demonstrate that this technique can be utilized for uniform sampling from a Markov equivalence class, which provides a stochastic way to enumerate DAGs in the equivalence class and may be needed for finding the best DAG or for causal inference given the equivalence class as input. We also extend our counting and sampling method to the case where prior knowledge about the underlying DAG is available, and present applications of this extension in causal experiment design and estimating the causal effect of joint interventions.
【Keywords】:
【Paper Link】 【Pages】:3672-3680
【Authors】: Soheil Ghili ; Ehsan Kazemi ; Amin Karbasi
【Abstract】: How can we control for latent discrimination in predictive models? How can we provably remove it? Such questions are at the heart of algorithmic fairness and its impacts on society. In this paper, we define a new operational fairness criterion, inspired by the well-understood notion of omitted-variable bias in statistics and econometrics. Our notion of fairness effectively controls for sensitive features and provides diagnostics for deviations from fair decision making. We then establish analytical and algorithmic results about the existence of a fair classifier in the context of supervised learning. Our results readily imply a simple, but rather counter-intuitive, strategy for eliminating latent discrimination. In order to prevent other features proxying for sensitive features, we need to include sensitive features in the training phase, but exclude them in the test/evaluation phase while controlling for their effects. We evaluate the performance of our algorithm on several real-world datasets and show how fairness for these datasets can be improved with a very small loss in accuracy.
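One concrete way to realize the train-with/control-at-test strategy described above, assuming a binary label and a discrete sensitive attribute; this is a sketch of the idea, and the paper's exact control procedure may differ:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_with_sensitive(X, s, y):
        """Include the sensitive feature during training so that other
        features cannot silently proxy for it."""
        return LogisticRegression(max_iter=1000).fit(np.column_stack([X, s]), y)

    def predict_controlled(model, X, sensitive_values):
        """At test time, control for the sensitive feature by averaging
        predictions over its possible values rather than using each
        individual's own value."""
        probs = np.mean(
            [model.predict_proba(np.column_stack([X, np.full(len(X), v)]))[:, 1]
             for v in sensitive_values], axis=0)
        return (probs >= 0.5).astype(int)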
【Keywords】:
【Paper Link】 【Pages】:3681-3688
【Authors】: Amirata Ghorbani ; Abubakar Abid ; James Y. Zou
【Abstract】: In order for machine learning to be trusted in many applications, it is critical to be able to reliably explain why the machine learning algorithm makes certain predictions. For this reason, a variety of methods have been developed recently to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent the interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptively indistinguishable inputs that are assigned the same predicted label, yet have very different interpretations. We systematically characterize the robustness of interpretations generated by several widely-used feature importance interpretation methods (feature importance maps, integrated gradients, and DeepLIFT) on ImageNet and CIFAR-10. In all cases, our experiments show that systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g., influence functions) are similarly susceptible to adversarial attack. Our analysis of the geometry of the Hessian matrix gives insight into why robustness is a general challenge to current interpretation approaches.
【Keywords】:
【Paper Link】 【Pages】:3689-3696
【Authors】: Joachim Giesen ; Sören Laue ; Andreas Löhne ; Christopher Schneider
【Abstract】: Regularized loss minimization, where a statistical model is obtained from minimizing the sum of a loss function and weighted regularization terms, is still in widespread use in machine learning. The statistical performance of the resulting models depends on the choice of weights (regularization parameters) that are typically tuned by cross-validation. For finding the best regularization parameters, the regularized minimization problem needs to be solved for the whole parameter domain. A practically more feasible approach is covering the parameter domain with approximate solutions of the loss minimization problem for some prescribed approximation accuracy. The problem of computing such a covering is known as the approximate solution gamut problem. Existing algorithms for the solution gamut problem suffer from several problems. For instance, they require a grid on the parameter domain whose spacing is difficult to determine in practice, and they are not generic in the sense that they rely on problem specific plug-in functions. Here, we show that a well-known algorithm from vector optimization, namely the Benson algorithm, can be used directly for computing approximate solution gamuts while avoiding the problems of existing algorithms. Experiments for the Elastic Net on real world data sets demonstrate the effectiveness of Benson’s algorithm for regularization parameter tracking.
【Keywords】:
【Paper Link】 【Pages】:3697-3704
【Authors】: Bin Gu ; Zhouyuan Huo ; Heng Huang
【Abstract】: Pairwise learning is an important learning topic in the machine learning community, where the loss function involves pairs of samples (e.g., AUC maximization and metric learning). Existing pairwise learning algorithms do not achieve generality, scalability, and efficiency simultaneously. To address these challenging problems, in this paper we first analyze the relationship between the statistical accuracy and the regularized empirical risk for pairwise losses. Based on this relationship, we propose a scalable and efficient adaptive doubly stochastic gradient algorithm (AdaDSG) for generalized regularized pairwise learning problems. More importantly, we prove that the overall computational cost of AdaDSG is O(n) to achieve the statistical accuracy on the full training set of size n, which is the best theoretical result for pairwise learning to the best of our knowledge. The experimental results on a variety of real-world datasets not only confirm the effectiveness of our AdaDSG algorithm, but also show that AdaDSG has significantly better scalability and efficiency than the existing pairwise learning algorithms.
【Keywords】:
【Paper Link】 【Pages】:3705-3713
【Authors】: Ning Gui ; Danni Ge ; Ziyin Hu
【Abstract】: As an effective data preprocessing step, feature selection has shown its effectiveness in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimensional, huge-volume big data, however, has brought major challenges, e.g., computational complexity and stability on noisy data, to existing feature-selection techniques. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature weight generation and a learning module for problem modeling. The attention module formulates the correlation problem between features and the supervision target as a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated based on the distribution of each feature's selection pattern, adjusted by backpropagation during the training process. The detachable structure allows existing off-the-shelf models to be directly reused, which allows for much less training time, lower demands on training data, and fewer requirements for expertise. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison to several state-of-the-art feature selection algorithms on MNIST, noisy MNIST, and several datasets with small samples.
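A compact PyTorch sketch of an attention module in the spirit of AFS: a shallow attention net per feature produces a weight in (0, 1), and any off-the-shelf learning module can then consume the re-weighted input. The layer sizes, sigmoid weighting, and wiring are our assumptions, not the paper's specification.

    import torch
    import torch.nn as nn

    class AttentionFeatureSelector(nn.Module):
        def __init__(self, n_features, hidden=16):
            super().__init__()
            # one shallow attention net per feature, each fed the whole input
            self.attn = nn.ModuleList(
                nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh(),
                              nn.Linear(hidden, 1))
                for _ in range(n_features))

        def forward(self, x):                    # x: (batch, n_features)
            scores = torch.cat([net(x) for net in self.attn], dim=1)
            weights = torch.sigmoid(scores)      # per-feature weight in (0, 1)
            return x * weights, weights          # re-weighted input + weights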
【Keywords】:
【Paper Link】 【Pages】:3714-3722
【Authors】: Hongyu Guo ; Yongyi Mao ; Richong Zhang
【Abstract】: MixUp (Zhang et al. 2017) is a recently proposed data-augmentation scheme, which linearly interpolates a random pair of training examples and, correspondingly, the one-hot representations of their labels. Training deep neural networks with such additional data is shown to be capable of significantly improving the predictive accuracy of the current art. The power of MixUp, however, is primarily established empirically, and its working and effectiveness have not been explained in any depth. In this paper, we develop an understanding of MixUp as a form of "out-of-manifold regularization", which imposes certain "local linearity" constraints on the model's input space beyond the data manifold. This analysis enables us to identify a limitation of MixUp, which we call "manifold intrusion". In a nutshell, manifold intrusion in MixUp is a form of under-fitting resulting from conflicts between the synthetic labels of the mixed-up examples and the labels of the original training data. Such a phenomenon usually happens when the parameters controlling the generation of mixing policies are not sufficiently fine-tuned on the training data. To address this issue, we propose a novel adaptive version of MixUp, where the mixing policies are automatically learned from the data using an additional network and objective function designed to avoid manifold intrusion. The proposed regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets. Extensive experiments demonstrate that AdaMixUp improves upon MixUp when applied to the current art of deep classification models.
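For reference, vanilla MixUp (the scheme AdaMixUp adapts) is only a few lines; manifold intrusion arises when the mixed input below happens to land near a real data point whose true label conflicts with the interpolated label y:

    import numpy as np

    def mixup(x1, y1, x2, y2, alpha=0.2, n_classes=10):
        """Vanilla MixUp: convex combination of a random pair of examples
        and of their one-hot labels, with lambda ~ Beta(alpha, alpha)."""
        lam = np.random.beta(alpha, alpha)
        x = lam * x1 + (1 - lam) * x2
        y = lam * np.eye(n_classes)[y1] + (1 - lam) * np.eye(n_classes)[y2]
        return x, y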
【Keywords】:
【Paper Link】 【Pages】:3723-3730
【Authors】: Junliang Guo ; Xu Tan ; Di He ; Tao Qin ; Linli Xu ; Tie-Yan Liu
【Abstract】: Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the inputs of the decoder is important and largely impacts the model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first one directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, which are then fed into the decoder as inputs. The second one transforms source-side word embeddings to target-side word embeddings through sentence-level alignment and word-level adversarial learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show our method largely outperforms the NAT baseline (Gu et al. 2017), by 5.11 BLEU on the WMT14 English-German task and 4.72 BLEU on the WMT16 English-Romanian task.
【Keywords】:
【Paper Link】 【Pages】:3731-3738
【Authors】: Tianyu Guo ; Chang Xu ; Boxin Shi ; Chao Xu ; Dacheng Tao
【Abstract】: Generative Adversarial Networks (GANs) have demonstrated a strong ability to fit complex distributions since they were first proposed, especially in the field of generating natural images. Linear interpolation in the noise space produces a continuous change in the image space, which is an impressive property of GANs. However, there is no special consideration of this property in the objective function of GANs or its derived models. This paper analyzes the perturbation on the input of the generator and its influence on the generated images. A smooth generator is then developed by investigating the tolerable input perturbation. We further integrate this smooth generator with a gradient-penalized discriminator, and design a smooth GAN that generates stable and high-quality images. Experiments on real-world image datasets demonstrate the necessity of studying the smooth generator and the effectiveness of the proposed algorithm.
【Keywords】:
【Paper Link】 【Pages】:3739-3746
【Authors】: Xiaoxiao Guo ; Shiyu Chang ; Mo Yu ; Gerald Tesauro ; Murray Campbell
【Abstract】: Existing imitation learning approaches often require that complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions from the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
【Keywords】:
【Paper Link】 【Pages】:3747-3754
【Authors】: Vivek Gupta ; Rahul Wadbude ; Nagarajan Natarajan ; Harish Karnick ; Prateek Jain ; Piyush Rai
【Abstract】: We present a label embedding based approach to large-scale multi-label learning, drawing inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings. Besides leading to a highly scalable model for multi-label learning, our approach highlights interesting connections between label embedding methods commonly used for multi-label learning and paragraph embedding methods commonly used for learning representations of text data. The framework easily extends to incorporating auxiliary information such as label-label correlations; this is crucial especially when many training instances are only partially annotated. To facilitate end-to-end learning, we develop a joint learning algorithm that can learn the embeddings as well as a regression model that predicts these embeddings for the new input to be annotated, via efficient gradient based methods. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed models perform favorably as compared to state-of-the-art methods for large-scale multi-label learning.
【Keywords】:
【Paper Link】 【Pages】:3755-3762
【Authors】: Eyal Gutflaish ; Aryeh Kontorovich ; Sivan Sabato ; Ofer Biller ; Oded Sofer
【Abstract】: We propose a hybrid approach to temporal anomaly detection in access data of users to databases — or more generally, any kind of subject-object co-occurrence data. We consider a high-dimensional setting that also requires fast computation at test time. Our methodology identifies anomalies based on a single stationary model, instead of requiring a full temporal one, which would be prohibitive in this setting. We learn a low-rank stationary model from the training data, and then fit a regression model for predicting the expected likelihood score of normal access patterns in the future. The disparity between the predicted likelihood score and the observed one is used to assess the “surprise” at test time. This approach enables calibration of the anomaly score, so that time-varying normal behavior patterns are not considered anomalous. We provide a detailed description of the algorithm, including a convergence analysis, and report encouraging empirical results. One of the data sets that we tested is new for the public domain. It consists of two months’ worth of database access records from a live system. This data set and our code are publicly available at https://github.com/eyalgut/TLR_anomaly_detection.git.
【Keywords】:
【Paper Link】 【Pages】:3763-3770
【Authors】: Xiao He ; Francesco Alesiani ; Ammar Shaker
【Abstract】: Many real-world large-scale regression problems can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks, as in the retail and transportation domains. However, existing MTL methods still fail to offer both the generalization performance and the scalability needed for such problems. Scaling up MTL methods to problems with a tremendous number of tasks is a big challenge. Here, we propose a novel algorithm, named Convex Clustering Multi-Task regression Learning (CCMTL), which integrates convex clustering on the k-nearest neighbor graph of the prediction models. Further, CCMTL efficiently solves the underlying convex problem with a newly proposed optimization method. CCMTL is accurate, efficient to train, and empirically scales linearly in the number of tasks. On both synthetic and real-world datasets, the proposed CCMTL outperforms seven state-of-the-art (SoA) multi-task learning methods in terms of prediction accuracy as well as computational efficiency. On a real-world retail dataset with 23,812 tasks, CCMTL requires only around 30 seconds to train on a single thread, while the SoA methods need up to hours or even days.
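Read literally, the abstract suggests an objective of the following form (notation ours: task t has data (X_t, y_t) and linear model w_t, and G is the k-nearest-neighbor graph over the prediction models; the non-squared norm on model differences is what makes the penalty a convex-clustering one). This is our reconstruction, not a formula quoted from the paper:

    \min_{w_1, \dots, w_T} \; \sum_{t=1}^{T} \lVert X_t w_t - y_t \rVert_2^2
        \; + \; \lambda \sum_{(i,j) \in G} \lVert w_i - w_j \rVert_2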
【Keywords】:
【Paper Link】 【Pages】:3771-3778
【Authors】: Byeongho Heo ; Minsik Lee ; Sangdoo Yun ; Jin Young Choi
【Abstract】: Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network for improving the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on the decision boundary, which is one of the most important components of a classifier. The generalization performance of a classifier is closely related to the adequacy of its decision boundary, so a good classifier bears a good decision boundary. Therefore, transferring information closely related to the decision boundary can be a good attempt at knowledge distillation. To realize this goal, we utilize an adversarial attack to discover samples supporting the decision boundary. Based on this idea, to transfer more accurate information about the decision boundary, the proposed algorithm trains a student classifier based on the adversarial samples supporting the decision boundary. Experiments show that the proposed method indeed improves knowledge distillation and achieves state-of-the-art performance.
【Keywords】:
【Paper Link】 【Pages】:3779-3787
【Authors】: Byeongho Heo ; Minsik Lee ; Sangdoo Yun ; Jin Young Choi
【Abstract】: An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has long been considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification-friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature on knowledge transfer. In this paper, we propose a knowledge transfer method via distillation of activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that has its minimum value when the boundaries generated by the student coincide with those of the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss approximating it. With the proposed method, the student learns the separating boundary between the activation and deactivation regions formed by each neuron in the teacher. Through experiments on various aspects of knowledge transfer, we verify that the proposed method outperforms the current state of the art.
【Keywords】:
【Paper Link】 【Pages】:3788-3795
【Authors】: Sibylle Hess ; Wouter Duivesteijn ; Philipp Honysz ; Katharina Morik
【Abstract】: When it comes to clustering nonconvex shapes, two paradigms are used to find the most suitable clustering: minimum cut and maximum density. The most popular algorithms incorporating these paradigms are Spectral Clustering and DBSCAN. Both paradigms have their pros and cons. While minimum cut clusterings are sensitive to noise, density-based clusterings have trouble handling clusters with varying densities. In this paper, we propose SPECTACL: a method combining the advantages of both approaches while solving the two mentioned drawbacks. Like Spectral Clustering, our method is easy to implement, and it is theoretically founded to optimize a proposed density criterion of clusterings. Through experiments on synthetic and real-world data, we demonstrate that our approach provides robust and reliable clusterings.
【Keywords】:
【Paper Link】 【Pages】:3796-3803
【Authors】: Matteo Hessel ; Hubert Soyer ; Lasse Espeholt ; Wojciech Czarnecki ; Simon Schmitt ; Hado van Hasselt
【Abstract】: The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring the training of a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
【Keywords】:
【Paper Link】 【Pages】:3804-3811
【Authors】: Fuxing Hong ; Dongbo Huang ; Ge Chen
【Abstract】: Factorization Machine (FM) is a widely used supervised learning approach that effectively models feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction fairly may degrade performance. For example, the interactions of a useless feature may introduce noise; the importance of a feature may also differ when interacting with different features. In this work, we propose a novel model named Interaction-aware Factorization Machine (IFM) by introducing the Interaction-Aware Mechanism (IAM), which comprises a feature aspect and a field aspect, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network, while the field aspect learns the feature interaction effect as a parametric similarity of the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows for more leverage in tweaking the interactions on both feature-wise and field-wise levels. Besides, we give a more generalized architecture and propose the Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on the field aspect importance. The experimental results on two well-known datasets show the superiority of the proposed models over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:3812-3821
【Authors】: Hanzhang Hu ; Debadeepta Dey ; Martial Hebert ; J. Andrew Bagnell
【Abstract】: This work considers the trade-off between accuracy and test-time computational cost of deep neural networks (DNNs) via anytime predictions from auxiliary predictions. Specifically, we optimize auxiliary losses jointly in an adaptive weighted sum, where the weights are inversely proportional to the average of each loss. Intuitively, this balances the losses to have the same scale. We present theoretical considerations that motivate this approach from multiple viewpoints, including connecting it to optimizing the geometric mean of the expectation of each loss, an objective that ignores the scale of losses. Experimentally, the adaptive weights induce more competitive anytime predictions on multiple recognition datasets and models than non-adaptive approaches, including weighting all losses equally. In particular, anytime neural networks (ANNs) can achieve the same accuracy faster using adaptive weights on a small network than using static constant weights on a large one. For problems with high performance saturation, we also show that a sequence of exponentially deepening ANNs can achieve near-optimal anytime results at any budget, at the cost of a constant fraction of extra computation.
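A minimal sketch of the inverse-average weighting described above; maintaining the per-loss averages with an exponential moving average is our assumption:

    import numpy as np

    class AdaptiveLossWeights:
        """Weight each auxiliary (anytime) loss inversely to its running
        average so that all losses contribute at a similar scale; the
        weighted sum then behaves like a scale-free objective."""
        def __init__(self, n_losses, momentum=0.99):
            self.avg = np.ones(n_losses)
            self.momentum = momentum

        def __call__(self, losses):   # losses: (n_losses,) current values
            losses = np.asarray(losses, dtype=float)
            self.avg = self.momentum * self.avg + (1 - self.momentum) * losses
            return float(np.sum(losses / self.avg))  # adaptive weighted sum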
【Keywords】:
【Paper Link】 【Pages】:3822-3829
【Authors】: Hao Hu ; Liqiang Wang ; Guo-Jun Qi
【Abstract】: Recent advancements in recurrent neural network (RNN) research have demonstrated the superiority of utilizing multiscale structures in learning temporal representations of time series. Currently, most multiscale RNNs use fixed scales, which do not comply with the nature of dynamical temporal patterns among sequences. In this paper, we propose Adaptively Scaled Recurrent Neural Networks (ASRNN), a simple but efficient way to handle this problem. Instead of using predefined scales, ASRNNs are able to learn and adjust scales based on different temporal contexts, making them more flexible in modeling multiscale patterns. Compared with other multiscale RNNs, ASRNNs are endowed with dynamical scaling capabilities while having much simpler structures, and are easy to integrate with various RNN cells. The experiments on multiple sequence modeling tasks indicate ASRNNs can efficiently adapt scales based on different sequence contexts and yield better performances than baselines without dynamical scaling abilities.
【Keywords】:
【Paper Link】 【Pages】:3830-3837
【Authors】: Liang Hu ; Songlei Jian ; Longbing Cao ; Zhiping Gu ; Qingkui Chen ; Artak Amirbekyan
【Abstract】: Classic recommender systems face challenges in addressing the data sparsity and cold-start problems when modeling only the user-item relation. An essential direction is to incorporate and understand additional heterogeneous relations, e.g., user-user and item-item relations, since each user-item interaction is often influenced by other users and items, which form the user's/item's influential contexts. This induces important yet challenging issues, including modeling heterogeneous relations, interactions, and the strength of the influence from users/items in the influential contexts. To this end, we design Influential-Context Aggregation Units (ICAU) to aggregate the user-user/item-item relations within a given context as the influential context embeddings. Accordingly, we propose a Heterogeneous relations-Embedded Recommender System (HERS) based on ICAUs to model and interpret the underlying motivation of user-item interactions by considering user-user and item-item influences. The experiments on two real-world datasets show the highly improved recommendation quality made by HERS and its superiority in handling the cold-start problem. In addition, we demonstrate the interpretability of modeling influential contexts in explaining the recommendation results.
【Keywords】:
【Paper Link】 【Pages】:3838-3845
【Authors】: Menglei Hu ; Songcan Chen
【Abstract】: Real data often come with multiple modalities or from multiple heterogeneous sources, thus forming so-called multi-view data, which receives more and more attention in machine learning. Multi-view clustering (MVC) has become an important paradigm for such data. In real-world applications, some views often suffer from missing instances. Clustering on such multi-view datasets is called incomplete multi-view clustering (IMC) and is quite challenging. To date, though many approaches have been developed, most of them are offline and have high computational and memory costs, especially for large-scale datasets. To address this problem, in this paper we propose a One-Pass Incomplete Multi-view Clustering framework (OPIMC). With the help of regularized matrix factorization and weighted matrix factorization, OPIMC can relatively easily deal with this problem. Different from the existing and sole online IMC method, OPIMC can directly obtain clustering results and effectively determine the termination of the iteration process by introducing two global statistics. Finally, extensive experiments conducted on four real datasets demonstrate the efficiency and effectiveness of the proposed OPIMC method.
【Keywords】:
【Paper Link】 【Pages】:3846-3853
【Authors】: Yi-Qi Hu ; Yang Yu ; Wei-Wei Tu ; Qiang Yang ; Yuqiang Chen ; Wenyuan Dai
【Abstract】: Automatic machine learning (AutoML) aims at automatically choosing the best configuration for machine learning tasks. However, a configuration evaluation can be very time consuming, particularly on learning tasks with large datasets. This limitation usually restrains derivative-free optimization from releasing its full power for a fine configuration search using many evaluations. To alleviate this limitation, in this paper we propose a derivative-free optimization framework for AutoML using multi-fidelity evaluations. It uses many low-fidelity evaluations on small data subsets and very few high-fidelity evaluations on the full dataset. However, the low-fidelity evaluations can be badly biased and need to be corrected at only a very low cost. We thus propose the Transfer Series Expansion (TSE), which learns the low-fidelity correction predictor efficiently by linearly combining a set of base predictors. The base predictors can be obtained cheaply from down-scaled and previously experienced tasks. Experimental results on real-world AutoML problems verify that the proposed framework can accelerate derivative-free configuration search significantly by making use of the multi-fidelity evaluations.
【Keywords】:
【Paper Link】 【Pages】:3854-3861
【Authors】: Kun Huang ; Bingbing Ni ; Xiaokang Yang
【Abstract】: Quantization has shown stunning efficiency on deep neural networks, especially for portable devices with limited resources. Most existing works uncritically extend weight quantization methods to activations. However, we take the view that the best performance can be obtained by applying different quantization methods to weights and activations, respectively. In this paper, we design a new activation function, dubbed CReLU, from the quantization perspective and further complement this design with an appropriate initialization method and training procedure. Moreover, we develop a specific quantization strategy in which we formulate the forward and backward approximation of weights with binary values and quantize the activations to low bitwidth using a linear or logarithmic quantizer. We show, for the first time, that our final quantized model with binary weights and ultra-low-bitwidth activations outperforms the previous best models by large margins on ImageNet, as well as achieving nearly a 10.85× theoretical speedup with ResNet-18. Furthermore, ablation experiments and theoretical analysis demonstrate the effectiveness and robustness of CReLU in comparison with other activation functions.
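To make the activation-quantization pairing concrete, here is a generic clipped ReLU followed by a uniform (linear) low-bitwidth quantizer; the exact form of the paper's CReLU and its initialization are not reproduced here:

    import numpy as np

    def clipped_relu(x, clip_max=1.0):
        """Bounding the activation range is what keeps low-bitwidth
        linear quantization well behaved."""
        return np.clip(x, 0.0, clip_max)

    def linear_quantize(x, bits=2, x_max=1.0):
        """Uniform quantizer over [0, x_max] with 2^bits - 1 steps."""
        levels = 2 ** bits - 1
        return np.round(np.clip(x, 0.0, x_max) / x_max * levels) / levels * x_max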
【Keywords】:
【Paper Link】 【Pages】:3862-3869
【Authors】: Silu Huang ; Chi Wang ; Bolin Ding ; Surajit Chaudhuri
【Abstract】: A configuration of training refers to a combination of feature engineering, a learner, and its associated hyperparameters. Given a set of configurations and a large dataset randomly split into training and testing sets, we study how to efficiently identify the best configuration, i.e., the one with approximately the highest testing accuracy when trained on the training set. To guarantee a small accuracy loss, we develop a solution using a confidence interval (CI)-based progressive sampling and pruning strategy. Compared to using the full data to find the exact best configuration, our solution achieves more than two orders of magnitude speedup, while the returned top configuration has identical or close test accuracy.
【Keywords】:
【Paper Link】 【Pages】:3870-3877
【Authors】: Wenzhen Huang ; Junge Zhang ; Kaiqi Huang
【Abstract】: Model-based reinforcement learning (RL) methods attempt to learn a dynamics model to simulate the real environment and utilize the model to make better decisions. However, the learned environment simulator often has some degree of model error, which can disturb decision making and reduce performance. We propose a bootstrapped model-based RL method which bootstraps the modules at each depth of the planning tree. This method can quantify the uncertainty of the environment model on different state-action pairs and lead the agent to explore the pairs with higher uncertainty, reducing the potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at current and subsequent time-steps, and introduce a prior mechanism to improve the exploration efficiency. Experiment results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.
【Keywords】:
【Paper Link】 【Pages】:3878-3885
【Authors】: Xiao Huang ; Qingquan Song ; Fan Yang ; Xia Hu
【Abstract】: Feature embedding aims to learn a low-dimensional vector representation for each instance to preserve the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of features have been well studied, real-world instances often contain multiple types of correlated features or even information within a different modality such as networks. Existing studies such as multiview learning show that it is promising to learn unified vector representations from all sources. However, the high computational costs of incorporating heterogeneous information limit the applications of existing algorithms, as the number of instances and the dimensionality of features in practice are often large. To bridge the gap, we propose a scalable framework, FeatWalk, which can model and incorporate instance similarities in terms of different types of features into a unified embedding representation. To enable scalability, FeatWalk does not directly calculate any similarity measure, but provides an alternative way to simulate similarity-based random walks among instances to extract the local instance proximity and preserve it in a set of instance index sequences. These sequences are homogeneous with each other. A scalable word embedding algorithm is applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.
【Keywords】:
【Paper Link】 【Pages】:3886-3893
【Authors】: Zhiwu Huang ; Jiqing Wu ; Luc Van Gool
【Abstract】: Generative modeling over natural images is one of the most fundamental machine learning problems. However, few modern generative models, including Wasserstein Generative Adversarial Nets (WGANs), have been studied on manifold-valued images that are frequently encountered in real-world applications. To fill the gap, this paper first formulates the problem of generating manifold-valued images and exploits three typical instances: hue-saturation-value (HSV) color image generation, chromaticity-brightness (CB) color image generation, and diffusion-tensor (DT) image generation. For the proposed generative modeling problem, we then introduce a theorem of optimal transport to derive a new Wasserstein distance for data distributions on complete manifolds, enabling us to achieve a tractable objective under the WGAN framework. In addition, we recommend three benchmark datasets: CIFAR-10 HSV/CB color images, ImageNet HSV/CB color images, and UCL DT images. On the three datasets, we experimentally demonstrate that the proposed manifold-aware WGAN model can generate more plausible manifold-valued images than its competitors.
【Keywords】:
【Paper Link】 【Pages】:3894-3901
【Authors】: Le Hui ; Xiang Li ; Chen Gong ; Meng Fang ; Joey Tianyi Zhou ; Jian Yang
【Abstract】: Convolutional Neural Networks (CNNs) have shown great power in various classification tasks and have achieved remarkable results in practical applications. However, the distinct learning difficulties in discriminating different pairs of classes are largely ignored by existing networks. For instance, in the CIFAR-10 dataset, distinguishing cats from dogs is usually harder than distinguishing horses from ships. By carefully studying the behavior of CNN models in the training process, we observe that the confusion level of two classes is strongly correlated with their angular separability in the feature space. That is, the larger the inter-class angle is, the lower the confusion will be. Based on this observation, we propose a novel loss function dubbed "Inter-Class Angular Loss" (ICAL), which explicitly models the class correlation and can be directly applied to many existing deep networks. By minimizing the proposed ICAL, the networks can effectively discriminate the examples in similar classes by enlarging the angle between their corresponding class vectors. Thorough experimental results on a series of vision and non-vision datasets confirm that ICAL critically improves the discriminative ability of various representative deep neural networks and generates superior performance to the original networks with conventional softmax loss.
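A PyTorch sketch of a regularizer in the spirit of ICAL: pushing class vectors apart by penalizing their pairwise cosine similarity. This is our simplified reading; the actual ICAL additionally models the observed confusion between class pairs.

    import torch
    import torch.nn.functional as F

    def inter_class_angular_penalty(class_weights):
        """class_weights: (n_classes, d), one vector per class (e.g. the
        rows of the final linear layer). Penalizing positive pairwise
        cosines enlarges the inter-class angles when minimized."""
        w = F.normalize(class_weights, dim=1)
        cos = w @ w.t()                                 # pairwise cosines
        n = w.size(0)
        off_diag = cos - torch.eye(n, device=w.device)  # drop self-similarity
        return off_diag.clamp(min=0).sum() / (n * (n - 1))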
【Keywords】:
【Paper Link】 【Pages】:3902-3909
【Authors】: Tsuyoshi Idé
【Abstract】: This paper proposes a new method for change detection and analysis using tensor regression. Change detection in our setting is to detect changes in the relationship between the input tensor and the output scalar while change analysis is to compute the responsibility score of individual tensor modes and dimensions for the change detected. We develop a new probabilistic tensor regression method, which can be viewed as a probabilistic generalization of the alternating least squares algorithm. Thanks to the probabilistic formulation, the derived change scores have a clear information-theoretic interpretation. We apply our method to semiconductor manufacturing to demonstrate the utility. To the best of our knowledge, this is the first work of change analysis based on probabilistic tensor regression.
【Keywords】:
【Paper Link】 【Pages】:3910-3918
【Authors】: Akira Imakura ; Momo Matsuda ; Xiucai Ye ; Tetsuya Sakurai
【Abstract】: Dimensionality reduction methods that project high-dimensional data to a low-dimensional space by matrix trace optimization are widely used for clustering and classification. The matrix trace optimization problem leads to an eigenvalue problem for constructing a low-dimensional subspace that preserves certain properties of the original data. However, most of the existing methods use only a few eigenvectors to construct the low-dimensional space, which may lead to a loss of information useful for achieving successful classification. Herein, to overcome this information loss, we propose a novel complex moment-based supervised eigenmap that includes multiple eigenvectors for dimensionality reduction. Furthermore, the proposed method provides a general formulation for matrix trace optimization methods to incorporate ridge regression, which models the linear dependency between covariate variables and univariate labels. To reduce the computational complexity, we also propose an efficient and parallel implementation of the proposed method. Numerical experiments indicate that the proposed method is competitive with the existing dimensionality reduction methods in recognition performance. Additionally, the proposed method exhibits high parallel efficiency.
【Keywords】:
【Paper Link】 【Pages】:3919-3926
【Authors】: Akane Iseki ; Yusuke Mukuta ; Yoshitaka Ushiku ; Tatsuya Harada
【Abstract】: Many real-world systems involve interacting time series. The ability to detect causal dependencies between system components from observed time series of their outputs is essential for understanding system behavior. The quantification of causal influences between time series is based on the definition of some causality measure. Partial Canonical Correlation Analysis (Partial CCA) and its extensions are examples of methods used for robustly estimating the causal relationships between two multidimensional time series even when the time series are short. These methods assume that the input data are complete and have no missing values. However, real-world data often contain missing values. It is therefore crucial to estimate the causality measure robustly even when the input time series is incomplete. Treating this problem as a semi-supervised learning problem, we propose a novel semi-supervised extension of probabilistic Partial CCA called semi-Bayesian Partial CCA. Our method exploits the information in samples with missing values to prevent the overfitting of parameter estimation even when there are few complete samples. Experiments based on synthesized and real data demonstrate the ability of the proposed method to estimate causal relationships more correctly than existing methods when the data contain missing values, the dimensionality is large, and the number of samples is small.
【Keywords】:
【Paper Link】 【Pages】:3927-3934
【Authors】: Roxana Istrate ; Florian Scheidegger ; Giovanni Mariani ; Dimitrios S. Nikolopoulos ; Constantine Bekas ; Adelmo Cristiano Innocenza Malossi
【Abstract】: In recent years an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, has demonstrated high performance on unseen datasets without training new experiments. We propose a new deep neural network accuracy predictor that estimates, in fractions of a second and without training, the classification performance for unseen input datasets. In contrast to previously proposed approaches, our prediction is calibrated not only on the topological network information, but also on a characterization of the dataset difficulty, which allows us to re-tune the prediction without any training. Our predictor achieves a throughput exceeding 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are competitive in performance with other automatically discovered state-of-the-art networks; however, we needed only a small fraction of the time to solution and computational resources.
【Keywords】:
【Paper Link】 【Pages】:3935-3942
【Authors】: Tomoharu Iwata ; Hitoshi Shimizu
【Abstract】: We propose a probabilistic model for estimating population flow, which is defined as populations of the transition between areas over time, given aggregated spatio-temporal population data. Since there is no information about individual trajectories in the aggregated data, it is not straightforward to estimate population flow. With the proposed method, we utilize a collective graphical model with which we can learn individual transition models from the aggregated data by analytically marginalizing the individual locations. Learning a spatio-temporal collective graphical model only from the aggregated data is an ill-posed problem since the number of parameters to be estimated exceeds the number of observations. The proposed method reduces the effective number of parameters by modeling the transition probabilities with a neural network that takes the locations of the origin and the destination areas and the time of day as inputs. By this modeling, we can automatically learn nonlinear spatio-temporal relationships flexibly among transitions, locations, and times. With four real-world population data sets in Japan and China, we demonstrate that the proposed method can estimate the transition population more accurately than existing methods.
【Keywords】:
【Paper Link】 【Pages】:3943-3950
【Authors】: Andrew Jacobsen ; Matthew Schlegel ; Cameron Linke ; Thomas Degris ; Adam White ; Martha White
【Abstract】: This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update, a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
【Keywords】:
【Paper Link】 【Pages】:3951-3958
【Authors】: Vinamra Jain ; Prashant Doshi ; Bikramjit Banerjee
【Abstract】: The problem of learning an expert’s unknown reward function using a limited number of demonstrations recorded from the expert’s behavior is investigated in the area of inverse reinforcement learning (IRL). To gain traction in this challenging and underconstrained problem, IRL methods predominantly represent the reward function of the expert as a linear combination of known features. Most of the existing IRL algorithms either assume the availability of a transition function or provide a complex and inefficient approach to learn it. In this paper, we present a model-free approach to IRL, which casts IRL in the maximum likelihood framework. We present modifications of the model-free Q-learning that replace its maximization to allow computing the gradient of the Q-function. We use gradient ascent to update the feature weights to maximize the likelihood of expert’s trajectories. We demonstrate on two problem domains that our approach improves the likelihood compared to previous methods.
【Keywords】:
【Paper Link】 【Pages】:3959-3966
【Authors】: Jaromír Janisch ; Tomás Pevný ; Viliam Lisý
【Abstract】: We study a classification problem where each feature can be acquired for a cost, and the goal is to optimize the trade-off between the expected classification error and the feature cost. We revisit a former approach that framed the problem as a sequential decision-making problem and solved it by Q-learning with a linear approximation, where individual actions either request feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks the approach becomes comparable to the state-of-the-art algorithms developed specifically for this problem. The approach is flexible, as it can be improved with any new reinforcement learning enhancement; it allows the inclusion of a pre-trained high-performance classifier; and unlike prior art, its performance is robust across all evaluated datasets.
【Keywords】:
【Paper Link】 【Pages】:3967-3974
【Authors】: Neal Jean ; Sherrie Wang ; Anshul Samar ; George Azzari ; David B. Lobell ; Stefano Ermon
【Abstract】: Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations for both image and non-image datasets. Our learned representations significantly improve performance in downstream classification tasks and, similarly to word vectors, allow visual analogies to be obtained via simple arithmetic in the latent space.
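The "similar contexts" analogy is typically operationalized with a triplet objective over spatially co-located tiles; a minimal PyTorch sketch of such a loss, reflecting our reading of the approach (the margin value is an assumption):

    import torch
    import torch.nn.functional as F

    def tile2vec_triplet_loss(z_anchor, z_neighbor, z_distant, margin=1.0):
        """A tile's embedding should be closer to a spatially nearby tile
        than to a distant one: the distributional hypothesis transferred
        to geospatial data, enforced with a triplet margin."""
        d_pos = F.pairwise_distance(z_anchor, z_neighbor)
        d_neg = F.pairwise_distance(z_anchor, z_distant)
        return F.relu(d_pos - d_neg + margin).mean()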
【Keywords】:
【Paper Link】 【Pages】:3975-3982
【Authors】: Bin-Bin Jia ; Min-Ling Zhang
【Abstract】: Multi-dimensional classification (MDC) deals with the problem where one instance is associated with multiple class variables, each of which specifies its class membership w.r.t. one specific class space. Existing approaches learn from MDC examples by focusing on modeling dependencies among class variables, while the potential usefulness of manipulating the feature space has not been investigated. In this paper, a first attempt towards feature manipulation for MDC is proposed, which enriches the original feature space with kNN-augmented features. Specifically, simple counting statistics on the class memberships of neighboring MDC examples are used to generate the augmented feature vector. In this way, discriminative information from the class spaces is encoded into the feature space to help train the multi-dimensional classification model. To validate the effectiveness of the proposed feature augmentation technique, extensive experiments over eleven benchmark data sets as well as four state-of-the-art MDC approaches are conducted. Experimental results clearly show that, compared to the original feature space, the classification performance of existing MDC approaches can be significantly improved by incorporating kNN-augmented features.
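A sketch of such kNN counting features, assuming integer-coded class variables; the paper's exact statistics may differ:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_augmented_features(X, Y, k=5):
        """Append, for each class space, the counts of each class value
        among a training sample's k nearest neighbors.

        X: (n, d) features; Y: (n, q) integer class variables in 0..c_j-1."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        idx = idx[:, 1:]                            # drop the query point itself
        blocks = []
        for j in range(Y.shape[1]):
            n_classes = int(Y[:, j].max()) + 1
            onehot = np.eye(n_classes)[Y[idx, j]]   # (n, k, c_j)
            blocks.append(onehot.sum(axis=1))       # neighbor class counts
        return np.hstack([X] + blocks)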
【Keywords】:
【Paper Link】 【Pages】:3983-3990
【Authors】: Bingbing Jiang ; Xingyu Wu ; Kui Yu ; Huanhuan Chen
【Abstract】: With the increasing data dimensionality, feature selection has become a fundamental task to deal with high-dimensional data. Semi-supervised feature selection focuses on the problem of how to learn a relevant feature subset in the case of abundant unlabeled data with few labeled data. In recent years, many semi-supervised feature selection algorithms have been proposed. However, these algorithms are implemented by separating the processes of feature selection and classifier training, such that they cannot simultaneously select features and learn a classifier with the selected features. Moreover, they ignore the difference of reliability inside unlabeled samples and directly use them in the training stage, which might cause performance degradation. In this paper, we propose a joint semi-supervised feature selection and classification algorithm (JSFS) which adopts a Bayesian approach to automatically select the relevant features and simultaneously learn a classifier. Instead of using all unlabeled samples indiscriminately, JSFS associates each unlabeled sample with a self-adjusting weight to distinguish the difference between them, which can effectively eliminate the irrelevant unlabeled samples via introducing a left-truncated Gaussian prior. Experiments on various datasets demonstrate the effectiveness and superiority of JSFS.
【Keywords】:
【Paper Link】 【Pages】:3991-3998
【Authors】: Hansi Jiang ; Haoyu Wang ; Wenhao Hu ; Deovrat Kakde ; Arin Chaudhuri
【Abstract】: Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose a fast incremental learning algorithm for SVDD (FISVDD) that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers αi are automatically selected and determined in each iteration. It can be seen that the complexity of our algorithm in each iteration is only O(k²), where k is the number of support vectors. Experimental results on several real data sets indicate that FISVDD achieves significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.
【Keywords】:
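For intuition, a sketch of the quantity the incremental algorithm exploits: with the Gaussian kernel (where k(z, z) = 1), the squared feature-space distance from a point to the SVDD center is a simple function of kernel evaluations against the support vectors. The incremental update itself is not reproduced here; `sv` and `alpha` are assumed to come from an already-trained SVDD.

```python
import numpy as np

def gauss_k(a, b, sigma=1.0):
    """Gaussian kernel; note k(z, z) = 1 for any z."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svdd_dist2_to_center(z, sv, alpha, sigma=1.0):
    """Squared distance from phi(z) to the center a = sum_i alpha_i phi(x_i).
    Boundary support vectors all share the same value of this quantity."""
    cross = sum(a_i * gauss_k(x_i, z, sigma) for a_i, x_i in zip(alpha, sv))
    const = sum(a_i * a_j * gauss_k(x_i, x_j, sigma)
                for a_i, x_i in zip(alpha, sv)
                for a_j, x_j in zip(alpha, sv))
    return 1.0 - 2.0 * cross + const
```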
【Paper Link】 【Pages】:3999-4006
【Authors】: Heinrich Jiang
【Abstract】: We derive high-probability finite-sample uniform rates of consistency for k-NN regression that are optimal up to logarithmic factors under mild assumptions. We moreover show that k-NN regression adapts to an unknown lower intrinsic dimension automatically in the sup-norm. We then apply the k-NN regression rates to establish new results about estimating the level sets and global maxima of a function from noisy observations.
【Keywords】:
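For reference, the estimator whose uniform consistency rates are analyzed is the standard k-NN regressor; a minimal sketch:

```python
import numpy as np

def knn_regress(x0, X, y, k=10):
    """k-NN regression estimate at query x0: the average response of the
    k training points nearest to x0."""
    dists = np.linalg.norm(X - x0, axis=1)
    return y[np.argsort(dists)[:k]].mean()
```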
【Paper Link】 【Pages】:4007-4014
【Authors】: Jiatao Jiang ; Zhen Cui ; Chunyan Xu ; Jian Yang
【Abstract】: Learning representations on graphs plays a crucial role in numerous tasks of pattern recognition. However, unlike grid-shaped images/videos, on which local convolution kernels can be lattices, graphs are fully coordinate-free on vertices and edges. In this work, we propose a Gaussian-induced convolution (GIC) framework to conduct local convolution filtering on irregular graphs. Specifically, an edge-induced Gaussian mixture model is designed to encode variations of a subgraph region by integrating edge information into weighted Gaussian models, each of which implicitly characterizes one component of subgraph variations. In order to coarsen a graph, we derive a vertex-induced Gaussian mixture model to cluster vertices dynamically according to the connections of edges, which is approximately equivalent to the weighted graph cut. We conduct our multi-layer graph convolution network on several public datasets of graph classification. The extensive experiments demonstrate that our GIC is effective and can achieve state-of-the-art results.
【Keywords】:
【Paper Link】 【Pages】:4015-4022
【Authors】: Yue Jiang ; Zhouhui Lian ; Yingmin Tang ; Jianguo Xiao
【Abstract】: Automatic generation of Chinese fonts, which consist of large numbers of glyphs with complicated structures, remains a challenging and ongoing problem in AI and Computer Graphics (CG). Traditional CG-based methods typically rely heavily on manual interventions, while recently popularized deep learning-based end-to-end approaches often produce synthesis results with incorrect structures and/or serious artifacts. To address those problems, this paper proposes a structure-guided Chinese font generation system, SCFont, using deep stacked networks. The key idea is to integrate the domain knowledge of Chinese characters with deep generative networks to ensure that high-quality glyphs with correct structures can be synthesized. More specifically, we first apply a CNN model to learn how to transfer the writing trajectories with separated strokes in the reference font style into those in the target style. Then, we train another CNN model that learns how to recover shape details on the contour for the synthesized writing trajectories. Experimental results validate the superiority of the proposed SCFont compared to the state of the art in both visual and quantitative assessments.
【Keywords】:
【Paper Link】 【Pages】:4023-4030
【Authors】: Binbin Jin ; Hongke Zhao ; Enhong Chen ; Qi Liu ; Yong Ge
【Abstract】: Crowdfunding is an emerging mechanism for entrepreneurs or individuals to solicit funding from the public for their creative ideas. However, on these platforms, quite a large proportion of campaigns (projects) fail to raise enough money from backers' support by the declared expiration date. It is therefore urgent to predict the exact success time of campaigns, but this problem has not been well explored due to a series of domain and technical challenges. In this paper, we observe that the distribution of backing behaviors is an implicit factor with a positive impact on estimating the success time of a campaign. Therefore, we present a focused study on two specific prediction tasks, i.e., backing distribution prediction and success time prediction of campaigns. Specifically, we propose a Seq2seq-based model with Multi-facet Priors (SMP), which can integrate heterogeneous features to jointly model the backing distribution and success time. Additionally, to keep the change of backing distributions smooth as backing behaviors accumulate, we develop a linear evolutionary prior for backing distribution prediction. Furthermore, due to the high failure rate, the success time of most campaigns is unobservable. We model this censoring phenomenon from the survival analysis perspective and also develop a non-increasing prior and a partial prior for success time prediction. Finally, we conduct extensive experiments on a real-world dataset from Indiegogo. Experimental results clearly validate the effectiveness of SMP.
【Keywords】:
【Paper Link】 【Pages】:4031-4038
【Authors】: Sunghwan Joo ; Sungmin Cha ; Taesup Moon
【Abstract】: We propose DoPAMINE, a new neural network based multiplicative noise despeckling algorithm. Our algorithm is inspired by Neural AIDE (N-AIDE), a recently proposed neural adaptive image denoiser. While the original N-AIDE was designed for the additive noise case, we show that the same framework, i.e., adaptively learning a network for pixel-wise affine denoisers by minimizing an unbiased estimate of the MSE, can be applied to the multiplicative noise case as well. Moreover, we derive a double-sided masked CNN architecture which can control the variance of the activation values in each layer and converge quickly to high denoising performance during supervised training. In the experimental results, we show that DoPAMINE possesses high adaptivity via fine-tuning the network parameters based on the given noisy image and achieves significantly better despeckling results compared to SAR-DRN, a state-of-the-art CNN-based algorithm.
【Keywords】:
【Paper Link】 【Pages】:4039-4048
【Authors】: Brendan Juba ; Hai S. Le
【Abstract】: Practitioners of data mining and machine learning have long observed that the imbalance of classes in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on the imbalance of classes at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC-dimension) and the confidence desired; the class imbalance does not enter these formulas anywhere. In this work, we consider measures of classifier performance in terms of precision and recall, measures that are widely suggested as more appropriate for the classification of imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that a larger number of examples is necessary and sufficient to address class imbalance, a finding we also illustrate empirically.
【Keywords】:
【Paper Link】 【Pages】:4049-4056
【Authors】: Ata Kabán
【Abstract】: Learning from high dimensional data is challenging in general; however, the data is often not truly high dimensional, in the sense that it may have some hidden low-complexity geometry. We give new, user-friendly PAC bounds that are able to take advantage of such benign geometry to reduce the dimension dependence of error guarantees in settings where such dependence is known to be essential in general. This is achieved by employing random projection as an analytic tool, and exploiting its structure-preserving compression ability. We introduce an auxiliary function class that operates on reduced dimensional inputs, and a new complexity term: the distortion of the loss under random projections. The latter is a hypothesis-dependent measure of data complexity, whose analytic estimates turn out to recover various regularisation schemes in parametric models, and a notion of intrinsic dimension, as quantified by the Gaussian width of the input support in the case of the nearest neighbour rule. If benign geometry is present, the bounds become tighter; otherwise they recover the original dimension-dependent bounds.
【Keywords】:
【Paper Link】 【Pages】:4057-4064
【Authors】: Zhao Kang ; Yiwei Lu ; Yuanzhang Su ; Changsheng Li ; Zenglin Xu
【Abstract】: Data similarity is a key concept in many data-driven applications, and many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, it only tries to reconstruct the original data, and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of the original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, our proposed kernel-preserving framework opens up a large number of possibilities for embedding high-dimensional data into low-dimensional space.
【Keywords】:
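A minimal sketch of self-expression applied to a kernel matrix rather than the raw data, which is the abstract's core move. A simple ridge penalty is used here so that the similarity matrix S has a closed form; the paper's actual constraints and regularizers may differ.

```python
import numpy as np

def kernel_self_expression(K, lam=1.0):
    """argmin_S ||K - K S||_F^2 + lam ||S||_F^2, i.e., reconstruct the kernel
    matrix from itself; closed form S = (K K + lam I)^{-1} K K."""
    n = K.shape[0]
    S = np.linalg.solve(K @ K + lam * np.eye(n), K @ K)
    return 0.5 * (np.abs(S) + np.abs(S.T))  # symmetrized similarity matrix
```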
【Paper Link】 【Pages】:4065-4072
【Authors】: Rohit Keshari ; Richa Singh ; Mayank Vatsa
【Abstract】: Dropout is often used in deep neural networks to prevent over-fitting. Conventionally, dropout training invokes a random drop of nodes from the hidden layers of a neural network. It is our hypothesis that a guided selection of nodes for intelligent dropout can lead to better generalization than traditional dropout. In this research, we propose “guided dropout” for training deep neural networks, which drops nodes by measuring the strength of each node. We also demonstrate that conventional dropout is a specific case of the proposed guided dropout. Experimental evaluation on multiple datasets including MNIST, CIFAR10, CIFAR100, SVHN, and Tiny ImageNet demonstrates the efficacy of the proposed guided dropout.
【Keywords】:
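A sketch of guided (as opposed to uniformly random) dropout. The paper measures a learned per-node strength; here the strength scores are taken as given, and dropping the lowest-strength nodes is just one illustrative policy, not necessarily the paper's.

```python
import numpy as np

def guided_dropout_mask(strength, drop_rate=0.2):
    """Build a 0/1 mask over hidden nodes guided by a strength score,
    instead of dropping nodes uniformly at random."""
    n_drop = int(drop_rate * strength.size)
    order = np.argsort(strength)           # weakest nodes first
    mask = np.ones_like(strength, dtype=float)
    mask[order[:n_drop]] = 0.0             # silence the n_drop weakest nodes
    return mask
```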
【Paper Link】 【Pages】:4073-4081
【Authors】: Shun Kiyono ; Jun Suzuki ; Kentaro Inui
【Abstract】: The current success of deep neural networks (DNNs) in an increasingly broad range of tasks involving artificial intelligence strongly depends on the quality and quantity of labeled training data. In general, the scarcity of labeled data, which is often observed in many natural language processing tasks, is one of the most important issues to be addressed. Semi-supervised learning (SSL) is a promising approach to overcoming this issue by incorporating a large amount of unlabeled data. In this paper, we propose a novel scalable method of SSL for text classification tasks. The unique property of our method, Mixture of Expert/Imitator Networks, is that imitator networks learn to “imitate” the estimated label distribution of the expert network over the unlabeled data, which potentially contributes a set of features for the classification. Our experiments demonstrate that the proposed method consistently improves the performance of several types of baseline DNNs. We also demonstrate that our method has the “more data, better performance” property, with promising scalability to the amount of unlabeled data.
【Keywords】:
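The "imitation" signal on unlabeled data can be sketched as a cross-entropy between the expert's predicted label distribution and an imitator's output, a distillation-style objective; this illustrates the stated idea and is not the paper's exact loss.

```python
import numpy as np

def imitation_loss(expert_probs, imitator_logits):
    """Cross-entropy H(p_expert, q_imitator) on an unlabeled batch: the
    imitator learns to reproduce the expert's estimated label distribution."""
    z = imitator_logits - imitator_logits.max(axis=1, keepdims=True)
    log_q = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return -(expert_probs * log_q).sum(axis=1).mean()
```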
【Paper Link】 【Pages】:4082-4089
【Authors】: Matthew Klawonn ; Eric Heim ; James A. Hendler
【Abstract】: In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set and guide supervised training to spend time on each class in proportion to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with a classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
【Keywords】:
【Paper Link】 【Pages】:4090-4097
【Authors】: Quan Kong ; Bin Tong ; Martin Klinkigt ; Yuki Watanabe ; Naoto Akira ; Tomokazu Murakami
【Abstract】: Sufficient supervised information is crucial for any machine learning model to boost performance. However, labeling data is expensive and sometimes difficult to obtain. Active learning is an approach to acquire annotations for data from a human oracle by selecting informative samples that have a high probability of enhancing performance. In recent emerging studies, a generative adversarial network (GAN) has been integrated with active learning to generate good candidates to be presented to the oracle. In this paper, we propose a novel model that is able to obtain labels for data in a cheaper manner without the need to query an oracle. In the model, a novel reward for each sample is devised to measure the degree of uncertainty, which is obtained from a classifier trained with existing labeled data. This reward is used to guide a conditional GAN to generate informative samples with a higher probability for a certain label. With extensive evaluations, we have confirmed the effectiveness of the model, showing that the generated samples are capable of improving the classification performance in popular image classification tasks.
【Keywords】:
【Paper Link】 【Pages】:4098-4105
【Authors】: Mark Kozdoba ; Jakub Marecek ; Tigran T. Tchrakian ; Shie Mannor
【Abstract】: The Kalman filter is a key tool for time-series forecasting and analysis. We show that the dependence of the Kalman filter's predictions on the past decays exponentially whenever the process noise is non-degenerate. Therefore, the Kalman filter may be approximated by regression on a few recent observations. Surprisingly, we also show that having some process noise is essential for the exponential decay. With no process noise, it may happen that the forecast depends on all of the past uniformly, which makes forecasting more difficult. Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only the few most recent observations. We use our decay results to provide the first regret bounds w.r.t. Kalman filters for learning an LDS. That is, we compare the results of our algorithm to the best, in hindsight, Kalman filter for a given signal. Moreover, the algorithm is practical: its per-update run-time is linear in the regression depth.
【Keywords】:
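The paper's insight, that an exponentially decaying memory lets the Kalman filter be approximated by regression on a few recent observations, can be sketched as fitting an autoregressive predictor by least squares; the regression depth is the handful of recent observations kept.

```python
import numpy as np

def fit_recent_regression(y, depth=5):
    """Regress y_t on its previous `depth` values; prediction then needs
    only the few most recent observations."""
    X = np.array([y[t - depth:t] for t in range(depth, len(y))])
    w, *_ = np.linalg.lstsq(X, y[depth:], rcond=None)
    return w          # forecast: y_hat_t = w @ y[t-depth:t]
```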
【Paper Link】 【Pages】:4106-4113
【Authors】: Atsutoshi Kumagai ; Tomoharu Iwata
【Abstract】: We propose a simple yet effective method for unsupervised domain adaptation. When training and test distributions are different, standard supervised learning methods perform poorly. Semi-supervised domain adaptation methods have been developed for the case where labeled data in the target domain are available. However, the target data are often unlabeled in practice. Therefore, unsupervised domain adaptation, which does not require labels for target data, is receiving a lot of attention. The proposed method minimizes the discrepancy between the source and target distributions of input features by transforming the feature space of the source domain. Since such unilateral transformations transfer knowledge in the source domain to the target one without reducing dimensionality, the proposed method can effectively perform domain adaptation without losing information to be transferred. With the proposed method, it is assumed that the transformed features and the original features differ by a small residual to preserve the relationship between features and labels. This transformation is learned by aligning the higher-order moments of the source and target feature distributions based on the maximum mean discrepancy, which makes it possible to compare two distributions without density estimation. Once the transformation is found, we learn supervised models by using the transformed source data and their labels. We use two real-world datasets to demonstrate experimentally that the proposed method achieves better classification performance than existing methods for unsupervised domain adaptation.
【Keywords】:
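A minimal sketch of the maximum mean discrepancy used to compare the source and target feature distributions without density estimation (Gaussian kernel, biased estimator; the kernel choice and bandwidth are assumptions):

```python
import numpy as np

def mmd2(Xs, Xt, sigma=1.0):
    """Squared MMD between source features Xs and target features Xt."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()
```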
【Paper Link】 【Pages】:4114-4121
【Authors】: Richard Kurle ; Stephan Günnemann ; Patrick van der Smagt
【Abstract】: Learning from multiple sources of information is an important problem in machine-learning research. The key challenges are learning representations and formulating inference methods that take into account the complementarity and redundancy of various information sources. In this paper we formulate a variational autoencoder based multi-source learning framework in which each encoder is conditioned on a different information source. This allows us to relate the sources via the shared latent variables by computing divergence measures between individual source’s posterior approximations. We explore a variety of options to learn these encoders and to integrate the beliefs they compute into a consistent posterior approximation. We visualise learned beliefs on a toy dataset and evaluate our methods for learning shared representations and structured output prediction, showing trade-offs of learning separate encoders for each information source. Furthermore, we demonstrate how conflict detection and redundancy can increase robustness of inference in a multi-source setting.
【Keywords】:
【Paper Link】 【Pages】:4122-4129
【Authors】: Seiichi Kuroki ; Nontawat Charoenphakdee ; Han Bao ; Junya Honda ; Issei Sato ; Masashi Sugiyama
【Abstract】: Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different and labels in the target domain are unavailable. An important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. Existing discrepancy measures for unsupervised domain adaptation either require high computation costs or have no theoretical guarantee. To mitigate these problems, this paper proposes a novel discrepancy measure called source-guided discrepancy (S-disc), which exploits labels in the source domain unlike the existing ones. As a consequence, S-disc can be computed efficiently with a finite-sample convergence guarantee. In addition, it is shown that S-disc can provide a tighter generalization error bound than the one based on an existing discrepancy measure. Finally, experimental results demonstrate the advantages of S-disc over the existing discrepancy measures.
【Keywords】:
【Paper Link】 【Pages】:4130-4138
【Authors】: Yi-Yu Lai ; Jennifer Neville ; Dan Goldwasser
【Abstract】: Representation learning (RL) for social networks facilitates real-world tasks such as visualization, link prediction and friend recommendation. Traditional knowledge graph embedding models learn continuous low-dimensional embeddings of entities and relations. However, when applied to social networks, existing approaches do not consider the rich textual communications between users, which contain valuable information for describing social relationships. In this paper, we propose TransConv, a novel approach that incorporates textual interactions between pairs of users to improve representation learning of both users and relationships. Our experiments on real social network data show TransConv learns better user and relationship embeddings compared to other state-of-the-art knowledge graph embedding models. Moreover, the results illustrate that our model is more robust for sparse relationships where there are fewer examples.
【Keywords】:
【Paper Link】 【Pages】:4139-4146
【Authors】: Liang Lan ; Yu Geng
【Abstract】: Factorization Machines (FMs), a general predictor that can efficiently model high-order feature interactions, have been widely used for regression, classification and ranking problems. However, despite many successful applications of FMs, there are two main limitations: (1) FMs model feature interactions using only polynomial expansion, which fails to capture complex nonlinear patterns in data; (2) existing FMs do not provide interpretable predictions to users. In this paper, we present a novel method named Subspace Encoding Factorization Machines (SEFM) to overcome these two limitations by using non-parametric subspace feature mapping. Due to the high sparsity of the new feature representation, our proposed method achieves the same time complexity as standard FMs but can capture more complex nonlinear patterns. Moreover, since the prediction score of our proposed model for a sample is a sum of contribution scores of the bins and grid cells that the sample falls into in low-dimensional subspaces, it works similarly to a scoring system which only involves data binning and score addition. Therefore, our proposed method naturally provides interpretable predictions. Our experimental results demonstrate that our proposed method efficiently provides accurate and interpretable predictions.
【Keywords】:
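The interpretable "scoring system" flavor of the abstract can be sketched by binning each feature, one-hot encoding the bins, and fitting a linear model on the resulting sparse representation: the prediction is then a sum of per-bin contribution scores. This stand-in omits SEFM's FM interaction terms on grid cells and is not the authors' implementation.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import KBinsDiscretizer

def binned_linear_model(X, y, n_bins=10):
    """Bin features, one-hot encode the bins, fit a linear classifier; each
    prediction decomposes into additive per-bin scores."""
    enc = KBinsDiscretizer(n_bins=n_bins, encode="onehot", strategy="quantile")
    Z = enc.fit_transform(X)          # highly sparse one-hot bin indicators
    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    return enc, clf
```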
【Paper Link】 【Pages】:4147-4154
【Authors】: Jay Yoon Lee ; Sanket Vaibhav Mehta ; Michael Wick ; Jean-Baptiste Tristan ; Jaime G. Carbonell
【Abstract】: Practitioners apply neural networks to increasingly complex problems in natural language processing, such as syntactic parsing and semantic role labeling that have rich output structures. Many such structured-prediction problems require deterministic constraints on the output values; for example, in sequence-to-sequence syntactic parsing, we require that the sequential outputs encode valid trees. While hidden units might capture such properties, the network is not always able to learn such constraints from the training data alone, and practitioners must then resort to post-processing. In this paper, we present an inference method for neural networks that enforces deterministic constraints on outputs without performing rule-based post-processing or expensive discrete search. Instead, in the spirit of gradient-based training, we enforce constraints with gradient-based inference (GBI): for each input at test-time, we nudge continuous model weights until the network’s unconstrained inference procedure generates an output that satisfies the constraints. We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints, but improves accuracy, even when the underlying network is state-of-the-art.
【Keywords】:
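A sketch of gradient-based inference as described: at test time, copy the trained network and nudge its weights down the gradient of a differentiable constraint-violation measure until decoding yields a valid output. Here `violation` is an assumed user-supplied function, and the step size and iteration budget are illustrative.

```python
import copy
import torch

def gbi(model, x, violation, steps=50, lr=1e-3):
    """Nudge a copy of the weights until the output satisfies the constraints.
    violation(scores) >= 0 is differentiable and equals 0 iff valid."""
    m = copy.deepcopy(model)          # leave the trained weights untouched
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for _ in range(steps):
        loss = violation(m(x))
        if loss.item() <= 0.0:        # constraints satisfied: stop early
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    return m(x)
```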
【Paper Link】 【Pages】:4155-4163
【Authors】: Kyubin Lee ; Akshay Sood ; Mark Craven
【Abstract】: In many application domains, it is important to characterize how complex learned models make their decisions across the distribution of instances. One way to do this is to identify the features and interactions among them that contribute to a model’s predictive accuracy. We present a model-agnostic approach to this task that makes the following specific contributions. Our approach (i) tests feature groups, in addition to base features, and tries to determine the level of resolution at which important features can be determined, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model’s loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications.
【Keywords】:
【Paper Link】 【Pages】:4164-4172
【Authors】: Sanghack Lee ; Elias Bareinboim
【Abstract】: Causal knowledge is sought after throughout data-driven fields due to its explanatory power and potential value to inform decision-making. If the targeted system is well-understood in terms of its causal components, one is able to design more precise and surgical interventions so as to bring certain desired outcomes about. The idea of leveraging the causal understanding of a system to improve decision-making has been studied in the literature under the rubric of structural causal bandits (Lee and Bareinboim, 2018). In this setting, (1) pulling an arm corresponds to performing a causal intervention on a set of variables, while (2) the associated rewards are governed by the underlying causal mechanisms. One key assumption of this work is that any observed variable (X) in the system is manipulable, which means that intervening and making do(X = x) is always realizable. In many real-world scenarios, however, this is too stringent a requirement. For instance, while scientific evidence may support that obesity shortens life, it is not feasible to manipulate obesity directly, but one may act indirectly, for example, by decreasing the amount of soda consumption (Pearl, 2018). In this paper, we study a relaxed version of the structural causal bandit problem in which not all variables are manipulable. Specifically, we develop a procedure that takes as input partially specified causal knowledge and identifies the possibly-optimal arms in structural bandits with non-manipulable variables. We further introduce an algorithm that uncovers non-trivial dependence structure among the possibly-optimal arms. Finally, we corroborate our findings with simulations, which show that MAB solvers enhanced with causal knowledge and leveraging the newly discovered dependence structure among arms consistently outperform causal-insensitive solvers.
【Keywords】:
【Paper Link】 【Pages】:4173-4180
【Authors】: Chunyuan Li ; Changyou Chen ; Yunchen Pu ; Ricardo Henao ; Lawrence Carin
【Abstract】: Learning probability distributions on the weights of neural networks has recently proven beneficial in many applications. Bayesian methods such as Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) offer an elegant framework to reason about model uncertainty in neural networks. However, these advantages usually come with a high computational cost. We propose accelerating SG-MCMC under the master-worker framework: workers asynchronously and in parallel share responsibility for gradient computations, while the master collects the final samples. To reduce communication overhead, two protocols (downpour and elastic) are developed to allow periodic interaction between the master and workers. We provide a theoretical analysis on the finite-time estimation consistency of posterior expectations, and establish connections to sample thinning. Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC. When applied to reinforcement learning, our approach naturally provides exploration for asynchronous policy optimization, with encouraging performance improvement.
【Keywords】:
【Paper Link】 【Pages】:4181-4188
【Authors】: Jia Li ; Cong Fang ; Zhouchen Lin
【Abstract】: We propose a new optimization method for training feedforward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feedforward neural network by adding the proximal operators to the objective function as penalties; hence we call it the lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations. Most notably, we only use the mapping of the activation function itself, rather than its derivative, thus avoiding the gradient vanishing or blow-up issues in gradient-based training methods. So our method is applicable to various non-decreasing Lipschitz continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, thus using roughly the same amount of memory as stochastic gradient descent (SGD) does. Its parameter tuning is also much simpler. We further prove the convergence of updating the layer-wise weights and activations and point out that the optimization could be made parallel by asynchronous updates. Experiments on the MNIST and CIFAR-10 datasets testify to the advantages of LPOM.
【Keywords】:
【Paper Link】 【Pages】:4189-4196
【Authors】: Jingjing Li ; Mengmeng Jing ; Ke Lu ; Lei Zhu ; Yang Yang ; Zi Huang
【Abstract】: Zero-shot learning (ZSL) and cold-start recommendation (CSR) are two challenging problems in computer vision and recommender systems, respectively. In general, they are independently investigated in different communities. This paper, however, reveals that ZSL and CSR are two extensions of the same intension. Both of them, for instance, attempt to predict unseen classes and involve two spaces, one for direct feature representation and the other for supplementary description. Yet there is no existing approach which addresses CSR from the ZSL perspective. This work, for the first time, formulates CSR as a ZSL problem, and a tailor-made ZSL method is proposed to handle CSR. Specifically, we propose a Low-rank Linear Auto-Encoder (LLAE), which tackles three cruxes: domain shift, spurious correlations and computing efficiency. LLAE consists of two parts: a low-rank encoder that maps user behavior into user attributes, and a symmetric decoder that reconstructs user behavior from those attributes. Extensive experiments on both ZSL and CSR tasks verify that the proposed method is a win-win formulation, i.e., not only can CSR be handled by ZSL models with a significant performance improvement compared with several conventional state-of-the-art methods, but the consideration of CSR can benefit ZSL as well.
【Keywords】:
【Paper Link】 【Pages】:4197-4204
【Authors】: Linwei Li ; Liangchen Guo ; Zhenying He ; Yinan Jing ; X. Sean Wang
【Abstract】: Text clustering is a widely studied problem in the text mining domain. Clustering algorithms based on the Dirichlet Multinomial Mixture (DMM) model have shown good performance in coping with high dimensional sparse text data, obtaining reasonable results in both clustering accuracy and computational efficiency. However, the time complexity of DMM model training is proportional to the average document length and the number of clusters, making it inefficient to scale up to long text and large corpora, which are common in real-world applications such as document organization, retrieval and recommendation. In this paper, we leverage a symmetric prior setting for the Dirichlet distribution, and build indices to decrease the time complexity of the sampling-based training for DMM from O(K·L) to O(K·U), where K is the number of clusters, L the average length of a document, and U the average number of unique words in each document. We introduce a Metropolis-Hastings sampling algorithm, which further reduces the sampling time complexity from O(K·U) to O(U) in the nearly-converged training stages. Moreover, we also parallelize DMM model training to obtain a further acceleration by using an uncollapsed Gibbs sampler. We combine all these optimizations into a highly efficient implementation, called X-DMM, which enables the DMM model to scale up for long and large-scale text clustering. We evaluate the performance of X-DMM on several real world datasets, and the experimental results show that X-DMM achieves substantial speed-up compared with existing state-of-the-art algorithms without degrading clustering accuracy.
【Keywords】:
【Paper Link】 【Pages】:4205-4212
【Authors】: Ping Li
【Abstract】: The method of 1-bit (“sign-sign”) random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dimensional data vectors u, v ∈ ℝ^D, one can generate x = ∑_{i=1}^D u_i r_i and y = ∑_{i=1}^D v_i r_i, where r_i ∼ N(0, 1) i.i.d. Then one can estimate the cosine similarity ρ from sgn(x) and sgn(y). In this paper, we study a series of estimators for “sign-full” random projections. First we prove E[sgn(x)·y] = √(2/π)·ρ, which provides an estimator for ρ. Interestingly, this estimator can be substantially improved by normalizing y. Then we study estimators based on E[y⁻·1_{x≥0} + y⁺·1_{x<0}] and its normalized version. We analyze the theoretical limit (using the MLE) and conclude that, among the proposed estimators, no single estimator can achieve (close to) the theoretically optimal asymptotic variance for the entire range of ρ. On the other hand, the estimators can be combined to achieve a variance close to that of the MLE. In applications such as near-neighbor search, duplicate detection, kNN classification, etc., the training data are first transformed via random projections, and then only the signs of the projected data points (i.e., the sgn(x)) are stored; the original training data are discarded. When a new data point arrives, we apply random projections, but we do not necessarily need to quantize the projected data (i.e., the y) to 1 bit. Therefore, sign-full random projections can be practically useful, and this gain essentially comes at no additional cost.
【Keywords】:
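The first identity in the abstract is easy to verify numerically for unit-norm vectors; a self-contained check of the resulting estimator ρ̂ = √(π/2) · mean(sgn(x)·y), with dimensions and sample sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_proj, rho = 1000, 100000, 0.5

# Build unit vectors u, v with cosine similarity exactly rho.
u = rng.standard_normal(D); u /= np.linalg.norm(u)
w = rng.standard_normal(D); w -= (w @ u) * u; w /= np.linalg.norm(w)
v = rho * u + np.sqrt(1 - rho ** 2) * w

R = rng.standard_normal((n_proj, D))      # Gaussian projection matrix
x, y = R @ u, R @ v                       # projected values
rho_hat = np.sqrt(np.pi / 2) * np.mean(np.sign(x) * y)  # uses E[sgn(x)y]=sqrt(2/pi)*rho
print(rho_hat)                            # approximately 0.5
```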
【Paper Link】 【Pages】:4213-4220
【Authors】: Shihui Li ; Yi Wu ; Xinyue Cui ; Honghua Dong ; Fei Fang ; Stuart J. Russell
【Abstract】: Despite the recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal to other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies alter. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
【Keywords】:
【Paper Link】 【Pages】:4221-4228
【Authors】: Xiang Li ; Ben Kao ; Zhaochun Ren ; Dawei Yin
【Abstract】: A heterogeneous information network (HIN) is one whose objects are of different types and links between objects could model different object relations. We study how spectral clustering can be effectively applied to HINs. In particular, we focus on how meta-path relations are used to construct an effective similarity matrix based on which spectral clustering is done. We formulate the similarity matrix construction as an optimization problem and propose the SClump algorithm for solving the problem. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
【Keywords】:
【Paper Link】 【Pages】:4229-4236
【Authors】: Yanjun Li ; Kai Zhang ; Jun Wang ; Sanjiv Kumar
【Abstract】: Random Fourier features are a powerful framework to approximate shift invariant kernels with Monte Carlo integration, which has drawn considerable interest in scaling up kernel-based learning, dimensionality reduction, and information retrieval. In the literature, many sampling schemes have been proposed to improve the approximation performance. However, an interesting theoretical and algorithmic challenge still remains, i.e., how to optimize the design of random Fourier features to achieve good kernel approximation on any input data using a low spectral sampling rate? In this paper, we propose to compute more adaptive random Fourier features with optimized spectral samples (wj’s) and feature weights (pj’s). The learning scheme not only significantly reduces the spectral sampling rate needed for accurate kernel approximation, but also allows joint optimization with any supervised learning framework. We establish generalization bounds using Rademacher complexity, and demonstrate advantages over previous methods. Moreover, our experiments show that the empirical kernel approximation provides effective regularization for supervised learning.
【Keywords】:
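For context, a sketch of the vanilla (unoptimized) random Fourier feature map being improved upon; the paper's contribution is learning the spectral samples W and per-feature weights rather than drawing them blindly as below.

```python
import numpy as np

def rff(X, n_features=256, gamma=1.0, seed=0):
    """Random Fourier features for the RBF kernel exp(-gamma ||x-y||^2):
    z(x) @ z(y) approximates k(x, y) by Monte Carlo integration."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```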
【Paper Link】 【Pages】:4237-4244
【Authors】: Yu-Feng Li ; Hai Wang ; Tong Wei ; Wei-Wei Tu
【Abstract】: Automated Machine Learning (AutoML) aims to build an appropriate machine learning model for any unseen dataset automatically, i.e., without human intervention. Great efforts have been devoted to AutoML, but they typically focus on supervised learning. In many applications, however, semi-supervised learning (SSL) is widespread, and current AutoML systems cannot address SSL problems well. In this paper, we present an automated learning system for SSL (AUTO-SSL). First, meta-learning with enhanced meta-features is employed to quickly suggest some instantiations of SSL techniques which are likely to perform quite well. Second, a large margin separation method is proposed to fine-tune the hyperparameters and, more importantly, alleviate performance deterioration. The basic idea is that, if a certain hyperparameter is of high quality, its predictive results on unlabeled data should exhibit a large margin separation. Extensive empirical results over 200 cases demonstrate that our proposal, on the one hand, achieves highly competitive or better performance compared to the state-of-the-art AutoML system AUTO-SKLEARN and classical SSL techniques; on the other hand, unlike classical SSL techniques, which often significantly degenerate performance, our proposal seldom suffers from such deficiency.
【Keywords】:
【Paper Link】 【Pages】:4245-4252
【Authors】: Zejian Li ; Yongchuan Tang ; Wei Li ; Yongxing He
【Abstract】: Unsupervised disentangled representation learning is one of the foundational methods for learning interpretable factors in data. Existing learning methods are based on the assumption that disentangled factors are mutually independent and incorporate this assumption into the evidence lower bound. However, our experiments reveal that factors in real-world data tend to be pairwise independent. Accordingly, we propose a new method based on a pairwise independence assumption to learn the disentangled representation. The evidence lower bound implicitly encourages mutual independence of latent codes, so it is too strong for our assumption. Therefore, we introduce another lower bound in our method. Extensive experiments show that our proposed method gives competitive performance compared with other state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:4253-4260
【Authors】: Zheng Li ; Ying Wei ; Yu Zhang ; Xiang Zhang ; Xin Li
【Abstract】: Aspect-level sentiment classification (ASC) aims at identifying sentiment polarities towards aspects in a sentence, where the aspect can behave as a general Aspect Category (AC) or a specific Aspect Term (AT). However, due to the especially expensive and labor-intensive labeling, existing public corpora at the AT level are all relatively small. Meanwhile, most of the previous methods rely on complicated structures with given scarce data, which largely limits the efficacy of the neural models. In this paper, we exploit a new direction named coarse-to-fine task transfer, which aims to leverage knowledge learned from a rich-resource source domain of the coarse-grained AC task, which is more easily accessible, to improve the learning in a low-resource target domain of the fine-grained AT task. To resolve both the aspect granularity inconsistency and feature mismatch between domains, we propose a Multi-Granularity Alignment Network (MGAN). In MGAN, a novel Coarse2Fine attention guided by an auxiliary task can help the AC task modeling at the same fine-grained level as the AT task. To alleviate false feature alignment, a contrastive feature alignment method is adopted to align aspect-specific feature representations semantically. In addition, a large-scale multi-domain dataset for the AC task is provided. Empirically, extensive experiments demonstrate the effectiveness of the MGAN.
【Keywords】:
【Paper Link】 【Pages】:4261-4268
【Authors】: Ziyao Li ; Liang Zhang ; Guojie Song
【Abstract】: Many successful methods have been proposed for learning low dimensional representations on large-scale networks, while almost all existing methods are designed in inseparable processes, learning embeddings for entire networks even when only a small proportion of nodes are of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm which independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant efforts to embed irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks with different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running times on large networks.
【Keywords】:
【Paper Link】 【Pages】:4269-4276
【Authors】: Shangsong Liang
【Abstract】: In this paper, we study the problem of dynamic user profiling in the context of streams of short texts. Previous work on user profiling deals with long documents, does not consider collaborative information, and does not diversify the keywords used to profile users' interests. In contrast, we address the problem by proposing a user profiling algorithm (UPA), which consists of two models: the proposed collaborative interest tracking topic model (CITM) and the proposed streaming keyword diversification model (SKDM). UPA first utilizes CITM to collaboratively track each user's and his followees' dynamic interest distributions in the context of streams of short texts, and then utilizes SKDM to obtain top-k relevant and diversified keywords to profile users' interests at a specific point in time. Experiments were conducted on a Twitter dataset, and we found that UPA outperforms state-of-the-art non-dynamic and dynamic user profiling algorithms.
【Keywords】:
【Paper Link】 【Pages】:4277-4286
【Authors】: Yitao Liang ; Guy Van den Broeck
【Abstract】: This paper proposes a new classification model called logistic circuits. On MNIST and Fashion datasets, our learning algorithm outperforms neural networks that have an order of magnitude more parameters. Yet, logistic circuits have a distinct origin in symbolic AI, forming a discriminative counterpart to probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that parameter learning for logistic circuits is convex optimization, and that a simple local search algorithm can induce strong model structures from data.
【Keywords】:
【Paper Link】 【Pages】:4287-4294
【Authors】: Siyu Liao ; Bo Yuan
【Abstract】: Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as a powerful technique in various machine learning applications. However, the large model sizes of DNNs yield high demands on computation resources and weight storage, thereby limiting the practical deployment of DNNs. To overcome these limitations, this paper proposes to impose a circulant structure on the construction of convolutional layers, which leads to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, making the approach very flexible for different training environments. Through extensive experiments, this strong structure-imposing approach is shown to substantially reduce the number of parameters of convolutional layers and enable significant savings of computational cost by using fast multiplication of the circulant tensor.
【Keywords】:
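The computational saving from circulant structure comes from the fact that a circulant matrix-vector product is a circular convolution, computable in O(n log n) with the FFT; a quick self-contained sanity check:

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply by the circulant matrix whose first column is c, via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n = 8
rng = np.random.default_rng(0)
c, x = rng.standard_normal(n), rng.standard_normal(n)
C = np.stack([np.roll(c, j) for j in range(n)], axis=1)  # dense circulant
assert np.allclose(C @ x, circulant_matvec(c, x))
```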
【Paper Link】 【Pages】:4295-4303
【Authors】: Rung-Tzuo Liaw ; Chuan-Kang Ting
【Abstract】: Evolutionary multitasking is a significant emerging search paradigm that utilizes evolutionary algorithms to concurrently optimize multiple tasks. The multi-factorial evolutionary algorithm renders an effectual realization of evolutionary multitasking on two or three tasks. However, there remains room for improvement in the performance and capability of evolutionary multitasking. Beyond three tasks, this paper proposes a novel framework, called symbiosis in biocoenosis optimization (SBO), to address evolutionary many-tasking optimization. The SBO leverages the notion of symbiosis in biocoenosis for transferring information and knowledge among different tasks through three major components: 1) transferring information through inter-task individual replacement, 2) measuring symbiosis through inter-task paired evaluations, and 3) coordinating the frequency and quantity of transfer based on symbiosis in biocoenosis. The inter-task individual replacement with paired evaluations caters for the estimation of symbiosis, while the symbiosis in biocoenosis provides a good estimator of transfer. This study examines the effectiveness and efficiency of the SBO on a suite of many-tasking benchmark problems, designed to deal with 30 tasks simultaneously. The experimental results show that SBO leads to better solutions and faster convergence than the state-of-the-art evolutionary multitasking algorithms. Moreover, the results indicate that SBO is highly capable of identifying the similarity between problems and transferring information appropriately.
【Keywords】:
【Paper Link】 【Pages】:4304-4311
【Authors】: Chen Lin ; Xiaolin Shen ; Si Chen ; Muhua Zhu ; Yanghua Xiao
【Abstract】: The study of consumer psychology reveals two categories of consumption decision procedures: compensatory rules and non-compensatory rules. Existing recommendation models based on latent factor models assume that consumers follow compensatory rules, i.e., they evaluate an item over multiple aspects and compute a weighted and/or summed score which is used to derive the rating or ranking of the item. However, it has been shown in the consumer behavior literature that consumers adopt non-compensatory rules more often than compensatory rules. Our main contribution in this paper is to study the unexplored area of utilizing non-compensatory rules in recommendation models. Our general assumptions are: (1) there are K universal hidden aspects, and in each evaluation session, only one aspect is chosen as the prominent aspect according to user preference; (2) evaluations over prominent and non-prominent aspects are non-compensatory. Evaluation is mainly based on item performance on the prominent aspect; for non-prominent aspects the user sets a minimal acceptable threshold. We give a conceptual model for these general assumptions. We show how this conceptual model can be realized in both point-wise rating prediction models and pair-wise ranking prediction models. Experiments on real-world data sets validate that adopting non-compensatory rules improves recommendation performance for both rating and ranking models.
【Keywords】:
【Paper Link】 【Pages】:4312-4319
【Authors】: Ming Lin ; Shuang Qiu ; Jieping Ye ; Xiaomin Song ; Qi Qian ; Liang Sun ; Shenghuo Zhu ; Rong Jin
【Abstract】: The factorization machine (FM) is a popular machine learning model for capturing second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank-k generalized FM with d-dimensional input, the previous best known sampling complexity is O[k³d·polylog(kd)] under Gaussian distribution. This bound is sub-optimal compared to the information-theoretic lower bound O(kd). In this work, we aim to tighten this bound towards optimal and to generalize the analysis to the sub-gaussian distribution. We prove that when the input data satisfy the so-called τ-Moment Invertible Property, the sampling complexity of generalized FM can be improved to O[k²d·polylog(kd)/τ²]. When the second order self-interaction terms are excluded in the generalized FM, the bound can be improved to the optimal O[kd·polylog(kd)] up to logarithmic factors. Our analysis also suggests that the positive semi-definite constraint in the conventional FM is redundant, as it does not improve the sampling complexity while making the model difficult to optimize. We evaluate our improved FM model on a real-time high-precision GPS signal calibration task to validate its superiority.
【Keywords】:
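For reference, the second-order term of the FM model under study, computed with the standard factorization identity in O(ndk) rather than the naive double sum over feature pairs:

```python
import numpy as np

def fm_pairwise(X, V):
    """sum_{i<j} <v_i, v_j> x_i x_j for every row of X, where V holds one
    k-dimensional latent vector per input feature."""
    XV = X @ V                                        # (n, k)
    return 0.5 * ((XV ** 2) - (X ** 2) @ (V ** 2)).sum(axis=1)
```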
【Paper Link】 【Pages】:4320-4327
【Authors】: Zhenhua Lin ; Hongtu Zhu
【Abstract】: We consider the problem of performing dimension reduction on heteroscedastic functional data where the variance is in different scales over entire domain. The aim of this paper is to propose a novel multiscale functional principal component analysis (MFPCA) approach to address such heteroscedastic issue. The key ideas of MFPCA are to partition the whole domain into several subdomains according to the scale of variance, and then to conduct the usual functional principal component analysis (FPCA) on each individual subdomain. Both theoretically and numerically, we show that MFPCA can capture features on areas of low variance without estimating high-order principal components, leading to overall improvement of performance on dimension reduction for heteroscedastic functional data. In contrast, traditional FPCA prioritizes optimizing performance on the subdomain of larger data variance and requires a practically prohibitive number of components to characterize data in the region bearing relatively small variance.
【Keywords】:
【Paper Link】 【Pages】:4328-4335
【Authors】: Ao Liu ; Zhibing Zhao ; Chao Liao ; Pinyan Lu ; Lirong Xia
【Abstract】: We propose an EM-based framework for learning the Plackett-Luce model and its mixtures from partial orders. The core of our framework is the efficient sampling of linear extensions of partial orders under the Plackett-Luce model. We propose two Markov Chain Monte Carlo (MCMC) samplers: a Gibbs sampler and the generalized repeated insertion method tuned by MCMC (GRIM-MCMC), and prove the efficiency of GRIM-MCMC for a large class of preferences. Experiments on synthetic data show that the algorithm with the Gibbs sampler outperforms that with GRIM-MCMC. Experiments on real-world data show that the likelihood of the test dataset increases when (i) partial orders provide more information, or (ii) the number of components in mixtures of the Plackett-Luce model increases.
【Keywords】:
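Under the Plackett-Luce model, a ranking is generated by repeatedly choosing the next item with probability proportional to its weight among the remaining items. A sketch of this unconstrained sampler follows; the constrained version, sampling only linear extensions of a given partial order, is what the Gibbs and GRIM-MCMC samplers address.

```python
import numpy as np

def sample_plackett_luce(weights, seed=None):
    """Draw one complete ranking from a Plackett-Luce model."""
    rng = np.random.default_rng(seed)
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        w = np.array([weights[i] for i in remaining])
        pick = rng.choice(len(remaining), p=w / w.sum())
        ranking.append(remaining.pop(pick))
    return ranking
```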
【Paper Link】 【Pages】:4336-4343
【Authors】: Ao Liu ; Qiong Wu ; Zhenming Liu ; Lirong Xia
【Abstract】: This paper studies a stylized, yet natural, learning-to-rank problem and points out the critical incorrectness of a widely used nearest neighbor algorithm. We consider a model with n agents (users) {x_i}_{i∈[n]} and m alternatives (items) {y_l}_{l∈[m]}, each of which is associated with a latent feature vector. Agents rank items nondeterministically according to the Plackett-Luce model, where the higher the utility of an item to the agent, the more likely this item will be ranked high by the agent. Our goal is to identify near neighbors of an arbitrary agent in the latent space for prediction. We first show that the Kendall-tau distance based kNN produces incorrect results in our model. Next, we propose a new anchor-based algorithm to find neighbors of an agent. A salient feature of our algorithm is that it leverages the rankings of many other agents (the so-called “anchors”) to determine the closeness/similarities of two agents. We provide a rigorous analysis for the one-dimensional latent space, and complement the theoretical results with experiments on synthetic and real datasets. The experiments confirm that the new algorithm is robust and practical.
【Keywords】:
【Paper Link】 【Pages】:4344-4351
【Authors】: Dan Liu ; Dawei Du ; Libo Zhang ; Tiejian Luo ; Yanjun Wu ; Feiyue Huang ; Siwei Lyu
【Abstract】: Existing hand detection methods usually follow the pipeline of multiple stages with high computation cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN) trained in an end-to-end fashion to detect hands efficiently. Specifically, we merge the feature maps from high to low layers in an iterative way, which handles different scales of hands better with less time overhead compared to simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To deal with rotated hand detection, we present the rotation map to get rid of complex rotation and derotation layers. Besides, we design the multi-scale loss scheme to accelerate the training process significantly by adding supervision to the intermediate layers of the network. Compared with the state-of-the-art methods, our algorithm shows comparable accuracy and runs 4.23 times faster on the VIVA dataset, and achieves better average precision on the Oxford hand detection dataset at a speed of 62.5 fps.
【Keywords】:
【Paper Link】 【Pages】:4352-4359
【Authors】: Guoqing Liu ; Li Zhao ; Feidiao Yang ; Jiang Bian ; Tao Qin ; Nenghai Yu ; Tie-Yan Liu
【Abstract】: Evolution Strategies (ES), a class of black-box optimization algorithms, has recently been demonstrated to be a viable alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. ES achieves fairly good performance on challenging reinforcement learning problems and is easier to scale in a distributed setting. However, standard ES algorithms perform one gradient update per data sample, which is not very efficient. In this paper, with the purpose of using sampled data more efficiently, we propose a novel iterative procedure that optimizes a surrogate objective function, enabling data samples to be reused for multiple epochs of updates. We prove a monotonic improvement guarantee for such a procedure. By making several approximations to the theoretically-justified procedure, we further develop a practical algorithm called Trust Region Evolution Strategies (TRES). Our experiments demonstrate the effectiveness of TRES on a range of popular MuJoCo locomotion tasks in the OpenAI Gym, achieving better performance than the plain ES algorithm.
【Keywords】:
【Paper Link】 【Pages】:4360-4367
【Authors】: Pengfei Liu ; Jie Fu ; Yue Dong ; Xipeng Qiu ; Jackie Chi Kit Cheung
【Abstract】: We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
【Keywords】:
【Paper Link】 【Pages】:4368-4375
【Authors】: Risheng Liu ; Yuxi Zhang ; Shichao Cheng ; Xin Fan ; Zhongxuan Luo
【Abstract】: Magnetic Resonance Imaging (MRI) is one of the most dynamic and safe imaging techniques available for clinical applications. However, the rather slow speed of MRI acquisitions limits the patient throughput and potential indications. Compressive Sensing (CS) has proven to be an efficient technique for accelerating MRI acquisition. The most widely used CS-MRI model, founded on the premise of reconstructing an image from an incompletely filled k-space, leads to an ill-posed inverse problem. In the past years, many efforts have been made to efficiently optimize the CS-MRI model. Inspired by deep learning techniques, some preliminary works have tried to incorporate deep architectures into the CS-MRI process. Unfortunately, analyses of the convergence (due to the experience-based networks) and the robustness (i.e., the lack of real-world noise modeling) of these deeply trained optimization methods are still missing. In this work, we develop a new paradigm to integrate designed numerical solvers and data-driven architectures for CS-MRI. By introducing an optimal condition checking mechanism, we can successfully prove the convergence of our established deep CS-MRI optimization scheme. Furthermore, we explicitly formulate the Rician noise distributions within our framework and obtain an extended CS-MRI network to handle the real-world noise in the MRI process. Extensive experimental results verify that the proposed paradigm outperforms the existing state-of-the-art techniques in reconstruction accuracy and efficiency as well as in robustness to noise in real scenes.
【Keywords】:
【Paper Link】 【Pages】:4376-4383
【Authors】: Rui Liu ; Tianyi Wu ; Barzan Mozafari
【Abstract】: There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:4384-4391
【Authors】: Vincent Liu ; Raksha Kumaraswamy ; Lei Le ; Martha White
【Abstract】: We investigate sparse representations for control in reinforcement learning. While these representations are widely used in computer vision, their prevalence in reinforcement learning is limited to sparse coding where extracting representations for new data can be computationally intensive. Here, we begin by demonstrating that learning a control policy incrementally with a representation from a standard neural network fails in classic control domains, whereas learning with a representation obtained from a neural network that has sparsity properties enforced is effective. We provide evidence that the reason for this is that the sparse representation provides locality, and so avoids catastrophic interference, and particularly keeps consistent, stable values for bootstrapping. We then discuss how to learn such sparse representations. We explore the idea of Distributional Regularizers, where the activation of hidden nodes is encouraged to match a particular distribution that results in sparse activation across time. We identify a simple but effective way to obtain sparse representations, not afforded by previously proposed strategies, making it more practical for further investigation into sparse representations for reinforcement learning.
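One common way to encourage sparse activations, in the spirit of the Distributional Regularizers discussed above (the paper's exact formulation may differ): penalize the KL divergence between each hidden unit's mean activation and a target Bernoulli rate. The target rate, layer sizes, and stand-in loss below are illustrative assumptions.

    import torch

    def kl_sparsity_penalty(hidden, target_rate=0.1, eps=1e-8):
        """KL(Bernoulli(target_rate) || Bernoulli(mean activation)) per unit,
        summed; `hidden` holds activations in [0, 1], shape (batch, n_units)."""
        rho_hat = hidden.mean(dim=0).clamp(eps, 1 - eps)
        rho = torch.tensor(target_rate)
        kl = rho * torch.log(rho / rho_hat) \
             + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))
        return kl.sum()

    x = torch.randn(32, 8)
    hidden = torch.sigmoid(torch.nn.Linear(8, 16)(x))
    task_loss = hidden.pow(2).mean()            # stand-in for a TD or value loss
    loss = task_loss + 1e-2 * kl_sparsity_penalty(hidden)
    loss.backward()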
【Keywords】:
【Paper Link】 【Pages】:4392-4399
【Authors】: Xinwang Liu ; Xinzhong Zhu ; Miaomiao Li ; Chang Tang ; En Zhu ; Jianping Yin ; Wen Gao
【Abstract】: Incomplete multi-view clustering (IMVC) optimally fuses multiple pre-specified incomplete views to improve clustering performance. Among various excellent solutions, the recently proposed multiple kernel k-means with incomplete kernels (MKKM-IK) forms a benchmark, which redefines IMVC as a joint optimization problem where the clustering and kernel matrix imputation tasks are alternately performed until convergence. Though it demonstrates promising performance in various applications, we observe that the manner of kernel matrix imputation in MKKM-IK incurs intensive computation and storage costs, an over-complicated optimization, and only limited improvement in clustering performance. In this paper, we propose an Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC) algorithm to address these issues. Instead of completing the incomplete kernel matrices, EE-IMVC proposes to impute each incomplete base matrix generated by incomplete views with a learned consensus clustering matrix. We carefully develop a three-step iterative algorithm to solve the resultant optimization problem with linear computational complexity and theoretically prove its convergence. Further, we conduct comprehensive experiments to study the proposed EE-IMVC in terms of clustering accuracy, running time, evolution of the learned consensus clustering matrix, and convergence. As indicated, our algorithm significantly and consistently outperforms some state-of-the-art algorithms with much less running time and memory.
【Keywords】:
【Paper Link】 【Pages】:4400-4407
【Authors】: Xuanwu Liu ; Guoxian Yu ; Carlotta Domeniconi ; Jun Wang ; Yazhou Ren ; Maozu Guo
【Abstract】: Cross-modal hashing has been receiving increasing interest for its low storage cost and fast query speed in multi-modal data retrieval. However, most existing hashing methods are based on hand-crafted or raw-level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multi-level ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH first uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications.
【Keywords】:
【Paper Link】 【Pages】:4408-4415
【Authors】: Yanbin Liu ; Yan Yan ; Ling Chen ; Yahong Han ; Yi Yang
【Abstract】: In this paper, we propose a new online feature selection algorithm for streaming data. We aim to focus on the following two problems which remain unaddressed in the literature. First, most existing online feature selection algorithms merely utilize the first-order information of the data streams, despite the fact that second-order information explores the correlations between features and significantly improves the performance. Second, most online feature selection algorithms are based on the balanced data presumption, which does not hold in many real-world applications. For example, in fraud detection, the number of positive examples is much smaller than the number of negative examples because most cases are not fraud. The balanced assumption will make the selected features biased towards the majority class and fail to detect the fraud cases. We propose an Adaptive Sparse Confidence-Weighted (ASCW) algorithm to solve the aforementioned two problems. We first introduce an ℓ0-norm constraint into the second-order confidence-weighted (CW) learning for feature selection. Then the original loss is substituted with a cost-sensitive loss function to address the imbalanced data issue. Furthermore, our algorithm maintains multiple sparse CW learners with corresponding cost vectors to dynamically select an optimal cost. We extend the theory of sparse CW learning and analyze the performance behavior in F-measure. Empirical studies show superior performance over the state-of-the-art online learning methods in the online-batch setting.
【Keywords】:
【Paper Link】 【Pages】:4416-4423
【Authors】: Zhao-Yang Liu ; Sheng-Jun Huang
【Abstract】: Open-set classification is a common problem in many real world tasks, where data is collected for known classes, and some novel classes occur at the test stage. In this paper, we focus on a more challenging case where the data examples collected for known classes are all unlabeled. Due to the high cost of label annotation, it is important to train a model with as little labeled data as possible for both accurate classification on known classes and effective detection of novel classes. First, we propose an active learning method that incorporates structured sparsity with diversity to select representative examples for annotation. Then a latent low-rank representation is employed to simultaneously perform classification and novel class detection. The method, along with a fast optimization solution, is also extended to a multi-stage scenario, where classes occur and disappear in batches at each stage. Experimental results on multiple datasets validate the superiority of the proposed method with regard to different performance measures.
【Keywords】:
【Paper Link】 【Pages】:4424-4431
【Authors】: Ziqi Liu ; Chaochao Chen ; Longfei Li ; Jun Zhou ; Xiaolong Li ; Le Song ; Yuan Qi
【Abstract】: We present GeniePath, a scalable approach for learning adaptive receptive fields of neural networks defined on permutation invariant graph data. In GeniePath, we propose an adaptive path layer consisting of two complementary functions designed for breadth and depth exploration respectively, where the former learns the importance of different sized neighborhoods, while the latter extracts and filters signals aggregated from neighbors different hops away. Our method works in both transductive and inductive settings, and extensive experiments compared with competitive methods show that our approaches yield state-of-the-art results on large graphs.
【Keywords】:
【Paper Link】 【Pages】:4432-4439
【Authors】: Guansong Lu ; Zhiming Zhou ; Yuxuan Song ; Kan Ren ; Yong Yu
【Abstract】: CycleGAN is capable of learning a one-to-one mapping between two data distributions without paired examples, achieving the task of unsupervised data translation. However, there is no theoretical guarantee on the property of the learned one-to-one mapping in CycleGAN. In this paper, we experimentally find that, under some circumstances, the one-to-one mapping learned by CycleGAN is just a random one within the large feasible solution space. Based on this observation, we explore adding extra constraints such that the one-to-one mapping is controllable and satisfies more properties related to specific tasks. We propose to solve an optimal transport mapping restrained by a task-specific cost function that reflects the desired properties, and use the barycenters of the optimal transport mapping to serve as references for CycleGAN. Our experiments indicate that the proposed algorithm is capable of learning a one-to-one mapping with the desired properties.
【Keywords】:
【Paper Link】 【Pages】:4440-4447
【Authors】: Yao Lu ; Guangming Lu ; Bob Zhang ; Yuanrong Xu ; Jinxing Li
【Abstract】: To construct small mobile networks without performance loss and to address the over-fitting issues caused by less abundant training datasets, this paper proposes a novel super sparse convolutional (SSC) kernel, and its corresponding network is called SSC-Net. In an SSC kernel, every spatial kernel has only one non-zero parameter and these non-zero spatial positions are all different. The SSC kernel can effectively select the pixels from the feature maps according to its non-zero positions and operate on them. Therefore, SSC can preserve the general geometric characteristics and the differences among channels, preserving the quality of the retrieved features and meeting general accuracy requirements. Furthermore, SSC can be entirely implemented by the “shift” and “group point-wise” convolutional operations without any spatial kernels (e.g., “3×3”). Therefore, SSC is the first method to remove parameter redundancy from both the spatial extent and the channel extent, greatly decreasing the parameters and FLOPs as well as further reducing the im2col and col2im operations implemented by low-level libraries. Meanwhile, SSC-Net can improve sparsity and overcome over-fitting more effectively than other mobile networks. Comparative experiments were performed on the less abundant CIFAR and low resolution ImageNet datasets. The results showed that SSC-Nets can significantly decrease the parameters and the computational FLOPs without any performance loss. Additionally, they can also improve the ability to address over-fitting on the more challenging, less abundant datasets.
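A rough PyTorch sketch of the "shift plus group point-wise convolution" composition described above; the per-channel shift directions and group count are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class ShiftPointwise(nn.Module):
        """Each channel is shifted by one pixel in a fixed direction (one non-zero
        spatial position), then mixed by a grouped 1x1 conv (no spatial kernels)."""
        def __init__(self, channels, groups=4):
            super().__init__()
            self.offsets = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
            self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, groups=groups)

        def forward(self, x):
            shifted = torch.stack(
                [torch.roll(x[:, c], shifts=self.offsets[c % 4], dims=(-2, -1))
                 for c in range(x.size(1))], dim=1)
            return self.pointwise(shifted)

    y = ShiftPointwise(16)(torch.randn(1, 16, 32, 32))
    print(y.shape)  # torch.Size([1, 16, 32, 32])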
【Keywords】:
【Paper Link】 【Pages】:4448-4455
【Authors】: You Lu ; Zhiyuan Liu ; Bert Huang
【Abstract】: Traditional learning methods for training Markov random fields require doing inference over all variables to compute the likelihood gradient. The iteration complexity for those methods therefore scales with the size of the graphical models. In this paper, we propose block belief propagation learning (BBPL), which uses block-coordinate updates of approximate marginals to compute approximate gradients, removing the need to compute inference on the entire graphical model. Thus, the iteration complexity of BBPL does not scale with the size of the graphs. We prove that the method converges to the same solution as that obtained by using full inference per iteration, despite these approximations, and we empirically demonstrate its scalability improvements over standard training methods.
【Keywords】:
【Paper Link】 【Pages】:4456-4463
【Authors】: Yuanfu Lu ; Chuan Shi ; Linmei Hu ; Zhiyuan Liu
【Abstract】: Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, in our RHINE, we propose different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
【Keywords】:
【Paper Link】 【Pages】:4464-4471
【Authors】: Chen Luo ; Anshumali Shrivastava
【Abstract】: Split-Merge MCMC (Markov chain Monte Carlo) is one of the essential and popular variants of MCMC for problems where an MCMC state consists of an unknown number of components. It is well known that state-of-the-art methods for split-merge MCMC do not scale well. Strategies for rapid mixing require smart and informative proposals to reduce the rejection rate. However, all known smart proposals involve expensive operations to suggest informative transitions. As a result, the cost of each iteration is prohibitive for massive scale datasets. It is further known that uninformative but computationally efficient proposals, such as random split-merge, lead to extremely slow convergence. This tradeoff between mixing time and per-update cost seems hard to get around. We leverage some unique properties of weighted MinHash, a popular locality-sensitive hashing (LSH) scheme, to design a novel class of split-merge proposals which are significantly more informative than random sampling but at the same time efficient to compute. Overall, we obtain a superior tradeoff between convergence and per-update cost. As a direct consequence, our proposals are around 6X faster than the state-of-the-art sampling methods on two large real datasets, KDDCUP and PubMed, with several millions of entities and thousands of clusters.
【Keywords】:
【Paper Link】 【Pages】:4472-4479
【Authors】: Lei Luo ; Jie Xu ; Cheng Deng ; Heng Huang
【Abstract】: Dictionary Learning (DL) plays a crucial role in numerous machine learning tasks. It aims to find the dictionary over which the training set admits a maximally sparse representation. Most existing DL algorithms are based on solving an optimization problem where the noise variance and sparsity level should be known as prior knowledge. However, in practical applications, this knowledge is difficult to obtain. Thus, non-parametric Bayesian DL has recently received much attention from researchers due to its adaptability and effectiveness. Although many hierarchical priors have been used to promote the sparsity of the representation in non-parametric Bayesian DL, the problem of dictionary redundancy is still overlooked, and it greatly decreases the performance of sparse coding. To address this problem, this paper presents a novel robust dictionary learning framework via Bayesian inference. In particular, we employ an orthogonality-promoting regularization to mitigate correlations among dictionary atoms. Such a regularization, encouraging the dictionary atoms to be close to orthogonal, can alleviate overfitting to training data and improve the discrimination of the model. Moreover, we impose a scale mixture of the vector-variate Gaussian (SMVG) distribution on the noise to capture its structure. A regularized Expectation Maximization algorithm is developed to estimate the posterior distribution of the representation and dictionary with orthogonality-promoting regularization. Numerical results show that our method can learn the dictionary with an accuracy better than existing methods, especially when the number of training signals is limited.
【Keywords】:
【Paper Link】 【Pages】:4480-4487
【Authors】: Lei Luo ; Jie Xu ; Cheng Deng ; Heng Huang
【Abstract】: In recent research, metric learning methods have attracted increasing interest in the machine learning community and have been applied to many applications. However, existing metric learning methods usually use a fixed L2-norm to measure the distance between pairwise data samples in the projection space, which cannot provide an effective mechanism to automatically remove the noise that exists in data samples. To address this issue, we propose a new robust formulation of metric learning. Our new model constructs a projection from a higher-dimensional Grassmann manifold to a lower-dimensional one with more discriminative capability, where the errors between sample points are treated as an MLE (maximum likelihood estimation)-like estimator. An efficient iteratively reweighted algorithm is derived to solve the proposed metric learning model. More importantly, we establish the generalization bounds for the proposed algorithm by utilizing the techniques of U-statistics. Experiments on six benchmark datasets clearly show that the proposed method achieves consistent improvements in discrimination accuracy, in comparison to state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:4488-4495
【Authors】: Simon Luo ; Mahito Sugiyama
【Abstract】: Hierarchical probabilistic models are able to use a large number of parameters to create a model with high representation power. However, it is well known that increasing the number of parameters also increases the complexity of the model, which leads to a bias-variance trade-off. Although it is a classical problem, the bias-variance trade-off between hidden layers and higher-order interactions has not been well studied. In our study, we propose an efficient inference algorithm for the log-linear formulation of the higher-order Boltzmann machine using a combination of Gibbs sampling and annealed importance sampling. We then perform a bias-variance decomposition to study the differences between hidden layers and higher-order interactions. Our results show that hidden layers and higher-order interactions yield comparable error of a similar order of magnitude, and that higher-order interactions produce less variance for smaller sample sizes.
【Keywords】:
【Paper Link】 【Pages】:4496-4503
【Authors】: Siqiang Luo
【Abstract】: PageRank is a classic measure that effectively evaluates node importance in large graphs, and has been applied in numerous applications ranging from data mining, Web algorithms, recommendation systems, load balancing, and search, to identifying connectivity structures. Computing PageRank for large graphs is challenging, and this has motivated the study of distributed algorithms to compute PageRank. Previously, little work has been done on distributed PageRank algorithms with provably desired complexity and accuracy. Given a graph with n nodes, if we model the distributed computation as the well-known congested clique model, the state-of-the-art algorithm takes O(√log n) communication rounds to approximate the PageRank value of each node in G, with probability at least 1 − 1/n. In this paper, we present improved distributed algorithms for computing PageRank. In particular, our algorithm performs O(log log √n) rounds (a significant improvement compared with O(√log n) rounds) to approximate the PageRank values with probability at least 1 − 1/n. Moreover, under a reasonable assumption, our algorithm also reduces the edge bandwidth (i.e., the maximum communication message size that can be exchanged through an edge during a communication round) by an O(log n) factor compared with the state-of-the-art algorithm. Finally, we show that our algorithm can be adapted to efficiently compute another variant of PageRank, i.e., batch one-hop Personalized PageRanks, in O(log log n) communication rounds.
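For reference, the centralized power iteration that such distributed algorithms approximate; this is a plain numpy sketch, not the paper's congested-clique protocol, and the damping factor and toy graph are illustrative.

    import numpy as np

    def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
        """Power iteration on the row-normalized adjacency matrix; dangling
        nodes (no out-edges) are given uniform outgoing probability."""
        n = adj.shape[0]
        out = adj.sum(axis=1, keepdims=True)
        P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
        r = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            r_new = (1 - damping) / n + damping * (r @ P)
            if np.abs(r_new - r).sum() < tol:
                break
            r = r_new
        return r

    adj = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
    print(pagerank(adj).round(3))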
【Keywords】:
【Paper Link】 【Pages】:4504-4511
【Authors】: Clare Lyle ; Marc G. Bellemare ; Pablo Samuel Castro
【Abstract】: Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
【Keywords】:
【Paper Link】 【Pages】:4512-4519
【Authors】: Shuai Ma ; Jia Yuan Yu
【Abstract】: In the framework of MDPs, although the general reward function takes three arguments (current state, action, and successor state), it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected total reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. In risk-sensitive scenarios, we first prove that, for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function, such that for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP, such that both Markov reward processes share the same reward sequence. Second, we illustrate two situations that require the proposed SATs in an inventory control problem. One is using Q-learning (or other learning methods) on MDPs with transition-based reward functions, and the other is applying methods designed for Markov processes with deterministic state-based reward functions to Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, which is a risk measure on the reward distribution instead of measures (such as mean and variance) of the distribution. We illustrate the error in the reward distribution estimation caused by the reward simplification, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
【Keywords】:
【Paper Link】 【Pages】:4520-4527
【Authors】: Yuchao Ma ; Hassan Ghasemzadeh
【Abstract】: Activity recognition is central to many motion analysis applications ranging from health assessment to gaming. However, the need for obtaining sufficiently large amounts of labeled data has limited the development of personalized activity recognition models. Semi-supervised learning has traditionally been a promising approach in many application domains to alleviate reliance on large amounts of labeled data by learning the label information from a small set of seed labels. Nonetheless, existing approaches perform poorly in highly dynamic settings, such as wearable systems, because some algorithms rely on predefined hyper-parameters or distribution models that need to be tuned for each user or context. To address these challenges, we introduce LabelForest, a novel non-parametric semi-supervised learning framework for activity recognition. LabelForest has two algorithms at its core: (1) a spanning forest algorithm for sample selection and label inference; and (2) a silhouette-based filtering method to finalize label augmentation for machine learning model training. Our thorough analysis on three human activity datasets demonstrates that LabelForest achieves a labeling accuracy of 90.1% in the presence of a skewed label distribution in the seed data. Compared to self-training and other sequential learning algorithms, LabelForest achieves up to 56.9% and 175.3% improvement in accuracy on balanced and unbalanced seed data, respectively.
【Keywords】:
【Paper Link】 【Pages】:4528-4535
【Authors】: Kehelwala D. G. Maduranga ; Kyle E. Helfrich ; Qiang Ye
【Abstract】: Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, there have been several different RNN architectures that try to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix consisting of ±1 entries that cannot be optimized by gradient descent. Thus the scaling matrix is fixed before training, and a hyperparameter is introduced to tune the matrix for each particular task. In this paper, we develop a unitary RNN architecture based on a complex scaled Cayley transform. Unlike the real orthogonal case, the transformation uses a diagonal scaling matrix consisting of entries on the complex unit circle, which can be optimized using gradient descent and no longer requires the tuning of a hyperparameter. We also provide an analysis of a potential issue with the modReLU activation function, which is used in our work and several other unitary RNNs. In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
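A small numpy sketch of the complex scaled Cayley parametrization described above: for a skew-Hermitian A and a diagonal D with unit-modulus entries, W = (I + A)^{-1}(I - A)D is unitary. The matrix size and random initialization are illustrative.

    import numpy as np

    def scaled_cayley(A, theta):
        """W = (I + A)^{-1} (I - A) D with D = diag(exp(i*theta)); if A is
        skew-Hermitian (A^H = -A) and |D_jj| = 1, then W is unitary."""
        n = A.shape[0]
        I = np.eye(n, dtype=complex)
        D = np.diag(np.exp(1j * theta))
        return np.linalg.solve(I + A, I - A) @ D

    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (M - M.conj().T) / 2                       # skew-Hermitian part of M
    W = scaled_cayley(A, rng.uniform(0, 2 * np.pi, 5))
    print(np.allclose(W.conj().T @ W, np.eye(5)))  # True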
【Keywords】:
【Paper Link】 【Pages】:4536-4543
【Authors】: Saeed Mahloujifar ; Dimitrios I. Diochnos ; Mohammad Mahmoody
【Abstract】: Many modern machine learning classifiers are shown to be vulnerable to adversarial perturbations of the instances. Despite a massive amount of work focusing on making classifiers robust, the task seems quite challenging. In this work, through a theoretical study, we investigate the adversarial risk and robustness of classifiers and draw a connection to the well-known phenomenon of “concentration of measure” in metric measure spaces. We show that if the metric probability space of the test instance is concentrated, any classifier with some initial constant error is inherently vulnerable to adversarial perturbations. One class of concentrated metric probability spaces are the so-called Lévy families, which include many natural distributions. In this special case, our attacks only need to perturb the test instance by at most O(√n) to make it misclassified, where n is the data dimension. Using our general result about Lévy instance spaces, we first recover as special cases some of the previously proved results about the existence of adversarial examples. However, many more Lévy families are known (e.g., the product distribution under the Hamming distance) for which we immediately obtain new attacks that find adversarial examples of distance O(√n). Finally, we show that concentration of measure for product spaces implies the existence of forms of “poisoning” attacks, in which the adversary tampers with the training data with the goal of degrading the classifier. In particular, we show that for any learning algorithm that uses m training examples, there is an adversary who can increase the probability of any “bad property” (e.g., failing on a particular test instance) that initially happens with non-negligible probability to ≈ 1 by substituting only Õ(√m) of the examples with other (still correctly labeled) examples.
【Keywords】:
【Paper Link】 【Pages】:4544-4551
【Authors】: Maggie Makar ; Adith Swaminathan ; Emre Kiciman
【Abstract】: The potential for using machine learning algorithms as a tool for suggesting optimal interventions has fueled significant interest in developing methods for estimating heterogeneous or individual treatment effects (ITEs) from observational data. While several methods for estimating ITEs have been recently suggested, these methods assume no constraints on the availability of data at the time of deployment or test time. This assumption is unrealistic in settings where data acquisition is a significant part of the analysis pipeline, meaning data about a test case has to be collected in order to predict the ITE. In this work, we present Data Efficient Individual Treatment Effect Estimation (DEITEE), a method which exploits the idea that adjusting for confounding, and hence collecting information about confounders, is not necessary at test time. DEITEE allows the development of rich models that exploit all variables at train time but identifies a minimal set of variables required to estimate the ITE at test time. Using 77 semi-synthetic datasets with varying data generating processes, we show that DEITEE achieves significant reductions in the number of variables required at test time with little to no loss in accuracy. Using real data, we demonstrate the utility of our approach in helping soon-to-be mothers make planning and lifestyle decisions that will impact newborn health.
【Keywords】:
【Paper Link】 【Pages】:4552-4560
【Authors】: André Gustavo Maletzke ; Denis Moreira dos Reis ; Everton Alvares Cherman ; Gustavo E. A. P. A. Batista
【Abstract】: Quantification is an expanding research topic in the Machine Learning literature. While in classification we are interested in obtaining the class of individual observations, in quantification we want to estimate the total number of instances that belong to each class. This subtle difference allows the development of several algorithms that incur smaller and more consistent errors than counting the classes issued by a classifier. Among such new quantification methods, one particular family stands out due to its accuracy, simplicity, and ability to operate with imbalanced training samples: Mixture Models (MM). Despite these desirable traits, MM, as a class of algorithms, lacks a more in-depth understanding concerning the influence of internal parameters on its performance. In this paper, we generalize MM with a base framework called DyS: Distribution y-Similarity. With this framework, we perform a thorough evaluation of the most critical design decisions of MM models. For instance, we assess 15 dissimilarity functions to compare histograms with numbers of bins varying from 2 to 110 and, for the first time, make a connection between quantification accuracy and test sample size, with experiments covering 24 public benchmark datasets. We conclude that, when tuned, Topsøe is the histogram distance function that consistently leads to the smallest quantification errors and, therefore, is recommended for general use, notwithstanding the Hellinger Distance's popularity. To rid MM models of the dependency on a choice for the number of histogram bins, we introduce two dissimilarity functions that can operate directly on observations. We show that SORD, one of these measures, presents performance that is only slightly inferior to the tuned Topsøe, while not requiring the sensitive parameterization of the number of bins.
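A toy sketch of the mixture-model idea that DyS generalizes: choose the class prevalence whose mixture of class-conditional score histograms best matches the test score histogram. Hellinger distance and a grid search are used here for brevity; the bin count and synthetic scores are illustrative assumptions.

    import numpy as np

    def hellinger(p, q):
        return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

    def mm_quantify(pos_scores, neg_scores, test_scores, bins=10):
        """Estimate the positive-class prevalence alpha minimizing the distance
        between alpha*H+ + (1-alpha)*H- and the test histogram."""
        edges = np.linspace(0, 1, bins + 1)
        h = lambda s: np.histogram(s, bins=edges)[0] / max(len(s), 1)
        hp, hn, ht = h(pos_scores), h(neg_scores), h(test_scores)
        alphas = np.linspace(0, 1, 101)
        dists = [hellinger(a * hp + (1 - a) * hn, ht) for a in alphas]
        return alphas[int(np.argmin(dists))]

    rng = np.random.default_rng(1)
    pos, neg = rng.beta(5, 2, 500), rng.beta(2, 5, 500)   # classifier scores
    test = np.concatenate([rng.beta(5, 2, 300), rng.beta(2, 5, 700)])
    print(mm_quantify(pos, neg, test))  # close to the true prevalence 0.3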
【Keywords】:
【Paper Link】 【Pages】:4561-4569
【Authors】: Raghuram Mandyam Annasamy ; Katia P. Sycara
【Abstract】: Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. As improvements in training algorithms continue at a brisk pace, theoretical and empirical studies of what these networks actually learn are far behind. In this paper we propose an interpretable neural network architecture for Q-learning which provides a global explanation of the model’s behavior using key-value memories, attention, and reconstructible embeddings. With a directed exploration strategy, our model can reach training rewards comparable to the state-of-the-art deep Q-learning models. However, results suggest that the features extracted by the neural network are extremely shallow, and subsequent testing using out-of-sample examples shows that the agent can easily overfit to trajectories seen during training.
【Keywords】:
【Paper Link】 【Pages】:4570-4577
【Authors】: Ryan McBride ; Ke Wang ; Zhouyang Ren ; Wenyuan Li
【Abstract】: We formulate the Cost-Sensitive Learning to Rank problem of learning to prioritize limited resources to mitigate the most costly outcomes. We develop improved ranking models to solve this problem, as verified by experiments in diverse domains such as forest fire prevention, crime prevention, and preventing storm caused outages in electrical networks.
【Keywords】:
【Paper Link】 【Pages】:4578-4585
【Authors】: Shaobo Min ; Xuejin Chen ; Zheng-Jun Zha ; Feng Wu ; Yongdong Zhang
【Abstract】: Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models in different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing their inferred features using the same input. In order to achieve this purpose, the attention models are designed based on the propagation analysis of noisy gradients at different layers. This allows the attention models to effectively discover incorrect labels and weaken their influence during the parameter updating process. By exchanging multi-level features within the two-stream architecture, the effects of noisy labels in each sub-network are reduced by decreasing the noisy gradients. Furthermore, a hierarchical distillation is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. The experiments using both the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses the state-of-the-art fully-supervised results.
【Keywords】:
【Paper Link】 【Pages】:4586-4593
【Authors】: Di Ming ; Chris Ding ; Feiping Nie
【Abstract】: LASSO and ℓ2,1-norm based feature selection have achieved success in many application areas. In this paper, we first derive LASSO and ℓ1,2-norm feature selection from a probabilistic framework, which provides an independent point of view from the usual sparse coding point of view. From here, we further propose a feature selection approach based on the probability-derived ℓ1,2-norm. We point out an inflexibility in standard feature selection: the features selected for all different classes are enforced to be exactly the same under the widely used ℓ2,1-norm, which enforces joint sparsity across all the data instances. Using the probability-derived ℓ1,2-norm feature selection, which allows the flexibility that the selected features do not have to be exactly the same for all classes, the resulting features lead to better classification on six benchmark datasets.
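To make the two norms concrete, a small numpy example with rows as features and columns as classes, following the convention ℓp,q(W) = (Σi ‖wi‖p^q)^(1/q) over rows wi (the paper's exact indexing may differ):

    import numpy as np

    W = np.array([[0.0, 2.0],
                  [1.0, 0.0],
                  [0.0, 0.0]])   # rows = features, columns = classes

    # l2,1-norm: sum of row-wise l2 norms; penalizing it drives entire rows
    # to zero, so the same features are selected for every class.
    l21 = np.sum(np.linalg.norm(W, axis=1))

    # l1,2-norm: l2 norm of the row-wise l1 norms; zeros can appear entry-wise,
    # so different classes may keep different features.
    l12 = np.linalg.norm(np.sum(np.abs(W), axis=1))

    print(l21, l12)  # 3.0 2.236...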
【Keywords】:
【Paper Link】 【Pages】:4594-4601
【Authors】: Kohei Miyaguchi ; Hiroshi Kajino
【Abstract】: We approach the time-series forecasting problem in the presence of concept drift by automatic learning rate tuning of stochastic gradient descent (SGD). The SGD-based approach is preferable to other concept drift algorithms in that it can be applied to any model and it can keep learning efficiently whilst predicting online. Among a number of SGD algorithms, the variance-based SGD (vSGD) can successfully handle concept drift by automatic learning rate tuning, which is reduced to an adaptive mean estimation problem. However, its performance is still limited because of its heuristic mean estimator. In this paper, we present a concept-drift-aware stochastic gradient descent (Cogra), equipped with a more theoretically sound mean estimator called the sequential mean tracker (SMT). Our key contribution is that we define a goodness criterion for mean estimators; SMT is designed to be optimal according to this criterion. As a result of comprehensive experiments, we find that (i) our SMT can estimate the mean better than vSGD's estimator in the presence of concept drift, and (ii) in terms of predictive performance, Cogra reduces the predictive loss by 16–67% for real-world datasets, indicating that SMT improves the prediction accuracy significantly.
【Keywords】:
【Paper Link】 【Pages】:4602-4609
【Authors】: Christopher Morris ; Martin Ritzert ; Matthias Fey ; William L. Hamilton ; Jan Eric Lenssen ; Gaurav Rattan ; Martin Grohe
【Abstract】: In recent years, graph neural networks (GNNs) have emerged as a powerful neural architecture to learn vector representations of nodes and graphs in a supervised, end-to-end fashion. Up to now, GNNs have only been evaluated empirically—showing promising results. The following work investigates GNNs from a theoretical point of view and relates them to the 1-dimensional Weisfeiler-Leman graph isomorphism heuristic (1-WL). We show that GNNs have the same expressiveness as the 1-WL in terms of distinguishing non-isomorphic (sub-)graphs. Hence, both algorithms also have the same shortcomings. Based on this, we propose a generalization of GNNs, so-called k-dimensional GNNs (k-GNNs), which can take higher-order graph structures at multiple scales into account. These higher-order structures play an essential role in the characterization of social networks and molecule graphs. Our experimental evaluation confirms our theoretical findings as well as confirms that higher-order information is useful in the task of graph classification and regression.
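A compact sketch of the 1-WL color refinement referenced above: iteratively relabel each node by its current color together with the multiset of its neighbors' colors; if the resulting color histograms of two graphs differ, the graphs are certainly non-isomorphic. The graphs and iteration count are illustrative.

    def wl_colors(adj, iters=3):
        """1-WL refinement; `adj` maps each node to a list of its neighbors."""
        colors = {v: 0 for v in adj}                      # uniform initial colors
        for _ in range(iters):
            sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                    for v in adj}
            palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
            colors = {v: palette[sigs[v]] for v in adj}   # compress signatures
        return colors

    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(sorted(wl_colors(triangle).values()))  # [0, 0, 0]
    print(sorted(wl_colors(path).values()))      # [0, 0, 1]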
【Keywords】:
【Paper Link】 【Pages】:4610-4617
【Authors】: Sudipto Mukherjee ; Himanshu Asnani ; Eugene Lin ; Sreeram Kannan
【Abstract】: Generative Adversarial Networks (GANs) have obtained remarkable success in many unsupervised learning tasks, and clustering is unarguably an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering-specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon that GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.
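A minimal numpy sketch of the latent sampling scheme described above: each latent code concatenates a one-hot cluster indicator with a small-variance continuous part; the dimensions and noise scale are illustrative assumptions.

    import numpy as np

    def sample_latent(batch, n_clusters=10, z_dim=30, sigma=0.1, rng=None):
        """z = [one_hot(k), sigma * normal noise]; k is the discrete cluster id."""
        rng = rng or np.random.default_rng()
        ks = rng.integers(0, n_clusters, size=batch)
        one_hot = np.eye(n_clusters)[ks]
        z_cont = sigma * rng.standard_normal((batch, z_dim))
        return np.concatenate([one_hot, z_cont], axis=1), ks

    z, ks = sample_latent(4)
    print(z.shape, ks)  # (4, 40) and the sampled cluster ids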
【Keywords】:
【Paper Link】 【Pages】:4618-4625
【Authors】: So Nakashima ; Takanori Maehara
【Abstract】: The subspace selection problem seeks a subspace that maximizes an objective function under some constraint. This problem includes several important machine learning problems such as the principal component analysis and the sparse dictionary selection problem. Often, these problems can be (exactly or approximately) solved using greedy algorithms. Here, we are interested in why these problems can be solved by greedy algorithms, and what classes of objective functions and constraints admit this property. In this study, we focus on the fact that the set of subspaces forms a lattice, and formulate the problems as optimization problems on lattices. We then introduce a new class of functions on lattices, directional DR-submodular functions, to characterize the approximability of problems. We prove that the principal component analysis, the sparse dictionary selection problem, and their generalizations are monotone directional DR-submodular functions. We also prove that the “quantum version” of the cut function is a non-monotone directional DR-submodular function. Using these results, we propose new solvable feature selection problems (generalized principal component analysis and the quantum maximum cut problem), and improve the approximation ratio of the sparse dictionary selection problem in certain instances. We show that, under several constraints, the directional DR-submodular function maximization problem can be solved efficiently with provable approximation factors.
【Keywords】:
【Paper Link】 【Pages】:4626-4633
【Authors】: Hyoungwook Nam ; Segwang Kim ; Kyomin Jung
【Abstract】: Inspired by number series tests used to measure human intelligence, we suggest number sequence prediction tasks to assess neural network models' computational powers for solving algorithmic problems. We define the complexity and difficulty of a number sequence prediction task with the structure of the smallest automaton that can generate the sequence. We suggest two types of number sequence prediction problems: number-level and digit-level problems. Number-level problems format sequences as 2-dimensional grids of digits, while digit-level problems provide a single digit input per time step. The complexity of a number-level sequence prediction can be defined with the depth of an equivalent combinatorial logic, and the complexity of a digit-level sequence prediction can be defined with an equivalent state automaton for the generation rule. Experiments with number-level sequences suggest that CNN models are capable of learning the compound operations of sequence generation rules, but the depths of the compound operations are limited. For the digit-level problems, simple GRU and LSTM models can solve some problems with the complexity of finite state automata. Memory augmented models such as Stack-RNN, Attention, and Neural Turing Machines can solve the reverse-order task, which has the complexity of a simple pushdown automaton. However, none of the above can solve general Fibonacci, arithmetic, or geometric sequence generation problems, which represent the complexity of queue automata or Turing machines. The results show that our number sequence prediction problems effectively evaluate machine learning models' computational capabilities.
【Keywords】:
【Paper Link】 【Pages】:4634-4641
【Authors】: Yusuke Narita ; Shota Yasui ; Kohei Yata
【Abstract】: What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design at a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
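For context, a sketch of the standard inverse-propensity-weighted (IPW) estimator that such variance-reduced estimators are compared against; the logged propensities and toy policies are illustrative assumptions, not the paper's estimator.

    import numpy as np

    def ipw_value(rewards, logged_probs, target_probs):
        """Estimate the expected reward of a target policy from logged bandit
        feedback: the mean of r * pi_target(a|x) / pi_logged(a|x)."""
        return np.mean(rewards * target_probs / logged_probs)

    rng = np.random.default_rng(0)
    n = 10_000
    actions = rng.integers(0, 2, n)                   # logged uniformly at random
    logged_probs = np.full(n, 0.5)
    rewards = (actions == 1).astype(float)            # action 1 always pays 1
    target_probs = np.where(actions == 1, 0.9, 0.1)   # target prefers action 1
    print(ipw_value(rewards, logged_probs, target_probs))  # near the true value 0.9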
【Keywords】:
【Paper Link】 【Pages】:4642-4650
【Authors】: Pablo Navarrete Michelini ; Hanwen Liu ; Dan Zhu
【Abstract】: We introduce a novel deep-learning architecture for image upscaling by large factors (e.g. 4×, 8×) based on examples of pristine high-resolution images. Our target is to reconstruct high-resolution images from their downscaled versions. The proposed system performs a multi-level progressive upscaling, starting from small factors (2×) and updating for higher factors (4× and 8×). The system is recursive as it repeats the same procedure at each level. It is also residual, since we use the network to update the outputs of a classic upscaler. The network residuals are improved by Iterative Back-Projections (IBP) computed in the features of a convolutional network. To work at multiple levels we extend the standard back-projection algorithm using a recursion analogous to Multi-Grid algorithms, commonly used as solvers of large systems of linear equations. We finally show how the network can be interpreted as a standard upsampling-and-filter upscaler with a space-variant filter that adapts to the geometry. This approach allows us to visualize how the network learns to upscale. Finally, our system reaches state-of-the-art quality among models with relatively few parameters.
【Keywords】:
【Paper Link】 【Pages】:4651-4658
【Authors】: Alexander G. Ororbia II ; Ankur Mali
【Abstract】: Finding biologically plausible alternatives to back-propagation of errors is a fundamentally important challenge in artificial neural network research. In this paper, we propose a learning algorithm called error-driven Local Representation Alignment (LRA-E), which has strong connections to predictive coding, a theory that offers a mechanistic way of describing neurocomputational machinery. In addition, we propose an improved variant of Difference Target Propagation, another procedure that comes from the same family of algorithms as LRA-E. We compare our procedures to several other biologically motivated algorithms, including two feedback alignment algorithms and Equilibrium Propagation. In two benchmarks, we find that both of our proposed algorithms yield stable performance and strong generalization compared to other competing back-propagation alternatives when training deeper, highly nonlinear networks, with LRA-E performing the best overall.
【Keywords】:
【Paper Link】 【Pages】:4659-4666
【Authors】: Takayuki Osogami ; Rudy Raymond
【Abstract】: We study reinforcement learning for controlling multiple agents in a collaborative manner. In some of these tasks, it is not enough for the individual agents to take relevant actions; those actions should also be diverse. We propose the approach of using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, where we learn the matrix in a way that it represents the relevance and diversity of the actions. Experimental results show that the proposed approach allows the agents to learn a nearly optimal policy approximately ten times faster than baseline approaches in benchmark tasks of multi-agent reinforcement learning. The proposed approach is also shown to achieve performance that cannot be achieved with conventional approaches in a partially observable environment with an exponentially large action space.
【Keywords】:
【Paper Link】 【Pages】:4667-4674
【Authors】: Dipan K. Pal ; Marios Savvides
【Abstract】: ConvNets, through their architecture, only enforce invariance to translation. In this paper, we introduce a new class of deep convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn general invariances and symmetries directly from data. NPTNs are a natural generalization of ConvNets and can be optimized directly using gradient descent. Unlike almost all previous works in deep architectures, they make no assumption regarding the structure of the invariances present in the data and in that aspect are flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks (TN), which yields a better understanding of the connection between the two. We demonstrate the efficacy of NPTNs on data such as MNIST with extreme transformations and CIFAR10 where they outperform baselines, and further outperform several recent algorithms on ETH-80. They do so while having the same number of parameters. We also show that they are more effective than ConvNets in modelling symmetries and invariances from data, without the explicit knowledge of the added arbitrary nuisance transformations. Finally, we replace ConvNets with NPTNs within Capsule Networks and show that this enables Capsule Nets to perform even better.
【Keywords】:
【Paper Link】 【Pages】:4675-4682
【Authors】: Feiyang Pan ; Qingpeng Cai ; Anxiang Zeng ; Chun-Xiang Pan ; Qing Da ; Hua-Lin He ; Qing He ; Pingzhong Tang
【Abstract】: Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from the bias of the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the tradeoff between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions’ target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error of these two target estimations as the additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore the states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 out of 49 games.
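A toy numpy rendering of the exploration signal just described: the gap between the model-free and model-based targets is added as a bonus (the exact blending and scale in POME may differ; beta and the values below are illustrative).

    import numpy as np

    def pome_style_target(mc_return, r, v_next_pred, gamma=0.99, beta=0.1):
        """Model-free Monte-Carlo target plus a bonus equal to its discrepancy
        with the model-based one-step target r + gamma * V_model(s')."""
        model_based = r + gamma * v_next_pred
        bonus = np.abs(mc_return - model_based)   # large where targets are hard to estimate
        return mc_return + beta * bonus

    print(pome_style_target(mc_return=1.0, r=0.1, v_next_pred=0.5))  # 1.0405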
【Keywords】:
【Paper Link】 【Pages】:4683-4690
【Authors】: Yu Pan ; Jing Xu ; Maolin Wang ; Jinmian Ye ; Fei Wang ; Kun Bai ; Zenglin Xu
【Abstract】: Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units, which are helpful for storing information in sequential contexts. However, when dealing with high dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and huge computational cost. This makes the training of RNNs very difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which utilizes the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can complete end-to-end training and also provides a fundamental building block for RNNs in handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor-train LSTM and other state-of-the-art competitors.
【Keywords】:
【Paper Link】 【Pages】:4691-4698
【Authors】: Zhen-Jia Pang ; Ruo-Ze Liu ; Zhou-Yu Meng ; Yi Zhang ; Yang Yu ; Tong Lu
【Abstract】: StarCraft II poses a grand challenge for reinforcement learning. The main difficulties include a huge state space, a varying action space, long horizons, etc. In this paper, we investigate a set of reinforcement learning techniques for the full-length game of StarCraft II. We investigate a hierarchical approach, where the hierarchy involves two levels of abstraction. One is the macro-actions extracted from expert demonstration trajectories, which can reduce the action space by an order of magnitude yet remain effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale. We also investigate a curriculum transfer learning approach that trains the agent from the simplest opponent to harder ones. On a 64×64 map and using restrictive units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we can achieve over a 93% winning rate against the most difficult non-cheating built-in AI (level-7) within days. We hope this study can shed some light on future research in large-scale reinforcement learning.
【Keywords】:
【Paper Link】 【Pages】:4699-4706
【Authors】: Sungrae Park ; Kyungwoo Song ; Mingi Ji ; Wonsung Lee ; Il-Chul Moon
【Abstract】: Successful applications processing sequential data, such as text and speech, require an improved generalization performance of recurrent neural networks (RNNs). Dropout techniques for RNNs were introduced to respond to these demands, but we conjecture that dropout on RNNs can be further improved by adopting the adversarial concept. This paper investigates ways to improve the dropout for RNNs by utilizing intentionally generated dropout masks. Specifically, the guided dropout used in this research is called adversarial dropout, which adversarially disconnects neurons that are dominantly used to predict correct targets over time. Our analysis shows that our regularizer, which consists of a gap between the original and the reconfigured RNNs, is an upper bound on the gap between the training and the inference phases of the random dropout. We demonstrate that minimizing our regularizer improves the effectiveness of dropout for RNNs on sequential MNIST tasks, semi-supervised text classification tasks, and language modeling tasks.
【Keywords】:
【Paper Link】 【Pages】:4707-4714
【Authors】: Minlong Peng ; Qi Zhang ; Xiaoyu Xing ; Tao Gui ; Xuanjing Huang ; Yu-Gang Jiang ; Keyu Ding ; Zhigang Chen
【Abstract】: Undersampling has been widely used in the class-imbalance learning area. The main deficiency of most existing undersampling methods is that their data sampling strategies are heuristic-based and independent of the classifier and evaluation metric in use. Thus, they may discard instances that are informative for the classifier during data sampling. In this work, we propose a meta-learning method built on undersampling to address this issue. The key idea of this method is to parametrize the data sampler and train it to optimize the classification performance over the evaluation metric. We solve the non-differentiable optimization problem of training the data sampler via reinforcement learning. By incorporating evaluation metric optimization into the data sampling process, the proposed method can learn which instances should be discarded for the given classifier and evaluation metric. In addition, as a data-level operation, this method can be easily applied to arbitrary evaluation metrics and classifiers, including non-parametric ones (e.g., C4.5 and KNN). Experimental results on both synthetic and realistic datasets demonstrate the effectiveness of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:4715-4722
【Authors】: M. Pérez-Ortiz ; Peter Tiño ; Rafal Mantiuk ; César Hervás-Martínez
【Abstract】: Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or by sampling from the data distribution. In the latter case, the main challenge is to estimate the label associated with each new synthetic pattern. This paper studies the effect of generating synthetic data by convex combination of patterns and their use as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.
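A minimal sketch of the over-sampling step (our own illustration; the random pairing and the mixing-coefficient distribution are assumptions): synthetic points are convex combinations of random training pairs and are deliberately left unlabeled, to be consumed by a semi-supervised SVM.

```python
import numpy as np

def convex_combination_augment(X, n_new, seed=0):
    """Generate synthetic patterns as convex combinations of random pairs
    of training patterns. No labels are assigned: the points are used as
    unlabeled data under the cluster assumption."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return lam * X[i] + (1.0 - lam) * X[j]

X = np.random.randn(100, 8)                 # toy labeled patterns
X_unlabeled = convex_combination_augment(X, n_new=50)
```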
【Keywords】:
【Paper Link】 【Pages】:4723-4730
【Authors】: Mirko Polato ; Fabio Aiolli
【Abstract】: A large body of research is currently investigating the connection between machine learning and game theory. In this work, game-theoretic notions are injected into a preference learning framework. Specifically, a preference learning problem is seen as a two-player zero-sum game. An algorithm is proposed to incrementally include new useful features into the hypothesis. This can be particularly important when dealing with a very large number of potential features, as in, for instance, relational learning and rule extraction. A game-theoretic analysis is used to demonstrate the convergence of the algorithm. Furthermore, leveraging the natural analogy between features and rules, the resulting models can be easily interpreted by humans. An extensive set of experiments on classification tasks shows the effectiveness of the proposed method in terms of interpretability and feature selection quality, with accuracy at the state of the art.
【Keywords】:
【Paper Link】 【Pages】:4731-4738
【Authors】: Marcelo O. R. Prates ; Pedro H. C. Avelar ; Henrique Lemos ; Luís C. Lamb ; Moshe Y. Vardi
【Abstract】: Graph Neural Networks (GNNs) are a promising technique for bridging differential programming and combinatorial domains. GNNs employ trainable modules which can be assembled in different configurations that reflect the relational structure of each problem instance. In this paper, we show that GNNs can learn to solve, with very little supervision, the decision variant of the Traveling Salesperson Problem (TSP), a highly relevant NP-Complete problem. Our model is trained to function as an effective message-passing algorithm in which edges (embedded with their weights) communicate with vertices for a number of iterations, after which the model is asked to decide whether a route with cost < C exists. We show that such a network can be trained with sets of dual examples: given the optimal tour cost C∗, we produce one decision instance with target cost x% smaller and one with target cost x% larger than C∗. We were able to obtain 80% accuracy when training with ±2% deviations, and the same trained model can generalize to more relaxed deviations with increasing performance. We also show that the model is capable of generalizing to larger problem sizes. Finally, we provide a method for predicting the optimal route cost within a 2% deviation from the ground truth. In summary, our work shows that Graph Neural Networks are powerful enough to solve NP-Complete problems which combine symbolic and numeric data.
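The dual-example construction lends itself to a one-liner; this sketch (with an illustrative 2% deviation) produces the paired positive/negative decision instances from a known optimal tour cost:

```python
def dual_decision_instances(optimal_cost, deviation=0.02):
    """Given the optimal tour cost C*, emit one unsatisfiable decision
    instance (target cost below C*, label 0) and one satisfiable
    instance (target cost above C*, label 1)."""
    negative = (optimal_cost * (1.0 - deviation), 0)
    positive = (optimal_cost * (1.0 + deviation), 1)
    return negative, positive

print(dual_decision_instances(100.0))  # ((98.0, 0), (102.0, 1))
```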
【Keywords】:
【Paper Link】 【Pages】:4739-4746
【Authors】: Qi Qian ; Shenghuo Zhu ; Jiasheng Tang ; Rong Jin ; Baigui Sun ; Hao Li
【Abstract】: In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have its own model, multiple domains (i.e., applications/users) may have to share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-recognition model should be applicable to hand-written digits, house numbers, car plates, etc. Therefore, an ideal model for cloud computing has to perform well on each applicable domain. To address this new challenge from cloud computing, we develop a framework of robust optimization over multiple domains. In lieu of minimizing the empirical risk, we aim to learn a model optimized for the adversarial distribution over the multiple domains. Hence, we propose to learn the model and the adversarial distribution simultaneously with a stochastic algorithm for efficiency. Theoretically, we analyze the convergence rate for convex and non-convex models. To the best of our knowledge, this is the first study of the convergence rate of learning a robust non-convex model with a practical algorithm. Furthermore, we demonstrate that the robustness of the framework and the convergence rate can be further enhanced by appropriate regularizers over the adversarial distribution. The empirical study on real-world fine-grained visual categorization and digit recognition tasks verifies the effectiveness and efficiency of the proposed framework.
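The simultaneous learning of model and adversarial distribution can be sketched as a primal-dual update (our own toy rendering with scalar quadratic "domains"; the exponentiated-gradient step for the distribution is one standard choice, not necessarily the authors'):

```python
import numpy as np

def robust_step(w, p, domain_losses, domain_grads, lr_w=0.05, lr_p=0.5):
    """One simultaneous update for min_w max_{p in simplex} sum_d p_d * L_d(w):
    gradient descent on the p-weighted loss, exponentiated-gradient ascent
    on the adversarial domain distribution p."""
    losses = np.array([L(w) for L in domain_losses])
    grads = np.array([g(w) for g in domain_grads])
    w = w - lr_w * p.dot(grads)        # model step on the weighted loss
    p = p * np.exp(lr_p * losses)      # shift mass toward harder domains
    return w, p / p.sum()              # re-normalize onto the simplex

domain_losses = [lambda w: (w - 1.0) ** 2, lambda w: (w + 2.0) ** 2]
domain_grads = [lambda w: 2 * (w - 1.0), lambda w: 2 * (w + 2.0)]
w, p = 0.0, np.ones(2) / 2
for _ in range(200):
    w, p = robust_step(w, p, domain_losses, domain_grads)
# w settles between the two domain optima, weighted toward the harder one.
```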
【Keywords】:
【Paper Link】 【Pages】:4747-4754
【Authors】: You Qiaoben ; Zheng Wang ; Jianguo Li ; Yinpeng Dong ; Yu-Gang Jiang ; Jun Zhu
【Abstract】: Binary neural networks have great resource and computing efficiency, but suffer from long training procedures and non-negligible accuracy drops compared to their full-precision counterparts. In this paper, we propose composite binary decomposition networks (CBDNet), which first compose the real-valued tensor of each layer from a limited number of binary tensors, and then decompose certain conditioned binary tensors into two low-rank binary tensors, so that the number of parameters and operations is greatly reduced compared to the original networks. Experiments demonstrate the effectiveness of the proposed method, as CBDNet can approximate the image classification network ResNet-18 using 5.25 bits, VGG-16 using 5.47 bits, DenseNet-121 using 5.72 bits, the object detection network SSD300 using 4.38 bits, and the semantic segmentation network SegNet using 5.18 bits, all with minor accuracy drops.
【Keywords】:
【Paper Link】 【Pages】:4755-4762
【Authors】: Xiaoyu Qin ; Kai Ming Ting ; Ye Zhu ; Vincent C. S. Lee
【Abstract】: A recent proposal of a data-dependent similarity called Isolation Kernel/Similarity has enabled SVMs to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity, and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.
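A compact sketch of a nearest-neighbour-induced similarity of this kind (the parameters psi and t are illustrative): each random subsample induces a Voronoi partition, and two points are similar to the extent that they keep landing in the same cell. Sparse regions get larger cells, so the measure adapts to local density.

```python
import numpy as np

def isolation_similarity(x, y, data, psi=8, t=100, seed=0):
    """Fraction of t random psi-point Voronoi partitions in which
    x and y fall into the same cell."""
    rng = np.random.default_rng(seed)
    same = 0
    for _ in range(t):
        centers = data[rng.choice(len(data), size=psi, replace=False)]
        cx = np.argmin(np.linalg.norm(centers - x, axis=1))
        cy = np.argmin(np.linalg.norm(centers - y, axis=1))
        same += int(cx == cy)
    return same / t

data = np.random.randn(500, 2)
s = isolation_similarity(data[0], data[1], data)  # value in [0, 1]
```

Plugging 1 − s into DBSCAN as the dissimilarity, with the rest of the procedure unchanged, is the substitution the abstract describes.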
【Keywords】:
【Paper Link】 【Pages】:4763-4771
【Authors】: Alexander Ratner ; Braden Hancock ; Jared Dunnmon ; Frederic Sala ; Shreyash Pandey ; Christopher Ré
【Abstract】: As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.
【Keywords】:
【Paper Link】 【Pages】:4772-4779
【Authors】: Sathya N. Ravi ; Tuan Dinh ; Vishnu Suresh Lokhande ; Vikas Singh
【Abstract】: A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better accuracy to compression. But support for general constraints within widely used libraries remains scarce, and their broader deployment within many applications that can benefit from them remains under-explored. Part of the reason is that stochastic gradient descent (SGD), the workhorse for training deep neural networks, does not natively deal well with constraints with global scope. In this paper, we revisit a classical first-order scheme from numerical optimization, Conditional Gradients (CG), that has thus far had limited applicability in training deep models. We show via rigorous analysis how various constraints can be naturally handled by modifications of this algorithm. We provide convergence guarantees and show a suite of immediate benefits that are possible: from training ResNets with fewer layers but better accuracy simply by substituting in our version of CG, to faster training of GANs with 50% fewer epochs in image inpainting applications, to provably better generalization guarantees using efficiently implementable forms of recently proposed regularizers.
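For readers unfamiliar with conditional gradients, here is the classical projection-free iteration on a toy l1-ball constraint (a textbook sketch, not the paper's deep-network variant): the linear minimization oracle keeps every iterate feasible by convex combination, which is what makes the scheme attractive for global constraints that SGD handles poorly.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, n_iters=100):
    """Conditional-gradient (Frank-Wolfe) minimization over an l1-ball.
    The linear oracle over the l1-ball is a signed, scaled coordinate vector."""
    x = x0.copy()
    for k in range(n_iters):
        g = grad_f(x)
        s = np.zeros_like(x)
        i = np.argmax(np.abs(g))
        s[i] = -radius * np.sign(g[i])    # argmin_{||s||_1 <= radius} <g, s>
        gamma = 2.0 / (k + 2.0)           # standard step-size schedule
        x = (1 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

# Minimize ||x - c||^2 subject to ||x||_1 <= 1.
c = np.array([0.8, -0.3, 0.1])
x = frank_wolfe_l1(lambda x: 2 * (x - c), np.zeros(3))
```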
【Keywords】:
【Paper Link】 【Pages】:4780-4789
【Authors】: Esteban Real ; Alok Aggarwal ; Yanping Huang ; Quoc V. Le
【Abstract】: The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designed models for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. At matching size, AmoebaNet-A has accuracy comparable to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to a larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well-known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures.
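The age-based modification is small enough to show in full; this sketch (with a toy bit-string genotype and sum-of-bits fitness standing in for an architecture and its validation accuracy) removes the oldest individual each cycle rather than the worst, which is the regularization the abstract describes:

```python
import collections
import random

def mutate(genotype):
    """Flip one random bit (a stand-in for an architecture mutation)."""
    i = random.randrange(len(genotype))
    return tuple(b ^ (j == i) for j, b in enumerate(genotype))

def aging_evolution(population_size=50, cycles=500, sample_size=10, length=16):
    population = collections.deque(
        tuple(random.randint(0, 1) for _ in range(length))
        for _ in range(population_size))          # ordered oldest-first
    history = list(population)
    for _ in range(cycles):
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=sum)             # tournament selection
        child = mutate(parent)
        population.append(child)                  # youngest joins on the right
        population.popleft()                      # the oldest dies, not the worst
        history.append(child)
    return max(history, key=sum)

best = aging_evolution()
```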
【Keywords】:
【Paper Link】 【Pages】:4790-4797
【Authors】: Ievgen Redko ; Charlotte Laclau
【Abstract】: Machine learning and game theory are known to exhibit a very strong link, as they mutually provide each other with solutions and models that allow one to study and analyze the optimal behaviour of a set of agents. In this paper, we take a closer look at a special class of games, known as fair cost sharing games, from a machine learning perspective. We show that this particular kind of game, where agents can choose between selfish behaviour and cooperation with shared costs, has a natural link to several machine learning scenarios, including collaborative learning with homogeneous and heterogeneous sources of data. We further demonstrate how the game-theoretical results bounding the ratio between the best Nash equilibrium (or its approximate counterpart) and the optimal solution of a given game can be used to provide an upper bound on the gain achievable by collaborative learning, expressed as the expected risk and the sample complexity for the homogeneous and heterogeneous cases, respectively. We believe that the established link can spur many possible future implications for other learning scenarios as well, with privacy-aware learning being among the most noticeable examples.
【Keywords】:
【Paper Link】 【Pages】:4798-4805
【Authors】: Kan Ren ; Jiarui Qin ; Lei Zheng ; Zhengyu Yang ; Weinan Zhang ; Lin Qiu ; Yong Yu
【Abstract】: Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, and it has been widely used in many applications such as clinical research, information systems, and other fields with survivorship bias. Many methods have been proposed for survival analysis, ranging from traditional statistical methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on segmented data, or make a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at a fine-grained level of the data with survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over previous works in fitting various sophisticated data distributions. In experiments on three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
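The probabilistic bookkeeping behind such a model can be sketched independently of the network (our own illustration; the per-step hazards would in practice come from a recurrent net): conditional event probabilities are chained into survival rates and event probability masses, with censored samples contributing the survival term to the likelihood.

```python
import numpy as np

def survival_from_hazards(hazards):
    """Given conditional event probabilities h_t per time step, return
    S(t) = prod_{u <= t} (1 - h_u)   (survival rate) and
    p(t) = h_t * S(t - 1)            (event probability mass at t)."""
    hazards = np.asarray(hazards, dtype=float)
    survival = np.cumprod(1.0 - hazards)
    prev_survival = np.concatenate(([1.0], survival[:-1]))
    event_prob = hazards * prev_survival
    return survival, event_prob

S, p = survival_from_hazards([0.1, 0.2, 0.3, 0.4])
assert np.isclose(S[-1] + p.sum(), 1.0)  # mass either occurs or survives
```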
【Keywords】:
【Paper Link】 【Pages】:4806-4813
【Authors】: Pengjie Ren ; Zhumin Chen ; Jing Li ; Zhaochun Ren ; Jun Ma ; Maarten de Rijke
【Abstract】: Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of their promising performance. Repeat consumption is a common phenomenon in many recommendation scenarios (e.g., e-commerce, music, and TV program recommendations), where the same item is re-consumed repeatedly over time. However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user's history and recommend them at the right time. We report on extensive experiments on three benchmark datasets. RepeatNet outperforms state-of-the-art baselines on all three datasets in terms of MRR and Recall. Furthermore, as the dataset size and the repeat ratio increase, the improvements of RepeatNet over the baselines also increase, which demonstrates its advantage in handling repeat recommendation scenarios.
【Keywords】:
【Paper Link】 【Pages】:4814-4821
【Authors】: Tao Ruan ; Ting Liu ; Zilong Huang ; Yunchao Wei ; Shikui Wei ; Yao Zhao
【Abstract】: Human parsing has received considerable interest due to its wide application potential. Nevertheless, it is still unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these useful properties finally result in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily adopted for conducting multiple human parsing. Benefiting from the superiority of CE2P, we won 1st place on all three human parsing tracks in the 2nd Look into Person (LIP) Challenge. Without any bells and whistles, we achieved 56.50% (mIoU), 45.31% (mean APr) and 33.34% (APp0.5) in Track 1, Track 2 and Track 5, outperforming the state of the art by more than 2.06%, 3.81% and 1.87%, respectively. We hope our CE2P will serve as a solid baseline and help ease future research in single/multiple human parsing. Code has been made available at https://github.com/liutinglt/CE2P.
【Keywords】:
【Paper Link】 【Pages】:4822-4829
【Authors】: Sebastian Ruder ; Joachim Bingel ; Isabelle Augenstein ; Anders Søgaard
【Abstract】: Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
【Keywords】:
【Paper Link】 【Pages】:4830-4837
【Authors】: Aadirupa Saha ; Rakesh Shivanna ; Chiranjib Bhattacharyya
【Abstract】: We consider the problem of optimal recovery of the true ranking of n items from a randomly chosen subset of their pairwise preferences. It is well known that, without any further assumption, one requires a sample size of Ω(n²) for this purpose. We analyze the problem with the additional structure of a relational graph G([n], E) over the n items, together with a locality assumption: neighboring items are similar in their rankings. Noting the preferential nature of the data, we choose to embed not the graph itself but its strong product, to capture the pairwise node relationships. Furthermore, unlike the existing literature that uses Laplacian embedding for graph-based learning problems, we use a richer class of graph embeddings, orthonormal representations, which includes the (normalized) Laplacian as a special case. Our proposed algorithm, Pref-Rank, predicts the underlying ranking using an SVM-based approach on the chosen embedding of the product graph, and is the first to provide statistical consistency on two ranking losses, Kendall's tau and Spearman's footrule, with a required sample complexity of O((n²χ(Ḡ))^(2/3)) pairs, χ(Ḡ) being the chromatic number of the complement graph Ḡ. Clearly, our sample complexity is smaller for dense graphs, with χ(Ḡ) characterizing the degree of node connectivity, which is also intuitive given the locality assumption: e.g. O(n^(4/3)) for a union of k cliques, or O(n^(5/3)) for random and power law graphs, etc.; a quantity much smaller than the fundamental limit of Ω(n²) for large n. This, for the first time, relates ranking complexity to structural properties of the graph. We also report experimental evaluations on different synthetic and real-world datasets, where our algorithm is shown to outperform the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:4838-4845
【Authors】: Tomoya Sakai ; Nobuyuki Shimizu
【Abstract】: The goal of binary classification is to identify whether an input sample belongs to the positive or negative class. Usually, supervised learning is applied to obtain a classification rule, but in real-world applications, it is conceivable that only positive and unlabeled data are accessible for learning, a setting called learning from positive and unlabeled data (PU learning). Furthermore, in practice, data distributions are likely to differ between training and testing due to, for example, time variation and domain shift. Covariate shift is a dataset shift situation where the distributions of covariates (inputs) differ between training and testing, but the input-output relation is the same. In this paper, we address the PU learning problem under covariate shift. We propose an importance-weighted PU learning method and reveal in which situations the importance weighting is necessary. Moreover, we derive the convergence rate of the proposed method under mild conditions and experimentally demonstrate its effectiveness.
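A sketch of the estimator family discussed above (our rendering; the paper's exact weighting scheme may differ): the standard unbiased PU risk decomposition with a covariate-shift density ratio w(x) = p_test(x)/p_train(x) applied inside each empirical term.

```python
import numpy as np

def iw_pu_risk(g, x_pos, x_unl, prior, w, loss):
    """Importance-weighted unbiased PU risk estimate:
    pi * E_p[w l(g, +1)] + E_x[w l(g, -1)] - pi * E_p[w l(g, -1)]."""
    wp, wu = w(x_pos), w(x_unl)
    r_pos = np.mean(wp * loss(g(x_pos), +1))
    r_neg_unl = np.mean(wu * loss(g(x_unl), -1))
    r_neg_pos = np.mean(wp * loss(g(x_pos), -1))
    return prior * r_pos + r_neg_unl - prior * r_neg_pos

# Toy usage with a linear scorer and logistic loss.
loss = lambda z, y: np.log1p(np.exp(-y * z))
g = lambda X: X @ np.array([1.0, -1.0])
w = lambda X: np.ones(len(X))            # identity weights: no covariate shift
x_pos, x_unl = np.random.randn(50, 2) + 1.0, np.random.randn(200, 2)
risk = iw_pu_risk(g, x_pos, x_unl, prior=0.3, w=w, loss=loss)
```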
【Keywords】:
【Paper Link】 【Pages】:4846-4853
【Authors】: Patrick Schwab ; Djordje Miladinovic ; Walter Karlen
【Abstract】: Knowledge of the importance of input features towards decisions made by machine-learning models is essential to increase our understanding of both the models and the underlying data. Here, we present a new approach to estimating feature importance with neural networks based on the idea of distributing the features of interest among experts in an attentive mixture of experts (AME). AMEs use attentive gating networks trained with a Granger-causal objective to learn to jointly produce accurate predictions as well as estimates of feature importance in a single model. Our experiments show (i) that the feature importance estimates provided by AMEs compare favourably to those provided by state-of-the-art methods, (ii) that AMEs are significantly faster at estimating feature importance than existing methods, and (iii) that the associations discovered by AMEs are consistent with those reported by domain experts.
【Keywords】:
【Paper Link】 【Pages】:4854-4861
【Authors】: Arik Senderovich ; J. Christopher Beck ; Avigdor Gal ; Matthias Weidlich
【Abstract】: Time prediction is an essential component of decision making in various Artificial Intelligence application areas, including transportation systems, healthcare, and manufacturing. Predictions are required for efficient resource allocation and scheduling, optimized routing, and temporal action planning. In this work, we focus on time prediction in congested systems, where entities share scarce resources. To achieve accurate and explainable time prediction in this setting, features describing system congestion (e.g., workload and resource availability) must be considered. These features are typically gathered using process knowledge (i.e., insights into the interplay of a system's entities). Such knowledge is expensive to gather and may be completely unavailable. In order to automatically extract such features from data without prior process knowledge, we propose the model of congestion graphs, which is grounded in queueing theory. We show how congestion graphs are mined from raw event data using queueing-theory-based assumptions on the information contained in these logs. We evaluate our approach on two real-world datasets from healthcare systems where scarce resources prevail: an emergency department and an outpatient cancer clinic. Our experimental results show that using automatic generation of congestion features, we obtain up to a 23% improvement in terms of relative error in time prediction, compared to common baseline methods. We also detail how congestion graphs can be used to explain delays in the system.
【Keywords】:
【Paper Link】 【Pages】:4862-4869
【Authors】: Kristen A. Severson ; Soumya Ghosh ; Kenney Ng
【Abstract】: In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. These pairs of datasets occur commonly, for instance a population of interest vs. a control, or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction to discover signal that is enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model where some structure is shared between the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for the incorporation of prior information, handles missing data, and can be generalized to different distributional assumptions. We describe several possible variations of the model and demonstrate the application of the technique to de-noising, feature selection, and subgroup discovery settings.
【Keywords】:
【Paper Link】 【Pages】:4870-4877
【Authors】: Kulin Shah ; Naresh Manwani
【Abstract】: In this paper, we propose an approach for learning sparse reject option classifiers using the double ramp loss L_dr. We use DC programming to find the risk minimizer. The algorithm solves a sequence of linear programs to learn the reject option classifier. We show that the loss L_dr is Fisher consistent. We also show that the excess risk of the loss L_d is upper bounded by the excess risk of L_dr. We derive generalization error bounds for the proposed approach. We show the effectiveness of the proposed approach through experiments on several real-world datasets. The proposed approach not only performs comparably to the state of the art, it also successfully learns sparse classifiers.
【Keywords】:
【Paper Link】 【Pages】:4878-4885
【Authors】: Jin Shang ; Mingxuan Sun
【Abstract】: Hawkes processes are popular for modeling correlated temporal sequences that exhibit mutual-excitation properties. Existing approaches such as feature-enriched processes or variations of multivariate Hawkes processes either fail to describe the exact mutual influence between sequences or become computationally prohibitive in most real-world applications involving large dimensions. Incorporating additional geometric structure, in the form of graphs, into Hawkes processes is an effective and efficient way of improving model prediction accuracy. In this paper, we propose the Geometric Hawkes Process (GHP) model to better correlate individual processes by integrating Hawkes processes with a graph convolutional recurrent neural network. The deep network structure is computationally efficient since it requires a constant number of parameters, independent of the graph size. Experimental results on real-world data show that our framework outperforms recent state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:4886-4893
【Authors】: Zhiqiang Shen ; Zhankui He ; Xiangyang Xue
【Abstract】: Often the best performing deep neural models are ensembles of multiple base-level networks. Unfortunately, the space required to store these many networks, and the time required to execute them at test time, prohibit their use in applications where test sets are large (e.g., ImageNet). In this paper, we present a method for compressing large, complex trained ensembles into a single network, where knowledge from a variety of trained deep neural networks (DNNs) is distilled and transferred to a single DNN. In order to distill diverse knowledge from different trained (teacher) models, we propose an adversarial learning strategy in which we define a block-wise training loss to guide and optimize the predefined student network to recover the knowledge in the teacher models, and to promote the discriminator network to distinguish teacher vs. student features simultaneously. The proposed ensemble method (MEAL) of transferring distilled knowledge with adversarial learning exhibits three important advantages: (1) the student network that learns the distilled knowledge with discriminators is optimized better than the original model; (2) fast inference is realized by a single forward pass, while the performance is even better than traditional ensembles of multiple original models; (3) the student network can learn the distilled knowledge from a teacher model that has an arbitrary structure. Extensive experiments on the CIFAR-10/100, SVHN and ImageNet datasets demonstrate the effectiveness of our MEAL method. On ImageNet, our ResNet-50 based MEAL achieves 21.79%/5.99% top-1/top-5 validation error, which outperforms the original model by 2.06%/1.14%.
【Keywords】:
【Paper Link】 【Pages】:4894-4901
【Authors】: Xiang-Rong Sheng ; De-Chuan Zhan ; Su Lu ; Yuan Jiang
【Abstract】: Identifying anomalies in multi-view data is a difficult task due to the complicated data characteristics of anomalies. Specifically, there are two types of anomalies in multi-view data: anomalies that have inconsistent features across multiple views, and anomalies that are consistently anomalous in each view. Existing multi-view anomaly detection approaches have some issues; e.g., they assume that multiple views of a normal instance share consistent and normal clustering structures, while an anomaly exhibits anomalous clustering characteristics across multiple views. When there are no clusters in the data, it is difficult for existing approaches to detect anomalies. Besides, existing approaches construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. The objective is formulated to profile normal instances, but not to estimate the set of normal instances, which results in sub-optimal detectors. In addition, the model trained to profile normal instances uses the entire dataset, including the anomalies. However, anomalies can undermine the model; i.e., the model is not robust to anomalies. To address these issues, we propose the nearest-neighbor-based MUlti-View Anomaly Detection (MUVAD) approach. Specifically, we first propose an anomaly measurement criterion and utilize this criterion to formulate the objective of MUVAD to estimate the set of normal instances explicitly. We further develop two concrete relaxations implementing MUVAD, as MUVAD-QPR and MUVAD-FSR. Experimental results validate the superiority of the proposed MUVAD approaches.
【Keywords】:
【Paper Link】 【Pages】:4902-4909
【Authors】: Jing-Cheng Shi ; Yang Yu ; Qing Da ; Shi-Yong Chen ; Anxiang Zeng
【Abstract】: Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample a large number of trials, as required by current reinforcement learning methods, in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and meanwhile a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our environment-building approach: we build Virtual-Taobao, a simulator learned from historical customer behavior data, and then we train policies in Virtual-Taobao with no physical sampling costs. To improve the simulation precision, we propose GAN-SD (GAN for Simulating Distributions) for customer feature generation with better matched distribution; we propose MAIL (Multiagent Adversarial Imitation Learning) for generating better generalizable customer actions. To further avoid overfitting the imperfection of the simulator, we propose ANC (Action Norm Constraint) strategy to regularize the policy model. In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers’ records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show that the policies trained purely in Virtual-Taobao, which has zero physical sampling cost, can have significantly superior real-world performance to the traditional supervised approaches, through online A/B tests. We hope this work may shed some light on applying reinforcement learning in complex physical environments.
【Keywords】:
【Paper Link】 【Pages】:4910-4917
【Authors】: Shu-Ting Shi ; Ming Li ; David Lo ; Ferdian Thung ; Xuan Huo
【Abstract】: Code review is the process of manually inspecting a revision of source code in order to find out whether the revised code eventually meets the revision requirements. However, manual code review is time-consuming, and automating the code review process would alleviate the burden on code reviewers and speed up the software maintenance process. To construct a model for automatic code review, the characteristics of the revisions of source code (i.e., the difference between the two pieces of source code) should be properly captured and modeled. Unfortunately, most of the existing techniques can easily model the overall correlation between two pieces of source code, but not the "difference" between them. In this paper, we propose a novel deep model named DACE for automatic code review. The model is able to learn revision features by contrasting the revised hunks from the original and revised source code with respect to the code context containing the hunks. Experimental results on six open-source software projects indicate that, by learning the revision features, DACE can outperform the competing approaches in automatic code review.
【Keywords】:
【Paper Link】 【Pages】:4918-4925
【Authors】: Xiaofei Shi ; David P. Woodruff
【Abstract】: We show how to solve a number of problems in numerical linear algebra, such as least squares regression, ℓ_p-regression for any p ≥ 1, low rank approximation, and kernel regression, in time T(A)·poly(log(nd)), where for a given input matrix A ∈ R^(n×d), T(A) is the time needed to compute A·y for an arbitrary vector y ∈ R^d. Since T(A) ≤ O(nnz(A)), where nnz(A) denotes the number of non-zero entries of A, the time is no worse, up to polylogarithmic factors, than all of the recent advances for such problems that run in input-sparsity time. However, for many applications, T(A) can be much smaller than nnz(A), yielding significantly sublinear time algorithms. For example, in the overconstrained (1+ε)-approximate polynomial interpolation problem, A is a Vandermonde matrix and T(A) = O(n log n); in this case our running time is n·poly(log n) + poly(d/ε), and we recover the results of Avron, Sindhwani, and Woodruff (2013) as a special case. For overconstrained autoregression, which is a common problem arising in dynamical systems, T(A) = O(n log n), and we immediately obtain n·poly(log n) + poly(d/ε) time. For kernel autoregression, we significantly improve the running time of prior algorithms for general kernels. For the important case of autoregression with the polynomial kernel and an arbitrary target vector b ∈ R^n, we obtain even faster algorithms. Our algorithms show that, perhaps surprisingly, most of these optimization problems do not require much more time than that of a polylogarithmic number of matrix-vector multiplications.
【Keywords】:
【Paper Link】 【Pages】:4926-4933
【Authors】: Yaxin Shi ; Donna Xu ; Yuangang Pan ; Ivor W. Tsang ; Shirui Pan
【Abstract】: Label embedding plays an important role in many real-world applications. To enhance the label relatedness captured by the embeddings, multiple contexts can be adopted. However, these contexts are heterogeneous and often partially observed in practical tasks, imposing significant challenges to capturing the overall relatedness among labels. In this paper, we propose a general Partial Heterogeneous Context Label Embedding (PHCLE) framework to address these challenges. Categorizing heterogeneous contexts into two groups, relational context and descriptive context, we design tailor-made matrix factorization formulas to effectively exploit the label relatedness in each context. With a shared embedding principle across heterogeneous contexts, the label relatedness is selectively aligned in a shared space. Due to our elegant formulation, PHCLE overcomes the partial context problem and can nicely incorporate more contexts, neither of which can be tackled with existing multi-context label embedding methods. An effective alternating optimization algorithm is further derived to solve the sparse matrix factorization problem. Experimental results demonstrate that the label embeddings obtained with PHCLE achieve superb performance in image classification tasks and exhibit good interpretability in the downstream label similarity analysis and image understanding task.
【Keywords】:
【Paper Link】 【Pages】:4934-4942
【Authors】: David Shriver ; Sebastian G. Elbaum ; Matthew B. Dwyer ; David S. Rosenblum
【Abstract】: Recommender systems help users to find products or services they may like when lacking personal experience or facing an overwhelming set of choices. Since unstable recommendations can lead to distrust, loss of profits, and a poor user experience, it is important to test recommender system stability. In this work, we present an approach based on inferred models of influence that underlie recommender systems to guide the generation of dataset modifications to assess a recommender’s stability. We implement our approach and evaluate it on several recommender algorithms using the MovieLens dataset. We find that influence-guided fuzzing can effectively find small sets of modifications that cause significantly more instability than random approaches.
【Keywords】:
【Paper Link】 【Pages】:4943-4950
【Authors】: Hai Shu ; Hongtu Zhu
【Abstract】: Deep neural networks (DNNs) have achieved superior performance in various prediction tasks, but can be very vulnerable to adversarial examples or perturbations. Therefore, it is crucial to measure the sensitivity of DNNs to various forms of perturbations in real applications. We introduce a novel perturbation manifold and its associated influence measure to quantify the effects of various perturbations on DNN classifiers. Such perturbations include various external and internal perturbations to input samples and network parameters. The proposed measure is motivated by information geometry and provides desirable invariance properties. We demonstrate that our influence measure is useful for four model building tasks: detecting potential ‘outliers’, analyzing the sensitivity of model architectures, comparing network sensitivity between training and test sets, and locating vulnerable areas. Experiments show reasonably good performance of the proposed measure for the popular DNN models ResNet50 and DenseNet121 on CIFAR10 and MNIST datasets.
【Keywords】:
【Paper Link】 【Pages】:4951-4958
【Authors】: Yang Shu ; Zhangjie Cao ; Mingsheng Long ; Jianmin Wang
【Abstract】: Domain adaptation improves a target task by transferring knowledge from a source domain with rich annotations. It is not uncommon that "source-domain engineering" becomes a cumbersome process in domain adaptation: high-quality source domains highly related to the target domain are hardly available. Thus, weakly-supervised domain adaptation has been introduced to address this difficulty, where we can tolerate a source domain with noise in labels, features, or both. As such, for a particular target task, we simply collect a source domain with coarse labeling or corrupted data. In this paper, we try to address two entangled challenges of weakly-supervised domain adaptation: sample noise in the source domain and distribution shift across domains. To disentangle these challenges, a Transferable Curriculum Learning (TCL) approach is proposed to train the deep networks, guided by a transferable curriculum informing which of the source examples are noiseless and transferable. The approach enhances positive transfer from clean source examples to the target and mitigates negative transfer from noisy source examples. A thorough evaluation shows that our approach significantly outperforms the state of the art on weakly-supervised domain adaptation tasks.
【Keywords】:
【Paper Link】 【Pages】:4959-4966
【Authors】: Aditya Siddhant ; Anuj Kumar Goyal ; Angeliki Metallinou
【Abstract】: User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Models (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance compared to training from scratch, and that it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low-resource settings, and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.
【Keywords】:
【Paper Link】 【Pages】:4967-4974
【Authors】: Thiago D. Simão ; Matthijs T. J. Spaan
【Abstract】: We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is known to make good use of the data provided. It can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.
【Keywords】:
【Paper Link】 【Pages】:4975-4982
【Authors】: Christopher L. Simpkins ; Charles L. Isbell Jr.
【Abstract】: Modular reinforcement learning (MRL) decomposes a monolithic multiple-goal problem into modules that each solve a portion of the original problem. The modules' action preferences are arbitrated to determine the action taken by the agent. Truly modular reinforcement learning would support not only decomposition into modules, but also composability of separately written modules in new modular reinforcement learning agents. However, the performance of MRL agents that arbitrate module preferences using additive reward schemes degrades when the modules have incomparable reward scales. This performance degradation means that separately written modules cannot be composed as-is in new modular reinforcement learning agents; they may need to be modified to align their reward scales. We solve this problem with a Q-learning-based command arbitration algorithm and demonstrate that it does not exhibit the same performance degradation as existing approaches to MRL, thereby supporting composability.
【Keywords】:
【Paper Link】 【Pages】:4983-4991
【Authors】: Kyungwoo Song ; Mingi Ji ; Sungrae Park ; Il-Chul Moon
【Abstract】: A long user history inevitably reflects the transitions of personal interests over time. Analyses of user histories require a robust sequential model to anticipate the transitions and decays of user interests. User histories are often modeled by various RNN structures, but the RNN structures used in recommendation systems still suffer from long-term dependencies and interest drifts. To resolve these challenges, we suggest HCRNN with three hierarchical contexts of global, local, and temporary interests. This structure is designed to retain the global long-term interests of users, to reflect the local sub-sequence interests, and to attend to the temporary interests of each transition. Besides, we propose a hierarchical context-based gate structure to incorporate our interest-drift assumption. As we suggest a new RNN structure, we support HCRNN with a complementary bi-channel attention structure to utilize the hierarchical context. We evaluated the suggested structure on sequential recommendation tasks with CiteULike, MovieLens, and LastFM, and our model showed the best performance in the sequential recommendations.
【Keywords】:
【Paper Link】 【Pages】:4992-4999
【Authors】: Yuhang Song ; Jianyi Wang ; Thomas Lukasiewicz ; Zhenghua Xu ; Mai Xu
【Abstract】: Hierarchical reinforcement learning (HRL) has recently shown promising advances in speeding up learning, improving exploration, and discovering inter-task transferable skills. Most recent works focus on HRL with two levels, i.e., a master policy manipulates subpolicies, which in turn manipulate primitive actions. However, HRL with multiple levels is usually needed in many real-world scenarios, whose ultimate goals are highly abstract while their actions are very primitive. Therefore, in this paper, we propose a diversity-driven extensible HRL (DEHRL), where an extensible and scalable framework is built and learned levelwise to realize HRL with multiple levels. DEHRL follows a popular assumption: diverse subpolicies are useful, i.e., subpolicies are believed to be more useful if they are more diverse. However, existing implementations of this diversity assumption usually have their own drawbacks, which makes them inapplicable to HRL with multiple levels. Consequently, we further propose a novel diversity-driven solution to achieve this assumption in DEHRL. Experimental studies evaluate DEHRL against nine baselines from four perspectives in two domains; the results show that DEHRL outperforms the state-of-the-art baselines in all four aspects.
【Keywords】:
【Paper Link】 【Pages】:5000-5007
【Authors】: Amber Srivastava ; Mayank Baranwal ; Srinivasa M. Salapaka
【Abstract】: Typically, clustering algorithms provide clustering solutions with a prespecified number of clusters. The lack of a priori knowledge on the true number of underlying clusters in the dataset makes it important to have a metric to compare clustering solutions with different numbers of clusters. This article quantifies a notion of persistence of clustering solutions that enables comparing solutions with different numbers of clusters. The persistence relates to the range of data-resolution scales over which a clustering solution persists; it is quantified in terms of the maximum over the two-norms of all the associated cluster-covariance matrices. Thus we associate a persistence value with each element in a set of clustering solutions with different numbers of clusters. We show that for datasets where natural clusters are a priori known, the clustering solutions that identify the natural clusters are the most persistent; in this way, this notion can be used to identify solutions with the true number of clusters. Detailed experiments on a variety of standard and synthetic datasets demonstrate that the proposed persistence-based indicator outperforms existing approaches, such as the gap-statistic method, X-means, G-means, PG-means, dip-means, and the information-theoretic method, in accurately identifying clustering solutions with the true number of clusters. Interestingly, our method can be explained in terms of the phase-transition phenomenon in the deterministic annealing algorithm, where the number of distinct cluster centers changes (bifurcates) with respect to an annealing parameter.
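The covariance-norm quantity at the heart of the indicator is easy to compute; this sketch (our own reading of the abstract, with the scale-range selection rule left out) evaluates the maximum two-norm over cluster covariances for one clustering solution:

```python
import numpy as np

def max_cluster_covariance_norm(X, labels):
    """Maximum spectral (two-)norm over the covariance matrices of all
    clusters in a given solution; the persistence value described in the
    abstract is quantified in terms of this quantity."""
    norms = []
    for k in np.unique(labels):
        members = X[labels == k]
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        norms.append(np.linalg.norm(cov, ord=2))
    return max(norms)

X = np.vstack([np.random.randn(100, 2) - 3, np.random.randn(100, 2) + 3])
labels = np.array([0] * 100 + [1] * 100)
print(max_cluster_covariance_norm(X, labels))
```

Comparing this value across solutions with different cluster counts is what allows the most persistent solution to be singled out.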
【Keywords】:
【Paper Link】 【Pages】:5008-5015
【Abstract】: Most existing facial landmark detection algorithms regard the manually annotated landmarks as precise hard labels; therefore, accurately annotated landmarks are essential to the training of these algorithms. However, in many cases there are deviations in the manual annotations, and the landmarks marked for facial parts with occlusion or large poses are not always accurate, which means that the "ground truth" landmarks are usually not annotated precisely. In such cases, it is more reasonable to use soft labels rather than explicit hard labels. Therefore, this paper proposes to associate a bivariate label distribution (BLD) with each landmark of an image. A BLD covers the neighboring pixels around the originally annotated point, alleviating the problem of inaccurate landmarks. After generating a BLD for each landmark, the proposed method first learns the mappings from an image patch to the BLD of each landmark, and then the predicted BLDs are used in a deformable model fitting process to obtain the final facial shape for the image. Experimental results show that the proposed method performs better than the compared state-of-the-art facial landmark detection algorithms. Furthermore, the proposed method appears to be much more robust against landmark noise in the training set than the other compared baselines.
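A minimal sketch of generating such a soft label (the Gaussian form and bandwidth are our illustrative choices, not necessarily the paper's exact construction): probability mass is spread over the pixels neighboring the annotated point instead of a single hard location.

```python
import numpy as np

def bivariate_label_distribution(height, width, landmark, sigma=2.0):
    """Soft label for one landmark: a normalized bivariate Gaussian
    centered on the (possibly imprecise) manual annotation."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - landmark[0]) ** 2 + (ys - landmark[1]) ** 2
    bld = np.exp(-d2 / (2.0 * sigma ** 2))
    return bld / bld.sum()        # a probability distribution over pixels

bld = bivariate_label_distribution(64, 64, landmark=(30, 22))
```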
【Keywords】:
【Paper Link】 【Pages】:5016-5023
【Authors】: Lijuan Sun ; Songhe Feng ; Tao Wang ; Congyan Lang ; Yi Jin
【Abstract】: Multi-Label Learning (MLL) aims to learn from training data where each example is represented by a single instance while associated with a set of candidate labels. Most existing MLL methods are typically designed to handle the problem of missing labels. However, in many real-world scenarios, the labeling information for multi-label data is often redundant, a problem that cannot be solved by classical MLL methods; thus a novel Partial Multi-label Learning (PML) framework has been proposed to cope with it, i.e., to remove the noisy labels from the multi-label sets. In this paper, in order to further improve the denoising capability of the PML framework, we utilize a low-rank and sparse decomposition scheme and propose a novel Partial Multi-label Learning by Low-Rank and Sparse decomposition (PML-LRS) approach. Specifically, we first reformulate the observed label set into a label matrix, and then decompose it into a ground-truth label matrix and an irrelevant label matrix, where the former is constrained to be low rank and the latter is assumed to be sparse. Next, we utilize the feature mapping matrix to explore the label correlations, and meanwhile constrain the feature mapping matrix to be low rank to prevent the proposed method from overfitting. Finally, we obtain the ground-truth labels via minimizing the label loss, where the Augmented Lagrange Multiplier (ALM) algorithm is incorporated to solve the optimization problem. Extensive experimental results demonstrate that PML-LRS achieves superior or competitive performance against other state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:5024-5032
【Authors】: Qigong Sun ; Fanhua Shang ; Kang Yang ; Xiufang Li ; Yan Ren ; Licheng Jiao
【Abstract】: The training of deep neural networks (DNNs) requires intensive resources, both for computation and for storage. Thus, DNNs cannot be efficiently applied to mobile phones and embedded devices, which seriously limits their applicability in industrial applications. To address this issue, we propose a novel encoding scheme that uses {−1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (xnor and bitcount) to achieve model compression, computational acceleration and resource saving. With our method, users can easily achieve different encoding precisions arbitrarily, according to their requirements and hardware resources. The proposed mechanism is very suitable for FPGAs and ASICs in terms of data storage and computation, and provides a feasible idea for smart chips. We validate the effectiveness of our method on both large-scale image classification tasks (e.g., ImageNet) and object detection tasks. In particular, our method with low-bit encoding can still achieve almost the same performance as its full-precision counterparts.
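To see why {−1, +1} planes admit bitwise arithmetic, consider this small sketch (our own illustration of the general idea, not the paper's exact encoding): an unsigned quantized vector is split into ±1 bit-planes, and the dot product of two such vectors reduces to plane-by-plane ±1 inner products, each of which is exactly what an xnor-plus-bitcount unit computes.

```python
import numpy as np

def bit_planes_pm1(x, bits):
    """Decompose an unsigned integer vector into {-1,+1} bit-planes:
    x = sum_k 2^k * (p_k + 1) / 2, with each plane p_k in {-1,+1}."""
    return np.stack([((x >> k) & 1) * 2 - 1 for k in range(bits)])

def quantized_dot(x, y, bits):
    """Exact dot product of two quantized vectors from ±1 plane products.
    Each px[i] @ py[j] term is implementable as xnor + bitcount."""
    px, py = bit_planes_pm1(x, bits), bit_planes_pm1(y, bits)
    n, total = x.size, 0
    for i in range(bits):
        for j in range(bits):
            pm = px[i] @ py[j]
            total += (1 << (i + j)) * (pm + px[i].sum() + py[j].sum() + n)
    return total // 4   # (p_i+1)(p_j+1)/4 recovers the {0,1} bit product

x, y = np.array([3, 1, 2, 0]), np.array([1, 2, 3, 1])
assert quantized_dot(x, y, bits=2) == int(x @ y)  # both equal 11
```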
【Keywords】:
【Paper Link】 【Pages】:5033-5040
【Authors】: Tao Sun ; Penghang Yin ; Dongsheng Li ; Chun Huang ; Lei Guan ; Hao Jiang
【Abstract】: In this paper, we revisit the convergence of the Heavy-ball method, and present improved convergence complexity results in the convex setting. We provide the first non-ergodic O(1/k) rate result of the Heavy-ball algorithm with constant step size for coercive objective functions. For objective functions satisfying a relaxed strongly convex condition, the linear convergence is established under weaker assumptions on the step size and inertial parameter than made in the existing literature. We extend our results to multi-block version of the algorithm with both the cyclic and stochastic update rules. In addition, our results can also be extended to decentralized optimization, where the ergodic analysis is not applicable.
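For reference, the iteration under study is the classical one below (a minimal sketch on a toy quadratic; the constant step size matches the setting of the non-ergodic rate result):

```python
import numpy as np

def heavy_ball(grad_f, x0, alpha=0.2, beta=0.5, n_iters=500):
    """Heavy-ball method with constant step size alpha and inertial
    parameter beta:
    x_{k+1} = x_k - alpha * grad_f(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x_next = x - alpha * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Strongly convex toy objective f(x) = 0.5 * x^T A x, minimized at 0.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
x = heavy_ball(lambda v: A @ v, np.array([5.0, -3.0]))
```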
【Keywords】:
【Paper Link】 【Pages】:5041-5048
【Authors】: Xin Sun ; Zenghui Song ; Junyu Dong ; Yongbo Yu ; Claudia Plant ; Christian Böhm
【Abstract】: Network-structured data is becoming increasingly popular in many applications. However, such data presents great challenges to feature engineering due to its high non-linearity and sparsity. The issue of how to transform the link-connected nodes of a huge network into feature representations is critical. As basic properties of real-world networks, the local and global structure can be reflected by dynamical transfer behaviors from node to node. In this work, we propose a deep embedding framework to preserve the transfer possibilities among the network nodes. We first suggest a degree-weight biased random walk model to capture the transfer behaviors of the network. Then a deep embedding framework is introduced to preserve the transfer possibilities among the nodes. A network structure embedding layer is added to the conventional Long Short-Term Memory network to utilize its sequence prediction ability. To keep the local network neighborhood, we further perform a Laplacian-supervised space optimization on the embedding feature representations. Experimental studies are conducted on various real-world datasets including social networks and citation networks. The results show that the learned representations can be effectively used as features in a variety of tasks, such as clustering, visualization and classification, and achieve promising performance compared with state-of-the-art models.
【Keywords】:
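A minimal sketch of a degree-weighted biased random walk, assuming (as one plausible instantiation, not the paper's exact model) that the probability of stepping to a neighbor v is proportional to deg(v)**alpha:

    import random

    def degree_biased_walk(adj, start, length, alpha=0.5):
        """One walk where the jump to neighbor v is weighted by deg(v)**alpha.

        The weighting form and alpha are illustrative assumptions, not the
        paper's exact degree-weight model.
        """
        walk = [start]
        for _ in range(length - 1):
            nbrs = adj[walk[-1]]
            if not nbrs:
                break
            weights = [len(adj[v]) ** alpha for v in nbrs]
            walk.append(random.choices(nbrs, weights=weights, k=1)[0])
        return walk

    # Toy graph: adjacency lists keyed by node id.
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
    print(degree_biased_walk(adj, start=0, length=6))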
【Paper Link】 【Pages】:5049-5057
【Authors】: Yi Sun ; Alfredo Cuesta-Infante ; Kalyan Veeramachaneni
【Abstract】: A vine copula model is a flexible high-dimensional dependence model that uses only bivariate building blocks. However, the number of possible configurations of a vine copula grows exponentially as the number of variables increases, making model selection a major challenge in development. In this work, we formulate vine structure learning with both a vector representation and a reinforcement learning formulation. We use a neural network to find the embeddings for the best possible vine model and generate a structure. Through experiments on synthetic and real-world datasets, we show that our proposed approach fits the data better in terms of log-likelihood. Moreover, we demonstrate that the model is able to generate high-quality samples in a variety of applications, making it a good candidate for synthetic data generation.
【Keywords】:
【Paper Link】 【Pages】:5058-5065
【Authors】: Fariborz Taherkhani ; Hadi Kazemi ; Nasser M. Nasrabadi
【Abstract】: Convolutional Neural Networks (CNNs) have achieved promising results on image classification problems. However, training a CNN model relies on a large amount of labeled data. Considering the vast amount of unlabeled data available on the web, it is important to use these data in conjunction with a small set of labeled data to train a deep learning model. In this paper, we introduce a new iterative Graph-based Semi-Supervised Learning (GSSL) method to train a CNN-based classifier using a large amount of unlabeled data and a small amount of labeled data. In this method, we first construct a similarity graph in which the nodes represent the CNN features of data points (labeled and unlabeled) while the edges tend to connect data points with the same class label. In this graph, the missing labels of unlabeled nodes are predicted by a matrix completion method based on a rank minimization criterion. In the next step, we use the constructed graph to calculate a triplet regularization loss, which is added to the supervised loss obtained from the initially labeled data to update the CNN network parameters.
【Keywords】:
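To make the training objective concrete, here is a generic sketch of a triplet regularization term added to the supervised loss; how triplets are drawn from the similarity graph and the weight lam are assumptions for illustration.

    import numpy as np

    def triplet_regularizer(anchor, positive, negative, margin=1.0):
        """Hinge loss pushing the anchor closer to the positive than the negative."""
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(0.0, margin + d_pos - d_neg)

    def total_loss(supervised_loss, triplets, lam=0.1):
        """Supervised loss on labeled data plus the graph-derived triplet term."""
        reg = sum(triplet_regularizer(a, p, n) for a, p, n in triplets)
        return supervised_loss + lam * reg

    a, p, n = np.zeros(4), 0.1 * np.ones(4), np.ones(4)
    print(triplet_regularizer(a, p, n))  # 0.0: the triplet already satisfies the margin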
【Paper Link】 【Pages】:5066-5073
【Authors】: Hiroshi Takahashi ; Tomoharu Iwata ; Yuki Yamanaka ; Masanori Yamada ; Satoshi Yagi
【Abstract】: The variational autoencoder (VAE) is a powerful generative model that can estimate the probability of a data point by using latent variables. In the VAE, the posterior of the latent variable given the data point is regularized toward the prior of the latent variable using Kullback-Leibler (KL) divergence. Although the standard Gaussian distribution is usually used as the prior, this simple prior incurs over-regularization. As a more sophisticated prior, the aggregated posterior has been introduced, which is the expectation of the posterior over the data distribution. This prior is optimal for the VAE in terms of maximizing the training objective function. However, the KL divergence with the aggregated posterior cannot be calculated in closed form, which prevents us from using this optimal prior. In the proposed method, we introduce the density ratio trick to estimate this KL divergence without modeling the aggregated posterior explicitly. Since the density ratio trick does not work well in high dimensions, we rewrite this KL divergence, which contains a high-dimensional density ratio, into the sum of an analytically calculable term and a low-dimensional density ratio term, to which the density ratio trick is applied. Experiments on various datasets show that the VAE with this implicit optimal prior achieves high density estimation performance.
【Keywords】:
【Paper Link】 【Pages】:5074-5082
【Authors】: Sho Takase ; Jun Suzuki ; Masaaki Nagata
【Abstract】: This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams, following research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks.
【Keywords】:
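A minimal sketch of composing a word embedding from character n-gram embeddings plus an ordinary word embedding; hash-bucketing and plain summation are illustrative choices and may differ from the paper's composition function.

    import numpy as np

    rng = np.random.default_rng(0)
    NGRAM_BUCKETS, DIM = 10000, 64
    ngram_table = rng.normal(size=(NGRAM_BUCKETS, DIM))  # hashed n-gram embeddings
    word_table = {"cat": rng.normal(size=DIM)}           # ordinary word embeddings

    def char_ngrams(word, n=3):
        """Character n-grams of the word padded with boundary markers."""
        padded = f"<{word}>"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def word_embedding(word):
        # hash() is only stable within a run; fine for this illustration.
        vec = sum(ngram_table[hash(g) % NGRAM_BUCKETS] for g in char_ngrams(word))
        return vec + word_table.get(word, 0.0)

    print(word_embedding("cat").shape)  # (64,)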
【Paper Link】 【Pages】:5083-5090
【Authors】: Zhi-Hao Tan ; Teng Zhang ; Wei Wang
【Abstract】: A major problem for kernel-based predictors is their prohibitive computational complexity, which limits their application to large-scale datasets. A coreset, an approximation method that tries to cover the given examples with a small set of points, can be used to retain the prominent information and accelerate kernel methods. In this paper, we provide perhaps the first coreset-based kernel-accelerating optimization method with a linear convergence rate, which is much faster than existing approaches. Our method can be used to train kernel SVM-style problems and obtain sparse solutions efficiently. Specifically, the method uses SVRG as the framework and utilizes the core points to approximate the gradients, so it can significantly reduce the complexity of the kernel method. Furthermore, we apply the method to train ODM, a kernel machine with better statistical properties than SVM, so that we can reduce the risk of compromising performance while encouraging sparsity. We conduct extensive experiments on several large-scale datasets and the results verify that our method outperforms the state-of-the-art coreset approximation method in both efficiency and generalization, while simultaneously achieving significant speed-ups over non-approximation baselines.
【Keywords】:
【Paper Link】 【Pages】:5091-5099
【Authors】: Yusuke Tanaka ; Tomoharu Iwata ; Toshiyuki Tanaka ; Takeshi Kurashima ; Maya Okawa ; Hiroyuki Toda
【Abstract】: We propose a probabilistic model for refining coarse-grained spatial data by utilizing auxiliary spatial data sets. Existing methods require that the spatial granularities of the auxiliary data sets be the same as the desired granularity of the target data. The proposed model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes. With the proposed model, a distribution over the continuous space is modeled for each auxiliary data set using a Gaussian process, where the representation of uncertainty takes the level of granularity into account. The fine-grained target data are modeled by another Gaussian process that considers both the spatial correlation and the auxiliary data sets with their uncertainty. We integrate the Gaussian process with a spatial aggregation process that transforms the fine-grained target data into the coarse-grained target data, by which we can infer the fine-grained target Gaussian process from the coarse-grained data. Our model is designed such that inference of the model parameters based on the exact marginal likelihood is possible, in which the variables of the fine-grained target and auxiliary data are analytically integrated out. Our experiments on real-world spatial data sets demonstrate the effectiveness of the proposed model.
【Keywords】:
【Paper Link】 【Pages】:5101-5108
【Authors】: Chang Tang ; Xinzhong Zhu ; Xinwang Liu ; Lizhe Wang
【Abstract】: Multi-view unsupervised feature selection (MV-UFS) aims to select a feature subset from multi-view data without using the labels of samples. However, we observe that existing MV-UFS algorithms do not adequately consider the local structure across views and the diversity of different views, which could adversely affect the performance of subsequent learning tasks. In this paper, we propose a cross-view local structure preserved diversity and consensus semantic learning model for MV-UFS, termed CRV-DCL for short, to address these issues. Specifically, we project each view of the data into a common semantic label space composed of a consensus part and a diversity part, with the aim of capturing both the common information and the distinguishing knowledge across different views. Further, an inter-view similarity graph between each pair of views and an intra-view similarity graph of each view are respectively constructed to preserve the local structure of data across different views and of different samples in the same view. An l2,1-norm constraint is imposed on the feature projection matrix to select discriminative features. We carefully design an efficient algorithm with a convergence guarantee to solve the resulting optimization problem. An extensive experimental study is conducted on six publicly available real-world multi-view datasets and the experimental results well demonstrate the effectiveness of CRV-DCL.
【Keywords】:
【Paper Link】 【Pages】:5109-5116
【Authors】: Shijie Tang ; Yuan Yao ; Suwei Zhang ; Feng Xu ; Tianxiao Gu ; Hanghang Tong ; Xiaohui Yan ; Jian Lu
【Abstract】: Recommending suitable tags for online textual content is a key building block for better content organization and consumption. In this paper, we identify three pillars that impact the accuracy of tag recommendation: (1) sequential text modeling, meaning that the intrinsic sequential ordering as well as different areas of the text might have important implications for the corresponding tag(s), (2) tag correlation, meaning that the tags for a certain piece of textual content are often semantically correlated with each other, and (3) content-tag overlapping, meaning that the vocabularies of content and tags overlap. However, none of the existing methods consider all three aspects, leading to suboptimal tag recommendation. In this paper, we propose an integral model to encode all three aspects in a coherent encoder-decoder framework. In particular, (1) the encoder models the semantics of the textual content via Recurrent Neural Networks with an attention mechanism, (2) the decoder tackles tag correlation with a prediction path, and (3) a shared embedding layer and an indicator function across the encoder-decoder address the content-tag overlapping. Experimental results on three real-world datasets demonstrate that the proposed method significantly outperforms the existing methods in terms of recommendation accuracy.
【Keywords】:
【Paper Link】 【Pages】:5117-5124
【Authors】: Ying-Peng Tang ; Sheng-Jun Huang
【Abstract】: Active learning queries the oracle for labels of the most valuable instances to reduce labeling cost. In many active learning studies, informative and representative instances are preferred because they are expected to have higher potential value for improving the model. Recently, results in self-paced learning show that training the model with easy examples first and then gradually with harder examples can improve performance. While informative and representative instances can be easy or hard, querying valuable but hard examples at an early stage may waste labeling cost. In this paper, we propose a self-paced active learning approach that simultaneously considers the potential value and the easiness of an instance, and tries to train the model at the least cost by querying the right thing at the right time. Experimental results show that the proposed approach is superior to state-of-the-art batch-mode active learning methods.
【Keywords】:
【Paper Link】 【Pages】:5125-5132
【Authors】: Hanqing Tao ; Shiwei Tong ; Hongke Zhao ; Tong Xu ; Binbin Jin ; Qi Liu
【Abstract】: In recent years, Chinese text classification has attracted more and more research attention. However, most existing techniques, which specifically target English materials, may lose effectiveness on this task due to the huge difference between Chinese and English. Actually, as a special kind of hieroglyphics, Chinese characters and radicals are semantically useful but still unexplored in text classification. To that end, in this paper we first analyze the motivation for using multiple granularity features to represent a Chinese text by inspecting the characteristics of radicals, characters and words. To better represent Chinese text and then implement Chinese text classification, we propose a novel Radical-aware Attention-based Four-Granularity (RAFG) model to take full advantage of Chinese characters, words, character-level radicals, and word-level radicals simultaneously. Specifically, RAFG applies a serialized BLSTM structure, which is context-aware and able to capture long-range information, to model the character sharing property of Chinese and the sequence characteristics of texts. Further, we design an attention mechanism to enhance the effects of radicals and thus model the radical sharing property when integrating granularities. Finally, we conduct extensive experiments, where the experimental results not only show the superiority of our model, but also validate the effectiveness of radicals in Chinese text classification.
【Keywords】:
【Paper Link】 【Pages】:5133-5142
【Authors】: Pooya Tavallali ; Peyman Tavallali ; Mukesh Singhal
【Abstract】: A fast, convenient and well-known way toward regression is to induce and prune a binary tree. However, there has been little attempt toward improving the performance of an induced regression tree. This paper presents a meta-algorithm capable of minimizing the regression loss function, thus improving the accuracy of any given hierarchical model, such as k-ary regression trees. Our proposed method minimizes the loss function of each node one by one. At split nodes, this leads to solving an instance-based cost-sensitive classification problem over the node's data points. At the leaf nodes, the method leads to a simple regression problem. In the case of binary univariate and multivariate regression trees, the computational complexity of training is linear in the number of samples. Hence, our method is scalable to large trees and datasets. We also briefly explore possibilities of applying the proposed method to classification tasks. We show that our algorithm has significantly better test error compared to other state-of-the-art tree algorithms. Finally, the accuracy, memory usage and query time of our method are compared to recently introduced forest models. We show that, most of the time, our proposed method is able to achieve better or similar accuracy while having tangibly faster query time and a smaller number of nonzero weights.
【Keywords】:
【Paper Link】 【Pages】:5143-5150
【Authors】: Yi Tay ; Shuai Zhang ; Anh Tuan Luu ; Siu Cheung Hui ; Lina Yao ; Tran Dang Quang Vinh
【Abstract】: Factorization Machines (FMs) are a class of popular algorithms that have been widely adopted for collaborative filtering and recommendation tasks. FMs are characterized by their use of the inner product of factorized parameters to model pairwise feature interactions, making them highly expressive and powerful. This paper proposes Holographic Factorization Machines (HFM), a novel method for enhancing the representation capability of FMs without increasing their parameter size. Our approach replaces the inner product in FMs with holographic reduced representations (HRRs), which are theoretically motivated by associative retrieval and compressed outer products. Empirically, we found that this leads to consistent improvements over vanilla FMs of up to 4% in terms of mean squared error, with larger improvements at smaller parameterizations. Additionally, we propose a neural adaptation of HFM which enhances its capability to handle nonlinear structures. We conduct extensive experiments on nine publicly available datasets for collaborative filtering with explicit feedback. HFM achieves state-of-the-art performance on all nine, outperforming strong competitors such as Attentional Factorization Machines (AFM) and Neural Matrix Factorization (NeuMF).
【Keywords】:
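The holographic reduced representation underlying HFM replaces the FM inner product with circular correlation, computable in O(d log d) via the FFT; the sketch below shows only this operation, with how HFM aggregates the interactions left to the paper.

    import numpy as np

    def circular_correlation(a, b):
        """HRR-style compressed outer product via the FFT:
        corr(a, b) = iFFT(conj(FFT(a)) * FFT(b)), kept real for real inputs."""
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    # Illustrative pairwise interaction of two factor vectors v_i, v_j.
    rng = np.random.default_rng(1)
    v_i, v_j = rng.normal(size=8), rng.normal(size=8)
    print(circular_correlation(v_i, v_j))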
【Paper Link】 【Pages】:5151-5158
【Authors】: Takeshi Teshima ; Miao Xu ; Issei Sato ; Masashi Sugiyama
【Abstract】: We consider the problem of recovering a low-rank matrix from its clipped observations. Clipping arises in many scientific areas and obstructs statistical analyses. On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion. However, the current theoretical guarantees for low-rank MC do not apply to clipped matrices, as the deficit depends on the underlying values. Therefore, the feasibility of clipped matrix completion (CMC) is not trivial. In this paper, we first provide a theoretical guarantee for the exact recovery of CMC using a trace-norm minimization algorithm. Furthermore, we propose practical CMC algorithms by extending ordinary MC methods. Our extension uses the squared hinge loss in place of the squared loss to reduce the penalty of overestimation on clipped entries. We also propose a novel regularization term tailored for CMC. It is a combination of two trace-norm terms, and we theoretically bound the recovery error under this regularization. We demonstrate the effectiveness of the proposed methods through experiments using both synthetic and benchmark data for recommender systems.
【Keywords】:
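A sketch of the loss described above: squared loss on ordinary observed entries and a squared hinge on clipped entries, so that overestimating a clipped value is not penalized; the masks and the trace-norm regularization are assumed to come from the surrounding training loop.

    import numpy as np

    def cmc_loss(M_hat, M_obs, clipped_mask, observed_mask):
        """Squared loss on ordinary entries; squared hinge on clipped entries,
        penalizing only estimates below the clipping ceiling."""
        ordinary = observed_mask & ~clipped_mask
        sq = np.sum((M_hat[ordinary] - M_obs[ordinary]) ** 2)
        hinge = np.sum(np.maximum(0.0, M_obs[clipped_mask] - M_hat[clipped_mask]) ** 2)
        return sq + hinge

    M_obs = np.array([[1.0, 5.0], [2.0, 5.0]])   # 5.0 is the clipping ceiling
    clipped = M_obs == 5.0
    observed = np.ones_like(M_obs, dtype=bool)
    M_hat = np.array([[1.2, 6.0], [2.0, 4.0]])
    print(cmc_loss(M_hat, M_obs, clipped, observed))  # 0.04 + 0 + 1 = 1.04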
【Paper Link】 【Pages】:5159-5166
【Authors】: Erik Thiel ; Morteza Haghir Chehreghani ; Devdatt P. Dubhashi
【Abstract】: We develop a non-convex optimization approach to correlation clustering using the Frank-Wolfe (FW) framework. We show that the basic approach leads to a simple and natural local search algorithm with guaranteed convergence. This algorithm already beats alternative algorithms by substantial margins in both running time and quality of the clustering. Using ideas from FW algorithms, we develop subsampling and variance reduction paradigms for this approach. This yields both a practical improvement of the algorithm and some interesting further directions to investigate. We demonstrate the performance on both synthetic and real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:5167-5174
【Authors】: Kai Tian ; Shuigeng Zhou ; Jianping Fan ; Jihong Guan
【Abstract】: Most of the existing methods for anomaly detection use only positive data to learn the data distribution, so they usually need a pre-defined threshold at the detection stage to determine whether a test instance is an outlier. Unfortunately, a good threshold is vital for performance and it is really hard to find an optimal one. In this paper, we take the discriminative information implied in unlabeled data into consideration and propose a new method for anomaly detection that can learn the labels of unlabeled data directly. Our proposed method has an end-to-end architecture with one encoder and two decoders that are trained to model the data distributions of inliers and outliers in a competitive way. This architecture works in a discriminative manner without suffering from overfitting, and the training algorithm of our model is adapted from SGD, so it is efficient and scalable even for large-scale datasets. Empirical studies on 7 datasets, including KDD99, MNIST, Caltech-256, and ImageNet, show that our model outperforms the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:5175-5182
【Authors】: Saket Tiwari ; Philip S. Thomas
【Abstract】: The recently proposed option-critic architecture (Bacon, Harb, and Precup 2017) provides a stochastic policy gradient approach to hierarchical reinforcement learning. Specifically, it provides a way to estimate the gradient of the expected discounted return with respect to parameters that define a finite number of temporally extended actions, called options. In this paper we show how the option-critic architecture can be extended to estimate the natural gradient (Amari 1998) of the expected discounted return. To this end, the central questions that we consider in this paper are: 1) what is the definition of the natural gradient in this context, 2) what is the Fisher information matrix associated with an option's parameterized policy, 3) what is the Fisher information matrix associated with an option's parameterized termination function, and 4) how can a compatible function approximation approach be leveraged to obtain natural gradient estimates for both the parameterized policy and parameterized termination functions of an option with per-time-step time and space complexity linear in the total number of parameters. Based on the answers to these questions, we introduce the natural option-critic algorithm. Experimental results showcase improvement over the vanilla gradient approach.
【Keywords】:
【Paper Link】 【Pages】:5183-5190
【Authors】: Christopher Tran ; Elena Zheleva
【Abstract】: The causal effect of a treatment can vary from person to person based on their individual characteristics and predispositions. Mining for patterns of individual-level effect differences, a problem known as heterogeneous treatment effect estimation, has many important applications, from precision medicine to recommender systems. In this paper we define and study a variant of this problem in which an individual-level threshold in treatment needs to be reached in order to trigger an effect. One of the main contributions of our work is that we not only estimate heterogeneous treatment effects with fixed treatments but can also prescribe individualized treatments. We propose a tree-based learning method to find the heterogeneity in the treatment effects. Our experimental results on multiple datasets show that our approach can learn the triggers better than existing approaches.
【Keywords】:
【Paper Link】 【Pages】:5191-5198
【Authors】: Ngoc-Trung Tran ; Tuan-Anh Bui ; Ngai-Man Cheung
【Abstract】: We propose two new techniques for training Generative Adversarial Networks (GANs) in the unsupervised setting. Our objectives are to alleviate mode collapse in GANs and improve the quality of the generated samples. First, we propose neighbor embedding, a manifold learning-based regularization to explicitly retain local structures of latent samples in the generated samples. This prevents the generator from producing nearly identical data samples from different latent samples, and reduces mode collapse. We propose an inverse t-SNE regularizer to achieve this. Second, we propose a new technique, gradient matching, to align the distributions of the generated samples and the real samples. As it is challenging to work with high-dimensional sample distributions, we propose to align these distributions through the scalar discriminator scores. We constrain the difference between the discriminator scores of the real samples and generated ones. We further constrain the difference between the gradients of these discriminator scores. We derive these constraints from Taylor approximations of the discriminator function. We perform experiments to demonstrate that our proposed techniques are computationally simple and easy to incorporate into existing systems. When gradient matching and neighbor embedding are applied together, our GN-GAN achieves outstanding results on 1D/2D synthetic, CIFAR-10 and STL-10 datasets, e.g., an FID score of 30.80 on the STL-10 dataset. Our code is available at: https://github.com/tntrung/gan
【Keywords】:
【Paper Link】 【Pages】:5199-5206
【Authors】: Amin Vahedian ; Xun Zhou ; Ling Tong ; W. Nick Street ; Yanhua Li
【Abstract】: Urban dispersal events are processes where an unusually large number of people leave the same area in a short period. Early prediction of dispersal events is important for mitigating congestion and safety risks and making better dispatching decisions for taxi and ride-sharing fleets. Existing work mostly focuses on predicting taxi demand in the near future by learning patterns from historical data. However, these methods fail in the case of abnormality, because dispersal events with abnormally high demand are non-repetitive and violate common assumptions such as smoothness in demand change over time. Instead, in this paper we argue that dispersal events follow a complex pattern of trips and other related features in the past, which can be used to predict such events. Therefore, we formulate the dispersal event prediction problem as a survival analysis problem. We propose a two-stage framework (DILSA), where a deep learning model combined with survival analysis is developed to predict the probability of a dispersal event and its demand volume. We conduct extensive case studies and experiments on the NYC Yellow taxi dataset from 2014-2016. Results show that DILSA can predict events in the next 5 hours with an F1-score of 0.7 and an average time error of 18 minutes. It is orders of magnitude better than the state-of-the-art deep learning approaches for taxi demand prediction.
【Keywords】:
【Paper Link】 【Pages】:5207-5215
【Authors】: Antonio Vergari ; Alejandro Molina ; Robert Peharz ; Zoubin Ghahramani ; Kristian Kersting ; Isabel Valera
【Abstract】: Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.
【Keywords】:
【Paper Link】 【Pages】:5216-5223
【Authors】: Hung Vu ; Tu Dinh Nguyen ; Trung Le ; Wei Luo ; Dinh Q. Phung
【Abstract】: Detecting anomalies in surveillance videos has long been an important but unsolved problem. In particular, many existing solutions are overly sensitive to (often ephemeral) visual artifacts in the raw video data, resulting in false positives and fragmented detection regions. To overcome such sensitivity and to capture true anomalies with semantic significance, one natural idea is to seek validation from abstract representations of the videos. This paper introduces a framework for robust anomaly detection using multilevel representations of both intensity and motion data. The framework consists of three main components: 1) representation learning using Denoising Autoencoders, 2) level-wise representation generation using Conditional Generative Adversarial Networks, and 3) consolidation of the anomalous regions detected at each representation level. Our proposed multilevel detector shows a significant improvement in pixel-level Equal Error Rate, namely 11.35%, 12.32% and 4.31% improvements on the UCSD Ped 1, UCSD Ped 2 and Avenue datasets respectively. In addition, the model allowed us to detect mislabeled anomalies in UCSD Ped 1.
【Keywords】:
【Paper Link】 【Pages】:5224-5231
【Authors】: Chengwei Wang ; Tengfei Zhou ; Chen Chen ; Tianlei Hu ; Gang Chen
【Abstract】: In real-world recommendation tasks, feedback data are usually sparse. Therefore, a recommender's performance is often determined by how much information it can extract from textual contents. However, current methods do not make full use of the semantic information. They encode the textual contents either by the "bag-of-words" technique or by a Recurrent Neural Network (RNN). The former neglects the order of words while the latter ignores the fact that textual contents can contain multiple topics. Besides, there exists a dilemma in designing a recommender. On the one hand, we shall use a sophisticated model to exploit every drop of information in item contents; on the other hand, we shall adopt a simple model to prevent overfitting when facing sparse feedback. To fill these gaps, we propose a recommender named CAMO. CAMO employs a multi-layer content encoder for simultaneously capturing the semantic information of multiple topics and word order. Moreover, CAMO makes use of adversarial training to prevent the complex encoder from overfitting. Extensive empirical studies show that CAMO outperforms state-of-the-art methods in predicting users' preferences.
【Keywords】:
【Paper Link】 【Pages】:5232-5239
【Authors】: Chuan Wang ; Haibin Huang ; Xiaoguang Han ; Jue Wang
【Abstract】: We present a new data-driven video inpainting method for recovering missing regions of video frames. A novel deep learning architecture is proposed which contains two sub-networks: a temporal structure inference network and a spatial detail recovering network. The temporal structure inference network is built upon a 3D fully convolutional architecture: it learns only to complete a low-resolution video volume, given the expensive computational cost of 3D convolution. The low-resolution result provides temporal guidance to the spatial detail recovering network, which performs image-based inpainting with a 2D fully convolutional network to produce recovered video frames in their original resolution. Such a two-step network design ensures both the spatial quality of each frame and the temporal coherence across frames. Our method jointly trains both sub-networks in an end-to-end manner. We provide qualitative and quantitative evaluations on three datasets, demonstrating that our method outperforms previous learning-based video inpainting methods.
【Keywords】:
【Paper Link】 【Pages】:5240-5247
【Authors】: Hanmo Wang ; Runwu Zhou ; Yi-Dong Shen
【Abstract】: The success of batch mode active learning (BMAL) methods lies in selecting both representative and uncertain samples. Representative samples quickly capture the global structure of the whole dataset, while the uncertain ones refine the decision boundary. There are two principles, namely the direct approach and the screening approach, for making a trade-off between representativeness and uncertainty. Although both are widely used in the literature, little is known about the relationship between these two principles. In this paper, we discover that both approaches have shortcomings in the initial stage of BMAL. To alleviate these shortcomings, we bound the certainty scores of unlabeled samples from below and directly combine this lower-bounded certainty with representativeness in the objective function. Additionally, we show that the two aforementioned approaches are mathematically equivalent to two special cases of our approach. To the best of our knowledge, this is the first work that tries to generalize the direct and screening approaches. The objective function is then solved by super-modularity optimization. Extensive experiments on fifteen datasets indicate that our method has significantly higher classification accuracy on testing data than the latest state-of-the-art BMAL methods, and also scales better even when the size of the unlabeled pool reaches 10^6.
【Keywords】:
【Paper Link】 【Pages】:5248-5255
【Authors】: Haoyu Wang ; Nan Shao ; Defu Lian
【Abstract】: Fast item recommendation based on implicit feedback is vital in practical scenarios due to data abundance, but challenging because of the lack of negative samples and the large number of recommended items. Recent adversarial methods unifying generative and discriminative models are promising, since the generative model, as a negative sampler, gradually improves as iteration continues. However, binary-valued generative models are still unexplored within the min-max framework, yet important for accelerating item recommendation. Optimizing binary-valued models is difficult because they are non-smooth and non-differentiable. To this end, we propose two novel methods to relax the binarization based on the error function and the Gumbel trick, so that the generative model can be optimized by many popular solvers, such as SGD and ADMM. The binary-valued generative model is then evaluated within the min-max framework on four real-world datasets and shown to be superior to competing hashing-based recommendation algorithms. In addition, our proposed framework can approximate discrete variables precisely and can be applied to other discrete optimization problems.
【Keywords】:
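One standard way to realize the Gumbel-trick relaxation mentioned above is the binary-concrete (Gumbel-sigmoid) estimator sketched below; the temperature and the exact noise form are illustrative assumptions rather than the paper's precise relaxation.

    import numpy as np

    def gumbel_sigmoid(logits, tau=0.5, rng=np.random.default_rng(2)):
        """Differentiable surrogate for sampling Bernoulli(sigmoid(logits)).

        tau is an illustrative temperature; smaller tau makes samples
        closer to binary but increases gradient variance.
        """
        u = rng.uniform(1e-8, 1 - 1e-8, size=np.shape(logits))
        noise = np.log(u) - np.log1p(-u)          # Logistic(0, 1) sample
        return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

    soft_codes = gumbel_sigmoid(np.array([2.0, -1.0, 0.3]))
    hard_codes = (soft_codes > 0.5).astype(float)  # binarize at inference time
    print(soft_codes, hard_codes)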
【Paper Link】 【Pages】:5256-5263
【Authors】: Jing Wang ; Xin Geng
【Abstract】: As a novel learning paradigm, label distribution learning (LDL) explicitly models label ambiguity through the definition of label description degree. Although much work has been done on real-world applications, theoretical results on LDL remain unexplored. In this paper, we rethink LDL from theoretical aspects, towards analyzing the learnability of LDL. Firstly, risk bounds for three representative LDL algorithms (AA-kNN, AA-BP and SA-ME) are provided. For AA-kNN, Lipschitzness of the label distribution function is assumed to bound the risk, and for AA-BP and SA-ME, Rademacher complexity is utilized to give data-dependent risk bounds. Secondly, a generalized plug-in decision theorem is proposed to understand the relation between LDL and classification, uncovering that approximation to the conditional probability distribution function in absolute loss guarantees approaching the optimal classifier, and data-dependent error probability bounds are also presented for the corresponding LDL algorithms to perform classification. As far as we know, this is perhaps the first theoretical study of LDL.
【Keywords】:
【Paper Link】 【Pages】:5264-5272
【Authors】: Jing Wang ; Atsushi Suzuki ; Linchuan Xu ; Feng Tian ; Liang Yang ; Kenji Yamanishi
【Abstract】: Semi-supervised representation-based subspace clustering partitions data into their underlying subspaces by finding effective data representations with partial supervision. Essentially, an effective and accurate representation should be able to uncover and preserve the true data structure. Meanwhile, a reliable and easy-to-obtain supervision is desirable for practical learning. To meet these two objectives, in this paper we make the first attempt to utilize the orderly relationship, such as that sample a is closer to b than to c, as a novel form of supervision. We propose an orderly subspace clustering (OSC) approach with a novel regularization term. OSC enforces the learned representations to simultaneously capture the intrinsic subspace structure and reveal an orderly structure that is faithful to the true data relationships. Experimental results on several benchmarks have demonstrated that, aside from more accurate clustering than state-of-the-art approaches, OSC interprets orderly data structure beyond what current approaches can offer.
【Keywords】:
【Paper Link】 【Pages】:5273-5280
【Authors】: Jingyuan Wang ; Kai Feng ; Junjie Wu
【Abstract】: The deep network model, mostly built on neural networks, has proven to be a powerful framework for representing complex data in high-performance machine learning. In recent years, more and more studies have turned to non-neural-network approaches to build diverse deep structures, and the Deep Stacking Network (DSN) model is one such approach, using stacked easy-to-learn blocks to build a deep network whose parameter training is parallelizable. In this paper, we propose a novel SVM-based Deep Stacking Network (SVM-DSN), which uses the DSN architecture to organize linear SVM classifiers for deep learning. A BP-like layer tuning scheme is also proposed to ensure holistic and local optimization of the stacked SVMs simultaneously. Some good mathematical properties of SVMs, such as convex optimization, are introduced into the DSN framework by our model. From a global view, SVM-DSN can iteratively extract data representations layer by layer, like a deep neural network but with parallelizability; from a local view, each stacked SVM can converge to its optimal solution and obtain support vectors, which, compared with neural networks, leads to interesting improvements in anti-saturation and interpretability. Experimental results on both image and text datasets demonstrate the excellent performance of SVM-DSN compared with competitive benchmark models.
【Keywords】:
【Paper Link】 【Pages】:5281-5288
【Authors】: Lichen Wang ; Jiaxiang Wu ; Shao-Lun Huang ; Lizhong Zheng ; Xiangxiang Xu ; Lin Zhang ; Junzhou Huang
【Abstract】: One primary focus in multimodal feature extraction is to find representations of individual modalities that are maximally correlated. As a well-known measure of dependence, the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation becomes an appealing objective because of its operational meaning and desirable properties. However, the strict whitening constraints formalized in the HGR maximal correlation limit its application. To address this problem, this paper proposes Soft-HGR, a novel framework to extract informative features from multiple data modalities. Specifically, our framework avoids the "hard" whitening constraints, while simultaneously preserving the same feature geometry as the HGR maximal correlation. The objective of Soft-HGR is straightforward, involving only two inner products, which guarantees efficiency and stability in optimization. We further generalize the framework to handle more than two modalities and missing modalities. When labels are partially available, we enhance the discriminative power of the feature representations by a semi-supervised adaptation. Empirical evaluation implies that our approach learns more informative feature mappings and is more efficient to optimize.
【Keywords】:
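A sketch of the two-inner-product objective, assuming the form max E[f(X)^T g(Y)] − (1/2) tr(cov(f) cov(g)) on empirical features; in practice F and G would be outputs of trainable networks for the two modalities.

    import numpy as np

    def soft_hgr_objective(F, G):
        """Empirical Soft-HGR-style score for feature matrices F, G of shape
        (n_samples, k): cross inner product minus half the trace of the
        product of feature covariances (no whitening constraint)."""
        F = F - F.mean(axis=0)
        G = G - G.mean(axis=0)
        n = F.shape[0]
        cross = np.trace(F.T @ G) / n          # estimates E[f(X)^T g(Y)]
        cov_f = F.T @ F / (n - 1)
        cov_g = G.T @ G / (n - 1)
        return cross - 0.5 * np.trace(cov_f @ cov_g)

    rng = np.random.default_rng(6)
    X = rng.normal(size=(500, 3))
    F, G = X + 0.1 * rng.normal(size=(500, 3)), X  # two correlated "views"
    print(soft_hgr_objective(F, G))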
【Paper Link】 【Pages】:5289-5296
【Authors】: Shaoqi Wang ; Aidi Pi ; Xiaobo Zhou
【Abstract】: Scalability of distributed deep learning (DL) training with the parameter server architecture is often communication-constrained in large clusters. There are recent efforts that use a layer-by-layer strategy to overlap gradient communication with backward computation so as to reduce the impact of the communication constraint on scalability. However, these approaches cannot be effectively applied to the overlap between parameter communication and forward computation. In this paper, we propose and design iBatch, a novel communication approach that batches parameter communication and forward computation to overlap them with each other. We formulate the batching decision as an optimization problem and solve it with a greedy algorithm to derive communication and computation batches. We implement iBatch in the open-source DL framework BigDL and perform evaluations with various DL workloads. Experimental results show that iBatch improves the scalability of a cluster of 72 nodes by up to 73% over the default parameter server and 41% over the layer-by-layer strategy.
【Keywords】:
【Paper Link】 【Pages】:5297-5304
【Authors】: Shipeng Wang ; Jian Sun ; Zongben Xu
【Abstract】: Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embodied in human-designed optimizers, and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of "learning to optimize" with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with an AdamCell, WeightCell and StateCell. It is shown to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
【Keywords】:
【Paper Link】 【Pages】:5305-5312
【Authors】: Shusen Wang
【Abstract】: We study the distributed machine learning problem where the n feature-response pairs are partitioned among m machines uniformly at random. The goal is to approximately solve an empirical risk minimization (ERM) problem with the minimum amount of communication. The divide-and-conquer (DC) method, which was proposed several years ago, lets every worker machine independently solve the same ERM problem using its local feature-response pairs and the driver machine combine the solutions. This approach is one-shot and thereby extremely communication-efficient. Although the DC method has been studied in many prior works, a reasonable generalization bound had not been established before this work. For the ridge regression problem, we show that the prediction error of the DC method on unseen test samples is at most ε times larger than the optimal. While there have been constant-factor bounds in prior works, their sample complexities have a quadratic dependence on d, which does not match the setting of most real-world problems. In contrast, our bounds are much stronger. First, our 1 + ε error bound is much better than their constant-factor bounds. Second, our sample complexity is merely linear in d.
【Keywords】:
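A minimal sketch of the divide-and-conquer method for ridge regression: each worker solves ridge on its random shard and the driver averages the local solutions in a single communication round (the shard count and regularization strength are illustrative).

    import numpy as np

    def dc_ridge(X, y, m, lam=1.0):
        """One-shot DC ridge: solve ridge on each of m random shards,
        then average the m local solutions at the driver."""
        n, d = X.shape
        idx = np.random.default_rng(3).permutation(n)   # uniform random partition
        shards = np.array_split(idx, m)
        w_locals = []
        for s in shards:
            Xs, ys = X[s], y[s]
            w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)
            w_locals.append(w)
        return np.mean(w_locals, axis=0)

    # Toy check against the planted coefficients.
    rng = np.random.default_rng(4)
    X = rng.normal(size=(2000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=2000)
    print(dc_ridge(X, y, m=8))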
【Paper Link】 【Pages】:5313-5320
【Authors】: Siqi Wang ; En Zhu ; Xiping Hu ; Xinwang Liu ; Qiang Liu ; Jianping Yin ; Fei Wang
【Abstract】: Efficient detection of outliers from massive data with a high outlier ratio is challenging but not yet explicitly discussed. In such a case, existing methods either suffer from poor robustness or require expensive computations. This paper proposes a Low-rank based Efficient Outlier Detection (LEOD) framework to achieve favorable robustness against high outlier ratios with much cheaper computations. Specifically, it is worth highlighting the following aspects of LEOD: (1) Our framework exploits the low-rank structure embedded in the similarity matrix and treats inliers/outliers equally based on this low-rank structure, which enables satisfactory robustness at low computational cost later; (2) A novel re-weighting algorithm is derived as a new general solution to the constrained eigenvalue problem, which is a major bottleneck of the optimization process. Instead of the high space and time complexity (O((2n)^2) space / O((2n)^3) time) required by the classic solution, our algorithm enjoys O(n) space complexity and faster optimization speed in the experiments; (3) A new alternative formulation is proposed for further acceleration of the solution process, where a cheap closed-form solution can be obtained. Experiments show that LEOD achieves strong robustness under outlier ratios from 20% to 60%, while it is at most 100 times more memory-efficient and 1000 times faster than its previous counterpart that attains comparable performance. The code of LEOD is publicly available at https://github.com/demonzyj56/LEOD.
【Keywords】:
【Paper Link】 【Pages】:5321-5328
【Authors】: Tianchen Wang ; Jinjun Xiong ; Xiaowei Xu ; Yiyu Shi
【Abstract】: Various convolutional neural networks (CNNs) have recently been developed that achieve accuracy comparable to that of human beings in computer vision tasks such as image recognition, object detection and tracking. Most of these networks, however, process a single image frame at a time, and may not fully utilize the temporal and contextual correlation typically present in multiple channels of the same image or adjacent frames of a video, thus limiting the achievable throughput. This limitation stems from the fact that existing CNNs operate on deterministic numbers. In this paper, we propose a novel statistical convolutional neural network (SCNN), which extends existing CNN architectures but operates directly on correlated distributions rather than deterministic numbers. By introducing a parameterized canonical model to model correlated data and defining the corresponding operations required for CNN training and inference, we show that SCNN can process multiple frames of correlated images effectively, hence achieving significant speedup over existing CNN models. We use a CNN-based video object detection task as an example to illustrate the usefulness of the proposed SCNN as a general network model. Experimental results show that even a non-optimized implementation of SCNN can still achieve a 178% speedup over existing CNNs with slight accuracy degradation.
【Keywords】:
【Paper Link】 【Pages】:5329-5336
【Authors】: Xiang Wang ; Dingxian Wang ; Canran Xu ; Xiangnan He ; Yixin Cao ; Tat-Seng Chua
【Abstract】: Incorporating knowledge graphs into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user's interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within, and holistic semantics of, a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graphs for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movies and music, demonstrating significant improvements over state-of-the-art solutions, Collaborative Knowledge Base Embedding and Neural Factorization Machine.
【Keywords】:
【Paper Link】 【Pages】:5337-5344
【Authors】: Xiao Wang ; Yiding Zhang ; Chuan Shi
【Abstract】: Heterogeneous information network (HIN) embedding, which aims to project an HIN into a low-dimensional space, has attracted considerable research attention. Most of the existing HIN embedding methods focus on preserving the inherent network structure and semantic correlations in Euclidean spaces. However, one fundamental question is whether Euclidean spaces are the appropriate or intrinsic isometric spaces of HINs. Recent research argues that complex networks may have hyperbolic geometry underneath, because hyperbolic geometry can naturally reflect some properties of complex networks, e.g., hierarchical and power-law structure. In this paper, we make the first effort toward HIN embedding in hyperbolic spaces. We analyze the structures of two real-world HINs and discover that some properties, e.g., the power-law distribution, also exist in HINs. Therefore, we propose a novel hyperbolic heterogeneous information network embedding model. Specifically, to capture the structure and semantic relations between nodes, we employ a meta-path guided random walk to sample the sequences for each node. Then we exploit the distance in hyperbolic space as the proximity measurement. The hyperbolic distance satisfies the triangle inequality and well preserves transitivity in HINs. Our model makes nodes and their neighborhoods have small hyperbolic distances. We further derive an effective optimization strategy to update the hyperbolic embeddings iteratively. The experimental results, in comparison with the state-of-the-art, demonstrate that our proposed model not only has superior performance on network reconstruction and link prediction tasks but also shows its ability to capture the hierarchical structure in HINs via visualization.
【Keywords】:
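For concreteness, the sketch below computes distance in the Poincaré ball, one standard model of hyperbolic space (assumed here for illustration; the paper's specific model may differ); as noted above, this distance satisfies the triangle inequality.

    import numpy as np

    def poincare_distance(u, v, eps=1e-9):
        """Poincare-ball distance for points with norm < 1:
        d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
        sq = np.sum((u - v) ** 2)
        denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
        return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

    u = np.array([0.1, 0.2])
    v = np.array([-0.3, 0.4])
    print(poincare_distance(u, v))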
【Paper Link】 【Pages】:5345-5352
【Authors】: Ximei Wang ; Liang Li ; Weirui Ye ; Mingsheng Long ; Jianmin Wang
【Abstract】: Recent work in domain adaptation bridges different domains by adversarially learning a domain-invariant representation that cannot be distinguished by a domain discriminator. Existing methods of adversarial domain adaptation mainly align the global images across the source and target domains. However, not all regions of an image are transferable, and forcefully aligning the untransferable regions may lead to negative transfer. Furthermore, some images are significantly dissimilar across domains, resulting in weak image-level transferability. To this end, we present Transferable Attention for Domain Adaptation (TADA), focusing our adaptation model on transferable regions or images. We implement two types of complementary transferable attention: transferable local attention generated by multiple region-level domain discriminators to highlight transferable regions, and transferable global attention generated by a single image-level domain discriminator to highlight transferable images. Extensive experiments validate that our proposed models exceed state-of-the-art results on standard domain adaptation datasets.
【Keywords】:
【Paper Link】 【Pages】:5353-5360
【Authors】: Xing Wang ; Jun Wang ; Carlotta Domeniconi ; Guoxian Yu ; Guoqiang Xiao ; Maozu Guo
【Abstract】: Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it is still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph-regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into the matrix factorization to handle non-linearly separable clusters. Experimental results on synthetic datasets show that MISC can find different interesting clusterings from the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:5361-5368
【Authors】: Xinshao Wang ; Yang Hua ; Elyor Kodirov ; Guosheng Hu ; Neil Martin Robertson
【Abstract】: Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of sample mining methods, and provide solutions for both of them. First, previous mining methods assign a binary score to each sample, i.e., dropping or keeping it, so they select only a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns a continuous score to each sample so as to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intra-class variance by focusing on more similar positives. Second, existing methods are easily influenced by outliers, as these are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
【Keywords】:
【Paper Link】 【Pages】:5369-5376
【Authors】: Yanzhi Wang ; Zheng Zhan ; Liang Zhao ; Jian Tang ; Siyue Wang ; Jiayu Li ; Bo Yuan ; Wujie Wen ; Xue Lin
【Abstract】: Large-scale deep neural networks are both memory- and computation-intensive, thereby posing stringent requirements on computing platforms. Hardware acceleration of deep neural networks has been extensively investigated. Specific forms of binary neural networks (BNNs) and stochastic computing-based neural networks (SCNNs) are particularly appealing for hardware implementations since they can be implemented almost entirely with binary operations. Despite the obvious advantages in hardware implementation, these approximate computing techniques have been questioned by researchers in terms of accuracy and universal applicability. It is also important to understand the relative pros and cons of SCNNs and BNNs in theory and in actual hardware implementations. In order to address these concerns, in this paper we prove that "ideal" SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to their stochastic behavior), which is a new angle on the original approximation property. The proof is conducted by first proving the property for SCNNs from the strong law of large numbers, and then using SCNNs as a "bridge" to prove it for BNNs. Besides the universal approximation property, we also derive an appropriate bound for the bit length M in order to provide insights for actual neural network implementations. Based on the universal approximation property, we further prove that SCNNs and BNNs exhibit the same energy complexity. In other words, they have the same asymptotic energy consumption with the growth of network size. We also provide a detailed analysis of the pros and cons of SCNNs and BNNs for hardware implementations and conclude that SCNNs are more suitable.
【Keywords】:
【Paper Link】 【Pages】:5377-5384
【Authors】: Yiren Wang ; Fei Tian ; Di He ; Tao Qin ; ChengXiang Zhai ; Tie-Yan Liu
【Abstract】: As a new neural machine translation approach, Non-Autoregressive machine Translation (NAT) has attracted attention recently due to its high inference efficiency. However, this efficiency comes at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source-side information via the decoder hidden states). In this paper, we propose to address these two problems by improving the quality of decoder hidden representations via two auxiliary regularization terms in the training process of an NAT model. First, to make the hidden states more distinguishable, we regularize the similarity between consecutive hidden states based on the corresponding target tokens. Second, to force the hidden states to contain all the information in the source sentence, we leverage the dual nature of translation tasks (e.g., English to German and German to English) and minimize a backward reconstruction error to ensure that the hidden states of the NAT decoder are able to recover the source-side sentence. Extensive experiments conducted on several benchmark datasets show that both regularization strategies are effective and can alleviate the issues of repeated translations and incomplete translations in NAT models. The accuracy of NAT models is therefore improved significantly over state-of-the-art NAT models, with even better inference efficiency.
【Keywords】:
【Paper Link】 【Pages】:5385-5392
【Authors】: Tong Wei ; Yu-Feng Li
【Abstract】: Large-scale multi-label learning (LMLL) aims to annotate relevant labels from a large number of candidates for unseen data. Due to the high dimensionality of both the feature and label spaces in LMLL, the storage overheads of LMLL models are often costly. This paper proposes a POP (joint label and feature Parameter OPtimization) method. It tries to filter out redundant model parameters to facilitate compact models. Our key insights are as follows. First, we investigate labels that have little impact on commonly used LMLL performance metrics and preserve only a small number of dominant parameters for these labels. Second, for the remaining influential labels, we reduce spurious feature parameters that contribute little to the generalization capability of the model, and preserve parameters only for discriminative features. The overall problem is formulated as a constrained optimization problem pursuing minimal model size. In order to solve the resulting difficult optimization, we show that a relaxation of the optimization can be efficiently solved using binary search and greedy strategies. Experiments verify that the proposed method clearly reduces the model size compared to state-of-the-art LMLL approaches, while achieving highly competitive performance.
【Keywords】:
【Paper Link】 【Pages】:5393-5400
【Authors】: Jie Wen ; Zheng Zhang ; Yong Xu ; Bob Zhang ; Lunke Fei ; Hong Liu
【Abstract】: Multi-view clustering aims to partition data collected from diverse sources based on the assumption that all views are complete. However, this assumption is rarely satisfied in many real-world applications, giving rise to the incomplete multi-view learning problem. Existing attempts on this problem still have the following limitations: 1) the underlying semantic information of the missing views is commonly ignored; 2) the local structure of the data is not well explored; 3) the importance of different views is not effectively evaluated. To address these issues, this paper proposes a Unified Embedding Alignment Framework (UEAF) for robust incomplete multi-view clustering. In particular, a locality-preserved reconstruction term is introduced to infer the missing views such that all views can be naturally aligned. A consensus graph is adaptively learned and embedded via reverse graph regularization to guarantee a common local structure across views, which in turn can further align the incomplete views and the inferred views. Moreover, an adaptive weighting strategy is designed to capture the importance of different views. Extensive experimental results show that the proposed method can significantly improve clustering performance in comparison with state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:5401-5408
【Authors】: Jun Wen ; Risheng Liu ; Nenggan Zheng ; Qian Zheng ; Zhefeng Gong ; Junsong Yuan
【Abstract】: Unsupervised domain adaptation methods aim to alleviate performance degradation caused by domain shift by learning domain-invariant representations. Existing deep domain adaptation methods focus on holistic feature alignment by matching source and target holistic feature distributions, without considering local features and their multi-mode statistics. We show that the learned local feature patterns are more generic and transferable, and that further matching of local feature distributions enables fine-grained feature alignment. In this paper, we present a method for learning domain-invariant local feature patterns and jointly aligning holistic and local feature statistics. Comparisons to state-of-the-art unsupervised domain adaptation methods on two popular benchmark datasets demonstrate the superiority of our approach and its effectiveness in alleviating negative transfer.
【Keywords】:
【Paper Link】 【Pages】:5409-5416
【Authors】: Qingsong Wen ; Jingkun Gao ; Xiaomin Song ; Liang Sun ; Huan Xu ; Shenghuo Zhu
【Abstract】: Decomposing complex time series into trend, seasonality, and remainder components is an important task that facilitates time series anomaly detection and forecasting. Although numerous methods have been proposed, many characteristics exhibited by real-world data are still not addressed properly, including 1) the ability to handle seasonality fluctuation and shift, and abrupt changes in trend and remainder; 2) robustness on data with anomalies; 3) applicability to time series with long seasonality periods. In this paper, we propose a novel and generic time series decomposition algorithm to address these challenges. Specifically, we extract the trend component robustly by solving a regression problem using the least absolute deviations loss with sparse regularization. Based on the extracted trend, we apply non-local seasonal filtering to extract the seasonality component. This process is repeated until an accurate decomposition is obtained. Experiments on different synthetic and real-world time series datasets demonstrate that our method outperforms existing solutions.
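The trend-extraction step the abstract describes amounts to an ℓ1 trend filter: a least absolute deviations fit with sparse regularization on trend differences. A hedged sketch of that step alone, using cvxpy purely for illustration (penalty weights are made up):

```python
# Robust trend extraction: LAD loss plus l1 penalties on first and second
# differences of the trend, which favor piecewise-constant/-linear trends
# and tolerate outliers in the data.
import cvxpy as cp
import numpy as np

def robust_trend(y, lam1=1.0, lam2=5.0):
    T = len(y)
    tau = cp.Variable(T)
    obj = (cp.norm1(y - tau)                    # least absolute deviations
           + lam1 * cp.norm1(cp.diff(tau, 1))   # sparse level changes
           + lam2 * cp.norm1(cp.diff(tau, 2)))  # sparse slope changes
    cp.Problem(cp.Minimize(obj)).solve()
    return tau.value

y = np.sin(np.linspace(0, 6, 200)) + np.linspace(0, 2, 200)
trend = robust_trend(y)
```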
【Keywords】:
【Paper Link】 【Pages】:5417-5424
【Authors】: Florian Wenzel ; Théo Galy-Fajou ; Christian Donner ; Marius Kloft ; Manfred Opper
【Abstract】: We propose a scalable stochastic variational approach to GP classification building on Pólya-Gamma data augmentation and inducing points. Unlike former approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state-of-the-art while being competitive in terms of prediction performance.
【Keywords】:
【Paper Link】 【Pages】:5425-5432
【Authors】: Jacob Whitehill
【Abstract】: Recent work on privacy-preserving machine learning has considered how data-mining competitions such as Kaggle could potentially be “hacked”, either intentionally or inadvertently, by using information from an oracle that reports a classifier’s accuracy on the test set (Blum and Hardt 2015; Hardt and Ullman 2014; Zheng 2015; Whitehill 2016). For binary classification tasks in particular, one of the most common accuracy metrics is the Area Under the ROC Curve (AUC), and in this paper we explore the mathematical structure of how the AUC is computed from an n-vector of real-valued “guesses” with respect to the ground-truth labels. Under the assumption of perfect knowledge of the test set AUC c = p/q, we show how knowing c constrains the set W of possible ground-truth labelings, and we derive an algorithm both to compute the exact number of such labelings and to enumerate over them efficiently. We also provide empirical evidence that, surprisingly, the number of compatible labelings can actually decrease as n grows, until a test set-dependent threshold is reached. Finally, we show how W can be efficiently whittled down, through pairs of oracle queries, to infer all the ground-truth test labels with complete certainty.
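For concreteness, the quantity the oracle reports is the fraction of correctly ordered (positive, negative) pairs, hence a rational number c = p/q with q = (#positives)(#negatives); this is the structure the paper exploits. A small sketch:

```python
# AUC as a pairwise-ranking statistic: count (positive, negative) pairs
# where the positive gets the higher guess, with ties counted as half.
import itertools

def auc(guesses, labels):
    pos = [g for g, y in zip(guesses, labels) if y == 1]
    neg = [g for g, y in zip(guesses, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p, n in itertools.product(pos, neg))
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75 = 3/4, i.e. p/q
```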
【Keywords】:
【Paper Link】 【Pages】:5433-5440
【Authors】: Nannan Wu ; Wenjun Wang ; Feng Chen ; Jianxin Li ; Bo Li ; Jinpeng Huai
【Abstract】: As networks are ubiquitous in the modern era, point anomalies have given way to graph anomalies in terms of anomaly shapes. However, specific-shape priors about the anomalous subgraphs of interest are seldom considered by traditional approaches when detecting such subgraphs in attributed graphs (e.g., computer networks and Bitcoin networks). This paper proposes a nonlinear approach to specific-shape graph anomaly detection. The approach focuses on optimizing a broad class of nonlinear cost functions via specific-shape constraints in attributed graphs, and can be applied to many different graph anomaly settings. Traditional approaches can only support linear cost functions (e.g., an aggregation function for the summation of node weights), whereas our approach can employ more powerful nonlinear cost functions and enjoys a rigorous theoretical guarantee of a near-optimal solution with a geometric convergence rate.
【Keywords】:
【Paper Link】 【Pages】:5441-5449
【Authors】: Pengxiang Wu ; Chao Chen ; Jingru Yi ; Dimitris N. Metaxas
【Abstract】: We present a new permutation-invariant network for 3D point cloud processing. Our network is composed of a recurrent set encoder and a convolutional feature aggregator. Given an unordered point set, the encoder first partitions its ambient space into parallel beams. Points within each beam are then modeled as a sequence and encoded into subregional geometric features by a shared recurrent neural network (RNN). The spatial layout of the beams is regular, which allows the beam features to be further fed into an efficient 2D convolutional neural network (CNN) for hierarchical feature aggregation. Our network is effective at spatial feature learning and competes favorably with state-of-the-art methods (SOTAs) on a number of benchmarks, while being significantly more efficient.
【Keywords】:
【Paper Link】 【Pages】:5450-5457
【Authors】: Si Wu ; Jian Zhong ; Wenming Cao ; Rui Li ; Zhiwen Yu ; Hau-San Wong
【Abstract】: For unsupervised domain adaptation, the process of learning domain-invariant representations can be dominated by the labeled source data, such that the specific characteristics of the target domain may be ignored. In order to improve the performance in inferring target labels, we propose a target-specific network which is capable of learning collaboratively with a domain adaptation network, instead of directly minimizing domain discrepancy. A clustering regularization is also utilized to improve the generalization capability of the target-specific network by forcing target data points to be close to accumulated class centers. As this network learns and specializes to the target domain, its performance in inferring target labels improves, which in turn facilitates the learning process of the adaptation network. Therefore, there is a mutually beneficial relationship between these two networks. We perform extensive experiments on multiple digit and object datasets, and the effectiveness and superiority of the proposed approach are verified on multiple visual adaptation benchmarks; e.g., we improve the state-of-the-art on the task of MNIST→SVHN from 76.5% to 84.9% without specific augmentation.
【Keywords】:
【Paper Link】 【Pages】:5458-5465
【Authors】: Dongbo Xi ; Fuzhen Zhuang ; Yanchi Liu ; Jingjing Gu ; Hui Xiong ; Qing He
【Abstract】: Human mobility data accumulated from Point-of-Interest (POI) check-ins provides a great opportunity for understanding user behavior. However, data quality issues (e.g., missing geolocation information, unreal check-ins, data sparsity) in real-life mobility data limit the effectiveness of existing POI-oriented studies, e.g., POI recommendation and location prediction, when applied to real applications. To this end, in this paper, we develop a model named Bi-STDDP, which integrates bi-directional spatio-temporal dependence and users’ dynamic preferences to identify the missing POI check-in, i.e., which POI a user visited at a specific time. Specifically, we first utilize bi-directional global spatial and local temporal information of POIs to capture the complex dependence relationships. Then, the target temporal pattern, in combination with user and POI information, is fed into a multi-layer network to capture users’ dynamic preferences. Moreover, the dynamic preferences are transformed into the same space as the dependence relationships to form the final model. Finally, the proposed model is evaluated on three large-scale real-world datasets, and the results demonstrate significant improvements of our model over state-of-the-art methods. It is also worth noting that the proposed model can be naturally extended to POI recommendation and location prediction tasks with competitive performance.
【Keywords】:
【Paper Link】 【Pages】:5466-5473
【Authors】: Yingce Xia ; Tianyu He ; Xu Tan ; Fei Tian ; Di He ; Tao Qin
【Abstract】: Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English-to-French or English-to-German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the whole encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that this simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German-to-English translation, 28.98/29.89 BLEU scores on WMT 2014 English-to-German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German-to-English translation.
【Keywords】:
【Paper Link】 【Pages】:5474-5481
【Authors】: Teng Xiao ; Shangsong Liang ; Weizhou Shen ; Zaiqiao Meng
【Abstract】: In this paper, we propose a Bayesian Deep Collaborative Matrix Factorization (BDCMF) algorithm for collaborative filtering (CF). BDCMF is a novel Bayesian deep generative model that learns user and item latent vectors from users’ social interactions, item content as auxiliary information, and the user-item rating (feedback) matrix. It alleviates the problem of matrix sparsity by incorporating items’ auxiliary and users’ social information into the model. It can learn more robust and dense latent representations by integrating deep learning into a Bayesian probabilistic framework. As a deep generative model, it offers both non-linearity and a Bayesian treatment. Additionally, we derive an efficient EM-style point estimation algorithm for parameter learning in BDCMF. To further improve recommendation performance, we also derive a full Bayesian posterior estimation algorithm for inference. Experiments conducted on two sparse datasets show that BDCMF significantly outperforms state-of-the-art CF methods.
【Keywords】:
【Paper Link】 【Pages】:5482-5489
【Authors】: Yun Xiao ; Pengzhen Ren ; Zhihui Li ; Xiaojiang Chen ; Xin Wang ; Dingyi Fang
【Abstract】: Spectral clustering has been widely adopted because it can mine structures between data clusters. The performance of spectral clustering depends largely on the quality of the constructed affinity graph, especially when the data are noisy. Subspace learning can transform the original input features to a low-dimensional subspace and help to produce a robust method. Therefore, how to learn an intrinsic subspace and construct a pure affinity graph on a dataset with noise is a challenge for spectral clustering. In order to deal with this challenge, a new Robust Single-Step Spectral Clustering with Intrinsic Subspace (RS3CIS) method is proposed in this paper. RS3CIS uses a local representation method that projects the original data into a low-dimensional subspace through a row-sparse transformation matrix and uses the ℓ2,1-norm of the transformation matrix as a penalty term to achieve noise suppression. In addition, RS3CIS introduces a Laplacian matrix rank constraint so that it can output an affinity graph with an explicit clustering structure, which allows the final clustering result to be obtained in the single step of constructing the affinity matrix. One synthetic dataset and six real benchmark datasets are used to verify the performance of the proposed method through clustering and projection experiments. Experimental results show that RS3CIS outperforms related methods with respect to clustering quality, robustness, and dimension reduction.
【Keywords】:
【Paper Link】 【Pages】:5490-5497
【Authors】: Hong Xie ; Yongkun Li ; John C. S. Lui
【Abstract】: Online product rating systems have become an indispensable component of numerous web services such as Amazon, eBay, Google Play Store, and TripAdvisor. One functionality of such systems is to uncover product quality via product ratings (or reviews) contributed by consumers. However, a well-known psychological phenomenon called “message-based persuasion” leads to “biased” product ratings in a cascading manner (we call this the persuasion cascade). This paper investigates: (1) How does the persuasion cascade influence the product quality estimation accuracy? (2) Given a real-world product rating dataset, how can one infer the persuasion cascade and analyze it to draw practical insights? We first develop a mathematical model to capture key factors of a persuasion cascade. We formulate a high-order Markov chain to characterize the opinion dynamics of a persuasion cascade and prove the convergence of opinions. We further bound the product quality estimation error for a class of rating aggregation rules, including the average scoring rule, via matrix perturbation theory and the Chernoff bound. We also design a maximum likelihood algorithm to infer the parameters of the persuasion cascade. We conduct experiments on data from Amazon and TripAdvisor, and show that persuasion cascades notably exist, but the average scoring rule has a small product quality estimation error under practical scenarios.
【Keywords】:
【Paper Link】 【Pages】:5498-5507
【Authors】: Jianwen Xie ; Ruiqi Gao ; Zilong Zheng ; Song-Chun Zhu ; Ying Nian Wu
【Abstract】: This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state vectors follows a non-linear auto-regressive model, where the state vector of the next frame is a non-linear transformation of the state vector of the current frame as well as an independent noise vector that provides randomness in the transition. The non-linear transformation of this transition model can be parametrized by a feedforward neural network. We show that this model can be learned by an alternating back-propagation through time algorithm that iteratively samples the noise vectors and updates the parameters in the transition model and the generator model. We show that our training method can learn realistic models for dynamic textures and action patterns.
【Keywords】:
【Paper Link】 【Pages】:5508-5515
【Authors】: Yuying Xing ; Guoxian Yu ; Carlotta Domeniconi ; Jun Wang ; Zili Zhang ; Maozu Guo
【Abstract】: Multi-view Multi-instance Multi-label Learning (M3L) deals with complex objects encompassing diverse instances, represented with different feature views, and annotated with multiple labels. Existing M3L solutions only partially explore the inter- or intra-relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may have compromised performance. In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate instance-level labels into bag-level ones and to guide the factorization. An empirical study on benchmark datasets shows that M3Lcmf outperforms other related competitive solutions in both instance-level and bag-level prediction.
【Keywords】:
【Paper Link】 【Pages】:5516-5524
【Authors】: Haoyi Xiong ; Kafeng Wang ; Jiang Bian ; Zhanxing Zhu ; Cheng-Zhong Xu ; Zhishan Guo ; Jun Huan
【Abstract】: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) methods have been widely used to sample from certain probability distributions, incorporating (kernel) density derivatives and/or given datasets. Instead of exploring new samples from kernel spaces, this work proposes a novel SGHMC sampler, namely Spectral Hamiltonian Monte Carlo (SpHMC), that produces high-dimensional sparse representations of given datasets through sparse sensing and SGHMC. Inspired by compressed sensing, we assume all given samples are low-dimensional measurements of certain high-dimensional sparse vectors, while a continuous probability distribution exists in that high-dimensional space. Specifically, given a dictionary for sparse coding, SpHMC first derives a novel likelihood evaluator of the probability distribution from the loss function of LASSO, then samples from the high-dimensional distribution using stochastic Langevin dynamics with derivatives of the log-likelihood and Metropolis–Hastings sampling. In addition, new samples in the low-dimensional measuring space can be regenerated using the sampled high-dimensional vectors and the dictionary. Extensive experiments have been conducted to evaluate the proposed algorithm using real-world datasets. Performance comparisons on three real-world applications demonstrate the superior performance of SpHMC over baseline methods.
【Keywords】:
【Paper Link】 【Pages】:5525-5532
【Authors】: Chang Xu ; Weiran Huang ; Hongwei Wang ; Gang Wang ; Tie-Yan Liu
【Abstract】: Recurrent Neural Networks (RNNs) have been widely used in natural language processing tasks and have achieved great success. Traditional RNNs usually treat each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural languages. Since semantic structures such as word dependence patterns are not parameterized, it is a challenge to capture and leverage structure information. In this paper, we propose an improved variant of RNN, Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependence pattern at a time. An attention mechanism is introduced to combine these patterns at each step according to the semantic information. We then parameterize structure information by adaptively selecting the most appropriate connection structures among channels. In this way, diverse local structures and dependence patterns in sentences can be well captured by MC-RNN. To verify its effectiveness, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on these tasks all show significant improvements of MC-RNN over current top systems.
【Keywords】:
【Paper Link】 【Pages】:5533-5540
【Authors】: Changdong Xu ; Xin Geng
【Abstract】: Hierarchical classification is a challenging problem where the class labels are organized in a predefined hierarchy. One primary challenge in hierarchical classification is the small training set issue of the local module. The local classifiers in previous hierarchical classification approaches are prone to over-fitting, which becomes a major bottleneck of hierarchical classification. Fortunately, the labels in the local module are correlated, and the siblings of the true label can provide additional supervision information for the instance. This paper proposes a novel method to deal with the small training set issue. The key idea of the method is to represent the correlation among the labels by a label distribution. It generates a label distribution that contains the supervision information of each label for the given instance, and then learns a mapping from the instance to the label distribution. Experimental results on several hierarchical classification datasets show that our method significantly outperforms other state-of-the-art hierarchical classification approaches.
【Keywords】:
【Paper Link】 【Pages】:5541-5548
【Authors】: Hongzuo Xu ; Yongjun Wang ; Zhiyue Wu ; Yijie Wang
【Abstract】: Non-IID categorical data is ubiquitous and common in real-world applications. Learning various kinds of couplings has proved to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilize high-order complex value couplings. Existing outlier detection methods normally focus only on pairwise primary value couplings and fail to uncover real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework, EMAC, and its instance SCAN to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings at different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds them into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.
【Keywords】:
【Paper Link】 【Pages】:5549-5556
【Authors】: Liyuan Xu ; Junya Honda ; Masashi Sugiyama
【Abstract】: We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes qualitative rather than numeric feedback when pulling each arm. We employ the same regret as in the dueling bandit (DB) problem, where the duel is carried out by comparing the qualitative feedback. Although classic DB algorithms can be naively applied to the QDB problem, this reduction significantly worsens performance; in fact, in the QDB problem, the probability that one arm wins the duel over another arm can be estimated directly, without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over existing DB algorithms.
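The direct estimation the abstract alludes to can be illustrated simply: given the empirical distributions of ordinal feedback for two arms, the probability that one beats the other in a duel follows without performing any actual duels. A sketch of this computation (our illustration):

```python
# Win probability of arm i over arm j from per-arm ordinal feedback
# distributions: P(X > Y) + 0.5 * P(X = Y) for independent X ~ p_i, Y ~ p_j.
import numpy as np

def win_prob(p_i, p_j):
    P = np.outer(p_i, p_j)           # joint over (feedback of i, feedback of j)
    greater = np.tril(P, -1).sum()   # i's feedback level strictly higher
    ties = np.trace(P)
    return greater + 0.5 * ties

p1 = np.array([0.1, 0.3, 0.6])       # arm 1 skews to high feedback levels
p2 = np.array([0.5, 0.3, 0.2])
print(win_prob(p1, p2))              # 0.76 > 0.5: arm 1 preferred
```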
【Keywords】:
【Paper Link】 【Pages】:5557-5564
【Authors】: Ning Xu ; Jiaqi Lv ; Xin Geng
【Abstract】: Partial label learning aims to learn from training examples each associated with a set of candidate labels, among which only one label is valid for the training example. The common strategy for inducing a predictive model is to disambiguate the candidate label set, either by identifying the ground-truth label iteratively or by treating each candidate label equally. Nonetheless, these strategies ignore the generalized label distribution corresponding to each instance, since the generalized label distribution is not explicitly available in the training set. In this paper, a new partial label learning strategy named PL-LE is proposed to learn from partial label examples via label enhancement. Specifically, the generalized label distributions are recovered by leveraging the topological information of the feature space. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the generalized label distributions. Extensive experiments show that PL-LE performs favorably against state-of-the-art partial label learning approaches.
【Keywords】:
【Paper Link】 【Pages】:5565-5572
【Authors】: Ting-Bing Xu ; Cheng-Lin Liu
【Abstract】: Knowledge distillation is an effective technique that has been widely used for transferring knowledge from one network to another. Despite its effectiveness in improving network performance, the dependence on accompanying assistive models complicates the training of a single network, demanding large memory and time costs. In this paper, we design a more elegant self-distillation mechanism that transfers knowledge between different distorted versions of the same training data, without relying on accompanying models. Specifically, the potential capacity of a single network is excavated by learning consistent global feature distributions and posterior distributions (class probabilities) across these distorted versions of the data. Extensive experiments on multiple datasets (i.e., CIFAR-10/100 and ImageNet) demonstrate that the proposed method can effectively improve the generalization performance of various network architectures (such as AlexNet, ResNet, Wide ResNet, and DenseNet), outperforming existing distillation methods with little extra training effort.
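A hedged sketch of the core training signal, as we read the abstract: one network sees two distorted versions of the same batch and is penalized when its (softened) posteriors disagree, with no separate teacher model. The function below is a placeholder, not the authors' code:

```python
# Self-distillation across two distortions of the same batch: standard
# cross-entropy on one view plus a symmetric KL consistency term between
# temperature-softened posteriors of the two views.
import torch.nn.functional as F

def self_distillation_loss(model, x1, x2, labels, T=4.0, alpha=0.5):
    # x1, x2: two random distortions of the same images
    logits1, logits2 = model(x1), model(x2)
    ce = F.cross_entropy(logits1, labels)
    p1 = F.log_softmax(logits1 / T, dim=1)
    p2 = F.log_softmax(logits2 / T, dim=1)
    kl = 0.5 * (F.kl_div(p1, p2.exp(), reduction='batchmean')
                + F.kl_div(p2, p1.exp(), reduction='batchmean')) * T * T
    return ce + alpha * kl
```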
【Keywords】:
【Paper Link】 【Pages】:5573-5580
【Authors】: Yao Xu ; Xueshuang Xiang ; Meiyu Huang
【Abstract】: This paper introduces a novel deep learning based method, named bridge neural network (BNN), to uncover the potential relationship between two given data sources task by task. The proposed approach employs two convolutional neural networks that project the two data sources into a feature space to learn the common representation required by the specific task. The training objective with artificial negative samples supports mini-batch training and is asymptotically equivalent to maximizing the total correlation of the two data sources, as verified by theoretical analysis. Experiments on tasks including pair matching, canonical correlation analysis, transfer learning, and reconstruction demonstrate the state-of-the-art performance of BNN, which may provide new insights into common representation learning.
【Keywords】:
【Paper Link】 【Pages】:5581-5588
【Authors】: Yonghao Xu ; Bo Du ; Lefei Zhang ; Qian Zhang ; Guoli Wang ; Liangpei Zhang
【Abstract】: Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt labeled data from a source domain to a target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce the self-ensembling model to domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. Besides, since different regions in an image usually correspond to different levels of domain gap, we introduce an attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework yields competitive performance compared with state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:5589-5596
【Authors】: Yanbing Xue ; Milos Hauskrecht
【Abstract】: In this paper, we study the problem of learning multi-class classification models from a limited set of labeled examples obtained from a human annotator. We propose a new machine learning framework that learns multi-class classification models from ordered class sets the annotator may use to express not only the top class choice but also other competing classes still under consideration. Such ordered sets of competing classes are common, for example, in various diagnostic tasks. We first develop strategies for learning multi-class classification models from examples associated with ordered class set information, and then develop an active learning strategy that takes such feedback into account. We evaluate the benefit of the framework on multiple datasets and show that class-order feedback and active learning can reduce the annotation cost both individually and jointly.
【Keywords】:
【Paper Link】 【Pages】:5597-5604
【Authors】: Bo Yan ; Chuming Lin ; Weimin Tan
【Abstract】: For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding-window fashion, or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) separately generating each output frame may yield high-quality HR estimates but results in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results in the case of short information flow, but causes significant jitter and jagged artifacts because previous super-resolution errors are constantly accumulated into subsequent frames. In this paper, we propose a fully end-to-end trainable frame and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: a local network and a context network, where the first explicitly uses a sequence of consecutive LR frames to generate local features and a local SR frame, and the second combines the outputs of the local network with the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach produces state-of-the-art results on a standard benchmark dataset, with advantages in terms of accuracy, efficiency, and visual quality over existing approaches.
【Keywords】:
【Paper Link】 【Pages】:5605-5612
【Authors】: Yuguang Yan ; Mingkui Tan ; Yanwu Xu ; Jiezhang Cao ; Michael K. Ng ; Huaqing Min ; Qingyao Wu
【Abstract】: The issue of data imbalance occurs in many real-world applications, especially in medical diagnosis, where normal cases usually far outnumber abnormal cases. To alleviate this issue, one of the most important approaches is oversampling, which synthesizes minority class samples to balance the numbers of different classes. However, existing methods barely consider the global geometric information involved in the distribution of minority class samples, and thus may incur a distribution mismatch between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of data to make synthetic samples follow a distribution similar to that of the minority class samples. Moreover, we introduce a novel regularization based on the synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world datasets demonstrate the efficacy of our proposed method in terms of multiple metrics.
【Keywords】:
【Paper Link】 【Pages】:5613-5620
【Authors】: Baoyao Yang ; Pong C. Yuen
【Abstract】: In unsupervised domain adaptation, distributions of visual representations are mismatched across domains, which leads to a performance drop of a source model in the target domain. Therefore, distribution alignment methods have been proposed to learn cross-domain visual representations. However, most alignment methods do not consider the difference in distribution structures across domains, so the adaptation can suffer from insufficiently aligned cross-domain representations. To avoid misclassification/misidentification due to differences in distribution structures, this paper proposes a novel unsupervised graph alignment method that aligns both data representations and distribution structures across the source and target domains. An adversarial network is developed for unsupervised graph alignment, which maps both source and target data to a feature space where data are distributed with unified structure criteria. Experimental results show that the graph-aligned visual representations achieve good performance on both cross-dataset recognition and cross-modal re-identification.
【Keywords】:
【Paper Link】 【Pages】:5621-5627
【Authors】: Bin-Bin Yang ; Song-Qing Shen ; Wei Gao
【Abstract】: Decision trees have attracted much attention during the past decades. Previous decision trees include axis-parallel and oblique decision trees; both try to find the best splits via exhaustive search or heuristic algorithms in each iteration. Oblique decision trees generally simplify the tree structure and achieve better performance, but come with higher computational cost and typically require initialization with the best axis-parallel splits. This work presents the Weighted Oblique Decision Tree (WODT) based on continuous optimization with random initialization. We assign different weights of each instance to the child nodes at every internal node, and then obtain a split by optimizing the continuous and differentiable objective function of weighted information entropy. Extensive experiments show the effectiveness of the proposed algorithm.
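A minimal rendering of the WODT split (our sketch, not the paper's implementation): a sigmoid of an oblique projection gives each instance a fractional weight in the two children, and the split parameters are found by continuous optimization of the weighted information entropy:

```python
# Soft oblique split: sigmoid(theta . x + b) weights each instance between
# the two children; the objective is the size-weighted entropy of the
# children's (soft) class distributions, minimized over split parameters.
import numpy as np
from scipy.optimize import minimize

def weighted_entropy(theta, X, y, n_classes):
    w = 1.0 / (1.0 + np.exp(-(X @ theta[:-1] + theta[-1])))  # right-child weight
    obj = 0.0
    for side in (w, 1 - w):
        total = side.sum() + 1e-12
        p = np.array([side[y == c].sum() for c in range(n_classes)]) / total
        obj += total / len(y) * -(p * np.log(p + 1e-12)).sum()
    return obj

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X @ np.array([1.0, -2.0]) > 0).astype(int)
res = minimize(weighted_entropy, rng.standard_normal(3), args=(X, y, 2))
print(res.fun)  # small: the learned oblique split separates the classes
```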
【Keywords】:
【Paper Link】 【Pages】:5628-5635
【Authors】: Chenglin Yang ; Lingxi Xie ; Siyuan Qiao ; Alan L. Yuille
【Abstract】: We focus on the problem of training a deep neural network in generations. The flowchart is that, in order to optimize the target network (the student), another network (the teacher) with the same architecture is first trained and used to provide part of the supervision signals in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness in training the teacher network. Existing approaches mostly used a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher which itself attains high accuracy, but we argue that the teacher needs to be more tolerant, even though this often implies lower accuracy. The implementation is very easy, with merely an extra loss term added to the teacher network, facilitating a few secondary classes to emerge and complement the primary class. Consequently, the teacher provides a milder supervision signal (a less peaked distribution), making it possible for the student to learn from inter-class similarity and potentially lowering the risk of over-fitting. Experiments are performed on standard image classification tasks (CIFAR100 and ILSVRC2012). Although the teacher network is less powerful, the students show persistent ability growth and eventually achieve higher classification accuracy than other competitors. Model ensemble and transfer feature extraction also verify the effectiveness of our approach.
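One plausible form of the extra loss term (our reading of the abstract; the function name, the choice of K, and the weighting are ours) encourages mass on a few secondary classes so the teacher's output distribution is less peaked:

```python
# "Tolerant" teacher objective: cross-entropy plus a term that rewards
# probability mass on the K largest non-primary classes, yielding a milder
# (less peaked) supervision signal for the student.
import torch
import torch.nn.functional as F

def tolerant_teacher_loss(logits, labels, K=5, lam=0.1):
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    # zero out the primary class, then take the K largest remaining probs
    mask = F.one_hot(labels, probs.size(1)).bool()
    secondary = probs.masked_fill(mask, 0.0).topk(K, dim=1).values
    # penalizing small secondary mass encourages a less peaked distribution
    return ce - lam * torch.log(secondary.sum(dim=1) + 1e-8).mean()
```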
【Keywords】:
【Paper Link】 【Pages】:5636-5643
【Authors】: Peng Yang ; Peilin Zhao ; Jiayu Zhou ; Xin Gao
【Abstract】: Traditional online multitask learning only utilizes first-order information of the data stream. To remedy this issue, we propose a confidence-weighted multitask learning algorithm, which maintains a Gaussian distribution over each task model to guide the online learning process. The mean (covariance) of the Gaussian distribution is a sum of a local component and a global component that is shared among all the tasks. In addition, this paper also addresses the challenge of active learning in the online multitask setting. Instead of requiring labels for all instances, the proposed algorithm determines whether the learner should acquire a label by considering the confidence of its related tasks over label prediction. Theoretical results show that the regret bounds can be significantly reduced. Empirical results demonstrate that the proposed algorithm achieves promising learning efficacy while simultaneously minimizing the labeling cost.
【Keywords】:
【Paper Link】 【Pages】:5644-5651
【Authors】: Shuo Yang ; Kai Shu ; Suhang Wang ; Renjie Gu ; Fan Wu ; Huan Liu
【Abstract】: Social media has become one of the main channels for people to access and consume news, due to the rapidness and low cost of news dissemination on it. However, these same properties also make social media a hotbed of fake news dissemination, bringing negative impacts on both individuals and society. Therefore, detecting fake news has become a crucial problem attracting tremendous research effort. Most existing methods of fake news detection are supervised, which requires an extensive amount of time and labor to build a reliably annotated dataset. In search of an alternative, in this paper we investigate whether fake news can be detected in an unsupervised manner. We treat the truths of news items and users’ credibility as latent random variables, and exploit users’ engagements on social media to identify their opinions towards the authenticity of news. We leverage a Bayesian network model to capture the conditional dependencies among the truths of news, the users’ opinions, and the users’ credibility. To solve the inference problem, we propose an efficient collapsed Gibbs sampling approach to infer the truths of news and the users’ credibility without any labeled data. Experimental results on two datasets show that the proposed method significantly outperforms the compared unsupervised methods.
【Keywords】:
【Paper Link】 【Pages】:5652-5659
【Authors】: Yang Yang ; Yi-Feng Wu ; De-Chuan Zhan ; Zhi-Bin Liu ; Yuan Jiang
【Abstract】: In real-world applications, data often come with multiple modalities, and many multi-modal learning approaches have been proposed for integrating information from different sources. Most previous multi-modal methods utilize modal consistency to reduce the complexity of the learning problem, and therefore modal completeness needs to be guaranteed. However, due to data collection failures, self-deficiencies, and various other reasons, multi-modal instances are often incomplete in real applications, and exhibit inconsistent anomalies even among complete instances, which jointly result in the inconsistency problem. This degrades multi-modal feature learning performance and ultimately affects generalization ability in different tasks. In this paper, we propose a novel Deep Robust Unsupervised Multi-modal Network structure (DRUMN) for solving this real problem within a unified framework. The proposed DRUMN can utilize extrinsic heterogeneous information from unlabeled data to counter the insufficiency caused by incompleteness. On the other hand, the inconsistent anomaly issue is solved with an adaptive weighted estimation, rather than by adjusting complex thresholds. As DRUMN can extract discriminative feature representations for each modality, experiments on real-world multi-modal datasets successfully validate the effectiveness of our proposed method.
【Keywords】:
【Paper Link】 【Pages】:5660-5667
【Authors】: Zhiyong Yang ; Qianqian Xu ; Xiaochun Cao ; Qingming Huang
【Abstract】: Traditionally, most existing attribute learning methods are trained based on the consensus of annotations aggregated from a limited number of annotators. However, the consensus might fail, especially when a wide spectrum of annotators with different interests and comprehension of the attribute words are involved. In this paper, we develop a novel multi-task method to understand and predict personalized attribute annotations. Regarding the attribute preference learning for each annotator as a specific task, we first propose a multi-level task parameter decomposition to capture the evolution from a highly popular opinion of the mass to highly personalized choices that are special for each person. Meanwhile, for personalized learning methods, ranking prediction is much more important than accurate classification. This motivates us to employ an Area Under the ROC Curve (AUC) based loss function to improve our model. On top of the AUC-based loss, we propose an efficient method to evaluate the loss and its gradients. Theoretically, we derive a novel closed-form solution for one of our non-convex subproblems, which leads to provable convergence behavior. Furthermore, we provide a generalization bound to guarantee reasonable performance. Finally, empirical analysis consistently speaks to the efficacy of our proposed method.
【Keywords】:
【Paper Link】 【Pages】:5668-5675
【Authors】: Huaxiu Yao ; Xianfeng Tang ; Hua Wei ; Guanjie Zheng ; Zhenhui Li
【Abstract】: Traffic prediction has drawn increasing attention in the AI research field due to the increasing availability of large-scale traffic data and its importance in the real world. For example, accurate taxi demand prediction can assist taxi companies in pre-allocating taxis. The key challenge of traffic prediction lies in how to model the complex spatial dependencies and temporal dynamics. Although both factors have been considered in modeling, existing works make strong assumptions about spatial dependence and temporal dynamics, i.e., spatial dependence is stationary in time, and temporal dynamics are strictly periodic. However, in practice spatial dependence can be dynamic (i.e., changing from time to time), and temporal dynamics can be perturbed from one period to another. In this paper, we make two important observations: (1) the spatial dependencies between locations are dynamic; and (2) the temporal dependency follows daily and weekly patterns, but is not strictly periodic due to dynamic temporal shifting. To address these two issues, we propose a novel Spatial-Temporal Dynamic Network (STDN), in which a flow gating mechanism is introduced to learn the dynamic similarity between locations, and a periodically shifted attention mechanism is designed to handle long-term periodic temporal shifting. To the best of our knowledge, this is the first work that tackles both issues in a unified framework. Our experimental results on real-world traffic datasets verify the effectiveness of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:5676-5683
【Authors】: Zhuliang Yao ; Shijie Cao ; Wencong Xiao ; Chen Zhang ; Lanshun Nie
【Abstract】: In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity that prunes or regularizes consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy efficiently on commercial hardware. Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPUs, while retaining the same high model accuracy as fine-grained sparsity.
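The balanced pattern is easy to state concretely: split each weight row into equal-sized blocks and keep the same number of largest-magnitude weights per block, so every GPU thread group gets an identical workload. A minimal sketch (our illustration, not the paper's code):

```python
# Balanced fine-grained pruning: every block_size-wide block keeps exactly
# `keep` nonzeros (the largest magnitudes), giving a uniform per-block load.
import torch

def balanced_prune(weight, block_size=8, keep=2):
    rows, cols = weight.shape                      # cols divisible by block_size
    w = weight.view(rows, cols // block_size, block_size)
    idx = w.abs().topk(keep, dim=-1).indices
    mask = torch.zeros_like(w, dtype=torch.bool).scatter_(-1, idx, True)
    return (w * mask).view(rows, cols)

w = torch.randn(4, 16)
print(balanced_prune(w))   # exactly 2 nonzeros in every 8-wide block
```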
【Keywords】:
【Paper Link】 【Pages】:5684-5692
【Authors】: Teresa Yeo ; Parameswaran Kamalaruban ; Adish Singla ; Arpit Merchant ; Thibault Asselborn ; Louis Faucon ; Pierre Dillenbourg ; Volkan Cevher
【Abstract】: We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d, N} log(1/ε)) examples, where d is the ambient dimension of the problem, N is the number of learners, and ε is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners’ internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners’ workload and the teacher’s cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives.
【Keywords】:
【Paper Link】 【Pages】:5693-5700
【Authors】: Hao Yu ; Sen Yang ; Shenghuo Zhu
【Abstract】: In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up the training process with multiple workers. It uses the workers to sample local stochastic gradients in parallel, aggregates all gradients at a single server to obtain their average, and updates each worker’s local model with an SGD step using the averaged gradient. Ideally, parallel mini-batch SGD achieves a linear speed-up of the training time (with respect to the number of workers) compared with SGD on a single worker. In practice, however, such linear scalability is significantly limited by the growing demand for gradient communication as more workers are involved. Model averaging, which periodically averages individual models trained on parallel workers, is another common practice for distributed training of deep neural networks (Zinkevich et al. 2010; McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, the communication overhead of model averaging is significantly reduced. Impressively, extensive experimental work has verified that model averaging can still achieve a good speed-up of the training time as long as the averaging interval is carefully controlled. However, it has remained a mystery in theory why such a simple heuristic works so well. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
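The two schemes can be contrasted in a toy single-process simulation (our sketch): parallel mini-batch SGD communicates gradients every step, whereas model averaging runs I local steps per worker and then averages parameters, cutting communication by a factor of I:

```python
# Model averaging (local SGD): each worker takes `local_steps` independent
# SGD updates, then all workers synchronize once by averaging parameters.
import numpy as np

def local_sgd(grad_fn, x0, workers=4, rounds=100, local_steps=8, lr=0.05,
              rng=np.random.default_rng(0)):
    xs = [x0.copy() for _ in range(workers)]
    for _ in range(rounds):
        for w in range(workers):
            for _ in range(local_steps):      # independent local updates
                xs[w] -= lr * grad_fn(xs[w], rng)
        avg = np.mean(xs, axis=0)             # one communication round
        xs = [avg.copy() for _ in range(workers)]
    return xs[0]

# toy objective: noisy gradient of f(x) = ||x||^2 / 2
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
print(local_sgd(grad, np.ones(3)))            # converges near the origin
```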
【Keywords】:
【Paper Link】 【Pages】:5701-5708
【Authors】: Joonsang Yu ; Sungbum Kang ; Kiyoung Choi
【Abstract】: This paper proposes network recasting as a general method for network architecture transformation. The primary goal of this method is to accelerate the inference process through the transformation, but there can be many other practical applications. The method is based on block-wise recasting; it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such a block-by-block recasting in a sequential manner transforms the network architecture while preserving the accuracy. This method can be used to transform an arbitrary teacher network type to an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of block. The network recasting can generate a network with fewer parameters and/or activations, which reduce the inference time significantly. Naturally, it can be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU.
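One block-wise recasting step reduces to regression on activations: the student block is trained so its output matches the frozen teacher block's output on the same input. A hedged sketch with placeholder blocks (the block types and sizes are our choices):

```python
# One recasting step: fit a cheaper student block to reproduce the output
# activation of a frozen teacher block on shared inputs.
import torch
import torch.nn as nn

teacher_block = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
student_block = nn.Sequential(nn.Conv2d(16, 16, 1), nn.ReLU())  # cheaper type
opt = torch.optim.Adam(student_block.parameters(), lr=1e-3)

for _ in range(100):
    x = torch.randn(8, 16, 32, 32)
    with torch.no_grad():
        target = teacher_block(x)             # frozen source block
    loss = nn.functional.mse_loss(student_block(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
```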
【Keywords】:
【Paper Link】 【Pages】:5709-5716
【Authors】: Lu Yu ; Chuxu Zhang ; Shangsong Liang ; Xiangliang Zhang
【Abstract】: In modern e-commerce, the temporal order behind users’ transactions implies the importance of exploiting the transition dependency among items for better inferring what a user prefers to interact with in the “near future”. The types of interaction among items are usually divided into individual-level interactions that capture the transition order between a pair of items, and union-level relations between a set of items and a single item. However, most existing work only captures one of these from a single view, especially when modeling individual-level interactions. In this paper, we propose a Multi-order Attentive Ranking Model (MARank) to unify both individual- and union-level item interactions into a preference inference model from multiple views. The idea is to represent a user’s short-term preference by embedding the user and a set of present items into multi-order features derived from the intermediate hidden states of a deep neural network. With the help of an attention mechanism, we obtain a unified embedding that keeps the individual-level interactions via a linear combination of the mapped items’ features. We then feed the aggregated embedding to a designed residual neural network to capture union-level interactions. Thorough experiments are conducted to show the behavior of MARank under various component settings. Furthermore, experimental results on several public datasets show that MARank significantly outperforms state-of-the-art baselines on different evaluation metrics. The source code can be found at https://github.com/voladorlu/MARank.
【Keywords】:
【Paper Link】 【Pages】:5717-5724
【Authors】: Hao Yuan ; Yongjun Chen ; Xia Hu ; Shuiwang Ji
【Abstract】: Interpreting deep neural networks is of great importance for understanding and verifying deep models for natural language processing (NLP) tasks. However, most existing approaches only focus on improving the performance of models and ignore their interpretability. In this work, we propose an approach to investigate the meaning of hidden neurons in convolutional neural network (CNN) models. We first employ saliency maps and optimization techniques to approximate the information detected by hidden neurons from input sentences. Then we develop regularization terms and explore words in the vocabulary to interpret such detected information. Experimental results demonstrate that our approach can identify meaningful and reasonable interpretations for hidden spatial locations. Additionally, we show that our approach can describe the decision procedure of deep NLP models.
【Keywords】:
【Paper Link】 【Pages】:5725-5732
【Authors】: Biqiao Zhang ; Yuqing Kong ; Georg Essl ; Emily Mower Provost
【Abstract】: In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, f-Similarity Preservation Loss (f-SPL), based on the dual form of f-divergence, for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only the classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.
【Keywords】:
【Paper Link】 【Pages】:5733-5740
【Authors】: Chen Zhang ; Steven C. H. Hoi
【Abstract】: This paper explores machine learning to address the problem of Partially Observable Multi-sensor Sequential Change Detection (POMSCD), where only a subset of sensors can be observed to monitor a target system for change-point detection at each online learning round. In contrast to traditional multi-sensor sequential change detection tasks where all the sensors are observable, POMSCD is much more challenging because the learner not only needs to detect on-the-fly whether a change occurs based on partially observed multi-sensor data streams, but also needs to cleverly choose a subset of informative sensors to observe in the next learning round, in order to maximize the overall sequential change detection performance. In this paper, we present the first online learning study to tackle POMSCD in a systematic and rigorous way. Our approach has twofold novelty: (i) we detect change-points from partial observations effectively by exploiting potential correlations between sensors, and (ii) we formulate the sensor subset selection task as a Multi-Armed Bandit (MAB) problem and develop an effective adaptive sampling strategy using MAB algorithms. We offer theoretical analysis for the proposed online learning solution, and further validate its empirical performance via an extensive set of numerical studies together with a case study on real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:5741-5748
【Authors】: Cheng Zhang ; Cengiz Öztireli ; Stephan Mandt ; Giampiero Salvi
【Abstract】: The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes in which similar data points are less likely to be selected for the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification (Zhang et al., 2017) to the broader class of repulsive point processes. We first show that the phenomenon of variance reduction by diversified sampling extends in particular to non-stationary point processes. We then show that other point processes can be computationally much more efficient than DPPs. In particular, we propose and investigate Poisson disk sampling, frequently encountered in the computer graphics community, for this task. We show empirically that our approach improves over standard SGD both in convergence speed and final model performance.
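Poisson disk sampling of a mini-batch can be sketched with simple dart throwing (our simplification of the scheme): a candidate point is accepted only if it lies at least a radius r from every point already chosen, producing the repulsive batches the analysis covers:

```python
# Dart-throwing Poisson disk selection: accept a candidate only if it is
# at least r away from all points already in the mini-batch.
import numpy as np

def poisson_disk_batch(X, batch_size, r, rng=np.random.default_rng(0)):
    chosen = []
    for i in rng.permutation(len(X)):
        if all(np.linalg.norm(X[i] - X[j]) >= r for j in chosen):
            chosen.append(i)
        if len(chosen) == batch_size:
            break
    return np.array(chosen)

X = np.random.default_rng(1).random((1000, 2))
print(poisson_disk_batch(X, 32, r=0.08))   # indices of a repulsive batch
```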
【Keywords】:
【Paper Link】 【Pages】:5749-5756
【Authors】: Hanrui Zhang
【Abstract】: We study PMAC-learning of real-valued set functions with limited complementarity. We prove, to our knowledge, the first nontrivial learnability result for set functions exhibiting complementarity, generalizing Balcan and Harvey’s result for submodular functions. We prove a nearly matching information theoretical lower bound on the number of samples required, complementing our learnability result. We conduct numerical simulations to show that our algorithm is likely to perform well in practice.
【Keywords】:
【Paper Link】 【Pages】:5757-5764
【Authors】: Huan Zhang ; Pengchuan Zhang ; Cho-Jui Hsieh
【Abstract】: The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks. In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network’s input, and the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as the training stability of GANs. Experiments show that (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds on the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood.
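RecurJac itself propagates bounds recursively through many layers; as a minimal sketch of the underlying idea only, the following bounds each Jacobian entry of a one-hidden-layer ReLU network over an input box using one step of interval arithmetic. All names are illustrative, and this is not the paper's algorithm.

```python
import numpy as np

def jacobian_bounds_one_hidden(W1, b1, W2, x_lo, x_hi):
    """Element-wise bounds on the Jacobian of x -> W2 @ relu(W1 @ x + b1)
    over the input box [x_lo, x_hi]."""
    W1p, W1n = np.maximum(W1, 0), np.minimum(W1, 0)
    z_lo = W1p @ x_lo + W1n @ x_hi + b1   # pre-activation lower bounds
    z_hi = W1p @ x_hi + W1n @ x_lo + b1   # pre-activation upper bounds
    on, off = z_lo > 0, z_hi <= 0         # stably active / stably inactive neurons
    J_lo = np.zeros((W2.shape[0], W1.shape[1]))
    J_hi = np.zeros_like(J_lo)
    for k in range(W1.shape[0]):
        if off[k]:
            continue                       # contributes exactly zero
        t = np.outer(W2[:, k], W1[k, :])   # contribution if neuron k is active
        if on[k]:
            J_lo += t; J_hi += t           # stable: exact contribution
        else:
            J_lo += np.minimum(t, 0)       # unstable: keep the worst case
            J_hi += np.maximum(t, 0)
    return J_lo, J_hi

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(20, 8)), rng.normal(size=20), rng.normal(size=(3, 20))
x0 = rng.normal(size=8)
J_lo, J_hi = jacobian_bounds_one_hidden(W1, b1, W2, x0 - 0.1, x0 + 0.1)
# A certified local Lipschitz upper bound in the infinity norm (max row sum):
lip = np.abs(np.stack([J_lo, J_hi])).max(0).sum(1).max()
```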
【Keywords】:
【Paper Link】 【Pages】:5765-5772
【Authors】: Jingwen Zhang ; Joseph G. Ibrahim ; Tengfei Li ; Hongtu Zhu
【Abstract】: We consider the problem of performing an association test between functional data and scalar variables in a varying coefficient model setting. We propose a functional projection regression model and an associated global test statistic to aggregate relatively weak signals across the domain of functional data, while reducing the dimension. An optimal functional projection direction is selected to maximize the signal-to-noise ratio with a ridge penalty. Theoretically, we systematically study the asymptotic distribution of the global test statistic and provide a strategy to adaptively select the optimal tuning parameter. We use simulations to show that the proposed test outperforms all existing state-of-the-art methods in functional statistical inference. Finally, we apply the proposed testing method to the genome-wide association analysis of imaging genetic data in the UK Biobank dataset.
【Keywords】:
【Paper Link】 【Pages】:5773-5780
【Authors】: Kai Zhang ; Hefu Zhang ; Qi Liu ; Hongke Zhao ; Hengshu Zhu ; Enhong Chen
【Abstract】: Cross-domain sentiment classification refers to utilizing useful knowledge in the source domain to help sentiment classification in the target domain which has few or no labeled data. Most existing methods mainly concentrate on extracting common features between domains. Unfortunately, they cannot fully consider the effects of the aspect (e.g., the battery life in reviewing an electronic product) information of the sentences. In order to better solve this problem, we propose an Interactive Attention Transfer Network (IATN) for cross-domain sentiment classification. IATN provides an interactive attention transfer mechanism, which can better transfer sentiment across domains by incorporating information of both sentences and aspects. Specifically, IATN comprises two attention networks, one of them is to identify the common features between domains through domain classification, and the other aims to extract information from the aspects by using the common features as a bridge. Then, we conduct interactive attention learning for those two networks so that both the sentences and the aspects can influence the final sentiment representation. Extensive experiments on the Amazon reviews dataset and crowdfunding reviews dataset not only demonstrate the effectiveness and universality of our method, but also give an interpretable way to track the attention information for sentiment.
【Keywords】:
【Paper Link】 【Pages】:5781-5788
【Authors】: Qi Zhang ; Richard Lewis ; Satinder P. Singh ; Edmund H. Durfee
【Abstract】: We study emergent communication between speaker and listener recurrent neural-network agents that are tasked to cooperatively construct a blocks-world target image sampled from a generative grammar of blocks configurations. The speaker receives the target image and learns to emit a sequence of discrete symbols from a fixed vocabulary. The listener learns to construct a blocks-world image by choosing block placement actions as a function of the speaker’s full utterance and the image of the ongoing construction. Our contributions are (a) the introduction of a task domain for studying emergent communication that is both challenging and affords useful analyses of the emergent protocols; (b) an empirical comparison of the interpolation and extrapolation performance of training via supervised, (contextual) Bandit, and reinforcement learning; and (c) evidence for the emergence of interesting linguistic properties in the RL agent protocol that are distinct from the other two.
【Keywords】:
【Paper Link】 【Pages】:5789-5796
【Authors】: Shangtong Zhang ; Hengshuai Yao
【Abstract】: In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use an actor ensemble (i.e., multiple actors) to search for the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the option-critic architecture with deterministic intra-option policies, revealing a relationship between ensembles and options. Furthermore, we perform a look-ahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We demonstrate a significant performance boost of ACE over DDPG and its variants in challenging physical robot simulators.
【Keywords】:
【Paper Link】 【Pages】:5797-5804
【Authors】: Shangtong Zhang ; Hengshuai Yao
【Abstract】: In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration via making use of both optimism and pessimism of a value distribution. We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators.
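In spirit, acting on a quantile rather than the mean is a one-line change; the toy sketch below (hypothetical values, not the paper's code) shows how a low quantile gives pessimistic choices and a high quantile optimistic ones. QUOTA additionally learns a high-level policy over such quantile options, which is omitted here.

```python
import numpy as np

def quantile_action(quantile_values, j):
    """Pick the action maximizing its j-th quantile estimate.
    quantile_values has shape (n_actions, n_quantiles); a low j is
    pessimistic, a high j optimistic, while averaging over quantiles
    recovers mean-based action selection."""
    return int(np.argmax(quantile_values[:, j]))

Z = np.array([[0.1, 0.5, 0.9],    # action 0: wide return distribution
              [0.4, 0.45, 0.5]])  # action 1: narrow return distribution
print(quantile_action(Z, 0))  # pessimistic -> action 1
print(quantile_action(Z, 2))  # optimistic  -> action 0
```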
【Keywords】:
【Paper Link】 【Pages】:5805-5812
【Authors】: Suwei Zhang ; Yuan Yao ; Feng Xu ; Hanghang Tong ; Xiaohui Yan ; Jian Lu
【Abstract】: Hashtags can greatly facilitate content navigation and improve user engagement in social media. Meaningful as it might be, recommending hashtags for photo sharing services such as Instagram and Pinterest remains a daunting task due to the following two reasons. On the endogenous side, posts in photo sharing services often contain both images and text, which are likely to be correlated with each other. Therefore, it is crucial to coherently model both image and text as well as the interaction between them. On the exogenous side, hashtags are generated by users and different users might come up with different tags for similar posts, due to their different preference and/or community effect. Therefore, it is highly desirable to characterize the users’ tagging habits. In this paper, we propose an integral and effective hashtag recommendation approach for photo sharing services. In particular, the proposed approach considers both the endogenous and exogenous effects by a content modeling module and a habit modeling module, respectively. For the content modeling module, we adopt the parallel co-attention mechanism to coherently model both image and text as well as the interaction between them; for the habit modeling module, we introduce an external memory unit to characterize the historical tagging habit of each user. The overall hashtag recommendations are generated on the basis of both the post features from the content modeling module and the habit influences from the habit modeling module. We evaluate the proposed approach on real Instagram data. The experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods in terms of recommendation accuracy, and that both content modeling and habit modeling contribute significantly to the overall recommendation accuracy.
【Keywords】:
【Paper Link】 【Pages】:5813-5820
【Authors】: Yan-Ya Zhang ; Ming Li
【Abstract】: Code clones are common in software development and often lead to software defects or copyright infringement. Researchers have paid significant attention to code clone detection, and many methods have been proposed. However, the patterns for generating the code clones do not always remain the same. In order to fool the clone detection systems, the plagiarists, also known as clone creators, usually conduct a series of tricky modifications on the code fragments to make the clone difficult to detect. The existing clone detection approaches, which neglect the dynamics of the “contest” between the plagiarist and the detectors, are doomed to lack robustness against adversarial revision of the code. In this paper, we propose a novel clone detection approach, namely ACD, to mimic the adversarial process between the plagiarist and the detector, which enables us to not only build a strong clone detector but also model the behavior of the plagiarists. Such a plagiarist model may in turn help to understand the vulnerability of the current software clone detection tools. Experiments show that the learned policy of the plagiarist can help us build a stronger clone detector, which outperforms the existing clone detection methods.
【Keywords】:
【Paper Link】 【Pages】:5821-5828
【Authors】: Yao Zhang ; Wen-Ping Fan ; Xuan Wu ; Hua Chen ; Bin-Yang Li ; Min-Ling Zhang
【Abstract】: Virtual desktop infrastructure (VDI) is a virtualization technology that hosts desktop operating systems on centralized servers in a data center of a private or public cloud. Effective resource management is of crucial importance for VDI customers, where maintaining sufficient virtual machines helps guarantee a satisfactory user experience while turning off spare virtual machines helps save running cost. Generally, existing techniques work in a passive manner by either driving available capacity reactively or configuring management schedules manually. In this paper, a novel proactive resource management approach is proposed which aims to predict VDI pool workload adaptively by utilizing CoArse to Fine historical dEscriptive (CAFE) features. Specifically, the aggregate session count from pool end users serves as the basis for workload measurement and predictive model induction. Extensive experiments on real VDI customer data sets clearly validate the effectiveness of multi-grained features for VDI workload prediction. Furthermore, practical insights identified in our VDI data analytics are also discussed.
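The paper's exact feature set is not spelled out in the abstract; as an assumed illustration of coarse-to-fine historical descriptors, the sketch below summarizes an hourly session-count series with trailing windows of several granularities. The window sizes and statistics are hypothetical choices.

```python
import numpy as np

def coarse_to_fine_features(sessions, t, scales=(1, 6, 24, 168)):
    """Build multi-grained descriptive features for hour t from an hourly
    session-count series: for each trailing window size (in hours), take
    the mean and max of the recent history."""
    feats = []
    for w in scales:
        window = sessions[max(0, t - w):t]
        feats += [window.mean(), window.max()]
    return np.array(feats)

rng = np.random.default_rng(1)
series = rng.poisson(lam=100, size=24 * 30).astype(float)  # a month of hourly counts
x = coarse_to_fine_features(series, t=500)  # input for a downstream predictor
```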
【Keywords】:
【Paper Link】 【Pages】:5829-5836
【Authors】: Yingxue Zhang ; Soumyasundar Pal ; Mark Coates ; Deniz Üstebay
【Abstract】: Recently, techniques for applying convolutional neural networks to graph-structured data have emerged. Graph convolutional neural networks (GCNNs) have been used to address node and graph classification and matrix completion. Although the performance has been impressive, the current implementations have limited capability to incorporate uncertainty in the graph structure. Almost all GCNNs process a graph as though it is a ground-truth depiction of the relationship between nodes, but often the graphs employed in applications are themselves derived from noisy data or modelling assumptions. Spurious edges may be included; other edges may be missing between nodes that have very strong relationships. In this paper we adopt a Bayesian approach, viewing the observed graph as a realization from a parametric family of random graphs. We then target inference of the joint posterior of the random graph parameters and the node (or graph) labels. We present the Bayesian GCNN framework and develop an iterative learning procedure for the case of assortative mixed-membership stochastic block models. We present the results of experiments that demonstrate that the Bayesian formulation can provide better performance when there are very few labels available during the training process.
【Keywords】:
【Paper Link】 【Pages】:5837-5844
【Abstract】: Data features can usually be organized in a hierarchical structure to reflect the relations among them. Most previous studies that utilize the hierarchical structure to help improve the performance of supervised learning tasks can only handle structures of a limited height, such as 2. In this paper, we propose a Deep Hierarchical Structure (DHS) method to handle hierarchical structures of arbitrary height with a convex objective function. The DHS method relies on the exponents of the edge weights in the hierarchical structure, but the exponents need to be given by users or set to be identical by default, which may be suboptimal. Based on the DHS method, we propose a variant to learn the exponents from data. Moreover, we consider the case where even the hierarchical structure is not available. Based on the DHS method, we propose a Learning Deep Hierarchical Structure (LDHS) method which can learn the hierarchical structure via a generalized fused-Lasso regularizer and a proposed sequential constraint. All the optimization problems are solved by proximal methods where each subproblem has an efficient solution. Experiments on synthetic and real-world datasets show the effectiveness of the proposed methods.
【Keywords】:
【Paper Link】 【Pages】:5845-5852
【Authors】: Yudong Zhang ; Wenhao Zheng ; Ming Li
【Abstract】: Semantic feature learning for natural language and programming language is a preliminary step in addressing many software mining tasks. Many existing methods leverage information in the lexicon and syntax to learn features for textual data. However, such information is inadequate to represent the entire semantics of either a text sentence or a code snippet. This motivates us to propose a new approach to learn semantic features for both languages, through extracting three levels of information, namely global, local and sequential information, from textual data. For tasks involving both modalities, we project the data of both types into a uniform feature space so that the complementary knowledge in between can be utilized in their representation. In this paper, we build a novel and general-purpose feature learning framework called UniEmbed, to uniformly learn comprehensive semantic representations for both natural language and programming language. Experimental results on three real-world software mining tasks show that UniEmbed outperforms state-of-the-art models in feature learning and demonstrate the capacity and effectiveness of our model.
【Keywords】:
【Paper Link】 【Pages】:5853-5860
【Authors】: Zheng Zhang ; Guo-Sen Xie ; Yang Li ; Sheng Li ; Zi Huang
【Abstract】: Due to its low storage cost and fast query speed, hashing has been recognized to accomplish similarity search in large-scale multimedia retrieval applications. Particularly, supervised hashing has recently received considerable research attention by leveraging the label information to preserve the pairwise similarities of data points in the Hamming space. However, there still remain two crucial bottlenecks: 1) the learning process of the full pairwise similarity preservation is computationally unaffordable and unscalable to deal with big data; 2) the available category information of data is not well-explored to learn discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which aims to directly embed the transformed semantic information into the asymmetric similarity approximation and discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while skillfully handling the cumbersome n×n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that our SADIH can clearly outperform the state-of-the-art baselines with the additional benefit of lower computational costs.
【Keywords】:
【Paper Link】 【Pages】:5861-5868
【Authors】: Junzhou Zhao ; Shuo Shang ; Pinghui Wang ; John C. S. Lui ; Xiangliang Zhang
【Abstract】: Cardinality constrained submodular function maximization, which aims to select a subset of size at most k to maximize a monotone submodular utility function, is the key in many data mining and machine learning applications such as data summarization and maximum coverage problems. When data is given as a stream, streaming submodular optimization (SSO) techniques are desired. Existing SSO techniques can only apply to insertion-only streams where each element has an infinite lifespan, and sliding-window streams where each element has the same lifespan (i.e., window size). However, elements in some data streams may have arbitrarily different lifespans, and this requires addressing SSO over streams with inhomogeneous-decays (SSO-ID). This work formulates the SSO-ID problem and presents three algorithms: BASIC-STREAMING is a basic streaming algorithm that achieves a (1/2 − ε) approximation factor; HISTAPPROX improves the efficiency significantly and achieves a (1/3 − ε) approximation factor; HISTSTREAMING is a streaming version of HISTAPPROX and uses heuristics to further improve the efficiency. Experiments conducted on real data demonstrate that HISTSTREAMING can find high quality solutions and is up to two orders of magnitude faster than the naive GREEDY algorithm.
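These algorithms build on the classical threshold rule for streaming submodular maximization; below is a minimal sketch of that insertion-only building block (a single guess of OPT, no decays), not the paper's decay-aware algorithms. The full SIEVE-STREAMING scheme runs O(log k / ε) such threshold instances in parallel over geometrically spaced OPT guesses.

```python
def sieve_streaming(stream, f, k, opt_guess):
    """Classical threshold rule behind (1/2 - eps) streaming submodular
    maximization: keep an arriving element if its marginal gain clears
    (opt/2 - f(S)) / (k - |S|)."""
    S, fS = [], 0.0
    for e in stream:
        if len(S) == k:
            break
        gain = f(S + [e]) - fS
        if gain >= (opt_guess / 2 - fS) / (k - len(S)):
            S.append(e)
            fS += gain
    return S, fS

# Toy coverage function: each element is a set of covered items.
universe_sets = [{1, 2}, {2, 3}, {4}, {1, 4, 5}, {6}]
f = lambda S: float(len(set().union(*S))) if S else 0.0
print(sieve_streaming(universe_sets, f, k=2, opt_guess=6.0))
```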
【Keywords】:
【Paper Link】 【Pages】:5869-5876
【Authors】: Chenxiao Zhao ; P. Thomas Fletcher ; Mixue Yu ; Yaxin Peng ; Guixu Zhang ; Chaomin Shen
【Abstract】: Many deep learning models are vulnerable to the adversarial attack, i.e., imperceptible but intentionally-designed perturbations to the input can cause incorrect output of the networks. In this paper, using information geometry, we provide a reasonable explanation for the vulnerability of deep learning models. By considering the data space as a non-linear space with the Fisher information metric induced from a neural network, we first propose an adversarial attack algorithm termed one-step spectral attack (OSSA). The method is described by a constrained quadratic form of the Fisher information matrix, where the optimal adversarial perturbation is given by the first eigenvector, and the vulnerability is reflected by the eigenvalues. The larger an eigenvalue is, the more vulnerable the model is to attacks along the corresponding eigenvector. Taking advantage of this property, we also propose an adversarial detection method with the eigenvalues serving as characteristics. Both our attack and detection algorithms are numerically optimized to work efficiently on large datasets. Our evaluations show superior performance compared with other methods, implying that Fisher information is a promising approach for investigating adversarial attacks and defenses.
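As a sketch of the spectral idea, assume a linear softmax classifier standing in for a deep network (for which the per-class gradients g_c would instead come from backpropagation). Power iteration then finds the leading eigenvector of the input-space Fisher information matrix G = Σ_c p_c g_c g_cᵀ without ever materializing G; all names below are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fisher_top_eigvec(W, x, iters=100):
    """Leading eigenpair of G = sum_c p_c(x) g_c g_c^T for a linear
    softmax model, where g_c = grad_x log p_c(x) = w_c - sum_j p_j w_j.
    The eigenvector is the one-step attack direction; the eigenvalue
    measures local vulnerability."""
    p = softmax(W @ x)
    G_half = (W - p @ W) * np.sqrt(p)[:, None]  # rows: sqrt(p_c) * g_c
    v = np.random.default_rng(0).normal(size=x.shape)
    for _ in range(iters):
        v = G_half.T @ (G_half @ v)             # multiply by G without forming it
        v /= np.linalg.norm(v)
    eigval = v @ (G_half.T @ (G_half @ v))
    return v, eigval

rng = np.random.default_rng(0)
W, x = rng.normal(size=(10, 32)), rng.normal(size=32)
direction, eigval = fisher_top_eigvec(W, x)
x_adv = x + 0.1 * direction  # one-step perturbation along the eigenvector
```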
【Keywords】:
【Paper Link】 【Pages】:5877-5884
【Authors】: Pengpeng Zhao ; Haifeng Zhu ; Yanchi Liu ; Jiajie Xu ; Zhixu Li ; Fuzhen Zhuang ; Victor S. Sheng ; Xiaofang Zhou
【Abstract】: Next Point-of-Interest (POI) recommendation is of great value for both location-based service providers and users. However, the state-of-the-art Recurrent Neural Networks (RNNs) rarely consider the spatio-temporal intervals between neighbor check-ins, which are essential for modeling user check-in behaviors in next POI recommendation. To this end, in this paper, we propose a new Spatio-Temporal Gated Network (STGN) by enhancing the long short-term memory network, where spatio-temporal gates are introduced to capture the spatio-temporal relationships between successive check-ins. Specifically, two pairs of time gate and distance gate are designed to control the short-term interest and the long-term interest updates, respectively. Moreover, we introduce coupled input and forget gates to reduce the number of parameters and further improve efficiency. Finally, we evaluate the proposed model using four real-world datasets from various location-based social networks. The experimental results show that our model significantly outperforms the state-of-the-art approaches for next POI recommendation.
【Keywords】:
【Paper Link】 【Pages】:5885-5892
【Authors】: Shengjia Zhao ; Jiaming Song ; Stefano Ermon
【Abstract】: A key advance in learning generative models is the use of amortized inference distributions that are jointly trained with the models. We find that existing training objectives for variational autoencoders can lead to inaccurate amortized inference distributions and, in some cases, improving the objective provably degrades the inference quality. In addition, it has been observed that variational autoencoders tend to ignore the latent variables when combined with a decoding distribution that is too flexible. We again identify the cause in existing training criteria and propose a new class of objectives (Info-VAE) that mitigate these problems. We show that our model can significantly improve the quality of the variational posterior and can make effective use of the latent features regardless of the flexibility of the decoding distribution. Through extensive qualitative and quantitative analyses, we demonstrate that our models outperform competing approaches on multiple performance metrics.
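One common instantiation of such objectives (assumed here for illustration; the paper covers a broader family) augments or replaces part of the KL term with a divergence between the aggregate posterior q(z) and the prior p(z), for example a kernel MMD, so the decoder cannot simply ignore the latent code. A minimal numpy estimator:

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Squared-MMD estimate with an RBF kernel between samples
    X ~ q(z) (aggregate posterior) and Y ~ p(z) (prior)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
z_post = rng.normal(loc=0.5, size=(200, 2))   # stand-in encoder samples
z_prior = rng.normal(size=(200, 2))
print(rbf_mmd2(z_post, z_prior))  # added to the training loss with some weight
```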
【Keywords】:
【Paper Link】 【Pages】:5893-5900
【Authors】: Yang Zhao ; Jianyi Zhang ; Changyou Chen
【Abstract】: Scalable Bayesian sampling is playing an important role in modern machine learning, especially in fast-developing unsupervised (deep) learning models. While tremendous progress has been achieved via scalable Bayesian sampling methods such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD), the generated samples are typically highly correlated. Moreover, their sample-generation processes are often criticized as inefficient. In this paper, we propose a novel self-adversarial learning framework that automatically learns a conditional generator to mimic the behavior of a Markov kernel (transition kernel). High-quality samples can be efficiently generated by direct forward passes through a learned generator. Most importantly, the learning process adopts a self-learning paradigm, requiring no information on existing Markov kernels, e.g., knowledge of how to draw samples from them. Specifically, our framework learns to use current samples, either from the generator or pre-provided training data, to update the generator such that the generated samples progressively approach a target distribution, thus it is called self-learning. Experiments on both synthetic and real datasets verify the advantages of our framework, outperforming related methods in terms of both sampling efficiency and sample quality.
【Keywords】:
【Paper Link】 【Pages】:5901-5908
【Authors】: Hao Zheng ; Lin Yang ; Jianxu Chen ; Jun Han ; Yizhe Zhang ; Peixian Liang ; Zhuo Zhao ; Chaoli Wang ; Danny Z. Chen
【Abstract】: Deep learning has been applied successfully to many biomedical image segmentation tasks. However, due to the diversity and complexity of biomedical image data, manual annotation for training common deep learning models is very time-consuming and labor-intensive, especially because normally only biomedical experts can annotate image data well. Human experts are often involved in a long and iterative process of annotation, as in active learning type annotation schemes. In this paper, we propose representative annotation (RA), a new deep learning framework for reducing annotation effort in biomedical image segmentation. RA uses unsupervised networks for feature extraction and selects representative image patches for annotation in the latent space of learned feature descriptors, which implicitly characterizes the underlying data while minimizing redundancy. A fully convolutional network (FCN) is then trained using the annotated selected image patches for image segmentation. Our RA scheme offers three compelling advantages: (1) It leverages the ability of deep neural networks to learn better representations of image data; (2) it performs one-shot selection for manual annotation and frees annotators from the iterative process of common active learning based annotation schemes; (3) it can be deployed to 3D images with simple extensions. We evaluate our RA approach using three datasets (two 2D and one 3D) and show our framework yields competitive segmentation results compared with state-of-the-art methods.
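A minimal stand-in for the one-shot selection step, assuming latent feature descriptors have already been extracted by the unsupervised network: cluster the patches in latent space and annotate the patch closest to each centroid. This is an illustrative simplification, not the paper's exact selection procedure.

```python
import numpy as np

def select_representatives(latent, n_annotate, iters=50, seed=0):
    """One-shot selection of patches to annotate: k-means in the learned
    latent space, then for each cluster pick the patch nearest its centroid."""
    rng = np.random.default_rng(seed)
    centers = latent[rng.choice(len(latent), n_annotate, replace=False)]
    for _ in range(iters):
        d = ((latent[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(n_annotate):
            pts = latent[assign == c]
            if len(pts):
                centers[c] = pts.mean(0)   # standard k-means centroid update
    d = ((latent[:, None, :] - centers[None]) ** 2).sum(-1)
    return [int(d[:, c].argmin()) for c in range(n_annotate)]

rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 32))             # stand-in latent descriptors of patches
to_label = select_representatives(Z, n_annotate=10)
```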
【Keywords】:
【Paper Link】 【Pages】:5909-5916
【Authors】: Hao Zheng ; Yizhe Zhang ; Lin Yang ; Peixian Liang ; Zhuo Zhao ; Chaoli Wang ; Danny Z. Chen
【Abstract】: 3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them together, one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network based meta-learner to learn how to improve the results from 2D and 3D models (base-learners). Then, to minimize over-fitting for our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods.
【Keywords】:
【Paper Link】 【Pages】:5917-5924
【Authors】: Huangjie Zheng ; Jiangchao Yao ; Ya Zhang ; Ivor W. Tsang ; Jia Wang
【Abstract】: In information theory, Fisher information and Shannon information (entropy) are respectively used to quantify the uncertainty associated with the distribution modeling and the uncertainty in specifying the outcome of given variables. These two quantities are complementary and are jointly applied to information behavior analysis in most cases. The uncertainty property in information asserts a fundamental trade-off between Fisher information and Shannon information, which sheds light on the relationship between the encoder and the decoder in variational auto-encoders (VAEs). In this paper, we investigate VAEs in the Fisher-Shannon plane, and demonstrate that the representation learning and the log-likelihood estimation are intrinsically related to these two information quantities. Through extensive qualitative and quantitative experiments, we provide a better understanding of VAEs in tasks such as high-resolution reconstruction, and representation learning in the perspective of Fisher information and Shannon information. We further propose a variant of VAEs, termed the Fisher auto-encoder (FAE), to meet practical needs in balancing Fisher information and Shannon information. Our experimental results have demonstrated its promise in improving the reconstruction accuracy and avoiding the non-informative latent code observed in previous works.
【Keywords】:
【Paper Link】 【Pages】:5925-5932
【Authors】: Shuxin Zheng ; Qi Meng ; Huishuai Zhang ; Wei Chen ; Nenghai Yu ; Tie-Yan Liu
【Abstract】: Recently, the path norm was proposed as a new capacity measure for neural networks with Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. It has been shown that the generalization error bound in terms of the path norm explains the empirical generalization behaviors of ReLU neural networks better than that of other capacity measures. Moreover, optimization algorithms which take the path norm as a regularization term in the loss function, like Path-SGD, have been shown to achieve better generalization performance. However, the path norm counts the values of all paths, and hence the capacity measure based on the path norm could be improperly influenced by the dependency among different paths. It is also known that each path of a ReLU network can be represented by a small group of linearly independent basis paths with multiplication and division operations, which indicates that the generalization behavior of the network depends on only a few basis paths. Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths, to measure the capacity of neural networks more accurately. We establish a generalization error bound based on this basis-path norm, and show via extensive experiments that it explains the generalization behaviors of ReLU networks more accurately than previous capacity measures. In addition, we develop optimization algorithms which minimize the empirical risk regularized by the basis-path norm. Our experiments on benchmark datasets demonstrate that the proposed regularization method achieves clearly better performance on the test set than the previous regularization approaches.
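For contrast with the proposed basis-path norm (which restricts attention to a group of linearly independent basis paths), the standard l2 path norm it refines has a closed form: one matrix product per layer over squared weights. A bias-free sketch:

```python
import numpy as np

def l2_path_norm(weights):
    """l2 path norm of a bias-free ReLU MLP: square root of the sum, over
    all input-to-output paths, of the product of squared weights along
    the path, accumulated layer by layer."""
    v = np.ones(weights[0].shape[1])   # one accumulator per input unit
    for W in weights:
        v = (W ** 2) @ v               # extend every path by one layer
    return float(np.sqrt(v.sum()))

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(64, 32)), rng.normal(size=(10, 64))]
print(l2_path_norm(Ws))
```

Note the rescaling invariance: multiplying one layer's weights by c and dividing the next layer's by c leaves every path product, and hence the norm, unchanged.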
【Keywords】:
【Paper Link】 【Pages】:5933-5940
【Authors】: Zhuobin Zheng ; Chun Yuan ; Xinrui Zhu ; Zhihui Lin ; Yangyang Cheng ; Cheng Shi ; Jiahui Ye
【Abstract】: Learning related tasks in various domains and transferring exploited knowledge to new situations is a significant challenge in Reinforcement Learning (RL). However, most RL algorithms are data inefficient and fail to generalize in complex environments, limiting their adaptability and applicability in multi-task scenarios. In this paper, we propose Self-Supervised Mixture-of-Experts (SUM), an effective algorithm driven by predictive uncertainty estimation for multi-task RL. SUM utilizes a multi-head agent with shared parameters as experts to learn a series of related tasks simultaneously by Deep Deterministic Policy Gradient (DDPG). Each expert is extended by predictive uncertainty estimation on known and unknown states to enhance the Q-value evaluation capacity against overfitting and the overall generalization ability. These enable the agent to capture and diffuse the common knowledge across different tasks, improving sample efficiency in each task and the effectiveness of expert scheduling across multiple tasks. Instead of task-specific design as in common MoEs, a self-supervised gating network is adopted to determine a potential expert to handle each interaction from unseen environments and is calibrated completely by the uncertainty feedback from the experts without explicit supervision. To alleviate the imbalanced expert utilization that is the crux of MoE, optimization is accomplished via decayed-masked experience replay, which encourages both diversification and specialization of experts during different periods. We demonstrate that our approach learns faster and achieves better performance by efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym’s MuJoCo multi-task environments.
【Keywords】:
【Paper Link】 【Pages】:5941-5948
【Authors】: Guorui Zhou ; Na Mou ; Ying Fan ; Qi Pi ; Weijie Bian ; Chang Zhou ; Xiaoqiang Zhu ; Kun Gai
【Abstract】: Click-through rate (CTR) prediction, whose goal is to estimate the probability of a user clicking on the item, has become one of the core tasks in the advertising system. For a CTR prediction model, it is necessary to capture the latent user interest behind the user behavior data. Besides, considering the changing of the external environment and the internal cognition, user interest evolves over time dynamically. There are several CTR prediction methods for interest modeling, while most of them directly regard the representation of behavior as the interest, and lack special modeling of the latent interest behind the concrete behavior. Moreover, little work considers the changing trend of the interest. In this paper, we propose a novel model, named Deep Interest Evolution Network (DIEN), for CTR prediction. Specifically, we design an interest extractor layer to capture temporal interests from the history behavior sequence. At this layer, we introduce an auxiliary loss to supervise interest extracting at each step. As user interests are diverse, especially in the e-commerce system, we propose an interest evolving layer to capture the interest evolving process that is relative to the target item. At the interest evolving layer, the attention mechanism is embedded into the sequential structure in a novel way, and the effects of relative interests are strengthened during interest evolution. In the experiments on both public and industrial datasets, DIEN significantly outperforms the state-of-the-art solutions. Notably, DIEN has been deployed in the display advertisement system of Taobao, and obtained 20.7% improvement on CTR.
【Keywords】:
【Paper Link】 【Pages】:5949-5956
【Authors】: Xichuan Zhou ; Lang Xu ; Shujun Liu ; Yingcheng Lin ; Lei Zhang ; Cheng Zhuo
【Abstract】: This paper addresses the challenge of designing an efficient framework for real-time object detection and image compression. The proposed Compressive Convolutional Network (CCN) is basically a compressive-sensing-enabled convolutional neural network. Instead of designing different components for compressive sensing and object detection, the CCN optimizes and reuses the convolution operation for recoverable data embedding and image compression. Technically, the incoherence condition, which is the sufficient condition for recoverable data embedding, is incorporated in the first convolutional layer of the CCN model as a regularization term; therefore, the CCN convolution kernels learned by training over the VOC and COCO image sets can be used for data embedding and image compression. By reusing the convolution operation, no extra computational overhead is required for image compression. As a result, the CCN is 3.1 to 5.0 fold more efficient than the conventional approaches. In our experiments, the CCN achieved 78.1 mAP for object detection and 3.0 dB to 5.2 dB higher PSNR for image compression than the examined compressive sensing approaches.
【Keywords】:
【Paper Link】 【Pages】:5957-5964
【Authors】: Chun Jiang Zhu ; Tan Zhu ; Kam-yiu Lam ; Song Han ; Jinbo Bi
【Abstract】: We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose communication-efficient algorithms for two well-established communication models namely the message passing and the blackboard models. Given a graph with n nodes that is observed at s remote sites over time [1,t], the two proposed algorithms have communication costs Õ(ns) and Õ(n + s) (Õ hides a polylogarithmic factor), almost matching their lower bounds, Ω(ns) and Ω(n + s), respectively, in the message passing and the blackboard models. More importantly, we prove that at each time point in [1,t] our algorithms generate clustering quality nearly as good as that of centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets which confirmed the communication efficiency of our approach over baseline algorithms while achieving comparable clustering results.
【Keywords】:
【Paper Link】 【Pages】:5965-5972
【Authors】: Lin Zhu ; Yihong Chen ; Bowen He
【Abstract】: As one of the most popular techniques for solving the ranking problem in information retrieval, Learning-to-rank (LETOR) has received a lot of attention both in academia and industry due to its importance in a wide variety of data mining applications. However, most existing LETOR approaches choose to learn a single global ranking function to handle all queries, and ignore the substantial differences that exist between queries. In this paper, we propose a domain generalization strategy to tackle this problem. We propose Query-Invariant Listwise Context Modeling (QILCM), a novel neural architecture which eliminates the detrimental influence of inter-query variability by learning query-invariant latent representations, such that the ranking system could generalize better to unseen queries. We evaluate our techniques on benchmark datasets, demonstrating that QILCM outperforms previous state-of-the-art approaches by a substantial margin.
【Keywords】:
【Paper Link】 【Pages】:5973-5980
【Authors】: Qiannan Zhu ; Xiaofei Zhou ; Zeliang Song ; Jianlong Tan ; Li Guo
【Abstract】: With the rapid information explosion of news, making personalized news recommendations for users becomes an increasingly challenging problem. Many existing recommendation methods, which regard the recommendation procedure as a static process, have achieved promising recommendation performance. However, they usually fail to handle the dynamic diversity of news and users’ interests, or ignore the importance of the sequential information of users’ clicking selections. In this paper, taking full advantage of convolutional neural networks (CNN), recurrent neural networks (RNN) and the attention mechanism, we propose a deep attention neural network, DAN, for news recommendation. Our DAN model uses attention-based parallel CNNs for aggregating users’ interest features and an attention-based RNN for capturing richer hidden sequential features of users’ clicks, and combines these features for news recommendation. We conduct experiments on real-world news data sets, and the experimental results demonstrate the superiority and effectiveness of our proposed DAN model.
【Keywords】:
【Paper Link】 【Pages】:5981-5988
【Authors】: Xiaobin Zhu ; Zhuangzi Li ; Xiaoyu Zhang ; Changsheng Li ; Yaqi Liu ; Ziyu Xue
【Abstract】: Video super-resolution is a challenging task, which has attracted great attention in research and industry communities. In this paper, we propose a novel end-to-end architecture, called Residual Invertible Spatio-Temporal Network (RISTN) for video super-resolution. The RISTN can sufficiently exploit the spatial information from low-resolution to high-resolution, and effectively models the temporal consistency from consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper but more efficient. It consists of three major components: In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on the sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:5989-5996
【Authors】: Yongchun Zhu ; Fuzhen Zhuang ; Deqing Wang
【Abstract】: While Unsupervised Domain Adaptation (UDA) algorithms, in which labeled data are available only in the source domains, have been actively studied in recent years, most algorithms and theoretical results focus on Single-source Unsupervised Domain Adaptation (SUDA). However, in the practical scenario, labeled data can typically be collected from multiple diverse sources, and they might be different not only from the target domain but also from each other. Thus, domain adapters from multiple sources should not be modeled in the same way. Recent deep learning based Multi-source Unsupervised Domain Adaptation (MUDA) algorithms focus on extracting common domain-invariant representations for all domains by aligning the distributions of all pairs of source and target domains in a common feature space. However, it is often very hard to extract the same domain-invariant representations for all domains in MUDA. In addition, these methods match distributions without considering domain-specific decision boundaries between classes. To solve these problems, we propose a new framework with two alignment stages for MUDA which not only respectively aligns the distributions of each pair of source and target domains in multiple specific feature spaces, but also aligns the outputs of classifiers by utilizing the domain-specific decision boundaries. Extensive experiments demonstrate that our method can achieve remarkable results on popular benchmark datasets for image classification.
【Keywords】:
【Paper Link】 【Pages】:5997-6004
【Authors】: Han Zou ; Yuxun Zhou ; Jianfei Yang ; Huihan Liu ; Hari Prasanna Das ; Costas J. Spanos
【Abstract】: We propose a novel domain adaptation framework, namely Consensus Adversarial Domain Adaptation (CADA), that gives freedom to both target encoder and source encoder to embed data from both domains into a common domain-invariant feature space until they achieve consensus during adversarial learning. In this manner, the domain discrepancy can be further minimized in the embedded space, yielding more generalizable representations. The framework is also extended to establish a new few-shot domain adaptation scheme (F-CADA), that remarkably enhances the ADA performance by efficiently propagating a few labeled data once available in the target domain. Extensive experiments are conducted on the task of digit recognition across multiple benchmark datasets and a real-world problem involving WiFi-enabled device-free gesture recognition under spatial dynamics. The results show the compelling performance of CADA versus the state-of-the-art unsupervised domain adaptation (UDA) and supervised domain adaptation (SDA) methods. Numerical experiments also demonstrate that F-CADA can significantly improve the adaptation performance even with sparsely labeled data in the target domain.
【Keywords】:
【Paper Link】 【Pages】:6006-6013
【Authors】: Michael E. Akintunde ; Andreea Kevorchian ; Alessio Lomuscio ; Edoardo Pirovano
【Abstract】: We introduce agent-environment systems where the agent is stateful and executing a ReLU recurrent neural network. We define and study their verification problem by providing equivalences of recurrent and feed-forward neural networks on bounded execution traces. We give a sound and complete procedure for their verification against properties specified in a simplified version of LTL on bounded executions. We present an implementation and discuss the experimental results obtained.
【Keywords】:
【Paper Link】 【Pages】:6014-6021
【Authors】: Christopher Archibald ; Delma Nieves-Rivera
【Abstract】: The performance of agents in many domains with continuous action spaces depends not only on their ability to select good actions to execute, but also on their ability to execute planned actions precisely. This ability, which has been called an agent’s execution skill, is an important characteristic of an agent which can have a significant impact on its success. In this paper, we address the problem of estimating the execution skill of an agent given observations of that agent acting in a domain. Each observation includes the executed action and a description of the state in which the action was executed and the reward received, but notably excludes the action that the agent intended to execute. We previously introduced this problem and demonstrated that estimating an agent’s execution skill is possible under certain conditions. Our previous method focused entirely on the reward that the agent received from executed actions and assumed that the agent was able to select the optimal action for each state. This paper addresses the execution skill estimation problem from an entirely different perspective, focusing instead on the action that was executed. We present a Bayesian framework for reasoning about action observations and show that it is able to outperform previous methods under the same conditions. We also show that the flexibility of this framework allows it to be applied in settings where the previous limiting assumptions are not met. The success of the proposed method is demonstrated experimentally in a toy domain as well as the domain of computational billiards.
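A toy version of the estimation problem, with the strong simplification that intended actions are known (the paper's setting hides them and reasons about them through the Bayesian framework): place a grid prior over the execution-noise level σ and update it from observed 1-D actions assuming executed = intended + N(0, σ²). All names are illustrative.

```python
import numpy as np

def skill_posterior(intended, executed, sigmas):
    """Posterior over an agent's execution-noise level sigma (its skill)
    under a uniform prior over the candidate sigmas."""
    err = np.asarray(executed) - np.asarray(intended)
    log_post = np.array([(-0.5 * (err / s) ** 2 - np.log(s)).sum()
                         for s in sigmas])
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

sigmas = np.linspace(0.1, 2.0, 20)
rng = np.random.default_rng(0)
true_sigma = 0.6
intended = rng.uniform(-1, 1, size=200)
executed = intended + rng.normal(0, true_sigma, size=200)
post = skill_posterior(intended, executed, sigmas)
print(sigmas[post.argmax()])              # close to 0.6
```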
【Keywords】:
【Paper Link】 【Pages】:6022-6029
【Authors】: Vincenzo Auletta ; Angelo Fanelli ; Diodato Ferraioli
【Abstract】: Friedkin and Johnsen (1990) modeled opinion formation in social networks as a dynamic process which evolves in rounds: at each round each agent updates her expressed opinion to a weighted average of her innate belief and the opinions expressed in the previous round by her social neighbors. The stubbornness level of an agent represents the tendency of the agent to express an opinion close to her innate belief. Motivated by the observation that innate beliefs, stubbornness levels and even social relations can co-evolve together with the expressed opinions, we present a new model of opinion formation where the dynamics runs in a co-evolving environment. We assume that agents’ stubbornness and social relations can vary arbitrarily, while their innate beliefs slowly change as a function of the opinions they expressed in the past. We prove that, in our model, the opinion formation dynamics converges to a consensus if reasonable conditions on the structure of the social relationships and on how the personal beliefs can change are satisfied. Moreover, we discuss how this result applies in several simpler (but realistic) settings.
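For reference, the classical (non-co-evolving) Friedkin-Johnsen update that the paper generalizes can be simulated in a few lines; the weights and values below are illustrative, and the paper's co-evolving model would additionally let beliefs, stubbornness and the social weights change between rounds.

```python
import numpy as np

def fj_dynamics(W, beliefs, stubbornness, rounds=100):
    """Friedkin-Johnsen updates: each agent's expressed opinion moves to a
    stubbornness-weighted mix of its innate belief and the weighted average
    of its neighbors' previous opinions (W row-stochastic)."""
    x = beliefs.copy()
    for _ in range(rounds):
        x = stubbornness * beliefs + (1 - stubbornness) * (W @ x)
    return x

W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])    # row-stochastic social weights
b = np.array([0.0, 1.0, 0.5])      # innate beliefs
s = np.array([0.3, 0.3, 0.3])      # stubbornness levels
print(fj_dynamics(W, b, s))        # the fixed-point expressed opinions
```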
【Keywords】:
【Paper Link】 【Pages】:6030-6037
【Authors】: Francesco Belardinelli ; Alessio Lomuscio ; Vadim Malvone
【Abstract】: We investigate the verification of Multi-agent Systems against strategic properties expressed in Alternating-time Temporal Logic under the assumptions of imperfect information and perfect recall. To this end, we develop a three-valued semantics for concurrent game structures upon which we define an abstraction method. We prove that concurrent game structures with imperfect information admit perfect information abstractions that preserve three-valued satisfaction. Further, we present a refinement procedure to deal with cases where the value of a specification is undefined. We illustrate the overall procedure in a variant of the Train Gate Controller scenario under imperfect information and perfect recall.
【Keywords】:
【Paper Link】 【Pages】:6038-6045
【Authors】: Ziyu Chen ; Xingqiong Jiang ; Yanchen Deng ; Dingding Chen ; Zhongshi He
【Abstract】: Belief propagation approaches, such as Max-Sum and its variants, are important methods to solve large-scale Distributed Constraint Optimization Problems (DCOPs). However, for problems with n-ary constraints, these algorithms face a huge challenge since their computational complexity scales exponentially with the number of variables a function holds. In this paper, we present a generic and easy-to-use method based on a branch-and-bound technique to solve the issue, called Function Decomposing and State Pruning (FDSP). We theoretically prove that FDSP can provide monotonically non-increasing upper bounds and speed up belief-propagation-based incomplete DCOP algorithms without an effect on solution quality. Also, our empirical evaluation indicates that FDSP can reduce the search space by at least 97% and effectively accelerate Max-Sum, compared with the state-of-the-art.
【Keywords】:
【Paper Link】 【Pages】:6046-6053
【Authors】: Emilio Cruciani ; Emanuele Natale ; Giacomo Scornavacca
【Abstract】: We investigate the behavior of a simple majority dynamics on networks of agents whose interaction topology exhibits a community structure. By leveraging recent advancements in the analysis of dynamics, we prove that, when the states of the nodes are randomly initialized, the system rapidly and stably converges to a configuration in which the communities maintain internal consensus on different states. This is the first analytical result on the behavior of dynamics for non-consensus problems on non-complete topologies, based on the first symmetry-breaking analysis in such a setting. Our result has several implications in different contexts in which dynamics are adopted for computational and biological modeling purposes. In the context of Label Propagation Algorithms, a class of widely used heuristics for community detection, it represents the first theoretical result on the behavior of a distributed label propagation algorithm with quasi-linear message complexity. In the context of evolutionary biology, dynamics such as the Moran process have been used to model the spread of mutations in genetic populations (Lieberman, Hauert, and Nowak 2005); our result shows that, when the probability of adoption of a given mutation by a node of the evolutionary graph depends super-linearly on the frequency of the mutation in the neighborhood of the node and the underlying evolutionary graph exhibits a community structure, there is a non-negligible probability for species differentiation to occur.
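The phenomenon is easy to reproduce empirically. A sketch with a planted two-community graph and synchronous majority updates follows; the parameters are illustrative, and the paper's analysis covers a more general class of dynamics and gives the formal convergence guarantees.

```python
import numpy as np

def majority_round(A, state):
    """One synchronous round of majority dynamics over states in {-1, +1}:
    each node adopts the majority state among its neighbors (ties keep
    the old state)."""
    votes = A @ state
    return np.where(votes > 0, 1, np.where(votes < 0, -1, state))

def two_community_graph(n, p_in, p_out, rng):
    """Planted-partition graph: edge probability p_in inside each of two
    equal communities, p_out across them."""
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = np.triu(rng.random((n, n)) < probs, 1).astype(float)
    return A + A.T, labels

rng = np.random.default_rng(0)
A, labels = two_community_graph(200, p_in=0.3, p_out=0.02, rng=rng)
state = rng.choice([-1, 1], size=200)          # random initialization
for _ in range(20):
    state = majority_round(A, state)
# With dense-enough communities, each one typically reaches internal consensus:
print([state[labels == c].mean() for c in (0, 1)])
```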
【Keywords】:
【Paper Link】 【Pages】:6054-6061
【Authors】: Tarun Gupta ; Akshat Kumar ; Praveen Paruchuri
【Abstract】: Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision-making under uncertainty. However, their computational complexity limits the practical impact. To address this, we focus on a class of Dec-MDPs consisting of independent collaborating agents that are tied together through a global reward function that depends upon their entire histories of states and actions to accomplish joint tasks. To overcome the scalability barrier, our main contributions are: (a) We propose a new actor-critic based Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SF), a value-function representation that decouples the dynamics of the environment from the rewards; (b) We then present Dec-ESR (Decentralized Event-based Successor Representation), which generalizes learning for event-based Dec-MDPs using SF within an end-to-end deep RL framework; (c) We also show that Dec-ESR allows useful transfer of information on related but different tasks, and hence bootstraps the learning for faster convergence on new tasks; (d) For validation purposes, we test our approach on a large multi-agent coverage problem which models schedule coordination of agents in a real urban subway network and achieves better quality solutions than previous best approaches.
【Keywords】:
【Paper Link】 【Pages】:6062-6069
【Authors】: Yanlin Han ; Piotr J. Gmytrasiewicz
【Abstract】: This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms the other state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.
【Keywords】:
【Paper Link】 【Pages】:6070-6078
【Authors】: Zehong Hu ; Jie Zhang ; Zhao Li
【Abstract】: Incentive mechanisms that assume agents to be fully rational may fail due to the bounded rationality of agents in practice. It is thus crucial to evaluate to what extent mechanisms can resist agents’ bounded rationality, termed robustness. In this paper, we propose a general empirical framework for robustness evaluation. One novelty of our framework is to develop a robustness formulation that is generally applicable to different types of incentive mechanisms and bounded rationality models. This formulation considers not only the incentives to agents but also the performance of mechanisms. The other novelty lies in converting the empirical robustness computation into a continuum-armed bandit problem, and then developing an efficient solver with a theoretically guaranteed upper bound on the error rate. We also conduct extensive experiments using various mechanisms to verify the advantages and practicability of our robustness evaluation framework.
【Keywords】:
【Paper Link】 【Pages】:6079-6086
【Authors】: Woojun Kim ; Myungsik Cho ; Youngchul Sung
【Abstract】: In this paper, we propose a new learning technique named message-dropout to improve the performance for multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In the first application scenario of multi-agent systems in which direct message communication among agents is allowed, the message-dropout technique drops out the received messages from other agents in a block-wise manner with a certain probability in the training phase and compensates for this effect by multiplying the weights of the dropped-out block units with a correction probability. The applied message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication and makes learning robust against communication errors in the execution phase. In the second application scenario of centralized training with decentralized execution, we particularly consider the application of the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique for several games, and numerical results show that the proposed message-dropout technique with proper dropout rate improves the reinforcement learning performance significantly in terms of the training speed and the steady-state performance in the execution phase.
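A sketch of the block-wise idea, written with the standard inverted-dropout rescaling at training time rather than the paper's weight-scaling correction (the two are equivalent in expectation); all names are illustrative.

```python
import numpy as np

def message_dropout(messages, p, rng, training=True):
    """Block-wise message-dropout: during training, zero out each other
    agent's whole message with probability p and rescale the survivors by
    1/(1-p) so the expected input to the policy network is unchanged."""
    if not training or p == 0.0:
        return np.concatenate(messages)
    kept = []
    for m in messages:
        keep = rng.random() >= p            # drop the whole block, not units
        kept.append(m * keep / (1.0 - p))
    return np.concatenate(kept)

rng = np.random.default_rng(0)
msgs = [rng.normal(size=4) for _ in range(3)]   # messages from 3 other agents
x = message_dropout(msgs, p=0.5, rng=rng)        # input to the agent's network
```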
【Keywords】:
【Paper Link】 【Pages】:6087-6095
【Authors】: Jiaoyang Li ; Daniel Harabor ; Peter J. Stuckey ; Hang Ma ; Sven Koenig
【Abstract】: We describe a new way of reasoning about symmetric collisions for Multi-Agent Path Finding (MAPF) on 4-neighbor grids. We also introduce a symmetry-breaking constraint to resolve these conflicts. This specialized technique allows us to identify and eliminate, in a single step, all permutations of two currently assigned but incompatible paths. Each such permutation has exactly the same cost as a current path, and each one results in a new collision between the same two agents. We show that the addition of symmetry-breaking techniques can lead to an exponential reduction in the size of the search space of CBS, a popular framework for MAPF, and report significant improvements in both runtime and success rate versus CBSH and EPEA* – two recent and state-of-the-art MAPF algorithms.
【Keywords】:
【Paper Link】 【Pages】:6096-6103
【Authors】: Xu Li ; Mingming Sun ; Ping Li
【Abstract】: We introduce the discussion mechanism into the multi-agent communicating encoder-decoder architecture for Natural Language Generation (NLG) tasks and prove that by applying the discussion mechanism, the communication between agents becomes more effective. Generally speaking, an encoder-decoder architecture predicts the target sequence word by word in several time steps. At each time step of prediction, agents with the discussion mechanism predict the target word after several discussion steps. In the first step of discussion, agents make their choice independently and express their decision to other agents. In the next discussion step, agents collect other agents’ decisions to update their own decisions, then express the updated decisions to others again. After several iterations, the agents make their final decision based on a well-communicated situation. The benefit of the discussion mechanism is that multiple encoders can be designed as different structures to fit the specified input or to fetch different representations of inputs. We train and evaluate the discussion mechanism on Table to Text Generation, Text Summarization and Image Caption tasks, respectively. Our empirical results demonstrate that the proposed multi-agent discussion mechanism is helpful for maximizing the utility of the communication between agents.
【Keywords】:
【Paper Link】 【Pages】:6104-6111
【Authors】: Chun Kai Ling ; Fei Fang ; J. Zico Kolter
【Abstract】: With the recent advances in solving large, zero-sum extensive form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although a recent work provides a powerful differentiable end-to-end learning framework which embeds a game solver within a deep-learning framework, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large scale problems, leading to poor practicality. In this paper, we address these limitations and propose a framework that is applicable to more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method which exploits the structure of extensive-form games, yielding significantly faster computation for both game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.
【Keywords】:
【Paper Link】 【Pages】:6112-6119
【Authors】: Andrei Lupu ; Audrey Durand ; Doina Precup
【Abstract】: Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits and certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.
【Keywords】:
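To make the idea of conditional optimism concrete, the sketch below modifies the standard UCB1 index with an extra bonus on the arm the observed target just played. The `extra` parameter and the exact form of the bonus are illustrative assumptions; the paper derives the actual modifier and its O(ln T) regret guarantee.

```python
import math

def ucb_with_target(pulls, rewards, t, target_action, extra=1.0):
    """UCB1 arm selection with a hypothetical optimism bonus on the arm
    most recently played by the observed target policy.

    pulls[a], rewards[a]: pull count and reward sum for arm a so far.
    t: current round (t >= 1); target_action: arm the target just played.
    """
    def index(a):
        if pulls[a] == 0:
            return float("inf")            # play each arm at least once
        mean = rewards[a] / pulls[a]
        bonus = math.sqrt(2 * math.log(t) / pulls[a])
        boost = extra * bonus if a == target_action else 0.0
        return mean + bonus + boost        # conditional optimism
    return max(range(len(pulls)), key=index)
```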
【Paper Link】 【Pages】:6120-6127
【Authors】: Yuexin Ma ; Xinge Zhu ; Sibo Zhang ; Ruigang Yang ; Wenping Wang ; Dinesh Manocha
【Abstract】: To safely and efficiently navigate in complex urban traffic, autonomous vehicles must make responsible predictions in relation to surrounding traffic-agents (vehicles, bicycles, pedestrians, etc.). A challenging and critical task is to explore the movement patterns of different traffic-agents and predict their future trajectories accurately to help the autonomous vehicle make reasonable navigation decisions. To solve this problem, we propose a long short-term memory-based (LSTM-based) real-time traffic prediction algorithm, TrafficPredict. Our approach uses an instance layer to learn instances’ movements and interactions and has a category layer to learn the similarities of instances belonging to the same type to refine the prediction. In order to evaluate its performance, we collected trajectory datasets in a large city covering varying conditions and traffic densities. The dataset includes many challenging scenarios where vehicles, bicycles, and pedestrians move among one another. We evaluate the performance of TrafficPredict on our new dataset and highlight its higher accuracy for trajectory prediction by comparing with prior prediction methods.
【Keywords】:
【Paper Link】 【Pages】:6128-6136
【Authors】: Shayegan Omidshafiei ; Dong-Ki Kim ; Miao Liu ; Gerald Tesauro ; Matthew Riemer ; Christopher Amato ; Murray Campbell ; Jonathan P. How
【Abstract】: Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for problems they can apply to. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; these agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.
【Keywords】:
【Paper Link】 【Pages】:6137-6145
【Authors】: Ramya Ramakrishnan ; Ece Kamar ; Besmira Nushi ; Debadeepta Dey ; Julie Shah ; Eric Horvitz
【Abstract】: Simulators are being increasingly used to train agents before deploying them in real-world environments. While training in simulation provides a cost-effective way to learn, poorly modeled aspects of the simulator can lead to costly mistakes, or blind spots. While humans can help guide an agent towards identifying these error regions, humans themselves have blind spots and noise in execution. We study how learning about blind spots of both can be used to manage hand-off decisions when humans and agents jointly act in the real world, in which neither of them is fully trained or evaluated. The formulation assumes that agent blind spots result from representational limitations in the simulation world, which leads the agent to ignore important features that are relevant for acting in the open world. Our approach for blind spot discovery combines experiences collected in simulation with limited human demonstrations. The first step applies imitation learning to demonstration data to identify important features that the human is using but that the agent is missing. The second step uses noisy labels extracted from action mismatches between the agent and the human across simulation and demonstration data to train blind spot models. We show through experiments on two domains that our approach is able to learn a succinct representation that accurately captures blind spot regions and avoids dangerous errors in the real world through transfer of control between the agent and the human.
【Keywords】:
【Paper Link】 【Pages】:6146-6153
【Authors】: Fernando P. Santos ; Jorge M. Pacheco ; Ana Paiva ; Francisco C. Santos
【Abstract】: Fairness plays a fundamental role in decision-making, which is evidenced by the high incidence of human behaviors that result in egalitarian outcomes. This is often shown in the context of dyadic interactions, resorting to the Ultimatum Game. The peculiarities of group interactions – and the corresponding effect in eliciting fair actions – remain, however, largely unexplored. Focusing on groups suggests several questions related to the effect of group size, group decision rules and the interrelation of human and agents’ behaviors in hybrid groups. To address these topics, here we test a Multiplayer version of the Ultimatum Game (MUG): proposals are made to groups of Responders that, collectively, accept or reject them. Firstly, we run an online experiment to evaluate how humans react to different group decision rules. We observe that people become increasingly fair if groups adopt stricter decision rules, i.e., if more individuals are required to accept a proposal for it to be accepted by the group. Secondly, we propose a new analytical model to shed light on how such behaviors may have evolved. Thirdly, we adapt our model to include agents with fixed behaviors. We show that including hardcoded Pro-social agents favors the evolutionary stability of fair states, even for soft group decision rules. This suggests that judiciously introducing agents with particular behaviors in a population may leverage long-term social benefits.
【Keywords】:
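The group decision rules studied above reduce to a threshold vote, which the toy sketch below makes explicit; the variable names are illustrative, not from the paper.

```python
def group_accepts(responder_votes, m):
    """Multiplayer Ultimatum Game decision rule: a proposal passes only
    if at least m of the N responders accept it. Larger m means a
    stricter rule, which the experiment found elicits fairer proposals."""
    return sum(responder_votes) >= m

# Example: 3 of 5 responders accept. A soft rule (m=2) accepts the
# proposal; the strict unanimity rule (m=5) rejects it.
votes = [True, True, True, False, False]
print(group_accepts(votes, 2), group_accepts(votes, 5))  # True False
```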
【Paper Link】 【Pages】:6154-6162
【Authors】: Wen Shen ; Yang Feng ; Cristina V. Lopes
【Abstract】: Strategic diffusion encourages participants to take active roles in promoting stakeholders’ agendas by rewarding successful referrals. As social media continues to transform the way people communicate, strategic diffusion has become a powerful tool for stakeholders to influence people’s decisions or behaviors for desired objectives. Existing reward mechanisms for strategic diffusion are usually either vulnerable to false-name attacks or not individually rational for participants that have made successful referrals. Here, we introduce a novel multi-winner contests (MWC) mechanism for strategic diffusion in social networks. The MWC mechanism satisfies several desirable properties, including false-name-proofness, individual rationality, budget constraint, monotonicity, and subgraph constraint. Numerical experiments on four real-world social network datasets demonstrate that stakeholders can significantly boost participants’ aggregated efforts with proper design of competitions. Our work sheds light on how to design manipulation-resistant mechanisms with appropriate contests.
【Keywords】:
【Paper Link】 【Pages】:6163-6170
【Authors】: Michael Shum ; Max Kleiman-Weiner ; Michael L. Littman ; Joshua B. Tenenbaum
【Abstract】: Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining who to cooperate with, who to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multiagent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships as well as predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments and the algorithm makes the same rapid generalizations that people do.
【Keywords】:
【Paper Link】 【Pages】:6171-6178
【Authors】: Arambam James Singh ; Duc Thien Nguyen ; Akshat Kumar ; Hoong Chuin Lau
【Abstract】: We address the problem of maritime traffic management in busy waterways to increase the safety of navigation by reducing congestion. We model maritime traffic as a large multi-agent system with individual vessels as agents and the VTS authority as the regulatory agent. We develop a maritime traffic simulator based on historical traffic data that incorporates realistic domain constraints such as uncertain and asynchronous movement of vessels. We also develop a traffic coordination approach that provides speed recommendations to vessels in different zones. We exploit the nature of collective interactions among agents to develop a scalable policy gradient approach that can scale up to real-world problems. Empirical results on synthetic and real-world problems show that our approach can significantly reduce congestion while keeping the traffic throughput high.
【Keywords】:
【Paper Link】 【Pages】:6179-6186
【Authors】: Fu Song ; Yedi Zhang ; Taolue Chen ; Yu Tang ; Zhiwu Xu
【Abstract】: Reasoning about strategic abilities is key to an AI system consisting of multiple agents with random behaviors. We propose a probabilistic extension of Alternating µ-Calculus (AMC), named PAMC, for reasoning about strategic abilities of agents in stochastic multi-agent systems. PAMC subsumes existing logics AMC and PµTL. The usefulness of PAMC is exemplified by applications in genetic regulatory networks. We show that, for PAMC, the model checking problem is in UP∩co-UP, and the satisfiability problem is EXPTIME-complete, both of which are the same as those for AMC. Moreover, PAMC admits the small model property. We implement the satisfiability checking procedure in a tool PAMCSolver.
【Keywords】:
【Paper Link】 【Pages】:6188-6195
【Authors】: Mohamed Abdalla ; Magnus Sahlgren ; Graeme Hirst
【Abstract】: We propose a novel method for enriching word-embeddings without the need of a labeled corpus. Instead, we show that relying on a regressor – trained with a small lexicon to predict pseudo-labels – significantly improves performance over current techniques that rely on human-derived sentence-level labels for entire corpora. Our approach enables enrichment for corpora that have no labels (such as Wikipedia). Exploring the utility of this general approach in both sentiment and non-sentiment-focused tasks, we show how enriching both Twitter- and Wikipedia-based embeddings provides notable improvements in performance for binary sentiment classification, SemEval tasks, an embedding analogy task, and document classification. Importantly, our approach is notably better and more generalizable than other state-of-the-art approaches for enriching both labeled and unlabeled corpora.
【Keywords】:
【Paper Link】 【Pages】:6196-6203
【Authors】: Anish Acharya ; Rahul Goel ; Angeliki Metallinou ; Inderjit S. Dhillon
【Abstract】: Deep learning models have become state of the art for natural language processing (NLP) tasks, however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training, to compress the word embedding layer which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain accuracy loss without introducing additional latency compared to fixed point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
【Keywords】:
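As a rough sketch of the low-rank idea (not the paper's training procedure, which factorizes during training and then re-trains on the downstream task), truncated SVD already shows how two small factors can replace a large embedding matrix; sizes below are illustrative.

```python
import numpy as np

def factorize_embeddings(E, rank):
    """Approximate an embedding matrix E (vocab x dim) by two low-rank
    factors, E ~ A @ B, with A (vocab x rank) and B (rank x dim).
    In the paper the factors are learned during training and fine-tuned
    afterwards; this sketch only shows the factorization itself."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

vocab, dim, rank = 10000, 300, 30          # illustrative sizes
E = np.random.randn(vocab, dim).astype(np.float32)
A, B = factorize_embeddings(E, rank)
ratio = (A.size + B.size) / E.size
print(f"factors hold {ratio:.1%} of the original parameters")
```

With these sizes the two factors hold roughly 10% of the original parameters, in line with the approximately 90% compression the abstract reports.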
【Paper Link】 【Pages】:6204-6211
【Authors】: Muhammad Asif Ali ; Yifang Sun ; Xiaoling Zhou ; Wei Wang ; Xiang Zhao
【Abstract】: Distinguishing antonyms from synonyms is a key challenge for many NLP applications focused on lexical-semantic relation extraction. Existing solutions relying on large-scale corpora yield low performance because of the huge contextual overlap of antonym and synonym pairs. We propose a novel approach entirely based on pre-trained embeddings. We hypothesize that the pre-trained embeddings comprehend a blend of lexical-semantic information and that we may distill the task-specific information using Distiller, a model proposed in this paper. Later, a classifier is trained based on features constructed from the distilled sub-spaces along with some word-level features to distinguish antonyms from synonyms. Experimental results show that the proposed model outperforms existing research on antonym-synonym distinction in both speed and performance.
【Keywords】:
【Paper Link】 【Pages】:6212-6219
【Authors】: Reinald Kim Amplayo ; Seung-won Hwang ; Min Song
【Abstract】: Word sense induction (WSI), or the task of automatically discovering multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which differ very much among words, from aardvark with one sense, to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, which we find both to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) throwing away garbage senses and (b) additionally inducing fine-grained word senses. Results show great improvements over the state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task where the sense granularity problem is more evident and show that AutoSense is evidently better than competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.
【Keywords】:
【Paper Link】 【Pages】:6220-6227
【Authors】: Ananya B. Sai ; Mithun Das Gupta ; Mitesh M. Khapra ; Mukundhan Srinivasan
【Abstract】: Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. ADEM (Lowe et al. 2017) formulated the automatic evaluation of dialogue systems as a learning problem and showed that such a model was able to predict responses which correlate significantly with human judgements, both at utterance and system level. Their system was shown to have beaten word-overlap metrics such as BLEU by large margins. We start with the question of whether an adversary can game the ADEM model. We design a battery of targeted attacks on the neural-network-based ADEM evaluation system and show that automatic evaluation of dialogue systems still has a long way to go. ADEM can get confused with a variation as simple as reversing the word order in the text! We report experiments on several such adversarial scenarios that draw out counterintuitive scores on the dialogue responses. We take a systematic look at the scoring function proposed by ADEM and connect it to linear system theory to predict the shortcomings evident in the system. We also devise an attack that can fool such a system into rating a response generation system as favorable. Finally, we allude to future research directions of using the adversarial attacks to design a truly automated dialogue evaluation system.
【Keywords】:
【Paper Link】 【Pages】:6228-6235
【Authors】: Zied Bouraoui ; Steven Schockaert
【Abstract】: Considerable attention has recently been devoted to the problem of automatically extending knowledge bases by applying some form of inductive reasoning. While the vast majority of existing work is centred around so-called knowledge graphs, in this paper we consider a setting where the input consists of a set of (existential) rules. To this end, we exploit a vector space representation of the considered concepts, which is partly induced from the rule base itself and partly from a pre-trained word embedding. Inspired by recent approaches to concept induction, we then model rule templates in this vector space embedding using Gaussian distributions. Unlike many existing approaches, we learn rules by directly exploiting regularities in the given rule base, and do not require that a database with concept and relation instances is given. As a result, our method can be applied to a wide variety of ontologies. We present experimental results that demonstrate the effectiveness of our method.
【Keywords】:
【Paper Link】 【Pages】:6236-6243
【Authors】: Hui Chen ; Zijia Lin ; Guiguang Ding ; Jianguang Lou ; Yusen Zhang ; Börje Karlsson
【Abstract】: The dominant approaches for named entity recognition (NER) mostly adopt complex recurrent neural networks (RNN), e.g., long short-term memory (LSTM). However, RNNs are limited by their recurrent nature in terms of computational efficiency. In contrast, convolutional neural networks (CNN) can fully exploit the GPU parallelism with their feedforward architectures. However, little attention has been paid to performing NER with CNNs, mainly owing to their difficulties in capturing the long-term context information in a sequence. In this paper, we propose a simple but effective CNN-based network for NER, i.e., gated relation network (GRN), which is more capable than common CNNs in capturing long-term context. Specifically, in GRN we firstly employ CNNs to explore the local context features of each word. Then we model the relations between words and use them as gates to fuse local context features into global ones for predicting labels. Without using recurrent layers that process a sentence in a sequential manner, our GRN allows computations to be performed in parallel across the entire sentence. Experiments on two benchmark NER datasets (i.e., CoNLL-2003 and OntoNotes 5.0) show that our proposed GRN can achieve state-of-the-art performance with or without external knowledge. It also enjoys lower time costs to train and test.
【Keywords】:
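A minimal sketch of the gating idea, under my own assumption (not the paper's exact formulation) that pairwise relation scores act as sigmoid gates mixing every word's local CNN features into a global, parallel-computable representation:

```python
import torch

def gated_relation_fusion(local, relation_logits):
    """Fuse local context features into global ones via relation gates.

    local: (n, d) per-word features from a CNN over the sentence.
    relation_logits: (n, n) unnormalized relation scores between words.
    Each word's global feature is a gate-weighted average of all words'
    local features, computed in parallel with no recurrence."""
    gates = torch.sigmoid(relation_logits)                # (n, n) gates in (0, 1)
    fused = gates @ local / gates.sum(dim=1, keepdim=True)
    return torch.cat([local, fused], dim=1)               # (n, 2d) for labeling
```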
【Paper Link】 【Pages】:6244-6251
【Authors】: Jiaao Chen ; Jianshu Chen ; Zhou Yu
【Abstract】: The ability to select an appropriate story ending is the first step towards perfect narrative comprehension. Story ending prediction requires not only the explicit clues within the context, but also the implicit knowledge (such as commonsense) to construct a reasonable and consistent story. However, most previous approaches do not explicitly use background commonsense knowledge. We present a neural story ending selection model that integrates three types of information: narrative sequence, sentiment evolution and commonsense knowledge. Experiments show that our model outperforms state-of-the-art approaches on a public dataset, ROCStory Cloze Task (Mostafazadeh et al. 2017), and the performance gain from adding the additional commonsense knowledge is significant.
【Keywords】:
【Paper Link】 【Pages】:6252-6259
【Authors】: Jindong Chen ; Yizhou Hu ; Jingping Liu ; Yanghua Xiao ; Haiyun Jiang
【Abstract】: Short text classification is one of the important tasks in Natural Language Processing (NLP). Unlike paragraphs or documents, short texts are more ambiguous since they lack sufficient contextual information, which poses a great challenge for classification. In this paper, we retrieve knowledge from an external knowledge source to enhance the semantic representation of short texts. We take conceptual information as a kind of knowledge and incorporate it into deep neural networks. For the purpose of measuring the importance of knowledge, we introduce attention mechanisms and propose deep Short Text Classification with Knowledge powered Attention (STCKA). We utilize Concept towards Short Text (CST) attention and Concept towards Concept Set (C-CS) attention to acquire the weight of concepts from two aspects. We then classify a short text with the help of conceptual information. Unlike traditional approaches, our model acts like a human being who has an intrinsic ability to make decisions based on observation (i.e., training data for machines) and pays more attention to important knowledge. We also conduct extensive experiments on four public datasets for different tasks. The experimental results and case studies show that our model outperforms the state-of-the-art methods, justifying the effectiveness of knowledge powered attention.
【Keywords】:
【Paper Link】 【Pages】:6260-6267
【Authors】: Lingzhen Chen ; Alessandro Moschitti
【Abstract】: In this paper, we propose an approach for transferring the knowledge of a neural model for sequence labeling, learned from the source domain, to a new model trained on a target domain, where new label categories appear. Our transfer learning (TL) techniques make it possible to adapt the source model using the target data and new categories, without access to the source data. Our solution consists of adding new neurons in the output layer of the target model and transferring parameters from the source model, which are then fine-tuned with the target data. Additionally, we propose a neural adapter to learn the difference between the source and the target label distribution, which provides additional important information to the target model. Our experiments on Named Entity Recognition show that (i) the learned knowledge in the source model can be effectively transferred when the target data contains new categories and (ii) our neural adapter further improves such transfer.
【Keywords】:
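The parameter-transfer step can be sketched as growing the output layer: copy the source rows verbatim, randomly initialize rows for the new categories, then fine-tune everything on target data. A minimal NumPy sketch under those assumptions (the paper's neural adapter is not shown):

```python
import numpy as np

def expand_output_layer(W_src, b_src, n_new, scale=0.01, rng=None):
    """Grow an (n_labels x hidden) output layer to cover new categories.

    Source weights are transferred as-is; the n_new rows for the new
    label categories are randomly initialized, and the whole layer is
    later fine-tuned on the target data."""
    rng = rng or np.random.default_rng()
    W_new = rng.normal(0.0, scale, size=(n_new, W_src.shape[1]))
    return np.vstack([W_src, W_new]), np.concatenate([b_src, np.zeros(n_new)])
```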
【Paper Link】 【Pages】:6268-6275
【Authors】: Wang Chen ; Yifan Gao ; Jiani Zhang ; Irwin King ; Michael R. Lyu
【Abstract】: Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently, several attempts have been made under the generative setting using deep neural networks. However, the state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role of the title in the overall document. To solve this problem, we introduce a new model called Title-Guided Network (TG-Net) for the automatic keyphrase generation task based on the encoder-decoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a title-guided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models by a large margin, especially for documents with either very low or very high title length ratios.
【Keywords】:
【Paper Link】 【Pages】:6276-6283
【Authors】: Zhipeng Chen ; Yiming Cui ; Wentao Ma ; Shijin Wang ; Guoping Hu
【Abstract】: Machine Reading Comprehension (MRC) with multiple-choice questions requires the machine to read a given passage and select the correct answer among several candidates. In this paper, we propose a novel approach called the Convolutional Spatial Attention (CSA) model which can better handle MRC with multiple-choice questions. The proposed model can fully extract the mutual information among the passage, the question, and the candidates to form enriched representations. Furthermore, to merge various attention results, we propose to use the convolutional operation to dynamically summarize the attention values within different sizes of regions. Experimental results show that the proposed model gives substantial improvements over various state-of-the-art systems on both the RACE and SemEval-2018 Task 11 datasets.
【Keywords】:
【Paper Link】 【Pages】:6284-6291
【Authors】: Pengxiang Cheng ; Katrin Erk
【Abstract】: Implicit arguments, which cannot be detected solely through syntactic cues, make it harder to extract predicate-argument tuples. We present a new model for implicit argument prediction that draws on reading comprehension, casting the predicate-argument tuple with the missing argument as a query. We also draw on pointer networks and multi-hop computation. Our model shows good performance on an argument cloze task as well as on a nominal implicit argument prediction task.
【Keywords】:
【Paper Link】 【Pages】:6292-6299
【Authors】: Raj Dabre ; Atsushi Fujita
【Abstract】: In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.
【Keywords】:
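The core trick is easy to express: one layer's parameters reused at every depth. A minimal PyTorch sketch with illustrative hyperparameters (the abstract does not prescribe this exact layer type):

```python
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    """Apply a single encoder layer n_repeats times with shared weights,
    instead of stacking n distinct layers, so the parameter count stays
    that of a one-layer model."""
    def __init__(self, d_model=512, nhead=8, n_repeats=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.n_repeats = n_repeats

    def forward(self, x):                 # x: (batch, seq, d_model)
        for _ in range(self.n_repeats):
            x = self.layer(x)             # same weights at every depth
        return x
```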
【Paper Link】 【Pages】:6300-6308
【Authors】: Dai Dai ; Xinyan Xiao ; Yajuan Lyu ; Shan Dou ; Qiaoqiao She ; Haifeng Wang
【Abstract】: Joint entity and relation extraction aims to detect entities and relations using a single model. In this paper, we present a novel unified joint extraction model which directly tags entity and relation labels according to a query word position p, i.e., detecting an entity at p, and identifying entities at other positions that have a relationship with the former. To this end, we first design a tagging scheme to generate n tag sequences for an n-word sentence. Then a position-attention mechanism is introduced to produce different sentence representations for every query position to model these n tag sequences. In this way, our method can simultaneously extract all entities and their types, as well as all overlapping relations. Experimental results show that our framework performs significantly better on extracting overlapping relations as well as detecting long-range relations, and thus we achieve state-of-the-art performance on two public datasets.
【Keywords】:
【Paper Link】 【Pages】:6309-6317
【Authors】: Fahim Dalvi ; Nadir Durrani ; Hassan Sajjad ; Yonatan Belinkov ; Anthony Bau ; James R. Glass
【Abstract】: Despite the remarkable evolution of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. Previous work largely focused on what these models learn at the representation level. We break this analysis down further and study individual dimensions (neurons) in the vector representation learned by end-to-end neural models in NLP tasks. We propose two methods: Linguistic Correlation Analysis, based on a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons w.r.t. the model itself. We evaluate the effectiveness of our techniques by ablating the identified neurons and reevaluating the network’s performance for two tasks: neural machine translation (NMT) and neural language modeling (NLM). We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models? ii) are certain neurons exclusive to some properties and not others? iii) is the information more or less distributed in NMT vs. NLM? and iv) how important are the neurons identified through the linguistic correlation method to the overall task? Our code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019a). This paper is a non-archived version of the paper published at AAAI (Dalvi et al. 2019b).
【Keywords】:
【Paper Link】 【Pages】:6318-6325
【Authors】: Yang Deng ; Yuexiang Xie ; Yaliang Li ; Min Yang ; Nan Du ; Wei Fan ; Kai Lei ; Ying Shen
【Abstract】: Answer selection and knowledge base question answering (KBQA) are two important tasks of question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between tasks. In this paper, we tackle answer selection and KBQA tasks simultaneously via multi-task learning (MTL), motivated by the following observations. First, both answer selection and KBQA can be regarded as a ranking problem, with one at text-level while the other at knowledge-level. Second, these two tasks can benefit each other: answer selection can incorporate the external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To fulfill the goal of jointly learning these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable these tasks to interact with each other as well as learn more comprehensive sentence representations. The experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method, and the performance of answer selection and KBQA is improved. Also, the multi-view attention scheme is proved to be effective in assembling attentive information from different representational perspectives.
【Keywords】:
【Paper Link】 【Pages】:6326-6334
【Authors】: Valerio Di Carlo ; Federico Bianchi ; Matteo Palmonari
【Abstract】: Temporal word embeddings have been proposed to support the analysis of word meaning shifts during time and to study the evolution of languages. Different approaches have been proposed to generate vector representations of words that embed their meaning during a specific time interval. However, the training process used in these approaches is complex, may be inefficient, or may require large text corpora. As a consequence, these approaches may be difficult to apply in resource-scarce domains or by scientists with limited in-depth knowledge of embedding models. In this paper, we propose a new heuristic to train temporal word embeddings based on the Word2vec model. The heuristic consists of using atemporal vectors as a reference, i.e., as a compass, when training the representations specific to a given time interval. The use of the compass simplifies the training process and makes it more efficient. Experiments conducted using state-of-the-art datasets and methodologies suggest that our approach outperforms or equals comparable approaches while being more robust in terms of the required corpus size.
【Keywords】:
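A highly simplified sketch of the compass idea: context vectors trained once on the whole corpus are frozen, and only slice-specific target vectors are updated, so all time slices land in one comparable space. The SGNS-style update below is stripped down, and the training pairs, learning rate, and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_time_slice(pairs, compass_ctx, vocab_size, lr=0.025, rng=None):
    """Train target vectors for one time slice against a frozen 'compass'.

    pairs: iterable of (target_id, context_id, label) with label 1 for
    observed word-context pairs and 0 for negative samples.
    compass_ctx: (vocab, d) context matrix trained on the full corpus;
    it is never updated here, which is what aligns the slices."""
    rng = rng or np.random.default_rng()
    W = rng.normal(0.0, 0.1, (vocab_size, compass_ctx.shape[1]))
    for target, context, label in pairs:
        z = 1.0 / (1.0 + np.exp(-W[target] @ compass_ctx[context]))
        W[target] += lr * (label - z) * compass_ctx[context]  # compass stays frozen
    return W
```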
【Paper Link】 【Pages】:6335-6342
【Authors】: Wentao Ding ; Guanji Gao ; Linfeng Shi ; Yuzhong Qu
【Abstract】: Recognizing time expressions is a fundamental and important task in many applications of natural language understanding, such as reading comprehension and question answering. Several of the newest state-of-the-art approaches have achieved good performance on recognizing time expressions. These approaches are black-boxed or based on heuristic rules, which leads to difficulty in understanding the temporal information. On the contrary, classic rule-based or semantic parsing approaches can capture rich structural information, but their performance on recognition is not so good. In this paper, we propose a pattern-based approach, called PTime, which automatically generates and selects patterns for recognizing time expressions. In this approach, time expressions in training text are abstracted into type sequences by using fine-grained token types, thus the problem is transformed into selecting an appropriate subset of the sequential patterns. We use the Extended Budgeted Maximum Coverage (EBMC) model to optimize the pattern selection. The main idea is to maximize the correct token sequences matched by the selected patterns while the number of mistakes is limited by an adjustable budget. The interpretability of patterns and the adjustability of the permitted number of mistakes make PTime a very promising approach for many applications. Experimental results show that PTime achieves a very competitive performance as compared with existing state-of-the-art approaches.
【Keywords】:
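Budgeted maximum coverage admits a natural greedy approximation, sketched below under assumed field names: each candidate pattern brings a set of correctly matched sequences and a mistake count, and patterns are added greedily while total mistakes stay within the budget. The paper's EBMC formulation and solver may differ; this only conveys the selection objective.

```python
def select_patterns(patterns, budget):
    """Greedy selection for a budgeted-maximum-coverage-style objective.

    patterns: list of dicts with 'matches' (set of correctly matched
    token sequences) and 'cost' (number of mistakes the pattern makes).
    budget: maximum total number of permitted mistakes."""
    covered, mistakes, chosen = set(), 0, []
    candidates = set(range(len(patterns)))
    while candidates:
        best, best_gain = None, 0
        for i in candidates:
            p = patterns[i]
            gain = len(p["matches"] - covered)
            if mistakes + p["cost"] <= budget and gain > best_gain:
                best, best_gain = i, gain
        if best is None:                      # nothing affordable still helps
            break
        chosen.append(patterns[best])
        covered |= patterns[best]["matches"]
        mistakes += patterns[best]["cost"]
        candidates.remove(best)
    return chosen
```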
【Paper Link】 【Pages】:6343-6350
【Authors】: Zixiang Ding ; Huihui He ; Mengran Zhang ; Rui Xia
【Abstract】: Emotion cause identification aims at identifying the potential causes that lead to a certain emotion expression in text. Several techniques including rule-based methods and traditional machine learning methods have been proposed to address this problem based on manually designed rules and features. More recently, some deep learning methods have also been applied to this task, with the attempt to automatically capture the causal relationship between emotion and its causes embodied in the text. In this work, we find that in addition to the content of the text, there are another two kinds of information, namely relative position and global labels, that are also very important for emotion cause identification. To integrate such information, we propose a model based on the neural network architecture to encode the three elements (i.e., text content, relative position and global label) in a unified and end-to-end fashion. We introduce a relative position augmented embedding learning algorithm, and transform the task from an independent prediction problem to a reordered prediction problem, where the dynamic global label information is incorporated. Experimental results on a benchmark emotion cause dataset show that our model achieves new state-of-the-art performance and performs significantly better than a number of competitive baselines. Further analysis shows the effectiveness of the relative position augmented embedding learning algorithm and the reordered prediction mechanism with dynamic global labels.
【Keywords】:
【Paper Link】 【Pages】:6351-6358
【Authors】: Qianqian Dong ; Feng Wang ; Zhen Yang ; Wei Chen ; Shuang Xu ; Bo Xu
【Abstract】: Transcript disfluency detection (TDD) is an important component of real-time speech translation systems, and has attracted increasing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models for TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model is implemented similarly to the NMT model. Additionally, several extended modules and training techniques which are independent of the NMT model are proposed to improve the performance, such as constrained decoding, denoising autoencoder initialization and a TDD-specific training objective. With the proposed training framework, we achieve significant improvement. However, it is too slow in decoding to be practical. To build a feasible and production-ready solution for TDD, we propose a fast non-autoregressive TDD model following the non-autoregressive NMT model that emerged recently. Even though we do not assume a specific architecture of the NMT model, we build our TDD model on the basis of the Transformer, which is the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard set and an in-house Chinese set. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.
【Keywords】:
【Paper Link】 【Pages】:6359-6366
【Authors】: Cunxiao Du ; Zhaozheng Chen ; Fuli Feng ; Lei Zhu ; Tian Gan ; Liqiang Nie
【Abstract】: Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate further research.
【Keywords】:
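The interaction mechanism can be caricatured in a few lines: compare every word representation against every class embedding to obtain word-level matching signals, then aggregate them into class scores. The mean aggregation below is a deliberate simplification of mine (the model uses a learned aggregation layer):

```python
import torch

def interaction_scores(words, classes):
    """Word-level matching signals between a text and the label set.

    words: (seq_len, d) word representations; classes: (n_classes, d)
    learned class embeddings. Returns (n_classes,) scores built from the
    fine-grained word-class interaction matrix rather than from a single
    text-level vector."""
    signals = words @ classes.T        # (seq_len, n_classes) matching matrix
    return signals.mean(dim=0)         # naive aggregation into class scores
```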
【Paper Link】 【Pages】:6367-6374
【Authors】: Kai Fan ; Jiayi Wang ; Bo Li ; Fengming Zhou ; Boxing Chen ; Luo Si
【Abstract】: The performance of machine translation (MT) systems is usually evaluated by the metric BLEU when golden references are provided. However, in the case of model inference or production deployment, golden references are usually expensive to obtain, requiring human annotation with bilingual expertise. In order to address the issue of translation quality estimation (QE) without reference, we propose a general framework for automatic evaluation of the translation output for the QE task in the Conference on Statistical Machine Translation (WMT). We first build a conditional target language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possible erroneous tokens based on the prior knowledge learned from parallel data. Subsequently, the features are further fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.
【Keywords】:
【Paper Link】 【Pages】:6375-6382
【Authors】: Chengzhen Fu ; Yan Zhang
【Abstract】: Query-document semantic interactions are essential for the success of many cloze-style question answering models. Recently, researchers have proposed several attention-based methods to predict the answer by focusing on appropriate subparts of the context document. In this paper, we design a novel module to produce the query-aware context vector, named Multi-Space based Context Fusion (MSCF), with the following considerations: (1) interactions are applied across multiple latent semantic spaces; (2) attention is measured at bit level, not at token level. Moreover, we extend MSCF to the multi-hop architecture. This unified model is called Enhanced Attentive Reader (EA Reader). During the iterative inference process, the reader is equipped with a novel memory update rule and maintains the understanding of documents through read, update and write operations. We conduct extensive experiments on four real-world datasets. Our results demonstrate that EA Reader outperforms state-of-the-art models.
【Keywords】:
【Paper Link】 【Pages】:6383-6390
【Authors】: Jun Gao ; Wei Bi ; Xiaojiang Liu ; Junhui Li ; Shuming Shi
【Abstract】: Neural generative models have become popular and achieved promising performance on short-text conversation tasks. They are generally trained to build a 1-to-1 mapping from the input post to its output response. However, a given post is often associated with multiple replies simultaneously in real applications. Previous research on this task mainly focuses on improving the relevance and informativeness of the top one generated response for each post. Very few works study generating multiple accurate and diverse responses for the same post. In this paper, we propose a novel response generation model, which considers a set of responses jointly and generates multiple diverse responses simultaneously. A reinforcement learning algorithm is designed to solve our model. Experiments on two short-text conversation tasks validate that the multiple responses generated by our model obtain higher quality and larger diversity compared with various state-of-the-art generative models.
【Keywords】:
【Paper Link】 【Pages】:6391-6398
【Authors】: Lianli Gao ; Pengpeng Zeng ; Jingkuan Song ; Yuan-Fang Li ; Wu Liu ; Tao Mei ; Heng Tao Shen
【Abstract】: To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA. Compared with image QA, which focuses primarily on understanding the associations between image region-level details and corresponding questions, video QA requires a model to jointly reason across both spatial and long-range temporal structures of a video as well as text to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of query and video aware context representation and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for the Action, Trans., FrameQA and Count tasks. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans., FrameQA tasks by 4.1%, 4.7%, and 5.1%.
【Keywords】:
【Paper Link】 【Pages】:6399-6406
【Authors】: Shen Gao ; Xiuying Chen ; Piji Li ; Zhaochun Ren ; Lidong Bing ; Dongyan Zhao ; Rui Yan
【Abstract】: In the neural abstractive summarization field, conventional sequence-to-sequence based models often suffer from summarizing the wrong aspect of the document with respect to the main aspect. To tackle this problem, we propose the task of reader-aware abstractive summary generation, which utilizes the reader comments to help the model produce a better summary about the main aspect. Unlike the traditional abstractive summarization task, reader-aware summarization confronts two main challenges: (1) Comments are informal and noisy; (2) jointly modeling the news document and the reader comments is challenging. To tackle the above challenges, we design an adversarial learning model named reader-aware summary generator (RASG), which consists of four components: (1) a sequence-to-sequence based summary generator; (2) a reader attention module capturing the reader focused aspects; (3) a supervisor modeling the semantic gap between the generated summary and reader focused aspects; (4) a goal tracker producing the goal for each generation step. The supervisor and the goal tracker are used to guide the training of our framework in an adversarial manner. Extensive experiments are conducted on our large-scale real-world text summarization dataset, and the results show that RASG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. The experimental results also demonstrate the effectiveness of each module in our framework. We release our large-scale dataset for further research.
【Keywords】:
【Paper Link】 【Pages】:6407-6414
【Authors】: Tianyu Gao ; Xu Han ; Zhiyuan Liu ; Maosong Sun
【Abstract】: The existing methods for relation classification (RC) primarily rely on distant supervision (DS) because large-scale supervised training datasets are not readily available. Although DS automatically annotates adequate amounts of data for model training, the coverage of this data is still quite limited, and meanwhile many long-tail relations still suffer from data sparsity. Intuitively, people can grasp new knowledge by learning from only a few instances. We thus provide a different view on RC by formalizing RC as a few-shot learning (FSL) problem. However, the current FSL models mainly focus on low-noise vision tasks, which makes them hard to directly deal with the diversity and noise of text. In this paper, we propose hybrid attention-based prototypical networks for the problem of noisy few-shot RC. We design instance-level and feature-level attention schemes based on prototypical networks to highlight the crucial instances and features respectively, which significantly enhances the performance and robustness of RC models in a noisy FSL scenario. Besides, our attention schemes accelerate the convergence speed of RC models. Experimental results demonstrate that our hybrid attention-based models require fewer training iterations and outperform the state-of-the-art baseline models. The code and datasets are released at https://github.com/thunlp/HATT-Proto.
【Keywords】:
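A toy version of instance-level attention for prototypical networks: weight each support embedding by its similarity to the query so that noisy instances pull the prototype less. The plain dot-product scoring here is an assumption for illustration (the paper defines its own scheme, plus a feature-level counterpart not shown):

```python
import torch
import torch.nn.functional as F

def attentive_prototype(support, query):
    """Build a class prototype as an attention-weighted mean of support
    instances instead of a plain mean.

    support: (k, d) embeddings of one class's support instances.
    query: (d,) embedding of the query instance."""
    scores = support @ query                   # (k,) similarity to the query
    weights = F.softmax(scores, dim=0)         # instance-level attention
    return (weights.unsqueeze(1) * support).sum(dim=0)  # (d,) prototype
```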
【Paper Link】 【Pages】:6415-6422
【Authors】: Yifan Gao ; Yang Zhong ; Daniel Preotiuc-Pietro ; Junyi Jessy Li
【Abstract】: In computational linguistics, specificity quantifies how much detail is engaged in text. It is an important characteristic of speaker intention and language style, and is useful in NLP applications such as summarization and argumentation mining. Yet to date, expert-annotated data for sentence-level specificity are scarce and confined to the news genre. In addition, systems that predict sentence specificity are classifiers trained to produce binary labels (general or specific). We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems. We also present the first large-scale study revealing the social, temporal and mental health factors underlying language specificity on social media.
【Keywords】:
【Paper Link】 【Pages】:6423-6430
【Authors】: Yifan Gao ; Lidong Bing ; Piji Li ; Irwin King ; Michael R. Lyu
【Abstract】: We investigate the task of distractor generation for multiple choice reading comprehension questions from examinations. In contrast to all previous works, we do not aim at preparing words or short phrases as distractors; instead, we endeavor to generate longer and semantic-rich distractors which are closer to distractors in real reading comprehension from examinations. Taking a reading comprehension article, a pair of question and its correct option as input, our goal is to generate several distractors which are somehow related to the answer, consistent with the semantic context of the question and have some trace in the article. We propose a hierarchical encoder-decoder framework with static and dynamic attention mechanisms to tackle this task. Specifically, the dynamic attention can combine sentence-level and word-level attention varying at each recurrent time step to generate a more readable sequence. The static attention is to modulate the dynamic attention not to focus on question-irrelevant sentences or sentences which contribute to the correct option. Our proposed framework outperforms several strong baselines on the first prepared distractor generation dataset of real reading comprehension questions. In human evaluation, compared with those distractors generated by baselines, our generated distractors are more effective at confusing the annotators.
【Keywords】:
【Paper Link】 【Pages】:6431-6440
【Authors】: Sahil Garg ; Aram Galstyan ; Greg Ver Steeg ; Irina Rish ; Guillermo A. Cecchi ; Shuyang Gao
【Abstract】: Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes for efficiently constructing an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedup compared to conventional kernel methods.
【Keywords】:
【Paper Link】 【Pages】:6441-6448
【Authors】: Erfan Ghadery ; Sajad Movahedi ; Heshaam Faili ; Azadeh Shakery
【Abstract】: The advent of the Internet has caused a significant growth in the number of opinions expressed about products or services on e-commerce websites. Aspect category detection, which is one of the challenging subtasks of aspect-based sentiment analysis, deals with categorizing a given review sentence into a set of predefined categories. Most of the research efforts in this field are devoted to English-language reviews, while there are a large number of reviews in other languages that are left unexplored. In this paper, we propose a multilingual method to perform aspect category detection on reviews in different languages, which makes use of a deep convolutional neural network with multilingual word embeddings. To the best of our knowledge, our method is the first attempt at performing aspect category detection on multiple languages simultaneously. Empirical results on the multilingual dataset provided by the SemEval workshop demonstrate the effectiveness of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:6449-6456
【Authors】: ChengYue Gong ; Xu Tan ; Di He ; Tao Qin
【Abstract】: Maximum-likelihood estimation (MLE) is widely used in sequence-to-sequence tasks for model training. It uniformly treats the generation/prediction of each target token as multi-class classification, and yields non-smooth prediction probabilities: in a target sequence, some tokens are predicted with small probabilities while other tokens are with large probabilities. According to our empirical study, we find that the non-smoothness of the probabilities results in low quality of generated sequences. In this paper, we propose a sentence-wise regularization method which aims to output smooth prediction probabilities for all the tokens in the target sequence. Our proposed method can automatically adjust the weights and gradients of each token in one sentence to ensure that the predictions in a sequence are uniformly good. Experiments on three neural machine translation tasks and one text summarization task show that our method outperforms the conventional MLE loss on all these tasks and achieves promising BLEU scores on the WMT14 English-German and WMT17 Chinese-English translation tasks.
【Keywords】:
【Paper Link】 【Pages】:6457-6464
【Authors】: Jingjing Gong ; Xinchi Chen ; Tao Gui ; Xipeng Qiu
【Abstract】: Multi-criteria Chinese word segmentation is a promising but challenging task, which exploits several different segmentation criteria and mines their common underlying knowledge. In this paper, we propose a flexible multi-criteria learning framework for Chinese word segmentation. Usually, a segmentation criterion can be decomposed into multiple sub-criteria, which are shareable with other segmentation criteria. The process of word segmentation is a routing among these sub-criteria. From this perspective, we present Switch-LSTMs to segment words, which consist of several long short-term memory neural networks (LSTM), and a switcher to automatically switch the routing among these LSTMs. With these auto-switched LSTMs, our model provides a more flexible solution for multi-criteria CWS, and also makes it easy to transfer the learned knowledge to new criteria. Experiments show that our model obtains significant improvements on eight corpora with heterogeneous segmentation criteria, compared to the previous method and single-criterion learning.
【Keywords】:
【Paper Link】 【Pages】:6465-6472
【Authors】: Yu Gong ; Xusheng Luo ; Yu Zhu ; Wenwu Ou ; Zhao Li ; Muhua Zhu ; Kenny Q. Zhu ; Lu Duan ; Xi Chen
【Abstract】: Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce, where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of E-commerce knowledge bases, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging, and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracy on a standard dataset. Furthermore, an online test deployed on this dominant E-commerce platform shows a 130% improvement in the accuracy of understanding user utterances. Our model has already gone into production on the E-commerce platform.
【Keywords】:
【Paper Link】 【Pages】:6473-6480
【Authors】: Jian Guan ; Yansen Wang ; Minlie Huang
【Abstract】: Generating a reasonable ending for a given story context, i.e., story ending generation, is a strong indication of story comprehension. This task requires not only understanding the context clues which play an important role in planning the plot, but also handling implicit knowledge to make a reasonable, coherent story. In this paper, we devise a novel model for story ending generation. The model adopts an incremental encoding scheme to represent the context clues that span the story context. In addition, commonsense knowledge is applied through multi-source attention to facilitate story comprehension, and thus to help generate coherent and reasonable endings. Through building context clues and using implicit knowledge, the model is able to produce reasonable story endings. Automatic and manual evaluation shows that our model can generate more reasonable story endings than state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:6481-6488
【Authors】: Tao Gui ; Qi Zhang ; Lujun Zhao ; Yaosong Lin ; Minlong Peng ; Jingjing Gong ; Xuanjing Huang
【Abstract】: In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%.
【Keywords】:
【Paper Link】 【Pages】:6489-6496
【Authors】: Maosheng Guo ; Yu Zhang ; Ting Liu
【Abstract】: Natural Language Inference (NLI) is an active research area, where numerous approaches based on recurrent neural networks (RNNs), convolutional neural networks (CNNs), and self-attention networks (SANs) have been proposed. Although they obtain impressive performance, previous recurrent approaches are hard to train in parallel; convolutional models tend to require more parameters, while self-attention networks are not good at capturing the local dependency of texts. To address this problem, we introduce a Gaussian prior into the self-attention mechanism to better model the local structure of sentences. We then propose an efficient RNN/CNN-free architecture named Gaussian Transformer for NLI, which consists of encoding blocks modeling both local and global dependency, high-order interaction blocks collecting the evidence of multi-step inference, and a lightweight comparison block saving lots of parameters. Experiments show that our model achieves new state-of-the-art performance on both the SNLI and MultiNLI benchmarks with significantly fewer parameters and considerably less training time. Besides, evaluation using the Hard NLI datasets demonstrates that our approach is less affected by undesirable annotation artifacts.
【Keywords】:
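The core idea of the Gaussian prior above can be sketched as a distance-dependent bias added to the attention logits before the softmax; the exact parameterization in the paper may differ, and the width sigma is an assumed hyperparameter:

```python
import numpy as np

def gaussian_self_attention(Q, K, V, sigma=1.0):
    """Scaled dot-product attention with a Gaussian locality prior.

    A bias of -(i - j)^2 / (2 * sigma^2) is added to the attention
    logits, so nearby positions receive exponentially more weight.
    """
    n, d = Q.shape
    logits = Q @ K.T / np.sqrt(d)
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    logits = logits - dist2 / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))       # 6 tokens, dimension 16
out = gaussian_self_attention(x, x, x, sigma=2.0)
print(out.shape)                   # (6, 16)
```

Because the bias decays quadratically with distance, nearby tokens dominate the attention weights, which is the intended modeling of local structure.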
【Paper Link】 【Pages】:6497-6504
【Authors】: Divam Gupta ; Tanmoy Chakraborty ; Soumen Chakrabarti
【Abstract】: In several natural language tasks, labeled sequences are available in separate domains (say, languages), but the goal is to label sequences with mixed domain (such as code-switched text). Or, we may have available models for labeling whole passages (say, with sentiments), which we would like to exploit toward better position-specific label inference (say, target-dependent sentiment annotation). A key characteristic shared across such tasks is that different positions in a primary instance can benefit from different ‘experts’ trained from auxiliary data, but labeled primary instances are scarce, and labeling the best expert for each position entails unacceptable cognitive burden. We propose GIRNet, a unified position-sensitive multi-task recurrent neural network (RNN) architecture for such applications. Auxiliary and primary tasks need not share training instances. Auxiliary RNNs are trained over auxiliary instances. A primary instance is also submitted to each auxiliary RNN, but their state sequences are gated and merged into a novel composite state sequence tailored to the primary inference task. Our approach is in sharp contrast to recent multi-task networks like the cross-stitch and sluice networks, which do not control state transfer at such fine granularity. We demonstrate the superiority of GIRNet using three applications: sentiment classification of code-switched passages, part-of-speech tagging of code-switched text, and target position-sensitive annotation of sentiment in monolingual passages. In all cases, we establish new state-of-the-art performance beyond recent competitive baselines.
【Keywords】:
【Paper Link】 【Pages】:6505-6512
【Authors】: Pankaj Gupta ; Yatin Chaudhary ; Florian Buettner ; Hinrich Schütze
【Abstract】: We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short texts and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence), and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
【Keywords】:
【Paper Link】 【Pages】:6513-6520
【Authors】: Pankaj Gupta ; Subburam Rajaram ; Hinrich Schütze ; Thomas A. Runkler
【Abstract】: Past work in relation extraction mostly focuses on binary relations between entity pairs within a single sentence. Recently, the NLP community has gained interest in relation extraction for entity pairs spanning multiple sentences. In this paper, we propose a novel architecture for this task: inter-sentential dependency-based neural networks (iDepNN). iDepNN models the shortest and augmented dependency paths via recurrent and recursive neural networks to extract relationships within (intra-) and across (inter-) sentence boundaries. Compared to SVM and neural network baselines, iDepNN is more robust to false positives in relationships spanning sentences. We evaluate our models on four datasets from the newswire (MUC6) and medical (BioNLP shared task) domains; they achieve state-of-the-art performance and show a better balance of precision and recall for inter-sentential relationships. We outperform 11 teams participating in the BioNLP shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the winning team. We also release the cross-sentence annotations for MUC6.
【Keywords】:
【Paper Link】 【Pages】:6521-6528
【Authors】: J. Edward Hu ; Rachel Rudinger ; Matt Post ; Benjamin Van Durme
【Abstract】: We present PARABANK, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of PARANMT (Wieting and Gimpel, 2018), we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that PARABANK’s paraphrases improve over PARANMT on both semantic similarity and fluency. Finally, we use PARABANK to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
【Keywords】:
【Paper Link】 【Pages】:6529-6537
【Authors】: Minghao Hu ; Furu Wei ; Yuxing Peng ; Zhen Huang ; Nan Yang ; Dongsheng Li
【Abstract】: Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred. In addition to extracting answers, previous works usually predict an additional “no-answer” probability to detect unanswerable cases. However, they fail to validate the answerability of the question by verifying the legitimacy of the predicted answer. To address this problem, we propose a novel read-then-verify system, which not only utilizes a neural reader to extract candidate answers and produce no-answer probabilities, but also leverages an answer verifier to decide whether the predicted answer is entailed by the input snippets. Moreover, we introduce two auxiliary losses to help the reader better handle answer extraction as well as no-answer detection, and investigate three different architectures for the answer verifier. Our experiments on the SQuAD 2.0 dataset show that our system obtains a score of 74.2 F1 on the test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018).
【Keywords】:
【Paper Link】 【Pages】:6538-6545
【Authors】: Hengguan Huang ; Hao Wang ; Brian Mak
【Abstract】: Over the past few years, there has been a resurgence of interest in using the recurrent neural network-hidden Markov model (RNN-HMM) for automatic speech recognition (ASR). Some modern recurrent network models, such as the long short-term memory (LSTM) and the simple recurrent unit (SRU), have demonstrated promising results on this task. Recently, several scientific perspectives in the fields of neuroethology and speech production have suggested that human speech signals may be represented as discrete point patterns involving acoustic events in the speech signal. This hypothesis poses some challenges for RNN-HMM acoustic modeling: firstly, such modeling arbitrarily discretizes the continuous input into interval features at a fixed frame rate, which may introduce discretization errors; secondly, the occurrences of such acoustic events are unknown. Furthermore, the training targets of the RNN-HMM are obtained from other (inferior) models, giving rise to misalignments. In this paper, we propose a recurrent Poisson process (RPP), which can be seen as a collection of Poisson processes at a series of time intervals in which the intensity evolves according to the RNN hidden states that encode the history of the acoustic signal. It aims at allocating the latent acoustic events in continuous time. Such events are efficiently drawn from the RPP using a sampling-free solution in an analytic form. The speech signal containing latent acoustic events is reconstructed/sampled dynamically from the discretized acoustic features using linear interpolation, in which the weight parameters are estimated from the onset of these events. The above processes are further integrated into an SRU, forming our final model, called the recurrent Poisson process unit (RPPU). Experimental evaluations on ASR tasks including CHiME-2, WSJ0 and WSJ0&1 demonstrate the effectiveness and benefits of the RPPU. For example, it achieves a relative WER reduction of 10.7% over state-of-the-art models on WSJ0.
【Keywords】:
【Paper Link】 【Pages】:6546-6553
【Authors】: Shaohan Huang ; Yu Wu ; Furu Wei ; Zhongzhi Luan
【Abstract】: An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct. We propose a novel approach to modeling this process with dictionary-guided editing networks, which effectively rewrite the source sentence to generate paraphrase sentences. It jointly learns the selection of appropriate word-level and phrase-level paraphrase pairs in the context of the original sentence from an off-the-shelf dictionary, as well as the generation of fluent natural language sentences. Specifically, the system retrieves a set of word-level and phrase-level paraphrase pairs derived from the Paraphrase Database (PPDB) for the original sentence, which are used to guide the decision of which words should be deleted or inserted, via a soft attention mechanism under the sequence-to-sequence framework. We conduct experiments on two benchmark datasets for paraphrase generation, namely MSCOCO and Quora. The automatic evaluation results demonstrate that our dictionary-guided editing networks outperform the baseline methods. In human evaluation, the results indicate that the generated paraphrases are grammatically correct and relevant to the input sentence.
【Keywords】:
【Paper Link】 【Pages】:6554-6561
【Authors】: Parag Jain ; Abhijit Mishra ; Amar Prakash Azad ; Karthik Sankaranarayanan
【Abstract】: We propose a novel framework for controllable natural language transformation. Realizing that the requirement of a parallel corpus is practically unsustainable for controllable generation tasks, we introduce an unsupervised training scheme. The crux of the framework is a deep neural encoder-decoder that is reinforced with text-transformation knowledge through auxiliary modules (called scorers). These scorers, based on off-the-shelf language processing tools, decide the learning scheme of the encoder-decoder based on its actions. We apply this framework to the text-transformation task of formalizing an input text by improving its readability grade; the degree of required formalization can be controlled by the user at run-time. Experiments on public datasets demonstrate the efficacy of our model towards: (a) transforming a given text to a more formal style, and (b) varying the amount of formalness in the output text based on the specified input control. Our code and datasets are released for academic use.
【Keywords】:
【Paper Link】 【Pages】:6562-6569
【Authors】: Shoaib Jameel ; Zihao Fu ; Bei Shi ; Wai Lam ; Steven Schockaert
【Abstract】: The GloVe word embedding model relies on solving a global optimization problem, which can be reformulated as a maximum likelihood estimation problem. In this paper, we propose to generalize this approach to word embedding by considering parametrized variants of the GloVe model and incorporating priors on these parameters. To demonstrate the usefulness of this approach, we consider a word embedding model in which each context word is associated with a corresponding variance, intuitively encoding how informative it is. Using our framework, we can then learn these variances together with the resulting word vectors in a unified way. We experimentally show that the resulting word embedding models outperform GloVe, as well as many popular alternatives.
【Keywords】:
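One way to read the variance-augmented objective above is as a Gaussian log-likelihood in which each context word j carries its own variance sigma_j; the following loss sketch is an assumption-laden illustration, not the paper's exact formulation:

```python
import numpy as np

def glove_variance_loss(W, C, b, b_c, logX, counts, sigma, x_max=100, alpha=0.75):
    """GloVe-style loss with a learned variance per context word.

    Each squared residual is scaled by 1 / (2 * sigma_j**2) plus a
    log(sigma_j) penalty, as in a Gaussian likelihood; sigma_j encodes
    how informative context word j is.
    """
    f = np.minimum((counts / x_max) ** alpha, 1.0)       # standard GloVe weighting
    resid = W @ C.T + b[:, None] + b_c[None, :] - logX   # (V, V) residuals
    per_pair = resid ** 2 / (2.0 * sigma[None, :] ** 2) + np.log(sigma)[None, :]
    return np.sum(f * per_pair)

V, d = 50, 8
rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(V, V)) + 1
loss = glove_variance_loss(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                           np.zeros(V), np.zeros(V), np.log(counts),
                           counts, np.ones(V))
print(float(loss))
```

With all sigma_j fixed to 1, the expression reduces to the usual weighted least-squares GloVe objective; learning sigma_j jointly lets uninformative context words contribute less.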
【Paper Link】 【Pages】:6570-6577
【Authors】: Hannah Kim ; Denys Katerenchuk ; Daniel Billet ; Jun Huan ; Haesun Park ; Boyang Li
【Abstract】: Understanding narrative content has become an increasingly popular topic. Nonetheless, research on identifying common types of narrative characters, or personae, is impeded by the lack of automatic and broad-coverage evaluation methods. We argue that computationally modeling actors provides benefits, including novel evaluation mechanisms for personae. Specifically, we propose two actor-modeling tasks, cast prediction and versatility ranking, which can capture complementary aspects of the relation between actors and the characters they portray. For an actor model, we present a technique for embedding actors, movies, character roles, genres, and descriptive keywords as Gaussian distributions and translation vectors, where the Gaussian variance corresponds to actors’ versatility. Empirical results indicate that (1) the technique considerably outperforms TransE (Bordes et al. 2013) and ablation baselines and (2) automatically identified persona topics (Bamman, O’Connor, and Smith 2013) yield statistically significant improvements in both tasks, whereas simplistic persona descriptors including age and gender perform inconsistently, validating prior research.
【Keywords】:
【Paper Link】 【Pages】:6578-6585
【Authors】: Najoung Kim ; Kyle Rawlins ; Benjamin Van Durme ; Paul Smolensky
【Abstract】: Distinguishing between arguments and adjuncts of a verb is a longstanding, nontrivial problem. In natural language processing, argumenthood information is important in tasks such as semantic role labeling (SRL) and prepositional phrase (PP) attachment disambiguation. In theoretical linguistics, many diagnostic tests for argumenthood exist but they often yield conflicting and potentially gradient results. This is especially the case for syntactically oblique items such as PPs. We propose two PP argumenthood prediction tasks branching from these two motivations: (1) binary argument-adjunct classification of PPs in VerbNet, and (2) gradient argumenthood prediction using human judgments as gold standard, and report results from prediction models that use pretrained word embeddings and other linguistically informed features. Our best results on each task are (1) acc. = 0.955, F1 = 0.954 (ELMo+BiLSTM) and (2) Pearson’s r = 0.624 (word2vec+MLP). Furthermore, we demonstrate the utility of argumenthood prediction in improving sentence representations via performance gains on SRL when a sentence encoder is pretrained with our tasks.
【Keywords】:
【Paper Link】 【Pages】:6586-6593
【Authors】: Seonhoon Kim ; Inho Kang ; Nojun Kwak
【Abstract】: Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. These tasks require understanding the logical and semantic relationship between two sentences, which remains challenging. Although the attention mechanism is useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention mechanisms simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers. It enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the problem of an ever-increasing size of feature vectors due to dense concatenation operations, we also propose to use an autoencoder after dense concatenation. We evaluate our proposed architecture on highly competitive benchmark datasets related to sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.
【Keywords】:
【Paper Link】 【Pages】:6594-6601
【Authors】: Taeuk Kim ; Jihun Choi ; Daniel Edmiston ; Sanghwan Bae ; Sang-goo Lee
【Abstract】: Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing. We present a novel RvNN architecture that can provide dynamic compositionality by considering comprehensive syntactic information derived from both the structure and linguistic tags. Specifically, we introduce a structure-aware tag representation constructed by a separate tag-level tree-LSTM. With this, we can control the composition function of the existing word-level tree-LSTM by augmenting the representation as a supplementary input to the gate functions of the tree-LSTM. In extensive experiments, we show that models built upon the proposed architecture obtain superior or competitive performance on several sentence-level tasks such as sentiment analysis and natural language inference when compared against previous tree-structured models and other sophisticated neural models.
【Keywords】:
【Paper Link】 【Pages】:6602-6609
【Authors】: Yanghoon Kim ; Hwanhee Lee ; Joongbo Shin ; Kyomin Jung
【Abstract】: Neural question generation (NQG) is the task of generating a question from a given passage with deep neural networks. Previous NQG models suffer from a problem that a significant proportion of the generated questions include words in the question target, resulting in the generation of unintended questions. In this paper, we propose answer-separated seq2seq, which better utilizes the information from both the passage and the target answer. By replacing the target answer in the original passage with a special token, our model learns to identify which interrogative word should be used. We also propose a new module termed keyword-net, which helps the model better capture the key information in the target answer and generate an appropriate question. Experimental results demonstrate that our answer separation method significantly reduces the number of improper questions which include answers. Consequently, our model significantly outperforms previous state-of-the-art NQG models.
【Keywords】:
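The answer-separation step described above is a simple preprocessing operation; a minimal sketch, where the mask token name `<a>` is illustrative rather than the paper's actual token:

```python
def separate_answer(passage_tokens, answer_start, answer_len, token="<a>"):
    """Replace the answer span in the passage with a special token,
    so the question generator cannot copy answer words verbatim."""
    return (passage_tokens[:answer_start]
            + [token]
            + passage_tokens[answer_start + answer_len:])

passage = "the eiffel tower was completed in 1889 in paris".split()
# Target answer: "1889" (position 6, length 1)
print(separate_answer(passage, 6, 1))
# ['the', 'eiffel', 'tower', 'was', 'completed', 'in', '<a>', 'in', 'paris']
```

With the answer masked, the model must infer the interrogative word (here "when") from the surrounding context rather than leaking answer words into the question.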
【Paper Link】 【Pages】:6610-6617
【Authors】: Wei-Jen Ko ; Greg Durrett ; Junyi Jessy Li
【Abstract】: Sentence specificity quantifies the level of detail in a sentence, characterizing the organization of information in discourse. While this information is useful for many downstream applications, specificity prediction systems predict very coarse labels (binary or ternary) and are trained on and tailored toward specific domains (e.g., news). The goal of this work is to generalize specificity prediction to domains where no labeled data is available and to output more nuanced real-valued specificity ratings. We present an unsupervised domain adaptation system for sentence specificity prediction, specifically designed to output real-valued estimates from binary training labels. To calibrate the values of these predictions appropriately, we regularize the posterior distribution of the labels towards a reference distribution. We show that our framework generalizes well to three different domains, with a 50%-68% reduction in mean absolute error over the current state-of-the-art system trained for news sentence specificity. We also demonstrate the potential of our work in improving the quality and informativeness of dialogue generation systems.
【Keywords】:
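A plausible reading of the calibration step above is a KL penalty pulling the batch-average posterior toward a reference label distribution; this is an assumed formulation for illustration only, not the paper's exact regularizer:

```python
import numpy as np

def posterior_regularizer(pred_probs, ref_hist, eps=1e-8):
    """KL divergence between a reference label distribution and the
    batch-average posterior, used to calibrate real-valued outputs
    trained from binary labels. pred_probs: (n, bins); ref_hist: (bins,)."""
    avg = pred_probs.mean(axis=0) + eps
    ref = ref_hist + eps
    return float(np.sum(ref * np.log(ref / avg)))

preds = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
reference = np.array([0.5, 0.5])       # assumed target label distribution
print(posterior_regularizer(preds, reference))  # ~0: already calibrated
```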
【Paper Link】 【Pages】:6618-6625
【Authors】: Xiang Kong ; Zhaopeng Tu ; Shuming Shi ; Eduard H. Hovy ; Tong Zhang
【Abstract】: Although Neural Machine Translation (NMT) models have advanced state-of-the-art performance in machine translation, they face problems such as inadequate translation. We attribute this to the fact that standard Maximum Likelihood Estimation (MLE) cannot judge real translation quality due to several limitations. In this work, we propose an adequacy-oriented learning mechanism for NMT by casting translation as a stochastic policy in Reinforcement Learning (RL), where the reward is estimated by explicitly measuring translation adequacy. Benefiting from the sequence-level training of the RL strategy and a more accurate reward designed specifically for translation, our model outperforms multiple strong baselines, including (1) standard and coverage-augmented attention models with MLE-based training, and (2) advanced reinforcement and adversarial training strategies with rewards based on both word-level BLEU and character-level CHRF3. Quantitative and qualitative analyses on different language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.
【Keywords】:
【Paper Link】 【Pages】:6626-6633
【Authors】: Xiang Kong ; Qizhe Xie ; Zihang Dai ; Eduard H. Hovy
【Abstract】: Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is practically limited by its large consumption of memory and computational time due to the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which could effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance losses. With MoS, we achieve an improvement of 1.5 BLEU scores on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr score on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields a 29.6 BLEU score for English-to-German and a 42.1 BLEU score for English-to-French, outperforming the single-Softmax Transformer by 0.9 and 0.4 BLEU scores respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
【Keywords】:
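For reference, the mixture-of-softmaxes output layer that the paper accelerates computes a convex combination of K full softmax distributions, which raises the rank of the log-probability matrix compared to a single softmax. A minimal sketch; the parameter shapes and the tanh projection are assumptions for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mos(h, W_pi, W_proj, W_out, K):
    """Mixture of Softmaxes: p(y|h) = sum_k pi_k(h) * softmax(W_out @ h_k).

    h:      (d,)       context vector
    W_pi:   (K, d)     mixture-weight logits
    W_proj: (K, d, d)  per-component projections
    W_out:  (V, d)     shared output embedding
    """
    pi = softmax(W_pi @ h)                        # (K,) mixture weights
    probs = np.zeros(W_out.shape[0])
    for k in range(K):
        h_k = np.tanh(W_proj[k] @ h)              # component-specific context
        probs += pi[k] * softmax(W_out @ h_k)     # mix full distributions
    return probs

rng = np.random.default_rng(0)
d, V, K = 16, 100, 3
p = mos(rng.normal(size=d), rng.normal(size=(K, d)),
        rng.normal(size=(K, d, d)), rng.normal(size=(V, d)), K)
print(p.sum())  # ~1.0: a valid distribution
```

The K softmaxes over the full vocabulary V are exactly the cost that vocabulary-reduction schemes such as BPE or the paper's Hybrid-LightRNN aim to shrink.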
【Paper Link】 【Pages】:6634-6641
【Authors】: Yuxuan Lai ; Yansong Feng ; Xiaohan Yu ; Zheng Wang ; Kun Xu ; Dongyan Zhao
【Abstract】: Short text matching often faces the challenge of great word mismatch and expression diversity between the two texts, which is further aggravated in languages like Chinese, where there is no natural space to segment words explicitly. In this paper, we propose a novel lattice-based CNN model (LCNs) to utilize the multi-granularity information inherent in the word lattice while maintaining a strong ability to deal with the introduced noisy information for matching-based question answering in Chinese. We conduct extensive experiments on both document-based question answering and knowledge-based question answering tasks, and the experimental results show that the LCNs models can significantly outperform state-of-the-art matching models and strong baselines by taking advantage of their better ability to distill rich but discriminative information from the word lattice input.
【Keywords】:
【Paper Link】 【Pages】:6642-6649
【Authors】: Sungjin Lee ; Rahul Jha
【Abstract】: Conversational agents such as Alexa and Google Assistant constantly need to increase their language understanding capabilities by adding new domains. A massive amount of labeled data is required for training each new domain. While domain adaptation approaches alleviate the annotation cost, prior approaches suffer from increased training time and suboptimal concept alignments. To tackle this, we introduce a novel Zero-Shot Adaptive Transfer method for slot tagging that utilizes the slot description for transferring reusable concepts across domains, and enjoys efficient training without any explicit concept alignments. Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime.
【Keywords】:
【Paper Link】 【Pages】:6650-6657
【Authors】: Zeyang Lei ; Yujiu Yang ; Min Yang ; Wei Zhao ; Jun Guo ; Yi Liu
【Abstract】: In this paper, we propose a novel Human-like Semantic Cognition Network (HSCN) for aspect-level sentiment classification, motivated by the principles of human beings’ reading cognitive process (pre-reading, active reading, post-reading). We first design a word-level interactive perception module to capture the correlation between context words and the given target words, which can be regarded as pre-reading. Second, to mimic the process of active reading, we propose a target-aware semantic distillation module to produce the target-specific context representation for aspect-level sentiment prediction. Third, we further devise a semantic deviation metric module to measure the semantic deviation between the target-specific context representation and the given target, which evaluates the degree to which we understand the target-specific context semantics. The measured semantic deviation is then used to fine-tune the above active reading process in a feedback-regulation way. To verify the effectiveness of our approach, we conduct extensive experiments on three widely used datasets. The experiments demonstrate that HSCN achieves impressive results compared to other strong competitors.
【Keywords】:
【Paper Link】 【Pages】:6658-6665
【Authors】: Bowen Li ; Jianpeng Cheng ; Yang Liu ; Frank Keller
【Abstract】: Dependency grammar induction is the task of learning dependency syntax without annotated training data. Traditional graph-based models with global inference achieve state-of-the-art results on this task, but they require O(n^3) run time. Transition-based models enable faster inference with O(n) time complexity, but their performance still lags behind. In this work, we propose a neural transition-based parser for dependency grammar induction, whose inference procedure utilizes rich neural features with O(n) time complexity. We train the parser with an integration of variational inference, posterior regularization and variance reduction techniques. The resulting framework outperforms previous unsupervised transition-based dependency parsers and achieves performance comparable to graph-based models, both on the English Penn Treebank and on the Universal Dependency Treebank. In an empirical comparison, we show that our approach substantially increases parsing speed over graph-based models.
【Keywords】:
【Paper Link】 【Pages】:6666-6673
【Authors】: Christy Y. Li ; Xiaodan Liang ; Zhiting Hu ; Eric P. Xing
【Abstract】: Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions. We propose a novel Knowledge-driven Encode, Retrieve, Paraphrase (KERP) approach which reconciles traditional knowledge- and retrieval-based methods with modern learning-based methods for accurate and robust medical report generation. Specifically, KERP decomposes medical report generation into explicit medical abnormality graph learning and subsequent natural language modeling. KERP first employs an Encode module that transforms visual features into a structured abnormality graph by incorporating prior medical knowledge; then a Retrieve module that retrieves text templates based on the detected abnormalities; and lastly, a Paraphrase module that rewrites the templates according to specific cases. The core of KERP is a proposed generic implementation unit—Graph Transformer (GTR) that dynamically transforms high-level semantics between graph-structured data of multiple domains such as knowledge graphs, images and sequences. Experiments show that the proposed approach generates structured and robust reports supported with accurate abnormality description and explainable attentive regions, achieving the state-of-the-art results on two medical report benchmarks, with the best medical abnormality and disease classification accuracy and improved human evaluation performance.
【Keywords】:
【Paper Link】 【Pages】:6674-6681
【Authors】: Irene Li ; Alexander R. Fabbri ; Robert R. Tung ; Dragomir R. Radev
【Abstract】: Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of “what should one learn first,” we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
【Keywords】:
【Paper Link】 【Pages】:6682-6689
【Authors】: Jianing Li ; Yanyan Lan ; Jiafeng Guo ; Jun Xu ; Xueqi Cheng
【Abstract】: Neural language models based on recurrent neural networks (RNNLMs) have significantly improved performance for text generation, yet the quality of generated text, as measured by Turing Test pass rate, is still far from satisfying. Some researchers propose to use adversarial training or reinforcement learning to promote quality; however, such methods usually introduce great challenges in the training and parameter tuning processes. Through our analysis, we find that the problem with RNNLMs comes from the usage of maximum likelihood estimation (MLE) as the objective function, which requires the generated distribution to precisely recover the true distribution. This requirement favors high generation diversity, which restricts the generation quality. It is not suitable when the overall quality is low, since high generation diversity usually indicates many errors rather than diverse good samples. In this paper, we propose to achieve differentiated distribution recovery, DDR for short. The key idea is to make the optimal generation probability proportional to the β-th power of the true probability, where β > 1. In this way, the generation quality can be greatly improved by sacrificing diversity from noises and rare patterns. Experiments on synthetic data and two public text datasets show that our DDR method achieves a more flexible quality-diversity trade-off and a higher Turing Test pass rate, compared with baseline methods including RNNLM, SeqGAN and LeakGAN.
【Keywords】:
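The effect of making the generation probability proportional to the β-th power of the true probability can be seen in a few lines; this sketch only illustrates the sharpening at sampling time, whereas the paper builds the property into training:

```python
import numpy as np

def ddr_sample(probs, beta=1.5, rng=None):
    """Sample from a distribution proportional to probs**beta (beta > 1).

    Raising probabilities to a power > 1 sharpens the distribution,
    trading diversity (noise, rare patterns) for generation quality.
    """
    rng = rng or np.random.default_rng()
    sharpened = probs ** beta
    sharpened /= sharpened.sum()
    return rng.choice(len(probs), p=sharpened)

p = np.array([0.5, 0.3, 0.15, 0.05])
print((p ** 1.5) / (p ** 1.5).sum())
# ~[0.60 0.28 0.10 0.02]: mass shifts toward high-probability tokens
print(ddr_sample(p, beta=1.5, rng=np.random.default_rng(0)))
```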
【Paper Link】 【Pages】:6690-6697
【Authors】: Junjie Li ; Haoran Li ; Chengqing Zong
【Abstract】: We address personalized review summarization, which generates a condensed summary for a user’s review, accounting for the user’s preferences on different aspects and writing style. We propose a novel personalized review summarization model named User-aware Sequence Network (USN) to consider the aforementioned user characteristics when generating summaries; it contains a user-aware encoder and a user-aware decoder. Specifically, the user-aware encoder adopts a user-based selective mechanism to select the important information of a review, and the user-aware decoder incorporates user characteristics and user-specific word-usage habits into the word prediction process to generate personalized summaries. To validate our model, we collected a new dataset, Trip, comprising 536,255 reviews from 19,400 users. With quantitative and human evaluation, we show that USN achieves state-of-the-art performance on personalized review summarization.
【Keywords】:
【Paper Link】 【Pages】:6698-6705
【Authors】: Juntao Li ; Lisong Qiu ; Bo Tang ; Dongmin Chen ; Dongyan Zhao ; Rui Yan
【Abstract】: Recent successes of open-domain dialogue generation mainly rely on the advances of deep neural networks. The effectiveness of deep neural network models depends on the amount of training data. As it is laborious and expensive to acquire a huge amount of data in most scenarios, how to effectively utilize existing data is the crux of this issue. In this paper, we use data augmentation techniques to improve the performance of neural dialogue models under the condition of insufficient data. Specifically, we propose a novel generative model to augment existing data, where a conditional variational autoencoder (CVAE) is employed as the generator to output more training data with diversified expressions. To improve the correlation of each augmented training pair, we design a discriminator with adversarial training to supervise the augmentation process. Moreover, we thoroughly investigate various data augmentation schemes for neural dialogue systems with generative models, both GAN and CVAE. Experimental results on two open corpora, Weibo and Twitter, demonstrate the superiority of our proposed data augmentation model.
【Keywords】:
【Paper Link】 【Pages】:6706-6713
【Authors】: Naihan Li ; Shujie Liu ; Yanqing Liu ; Sheng Zhao ; Ming Liu
【Abstract】: Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty in modeling long-range dependency using current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper, we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs at different times are connected directly by the self-attention mechanism, which solves the long-range dependency problem effectively. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio results. Experiments are conducted to test the efficiency and performance of our new network. For efficiency, our Transformer TTS network can speed up training by about 4.25 times compared with Tacotron2. For performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforming Tacotron2 with a gap of 0.048) and is very close to human quality (4.39 vs 4.44 in MOS).
【Keywords】:
【Paper Link】 【Pages】:6714-6721
【Authors】: Xin Li ; Lidong Bing ; Piji Li ; Wai Lam
【Abstract】: Target-based sentiment analysis involves opinion target extraction and target sentiment classification. However, most existing works study one of these two sub-tasks alone, which hinders their practical use. This paper aims to solve the complete task of target-based sentiment analysis in an end-to-end fashion, and presents a novel unified model which applies a unified tagging scheme. Our framework involves two stacked recurrent neural networks: the upper one predicts the unified tags to produce the final output of the primary target-based sentiment analysis; the lower one performs an auxiliary target boundary prediction aimed at guiding the upper network to improve the performance of the primary task. To explore the inter-task dependency, we propose to explicitly model the constrained transitions from target boundaries to target sentiment polarities. We also propose to maintain sentiment consistency within an opinion target via a gate mechanism which models the relation between the features of the current word and the previous word. We conduct extensive experiments on three benchmark datasets and our framework achieves consistently superior results.
【Keywords】:
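A unified tagging scheme of the kind described above can be illustrated by fusing boundary tags with sentiment polarities; the exact tag inventory here is an assumption based on the abstract, not necessarily the paper's:

```python
# One plausible unified tagging of a review sentence: each token gets a
# boundary tag fused with a sentiment polarity (O = not part of a target).
sentence = ["the", "battery", "life", "is", "great", "but", "the", "screen", "scratches"]
tags     = ["O", "B-POS", "I-POS", "O", "O", "O", "O", "B-NEG", "O"]

def decode(tokens, tags):
    """Recover (target, polarity) pairs from a unified tag sequence."""
    targets, current, polarity = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                targets.append((" ".join(current), polarity))
            current, polarity = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                targets.append((" ".join(current), polarity))
            current, polarity = [], None
    if current:
        targets.append((" ".join(current), polarity))
    return targets

print(decode(sentence, tags))  # [('battery life', 'POS'), ('screen', 'NEG')]
```

Collapsing extraction and classification into one tag sequence is what lets a single sequence labeler solve the complete task end to end.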
【Paper Link】 【Pages】:6722-6729
【Authors】: Ziming Li ; Julia Kiseleva ; Maarten de Rijke
【Abstract】: The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.
【Keywords】:
【Paper Link】 【Pages】:6730-6737
【Authors】: Zuchao Li ; Shexia He ; Hai Zhao ; Yiqing Zhang ; Zhuosheng Zhang ; Xi Zhou ; Xiang Zhou
【Abstract】: Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most work focuses on either the span-based or the dependency-based semantic representation form, and only shows model optimizations specific to each. Meanwhile, attempts to handle these two SRL tasks uniformly have been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with the two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, notably including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.
【Keywords】:
【Paper Link】 【Pages】:6738-6745
【Authors】: Changsheng Liu ; Rebecca Hwa
【Abstract】: Many idiomatic expressions can be used figuratively or literally depending on the context. A particular challenge of automatic idiom usage recognition is that idioms, by their very nature, are idiosyncratic in their usages; therefore, most previous work on idiom usage recognition adopted a “per idiom” classifier approach, i.e., a classifier needs to be trained separately for each idiomatic expression of interest, often with the aid of annotated training examples. This paper presents a transfer learning approach for developing a generalized model to recognize whether an idiom is used figuratively or literally. Our work is based on the observation that most idioms, when taken literally, would be somehow semantically at odds with their context. Therefore, a quantified notion of semantic compatibility may help to discern the intended usage for any arbitrary idiom. We propose a novel semantic compatibility model by adapting the training of a Continuous Bag-of-Words (CBOW) model for the purpose of idiom usage recognition. There is no need to annotate idiom usage examples for training. We perform evaluative experiments on two corpora; the results show that the proposed generalized model achieves competitive results compared to state-of-the-art per-idiom models.
【Keywords】:
【Paper Link】 【Pages】:6746-6753
【Authors】: Haoyan Liu ; Lei Fang ; Jian-Guang Lou ; Zhoujun Li
【Abstract】: Much recent work focuses on leveraging semantic lexicons like WordNet to enhance word representation learning (WRL) and achieves promising performance on many NLP tasks. However, most existing methods may be limited because they require high-quality, manually created semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, referred to as semantic knowledge, based on a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. Then we introduce an efficient joint word representation learning model to capture semantics from both the semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general and can be easily scaled with no additional effort. Extensive experimental results show that our approach outperforms state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks.
【Keywords】:
【Paper Link】 【Pages】:6754-6761
【Authors】: Jian Liu ; Yubo Chen ; Kang Liu
【Abstract】: The ambiguity in language expressions poses a great challenge for event detection. To disambiguate event types, current approaches rely on external NLP toolkits to build knowledge representations. Unfortunately, these approaches work in a pipeline paradigm and suffer from the error propagation problem. In this paper, we propose an adversarial imitation based knowledge distillation approach, for the first time, to tackle the challenge of acquiring knowledge from raw sentences for event detection. In our approach, a teacher module is first devised to learn the knowledge representations from the ground-truth annotations. Then, we set up a student module that only takes the raw sentences as input. The student module is taught to imitate the behavior of the teacher under the guidance of an adversarial discriminator. In this way, the process of knowledge distillation from raw sentences is implicitly integrated into the feature encoding stage of the student module. In the end, the enhanced student is used for event detection; it processes raw texts and requires no extra toolkits, naturally eliminating the error propagation problem faced by pipeline approaches. We conduct extensive experiments on the ACE 2005 datasets, and the experimental results justify the effectiveness of our approach.
【Keywords】:
【Paper Link】 【Pages】:6762-6769
【Authors】: Pengfei Liu ; Shuaichen Chang ; Xuanjing Huang ; Jian Tang ; Jackie Chi Kit Cheung
【Abstract】: Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
【Keywords】:
【Paper Link】 【Pages】:6770-6777
【Authors】: Qian Liu ; Bei Chen ; Jian-Guang Lou ; Ge Jin ; Dongmei Zhang
【Abstract】: Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDBs allow users to search databases using natural language instead of SQL-like query languages. While they save users from having to learn query languages, multi-turn interaction with an NLIDB usually involves multiple queries, where contextual information is vital to understanding the users’ query intents. In this paper, we address a typical contextual understanding problem, termed follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with 1000 query triples on 120 tables. Moreover, we propose a novel approach, FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics.
【Keywords】:
【Paper Link】 【Pages】:6778-6785
【Authors】: Tianlin Liu ; Lyle Ungar ; João Sedoc
【Abstract】: Word vectors are at the core of many natural language processing tasks. Recently, there has been interest in post-processing word vectors to enrich their semantic information. In this paper, we introduce a novel word vector post-processing technique based on matrix conceptors (Jaeger 2014), a family of regularized identity maps. More concretely, we propose to use conceptors to suppress those latent features of word vectors having high variances. The proposed method is purely unsupervised: it does not rely on any corpus or external linguistic database. We evaluate the post-processed word vectors on a battery of intrinsic lexical evaluation tasks, showing that the proposed method consistently outperforms existing state-of-the-art alternatives. We also show that post-processed word vectors can be used for the downstream natural language processing task of dialogue state tracking, yielding improved results in different dialogue domains.
【Keywords】:
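The conceptor post-processing above can be sketched directly from Jaeger's definition C = R(R + α^-2 I)^-1, where R is the correlation matrix of the word vectors; the aperture α below is an assumed hyperparameter value:

```python
import numpy as np

def conceptor_postprocess(X, alpha=2.0):
    """Suppress high-variance latent directions of word vectors.

    X: (n, d) matrix of word vectors (rows are words).
    C = R (R + alpha**-2 I)^{-1} is the conceptor of the correlation
    matrix R; applying (I - C) damps each direction in proportion to
    its variance, a soft version of removing top principal components.
    """
    n, d = X.shape
    R = X.T @ X / n                                    # (d, d) correlation matrix
    C = R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))
    return X @ (np.eye(d) - C).T

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 50))
vecs[:, 0] *= 10                   # one dominant, uninformative direction
out = conceptor_postprocess(vecs)
print(out[:, 0].std(), out[:, 1].std())  # the dominant axis is strongly damped
```

A direction with eigenvalue λ is scaled by α^-2 / (λ + α^-2), so the largest-variance (often least informative) directions are suppressed most while the rest are nearly preserved.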
【Paper Link】 【Pages】:6786-6793
【Authors】: Tianyu Liu ; Fuli Luo ; Qiaolin Xia ; Shuming Ma ; Baobao Chang ; Zhifang Sui
【Abstract】: Generating natural language descriptions for structured tables which consist of multiple attribute-value tuples is a convenient way to help people understand the tables. Most neural table-to-text models are based on the encoder-decoder framework. However, it is hard for a vanilla encoder to learn an accurate semantic representation of a complex table. The challenges are two-fold: firstly, table-to-text datasets often contain a large number of attributes across different domains, so it is hard for the encoder to incorporate these heterogeneous resources. Secondly, a single encoder also has difficulty modeling the complex attribute-value structure of the tables. To this end, we first propose a two-level hierarchical encoder with coarse-to-fine attention to handle the attribute-value structure of the tables. Furthermore, to capture accurate semantic representations of the tables, we propose 3 joint tasks apart from the primary encoder-decoder learning, namely an auxiliary sequence labeling task, a text autoencoder, and multi-label classification, as auxiliary supervision for the table encoder. We test our models on the widely used dataset WIKIBIO, which contains Wikipedia infoboxes and related descriptions. The dataset contains complex tables as well as a large number of attributes across different domains. We achieve state-of-the-art performance on both automatic and human evaluation metrics.
【Keywords】:
【Paper Link】 【Pages】:6794-6801
【Authors】: Liangchen Luo ; Wenhao Huang ; Qi Zeng ; Zaiqing Nie ; Xu Sun
【Abstract】: Most existing works on dialog systems only consider conversation content while neglecting the personality of the user the bot is interacting with, which begets several unsolved issues. In this paper, we present a personalized end-to-end model in an attempt to leverage personalization in goal-oriented dialogs. We first introduce a PROFILE MODEL which encodes user profiles into distributed embeddings and refers to conversation history from other similar users. Then a PREFERENCE MODEL captures user preferences over knowledge base entities to handle the ambiguity in user requests. The two models are combined into the PERSONALIZED MEMN2N. Experiments show that the proposed model achieves qualitative performance improvements over state-of-the-art methods. As for human evaluation, it also outperforms other approaches in terms of task completion rate and user satisfaction.
【Keywords】:
【Paper Link】 【Pages】:6802-6809
【Authors】: Shangwen Lv ; Wanhui Qian ; Longtao Huang ; Jizhong Han ; Songlin Hu
【Abstract】: Scripts represent knowledge of event sequences that can help text understanding. Script event prediction requires measuring the relation between an existing chain and the subsequent event. The dominant approaches either focus on the effects of individual events, or the influence of the chain sequence. However, considering only individual events loses much of the semantic relations within the event chain, and considering only the sequence of the chain introduces much noise. From our observations, both the individual events and the event segments within the chain can facilitate the prediction of the subsequent event. This paper develops a self-attention mechanism to focus on diverse event segments within the chain, and the event chain is represented as a set of event segments. We utilize event-level attention to model the relations between subsequent events and individual events. Then, we propose chain-level attention to model the relations between subsequent events and event segments within the chain. Finally, we integrate event-level and chain-level attention to interact with the chain to predict what happens next. Comprehensive experimental results on the widely used New York Times corpus demonstrate that our model achieves better results than other state-of-the-art baselines on the Multi-Choice Narrative Cloze evaluation.
【Keywords】:
【Paper Link】 【Pages】:6810-6817
【Authors】: Shuming Ma ; Lei Cui ; Damai Dai ; Furu Wei ; Xu Sun
【Abstract】: We introduce the task of automatic live commenting. Live commenting, which is also called “video barrage”, is an emerging feature on online video sites that allows real-time comments from viewers to fly across the screen like bullets or roll at the right side of the screen. The live comments are a mixture of opinions for the video and the chit chats with other comments. Automatic live commenting requires AI agents to comprehend the videos and interact with human viewers who also make the comments, so it is a good testbed of an AI agent’s ability to deal with both dynamic vision and language. In this work, we construct a large-scale live comment dataset with 2,361 videos and 895,929 live comments. Then, we introduce two neural models to generate live comments based on the visual and textual contexts, which achieve better performance than previous neural baselines such as the sequence-to-sequence model. Finally, we provide a retrieval-based evaluation protocol for automatic live commenting where the model is asked to sort a set of candidate comments based on the log-likelihood score, and evaluated on metrics such as mean-reciprocal-rank. Putting it all together, we demonstrate the first “LiveBot”. The datasets and the codes can be found at https://github.com/lancopku/livebot.
【Keywords】:
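The retrieval-based protocol above ranks a fixed candidate set by model score and reports metrics such as mean reciprocal rank; a minimal sketch of the metric, where the candidate scores stand in for the model's log-likelihoods:

```python
import numpy as np

def mean_reciprocal_rank(scores, gold_indices):
    """scores: (n_queries, n_candidates) model log-likelihoods;
    gold_indices: index of the human comment among the candidates."""
    ranks = []
    for s, g in zip(scores, gold_indices):
        order = np.argsort(-s)                 # best-scored candidate first
        rank = int(np.where(order == g)[0][0]) + 1
        ranks.append(1.0 / rank)
    return float(np.mean(ranks))

# Two test contexts, five candidate comments each; gold at index 2 and 0.
scores = np.array([[0.1, 0.3, 0.9, 0.2, 0.4],
                   [0.8, 0.1, 0.2, 0.3, 0.4]])
print(mean_reciprocal_rank(scores, [2, 0]))    # (1/1 + 1/1) / 2 = 1.0
```

Ranking a closed candidate set sidesteps the well-known unreliability of n-gram overlap metrics for open-ended comment generation.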
【Paper Link】 【Pages】:6818-6825
【Authors】: Navonil Majumder ; Soujanya Poria ; Devamanyu Hazarika ; Rada Mihalcea ; Alexander F. Gelbukh ; Erik Cambria
【Abstract】: Emotion detection in conversations is a necessary step for a number of applications, including opinion mining over chat history, social media threads, debates, argumentation mining, understanding consumer feedback in live conversations, and so on. Current systems do not treat the parties in the conversation individually by adapting to the speaker of each utterance. In this paper, we describe a new method based on recurrent neural networks that keeps track of the individual party states throughout the conversation and uses this information for emotion classification. Our model outperforms the state-of-the-art by a significant margin on two different datasets.
【Keywords】:
【Paper Link】 【Pages】:6826-6833
【Authors】: Yu Meng ; Jiaming Shen ; Chao Zhang ; Jiawei Han
【Abstract】: Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models have been gaining popularity for text classification due to their expressive power and minimal feature engineering requirements. However, applying deep neural networks to hierarchical text classification remains challenging, because they rely heavily on a large amount of training data and cannot easily determine the appropriate levels for documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical text classification. Our method does not require a large amount of training data but only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
【Keywords】:
【Paper Link】 【Pages】:6834-6842
【Authors】: Ning Miao ; Hao Zhou ; Lili Mou ; Rui Yan ; Lei Li
【Abstract】: In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints, such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH.
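As a rough illustration of the sampling idea in CGMH (not the released implementation), the following toy Metropolis-Hastings loop proposes word-level replacements, insertions and deletions and accepts them by a score ratio; the `score` function here is a crude stand-in for the language-model probability CGMH would use, with the keyword constraint folded in.

```python
import random

KEYWORDS = {"weather", "sunny"}   # hard constraint: both words must appear
VOCAB = ["the", "weather", "is", "sunny", "today", "very"]

def score(sentence):
    """Stand-in for p(x): zero when a keyword is missing, otherwise a crude
    fluency proxy preferring shorter sentences. CGMH would use an LM here."""
    if not KEYWORDS.issubset(sentence):
        return 0.0
    return 1.0 / (1.0 + len(sentence))

def propose(sentence):
    """Word-level replacement / insertion / deletion proposals."""
    s = list(sentence)
    op = random.choice(["replace", "insert", "delete"])
    i = random.randrange(len(s) + (op == "insert"))
    if op == "replace":
        s[i] = random.choice(VOCAB)
    elif op == "insert":
        s.insert(i, random.choice(VOCAB))
    elif len(s) > 1:                  # delete, but never empty the sentence
        del s[i]
    return s

random.seed(0)
x = ["weather", "sunny"]              # start from the constraint keywords
for _ in range(2000):
    y = propose(x)
    # Metropolis acceptance with a symmetric-proposal simplification;
    # zero-score (constraint-violating) proposals are never accepted.
    if random.random() < min(1.0, score(y) / score(x)):
        x = y
print(" ".join(x))                    # a sentence still containing both keywords
```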
【Keywords】:
【Paper Link】 【Pages】:6843-6850
【Authors】: Sebastian J. Mielke ; Jason Eisner
【Abstract】: We show how the spellings of known words can help us deal with unknown words in open-vocabulary NLP tasks. The method we propose can be used to extend any closed-vocabulary generative model, but in this paper we specifically consider the case of neural language modeling. Our Bayesian generative story combines a standard RNN language model (generating the word tokens in each sentence) with an RNN-based spelling model (generating the letters in each word type). These two RNNs respectively capture sentence structure and word structure, and are kept separate as in linguistics. By invoking the second RNN to generate spellings for novel words in context, we obtain an open-vocabulary language model. For known words, embeddings are naturally inferred by combining evidence from type spelling and token context. Compared to baselines (including a novel strong baseline), we beat previous work and establish state-of-the-art results on multiple datasets.
【Keywords】:
【Paper Link】 【Pages】:6851-6858
【Authors】: Trung Minh Nguyen ; Thien Huu Nguyen
【Abstract】: Previous work on event extraction has mainly focused on the predictions for event triggers and argument roles, treating entity mentions as being provided by human annotators. This is unrealistic, as entity mentions are usually predicted by existing toolkits whose errors might be propagated to event trigger and argument role recognition. Some recent work has addressed this problem by jointly predicting entity mentions, event triggers and arguments. However, such work is limited to using discrete engineered features to represent contextual information for the individual tasks and their interactions. In this work, we propose a novel model to jointly perform predictions for entity mentions, event triggers and arguments based on shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, leading to state-of-the-art performance for event extraction.
【Keywords】:
【Paper Link】 【Pages】:6859-6866
【Authors】: Yixin Nie ; Haonan Chen ; Mohit Bansal
【Abstract】: The increasing concern with misinformation has stimulated research efforts on automatic fact checking. The recently released FEVER dataset introduced a benchmark fact-verification task in which a system is asked to verify a claim using evidential sentences from Wikipedia documents. In this paper, we present a connected system consisting of three homogeneous neural semantic matching models that conduct document retrieval, sentence selection, and claim verification jointly for fact extraction and verification. For evidence retrieval (document retrieval and sentence selection), unlike traditional vector space IR models in which queries and sources are matched in some pre-designed term vector space, we develop neural models that perform deep semantic matching from raw textual input, assuming no intermediate term representation and no access to structured external knowledge bases. We also show that Pageview frequency can help improve the performance of evidence retrieval; the retrieved evidence is then matched by our neural semantic matching network. For claim verification, unlike previous approaches that simply feed the upstream retrieved evidence and the claim to a natural language inference (NLI) model, we further enhance the NLI model by providing it with internal semantic relatedness scores (hence integrating it with the evidence retrieval modules) and ontological WordNet features. Experiments on the FEVER dataset indicate that (1) our neural semantic matching method outperforms popular TF-IDF and encoder models by significant margins on all evidence retrieval metrics, (2) the additional relatedness score and WordNet features improve the NLI model via better semantic awareness, and (3) by formalizing all three subtasks as a similar semantic matching problem and improving on all three stages, the complete model is able to achieve state-of-the-art results on the FEVER test set (two times greater than the baseline results).
【Keywords】:
【Paper Link】 【Pages】:6867-6874
【Authors】: Yixin Nie ; Yicheng Wang ; Mohit Bansal
【Abstract】: Success in natural language inference (NLI) should require a model to understand both lexical and compositional semantics. However, through adversarial evaluation, we find that several state-of-the-art models with diverse architectures over-rely on the former and fail to use the latter. Further, this compositionality unawareness is not reflected via standard evaluation on current datasets. We show that removing RNNs in existing models or shuffling input words during training does not induce large performance loss despite the explicit removal of compositional information. Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models’ actual compositionality awareness. We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: it complements existing adversarial and linguistically driven diagnostic evaluations, and exposes opportunities for future work on evaluating models’ compositional understanding.
【Keywords】:
【Paper Link】 【Pages】:6875-6882
【Authors】: Liang Pang ; Yanyan Lan ; Jiafeng Guo ; Jun Xu ; Lixin Su ; Xueqi Cheng
【Abstract】: This paper is concerned with open-domain question answering (OpenQA). Recently, some works have viewed this problem as a reading comprehension (RC) task and directly applied successful RC models to it. However, the performance of such models is not as good as on the RC task. In our opinion, the RC perspective ignores three characteristics of the OpenQA task: 1) many paragraphs without the answer span are included in the data collection; 2) multiple answer spans may exist within one given paragraph; 3) the end position of an answer span depends on the start position. In this paper, we first propose a new probabilistic formulation of OpenQA, based on a three-level hierarchical structure, i.e., the question level, the paragraph level and the answer span level. Then a Hierarchical Answer Spans Model (HAS-QA) is designed to capture each probability. HAS-QA has the ability to tackle the above three problems, and experiments on public OpenQA datasets show that it significantly outperforms traditional RC baselines and recent OpenQA baselines.
【Keywords】:
【Paper Link】 【Pages】:6883-6891
【Authors】: Sunghyun Park ; Seung-won Hwang ; Fuxiang Chen ; Jaegul Choo ; Jung-Woo Ha ; Sunghun Kim ; Jinyeong Yim
【Abstract】: We examine the problem of generating a set of diverse paraphrase sentences while (1) not compromising the meaning of the original sentence, and (2) imposing diversity in various semantic aspects, such as lexical or syntactic structure. Existing work on paraphrase generation has focused more on the former, while the latter was trained as a fixed style transfer, such as transferring from positive to negative sentiments, even at the cost of losing semantics. In this work, we consider style transfer as a means of imposing diversity, with a paraphrasing correctness constraint that the target sentence must remain a paraphrase of the original sentence. Our goal is to maximize the diversity of a set of k generated paraphrases, which we call the diversified paraphrase (DP) problem. Our key contribution is deciding the style guidance at generation time in the direction that increases the diversity of the output with respect to previously generated paraphrases. As pre-materializing training data for all style decisions is impractical, we train with biased data but with debiasing guidance. Compared to state-of-the-art methods, our proposed model can generate more diverse and yet semantically consistent paraphrase sentences. That is, our model, trained on the MSCOCO dataset, achieves the highest embedding scores, .94/.95/.86, similar to state-of-the-art results, but with an 8.73% lower mBLEU score (more diverse).
【Keywords】:
【Paper Link】 【Pages】:6892-6899
【Authors】: Hai Pham ; Paul Pu Liang ; Thomas Manzini ; Louis-Philippe Morency ; Barnabás Póczos
【Abstract】: Multimodal sentiment analysis is a core research area that studies speaker sentiment expressed through the language, visual, and acoustic modalities. The central challenge in multimodal learning involves inferring joint representations that can process and relate information from these modalities. However, existing work learns joint representations by requiring all modalities as input, and as a result the learned representations may be sensitive to noisy or missing modalities at test time. With the recent success of sequence-to-sequence (Seq2Seq) models in machine translation, there is an opportunity to explore new ways of learning joint representations that may not require all input modalities at test time. In this paper, we propose a method to learn robust joint representations by translating between modalities. Our method is based on the key insight that translation from a source to a target modality provides a method of learning joint representations using only the source modality as input. We augment modality translations with a cycle consistency loss to ensure that our joint representations retain maximal information from all modalities. Once our translation model is trained with paired multimodal data, we only need data from the source modality at test time for final sentiment prediction. This ensures that our model remains robust to perturbed or missing information in the other modalities. We train our model with a coupled translation-prediction objective and it achieves new state-of-the-art results on multimodal sentiment analysis datasets: CMU-MOSI, ICT-MMMO, and YouTube. Additional experiments show that our model learns increasingly discriminative joint representations with more input modalities while maintaining robustness to missing or perturbed modalities.
【Keywords】:
【Paper Link】 【Pages】:6900-6907
【Authors】: Victor Prokhorov ; Mohammad Taher Pilehvar ; Dimitri Kartsaklis ; Pietro Lio' ; Nigel Collier
【Abstract】: Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in vitro, on multiple rare word similarity datasets, and in vivo, in two downstream text classification tasks.
【Keywords】:
【Paper Link】 【Pages】:6908-6915
【Authors】: Ratish Puduppully ; Li Dong ; Mirella Lapata
【Abstract】: Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state-of-the-art on the recently released RotoWIRE dataset.
【Keywords】:
【Paper Link】 【Pages】:6916-6923
【Authors】: Marek Rei ; Anders Søgaard
【Abstract】: Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language composition can allow us to guide the models based on existing knowledge, regularizing them towards more robust and interpretable representations. In this paper, we investigate how objectives at different granularities can be used to learn better language representations, and we propose an architecture for jointly learning to label sentences and tokens. The predictions at each level are combined using an attention mechanism, with token-level labels also acting as explicit supervision for composing sentence-level representations. Our experiments show that by learning to perform these tasks jointly on multiple levels, the model achieves substantial improvements for both sentence classification and sequence labeling.
【Keywords】:
【Paper Link】 【Pages】:6924-6931
【Authors】: Shruti Rijhwani ; Jiateng Xie ; Graham Neubig ; Jaime G. Carbonell
【Abstract】: Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy by 17% (absolute) on average over the baseline systems in the zero-shot scenario. Further, we also investigate the use of language-universal phonological representations, which improves average accuracy by 36% (absolute) when transferring between languages that use different scripts.
【Keywords】:
【Paper Link】 【Pages】:6932-6939
【Authors】: Andreas Rücklé ; Nafise Sadat Moosavi ; Iryna Gurevych
【Abstract】: Current neural network based community question answering (cQA) systems fall short of (1) properly handling long answers which are common in cQA; (2) performing under small data conditions, where a large amount of training data is unavailable—i.e., for some domains in English and even more so for a huge number of datasets in other languages; and (3) benefiting from syntactic information in the model—e.g., to differentiate between identical lexemes with different syntactic roles. In this paper, we propose COALA, an answer selection approach that (a) selects appropriate long answers due to an effective comparison of all question-answer aspects, (b) has the ability to generalize from a small number of training examples, and (c) makes use of the information about syntactic roles of words. We show that our approach outperforms existing answer selection models by a large margin on six cQA datasets from different domains. Furthermore, we report the best results on the passage retrieval benchmark WikiPassageQA.
【Keywords】:
【Paper Link】 【Pages】:6940-6948
【Authors】: Devendra Singh Sachan ; Manzil Zaheer ; Ruslan Salakhutdinov
【Abstract】: In this paper, we study bidirectional LSTM networks for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve a high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for the text classification task on several benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. We also show the generality of the mixed objective function by improving the performance on the relation extraction task.
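Among the losses in the mixed objective above, the entropy-minimization term is the simplest to write down; the sketch below computes it in NumPy over dummy predicted distributions (the adversarial and virtual adversarial terms, and the model itself, are omitted).

```python
import numpy as np

def entropy_minimization_loss(probs):
    """Mean Shannon entropy of predicted class distributions on unlabeled
    examples; minimizing it pushes the model toward confident predictions.
    `probs` has shape (batch, num_classes) with rows summing to 1."""
    eps = 1e-12                       # numerical floor to avoid log(0)
    return float(np.mean(-np.sum(probs * np.log(probs + eps), axis=1)))

# A confident batch incurs a much smaller penalty than an uncertain one.
confident = np.array([[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]])
uncertain = np.array([[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]])
print(entropy_minimization_loss(confident))   # ≈ 0.17
print(entropy_minimization_loss(uncertain))   # ≈ 1.09
```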
【Keywords】:
【Paper Link】 【Pages】:6949-6956
【Authors】: Victor Sanh ; Thomas Wolf ; Sebastian Ruder
【Abstract】: Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) downstream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low-level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction, without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at the lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information.
【Keywords】:
【Paper Link】 【Pages】:6957-6964
【Authors】: Vasanth Sarathy ; Matthias Scheutz
【Abstract】: Anaphora resolution is a central problem in natural language understanding. We study a subclass of this problem involving object pronouns when they are used in simple imperative sentences (e.g., “pick it up.”). Specifically, we address cases where situational and contextual information is required to interpret these pronouns. Current state-of-the-art statistically driven coreference systems and knowledge-based reasoning systems are insufficient to address these cases. In this paper, we introduce, with examples, a general class of situated anaphora resolution problems, propose a proof-of-concept system for disambiguating situated pronouns, and discuss some general types of reasoning that might be needed.
【Keywords】:
【Paper Link】 【Pages】:6965-6973
【Authors】: Timo Schick ; Hinrich Schütze
【Abstract】: Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data. The general problem setting is that word embeddings are induced on an unlabeled training corpus and then a model is trained that embeds novel words into this induced embedding space. Currently, two approaches for learning embeddings of novel words exist: (i) learning an embedding from the novel word’s surface-form (e.g., subword n-grams) and (ii) learning an embedding from the context in which it occurs. In this paper, we propose an architecture that leverages both sources of information – surface-form and context – and show that it results in large increases in embedding quality. Our architecture obtains state-of-the-art results on the Definitional Nonce and Contextual Rare Words datasets. As input, we only require an embedding set and an unlabeled corpus for training our architecture to produce embeddings appropriate for the induced embedding space. Thus, our model can easily be integrated into any existing NLP system and enhance its capability to handle novel words.
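A minimal sketch of the general idea, under the assumption (ours, not the paper's exact architecture) that the surface-form estimate is an average of character n-gram vectors and the context estimate an average of context word vectors, combined with a fixed gate `alpha`; all vectors and names here are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Stand-in embedding tables; the real ones are induced on a large corpus.
ngram_vecs = {g: rng.normal(size=DIM)
              for g in ["<un", "unk", "nki", "kis", "ish", "sh>"]}
word_vecs = {w: rng.normal(size=DIM)
             for w in ["the", "word", "appears", "here"]}

def char_ngrams(word, n=3):
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def embed_novel(word, context, alpha=0.5):
    """Combine a surface-form estimate with a context estimate; `alpha`
    is a fixed gate here, whereas the paper learns the combination."""
    form = np.mean([ngram_vecs[g] for g in char_ngrams(word)
                    if g in ngram_vecs], axis=0)
    ctx = np.mean([word_vecs[w] for w in context if w in word_vecs], axis=0)
    return alpha * form + (1 - alpha) * ctx

print(embed_novel("unkish", ["the", "word", "appears", "here"]).shape)  # (8,)
```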
【Keywords】:
【Paper Link】 【Pages】:6974-6981
【Authors】: Claudia Schulz ; Christian M. Meyer ; Iryna Gurevych
【Abstract】: Diagnostic reasoning is a key component of many professions. To improve students’ diagnostic reasoning skills, educational psychologists analyse and give feedback on the epistemic activities used by these students while diagnosing, in particular hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. However, this manual analysis is highly time-consuming. We aim to enable the large-scale adoption of diagnostic reasoning analysis and feedback by automating the epistemic activity identification. We create the first corpus for this task, comprising diagnostic reasoning self-explanations of students from two domains annotated with epistemic activities. Based on insights from the corpus creation and the task’s characteristics, we discuss three challenges for the automatic identification of epistemic activities using AI methods: the correct identification of epistemic activity spans, the reliable distinction of similar epistemic activities, and the detection of overlapping epistemic activities. We propose a separate performance metric for each challenge and thus provide an evaluation framework for future research. Indeed, our evaluation of various state-of-the-art recurrent neural network architectures reveals that current techniques fail to address some of these challenges.
【Keywords】:
【Paper Link】 【Pages】:6982-6990
【Authors】: Holger Schwenk ; Douwe Kiela ; Matthijs Douze
【Abstract】: Multilingual sentence and document representations are becoming increasingly important. We build on recent advances in multilingual sentence encoders, with a focus on efficiency and large-scale applicability. Specifically, we construct and investigate the k-nn graph over the joint space of 566 million news sentences in seven different languages. We show excellent multilingual retrieval quality on the UN corpus of 11.3M sentences, which extends to the zero-shot case where we have never seen a language. We provide a detailed analysis of both the multilingual sentence encoder for twenty-one European languages and the learned graph. Our sentence encoder is language agnostic and supports code switching.
【Keywords】:
【Paper Link】 【Pages】:6991-6998
【Authors】: Jiaxin Shi ; Lei Hou ; Juanzi Li ; Zhiyuan Liu ; Hanwang Zhang
【Abstract】: Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods uses recursive latent tree-structured networks to embed sentences with task-specific structures. However, existing models have no explicit mechanism to emphasize task-informative words in the tree structure. To this end, we propose an Attentive Recursive Tree model (AR-Tree), in which words are dynamically located according to their importance to the task. Specifically, we construct the latent tree for a sentence using a proposed important-first strategy, placing more attentive words nearer to the root; thus, AR-Tree can inherently emphasize important words during the bottom-up composition of the sentence embedding. We propose an end-to-end reinforced training strategy for AR-Tree, which is demonstrated to consistently outperform, or be at least comparable to, state-of-the-art sentence embedding methods on three sentence understanding tasks.
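The important-first construction can be summarized in a few lines: the highest-scoring word in a span becomes the root, and the sub-spans on either side are built recursively. The sketch below uses hand-picked importance scores in place of the learned attention values.

```python
def build_ar_tree(words, scores):
    """Important-first latent tree: the highest-scoring word in a span
    becomes the root and the sub-spans on each side recurse, so more
    attentive words sit nearer the root."""
    if not words:
        return None
    i = max(range(len(words)), key=lambda k: scores[k])
    return {"word": words[i],
            "left": build_ar_tree(words[:i], scores[:i]),
            "right": build_ar_tree(words[i + 1:], scores[i + 1:])}

words = ["the", "movie", "was", "surprisingly", "good"]
scores = [0.1, 0.6, 0.2, 0.7, 0.9]     # stand-ins for learned importance
tree = build_ar_tree(words, scores)
print(tree["word"], tree["left"]["word"])   # good surprisingly
```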
【Keywords】:
【Paper Link】 【Pages】:6999-7006
【Authors】: Jiaxin Shi ; Chen Liang ; Lei Hou ; Juanzi Li ; Zhiyuan Liu ; Hanwang Zhang
【Abstract】: We propose DeepChannel, a robust, data-efficient, and interpretable neural model for extractive document summarization. Given any document-summary pair, we estimate a salience score, modeled with an attention-based deep neural network, to represent the salience degree of the summary for yielding the document. We devise a contrastive training strategy to learn the salience estimation network, and then use the learned salience score as a guide to iteratively extract the most salient sentences from the document as our generated summary. In experiments, our model not only achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset, but also shows strong robustness in an out-of-domain test on the DUC 2007 test set. Moreover, our model reaches a ROUGE-1 F-1 score of 39.41 on the CNN/Daily Mail test set with merely 1/100 of the training set, demonstrating tremendous data efficiency.
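The extraction stage described above reduces to a greedy loop; a minimal sketch follows, with a word-overlap proxy standing in for the learned attention-based salience network.

```python
def extract_summary(doc_sentences, salience, k=2):
    """Greedily add the sentence that most increases the salience of the
    current summary for the document."""
    summary, remaining = [], list(doc_sentences)
    for _ in range(k):
        best = max(remaining,
                   key=lambda s: salience(doc_sentences, summary + [s]))
        summary.append(best)
        remaining.remove(best)
    return summary

def toy_salience(doc, summary):
    # Word-overlap proxy; the paper uses a learned attention-based network.
    doc_words = {w for s in doc for w in s.split()}
    summary_words = {w for s in summary for w in s.split()}
    return len(doc_words & summary_words)

doc = ["the court ruled on the case",
       "the ruling was appealed",
       "weather was mild"]
print(extract_summary(doc, toy_salience))
```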
【Keywords】:
【Paper Link】 【Pages】:7007-7014
【Authors】: Zhouxing Shi ; Minlie Huang
【Abstract】: Discourse structures are beneficial for various NLP tasks such as dialogue understanding, question answering, and sentiment analysis. This paper presents a deep sequential model for parsing discourse dependency structures of multi-party dialogues. The proposed model constructs a discourse dependency tree by predicting dependency relations and building the discourse structure jointly and alternately. It makes a sequential scan of the Elementary Discourse Units (EDUs) in a dialogue. For each EDU, the model decides to which previous EDU the current one should link and what the corresponding relation type is. The predicted link and relation type are then used to build the discourse structure incrementally with a structured encoder. During link prediction and relation classification, the model utilizes not only local information that represents the concerned EDUs, but also global information that encodes the EDU sequence and the discourse structure already built at the current step. Experiments show that the proposed model outperforms all the state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:7015-7022
【Authors】: Farhad Bin Siddique ; Dario Bertero ; Pascale Fung
【Abstract】: We propose a multilingual model to recognize Big Five personality traits from text data in four different languages: English, Spanish, Dutch and Italian. Our analysis shows that words having a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. Therefore, we propose a personality alignment method, GlobalTrait, which maps each trait from the source language to the target language (English), such that words that correlate positively with each trait are close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to other low-resource languages, and obtain better multilingual results than with simple monolingual and unaligned multilingual embeddings. We achieve an average F-score increase across the three non-English languages from 65 to 73.4 (+8.4) when comparing our monolingual model to the multilingual model using a CNN with personality-aligned embeddings. We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset.
【Keywords】:
【Paper Link】 【Pages】:7023-7030
【Authors】: Vivian Dos Santos Silva ; André Freitas ; Siegfried Handschuh
【Abstract】: Recognizing textual entailment is a key task for many semantic applications, such as Question Answering, Text Summarization, and Information Extraction, among others. Entailment scenarios can range from a simple syntactic variation to more complex semantic relationships between pieces of text, but most approaches try a one-size-fits-all solution that usually favors some scenario to the detriment of another. We propose a composite approach for recognizing text entailment which analyzes the entailment pair to decide whether it must be resolved syntactically or semantically. We also make the answer interpretable: whenever an entailment is solved semantically, we explore a knowledge base composed of structured lexical definitions to generate natural language, human-like justifications, explaining the semantic relationship holding between the pieces of text. Besides outperforming well-established entailment algorithms, our composite approach takes an important step towards Explainable AI, using world knowledge to make the semantic reasoning process explicit and understandable.
【Keywords】:
【Paper Link】 【Pages】:7031-7038
【Authors】: Behrouz Haji Soleimani ; Stan Matwin
【Abstract】: Continuous word representations that can capture the semantic information in the corpus are the building blocks of many natural language processing tasks. Pre-trained word embeddings are being used for sentiment analysis, text classification, question answering and so on. In this paper, we propose a new word embedding algorithm that works on a smoothed Positive Pointwise Mutual Information (PPMI) matrix obtained from the word-word co-occurrence counts. One of our major contributions is an objective function and an optimization framework that exploit the full capacity of “negative examples”, the unobserved or insignificant word-word co-occurrences, in order to push unrelated words away from each other, which improves the distribution of words in the latent space. We also propose a kernel similarity measure for the latent space that can effectively calculate similarities in high dimensions. Moreover, we propose an approximate alternative to our algorithm using a modified vantage point tree, reducing the computational complexity of the algorithm to O(|V| log |V|) with respect to the number of words in the vocabulary. We have trained various word embedding algorithms on Wikipedia articles with 2.1 billion tokens and show that our method outperforms the state-of-the-art on most word similarity tasks by a good margin.
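For concreteness, the input object the abstract starts from, a smoothed PPMI matrix derived from co-occurrence counts, can be computed as below; the smoothing exponent and the toy counts are illustrative, and the paper's negative-example objective and optimization are not shown.

```python
import numpy as np

def smoothed_ppmi(counts, alpha=0.75):
    """Positive PMI from a word-word co-occurrence count matrix, with the
    commonly used context-distribution smoothing P(c)^alpha."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    ctx = counts.sum(axis=0) ** alpha
    p_c = (ctx / ctx.sum())[np.newaxis, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0          # unobserved pairs contribute nothing
    return np.maximum(pmi, 0.0)           # keep only positive associations

counts = np.array([[0., 4., 1.],
                   [4., 0., 2.],
                   [1., 2., 0.]])         # toy co-occurrence counts
print(smoothed_ppmi(counts).round(2))
```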
【Keywords】:
【Paper Link】 【Pages】:7039-7046
【Authors】: Changzhi Sun ; Yuanbin Wu
【Abstract】: We investigate the task of distantly supervised joint entity relation extraction. It is known that training with distant supervision suffers from noisy samples. To tackle this problem, we propose to adapt a small manually labelled dataset to the large automatically generated dataset. By developing a novel adaptation algorithm, we are able to transfer the high-quality but heterogeneous entity relation annotations in a robust and consistent way. Experiments on the benchmark NYT dataset show that our approach significantly outperforms state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:7047-7054
【Authors】: Jingyuan Sun ; Shaonan Wang ; Jiajun Zhang ; Chengqing Zong
【Abstract】: Decoding human brain activities based on linguistic representations has been actively studied in recent years. However, most previous studies exclusively focus on word-level representations, and little is known about decoding whole sentences from brain activation patterns. This work is our effort to fill the gap. In this paper, we build decoders to associate brain activities with sentence stimuli via distributed representations, the currently dominant sentence representation approach in natural language processing (NLP). We carry out a systematic evaluation, covering both widely used baselines and state-of-the-art sentence representation models. We demonstrate how well different types of sentence representations decode the brain activation patterns and give empirical explanations of the performance differences. Moreover, to explore how sentences are neurally represented in the brain, we further compare the sentence representations’ correspondence to different brain areas associated with high-level cognitive functions. We find that the supervised structured representation models most accurately probe the language atlas of the human brain. To the best of our knowledge, this work is the first comprehensive evaluation of distributed sentence representations for brain decoding. We hope this work can contribute to decoding brain activities with NLP representation models and to understanding how linguistic items are neurally represented.
【Keywords】:
【Paper Link】 【Pages】:7055-7062
【Authors】: Zeyu Sun ; Qihao Zhu ; Lili Mou ; Yingfei Xiong ; Ge Li ; Lu Zhang
【Abstract】: Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more tokens than a natural language sentence, and thus it may be inappropriate for an RNN to capture such a long sequence. In this paper, we propose a grammar-based structural convolutional neural network (CNN) for code generation. Our model generates a program by predicting the grammar rules of the programming language; we design several CNN modules, including tree-based convolution and pre-order convolution, whose information is further aggregated by dedicated attentive pooling layers. Experimental results on the HearthStone benchmark dataset show that our CNN code generator significantly outperforms the previous state-of-the-art method by 5 percentage points; additional experiments on several semantic parsing tasks demonstrate the robustness of our model. We also conduct in-depth ablation tests to better understand each component of our model.
【Keywords】:
【Paper Link】 【Pages】:7063-7071
【Authors】: Oyvind Tafjord ; Peter Clark ; Matt Gardner ; Wen-tau Yih ; Ashish Sabharwal
【Abstract】: Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods. Qualitative modeling provides tools that support such reasoning, but the semantic parsing task of mapping questions into those models has formidable challenges. We present QUAREL, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. The dataset has 2771 questions relating 19 different types of quantities. For example, “Jenny observes that the robot vacuum cleaner moves slower on the living room carpet than on the bedroom carpet. Which carpet has more friction?” We contribute (1) a simple and flexible conceptual framework for representing these kinds of questions; (2) the QUAREL dataset, including logical forms, exemplifying the parsing challenges; and (3) two novel models for this task, built as extensions of type-constrained semantic parsing. The first of these models (called QUASP+) significantly outperforms off-the-shelf tools on QUAREL. The second (QUASP+ZERO) demonstrates zero-shot capability, i.e., the ability to handle new qualitative relationships without requiring additional training data, something not possible with previous models. This work thus makes inroads into answering complex, qualitative questions that require reasoning, and scaling to new relationships at low cost. The dataset and models are available at http://data.allenai.org/quarel.
【Keywords】:
【Paper Link】 【Pages】:7072-7079
【Authors】: Ryuichi Takanobu ; Tianyang Zhang ; Jiexi Liu ; Minlie Huang
【Abstract】: Most existing methods determine relation types only after all the entities have been recognized, so the interaction between relation types and entity mentions is not fully modeled. This paper presents a novel paradigm that deals with relation extraction by regarding the related entities as the arguments of a relation. We apply a hierarchical reinforcement learning (HRL) framework in this paradigm to enhance the interaction between entity mentions and relation types. The whole extraction process is decomposed into a hierarchy of two-level RL policies for relation detection and entity extraction respectively, so that it is more feasible and natural to deal with overlapping relations. Our model was evaluated on public datasets collected via distant supervision, and the results show that it gains better performance than existing methods and is more powerful for extracting overlapping relations.
【Keywords】:
【Paper Link】 【Pages】:7080-7087
【Authors】: Zhen Tan ; Xiang Zhao ; Wei Wang ; Weidong Xiao
【Abstract】: Triplet extraction is an essential and pivotal step in automatic knowledge base construction, capturing structural information from unstructured text corpora. Conventional extraction models use a pipeline of named entity recognition and relation classification to extract entities and relations, respectively, which ignores the connection between the two tasks. Recently, several neural network-based models were proposed to tackle the problem and achieved state-of-the-art performance. However, most of them are unable to extract multiple triplets from a single sentence, which are commonly seen in real-life scenarios. To close the gap, we propose in this paper a joint neural extraction model for multiple triplets, namely TME, which is capable of adaptively discovering multiple triplets simultaneously in a sentence via ranking with a translation mechanism. In experiments, TME exhibits superior performance and achieves an improvement of 37.6% in F1 score over state-of-the-art competitors.
【Keywords】:
【Paper Link】 【Pages】:7088-7095
【Authors】: Min Tang ; Jiaran Cai ; Hankz Hankui Zhuo
【Abstract】: Multiple-choice machine reading comprehension is an important and challenging task in which the machine is required to select the correct answer from a set of candidates given a passage and a question. Existing approaches either match extracted evidence with candidate answers shallowly or model passage, question and candidate answers with a single paradigm of matching. In this paper, we propose the Multi-Matching Network (MMN), which models the semantic relationship among passage, question and candidate answers from multiple different paradigms of matching. In our MMN model, each paradigm is inspired by how humans think and is designed under a unified compose-match framework. To demonstrate the effectiveness of our model, we evaluate MMN on a large-scale multiple-choice machine reading comprehension dataset (i.e., RACE). Empirical results show that our proposed model achieves a significant improvement over strong baselines and obtains state-of-the-art results.
【Keywords】:
【Paper Link】 【Pages】:7096-7103
【Authors】: Yasufumi Taniguchi ; Yukun Feng ; Hiroya Takamura ; Manabu Okumura
【Abstract】: We address the task of generating live soccer-match commentaries from play event data. This task has the following characteristics: (i) each commentary is only partially aligned with the events; (ii) play event data contain many types of categorical and numerical attributes; (iii) live commentaries often mention player names and team names. For these reasons, we propose an encoder for play event data that is enhanced with a gate mechanism, and we introduce an attention mechanism over events. In addition, we introduce placeholders and a reconstruction mechanism for them to enable the model to copy appropriate player names and team names from the input data. We conduct experiments on play data from the English Premier League and provide a discussion of the results, including generated commentaries.
【Keywords】:
【Paper Link】 【Pages】:7104-7111
【Authors】: Julien Tissier ; Christophe Gravier ; Amaury Habrard
【Abstract】: Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performance. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of memory and computation, which makes them unsuitable for use on low-resource devices. The method proposed in this paper transforms real-valued embeddings into binary embeddings while preserving semantic information, requiring only 128 or 256 bits for each vector. This leads to a small memory footprint and fast vector operations. The model is based on an autoencoder architecture, which also allows the original vectors to be reconstructed from the binary ones. Experimental results on semantic similarity, text classification and sentiment analysis tasks show that binarizing word embeddings leads to a loss of only ∼2% in accuracy while vector size is reduced by 97%. Furthermore, a top-k benchmark demonstrates that using these binary vectors is 30 times faster than using real-valued vectors.
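The speed claim rests on binary codes supporting Hamming-distance comparisons via XOR and popcount; the sketch below illustrates a top-k query over packed 256-bit codes, with random codes standing in for the autoencoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, BITS = 1000, 256              # 256-bit codes, as in the paper

# Random stand-ins for the autoencoder-produced binary codes.
codes = rng.integers(0, 2, size=(VOCAB_SIZE, BITS), dtype=np.uint8)
packed = np.packbits(codes, axis=1)       # 256 bits -> 32 bytes per word

def top_k(query_packed, k=5):
    """Hamming-distance nearest neighbours over packed codes: XOR the byte
    arrays, then count the set bits."""
    xor = np.bitwise_xor(packed, query_packed)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

print(top_k(packed[42]))                  # index 42 comes first (distance 0)
```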
【Keywords】:
【Paper Link】 【Pages】:7112-7119
【Authors】: Maksim Tkachenko ; Hady W. Lauw
【Abstract】: A number of real-world applications require comparison of entities based on their textual representations. In this work, we develop a topic model supervised by pairwise comparisons of documents. Such a model seeks to yield topics that help to differentiate entities along some dimension of interest, which may vary from one application to another. While previous supervised topic models consider document labels in an independent and pointwise manner, our proposed Comparative Latent Dirichlet Allocation (CompareLDA) learns predictive topic distributions that comply with the pairwise comparison observations. To fit the model, we derive a maximum likelihood estimation method via an augmented variational approximation algorithm. Evaluation on several public datasets underscores the strengths of CompareLDA in modelling document comparisons.
【Keywords】:
【Paper Link】 【Pages】:7120-7127
【Authors】: Takuma Udagawa ; Akiko Aizawa
【Abstract】: Common grounding is the process of creating, repairing and updating mutual understandings, which is a critical aspect of sophisticated human communication. However, traditional dialogue systems have limited capability for establishing common ground, and we also lack task formulations that introduce natural difficulty in terms of common grounding while enabling easy evaluation and analysis of complex models. In this paper, we propose a minimal dialogue task which requires advanced skills of common grounding under continuous and partially observable context. Based on this task formulation, we collected a large-scale dataset of 6,760 dialogues which fulfills the essential requirements of natural language corpora. Our analysis of the dataset revealed important phenomena related to common grounding that need to be considered. Finally, we evaluate and analyze baseline neural models on a simple subtask that requires recognition of the created common ground. We show that simple baseline models perform decently but leave room for further improvement. Overall, we show that our proposed task will be a fundamental testbed where we can train, evaluate, and analyze a dialogue system’s ability for sophisticated common grounding.
【Keywords】:
【Paper Link】 【Pages】:7128-7135
【Authors】: Chengyu Wang ; Xiaofeng He ; Aoying Zhou
【Abstract】: Hypernymy is a basic semantic relation in computational linguistics that expresses the “is-a” relation between a generic concept and its specific instances, serving as the backbone of taxonomies and ontologies. Although several NLP tasks related to hypernymy prediction have been extensively addressed, few methods have fully exploited the large number of hypernymy relations in Web-scale taxonomies. In this paper, we introduce Taxonomy Enhanced Adversarial Learning (TEAL) for hypernymy prediction. We first propose an unsupervised measure, U-TEAL, to distinguish hypernymy from other semantic relations. It is implemented based on a word embedding projection network distantly trained over a taxonomy. To address supervised hypernymy detection tasks, the supervised model S-TEAL and its improved version, the adversarial supervised model AS-TEAL, are further presented. Specifically, AS-TEAL employs a coupled adversarial training algorithm to transfer hierarchical knowledge in taxonomies to hypernymy prediction models. We conduct extensive experiments to confirm the effectiveness of TEAL on three standard NLP tasks: unsupervised hypernymy classification, supervised hypernymy detection and graded lexical entailment. We also show that TEAL can be applied to non-English languages and can detect missing hypernymy relations in taxonomies.
【Keywords】:
【Paper Link】 【Pages】:7136-7143
【Authors】: Haohan Wang ; Da Sun ; Eric P. Xing
【Abstract】: Natural language inference (NLI) is the predictive task of determining the inference relationship between a pair of natural language sentences. With the increasing popularity of NLI, many state-of-the-art predictive models have been proposed with impressive performance. However, several works have noticed statistical irregularities in the collected NLI datasets that may result in an over-estimated performance of these models, and have proposed remedies. In this paper, we further investigate these statistical irregularities, which we refer to as confounding factors, of the NLI datasets. With the belief that some NLI labels should be preserved under swapping operations, we propose a simple yet effective way (swapping the two text fragments) of evaluating NLI predictive models that naturally mitigates the observed problems. Further, we continue to train the predictive models with this swapping manner and propose to use the deviation of the model’s evaluation performance under different percentages of swapped training text fragments to describe the robustness of a predictive model. Our evaluation metrics lead to some interesting understandings of recently published NLI methods. Finally, we also apply the swapping operation to NLI models to see the effectiveness of this straightforward method in mitigating the confounding factor problems when training generic sentence embeddings for other NLP transfer tasks.
【Keywords】:
【Paper Link】 【Pages】:7144-7151
【Authors】: Lei Wang ; Dongxiang Zhang ; Jipeng Zhang ; Xing Xu ; Lianli Gao ; Bing Tian Dai ; Heng Tao Shen
【Abstract】: The design of automatic solvers for arithmetic math word problems has attracted considerable attention in recent years, and a large number of datasets and methods have been published. Among them, Math23K is the largest data corpus and is very helpful for evaluating the generality and robustness of a proposed solution. The best performer on Math23K is a seq2seq model based on LSTMs that generates the math expression. However, the model suffers from performance degradation in the large space of target expressions. In this paper, we propose a template-based solution based on a recursive neural network for math expression construction. More specifically, we first apply a seq2seq model to predict a tree-structured template, with inferred numbers as leaf nodes and unknown operators as inner nodes. Then, we design a recursive neural network to encode the quantities with a Bi-LSTM and self-attention, and infer the unknown operator nodes in a bottom-up manner. The experimental results clearly establish the superiority of our new framework, as we improve the accuracy by a wide margin on two of the largest datasets, i.e., from 58.1% to 66.9% on Math23K and from 62.8% to 66.8% on MAWPS.
【Keywords】:
【Paper Link】 【Pages】:7152-7159
【Authors】: PeiFeng Wang ; Jialong Han ; Chenliang Li ; Rong Pan
【Abstract】: Knowledge graph embedding aims at modeling entities and relations with low-dimensional vectors. Most previous methods require that all entities be seen during training, which is impractical for real-world knowledge graphs with new entities emerging on a daily basis. Recent efforts on this issue suggest training a neighborhood aggregator in conjunction with the conventional entity and relation embeddings, which may help embed new entities inductively via their existing neighbors. However, their neighborhood aggregators neglect the unordered and unequal nature of an entity’s neighbors. To this end, we summarize the desired properties that may lead to effective neighborhood aggregators. We also introduce a novel aggregator, namely the Logic Attention Network (LAN), which addresses these properties by aggregating neighbors with both rule- and network-based attention weights. By comparing with conventional aggregators on two knowledge graph completion tasks, we experimentally validate LAN’s superiority in terms of the desired properties.
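To see why attention-based aggregation fits the two properties (neighbors are unordered and should be weighted unequally), the sketch below embeds an unseen entity as an attention-weighted sum of its neighbors' embeddings; plain dot-product attention stands in for LAN's combined rule- and network-based weights.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_new_entity(neighbor_vecs, query_vec):
    """Embed an unseen entity as an attention-weighted sum of its known
    neighbours: permutation-invariant (neighbours are unordered) and
    unequal (each neighbour gets its own weight)."""
    scores = neighbor_vecs @ query_vec    # one relevance score per neighbour
    weights = softmax(scores)
    return weights @ neighbor_vecs        # convex combination of neighbours

neighbors = rng.normal(size=(4, DIM))     # four known neighbour embeddings
relation_query = rng.normal(size=DIM)     # relation-dependent query vector
print(aggregate_new_entity(neighbors, relation_query).shape)   # (16,)
```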
【Keywords】:
【Paper Link】 【Pages】:7160-7167
【Authors】: Rui Wang ; Xin Xin ; Wei Chang ; Kun Ming ; Biao Li ; Xin Fan
【Abstract】: In this paper, we investigate how to improve Chinese named entity recognition (NER) by jointly modeling NER and constituent parsing in the framework of neural conditional random fields (CRF). We reformulate the parsing task as height-limited constituent parsing, by which the computational complexity can be significantly reduced and the majority of phrase-level grammars are retained. Specifically, a unified model of neural semi-CRF and neural tree-CRF is proposed, which simultaneously conducts word segmentation, part-of-speech (POS) tagging, NER, and parsing. The challenge comes from how to train and infer the joint model, which has not been solved previously. We design a dynamic programming algorithm for both training and inference whose complexity is O(n·4^h), where n is the sentence length and h is the height limit. In addition, we derive a pruning algorithm for the joint model, which further prunes 99.9% of the search space with a 2% loss of the ground truth data. Experimental results on the OntoNotes 4.0 dataset demonstrate that the proposed model outperforms the state-of-the-art method by 2.79 points in F1-measure.
【Keywords】:
【Paper Link】 【Pages】:7168-7175
【Authors】: Siyuan Wang ; Zhongyu Wei ; Zhihao Fan ; Yang Liu ; Xuanjing Huang
【Abstract】: Question generation aims to produce questions automatically given a piece of text as input. Existing research follows a sequence-to-sequence fashion that constructs a single question based on the input. Considering that each question usually focuses on a specific fragment of the input, especially in the scenario of reading comprehension, it is reasonable to identify the corresponding focus before constructing the question. In this paper, we propose to identify question-worthy phrases first and generate questions with the assistance of these phrases. We introduce a multi-agent communication framework that takes phrase extraction and question generation as two agents and learns these two tasks simultaneously via a message passing mechanism. The experimental results show the effectiveness of our framework: we can extract question-worthy phrases, which improve the performance of question generation. Besides, our system is able to extract more than one question-worthy phrase and generate multiple questions accordingly.
【Keywords】:
【Paper Link】 【Pages】:7176-7183
【Authors】: Su Wang ; Rahul Gupta ; Nancy Chang ; Jason Baldridge
【Abstract】: Paraphrasing is rooted in semantics. We show the effectiveness of transformers (Vaswani et al. 2017) for paraphrase generation and further improvements by incorporating PropBank labels via a multi-encoder. Evaluating on MSCOCO and WikiAnswers, we find that transformers are fast and effective, and that semantic augmentation for both transformers and LSTMs leads to sizable 2-3 point gains in BLEU, METEOR and TER. More importantly, we find surprisingly large gains on human evaluations compared to previous models. Nevertheless, manual inspection of generated paraphrases reveals ample room for improvement: even our best model produces human-acceptable paraphrases for only 28% of captions from the CHIA dataset (Sharma et al. 2018), and it fails spectacularly on sentences from Wikipedia. Overall, these results point to the potential for incorporating semantics in the task while highlighting the need for stronger evaluation.
【Keywords】:
【Paper Link】 【Pages】:7184-7191
【Authors】: Tianming Wang ; Xiaojun Wan
【Abstract】: Modeling discourse coherence is an important problem in natural language generation and understanding. Sentence ordering, the goal of which is to organize a set of sentences into a coherent text, is a commonly used task to learn and evaluate such models. In this paper, we propose a novel hierarchical attention network that captures word clues and dependencies between sentences to address this problem. Our model outperforms prior methods and achieves state-of-the-art performance on several datasets in different domains. Furthermore, our experiments demonstrate that the model performs very well even when noisy sentences are added to the set, which shows the robustness and effectiveness of the model. Visualization analysis and a case study show that our model captures the structure and pattern of coherent texts not only through simple word clues but also through consecution in context.
【Keywords】:
【Paper Link】 【Pages】:7192-7199
【Authors】: Wenya Wang ; Sinno Jialin Pan
【Abstract】: In fine-grained opinion mining, aspect and opinion term extraction has become a fundamental task that provides key information for user-generated texts. Despite its importance, a lack of annotated resources in many domains impedes the ability to train a precise model. Very few attempts have applied unsupervised domain adaptation methods to transfer fine-grained knowledge (at the word level) from some labeled source domain(s) to an unlabeled target domain. Existing methods depend on the construction of “pivot” knowledge, e.g., common opinion terms or syntactic relations between aspect and opinion words. In this work, we propose an interactive memory network that consists of local and global memory units. The model can exploit both local and global memory interactions to capture intra-correlations among aspect words or opinion words themselves, as well as the interconnections between aspect and opinion words. The source space and the target space are aligned through these domain-invariant interactions by incorporating an auxiliary task and domain adversarial networks. The proposed model does not require any external resources and demonstrates promising results on 3 benchmark datasets.
【Keywords】:
【Paper Link】 【Pages】:7200-7207
【Authors】: Xiaobin Wang ; Deng Cai ; Linlin Li ; Guangwei Xu ; Hai Zhao ; Luo Si
【Abstract】: By exploiting unlabeled data for further performance improvement in Chinese word segmentation, this work makes the first attempt to incorporate unsupervised segmentation information into a neural supervised segmenter. We survey various effective strategies, including extending the character embedding, augmenting the word score, and applying multi-task learning, for leveraging unsupervised information derived from abundant unlabeled data. Experiments on standard data sets show that the explored strategies indeed improve the recall rate of out-of-vocabulary words and thus boost the segmentation accuracy. Moreover, the model enhanced by the proposed methods outperforms state-of-the-art models in the closed test and shows a promising improvement trend when adopting three different strategies with the help of a large unlabeled data set. Our thorough empirical study eventually verifies that the proposed approach outperforms the widely used pre-training approach in terms of effectively making use of freely abundant unlabeled data.
【Keywords】:
【Paper Link】 【Pages】:7208-7215
【Authors】: Xiaoyan Wang ; Pavan Kapanipathi ; Ryan Musa ; Mo Yu ; Kartik Talamadupula ; Ibrahim Abdelaziz ; Maria Chang ; Achille Fokoue ; Bassem Makni ; Nicholas Mattei ; Michael Witbrock
【Abstract】: Natural Language Inference (NLI) is fundamental to many Natural Language Processing (NLP) applications including semantic search and question answering. The NLI problem has gained significant attention due to the release of large scale, challenging datasets. Present approaches to the problem largely focus on learning-based methods that use only textual information in order to classify whether a given premise entails, contradicts, or is neutral with respect to a given hypothesis. Surprisingly, the use of methods based on structured knowledge – a central topic in artificial intelligence – has not received much attention vis-a-vis the NLI problem. While there are many open knowledge bases that contain various types of reasoning information, their use for NLI has not been well explored. To address this, we present a combination of techniques that harness external knowledge to improve performance on the NLI problem in the science questions domain. We present the results of applying our techniques on text, graph, and text-and-graph based models; and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.
【Keywords】:
【Paper Link】 【Pages】:7216-7223
【Authors】: Yansen Wang ; Ying Shen ; Zhun Liu ; Paul Pu Liang ; Amir Zadeh ; Louis-Philippe Morency
【Abstract】: Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
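The word-shift idea above lends itself to a compact illustration. Below is a minimal sketch (not the released RAVEN code) of shifting a word embedding by a gated displacement computed from accompanying nonverbal features; the dimensions, module names, and gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NonverbalShift(nn.Module):
    """Shift a word embedding by a gated displacement computed from
    summaries of the accompanying visual and acoustic behaviors."""
    def __init__(self, word_dim=300, visual_dim=47, acoustic_dim=74):
        super().__init__()
        ctx_dim = word_dim + visual_dim + acoustic_dim
        self.shift = nn.Linear(ctx_dim, word_dim)   # candidate displacement
        self.gate = nn.Linear(ctx_dim, word_dim)    # how much to shift

    def forward(self, word, visual, acoustic):
        # visual/acoustic: per-word summaries of the subword-aligned
        # nonverbal sequences (e.g., produced by attention or an RNN).
        ctx = torch.cat([word, visual, acoustic], dim=-1)
        return word + torch.sigmoid(self.gate(ctx)) * self.shift(ctx)

m = NonverbalShift()
shifted = m(torch.randn(2, 300), torch.randn(2, 47), torch.randn(2, 74))
print(shifted.shape)   # torch.Size([2, 300])
```

In the full model the visual and acoustic summaries come from modeling the fine-grained nonverbal subword sequences rather than being passed in directly.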
【Keywords】:
【Paper Link】 【Pages】:7224-7232
【Authors】: Yu Wang ; Hongxia Jin
【Abstract】: In this paper, we present a multi-step coarse-to-fine question answering (MSCQA) system which can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets that mainly contain either short or long documents, our multi-step coarse-to-fine model takes the merits from multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to the baselines using state-of-the-art models.
【Keywords】:
【Paper Link】 【Pages】:7233-7240
【Authors】: Zhao Wang ; Aron Culotta
【Abstract】: Studies across many disciplines have shown that lexical choice can affect audience perception. For example, how users describe themselves in a social media profile can affect their perceived socio-economic status. However, we lack general methods for estimating the causal effect of lexical choice on the perception of a specific sentence. While randomized controlled trials may provide good estimates, they do not scale to the potentially millions of comparisons necessary to consider all lexical choices. Instead, in this paper, we first offer two classes of methods to estimate the effect on perception of changing one word to another in a given sentence. The first class of algorithms builds upon quasi-experimental designs to estimate individual treatment effects from observational data. The second class treats treatment effect estimation as a classification problem. We conduct experiments with three data sources (Yelp, Twitter, and Airbnb), finding that the algorithmic estimates align well with those produced by randomized controlled trials. Additionally, we find that it is possible to transfer treatment effect classifiers across domains and still maintain high accuracy.
【Keywords】:
【Paper Link】 【Pages】:7241-7248
【Authors】: Zhi Wang ; Wei Bi ; Yan Wang ; Xiaojiang Liu
【Abstract】: Transfer learning for deep neural networks has achieved great success in many text classification applications. A simple yet effective transfer learning method is to fine-tune the pretrained model parameters. Previous fine-tuning works mainly focus on the pre-training stage and investigate how to pretrain a set of parameters that can help the target task most. In this paper, we propose an Instance Weighting based Finetuning (IW-Fit) method, which revises the fine-tuning stage to improve the final performance on the target domain. IW-Fit adjusts instance weights at each fine-tuning epoch dynamically to accomplish two goals: 1) identify and learn the specific knowledge of the target domain effectively; 2) well preserve the shared knowledge between the source and the target domains. The designed instance weighting metrics used in IW-Fit are model-agnostic, which are easy to implement for general DNN-based classifiers. Experimental results show that IW-Fit can consistently improve the classification accuracy on the target domain.
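As a rough illustration of dynamic instance weighting during fine-tuning, the sketch below computes per-instance weights from a frozen source model's confidence and the current model's error, then applies a weighted cross-entropy. The weighting metric shown is an invented stand-in, not the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def instance_weights(curr_logits, src_logits, labels):
    # Invented illustrative metric: up-weight instances where the frozen
    # source model is confident (shared knowledge worth preserving) or where
    # the current model is still wrong (target knowledge left to learn).
    curr_p = F.softmax(curr_logits, -1).gather(1, labels[:, None]).squeeze(1)
    src_p = F.softmax(src_logits, -1).gather(1, labels[:, None]).squeeze(1)
    w = src_p + (1.0 - curr_p)
    return w / w.mean()            # normalize to keep the loss scale stable

def weighted_finetune_loss(curr_logits, src_logits, labels):
    # Recomputed every epoch/batch, so the weights evolve with fine-tuning.
    w = instance_weights(curr_logits.detach(), src_logits, labels)
    losses = F.cross_entropy(curr_logits, labels, reduction='none')
    return (w * losses).mean()

logits = torch.randn(4, 3, requires_grad=True)   # current model outputs
src_logits = torch.randn(4, 3)                   # frozen source model outputs
weighted_finetune_loss(logits, src_logits, torch.tensor([0, 2, 1, 0])).backward()
```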
【Keywords】:
【Paper Link】 【Pages】:7249-7256
【Authors】: Penghui Wei ; Wenji Mao ; Guandan Chen
【Abstract】: Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address these two issues, in this paper we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning for identifying stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other's performance. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:7257-7264
【Authors】: Xiangpeng Wei ; Yue Hu ; Luxi Xing ; Yipeng Wang ; Li Gao
【Abstract】: The dominant neural machine translation (NMT) models based on the encoder-decoder architecture have recently achieved state-of-the-art performance. Traditionally, NMT models depend only on the representations learned during training for mapping a source sentence into the target domain. However, the learned representations often suffer from implicit and inadequately informed properties. In this paper, we propose a novel bilingual topic enhanced NMT (BLT-NMT) model to improve translation performance by incorporating bilingual topic knowledge into NMT. Specifically, the bilingual topic knowledge is included in the hidden states of both the encoder and decoder, as well as the attention mechanism. With this new setting, the proposed BLT-NMT has access to the background knowledge implied in bilingual topics, which is beyond the sequential context, and enables the attention mechanism to attend to topic-level attentions for generating accurate target words during translation. Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and the previous topic-informed NMT on Chinese-English and English-German translation tasks. We also introduce the bilingual topic knowledge into the newly emerged Transformer base model on English-German translation and achieve a notable improvement.
【Keywords】:
【Paper Link】 【Pages】:7265-7272
【Authors】: Robert West ; Eric Horvitz
【Abstract】: Humor is an essential human trait. Efforts to understand humor have called out links between humor and the foundations of cognition, as well as the importance of humor in social engagement. As such, it is a promising and important subject of study, with relevance for artificial intelligence and human–computer interaction. Previous computational work on humor has mostly operated at a coarse level of granularity, e.g., predicting whether an entire sentence, paragraph, document, etc., is humorous. As a step toward deep understanding of humor, we seek fine-grained models of attributes that make a given text humorous. Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines. The corpus is constructed via Unfun.me, an online game that incentivizes players to make minimal edits to satirical headlines with the goal of making other players believe the results are serious headlines. The edit operations used to successfully remove humor pinpoint the words and concepts that play a key role in making the original, satirical headline funny. Our analysis reveals that the humor tends to reside toward the end of headlines, and primarily in noun phrases, and that most satirical headlines follow a certain logical pattern, which we term false analogy. Overall, this paper deepens our understanding of the syntactic and semantic structure of satirical news headlines and provides insights for building humor-producing systems.
【Keywords】:
【Paper Link】 【Pages】:7273-7280
【Authors】: Shanchan Wu ; Kai Fan ; Qiong Zhang
【Abstract】: Distant supervised relation extraction has been successfully applied to large corpora with thousands of relations. However, the inevitable wrong labeling problem in distant supervision hurts the performance of relation extraction. In this paper, we propose a method with a neural noise converter to alleviate the impact of noisy data, and a conditional optimal selector to make proper predictions. Our noise converter learns the structured transition matrix at the logit level and captures the properties of the distant supervised relation extraction dataset. The conditional optimal selector, on the other hand, helps to make a proper prediction decision for an entity pair even if the group of sentences is overwhelmed by no-relation sentences. We conduct experiments on a widely used dataset and the results show significant improvement over competitive baseline methods.
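A minimal sketch of a noise-converter layer in the spirit described above: the model predicts "clean" relation scores, and a learned transition structure maps them to the noisy distant labels used for training. The row-softmax parameterization and identity initialization here are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseConverter(nn.Module):
    def __init__(self, n_relations):
        super().__init__()
        # Initialize near the identity: distant labels start out trusted.
        self.transition = nn.Parameter(torch.eye(n_relations))

    def forward(self, clean_logits):
        # Row-normalize so each clean relation yields a distribution
        # over observed (possibly noisy) labels.
        T = torch.softmax(self.transition, dim=1)
        clean_probs = torch.softmax(clean_logits, dim=1)
        noisy_probs = clean_probs @ T    # distribution over noisy labels
        return noisy_probs.log()         # train with NLL on distant labels

nc = NoiseConverter(n_relations=5)
log_noisy = nc(torch.randn(8, 5))        # (8, 5) log-probs over noisy labels
loss = F.nll_loss(log_noisy, torch.randint(0, 5, (8,)))
```

At prediction time one would read off the clean logits directly, bypassing the converter.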
【Keywords】:
【Paper Link】 【Pages】:7281-7288
【Authors】: Yu Wu ; Furu Wei ; Shaohan Huang ; Yunli Wang ; Zhoujun Li ; Ming Zhou
【Abstract】: Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses. We propose a new paradigm, prototype-then-edit, for response generation, which first retrieves a prototype response from a pre-defined index and then edits the prototype response according to the differences between the prototype context and the current context. Our motivation is that the retrieved prototype provides a good starting point for generation because it is grammatical and informative, and the post-editing process further improves the relevance and coherence of the prototype. In practice, we design a context-aware editing model that is built upon an encoder-decoder framework augmented with an edit vector. We first generate an edit vector by considering lexical differences between the prototype context and the current context. After that, the edit vector and the prototype response representation are fed to a decoder to generate a new response. Experimental results on a large-scale dataset demonstrate that our new paradigm significantly increases the relevance, diversity and originality of generation results, compared to traditional generative models. Furthermore, our model outperforms retrieval-based methods in terms of relevance and originality.
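The edit-vector construction can be sketched directly: embeddings of words inserted into and deleted from the prototype context are pooled and concatenated, and the result conditions the decoder. Everything below (tokenization, pooling choice, toy vocabulary) is illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

def edit_vector(embed, vocab, proto_tokens, curr_tokens):
    """Pool embeddings of inserted and deleted words into one edit vector."""
    ins = [w for w in curr_tokens if w not in proto_tokens]   # insertions
    dels = [w for w in proto_tokens if w not in curr_tokens]  # deletions
    def pool(words):
        if not words:
            return torch.zeros(embed.embedding_dim)
        ids = torch.tensor([vocab[w] for w in words])
        return embed(ids).mean(dim=0)
    return torch.cat([pool(ins), pool(dels)])  # conditions the decoder

vocab = {w: i for i, w in enumerate("how do i cook fresh store pasta".split())}
embed = nn.Embedding(len(vocab), 64)
v = edit_vector(embed, vocab,
                "how do i cook pasta".split(),
                "how do i store fresh pasta".split())
print(v.shape)  # torch.Size([128])
```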
【Keywords】:
【Paper Link】 【Pages】:7289-7296
【Authors】: Yuexin Wu ; Xiujun Li ; Jingjing Liu ; Jianfeng Gao ; Yiming Yang
【Abstract】: Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model - or, implicitly, the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in the state-action space that the agent has not (fully) explored. Our results show that by combining the switcher and active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvements over DDQ and Q-learning baselines in both simulation and human evaluations.
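A structural skeleton of the switcher idea is given below: real experience is requested more often when the world model's recent prediction error is high. The agent, env, world_model, and replay objects are assumed interfaces, and the error-based rule is an invented stand-in; only the control flow is meant to mirror the abstract.

```python
import random

class Switcher:
    """Pick real vs. simulated experience from a smoothed estimate of the
    world model's prediction error: the worse the model, the more real
    (costly) user experience is requested."""
    def __init__(self, init_err=1.0):
        self.err = init_err
    def choose_real(self):
        return random.random() < min(1.0, self.err)
    def update(self, new_err):
        self.err = 0.9 * self.err + 0.1 * new_err  # exponential smoothing

def switch_ddq_step(agent, env, world_model, switcher, replay):
    # agent, env, world_model and replay are assumed interfaces (stubs).
    if switcher.choose_real():
        transition = env.step(agent.act())             # real user experience
        switcher.update(world_model.fit(transition))   # fit returns its error
    else:
        # Active learning: ask the world model for experience from
        # under-explored regions of the state-action space.
        transition = world_model.simulate(prefer_unexplored=True)
    replay.add(transition)
    agent.q_learn(replay.sample())                     # ordinary Q-update
```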
【Keywords】:
【Paper Link】 【Pages】:7297-7304
【Authors】: Mengzhou Xia ; Guoping Huang ; Lemao Liu ; Shuming Shi
【Abstract】: A translation memory (TM) has proved helpful for improving neural machine translation (NMT). Existing approaches either pursue decoding efficiency by merely accessing local information in a TM, or encode the global information in a TM yet sacrifice efficiency due to redundancy. We propose an efficient approach to making use of the global information in a TM. The key idea is to pack a redundant TM into a compact graph and perform additional attention mechanisms over the packed graph to integrate the TM representation into the decoding network. We implement the model by extending the state-of-the-art NMT, Transformer. Extensive experiments on three language pairs show that the proposed approach is efficient in terms of running time and space occupation, and in particular it outperforms multiple strong baselines in terms of BLEU scores.
【Keywords】:
【Paper Link】 【Pages】:7305-7313
【Authors】: Qingrong Xia ; Zhenghua Li ; Min Zhang ; Meishan Zhang ; Guohong Fu ; Rui Wang ; Luo Si
【Abstract】: Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP. Motivated by the close correlation between syntactic and semantic structures, traditional discrete-feature-based SRL approaches make heavy use of syntactic features. In contrast, deep-neural-network-based approaches usually encode the input sentence as a word sequence without considering the syntactic structures. In this work, we investigate several previous approaches for encoding syntactic trees, and make a thorough study of whether extra syntax-aware representations are beneficial for neural SRL models. Experiments on the benchmark CoNLL-2005 dataset show that syntax-aware SRL approaches can effectively improve performance over a strong baseline with external word representations from ELMo. With the extra syntax-aware representations, our approaches achieve new state-of-the-art results of 85.6 F1 (single model) and 86.6 F1 (ensemble) on the test data, outperforming the corresponding strong baselines with ELMo by 0.8 and 1.0, respectively. Detailed error analyses are conducted to gain more insights into the investigated approaches.
【Keywords】:
【Paper Link】 【Pages】:7314-7321
【Authors】: Liuyu Xiang ; Xiaoming Jin ; Lan Yi ; Guiguang Ding
【Abstract】: Deep learning models such as convolutional neural networks and recurrent networks are widely applied in text classification. In spite of their great success, most deep learning models neglect the importance of modeling context information, which is crucial to understanding texts. In this work, we propose the Adaptive Region Embedding to learn context representations to improve text classification. Specifically, a meta-network is learned to generate a context matrix for each region, and each word interacts with its corresponding context matrix to produce the regional representation for further classification. Compared to previous models that are designed to capture context information, our model contains fewer parameters and is more flexible. We extensively evaluate our method on 8 benchmark datasets for text classification. The experimental results prove that our method achieves state-of-the-art performance and effectively avoids word ambiguity.
【Keywords】:
【Paper Link】 【Pages】:7322-7329
【Authors】: Yijun Xiao ; William Yang Wang
【Abstract】: Reliable uncertainty quantification is a first step towards building explainable, transparent, and accountable artificial intelligence systems. Recent progress in Bayesian deep learning has made such quantification realizable. In this paper, we propose novel methods to study the benefits of characterizing model and data uncertainties for natural language processing (NLP) tasks. With empirical experiments on sentiment analysis, named entity recognition, and language modeling using convolutional and recurrent neural network models, we show that explicitly modeling uncertainties is not only necessary to measure output confidence levels, but also useful for enhancing model performance on various NLP tasks.
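One standard way to realize such uncertainty estimates is Monte Carlo dropout: keep dropout active at test time and use the spread of repeated stochastic forward passes. The generic sketch below is offered as an illustration of the idea, not as the paper's exact method; the toy classifier and decomposition are assumptions.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    model.train()                  # keep dropout stochastic at prediction time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)])
    mean = probs.mean(dim=0)                      # predictive distribution
    model_unc = probs.var(dim=0).sum(dim=-1)      # spread across passes
    total_unc = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)  # entropy
    return mean, model_unc, total_unc

clf = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(32, 3))
mean, model_unc, total_unc = mc_dropout_predict(clf, torch.randn(4, 8))
print(model_unc.shape, total_unc.shape)  # torch.Size([4]) torch.Size([4])
```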
【Keywords】:
【Paper Link】 【Pages】:7330-7337
【Authors】: Zhipeng Xie ; Feiteng Mu
【Abstract】: This paper focuses on building distributed representations of words in cause and effect spaces, a task-specific word embedding technique for causality. The causal embedding model is trained on a large set of cause-effect phrase pairs extracted from a raw text corpus via a set of high-precision causal patterns. Three strategies are proposed to transfer the positive or negative labels from the level of phrase pairs to the level of word pairs, leading to three causal embedding models (Pairwise-Matching, Max-Matching, and Attentive-Matching) correspondingly. Experimental results show that the Max-Matching and Attentive-Matching models significantly outperform several state-of-the-art competitors by a large margin on both English and Chinese corpora.
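The Max-Matching strategy can be sketched as follows: a cause phrase and an effect phrase are scored through their best-matching word pair, with separate cause-space and effect-space embeddings. The hinge-style ranking loss against corrupted pairs, and all shapes and ids, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MaxMatching(nn.Module):
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.cause = nn.Embedding(vocab_size, dim)    # cause-space vectors
        self.effect = nn.Embedding(vocab_size, dim)   # effect-space vectors

    def score(self, cause_ids, effect_ids):
        c, e = self.cause(cause_ids), self.effect(effect_ids)  # (m,d), (n,d)
        return (c @ e.t()).max()    # phrase-pair score via best word pair

    def loss(self, pos, neg, margin=1.0):
        # Hinge-style ranking: a true cause-effect phrase pair should
        # outscore a corrupted one by a margin.
        return torch.relu(margin - self.score(*pos) + self.score(*neg))

m = MaxMatching(vocab_size=1000)
pos = (torch.tensor([11, 42]), torch.tensor([7]))    # e.g. "heavy rain" -> "flood"
neg = (torch.tensor([11, 42]), torch.tensor([500]))  # corrupted effect word
m.loss(pos, neg).backward()
```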
【Keywords】:
【Paper Link】 【Pages】:7338-7345
【Authors】: Hao Xiong ; Zhongjun He ; Hua Wu ; Haifeng Wang
【Abstract】: Discourse coherence plays an important role in the translation of a text. However, previously reported models mostly focus on improving performance over individual sentences while ignoring cross-sentence links and dependencies, which affects the coherence of the text. In this paper, we propose to use discourse context and reward to refine translation quality from the discourse perspective. In particular, we first generate the translation of individual sentences. Next, we deliberate over the preliminarily produced translations and train the model with a reward teacher to learn a policy that produces discourse-coherent text. Experimental results on multiple discourse test datasets indicate that our model significantly improves translation quality over the state-of-the-art baseline system by +1.23 BLEU score. Moreover, our model generates more discourse-coherent text and obtains +2.2 BLEU improvements when evaluated by discourse metrics.
【Keywords】:
【Paper Link】 【Pages】:7346-7353
【Authors】: Lin Xu ; Qixian Zhou ; Ke Gong ; Xiaodan Liang ; Jianheng Tang ; Liang Lin
【Abstract】: Beyond current conversational chatbots and task-oriented dialogue systems that have attracted increasing attention, we move forward to develop a dialogue system for automatic medical diagnosis that converses with patients to collect additional symptoms beyond their self-reports and automatically makes a diagnosis. Besides the challenges for conversational dialogue systems (e.g., topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements on dialogue rationality in the context of medical knowledge and symptom-disease relations. Existing dialogue systems (Madotto, Wu, and Fung 2018; Wei et al. 2018; Li et al. 2017) mostly rely on data-driven learning and are unable to encode an extra expert knowledge graph. In this work, we propose an End-to-End Knowledge-routed Relational Dialogue System (KR-DS) that seamlessly incorporates a rich medical knowledge graph into topic transition in dialogue management, and makes it cooperative with natural language understanding and natural language generation. A novel Knowledge-routed Deep Q-network (KR-DQN) is introduced to manage topic transitions, which integrates a relational refinement branch for encoding relations among different symptoms and symptom-disease pairs, and a knowledge-routed graph branch for topic decision-making. Extensive experiments on a public medical dialogue dataset show that our KR-DS significantly beats state-of-the-art methods (by more than 8% in diagnosis accuracy). We further show the superiority of our KR-DS on a newly collected medical dialogue dataset, which is more challenging as it retains the original self-reports and conversational data between patients and doctors.
【Keywords】:
【Paper Link】 【Pages】:7354-7361
【Authors】: Ming Yan ; Jiangnan Xia ; Chen Wu ; Bin Bi ; Zhongzhou Zhao ; Ji Zhang ; Luo Si ; Rui Wang ; Wei Wang ; Haiqing Chen
【Abstract】: A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency considerations. Then we jointly train three modules on the remaining texts for better tracking the answer: document extraction, paragraph extraction and answer extraction. Experimental results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, i.e., TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50ms.
【Keywords】:
【Paper Link】 【Pages】:7362-7369
【Authors】: Min Yang ; Qiang Qu ; Wenting Tu ; Ying Shen ; Zhou Zhao ; Xiaojun Chen
【Abstract】: Recent artificial intelligence studies have witnessed great interest in abstractive text summarization. Although remarkable progress has been made by deep neural network based methods, generating plausible and high-quality abstractive summaries remains a challenging task. The human-like reading strategy is rarely explored in abstractive text summarization, even though it is able to improve the effectiveness of summarization by considering the process of reading comprehension and logical thinking. Motivated by the human-like reading strategy that follows a hierarchical routine, we propose a novel Hybrid learning model for Abstractive Text Summarization (HATS). The model consists of three major components, a knowledge-based attention network, a multi-task encoder-decoder network, and a generative adversarial network, which are consistent with the different stages of the human-like reading strategy. To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, the CNN/Daily Mail and Gigaword datasets. The experimental results demonstrate that HATS achieves impressive results on both datasets.
【Keywords】:
【Paper Link】 【Pages】:7370-7377
【Authors】: Liang Yao ; Chengsheng Mao ; Yuan Luo
【Abstract】: Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on a regular grid, e.g., a sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on a non-grid, e.g., an arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings for both words and documents, as supervised by the known class labels of documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to limited training data in text classification.
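The graph construction and two-layer convolution are simple enough to sketch end to end. The toy below builds document-word edges from TF-IDF (the paper's word-word PMI edges from a sliding window are omitted for brevity) and runs a two-layer graph convolution with one-hot node features; it is illustrative, not the released implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog ran", "a cat and a dog"]
tfidf = TfidfVectorizer().fit_transform(docs).toarray()  # (n_docs, n_words)
n_docs, n_words = tfidf.shape
n = n_docs + n_words                 # one node per document and per word

A = np.eye(n)                        # self-loops
A[:n_docs, n_docs:] = tfidf          # document -> word edges
A[n_docs:, :n_docs] = tfidf.T        # word -> document edges
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 A D^-1/2

X = np.eye(n)                        # one-hot features, as in the paper
W0 = np.random.randn(n, 64) * 0.01
W1 = np.random.randn(64, 2) * 0.01   # 2 classes
H = np.maximum(A_hat @ X @ W0, 0)    # layer 1 + ReLU
logits = A_hat @ H @ W1              # layer 2: class scores per node
print(logits[:n_docs].shape)         # document-node predictions: (3, 2)
```

Training would apply a softmax cross-entropy loss on the labeled document nodes only.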
【Keywords】:
【Paper Link】 【Pages】:7378-7385
【Authors】: Lili Yao ; Nanyun Peng ; Ralph M. Weischedel ; Kevin Knight ; Dongyan Zhao ; Rui Yan
【Abstract】: Automatic storytelling is challenging since it requires generating long, coherent natural language to describe a sensible sequence of events. Despite considerable efforts on automatic story generation in the past, prior work is either restricted in plot planning or can only generate stories in a narrow domain. In this paper, we explore open-domain story generation that writes stories given a title (topic) as input. We propose a plan-and-write hierarchical generation framework that first plans a storyline and then generates a story based on the storyline. We compare two planning strategies. The dynamic schema interweaves story planning and its surface realization in text, while the static schema plans out the entire storyline before generating stories. Experiments show that with explicit storyline planning, the generated stories are more diverse, coherent, and on topic than those generated without creating a full plan, according to both automatic and human evaluations.
【Keywords】:
【Paper Link】 【Pages】:7386-7393
【Authors】: Michihiro Yasunaga ; Jungo Kasai ; Rui Zhang ; Alexander R. Fabbri ; Irene Li ; Dan Friedman ; Dragomir R. Radev
【Abstract】: Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article’s impacts on the research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors’ original highlights (abstract) and the article’s actual impacts on the community (citations), to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.
【Keywords】:
【Paper Link】 【Pages】:7394-7401
【Authors】: Michihiro Yasunaga ; John D. Lafferty
【Abstract】: Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.
【Keywords】:
【Paper Link】 【Pages】:7402-7409
【Authors】: Kang Min Yoo ; Youhyun Shin ; Sang-goo Lee
【Abstract】: Data scarcity is one of the main obstacles to domain adaptation in spoken language understanding (SLU) due to the high cost of creating manually tagged SLU datasets. Recent works on neural text generative models, particularly latent variable models such as the variational autoencoder (VAE), have shown promising results in generating plausible and natural sentences. In this paper, we propose a novel generative architecture which leverages the generative power of latent variable models to jointly synthesize fully annotated utterances. Our experiments show that existing SLU models trained on the additional synthetic examples achieve performance gains. Our approach not only helps alleviate the data scarcity issue in the SLU task for many datasets but also indiscriminately improves language understanding performance for various SLU models, supported by extensive experiments and rigorous statistical testing.
【Keywords】:
【Paper Link】 【Pages】:7410-7417
【Authors】: Masashi Yoshikawa ; Koji Mineshima ; Hiroshi Noji ; Daisuke Bekki
【Abstract】: In logic-based approaches to reasoning tasks such as Recognizing Textual Entailment (RTE), it is important for a system to have a large amount of knowledge data. However, there is a tradeoff between adding more knowledge data for improved RTE performance and maintaining an efficient RTE system, as such a big database is problematic in terms of memory usage and computational complexity. In this work, we show that the processing time of a state-of-the-art logic-based RTE system can be significantly reduced by replacing its search-based axiom injection (abduction) mechanism with one based on Knowledge Base Completion (KBC). We integrate this mechanism in a Coq plugin that provides a proof automation tactic for natural language inference. Additionally, we show empirically that adding new knowledge data contributes to better RTE performance while not harming the processing speed in this framework.
【Keywords】:
【Paper Link】 【Pages】:7418-7425
【Authors】: Changsen Yuan ; Heyan Huang ; Chong Feng ; Xiao Liu ; Xiaochi Wei
【Abstract】: Distant supervision for relation extraction is an efficient method to reduce labor costs and has been widely used to seek novel relational facts in large corpora, which can be identified as a multi-instance multi-label problem. However, existing distant supervision methods struggle to select important words in a sentence and to extract valid sentences from the bag. To this end, we propose a novel approach to address these problems in this paper. Firstly, we propose a linear attenuation simulation to reflect the importance of words in a sentence with respect to the distances between entities and words. Secondly, we propose a non-independent and identically distributed (non-IID) relevance embedding to capture the relevance of sentences in the bag. Our method can not only capture complex information of words about hidden relations, but also express the mutual information of instances in the bag. Extensive experiments on a benchmark dataset have well validated the effectiveness of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:7426-7433
【Authors】: JianHua Yuan ; Yanyan Zhao ; Jingfang Xu ; Bing Qin
【Abstract】: Detecting stance from certain types of question-answer pairs is an interesting problem which has not been carefully explored. Unlike previous stance detection tasks, targets here are not given entities or claims but entire questions, which makes it difficult to capture the semantics of targets and build target-dependent representations of answers. To address this, we introduce the Recurrent Conditional Attention (RCA) model, which incorporates a conditional attention structure into the recurrent reading process. RCA iteratively guides the distillation of question semantics with answer information and collects stance-oriented text relating to the question, further revealing the mutual relationship among stance, answer and question. Experiments on a manually labeled Chinese community QA stance dataset show that RCA outperforms four strong baselines by an average of 2.90% on macro-F1 and 2.66% on micro-F1 respectively.
【Keywords】:
【Paper Link】 【Pages】:7434-7441
【Authors】: Yunzhe Yuan ; Yong Jiang ; Kewei Tu
【Abstract】: Transition-based dependency parsing is a fast and effective approach to dependency parsing. Traditionally, a transition-based dependency parser processes an input sentence and predicts a sequence of parsing actions in a left-to-right manner. During this process, an early prediction error may negatively impact the prediction of subsequent actions. In this paper, we propose a simple framework for bidirectional transition-based parsing. During training, we learn a left-to-right parser and a right-to-left parser separately. To parse a sentence, we perform joint decoding with the two parsers. We propose three joint decoding algorithms that are based on joint scoring, dual decomposition, and dynamic oracles respectively. Empirical results show that our methods lead to competitive parsing accuracy and that our method based on dynamic oracles consistently achieves the best performance.
【Keywords】:
【Paper Link】 【Pages】:7442-7449
【Authors】: Kun Zhang ; Guangyi Lv ; Linyuan Wang ; Le Wu ; Enhong Chen ; Fangzhao Wu ; Xing Xie
【Abstract】: Sentence semantic matching requires an agent to determine the semantic relation between two sentences, which is widely used in various natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among all matching methods, the attention mechanism plays an important role in capturing semantic relations and properly aligning the elements of two sentences. Previous methods utilized the attention mechanism to select the important parts of sentences all at once. However, the important parts of a sentence during semantic matching change dynamically with the degree of sentence understanding. Selecting the important parts at one time may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, which is able to pay close attention to a small region of the sentences at each step and re-read the important words for better sentence semantic understanding. To be specific, we first employ an Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from the bottom-most word embedding input to the up-most recurrent output. Second, we utilize a Dynamic Re-read (DRr) unit to pay close attention to one important word at a time, taking the learned information into consideration, and re-read the important words for better sentence semantic understanding. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net is able to model sentence semantics more precisely and significantly improve the performance of sentence semantic matching. In addition, it is very interesting that some of the findings in our experiments are consistent with findings from psychological research.
【Keywords】:
【Paper Link】 【Pages】:7450-7458
【Authors】: Lipeng Zhang ; Peng Zhang ; Xindian Ma ; Shuqin Gu ; Zhan Su ; Dawei Song
【Abstract】: In the literature, tensors have been effectively used for capturing context information in language models. However, the existing methods usually adopt relatively low-order tensors, which have limited expressive power in modeling language. Developing a higher-order tensor representation is challenging, in terms of deriving an effective solution and showing its generality. In this paper, we propose a language model named the Tensor Space Language Model (TSLM), which utilizes tensor networks and tensor decomposition. In TSLM, we build a high-dimensional semantic space constructed by the tensor product of word vectors. Theoretically, we prove that such a tensor representation is a generalization of the n-gram language model. We further show that this high-order tensor representation can be decomposed into a recursive calculation of conditional probability for language modeling. The experimental results on the Penn Treebank (PTB) dataset and the WikiText benchmark demonstrate the effectiveness of TSLM.
【Keywords】:
【Paper Link】 【Pages】:7459-7467
【Authors】: Richong Zhang ; Xinyu Liu ; Xinwei Chen ; Zhiyuan Hu ; Zhaoqing Xu ; Yongyi Mao
【Abstract】: Ci is a lyric poetry form that follows highly restrictive metrical structures. This makes it challenging for a computer to compose Ci subject to a specified metrical requirement. In this work, we adapt the CVAE framework to automated Ci generation under metrical constraints. Specifically, we present the first neural model that explicitly encodes the designated metrical structure for Ci generation. The proposed model is shown experimentally to generate Ci with nearly perfect metrical structures.
【Keywords】:
【Paper Link】 【Pages】:7468-7475
【Authors】: Weinan Zhang ; Yue Zhang ; Yuanxing Liu ; Donglin Di ; Ting Liu
【Abstract】: Verb Phrase Ellipsis (VPE) is a linguistic phenomenon in which some verb phrases as syntactic constituents are omitted and typically referred to by an auxiliary verb. It is ubiquitous in both formal and informal text, such as news articles and dialogues. Previous work on VPE resolution mainly focused on manually constructing features extracted from auxiliary verbs, syntactic trees, etc. However, the optimization of feature representation, the effectiveness of continuous features and the automatic composition of features are not well addressed. In this paper, we explore the advantages of neural models for VPE resolution in both pipeline and end-to-end processes, comparing the differences between statistical and neural models. Two neural models, namely the multi-layer perceptron and the Transformer, are employed for the subtasks of VPE detection and resolution. Experimental results show that the neural models outperform the state-of-the-art baselines in both subtasks and in the end-to-end results.
【Keywords】:
【Paper Link】 【Pages】:7476-7483
【Authors】: Weiwei Zhang ; Jackie Chi Kit Cheung ; Joel Oren
【Abstract】: Summaries of fictional stories allow readers to quickly decide whether or not a story catches their interest. A major challenge in automatic summarization of fiction is the lack of standardized evaluation methodology or high-quality datasets for experimentation. In this work, we take a bottom-up approach to this problem by assuming that story authors are uniquely qualified to inform such decisions. We collect a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform. We identify commonly occurring summary components, of which a description of the main characters is the most frequent, and elicit descriptions of main characters directly from the authors for a sample of the stories. We propose two approaches to generating character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes. We find that the classification-based approach performs best in predicting character descriptions.
【Keywords】:
【Paper Link】 【Pages】:7484-7491
【Authors】: Xinsong Zhang ; Pengshuai Li ; Weijia Jia ; Hai Zhao
【Abstract】: Disclosing multiple overlapping relations in a sentence remains challenging. Most current neural models inconveniently assume that each sentence is explicitly mapped to a single relation label and cannot handle multiple relations properly, as the overlapping features of the relations are either ignored or very difficult to identify. To tackle this issue, we propose a novel approach for multi-labeled relation extraction with a capsule network, which performs considerably better than current convolutional or recurrent networks in identifying highly overlapping relations within an individual sentence. To better cluster the features and precisely extract the relations, we further devise an attention-based routing algorithm and a sliding-margin loss function, and embed them into our capsule network. The experimental results show that the proposed approach can indeed extract highly overlapping features and achieve significant performance improvements for relation extraction compared to state-of-the-art works.
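A sliding-margin loss for multi-labeled relation capsules might look like the sketch below: each relation capsule's activation (its vector norm) is pushed above a learnable upper margin when the relation is present and below a lower margin when absent. The exact formulation is an assumption based on the abstract, not the paper's definition.

```python
import torch
import torch.nn as nn

class SlidingMarginLoss(nn.Module):
    def __init__(self, width=0.2, down_weight=0.5):
        super().__init__()
        self.b = nn.Parameter(torch.tensor(0.5))   # learnable margin center
        self.width, self.down_weight = width, down_weight

    def forward(self, caps, labels):
        # caps: (batch, n_relations, dim) relation capsule outputs
        # labels: (batch, n_relations) multi-hot float indicators
        act = caps.norm(dim=-1)                     # capsule activations
        upper, lower = self.b + self.width, self.b - self.width
        pos = labels * torch.relu(upper - act) ** 2        # present relations
        neg = (1 - labels) * torch.relu(act - lower) ** 2  # absent relations
        return (pos + self.down_weight * neg).sum(-1).mean()

loss_fn = SlidingMarginLoss()
loss = loss_fn(torch.randn(4, 10, 16), torch.randint(0, 2, (4, 10)).float())
```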
【Keywords】:
【Paper Link】 【Pages】:7492-7500
【Authors】: Peixiang Zhong ; Di Wang ; Chunyan Miao
【Abstract】: Affect conveys important implicit information in human communication. Having the capability to correctly express affect during human-machine conversations is one of the major milestones in artificial intelligence. In recent years, extensive research on open-domain neural conversational models has been conducted. However, embedding affect into such models is still underexplored. In this paper, we propose an end-to-end affect-rich open-domain neural conversational model that produces responses not only appropriate in syntax and semantics, but also with rich affect. Our model extends the Seq2Seq model and adopts VAD (Valence, Arousal and Dominance) affective notations to embed each word with affects. In addition, our model considers the effect of negators and intensifiers via a novel affective attention mechanism, which biases attention towards affect-rich words in input sentences. Lastly, we train our model with an affect-incorporated objective function to encourage the generation of affect-rich words in the output responses. Evaluations based on both perplexity and human evaluations show that our model outperforms the state-of-the-art baseline model of comparable size in producing natural and affect-rich responses.
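The affective attention mechanism can be illustrated by biasing standard attention logits with a word's affective strength, e.g., its distance from a neutral point in VAD space. The neutral point, scaling factor, and function names below are invented for the sketch and do not reproduce the paper's handling of negators and intensifiers.

```python
import torch

def affective_attention(query, keys, vad,
                        neutral=torch.tensor([5.0, 3.0, 5.0]), beta=0.5):
    # query: (dim,); keys: (seq, dim); vad: (seq, 3) per-word VAD ratings
    affect_strength = (vad - neutral).norm(dim=-1)   # 0 for neutral words
    scores = keys @ query + beta * affect_strength   # affect-biased logits
    weights = torch.softmax(scores, dim=0)
    return weights @ keys                            # attended context vector

ctx = affective_attention(torch.randn(16), torch.randn(5, 16),
                          torch.rand(5, 3) * 9 + 1)  # toy VAD ratings in [1,10]
print(ctx.shape)  # torch.Size([16])
```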
【Keywords】:
【Paper Link】 【Pages】:7502-7510
【Authors】: Mohammad Abdulaziz
【Abstract】: We consider the problem of compositionally computing upper bounds on lengths of plans. Following existing work, our approach is based on a decomposition of state-variable dependency graphs (a.k.a. causal graphs). Tight bounds have been demonstrated previously for problems where key dependencies flow in a single direction—i.e. manipulating variable v1 can disturb the ability to manipulate v2 and not vice versa. We develop a more general bounding approach which allows us to compute useful bounds where dependency flows in both directions. Our approach is practically most useful when combined with earlier approaches, where the computed bounds are substantially improved in a relatively broad variety of problems. When combined with an existing planning procedure, the improved bounds yield coverage improvements for both solvable and unsolvable planning problems.
【Keywords】:
【Paper Link】 【Pages】:7511-7519
【Authors】: Benjamin J. Ayton ; Brian Williams ; Richard Camilli
【Abstract】: In autonomous exploration a mobile agent must adapt to new measurements to seek high reward, but disturbances cause a probability of collision that must be traded off against expected reward. This paper considers an autonomous agent tasked with maximizing measurements from a Gaussian Process while subject to unbounded disturbances. We seek an adaptive policy in which the maximum allowed probability of failure is constrained as a function of the expected reward. The policy is found using an extension to Monte Carlo Tree Search (MCTS) which bounds probability of failure. We apply MCTS to a sequence of approximating problems, which allows constraint satisfying actions to be found in an anytime manner. Our innovation lies in defining the approximating problems and replanning strategy such that the probability of failure constraint is guaranteed to be satisfied over the true policy. The approach does not need to plan for all measurements explicitly or constrain planning based only on the measurements that were observed. To the best of our knowledge, our approach is the first to enforce probability of failure constraints in adaptive sampling. Through experiments on real bathymetric data and simulated measurements, we show our algorithm allows an agent to take dangerous actions only when the reward justifies the risk. We then verify through Monte Carlo simulations that failure bounds are satisfied.
【Keywords】:
【Paper Link】 【Pages】:7520-7529
【Authors】: Gregor Behnke ; Daniel Höller ; Susanne Biundo
【Abstract】: HTN planning provides an expressive formalism to model complex application domains. It has been widely used in real-world applications. However, the development of domain-independent planning techniques for such models is still lagging behind. The need to be informed about both state transitions and the task hierarchy makes the realisation of search-based approaches difficult, especially with unrestricted partial ordering of tasks in HTN domains. Recently, a translation of HTN planning problems into propositional logic has shown promising empirical results. Such planners benefit from a unified representation of state and hierarchy, but have until now required very large formulae to represent partial order. In this paper, we introduce a novel encoding of HTN planning as SAT. In contrast to related work, most of the reasoning on ordering relations is not left to the SAT solver, but done beforehand. This results in much smaller formulae and, as shown in our evaluation, in a planner that outperforms previous SAT-based approaches as well as the state of the art in search-based HTN planning.
【Keywords】:
【Paper Link】 【Pages】:7530-7537
【Authors】: Thiago P. Bueno ; Leliane N. de Barros ; Denis Deratani Mauá ; Scott Sanner
【Abstract】: Recent advances in applying deep learning to planning have shown that Deep Reactive Policies (DRPs) can be powerful for fast decision-making in complex environments. However, an important limitation of current DRP-based approaches is either the need for optimal planners to be used as ground truth in a supervised learning setting or the sample complexity of high-variance policy gradient estimators, which are particularly troublesome in continuous state-action domains. In order to overcome those limitations, we introduce a framework for training DRPs in continuous stochastic spaces via gradient-based policy search. The general approach is to explicitly encode a parametric policy as a deep neural network, and to formulate the probabilistic planning problem as an optimization task in a stochastic computation graph by exploiting the re-parameterization of the transition probability densities; the optimization is then solved by leveraging gradient descent algorithms that are able to handle non-convex objective functions. We benchmark our approach on stochastic planning domains exhibiting arbitrary differentiable nonlinear transition and cost functions (e.g., Reservoir Control, HVAC and Navigation). Results show that DRPs with more than 125,000 continuous action parameters can be optimized by our approach for problems with 30 state fluents and 30 action fluents on inexpensive hardware in under 6 minutes. Also, we observed a speedup of 5 orders of magnitude in the average inference time per decision step of DRPs when compared to other state-of-the-art online gradient-based planners when the same level of solution quality is required.
【Keywords】:
【Paper Link】 【Pages】:7538-7545
【Authors】: Michael Cashmore ; Alessandro Cimatti ; Daniele Magazzeni ; Andrea Micheli ; Parisa Zehtabi
【Abstract】: To achieve practical execution, planners must produce temporal plans with some degree of run-time adaptability. Such plans can be expressed as Simple Temporal Networks (STN), that constrain the timing of action activations, and implicitly represent the space of choices for the plan executor. A first problem is to verify that all the executor choices allowed by the STN plan will be successful, i.e. the plan is valid. An even more important problem is to assess the effect of discrepancies between the model used for planning and the execution environment. We propose an approach to compute the “robustness envelope” (i.e., alternative action durations or resource consumption rates) of a given STN plan, for which the plan remains valid. Plans can have boolean and numeric variables as well as discrete and continuous change. We leverage Satisfiability Modulo Theories (SMT) to make the approach formal and practical.
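As a toy illustration of computing a robustness envelope with SMT, the sketch below uses the Z3 solver (pip install z3-solver) to find the interval of values one action's duration may take, with the other duration fixed, while a two-action sequential plan still meets its deadline. The plan, constraints, and names are invented; the paper's approach handles far richer plans and perturbations than this.

```python
from z3 import Real, Optimize

d1, d2 = Real('d1'), Real('d2')                   # action durations
t0, t1, t2 = Real('t0'), Real('t1'), Real('t2')   # time points

def plan_constraints():
    # STN-style constraints: two sequential actions and a global deadline.
    return [t0 == 0, t1 == t0 + d1, t2 == t1 + d2,
            d1 >= 1, d2 >= 2,      # minimum feasible durations
            t2 <= 10]              # deadline

def envelope(var, fixed):
    """Smallest and largest value of `var` keeping the plan valid,
    with the other durations fixed."""
    lo, hi = Optimize(), Optimize()
    for opt in (lo, hi):
        for c in plan_constraints():
            opt.add(c)
        for v, val in fixed.items():
            opt.add(v == val)
    hmin, hmax = lo.minimize(var), hi.maximize(var)
    lo.check(); hi.check()
    return hmin.value(), hmax.value()

print(envelope(d1, {d2: 3}))   # -> (1, 7): d1 may range over [1, 7]
```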
【Keywords】:
【Paper Link】 【Pages】:7546-7553
【Authors】: Lukás Chrpa ; Mauro Vallati
【Abstract】: Macro-operators, macros for short, are a well-known technique for enhancing the performance of planning engines by providing “short-cuts” in the state space. Existing macro learning systems usually generate macros from the most frequent sequences of actions in training plans. Such an approach prioritizes frequently used sequences of actions over meaningful activities to be performed for solving planning tasks. This paper presents a technique that, inspired by resource locking in critical sections in parallel computing, learns macros capturing activities in which a limited resource (e.g., a robotic hand) is used. In particular, such macros capture the whole activity in which the resource is “locked” (e.g., the robotic hand is holding an object) and thus “bridge” states in which the resource is locked and cannot be used. We also introduce an “aggressive” variant of our technique that removes original operators superseded by macros from the domain model. The usefulness of macros is evaluated with several state-of-the-art planners on a wide range of benchmarks from the learning tracks of the 2008 and 2011 editions of the International Planning Competition.
【Keywords】:
【Paper Link】 【Pages】:7554-7561
【Authors】: Amanda Jane Coles ; Andrew Coles ; J. Christopher Beck
【Abstract】: When performing temporal planning as forward state-space search, effective state memoisation is challenging. Whereas in classical planning two states are equal if they have the same facts and variable values, in temporal planning this is not the case: as the plans that led to the two states are subject to temporal constraints, one might be extendable into a temporally valid plan, while the other might not. In this paper, we present an approach for reducing the state-space explosion that arises from having to keep many copies of the same ‘classically’ equal state: states that are classically equal are aggregated into metastates, and these are separated lazily only in the case of temporal inconsistency. Our evaluation shows that this approach, implemented in OPTIC and compared to existing state-of-the-art memoisation techniques, improves performance across a range of temporal domains.
【Keywords】:
【Paper Link】 【Pages】:7562-7569
【Authors】: Amanda Jane Coles ; Andrew Coles ; Moisés Martínez ; Emre Savas ; Juan Manuel Delfa ; Tomás de la Rosa ; Yolanda E.-Martín ; Angel García Olaya
【Abstract】: In this paper we present techniques for reasoning natively with quantitative/qualitative interval constraints in state-based PDDL planners. While these constraints are considered important for modeling and solving problems in timeline-based planners, reasoning with them in PDDL planners has seen relatively little attention; yet it is a crucial step towards making PDDL planners applicable in real-world scenarios, such as space missions. Our main contribution is to extend the planner OPTIC to reason natively with Allen interval constraints. We show that our approach outperforms both MTP, the only PDDL planner capable of handling similar constraints, and a compilation to PDDL 2.1, by an order of magnitude. We go on to present initial results indicating that our approach is competitive with a timeline-based planner on a Mars rover domain, showing the potential of PDDL planners in this setting.
【Keywords】:
【Paper Link】 【Pages】:7570-7577
【Authors】: Bingqian Du ; Chuan Wu ; Zhiyi Huang
【Abstract】: Cloud computing has been widely adopted to support various computation services. A fundamental problem faced by cloud providers is how to efficiently allocate resources upon user requests and price the resource usage, in order to maximize resource efficiency and hence provider profit. Existing studies establish detailed performance models of cloud resource usage, and propose offline or online algorithms to decide allocation and pricing. Differently, we adopt a black-box approach, and leverage model-free Deep Reinforcement Learning (DRL) to capture the dynamics of cloud users and better characterize inherent connections between an optimal allocation/pricing policy and the states of the dynamic cloud system. The goal is to learn a policy that maximizes the net profit of the cloud provider through trial and error, which is better than decisions made on explicit performance models. We combine long short-term memory (LSTM) units with fully-connected neural networks in our DRL to deal with online user arrivals, and adjust the output and update methods of basic DRL algorithms to address both resource allocation and pricing. Evaluation based on real-world datasets shows that our DRL approach outperforms basic DRL algorithms and state-of-the-art white-box online cloud resource allocation/pricing algorithms significantly, in terms of both profit and the number of accepted users.
【Keywords】:
【Paper Link】 【Pages】:7578-7585
【Authors】: Rebecca Eifler ; Maximilian Fickert ; Jörg Hoffmann ; Wheeler Ruml
【Abstract】: In real-time planning, the planner must select the next action within a fixed time bound. Because a complete plan may not have been found, the selected action might not lead to a goal and the agent may need to return to its current state. To preserve completeness, real-time search methods incorporate learning, in which heuristic values are updated. Previous work in real-time search has used table-based heuristics, in which the values of states are updated individually. In this paper, we explore the use of abstraction-based heuristics. By refining the abstraction on-line, we can update the values of multiple states, including ones the agent has not yet generated. We test this idea empirically using Cartesian abstractions in the Fast Downward planner. Results on various benchmarks, including the sliding tile puzzle and several IPC domains, indicate that the approach can improve performance compared to traditional heuristic updating. This work brings abstraction refinement, a powerful technique from offline planning, into the real-time setting.
【Keywords】:
【Paper Link】 【Pages】:7586-7593
【Authors】: Daniel Fiser ; Álvaro Torralba ; Alexander Shleyfman
【Abstract】: Simplifying classical planning tasks by removing operators while preserving at least one optimal solution can significantly enhance the performance of planners. In this paper, we introduce the notion of operator mutex, which is a set of operators that cannot all be part of the same (strongly) optimal plan. We propose four different methods for inference of operator mutexes and experimentally verify that they can be found in a sizable number of planning tasks. We show how operator mutexes can be used in combination with structural symmetries to safely remove operators from the planning task.
【Keywords】:
【Paper Link】 【Pages】:7594-7601
【Authors】: Daniel Furelos-Blanco ; Anders Jonsson
【Abstract】: In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Our planner is the first to handle action effects that are conditional on what other agents are doing. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.
【Keywords】:
【Paper Link】 【Pages】:7602-7609
【Authors】: Daniel Gnad ; Álvaro Torralba ; Martín Ariel Domínguez ; Carlos Areces ; Facundo Bustos
【Abstract】: Current classical planners are very successful in finding (non-optimal) plans, even for large planning instances. To do so, most planners rely on a preprocessing stage that computes a grounded representation of the task. Whenever the grounded task is too big to be generated (i.e., whenever this preprocessing fails) the instance cannot even be tackled by the actual planner. To address this issue, we introduce a partial grounding approach that grounds only a projection of the task, when complete grounding is not feasible. We propose a guiding mechanism that, for a given domain, identifies the parts of a task that are relevant to find a plan by using off-the-shelf machine learning methods. Our empirical evaluation attests that the approach is capable of solving planning instances that are too big to be fully grounded.
【Keywords】:
【Paper Link】 【Pages】:7610-7618
【Authors】: León Illanes ; Sheila A. McIlraith
【Abstract】: We consider a class of generalized planning problems based on the idea of quantifying over sets of similar objects. We show how we can adapt fully observable nondeterministic planning techniques to produce generalized solutions that are easy to instantiate over particular problem instances. We also describe how we can reformulate a classical planning problem into a quantified one. The reformulation allows us to solve the original planning task without grounding every action with respect to all objects in the problem, and a single solution can be applied to a possibly infinite set of related classical planning tasks. We report experimental results that show our approach is a practical and promising technique for solving an interesting class of problems.
【Keywords】:
【Paper Link】 【Pages】:7619-7626
【Authors】: Michael Katz
【Abstract】: Red-black planning is a state-of-the-art approach to satisficing classical planning. Red-black planning heuristics are at the heart of the planner Mercury, the runner-up of the satisficing track in the International Planning Competition (IPC) 2014 and a major component of four additional planners in IPC 2018, including Saarplan, the runner-up in the agile track. Mercury’s exceptional performance is all the more remarkable given that conditional effects were handled by the planner in a trivial way, simply by compiling them away. Conditional effects, however, are important for classical planning, and many domains require them for efficient modeling. Consequently, we investigate the possibility of handling conditional effects directly in the red-black planning heuristic function, extending the algorithm for computing red-black plans to the conditional effects setting. We show empirically that red-black planning heuristics that handle conditional effects natively outperform the variants that compile this feature away, improving coverage on tasks where black variables exist by 19%.
【Keywords】:
【Paper Link】 【Pages】:7627-7634
【Authors】: Jiaoyang Li ; Pavel Surynek ; Ariel Felner ; Hang Ma ; T. K. Satish Kumar ; Sven Koenig
【Abstract】: Multi-Agent Path Finding (MAPF) has been widely studied in the AI community. For example, Conflict-Based Search (CBS) is a state-of-the-art MAPF algorithm based on a two-level tree search. However, previous MAPF algorithms assume that an agent occupies only a single location at any given time, e.g., a single cell in a grid. This limits their applicability in many real-world domains that have geometric agents in lieu of point agents. Geometric agents are referred to as “large” agents because they can occupy multiple points at the same time. In this paper, we formalize and study LA-MAPF, i.e., MAPF for large agents. We first show how CBS can be adapted to solve LA-MAPF. We then present a generalized version of CBS, called Multi-Constraint CBS (MC-CBS), that adds multiple constraints (instead of one constraint) for an agent when it generates a high-level search node. We introduce three different approaches to choose such constraints as well as an approach to compute admissible heuristics for the high-level search. Experimental results show that all MC-CBS variants outperform CBS by up to three orders of magnitude in terms of runtime. The best variant also outperforms EPEA* (a state-of-the-art A*-based MAPF solver) in all cases and MDD-SAT (a state-of-the-art reduction-based MAPF solver) in some cases.
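To make the MC-CBS idea concrete, here is a hedged sketch of the high-level CBS loop, with the single point where MC-CBS differs marked in a comment. The low-level solver, conflict detection, and constraint-generation strategies (the paper's contributions) are left as assumed stubs.

```python
# Simplified high-level CBS loop. `low_level`, `find_conflict` and
# `constraints_for` are stubs/assumptions, not the paper's code;
# a returned conflict is assumed to expose the agents involved.
import heapq
import itertools

def cbs(agents, low_level, find_conflict, constraints_for):
    counter = itertools.count()            # heap tie-breaker
    root_paths = {a: low_level(a, set()) for a in agents}  # assume solvable
    open_list = [(sum(len(p) for p in root_paths.values()), next(counter),
                  set(), root_paths)]
    while open_list:
        cost, _, constraints, paths = heapq.heappop(open_list)
        conflict = find_conflict(paths)
        if conflict is None:
            return paths                   # conflict-free solution
        for agent in conflict.agents:
            # MC-CBS: add a *set* of constraints for `agent` at once;
            # vanilla CBS would add exactly one constraint here.
            new_cons = constraints | constraints_for(conflict, agent)
            new_paths = dict(paths)
            new_paths[agent] = low_level(agent, new_cons)
            if new_paths[agent] is not None:
                heapq.heappush(open_list,
                               (sum(len(p) for p in new_paths.values()),
                                next(counter), new_cons, new_paths))
    return None
```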
【Keywords】:
【Paper Link】 【Pages】:7635-7642
【Authors】: Felix Lindner ; Robert Mattmüller ; Bernhard Nebel
【Abstract】: Research in classical planning so far has mainly been concerned with generating a satisficing or an optimal plan. However, if such systems are used to make decisions that are relevant to humans, one should also consider the ethical consequences that generated plans can have. We address this challenge by analyzing to what extent it is possible to generalize existing approaches of machine ethics to automated planning systems. Traditionally, ethical principles are formulated in an action-based manner, allowing one to judge the execution of a single action. We show how such judgments can be generalized to plans. Further, we study the computational complexity of making ethical judgments about plans.
【Keywords】:
【Paper Link】 【Pages】:7643-7650
【Authors】: Hang Ma ; Daniel Harabor ; Peter J. Stuckey ; Jiaoyang Li ; Sven Koenig
【Abstract】: We study prioritized planning for Multi-Agent Path Finding (MAPF). Existing prioritized MAPF algorithms depend on rule-of-thumb heuristics and random assignment to determine a fixed total priority ordering of all agents a priori. We instead explore the space of all possible partial priority orderings as part of a novel systematic and conflict-driven combinatorial search framework. In a variety of empirical comparisons, we demonstrate state-of-the-art solution qualities and success rates, often with similar runtimes to existing algorithms. We also develop new theoretical results that explore the limitations of prioritized planning, in terms of completeness and optimality, for the first time.
【Keywords】:
【Paper Link】 【Pages】:7651-7658
【Authors】: Hang Ma ; Wolfgang Hönig ; T. K. Satish Kumar ; Nora Ayanian ; Sven Koenig
【Abstract】: The Multi-Agent Pickup and Delivery (MAPD) problem models applications where a large number of agents attend to a stream of incoming pickup-and-delivery tasks. Token Passing (TP) is a recent MAPD algorithm that is efficient and effective. We make TP even more efficient and effective by using a novel combinatorial search algorithm, called Safe Interval Path Planning with Reservation Table (SIPPwRT), for single-agent path planning. SIPPwRT uses an advanced data structure that allows for fast updates and lookups of the current paths of all agents in an online setting. The resulting MAPD algorithm TP-SIPPwRT takes kinematic constraints of real robots into account directly during planning, computes continuous agent movements with given velocities that work on non-holonomic robots rather than discrete agent movements with uniform velocity, and is complete for well-formed MAPD instances. We demonstrate its benefits for automated warehouses using both an agent simulator and a standard robot simulator. For example, we demonstrate that it can compute paths for hundreds of agents and thousands of tasks in seconds and is more efficient and effective than existing MAPD algorithms that use a post-processing step to adapt their paths to continuous agent movements with given velocities.
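The core notion behind SIPP-style planners is easy to state: a cell's safe intervals are the gaps between the time intervals reserved by other agents. Below is a minimal sketch of that computation with illustrative names; the reservation table of SIPPwRT adds the indexed bookkeeping that makes online updates and lookups fast.

```python
# Safe intervals of a cell, given sorted, disjoint reservations
# [(t_start, t_end), ...) during which the cell is occupied.
def safe_intervals(reserved, horizon=float("inf")):
    intervals, t = [], 0.0
    for start, end in sorted(reserved):
        if t < start:
            intervals.append((t, start))   # gap before this reservation
        t = max(t, end)
    if t < horizon:
        intervals.append((t, horizon))     # open-ended final safe interval
    return intervals

# Cell occupied during [2,4) and [6,7): safe in [0,2), [4,6), [7,inf).
assert safe_intervals([(2, 4), (6, 7)]) == [(0.0, 2), (4, 6), (7, float("inf"))]
```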
【Keywords】:
【Paper Link】 【Pages】:7659-7666
【Authors】: Sultan Javed Majeed ; Marcus Hutter
【Abstract】: Most real-world problems have huge state and/or action spaces. Therefore, a naive application of existing tabular solution methods is not tractable on such problems. Nonetheless, these solution methods are quite useful if an agent has access to a relatively small state-action space homomorphism of the true environment and near-optimal performance is guaranteed by the map. A plethora of research is focused on the case when the homomorphism is a Markovian representation of the underlying process. However, we show that near-optimal performance is sometimes guaranteed even if the homomorphism is non-Markovian.
【Keywords】:
【Paper Link】 【Pages】:7667-7674
【Authors】: George B. Mertzios ; Hendrik Molter ; Viktor Zamaraev
【Abstract】: Graph coloring is one of the most famous computational problems with applications in a wide range of areas such as planning and scheduling, resource allocation, and pattern matching. So far, coloring problems have mostly been studied on static graphs, which often stands in stark contrast to practice, where data is inherently dynamic and subject to discrete changes over time. A temporal graph is a graph whose edges are assigned a set of integer time labels, indicating at which discrete time steps the edge is active. In this paper we present a natural temporal extension of the classical graph coloring problem. Given a temporal graph and a natural number ∆, we ask for a coloring sequence for each vertex such that (i) in every sliding time window of ∆ consecutive time steps, in which an edge is active, this edge is properly colored (i.e. its endpoints are assigned two different colors) at least once during that time window, and (ii) the total number of different colors is minimized. This sliding window temporal coloring problem abstractly captures many realistic graph coloring scenarios in which the underlying network changes over time, such as dynamically assigning communication channels to moving agents. We present a thorough investigation of the computational complexity of this temporal coloring problem. More specifically, we prove strong computational hardness results, complemented by efficient exact and approximation algorithms. Some of our algorithms are linear-time fixed-parameter tractable with respect to appropriate parameters, while others are asymptotically almost optimal under the Exponential Time Hypothesis (ETH).
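Condition (i) is easy to state operationally. The brute-force checker below, written under assumed data structures, verifies a candidate coloring sequence against it; the paper's contribution is the complexity analysis and the algorithms, not this check.

```python
# Verify condition (i) of sliding window temporal coloring:
# in every window of `delta` consecutive steps in which an edge is
# active, the edge is properly colored at least once.
def valid_coloring(colors, edges, lifetime, delta):
    """colors[v][t]: color of vertex v at step t (t = 0..lifetime-1);
    edges: {(u, v): set of steps at which the edge is active}."""
    for (u, v), active in edges.items():
        for w0 in range(lifetime - delta + 1):
            window = set(range(w0, w0 + delta))
            steps = active & window
            if steps and not any(colors[u][t] != colors[v][t] for t in steps):
                return False   # a window with no properly colored active step
    return True
```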
【Keywords】:
【Paper Link】 【Pages】:7675-7682
【Authors】: Andrea Micheli ; Enrico Scala
【Abstract】: In several industrial applications of planning, complex temporal metric trajectory constraints are needed to adequately model the problem at hand. For example, in production plants, items must be processed following a “recipe” of steps subject to precise timing constraints. Modeling such domains is very challenging in existing action-based languages due to the lack of sufficiently expressive trajectory constraints. We propose a novel temporal planning formalism allowing quantified temporal constraints over execution timing of action instances. We build on top of instantaneous actions borrowed from classical planning and add expressive temporal constructs. The paper details the semantics of our new formalism and presents a solving technique grounded in classical, heuristic forward search planning. Our experiments show the proposed framework to be superior to alternative state-of-the-art planning approaches on industrial benchmarks, and competitive with similar solving methods on well-known benchmarks taken from the planning competition.
【Keywords】:
【Paper Link】 【Pages】:7683-7690
【Authors】: Ronen Nir ; Erez Karpas
【Abstract】: Designing multi-agent systems, where several agents work in a shared environment, requires coordinating between the agents so they do not interfere with each other. One of the canonical approaches to coordinating agents is enacting a social law, which applies restrictions on agents’ available actions. A good social law prevents the agents from interfering with each other, while still allowing all of them to achieve their goals. Recent work took the first step towards reasoning about social laws using automated planning and showed how to verify if a given social law is robust, that is, allows all agents to achieve their goals regardless of what the other agents do. This work relied on a classical planning formalism, which assumed actions are instantaneous and some external scheduler chooses which agent acts next. However, this work is not directly applicable to multi-robot systems, because in the real world actions take time and the agents can act concurrently. In this paper, we show how the robustness of a social law in a continuous time setting can be verified through compilation to temporal planning. We demonstrate our work both theoretically and on real robots.
【Keywords】:
【Paper Link】 【Pages】:7691-7698
【Authors】: Sunandita Patra ; Malik Ghallab ; Dana S. Nau ; Paolo Traverso
【Abstract】: The most common representation formalisms for planning are descriptive models. They abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state transition system. But acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making. Using descriptive representations for planning and operational representations for acting can lead to problems with developing and verifying consistency of the different models. We define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planning algorithm, RAEplan, plans by doing Monte Carlo rollout simulations of the actor’s operational models. Our experiments show significant benefits in the efficiency of the acting and planning system.
【Keywords】:
【Paper Link】 【Pages】:7699-7706
【Authors】: Hangwei Qian ; Sinno Jialin Pan ; Chunyan Miao
【Abstract】: Supervised learning methods have been widely applied to activity recognition. The prevalent success of existing methods, however, has two crucial prerequisites: proper feature extraction and sufficient labeled training data. The former is important to differentiate activities, while the latter is crucial to build a precise learning model. These two prerequisites have become bottlenecks to make existing methods more practical. Most existing feature extraction methods highly depend on domain knowledge, while labeled data requires intensive human annotation effort. Therefore, in this paper, we propose a novel method, named Distribution-based Semi-Supervised Learning, to tackle the aforementioned limitations. The proposed method is capable of automatically extracting powerful features with no domain knowledge required, while also alleviating the heavy annotation effort through semi-supervised learning. Specifically, we treat the data stream of sensor readings received in a period as a distribution, and map all training distributions, including labeled and unlabeled, into a reproducing kernel Hilbert space (RKHS) using the kernel mean embedding technique. The RKHS is further altered by exploiting the underlying geometry structure of the unlabeled distributions. Finally, in the altered RKHS, a classifier is trained with the labeled distributions. We conduct extensive experiments on three public datasets to verify the effectiveness of our method compared with state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:7707-7714
【Authors】: Riccardo Rasconi ; Angelo Oddi
【Abstract】: Quantum Computing represents the next big step towards a speed boost in computation, which promises major breakthroughs in several disciplines including Artificial Intelligence. This paper investigates the performance of a genetic algorithm to optimize the realization (compilation) of nearest-neighbor compliant quantum circuits. Current technological limitations (e.g., the decoherence effect) impose that the overall duration (makespan) of the quantum circuit realization be minimized, and therefore the makespan-minimization problem of compiling quantum algorithms on present or future quantum machines is drawing increasing attention in the AI community. In our genetic algorithm, a solution is built utilizing a novel chromosome encoding in which each gene controls the iterative selection of a quantum gate to be inserted in the solution, guided by a lexicographic double-key ranking returned by a heuristic function recently published in the literature. Our algorithm has been tested on a set of quantum circuit benchmark instances of increasing size available from the recent literature. We demonstrate that our genetic approach obtains very encouraging results that outperform the solutions obtained in previous research against the same benchmark, succeeding in significantly improving the makespan values for a great number of instances.
【Keywords】:
【Paper Link】 【Pages】:7715-7723
【Authors】: Silvan Sievers ; Michael Katz ; Shirin Sohrabi ; Horst Samulowitz ; Patrick Ferber
【Abstract】: As classical planning is known to be computationally hard, no single planner is expected to work well across many planning domains. One solution to this problem is to use online portfolio planners that select a planner for a given task. These portfolios perform a classification task, a well-known and well-researched task in the field of machine learning. The classification is usually performed using a representation of planning tasks with a collection of hand-crafted statistical features. Recent techniques in machine learning that are based on automatic extraction of features have not been employed yet due to the lack of suitable representations of planning tasks. In this work, we remove this barrier. We suggest representing planning tasks by images, allowing us to exploit arguably one of the most commonly used and best developed techniques in deep learning. We explore some of the questions that inevitably arise when applying such a technique, and present various ways of building practically useful online portfolio-based planners. Evidence of the usefulness of our proposed technique is a planner that won the cost-optimal track of the International Planning Competition 2018.
【Keywords】:
【Paper Link】 【Pages】:7724-7731
【Authors】: Helge Spieker ; Arnaud Gotlieb ; Morten Mossige
【Abstract】: In multi-cycle assignment problems with rotational diversity, a set of tasks has to be repeatedly assigned to a set of agents. Over multiple cycles, the goal is to achieve a high diversity of assignments from tasks to agents. At the same time, the assignments’ profit has to be maximized in each cycle. Due to the changing availability of tasks and agents, planning ahead is infeasible, and each cycle is an independent assignment problem that is nevertheless influenced by previous choices. We approach the multi-cycle assignment problem as a two-part problem: profit maximization and rotation are combined into one objective value, which is then solved as a Generalized Assignment Problem. Rotational diversity is maintained with a single execution of the costly assignment model. Our simple yet effective method is applicable to different domains and applications. Experiments show its applicability on a multi-cycle variant of the multiple knapsack problem and in a real-world case study on the test case selection and assignment problem, an example from the software engineering domain, where test cases have to be distributed over compatible test machines.
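A hedged sketch of the two-part scheme, using SciPy's assignment solver in the special case of one task per agent; the particular way profit and rotation are folded into one objective below is an assumption for illustration, not the paper's exact formulation.

```python
# Fold profit and a rotation bonus into one objective, then solve a
# standard assignment problem per cycle.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_cycle(profit, times_assigned, rotation_weight=0.5):
    """profit[i, j]: profit of giving task j to agent i;
    times_assigned[i, j]: how often this pairing occurred in past cycles."""
    # Penalize frequently repeated pairings to encourage rotational diversity.
    objective = profit - rotation_weight * times_assigned
    rows, cols = linear_sum_assignment(-objective)   # negate to maximize
    times_assigned[rows, cols] += 1
    return list(zip(rows, cols))
```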
【Keywords】:
【Paper Link】 【Pages】:7732-7739
【Authors】: Jirí Svancara ; Marek Vlk ; Roni Stern ; Dor Atzmon ; Roman Barták
【Abstract】: Multi-agent pathfinding (MAPF) is the problem of moving a group of agents to a set of target destinations while avoiding collisions. In this work, we study the online version of MAPF where new agents appear over time. Several variants of online MAPF are defined and analyzed theoretically, showing that it is not possible to create an optimal online MAPF solver. Nevertheless, we propose effective online MAPF algorithms that balance solution quality, runtime, and the number of plan changes an agent makes during execution.
【Keywords】:
【Paper Link】 【Pages】:7741-7748
【Authors】: Nadjet Bourdache ; Patrice Perny
【Abstract】: We consider the problem of actively eliciting preferences from a Decision Maker supervising a collective decision process in the context of fair multiagent combinatorial optimization. Individual preferences are assumed to be known and represented by linear utility functions defined on a combinatorial domain, and the social utility is defined as a generalized Gini Social evaluation Function (GSF) for the sake of fairness. The GSF is a non-linear aggregation function parameterized by weighting coefficients which allow a fine control of the equity requirement in the aggregation of individual utilities. The paper focuses on the elicitation of these weights by active learning in the context of the fair multiagent knapsack problem. We introduce and compare several incremental decision procedures interleaving an adaptive preference elicitation procedure with a combinatorial optimization algorithm to determine a GSF-optimal solution. We establish an upper bound on the number of queries and provide numerical tests to show the efficiency of the proposed approach.
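The GSF itself is compact enough to state in code: sort the utility profile increasingly and aggregate with non-increasing weights, so the worst-off agents count most. A small sketch with an illustrative weight vector (the paper's concern is eliciting these weights, not evaluating the function).

```python
import numpy as np

def gsf(utilities, weights):
    """Generalized Gini social evaluation: pair the largest weight with the
    smallest utility, so fairness is rewarded. `weights` must be non-increasing."""
    u = np.sort(np.asarray(utilities))          # ascending utilities
    w = np.asarray(weights)
    assert np.all(np.diff(w) <= 0), "weights must be non-increasing"
    return float(np.dot(w, u))

# With weights (3, 2, 1), the unequal profile (1, 5, 9) scores
# 3*1 + 2*5 + 1*9 = 22, while the equal profile (5, 5, 5) scores 30.
print(gsf([1, 5, 9], [3, 2, 1]), gsf([5, 5, 5], [3, 2, 1]))
```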
【Keywords】:
【Paper Link】 【Pages】:7749-7758
【Authors】: Daniel S. Brown ; Scott Niekum
【Abstract】: Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing a reduction to the set cover problem which enables an efficient approximation algorithm for determining the set of maximally informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach.
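Since the reduction targets set cover, the standard greedy approximation applies. The sketch below shows that generic greedy step with hypothetical inputs, leaving the paper's mapping from demonstrations to covered reward-equivalence constraints abstract.

```python
# Greedy set cover: repeatedly pick the demonstration covering the most
# not-yet-covered constraints. `candidates` maps a demo id to the set of
# constraints it covers (how that set is derived is the paper's reduction).
def greedy_set_cover(universe, candidates):
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidates, key=lambda d: len(candidates[d] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("universe not coverable by the candidates")
        chosen.append(best)
        uncovered -= gain
    return chosen   # within a ln|universe| factor of the optimum
```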
【Keywords】:
【Paper Link】 【Pages】:7759-7768
【Authors】: Luca Cardelli ; Marta Kwiatkowska ; Luca Laurenti ; Andrea Patane
【Abstract】: Bayesian inference and Gaussian processes are widely used in applications ranging from robotics and control to biological systems. Many of these applications are safety-critical and require a characterization of the uncertainty associated with the learning model and formal guarantees on its predictions. In this paper we define a robustness measure for Bayesian inference against input perturbations, given by the probability that, for a test point and a compact set in the input space containing the test point, the prediction of the learning model will remain δ-close for all the points in the set, for δ > 0. Such measures can be used to provide formal probabilistic guarantees for the absence of adversarial examples. By employing the theory of Gaussian processes, we derive upper bounds on the resulting robustness by utilising the Borell-TIS inequality, and propose algorithms for their computation. We evaluate our techniques on two examples, a GP regression problem and a fully-connected deep neural network, where we rely on weak convergence to GPs to study adversarial examples on the MNIST dataset.
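For reference, the Borell-TIS inequality invoked above states, in one standard form, a Gaussian tail bound on the supremum of the process:

```latex
% Borell-TIS inequality: for a centered Gaussian process (f(x))_{x \in T}
% with almost surely bounded sample paths, and any u > 0,
\sigma_T^2 = \sup_{x \in T} \mathbb{E}\!\left[f(x)^2\right], \qquad
\Pr\!\left( \sup_{x \in T} f(x) - \mathbb{E}\!\left[\sup_{x \in T} f(x)\right] \ge u \right)
\le \exp\!\left( -\frac{u^2}{2\sigma_T^2} \right).
```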
【Keywords】:
【Paper Link】 【Pages】:7769-7776
【Authors】: Federico Cerutti ; Lance M. Kaplan ; Angelika Kimmig ; Murat Sensoy
【Abstract】: We enable aProbLog—a probabilistic logical programming approach—to reason in presence of uncertain probabilities represented as Beta-distributed random variables. We achieve the same performance as state-of-the-art algorithms for highly specified and engineered domains, while simultaneously maintaining the flexibility offered by aProbLog in handling complex relational domains. Our motivation is that faithfully capturing the distribution of probabilities is necessary to compute an expected utility for effective decision making under uncertainty: unfortunately, these probability distributions can be highly uncertain due to sparse data. To understand and accurately manipulate such probability distributions we need a well-defined theoretical framework that is provided by the Beta distribution, which specifies a distribution of probabilities representing all the possible values of a probability when the exact value is unknown.
【Keywords】:
【Paper Link】 【Pages】:7777-7784
【Authors】: Sourav Chakraborty ; Kuldeep S. Meel
【Abstract】: Recent years have seen an unprecedented adoption of artificial intelligence in a wide variety of applications ranging from medical diagnosis, the automobile industry, and security to aircraft collision avoidance. Probabilistic reasoning is a key component of such modern artificial intelligence systems. Sampling techniques form the core of state-of-the-art probabilistic reasoning systems. The divide between the existence of sampling techniques that have strong theoretical guarantees but fail to scale and scalable techniques with weak or no theoretical guarantees mirrors the gap in software engineering between the poor scalability of classical program synthesis techniques and the billions of programs that are routinely used by practitioners. One bridge connecting the two extremes in the context of software engineering has been program testing. In contrast to testing for deterministic programs, where one trace is sufficient to prove the existence of a bug, in the case of samplers one sample is typically not sufficient to prove non-conformity of the sampler to the desired distribution. This makes one wonder whether it is possible to design a testing methodology to test whether a sampler under test generates samples close to a given distribution. The primary contribution of this paper is an affirmative answer to the above question when the given distribution is a uniform distribution: we design, to the best of our knowledge, the first algorithmic framework, Barbarik, to test whether the distribution generated is ε-close or η-far from the uniform distribution. In contrast to sampling techniques that require an exponential or sub-exponential number of samples for a sampler whose support can be represented by n bits, Barbarik requires only O(1/(η−ε)^4) samples. We present a prototype implementation of Barbarik and use it to test three state-of-the-art uniform samplers over the support defined by combinatorial constraints. Barbarik can provide a certificate of uniformity to one sampler and demonstrate non-uniformity for the other two samplers. Erratum: This research is supported in part by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: [AISG-RP-2018-005])
【Keywords】:
【Paper Link】 【Pages】:7785-7792
【Authors】: Supratik Chakraborty ; Kuldeep S. Meel ; Moshe Y. Vardi
【Abstract】: A promising approach to probabilistic inference that has attracted recent attention exploits its reduction to a set of model counting queries. Since probabilistic inference and model counting are #P-hard, various relaxations are used in practice, with the hope that these relaxations allow efficient computation while also providing rigorous approximation guarantees. In this paper, we show that contrary to common belief, several relaxations used for model counting and its applications (including probabilistic inference) do not really lead to computational efficiency in a complexity theoretic sense. Our arguments proceed by showing the corresponding relaxed notions of counting to be computationally hard. We argue that approximate counting with multiplicative tolerance and probabilistic guarantees of correctness is the only class of relaxations that provably simplifies the problem, given access to an NP-oracle. Finally, we show that for applications that compare probability estimates with a threshold, a new notion of relaxation with gaps between low and high thresholds can be used. This new relaxation allows efficient decision making in practice, given access to an NP-oracle, while also bounding the approximation error. Erratum: This research is supported in part by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: [AISG-RP-2018-005])
【Keywords】:
【Paper Link】 【Pages】:7793-7800
【Authors】: Cong Chen ; Changhe Yuan
【Abstract】: Much effort has been directed at developing algorithms for learning optimal Bayesian network structures from data. When given limited or noisy data, however, the optimal Bayesian network often fails to capture the true underlying network structure. One can potentially address the problem by finding multiple most likely Bayesian networks (K-Best) in the hope that one of them recovers the true model. However, it is often the case that some of the best models come from the same peak(s) and are very similar to each other, so they tend to fail together. Moreover, many of these models are not even optimal with respect to any causal ordering, and thus unlikely to be useful. This paper proposes a novel method for finding a set of diverse top Bayesian networks, called modes, such that each network is guaranteed to be optimal in a local neighborhood. Such mode networks are expected to provide a much better coverage of the true model. Based on a global-local theorem showing that a mode Bayesian network must be optimal in all local scopes, we introduce an A* search algorithm to efficiently find the top M Bayesian networks which are highly probable and naturally diverse. Empirical evaluations show that our top mode models have much better diversity as well as accuracy in discovering true underlying models than those found by K-Best.
【Keywords】:
【Paper Link】 【Pages】:7801-7808
【Authors】: Silvia Chiappa
【Abstract】: We consider the problem of learning fair decision systems from data in which a sensitive attribute might affect the decision along both fair and unfair pathways. We introduce a counterfactual approach to disregard effects along unfair pathways that does not incur the same loss of individual-specific information as previous approaches. Our method corrects observations adversely affected by the sensitive attribute, and uses these to form a decision. We leverage recent developments in deep learning and approximate inference to develop a VAE-type method that is widely applicable to complex nonlinear models.
【Keywords】:
【Paper Link】 【Pages】:7809-7815
【Authors】: Liat Cohen ; Gera Weiss
【Abstract】: We present an efficient algorithm that, given a discrete random variable X and a number m, computes a random variable whose support is of size at most m and whose Kolmogorov distance from X is minimal. We present some variants of the algorithm, analyse their correctness and computational complexity, and present a detailed empirical evaluation that shows how they perform in practice. The main application that we examine, which is our motivation for this work, is estimation of the probability of missing deadlines in series-parallel schedules. Since exact computation of these probabilities is NP-hard, we propose to use the algorithms described in this paper to obtain an approximation.
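The distance being minimized is simple to compute for discrete variables: the largest gap between the two CDFs over the union of supports. A helper sketch (this is only the objective, not the paper's approximation algorithm):

```python
# Kolmogorov distance between two discrete random variables given as
# {value: probability} maps: sup |F_p(x) - F_q(x)| over the joint support.
def kolmogorov_distance(p, q):
    fp = fq = dist = 0.0
    for x in sorted(set(p) | set(q)):
        fp += p.get(x, 0.0)
        fq += q.get(x, 0.0)
        dist = max(dist, abs(fp - fq))
    return dist

# Collapsing X = {0: .25, 1: .25, 2: .5} onto the support {1, 2}:
print(kolmogorov_distance({0: .25, 1: .25, 2: .5}, {1: .5, 2: .5}))  # 0.25
```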
【Keywords】:
【Paper Link】 【Pages】:7816-7824
【Authors】: Mayukh Das ; Devendra Singh Dhami ; Gautam Kunapuli ; Kristian Kersting ; Sriraam Natarajan
【Abstract】: Counting the number of true instances of a clause is arguably a major bottleneck in relational probabilistic inference and learning. We approximate counts in two steps: (1) transform the fully grounded relational model to a large hypergraph, and partially-instantiated clauses to hypergraph motifs; (2) since the expected counts of the motifs are provably the clause counts, approximate them using summary statistics (in/out-degrees, edge counts, etc.). Our experimental results demonstrate the efficiency of these approximations, which can be applied to many complex statistical relational models, and can be significantly faster than the state-of-the-art, both for inference and learning, without sacrificing effectiveness.
【Keywords】:
【Paper Link】 【Pages】:7825-7833
【Authors】: Pedro Zuidberg Dos Martires ; Anton Dries ; Luc De Raedt
【Abstract】: Weighted model counting has recently been extended to weighted model integration, which can be used to solve hybrid probabilistic reasoning problems. Such problems involve both discrete and continuous probability distributions. We show how standard knowledge compilation techniques (to SDDs and d-DNNFs) apply to weighted model integration, and use it in two novel solvers, one exact and one approximate solver. Furthermore, we extend the class of employable weight functions to actual probability density functions instead of mere polynomial weight functions.
【Keywords】:
【Paper Link】 【Pages】:7834-7841
【Authors】: Yuanzhen Guo ; Hao Xiong ; Nicholas Ruozzi
【Abstract】: Exact marginal inference in continuous graphical models is computationally challenging outside of a few special cases. Existing work on approximate inference has focused on approximately computing the messages as part of the loopy belief propagation algorithm either via sampling methods or moment matching relaxations. In this work, we present an alternative family of approximations that, instead of approximating the messages, approximates the beliefs in the continuous Bethe free energy using mixture distributions. We show that these types of approximations can be combined with numerical quadrature to yield algorithms with both theoretical guarantees on the quality of the approximation and significantly better practical performance in a variety of applications that are challenging for current state-of-the-art methods.
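As a reminder of the quadrature ingredient, Gauss-Hermite rules compute Gaussian expectations exactly for polynomials up to degree 2n−1. A self-contained NumPy example, purely illustrative and unrelated to the paper's specific belief approximations:

```python
import numpy as np

def gauss_expectation(g, mu, sigma, n=20):
    """E[g(Z)] for Z ~ N(mu, sigma^2) via an n-point Gauss-Hermite rule."""
    x, w = np.polynomial.hermite.hermgauss(n)   # nodes/weights for e^{-x^2}
    # Change of variables z = mu + sqrt(2)*sigma*x maps the rule to N(mu, sigma^2).
    return float(np.sum(w * g(mu + np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi))

# E[Z^2] for Z ~ N(0, 1) is 1.
print(gauss_expectation(lambda z: z**2, 0.0, 1.0))  # ~1.0
```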
【Keywords】:
【Paper Link】 【Pages】:7842-7849
【Authors】: Shubham Gupta ; Gaurav Sharma ; Ambedkar Dukkipati
【Abstract】: Networks observed in real world like social networks, collaboration networks etc., exhibit temporal dynamics, i.e. nodes and edges appear and/or disappear over time. In this paper, we propose a generative, latent space based, statistical model for such networks (called dynamic networks). We consider the case where the number of nodes is fixed, but the presence of edges can vary over time. Our model allows the number of communities in the network to be different at different time steps. We use a neural network based methodology to perform approximate inference in the proposed model and its simplified version. Experiments done on synthetic and real world networks for the task of community detection and link prediction demonstrate the utility and effectiveness of our model as compared to other similar existing approaches.
【Keywords】:
【Paper Link】 【Pages】:7850-7857
【Authors】: Trong Nghia Hoang ; Quang Minh Hoang ; Kian Hsiang Low ; Jonathan P. How
【Abstract】: This paper presents a novel Collective Online Learning of Gaussian Processes (COOL-GP) framework for enabling a massive number of GP inference agents to simultaneously perform (a) efficient online updates of their GP models using their local streaming data with varying correlation structures and (b) decentralized fusion of their resulting online GP models with different learned hyperparameter settings and inducing inputs. To realize this, we exploit the notion of a common encoding structure to encapsulate the local streaming data gathered by any GP inference agent into summary statistics based on our proposed representation, which is amenable both to an efficient online update via an importance sampling trick and to multi-agent model fusion via decentralized message passing that can exploit sparse connectivity among agents for improving efficiency and enhancing the robustness of our framework against transmission loss. We provide a rigorous theoretical analysis of the approximation loss arising from our proposed representation to achieve efficient online updates and model fusion. Empirical evaluations show that COOL-GP is highly effective in model fusion, resilient to information disparity between agents, robust to transmission loss, and can scale to thousands of agents.
【Keywords】:
【Paper Link】 【Pages】:7858-7865
【Authors】: Mahdi Imani ; Seyede Fatemeh Ghoreishi ; Douglas L. Allaire ; Ulisses Braga-Neto
【Abstract】: Nonlinear state-space models are ubiquitous in modeling real-world dynamical systems. Sequential Monte Carlo (SMC) techniques, also known as particle methods, are a well-known class of parameter estimation methods for this general class of state-space models. Existing SMC-based techniques rely on excessive sampling of the parameter space, which makes their computation intractable for large systems or tall data sets. Bayesian optimization techniques have been used for fast inference in state-space models with intractable likelihoods. These techniques aim to find the maximum of the likelihood function by sequential sampling of the parameter space through a single SMC approximator. Various SMC approximators with different fidelities and computational costs are often available for sample-based likelihood approximation. In this paper, we propose a multi-fidelity Bayesian optimization algorithm for the inference of general nonlinear state-space models (MFBO-SSM), which enables simultaneous sequential selection of parameters and approximators. The accuracy and speed of the algorithm are demonstrated by numerical experiments using synthetic gene expression data from a gene regulatory network model and real data from the VIX stock price index.
【Keywords】:
【Paper Link】 【Pages】:7866-7875
【Authors】: Brendan Juba
【Abstract】: Standard approaches to probabilistic reasoning require that one possesses an explicit model of the distribution in question. But, the empirical learning of models of probability distributions from partial observations is a problem for which efficient algorithms are generally not known. In this work we consider the use of bounded-degree fragments of the “sum-of-squares” logic as a probability logic. Prior work has shown that we can decide refutability for such fragments in polynomial time. We propose to use such fragments to decide queries about whether a given probability distribution satisfies a given system of constraints and bounds on expected values. We show that in answering such queries, such constraints and bounds can be implicitly learned from partial observations in polynomial time as well. It is known that this logic is capable of deriving many bounds that are useful in probabilistic analysis. We show here that it furthermore captures key polynomial-time fragments of resolution. Thus, these fragments are also quite expressive.
【Keywords】:
【Paper Link】 【Pages】:7876-7883
【Authors】: Yasushi Kawase ; Hanna Sumita
【Abstract】: In this paper, we study the following robust optimization problem. Given an independence system and candidate objective functions, we choose an independent set, and then an adversary chooses one objective function, knowing our choice. The goal is to find a randomized strategy (i.e., a probability distribution over the independent sets) that maximizes the expected objective value in the worst case. This problem is fundamental in a wide range of areas such as artificial intelligence, machine learning, game theory and optimization. To solve the problem, we propose two types of schemes for designing approximation algorithms. One scheme is for the case when objective functions are linear. It first finds an approximately optimal aggregated strategy and then retrieves a desired solution with little loss of the objective value. The approximation ratio depends on a relaxation of an independence system polytope. As applications, we provide approximation algorithms for a knapsack constraint or a matroid intersection by developing appropriate relaxations and retrievals. The other scheme is based on the multiplicative weights update (MWU) method. The direct application of the MWU method does not yield a strict multiplicative approximation algorithm but yields one with an additional additive error term. A key technique to overcome the issue is to introduce a new concept called (η,γ)-reductions for objective functions with parameters η and γ. We show that our scheme outputs a nearly α-approximate solution if there exists an α-approximation algorithm for a subproblem defined by (η,γ)-reductions. This improves approximation ratios in previous results. Using our result, we provide approximation algorithms when the objective functions are submodular or correspond to the cardinality robustness for the knapsack problem.
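For orientation, the second scheme builds on the standard multiplicative weights update for zero-sum games. Below is a generic MWU sketch without the paper's (η,γ)-reduction refinement; the best-response oracle, which returns a (near-)optimal independent set for a weighted mix of objectives, is assumed.

```python
# MWU on the adversary's distribution over objectives: the adversary
# concentrates weight on objectives where we score low, while we
# best-respond; the uniform mixture of responses approximates the
# maximin randomized strategy.
import numpy as np

def mwu_robust(objectives, best_response, rounds=100, eta=0.1):
    """objectives: list of functions f_j(solution) -> value in [0, 1];
    best_response(p): solution maximizing sum_j p[j] * f_j(solution)."""
    w = np.ones(len(objectives))
    mixture = []                              # our randomized strategy
    for _ in range(rounds):
        p = w / w.sum()
        sol = best_response(p)
        mixture.append(sol)
        values = np.array([f(sol) for f in objectives])
        w *= np.exp(-eta * values)            # downweight well-covered objectives
    return mixture                            # play a uniform draw from this list
```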
【Keywords】:
【Paper Link】 【Pages】:7884-7891
【Authors】: Ximing Li ; Jiaojiao Zhang ; Jihong Ouyang
【Abstract】: Conventional topic models suffer from a severe sparsity problem when facing extremely short texts such as social media posts. The family of Dirichlet multinomial mixture (DMM) models can handle the sparsity problem; however, they are still very sensitive to ordinary and noisy words, resulting in inaccurate topic representations at the document level. In this paper, we alleviate this problem by preserving the local neighborhood structure of short texts, enabling topical signals to spread among neighboring documents so as to correct the inaccurate topic representations. This is achieved by using variational manifold regularization, which constrains close short texts to have similar variational topic representations. Upon this idea, we propose a novel Laplacian DMM (LapDMM) topic model. During document graph construction, we further use the word mover’s distance with word embeddings to measure document similarities at the semantic level. To evaluate LapDMM, we compare it against the state-of-the-art short text topic models on several traditional tasks. Experimental results demonstrate that our LapDMM achieves very significant performance gains over baseline models, e.g., achieving even about 0.2 higher scores on clustering and classification tasks in many cases.
【Keywords】:
【Paper Link】 【Pages】:7892-7899
【Authors】: Zhenyu A. Liao ; Charupriya Sharma ; James Cussens ; Peter van Beek
【Abstract】: A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known score-and-search approach. However, selecting a single model (i.e., the best scoring BN) can be misleading or may not achieve the best possible accuracy. An alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging, where the space of possible BNs is sampled or enumerated in some fashion. Unfortunately, existing approaches for model averaging either severely restrict the structure of the Bayesian network or have only been shown to scale to networks with fewer than 30 random variables. In this paper, we propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches.
【Keywords】:
【Paper Link】 【Pages】:7900-7907
【Authors】: Qi Lou ; Rina Dechter ; Alexander T. Ihler
【Abstract】: Computing the partition function of a graphical model is a fundamental task in probabilistic inference. Variational bounds and Monte Carlo methods, two important approximate paradigms for this task, each has its respective strengths for solving different types of problems, but it is often nontrivial to decide which one to apply to a particular problem instance without significant prior knowledge and a high level of expertise. In this paper, we propose a general framework that interleaves optimization of variational bounds (via message passing) with Monte Carlo sampling. Our adaptive interleaving policy can automatically balance the computational effort between these two schemes in an instance-dependent way, which provides our framework with the strengths of both schemes, leads to tighter anytime bounds and an unbiased estimate of the partition function, and allows flexible tradeoffs between memory, time, and solution quality. We verify our approach empirically on real-world problems taken from recent UAI inference competitions.
【Keywords】:
【Paper Link】 【Pages】:7908-7915
【Authors】: Ke Ma ; Qianqian Xu ; Xiaochun Cao
【Abstract】: Existing ordinal embedding methods usually follow a two-stage routine: outlier detection is first employed to pick out the inconsistent comparisons; then an embedding is learned from the clean data. However, learning in a multi-stage manner is well-known to suffer from sub-optimal solutions. In this paper, we propose a unified framework to jointly identify the contaminated comparisons and derive reliable embeddings. The merits of our method are three-fold: (1) By virtue of the proposed unified framework, the sub-optimality of traditional methods is largely alleviated; (2) The proposed method is aware of global inconsistency by minimizing a corresponding cost, while traditional methods only involve local inconsistency; (3) Instead of considering the nuclear norm heuristics, we adopt an exact solution for rank equality constraint. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ordinal embedding from the contaminated comparisons.
【Keywords】:
【Paper Link】 【Pages】:7916-7923
【Authors】: Mohammad Maminur Islam ; Somdeb Sarkhel ; Deepak Venugopal
【Abstract】: We present a dense representation for Markov Logic Networks (MLNs) called Obj2Vec that encodes symmetries in the MLN structure. Identifying symmetries is a key challenge for lifted inference algorithms and we leverage advances in neural networks to learn symmetries which are hard to specify using hand-crafted features. Specifically, we learn an embedding for MLN objects that predicts the context of an object, i.e., objects that appear along with it in formulas of the MLN, since common contexts indicate symmetry in the distribution. Importantly, our formulation leverages well-known skip-gram models that allow us to learn the embedding efficiently. Finally, to reduce the size of the ground MLN, we sample objects based on their learned embeddings. We integrate Obj2Vec with several inference algorithms, and show the scalability and accuracy of our approach compared to other state-of-the-art methods.
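A hedged sketch of the embedding step: treat objects that co-occur in ground formulas as words sharing a context and train a skip-gram model, so objects with common contexts (and hence, plausibly, symmetries) land close together. It uses gensim's Word2Vec as a stand-in; Obj2Vec's actual formulation differs in detail, and the example "sentences" are invented.

```python
# Skip-gram over object-context pairs derived from ground formulas.
from gensim.models import Word2Vec

# Illustrative: each "sentence" lists the objects co-occurring in one
# ground formula, e.g. Smokes(anna) ^ Friends(anna, bob) => Smokes(bob).
sentences = [["anna", "bob"], ["bob", "carl"], ["anna", "carl"], ["dave", "erin"]]

model = Word2Vec(sentences, vector_size=16, window=2, sg=1,  # sg=1: skip-gram
                 min_count=1, epochs=50)
vec_anna = model.wv["anna"]                  # learned object embedding
print(model.wv.most_similar("anna", topn=2)) # objects with similar contexts
```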
【Keywords】:
【Paper Link】 【Pages】:7924-7932
【Authors】: Radu Marinescu ; Akihiro Kishimoto ; Adi Botea ; Rina Dechter ; Alexander T. Ihler
【Abstract】: Marginal MAP is a difficult mixed inference task for graphical models. Existing state-of-the-art solvers for this task are based on a hybrid best-first and depth-first search scheme that allows them to compute upper and lower bounds on the optimal solution value in an anytime fashion. These methods however are memory intensive schemes (via the best-first component) and do not have an efficient memory management mechanism. For this reason, they are often less effective in practice, especially on difficult problem instances with very large search spaces. In this paper, we introduce a new recursive best-first search based bounding scheme that operates efficiently within limited memory and computes anytime upper and lower bounds that improve over time. An empirical evaluation demonstrates the effectiveness of our proposed approach against current solvers.
【Keywords】:
【Paper Link】 【Pages】:7933-7940
【Authors】: Mingdong Ou ; Nan Li ; Cheng Yang ; Shenghuo Zhu ; Rong Jin
【Abstract】: We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient, due to the large number of arms. By exploiting arm correlations based on a parametric reward model with arm features, contextual bandit algorithms are more efficient, but they can also suffer from large regret in practical applications, due to the reward estimation bias from mis-specified model assumptions or incomplete features. In this paper, we propose a novel Bayesian framework, called Semi-Parametric Sampling (SPS), for this problem, which employs a semi-parametric function as the reward model. Specifically, the parametric part of SPS, which models the expected reward as a parametric function of arm features, can efficiently eliminate poor arms from the candidate set. The non-parametric part of SPS, which adopts a non-parametric reward model, revises the parametric estimation to avoid estimation bias, especially on the remaining candidate arms. We give an implementation of SPS, Linear SPS (LSPS), which utilizes a linear function as the parametric part. In semi-parametric environments, theoretical analysis shows that LSPS achieves a better regret bound (i.e., Õ(√(N^(1−α)) d^α √T) with α ∈ [0, 1]) than existing approaches. Also, experiments demonstrate the superiority of the proposed approach.
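For context, the parametric part of LSPS behaves like standard linear Thompson sampling. A minimal sketch with illustrative hyperparameters; the semi-parametric revision applied to the surviving arms, the paper's key addition, is not shown.

```python
# Linear Thompson sampling: maintain a Gaussian posterior over the weight
# vector, sample it, and pull the arm maximizing the sampled reward.
import numpy as np

class LinearTS:
    def __init__(self, d, lam=1.0, noise=0.5):
        self.B = lam * np.eye(d)     # posterior precision matrix
        self.f = np.zeros(d)         # running sum of reward * features
        self.noise = noise

    def select(self, arm_features):  # arm_features: (n_arms, d)
        cov = np.linalg.inv(self.B)
        theta = np.random.multivariate_normal(cov @ self.f,
                                              self.noise ** 2 * cov)
        return int(np.argmax(arm_features @ theta))

    def update(self, x, reward):     # x: feature vector of the pulled arm
        self.B += np.outer(x, x)
        self.f += reward * x
```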
【Keywords】:
【Paper Link】 【Pages】:7941-7948
【Authors】: Thomy Phan ; Lenz Belzner ; Marie Kiermeier ; Markus Friedrich ; Kyrill Schmid ; Claudia Linnhoff-Popien
【Abstract】: State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
【Keywords】:
【Paper Link】 【Pages】:7949-7956
【Authors】: Silviu Pitis
【Abstract】: Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor γ
【Keywords】:
【Paper Link】 【Pages】:7957-7965
【Authors】: Yujia Shen ; Anchal Goyanka ; Adnan Darwiche ; Arthur Choi
【Abstract】: Structured Bayesian networks (SBNs) are a recently proposed class of probabilistic graphical models which integrate background knowledge in two forms: conditional independence constraints and Boolean domain constraints. In this paper, we propose the first exact inference algorithm for SBNs, based on compiling a given SBN to a Probabilistic Sentential Decision Diagram (PSDD). We further identify a tractable subclass of SBNs, which have PSDDs of polynomial size. These SBNs yield a tractable model of route distributions, whose structure can be learned from GPS data, using a simple algorithm that we propose. Empirically, we demonstrate the utility of our inference algorithm, showing that it can be an order-of-magnitude more efficient than more traditional approaches to exact inference. We demonstrate the utility of our learning algorithm, showing that it can learn more accurate models and classifiers from GPS data.
【Keywords】:
【Paper Link】 【Pages】:7966-7974
【Authors】: Andy Shih ; Arthur Choi ; Adnan Darwiche
【Abstract】: We propose an algorithm for compiling Bayesian network classifiers into decision graphs that mimic the input and output behavior of the classifiers. In particular, we compile Bayesian network classifiers into ordered decision graphs, which are tractable and can be exponentially smaller in size than decision trees. This tractability facilitates reasoning about the behavior of Bayesian network classifiers, including the explanation of decisions they make. Our compilation algorithm comes with guarantees on the time of compilation and the size of compiled decision graphs. We apply our compilation algorithm to classifiers from the literature and discuss some case studies in which we show how to automatically explain their decisions and verify properties of their behavior.
【Keywords】:
【Paper Link】 【Pages】:7975-7983
【Authors】: Sriram Srinivasan ; Behrouz Babaki ; Golnoosh Farnadi ; Lise Getoor
【Abstract】: Statistical relational learning models are powerful tools that combine ideas from first-order logic with probabilistic graphical models to represent complex dependencies. Despite their success in encoding large problems with a compact set of weighted rules, performing inference over these models is often challenging. In this paper, we show how to effectively combine two powerful ideas for scaling inference for large graphical models. The first idea, lifted inference, is a well-studied approach to speeding up inference in graphical models by exploiting symmetries in the underlying problem. The second idea is to frame Maximum a posteriori (MAP) inference as a convex optimization problem and use the alternating direction method of multipliers (ADMM) to solve the problem in parallel. A well-studied relaxation to the combinatorial optimization problem defined for logical Markov random fields gives rise to a hinge-loss Markov random field (HL-MRF) for which MAP inference is a convex optimization problem. We show how the formalism introduced for coloring weighted bipartite graphs using a color refinement algorithm can be integrated with the ADMM optimization technique to take advantage of the sparse dependency structures of HL-MRFs. Our proposed approach, lifted hinge-loss Markov random fields (LHL-MRFs), preserves the structure of the original problem after lifting and solves lifted inference as distributed convex optimization with ADMM. In our empirical evaluation on real-world problems, we observe up to a three-fold speedup in inference over HL-MRFs.
【Keywords】:
【Paper Link】 【Pages】:7984-7991
【Authors】: Topi Talvitie ; Mikko Koivisto
【Abstract】: Exploring directed acyclic graphs (DAGs) in a Markov equivalence class is pivotal to infer causal effects or to discover the causal DAG via appropriate interventional data. We consider counting and uniform sampling of DAGs that are Markov equivalent to a given DAG. These problems efficiently reduce to counting the moral acyclic orientations of a given undirected connected chordal graph on n vertices, for which we give two algorithms. Our first algorithm requires O(2^n n^4) arithmetic operations, improving a previous superexponential upper bound. The second requires O(k! 2^k k^2 n) operations, where k is the size of the largest clique in the graph; for bounded-degree graphs this bound is linear in n. After a single run, both algorithms enable uniform sampling from the equivalence class at a computational cost linear in the graph size. Empirical results indicate that our algorithms are superior to previously presented algorithms over a range of inputs; graphs with hundreds of vertices and thousands of edges are processed in a second on a desktop computer.
【Keywords】:
【Paper Link】 【Pages】:7992-7999
【Authors】: Hong Xie ; Yongkun Li ; John C. S. Lui
【Abstract】: Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that earning a reputable label (for sellers of such systems) may take a substantial amount of time, and this implies a reduction in profit. We propose to enhance sellers’ reputation via price discounts. However, the challenges are: (1) the demand from buyers depends on both the discount and the reputation; (2) the demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem via a semi-Markov decision process (SMDP) to explore the optimal trade-offs in selecting price discounts. We prove the monotonicity of the optimal profit and optimal discount. Based on the monotonicity, we design a QLFP (Q-learning with forward projection) algorithm, which infers the optimal discount from historical transaction data. We conduct experiments on a real-world dataset to show that our QLFP algorithm improves profit by as much as 50% over both the classical Q-learning and speedy Q-learning algorithms. Our QLFP algorithm also improves profit by as much as four times over the case of not providing any price discount.
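To make the "forward projection" idea concrete, here is a hedged tabular sketch in which each Q-update is followed by projecting the value table onto monotone (here, nondecreasing) functions of the state via pool-adjacent-violators; the toy environment, reward, and the choice of monotone direction are illustrative assumptions, not the paper's model.

```python
# Tabular Q-learning with a monotone projection after each update, as a
# hypothetical stand-in for QLFP's forward projection. States, actions,
# rewards, and transitions are all toy choices.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 4          # reputation levels x discount levels
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95

def pav_nondecreasing(v):
    """Project v onto nondecreasing vectors (pool-adjacent-violators)."""
    blocks = []                      # stack of [mean, count]
    for x in v:
        blocks.append([float(x), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])

for step in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))                # exploratory action
    r = -abs(a - s) + rng.normal(0, 0.1)            # toy reward signal
    s2 = min(s + int(a > s), n_states - 1)          # toy transition
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    for col in range(n_actions):                    # "forward projection" step
        Q[:, col] = pav_nondecreasing(Q[:, col])

print(np.round(Q, 2))
```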
【Keywords】:
【Paper Link】 【Pages】:8001-8008
【Authors】: Vincent Casser ; Sören Pirk ; Reza Mahjourian ; Anelia Angelova
【Abstract】: Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects and is shown to transfer across data domains, e.g., from outdoor to indoor scenes. The main idea is to introduce geometric structure in the learning process, by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore, an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion, e.g., through learned flow. Our results are comparable in quality to those that use stereo as supervision, and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.
【Keywords】:
【Paper Link】 【Pages】:8009-8016
【Authors】: Changhao Chen ; Yishu Miao ; Chris Xiaoxuan Lu ; Linhai Xie ; Phil Blunsom ; Andrew Markham ; Niki Trigoni
【Abstract】: Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent. However, they are affected greatly by changes in sensor placement/orientation or motion dynamics, and it is infeasible to collect labelled data from every domain. To overcome the challenges of domain adaptation on long sensory sequences, we propose MotionTransformer - a novel framework that extracts domain-invariant features of raw sequences from arbitrary domains and transfers them to new domains without any paired data. Through experiments, we demonstrate that it is able to efficiently and effectively convert a raw sequence from a new unlabelled target domain into an accurate inertial trajectory, benefiting from the motion knowledge transferred from the labelled source domain. We also conduct real-world experiments to show that our framework can reconstruct physically meaningful trajectories from raw IMU measurements obtained with a standard mobile phone in various attachments.
【Keywords】:
【Paper Link】 【Pages】:8017-8024
【Authors】: Beomjoon Kim ; Leslie Pack Kaelbling ; Tomás Lozano-Pérez
【Abstract】: We propose an actor-critic algorithm that uses past planning experience to improve the efficiency of solving robot task-and-motion planning (TAMP) problems. TAMP planners search for goal-achieving sequences of high-level operator instances specified by both discrete and continuous parameters. Our algorithm learns a policy for selecting the continuous parameters during search, using a small training set generated from the search trees of previously solved instances. We also introduce a novel fixed-length vector representation for world states with varying numbers of objects with different shapes, based on a set of key robot configurations. We demonstrate experimentally that our method learns more efficiently from less data than standard reinforcement-learning approaches and that using a learned policy to guide a planner improves planning efficiency.
【Keywords】:
【Paper Link】 【Pages】:8025-8033
【Authors】: Hangxin Liu ; Chi Zhang ; Yixin Zhu ; Chenfanfu Jiang ; Song-Chun Zhu
【Abstract】: This paper presents a mirroring approach, inspired by the neuroscience discovery of mirror neurons, to transfer demonstrated manipulation actions to robots. Designed to address the different embodiments between a human (demonstrator) and a robot, this approach extends the classic robot Learning from Demonstration (LfD) in the following aspects: i) It incorporates fine-grained hand forces collected by a tactile glove in demonstration to learn the robot’s fine manipulative actions; ii) Through model-free reinforcement learning and grammar induction, the demonstration is represented by a goal-oriented grammar consisting of goal states and the corresponding forces to reach the states, independent of robot embodiments; iii) A physics-based simulation engine is applied to emulate various robot actions and mirror the actions that are functionally equivalent to the human’s in the sense of causing the same state changes by exerting similar forces. Through this approach, a robot reasons about which forces to exert and what goals to achieve to generate actions (i.e., mirroring), rather than strictly mimicking the demonstration (i.e., overimitation). Thus the embodiment difference between a human and a robot is naturally overcome. In the experiment, we demonstrate the proposed approach by teaching a real Baxter robot a complex manipulation task involving haptic feedback—opening medicine bottles.
【Keywords】:
【Paper Link】 【Pages】:8034-8041
【Authors】: Kai Liu ; Hua Wang ; Fei Han ; Hao Zhang
【Abstract】: Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant environment appearance variations. In this paper, we propose a novel method to learn a location representation that can integrate the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective to use non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm, whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations can have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results have validated the effectiveness of our new method in long-term visual place recognition applications.
【Keywords】:
【Paper Link】 【Pages】:8042-8049
【Authors】: Robert Platt Jr. ; Colin Kohler ; Marcus Gualtieri
【Abstract】: In applications of deep reinforcement learning to robotics, it is often the case that we want to learn pose invariant policies: policies that are invariant to changes in the position and orientation of objects in the world. For example, consider a peg-in-hole insertion task. If the agent learns to insert a peg into one hole, we would like that policy to generalize to holes presented in different poses. Unfortunately, this is challenging with conventional methods. This paper proposes a novel state and action abstraction that is invariant to pose shifts, called deictic image maps, that can be used with deep reinforcement learning. We provide broad conditions under which optimal abstract policies are optimal for the underlying system. Finally, we show that the method can help solve challenging robotic manipulation problems.
【Keywords】:
【Paper Link】 【Pages】:8050-8057
【Authors】: Aditi Ramachandran ; Sarah Strohkorb Sebo ; Brian Scassellati
【Abstract】: Selecting appropriate tutoring help actions that account for both a student’s content mastery and engagement level is essential for effective human tutors, indicating the critical need for these skills in autonomous tutors. In this work, we formulate the robot-student tutoring help action selection problem as the Assistive Tutor partially observable Markov decision process (AT-POMDP). We designed the AT-POMDP and derived its parameters based on data from a prior robot-student tutoring study. The policy that results from solving the AT-POMDP allows a robot tutor to decide upon the optimal tutoring help action to give a student, while maintaining a belief of the student’s mastery of the material and engagement with the task. This approach is validated through a between-subjects field study, which involved 4th grade students (n=28) interacting with a social robot solving long division problems over five sessions. Students who received help from a robot using the AT-POMDP policy demonstrated significantly greater learning gains than students who received help from a robot with a fixed help action selection policy. Our results demonstrate that this robust computational framework can be used effectively to deliver diverse and personalized tutoring support over time for students.
【Keywords】:
【Paper Link】 【Pages】:8058-8065
【Authors】: Zhi-Xuan Tan ; Jake Brawer ; Brian Scassellati
【Abstract】: The ability for autonomous agents to learn and conform to human norms is crucial for their safety and effectiveness in social environments. While recent work has led to frameworks for the representation and inference of simple social rules, research into norm learning remains at an exploratory stage. Here, we present a robotic system capable of representing, learning, and inferring ownership relations and norms. Ownership is represented as a graph of probabilistic relations between objects and their owners, along with a database of predicate-based norms that constrain the actions permissible on owned objects. To learn these norms and relations, our system integrates (i) a novel incremental norm learning algorithm capable of both one-shot learning and induction from specific examples, (ii) Bayesian inference of ownership relations in response to apparent rule violations, and (iii) percept-based prediction of an object’s likely owners. Through a series of simulated and real-world experiments, we demonstrate the competence and flexibility of the system in performing object manipulation tasks that require a variety of norms to be followed, laying the groundwork for future research into the acquisition and application of social norms.
【Keywords】:
【Paper Link】 【Pages】:8066-8074
【Authors】: Xingyu Zhao ; Valentin Robu ; David Flynn ; Fateme Dinmohammadi ; Michael Fisher ; Matt Webster
【Abstract】: Robots are increasingly used to carry out critical missions in extreme environments that are hazardous for humans. This requires a high degree of operational autonomy under uncertain conditions, and poses new challenges for assuring the robot’s safety and reliability. In this paper, we develop a framework for probabilistic model checking on a layered Markov model to verify the safety and reliability requirements of such robots, both at the pre-mission stage and during runtime. Two novel estimators, based on conservative Bayesian inference and an imprecise probability model with sets of priors, are introduced to learn the unknown transition parameters from operational data. We demonstrate our approach using data from a real-world deployment of unmanned underwater vehicles in extreme environments.
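As a small, hedged illustration of the sets-of-priors idea (not the paper's estimators), the snippet below bounds the posterior mean of a Bernoulli transition/failure parameter across a grid of Beta priors and reports the most conservative value; all numbers are made up.

```python
# Hedged sketch of inference under a set of priors: observe k failures in
# n missions, sweep a family of Beta(alpha, beta) priors, and report the
# interval of posterior means plus the conservative (worst-case) estimate.
import numpy as np

k, n = 0, 100                     # e.g. zero observed failures in 100 missions
alphas = np.linspace(0.5, 5, 10)  # hypothetical prior family
betas = np.linspace(0.5, 5, 10)

# Beta-Bernoulli conjugacy: posterior mean = (alpha + k) / (alpha + beta + n)
post_means = [(a + k) / (a + b + n) for a in alphas for b in betas]
print("posterior mean of failure prob in [%.4f, %.4f]"
      % (min(post_means), max(post_means)))
print("conservative (worst-case) estimate: %.4f" % max(post_means))
```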
【Keywords】:
【Paper Link】 【Pages】:8076-8084
【Authors】: Manoj Acharya ; Kushal Kafle ; Christopher Kanan
【Abstract】: Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world’s largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both the TallyQA and HowMany-QA benchmarks.
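A minimal sketch of the core computation a relation network performs over region proposals, with random weights standing in for trained ones and the question embedding omitted for brevity:

```python
# Relation-network sketch for counting over region proposals: score every
# ordered pair of region features with a shared MLP g, sum-pool the pair
# embeddings (permutation invariant), and map the result to a count
# distribution with f. All shapes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
R, D, H, MAX_COUNT = 6, 32, 64, 15       # regions, feat dim, hidden, max count

regions = rng.normal(size=(R, D))        # pooled CNN features per proposal
W_g1 = rng.normal(size=(2 * D, H)) * 0.1
W_g2 = rng.normal(size=(H, H)) * 0.1
W_f = rng.normal(size=(H, MAX_COUNT + 1)) * 0.1

def relu(x):
    return np.maximum(x, 0)

# all ordered pairs (i, j) of region features, concatenated
pairs = np.concatenate(
    [np.repeat(regions, R, axis=0), np.tile(regions, (R, 1))], axis=1)
pair_emb = relu(relu(pairs @ W_g1) @ W_g2)   # g over each pair
pooled = pair_emb.sum(axis=0)                # order-free aggregation

logits = pooled @ W_f                        # f maps to count classes
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("predicted count:", int(probs.argmax()))
```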
【Keywords】:
【Paper Link】 【Pages】:8085-8093
【Authors】: Umar Asif ; Jianbin Tang ; Stefan Harrer
【Abstract】: This paper presents Densely Supervised Grasp Detector (DSGD), a deep learning framework which combines CNN structures with layer-wise feature fusion and produces grasps and their confidence scores at different levels of the image hierarchy (i.e., global-, region-, and pixel-levels). Specifically, at the global-level, DSGD uses the entire image information to predict a grasp. At the region-level, DSGD uses a region proposal network to identify salient regions in the image and uses a grasp prediction network to generate segmentations and their corresponding grasp poses of the salient regions. At the pixel-level, DSGD uses a fully convolutional network and predicts a grasp and its confidence at every pixel. During inference, DSGD selects the most confident grasp as the output. This selection from hierarchically generated grasp candidates overcomes limitations of the individual models. DSGD outperforms state-of-the-art methods on the Cornell grasp dataset in terms of grasp accuracy. Evaluation on a multi-object dataset and real-world robotic grasping experiments show that DSGD produces highly stable grasps on a set of unseen objects in new environments. It achieves 97% grasp detection accuracy and 90% robotic grasping success rate with real-time inference speed.
【Keywords】:
【Paper Link】 【Pages】:8094-8101
【Authors】: Seung-Hwan Bae
【Abstract】: Region-based object detection infers object regions for one or more categories in an image. Due to the recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and provided promising detection results. However, detection accuracy is often degraded because of the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To capture an entire appearance and part details of the object jointly, we extract CNN features within the whole object region and decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined and high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals of various scales. We integrate the R-DAD into several feature extractors, and prove the distinct performance improvement on PASCAL07/12 and MSCOCO18 compared to the recent convolutional detectors.
【Keywords】:
【Paper Link】 【Pages】:8102-8109
【Authors】: Hedi Ben-younes ; Rémi Cadène ; Nicolas Thome ; Matthieu Cord
【Abstract】: Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework to find subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both concepts of rank and mode ranks for tensors, already used for multimodal fusion. It allows us to define new ways of optimizing the tradeoff between the expressiveness and complexity of the fusion model, and to represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), where we design end-to-end learnable architectures for representing relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with respect to state-of-the-art multimodal fusion models for both VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch.
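The following sketch shows the block-superdiagonal fusion pattern in NumPy: both inputs are projected, split into C chunks, each chunk pair is fused by its own small core tensor, and the results are concatenated. Dimensions and random weights are illustrative stand-ins; see the authors' repository above for the real implementation.

```python
# Block-superdiagonal bilinear fusion sketch: chunk c of one modality
# interacts only with chunk c of the other, through a small core tensor,
# which caps the parameter count relative to a full bilinear model.
import numpy as np

rng = np.random.default_rng(0)
dx, dy, C, b, out_b = 2048, 300, 4, 20, 16   # input dims, chunks, chunk sizes

x = rng.normal(size=dx)                      # e.g. a visual feature
y = rng.normal(size=dy)                      # e.g. a question embedding
Px = rng.normal(size=(dx, C * b)) * 0.01     # projections to the chunked space
Py = rng.normal(size=(dy, C * b)) * 0.01
cores = rng.normal(size=(C, b, b, out_b)) * 0.01  # one small core per block

hx = (x @ Px).reshape(C, b)
hy = (y @ Py).reshape(C, b)

fused = np.stack([np.einsum('i,j,ijo->o', hx[c], hy[c], cores[c])
                  for c in range(C)])        # (C, out_b)
z = fused.reshape(-1)                        # fusion vector of length C*out_b
print(z.shape)
```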
【Keywords】:
【Paper Link】 【Pages】:8110-8117
【Authors】: Yi Bin ; Yang Yang ; Chaofan Tao ; Zi Huang ; Jingjing Li ; Heng Tao Shen
【Abstract】: Inferring the interactions between objects, a.k.a. visual relationship detection, is a crucial point for vision understanding, which captures more definite concepts than object detection. Most previous works treat the interaction between a pair of objects as one-way and thus fail to exploit the mutual relation between objects, which is essential to modern visual applications. In this work, we propose a mutual relation net, dubbed MR-Net, to explore the mutual relation between paired objects for visual relationship detection. Specifically, we construct a mutual relation space to model the mutual interaction of paired objects, and employ a linear constraint to optimize the mutual interaction, which is called mutual relation learning. Our mutual relation learning does not introduce any parameters, and can be adapted to improve the performance of other methods. In addition, we devise a semantic ranking loss to discriminatively penalize predicates with semantic similarity, which is ignored by traditional loss functions (e.g., cross entropy with softmax). Then, our MR-Net optimizes the mutual relation learning together with the semantic ranking loss using a siamese network. The experimental results on two commonly used datasets (VG and VRD) demonstrate the superior performance of the proposed approach.
【Keywords】:
【Paper Link】 【Pages】:8118-8125
【Authors】: Yijun Cai ; Haoxin Li ; Jianfang Hu ; Wei-Shi Zheng
【Abstract】: Predicting the action class from partially observed videos, which is known as action prediction, is an important task in the computer vision field with many applications. The challenge for action prediction mainly lies in the lack of discriminative action information in partially observed videos. To tackle this challenge, in this work, we propose to transfer action knowledge learned from fully observed videos to improve the prediction of partially observed videos. Specifically, we develop a two-stage learning framework for action knowledge transfer. At the first stage, we learn feature embeddings and a discriminative action classifier from full videos. The knowledge in the learned embeddings and classifier is then transferred to the partial videos at the second stage. Our experiments on the UCF-101 and HMDB-51 datasets show that the proposed action knowledge transfer method can significantly improve the performance of action prediction, especially for actions with small observation ratios (e.g., 10%). We also experimentally illustrate that our method outperforms all the state-of-the-art action prediction systems.
【Keywords】:
【Paper Link】 【Pages】:8126-8133
【Authors】: Hanqing Chao ; Yiwei He ; Junping Zhang ; Jianfeng Feng
【Abstract】: As a unique biometric feature that can be recognized at a distance, gait has broad applications in crime prevention, forensic identification and social security. To portray a gait, existing gait recognition methods utilize either a gait template, where temporal information is hard to preserve, or a gait sequence, which must keep unnecessary sequential constraints and thus loses the flexibility of gait recognition. In this paper we present a novel perspective, where a gait is regarded as a set consisting of independent frames. We propose a new network named GaitSet to learn identity information from the set. Based on the set perspective, our method is immune to permutation of frames, and can naturally integrate frames from different videos which have been filmed under different scenarios, such as diverse viewing angles, different clothes/carrying conditions. Experiments show that under normal walking conditions, our single-model method achieves an average rank-1 accuracy of 95.0% on the CASIA-B gait dataset and an 87.1% accuracy on the OU-MVLP gait dataset. These results represent new state-of-the-art recognition accuracy. On various complex scenarios, our model exhibits a significant level of robustness. It achieves accuracies of 87.2% and 70.4% on CASIA-B under bag-carrying and coat-wearing walking conditions, respectively. These outperform the existing best methods by a large margin. The method presented can also achieve a satisfactory accuracy with a small number of frames in a test sample, e.g., 82.5% on CASIA-B with only 7 frames. The source code has been released at https://github.com/AbnerHqC/GaitSet.
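A tiny sketch of the set perspective: with a permutation-invariant pooling over the frame axis, shuffling the frames (or mixing frames from different clips) leaves the clip descriptor unchanged. The frame encoder below is a random linear map standing in for a real backbone.

```python
# Set view of gait: encode each silhouette frame independently, then
# aggregate with an order-free max over frames, so the representation is
# immune to frame permutation by construction.
import numpy as np

rng = np.random.default_rng(0)
T, H, W, D = 12, 64, 44, 128                  # frames, silhouette size, feat dim
frames = rng.random(size=(T, H, W))
W_cnn = rng.normal(size=(H * W, D)) * 0.01    # stand-in frame encoder

feats = frames.reshape(T, -1) @ W_cnn         # (T, D) frame-level features
set_feat = feats.max(axis=0)                  # set pooling: order-free

perm = rng.permutation(T)                     # shuffle the frames
set_feat_shuffled = (frames[perm].reshape(T, -1) @ W_cnn).max(axis=0)
print("permutation invariant:", np.allclose(set_feat, set_feat_shuffled))
```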
【Keywords】:
【Paper Link】 【Pages】:8134-8141
【Authors】: Binghui Chen ; Weihong Deng
【Abstract】: Deep metric learning has been widely applied in many computer vision tasks, and recently, it is more attractive in zero-shot image retrieval and clustering (ZSRC), where a good embedding is required such that the unseen classes can be distinguished well. Most existing works deem this ’good’ embedding just to be the discriminative one and thus race to devise powerful metric objectives or hard-sample mining strategies for learning discriminative embeddings. However, in this paper, we first emphasize that the generalization ability is a core ingredient of this ’good’ embedding as well, and it largely affects the metric performance in zero-shot settings as a matter of fact. Then, we propose the Energy Confused Adversarial Metric Learning (ECAML) framework to explicitly optimize a robust metric. It is mainly achieved by introducing an interesting Energy Confusion regularization term, which daringly breaks away from the traditional metric learning idea of discriminative objective devising, and seeks to ’confuse’ the learned model so as to encourage its generalization ability by reducing overfitting on the seen classes. We train this confusion term together with the conventional metric objective in an adversarial manner. Although it seems weird to ’confuse’ the network, we show that our ECAML indeed serves as an efficient regularization technique for metric learning and is applicable to various conventional metric methods. This paper empirically and experimentally demonstrates the importance of learning embeddings with good generalization, achieving state-of-the-art performance on the popular CUB, CARS, Stanford Online Products and In-Shop datasets for ZSRC tasks. Code available at http://www.bhchen.cn/.
【Keywords】:
【Paper Link】 【Pages】:8142-8150
【Authors】: Chen Chen ; Shuai Mu ; Wanpeng Xiao ; Zexiong Ye ; Liesi Wu ; Qi Ju
【Abstract】: In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we are motivated to design some “discriminator” networks to automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so that it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is just a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
【Keywords】:
【Paper Link】 【Pages】:8151-8158
【Authors】: Cheng-Kuan Chen ; Zhu Feng Pan ; Ming-Yu Liu ; Min Sun
【Abstract】: Most of the existing works on image description focus on generating expressive descriptions. The few works that are dedicated to generating stylish (e.g., romantic, lyric, etc.) descriptions suffer from limited style variation and content digression. To address these limitations, we propose a controllable stylish image description generation model. It can learn to generate stylish image descriptions that are more related to the image content, and can be trained with an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions. Moreover, it enables users to generate various stylish descriptions by plugging in style-specific parameters to include new styles into the existing model. We achieve this capability via a novel layer normalization layer design, which we refer to as the Domain Layer Norm (DLN). Extensive experimental validation and a user study on various stylish image description generation tasks are conducted to show the competitive advantages of the proposed model.
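A hedged sketch of the style-conditioned normalization idea: the normalization itself is shared, and each style contributes only its own gain and bias, so a new style can be plugged in by adding one parameter pair. Parameter values and style names below are invented for illustration.

```python
# Style-conditioned layer norm sketch: normalize the hidden state once,
# then apply a per-style (gamma, beta) pair; switching styles only swaps
# these two vectors, leaving the rest of the decoder untouched.
import numpy as np

def domain_layer_norm(h, gamma, beta, eps=1e-5):
    mu, var = h.mean(), h.var()
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
h = rng.normal(size=256)                      # a decoder hidden state

styles = {                                    # hypothetical style banks
    "factual":  (np.ones(256), np.zeros(256)),
    "romantic": (rng.normal(1.0, 0.1, 256), rng.normal(0.0, 0.1, 256)),
}

for name, (gamma, beta) in styles.items():
    out = domain_layer_norm(h, gamma, beta)
    print(name, "->", np.round(out[:3], 3))
```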
【Keywords】:
【Paper Link】 【Pages】:8159-8166
【Authors】: Ding-Jie Chen ; Jui-Ting Chien ; Hwann-Tzong Chen ; Tyng-Luh Liu
【Abstract】: This paper presents a “learning to learn” approach to figure-ground image segmentation. By exploring webly-abundant images of specific visual effects, our method can effectively learn the visual-effect internal representations in an unsupervised manner and use this knowledge to differentiate the figure from the ground in an image. Specifically, we formulate the meta-learning process as a compositional image editing task that learns to imitate a certain visual effect and derive the corresponding internal representation. Such a generative process can help instantiate the underlying figure-ground notion and enables the system to accomplish the intended image segmentation. Whereas existing generative methods are mostly tailored to image synthesis or style transfer, our approach offers a flexible learning mechanism to model a general concept of figure-ground segmentation from unorganized images that have no explicit pixel-level annotations. We validate our approach via extensive experiments on six datasets to demonstrate that the proposed model can be end-to-end trained without ground-truth pixel labeling yet outperforms the existing methods of unsupervised segmentation tasks.
【Keywords】:
【Paper Link】 【Pages】:8167-8174
【Authors】: Jingwen Chen ; Yingwei Pan ; Yehao Li ; Ting Yao ; Hongyang Chao ; Tao Mei
【Abstract】: It is well believed that video captioning is a fundamental but challenging task in both the computer vision and artificial intelligence fields. The prevalent approach is to map an input video to a variable-length output sentence in a sequence-to-sequence manner via a Recurrent Neural Network (RNN). Nevertheless, the training of RNNs still suffers to some degree from the vanishing/exploding gradient problem, making the optimization difficult. Moreover, the inherently recurrent dependency in RNNs prevents parallelization within a sequence during training and therefore limits computational efficiency. In this paper, we present a novel design, Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed TDConvED), that fully employs convolutions in both the encoder and decoder networks for video captioning. Technically, we exploit convolutional block structures that compute intermediate states of a fixed number of inputs and stack several blocks to capture long-term relationships. The structure in the encoder is further equipped with temporal deformable convolution to enable free-form deformation of temporal sampling. Our model also capitalizes on a temporal attention mechanism for sentence generation. Extensive experiments are conducted on both the MSVD and MSR-VTT video captioning datasets, and superior results are reported when comparing to conventional RNN-based encoder-decoder techniques. More remarkably, TDConvED increases CIDEr-D performance from 58.8% to 67.2% on MSVD.
【Keywords】:
【Paper Link】 【Pages】:8175-8182
【Authors】: Jingyuan Chen ; Lin Ma ; Xinpeng Chen ; Zequn Jie ; Jiebo Luo
【Abstract】: In this paper, we consider the task of natural language video localization (NLVL): given an untrimmed video and a natural language description, the goal is to localize a segment in the video which semantically corresponds to the given natural language description. We propose a localizing network (LNet), working in an end-to-end fashion, to tackle the NLVL task. We first match the natural sentence and video sequence by cross-gated attended recurrent networks to exploit their fine-grained interactions and generate a sentence-aware video representation. A self-interactor is proposed to perform cross-frame matching, which dynamically encodes and aggregates the matching evidence. Finally, a boundary model is proposed to locate the positions of video segments corresponding to the natural sentence description by predicting the starting and ending points of the segment. Extensive experiments conducted on the public TACoS and DiDeMo datasets demonstrate that our proposed model performs effectively and efficiently against the state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:8183-8190
【Authors】: Junjie Chen ; William K. Cheung
【Abstract】: Quantization has been widely adopted for large-scale multimedia retrieval due to its effectiveness in coding high-dimensional data. Deep quantization models have been demonstrated to achieve state-of-the-art retrieval accuracy. However, training the deep models given a large-scale database is highly time-consuming, as a large number of parameters are involved. Existing deep quantization methods often sample only a subset from the database for training, which may end up with unsatisfactory retrieval performance as a large portion of label information is discarded. To alleviate this problem, we propose a novel model called Similarity Preserving Deep Asymmetric Quantization (SPDAQ), which can directly learn the compact binary codes and quantization codebooks for all the items in the database efficiently. To do that, SPDAQ makes use of an image subset as well as the label information of all the database items, so the image subset items and the database items are mapped to two different but correlated distributions, where the label similarity can be well preserved. An efficient optimization algorithm is proposed for the learning. Extensive experiments conducted on four widely-used benchmark datasets demonstrate the superiority of our proposed SPDAQ model.
【Keywords】:
【Paper Link】 【Pages】:8191-8198
【Authors】: Shaoxiang Chen ; Yu-Gang Jiang
【Abstract】: Sequence-to-sequence models incorporated with attention mechanisms have shown promising improvements on video captioning. While there is rich information both inside and between frames, spatial attention is rarely explored and motion information is usually handled by 3D-CNNs as just another modality for fusion. On the other hand, research on human perception suggests that apparent motion can attract attention. Motivated by this, we aim to learn spatial attention on video frames under the guidance of motion information for caption generation. We present a novel video captioning framework by utilizing Motion Guided Spatial Attention (MGSA). The proposed MGSA exploits the motion between video frames by learning spatial attention from stacked optical flow images with a custom CNN. To further relate the spatial attention maps of video frames, we design a Gated Attention Recurrent Unit (GARU) to adaptively incorporate previous attention maps. The whole framework can be trained in an end-to-end manner. We evaluate our approach on two benchmark datasets, MSVD and MSR-VTT. The experiments show that our designed model can generate better video representations, and state-of-the-art results are obtained under popular evaluation metrics such as BLEU@4, CIDEr, and METEOR.
【Keywords】:
【Paper Link】 【Pages】:8199-8206
【Authors】: Shaoxiang Chen ; Yu-Gang Jiang
【Abstract】: This paper presents an efficient algorithm to tackle temporal localization of activities in videos via sentence queries. The task differs from traditional action localization in three aspects: (1) Activities are combinations of various kinds of actions and may span a long period of time. (2) Sentence queries are not limited to a predefined list of classes. (3) The videos usually contain multiple different activity instances. Traditional proposal-based approaches for action localization that only consider the class-agnostic “actionness” of video snippets are insufficient to tackle this task. We propose a novel Semantic Activity Proposal (SAP) which integrates the semantic information of sentence queries into the proposal generation process to get discriminative activity proposals. Visual and semantic information are jointly utilized for proposal ranking and refinement. We evaluate our algorithm on the TACoS dataset and the Charades-STA dataset. Experimental results show that our algorithm outperforms existing methods on both datasets, and at the same time reduces the number of proposals by a factor of at least 10.
【Keywords】:
【Paper Link】 【Pages】:8207-8214
【Authors】: Shizhe Chen ; Qin Jin ; Alexander G. Hauptmann
【Abstract】: Bilingual lexicon induction, translating words from the source language to the target language, is a long-standing natural language processing task. Recent endeavors prove that it is promising to employ images as pivot to learn the lexicon induction without reliance on parallel corpora. However, these vision-based approaches simply associate words with entire images, which are constrained to translate concrete words and require object-centered images. We humans can understand words better when they are within a sentence with context. Therefore, in this paper, we propose to utilize images and their associated captions to address the limitations of previous approaches. We propose a multi-lingual caption model trained with different mono-lingual multimodal data to map words in different languages into joint spaces. Two types of word representation are induced from the multi-lingual caption model: linguistic features and localized visual features. The linguistic feature is learned from the sentence contexts with visual semantic constraints, which is beneficial to learn translation for words that are less visual-relevant. The localized visual feature is attended to the region in the image that correlates to the word, so that it alleviates the image restriction for salient visual representation. The two types of features are complementary for word translation. Experimental results on multiple language pairs demonstrate the effectiveness of our proposed method, which substantially outperforms previous vision-based approaches without using any parallel sentences or supervision of seed word pairs.
【Keywords】:
【Paper Link】 【Pages】:8215-8222
【Authors】: Yun-Chun Chen ; Yu-Jhe Li ; Xiaofei Du ; Yu-Chiang Frank Wang
【Abstract】: Person re-identification (re-ID) solves the task of matching images across cameras and is among the active research topics in the vision community. Since query images in real-world scenarios might suffer from resolution loss, how to solve the resolution mismatch problem during person re-ID becomes a practical challenge. Instead of applying separate image super-resolution models, we propose a novel network architecture of Resolution Adaptation and re-Identification Network (RAIN) to solve cross-resolution person re-ID. Advancing the strategy of adversarial learning, we aim at extracting resolution-invariant representations for re-ID, while the proposed model is learned in an end-to-end training fashion. Our experiments confirm that the use of our model can recognize low-resolution query images, even if the resolution is not seen during training. Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.
【Keywords】:
【Paper Link】 【Pages】:8223-8230
【Authors】: Saheb Chhabra ; Puspita Majumdar ; Mayank Vatsa ; Richa Singh
【Abstract】: In real-world applications, commercial off-the-shelf systems are utilized for performing automated facial analysis, including face recognition, emotion recognition, and attribute prediction. However, a majority of these commercial systems act as black boxes due to the inaccessibility of the model parameters, which makes it challenging to fine-tune the models for specific applications. Stimulated by the advances in adversarial perturbations, this research proposes the concept of Data Fine-tuning to improve the classification accuracy of a given model without changing the parameters of the model. This is accomplished by modeling it as a data (image) perturbation problem. A small amount of “noise” is added to the input with the objective of minimizing the classification loss without affecting the (visual) appearance. Experiments performed on three publicly available datasets, LFW, CelebA, and MUCT, demonstrate the effectiveness of the proposed concept.
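To make the idea concrete, here is a toy version of data fine-tuning on a frozen linear classifier: the model weights never change, and instead small, bounded perturbations of the inputs are learned to reduce the classification loss. Gradient access is assumed purely for illustration, and the whole setup is hypothetical rather than the paper's pipeline.

```python
# Data fine-tuning sketch: keep a (fixed, imperfect) classifier frozen and
# learn a small L-infinity-bounded perturbation of each input that lowers
# the classification loss, i.e. the opposite use of adversarial noise.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 100
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)

w_fixed = w_true + rng.normal(0, 0.8, size=d)   # frozen "black-box" weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eps, lr = 0.05, 0.5
delta = np.zeros((n, d))                        # learned input "noise"
for step in range(300):
    p = sigmoid((X + delta) @ w_fixed)
    grad = (p - y)[:, None] * w_fixed[None, :]  # dLoss/dInput for BCE
    delta = np.clip(delta - lr * grad, -eps, eps)

acc0 = ((sigmoid(X @ w_fixed) > 0.5) == y).mean()
acc1 = ((sigmoid((X + delta) @ w_fixed) > 0.5) == y).mean()
print("accuracy before/after data fine-tuning: %.3f / %.3f" % (acc0, acc1))
```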
【Keywords】:
【Paper Link】 【Pages】:8231-8238
【Authors】: Cheng Chi ; Shifeng Zhang ; Junliang Xing ; Zhen Lei ; Stan Z. Li ; Xudong Zou
【Abstract】: High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. In particular, the SRN consists of two modules: the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module. The STC aims to filter out most simple negative anchors from low level detection layers to reduce the search space for the subsequent classifier, while the STR is designed to coarsely adjust the locations and sizes of anchors from high level detection layers to provide better initialization for the subsequent regressor. Moreover, we design a Receptive Field Enhancement (RFE) block to provide more diverse receptive fields, which helps to better capture faces in some extreme poses. As a consequence, the proposed SRN detector achieves state-of-the-art performance on all the widely used face detection benchmarks, including AFW, PASCAL face, FDDB, and WIDER FACE datasets. Codes will be released to facilitate further studies on the face detection problem.
【Keywords】:
【Paper Link】 【Pages】:8239-8246
【Authors】: Zhongying Deng ; Xiaojiang Peng ; Yu Qiao
【Abstract】: Heterogeneous Face Recognition (HFR) is a challenging task due to large modality discrepancy as well as insufficient training images in certain modalities. In this paper, we propose a new two-branch network architecture, termed as Residual Compensation Networks (RCN), to learn separated features for different modalities in HFR. The RCN incorporates a residual compensation (RC) module and a modality discrepancy loss (MD loss) into traditional convolutional neural networks. The RC module reduces modal discrepancy by adding compensation to one of the modalities so that its representation can be close to the other modality. The MD loss alleviates modal discrepancy by minimizing the cosine distance between different modalities. In addition, we explore different architectures and positions for the RC module, and evaluate different transfer learning strategies for HFR. Extensive experiments on IIIT-D Viewed Sketch, Forensic Sketch, CASIA NIR-VIS 2.0 and CUHK NIR-VIS show that our RCN outperforms other state-of-the-art methods significantly.
【Keywords】:
【Paper Link】 【Pages】:8247-8254
【Authors】: Wenkai Dong ; Zhaoxiang Zhang ; Tieniu Tan
【Abstract】: Deep learning based methods have achieved remarkable progress in action recognition. Existing works mainly focus on designing novel deep architectures to achieve video representation learning for action recognition. Most methods treat sampled frames equally and average all the frame-level predictions at the testing stage. However, within a video, discriminative actions may occur sparsely in a few frames, and most other frames are irrelevant to the ground truth and may even lead to a wrong prediction. As a result, we believe that selecting relevant frames is a further important key to enhancing existing deep learning based action recognition. In this paper, we propose an attention-aware sampling method for action recognition, which aims to discard the irrelevant and misleading frames and preserve the most discriminative frames. We formulate the process of mining key frames from videos as a Markov decision process and train the attention agent through deep reinforcement learning without extra labels. The agent takes features and predictions from the baseline model as input and generates importance scores for all frames. Moreover, our approach is extensible, in that it can be applied to different existing deep learning based action recognition models. We achieve very competitive action recognition performance on two widely used action recognition datasets.
【Keywords】:
【Paper Link】 【Pages】:8255-8262
【Authors】: Xuan Dong ; Weixin Li ; Xiaojie Wang ; Yunhong Wang
【Abstract】: In the monochrome-color dual-lens system, the gray image captured by the monochrome camera has better quality than the color image from the color camera, but does not have color information. To get high-quality color images, it is desired to colorize the gray image with the color image as reference. Related works usually use hand-crafted methods to search for the best-matching pixel in the reference image for each pixel in the input gray image, and copy the color of the best-matching pixel as the result. We propose a novel deep convolution network to solve the colorization problem in an end-to-end way. Based on our observation that, for each pixel in the input image, there usually exist multiple pixels in the reference image that have the correct colors, our method performs a weighted average of the colors of the candidate pixels in the reference image to utilize more candidate pixels with correct colors. The weight values between pixels in the input image and the reference image are obtained by learning a weight volume using deep feature representations, where an attention operation is proposed to focus on more useful candidate pixels and a 3-D regularization is performed to learn with context information. In addition, to correct wrongly colorized pixels in occlusion regions, we propose a color residue joint learning module to correct the colorization result with the input gray image as guidance. We evaluate our method on the Scene Flow, Cityscapes, Middlebury, and Sintel datasets. Experimental results show that our method largely outperforms the state-of-the-art methods.
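A compressed sketch of the core "weighted average of candidate colors" step, with the learned weight volume reduced to a dot-product attention over deep features; features, colors, and sizes are random stand-ins for the real network's outputs.

```python
# Reference-based colorization as attention: each gray input pixel attends
# over reference pixels by feature similarity, and its predicted color is
# the softmax-weighted average of the reference colors.
import numpy as np

rng = np.random.default_rng(0)
N, M, D = 5, 8, 16                 # input pixels, reference pixels, feat dim
f_in = rng.normal(size=(N, D))     # deep features of input gray pixels
f_ref = rng.normal(size=(M, D))    # deep features of reference pixels
colors_ref = rng.random(size=(M, 2))   # reference chrominance (e.g. ab)

logits = f_in @ f_ref.T / np.sqrt(D)   # (N, M) matching scores
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)      # attention weights per input pixel

colors_out = w @ colors_ref            # weighted average of candidate colors
print(colors_out.shape)                # (N, 2) predicted chrominance
```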
【Keywords】:
【Paper Link】 【Pages】:8263-8270
【Authors】: Hehe Fan ; Linchao Zhu ; Yi Yang
【Abstract】: Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities. The core of this problem involves moving object capture and future motion prediction. While object capture specifies which objects are moving in videos, motion prediction describes their future dynamics. Motivated by this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM consists of three branches, i.e., a spatial branch for capturing moving objects, a temporal branch for processing motions, and an output branch for combining the first two branches to generate predicted frames. Stacking multiple CubicLSTM units along the spatial branch and output branch, and then evolving along the temporal branch can form a cubic recurrent neural network (CubicRNN). Experiment shows that CubicRNN produces more accurate video predictions than prior methods on both synthetic and real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:8271-8278
【Authors】: Kuncheng Fang ; Lian Zhou ; Cheng Jin ; Yuejie Zhang ; Kangnian Weng ; Tao Zhang ; Weiguo Fan
【Abstract】: Automatically generating natural language descriptions for videos is an extremely complicated and challenging task. To tackle the obstacles of the traditional LSTM-based model for video captioning, we propose a novel architecture to generate the optimal descriptions for videos, which focuses on constructing a new network structure that can generate sentences superior to the basic model with LSTM, and establishing special attention mechanisms that can provide more useful visual information for caption generation. This scheme discards the traditional LSTM, and exploits a fully convolutional network with coarse-to-fine and inherited attention designed according to the characteristics of the fully convolutional structure. Our model not only outperforms the basic LSTM-based model, but also achieves performance comparable to that of state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:8279-8286
【Authors】: Yutong Feng ; Yifan Feng ; Haoxuan You ; Xibin Zhao ; Yue Gao
【Abstract】: Mesh is an important and powerful type of data for 3D shapes and is widely studied in the fields of computer vision and computer graphics. Regarding the task of 3D shape representation, there have been extensive research efforts concentrating on how to represent 3D shapes well using volumetric grids, multi-view images and point clouds. However, there has been little effort on using mesh data in recent years, due to the complexity and irregularity of mesh data. In this paper, we propose a mesh neural network, named MeshNet, to learn 3D shape representations from mesh data. In this method, face-unit and feature splitting are introduced, and a general architecture with available and effective blocks is proposed. In this way, MeshNet is able to solve the complexity and irregularity problem of mesh and conduct 3D shape representation well. We have applied the proposed MeshNet method in the applications of 3D shape classification and retrieval. Experimental results and comparisons with the state-of-the-art methods demonstrate that the proposed MeshNet can achieve satisfactory 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method on 3D shape representation.
【Keywords】:
【Paper Link】 【Pages】:8287-8294
【Authors】: Yang Fu ; Xiaoyang Wang ; Yunchao Wei ; Thomas Huang
【Abstract】: In this work, we propose a novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. Different from most existing methods, which simply compute representations of video clips using frame-level aggregation (e.g., average pooling), the proposed STA adopts a more effective way of producing robust clip-level feature representations. Concretely, our STA fully exploits the discriminative parts of one target person in both spatial and temporal dimensions, which results in a 2-D attention score matrix via inter-frame regularization to measure the importance of spatial parts across different frames. Thus, a more robust clip-level feature representation can be generated according to a weighted sum operation guided by the mined 2-D attention score matrix. In this way, the challenging cases for video-based person re-identification, such as pose variation and partial occlusion, can be well tackled by the STA. We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, which significantly outperforms the state-of-the-art by a large margin of more than 11.6%.
【Keywords】:
【Paper Link】 【Pages】:8295-8302
【Authors】: Yang Fu ; Yunchao Wei ; Yuqian Zhou ; Honghui Shi ; Gao Huang ; Xinchao Wang ; Zhiqiang Yao ; Thomas Huang
【Abstract】: Despite the remarkable progress in person re-identification (Re-ID), such approaches still suffer from failure cases where the discriminative body parts are missing. To mitigate this type of failure, we propose a simple yet effective Horizontal Pyramid Matching (HPM) approach to fully exploit various partial information of a given person, so that correct person candidates can be identified even if some key parts are missing. With HPM, we make the following contributions to produce more robust feature representations for the Re-ID task: 1) we learn to classify using partial feature representations at different horizontal pyramid scales, which successfully enhances the discriminative capabilities of various person parts; 2) we exploit average and max pooling strategies to account for person-specific discriminative information in a global-local manner. To validate the effectiveness of our proposed HPM method, extensive experiments are conducted on three popular datasets, including Market-1501, DukeMTMC-ReID and CUHK03. Respectively, we achieve mAP scores of 83.1%, 74.5% and 59.7% on these challenging benchmarks, which set the new state of the art.
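The horizontal pyramid pooling step is easy to sketch: slice the backbone's feature map into 1, 2, 4, and 8 horizontal strips and describe each strip by the sum of its average- and max-pooled vectors, giving 15 part-level descriptors. The input below is a random stand-in for a real backbone output.

```python
# Horizontal pyramid pooling sketch: multi-scale horizontal strips, each
# summarized by average pooling (global context) plus max pooling
# (salient local evidence), yielding 1 + 2 + 4 + 8 = 15 descriptors.
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 256, 24, 8
fmap = rng.random(size=(C, H, W))

descriptors = []
for n_strips in (1, 2, 4, 8):
    for s in range(n_strips):
        strip = fmap[:, s * H // n_strips:(s + 1) * H // n_strips, :]
        descriptors.append(strip.mean(axis=(1, 2)) + strip.max(axis=(1, 2)))

feats = np.stack(descriptors)      # (15, C): one descriptor per strip
print(feats.shape)
```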
【Keywords】:
【Paper Link】 【Pages】:8303-8311
【Authors】: Junyu Gao ; Tianzhu Zhang ; Changsheng Xu
【Abstract】: Recently, with the ever-growing action categories, zero-shot action recognition (ZSAR) has been achieved by automatically mining the underlying concepts (e.g., actions, attributes) in videos. However, most existing methods only exploit the visual cues of these concepts but ignore external knowledge information for modeling explicit relationships between them. In fact, humans have remarkable ability to transfer knowledge learned from familiar classes to recognize unfamiliar classes. To narrow the knowledge gap between existing methods and humans, we propose an end-to-end ZSAR framework based on a structured knowledge graph, which can jointly model the relationships between action-attribute, action-action, and attribute-attribute. To effectively leverage the knowledge graph, we design a novel Two-Stream Graph Convolutional Network (TS-GCN) consisting of a classifier branch and an instance branch. Specifically, the classifier branch takes the semantic-embedding vectors of all the concepts as input, then generates the classifiers for action categories. The instance branch maps the attribute embeddings and scores of each video instance into an attribute-feature space. Finally, the generated classifiers are evaluated on the attribute features of each video, and a classification loss is adopted for optimizing the whole network. In addition, a self-attention module is utilized to model the temporal information of videos. Extensive experimental results on three realistic action benchmarks Olympic Sports, HMDB51 and UCF101 demonstrate the favorable performance of our proposed framework.
【Keywords】:
【Paper Link】 【Pages】:8312-8319
【Authors】: Lianli Gao ; Daiyuan Chen ; Jingkuan Song ; Xing Xu ; Dongxiang Zhang ; Heng Tao Shen
【Abstract】: Generating photo-realistic images conditioned on semantic text descriptions is a challenging task in the computer vision field. Due to the nature of hierarchical representations learned in CNNs, it is intuitive to utilize richer convolutional features to improve text-to-image synthesis. In this paper, we propose the Perceptual Pyramid Adversarial Network (PPAN) to directly synthesize multi-scale images conditioned on texts in an adversarial way. Specifically, we design one pyramid generator and three independent discriminators to synthesize and regularize multi-scale photo-realistic images in one feed-forward process. At each pyramid level, our method takes coarse-resolution features as input, synthesizes high-resolution images, and uses convolutions for up-sampling to the finer level. Furthermore, the generator adopts the perceptual loss to enforce semantic similarity between the synthesized image and the ground truth, while a multi-purpose discriminator encourages semantic consistency, image fidelity and class invariance. Experimental results show that our PPAN sets new records for text-to-image synthesis on two benchmark datasets: CUB (i.e., 4.38 Inception Score and .290 Visual-semantic Similarity) and Oxford-102 (i.e., 3.52 Inception Score and .297 Visual-semantic Similarity).
【Keywords】:
【Paper Link】 【Pages】:8320-8327
【Authors】: Lianli Gao ; Kaixuan Fan ; Jingkuan Song ; Xianglong Liu ; Xing Xu ; Heng Tao Shen
【Abstract】: In daily life, deliberation is a common behavior for humans to improve or refine their work (e.g., writing, reading and drawing). To date, the encoder-decoder framework with attention mechanisms has achieved great progress for image captioning. However, such a framework is essentially a one-pass forward process while encoding to hidden states and attending to visual features, and it lacks the deliberation action. The learned hidden states and visual attention are directly used to predict the final captions without further polishing. In this paper, we present a novel Deliberate Residual Attention Network, namely DA, for image captioning. The first-pass residual-based attention layer prepares the hidden states and visual attention for generating a preliminary version of the captions, while the second-pass deliberate residual-based attention layer refines them. Since the second pass is based on the rough global features captured by the hidden layer and visual attention in the first pass, our DA has the potential to generate better sentences. We further equip our DA with a discriminative loss and reinforcement learning to disambiguate image/caption pairs and reduce exposure bias. Our model improves the state of the art on the MSCOCO dataset, reaching 37.5% BLEU-4, 28.5% METEOR and 125.6% CIDEr. It also improves over the state of the art on the Flickr30K dataset from 25.1% BLEU-4, 20.4% METEOR and 53.1% CIDEr to 29.4% BLEU-4, 23.0% METEOR and 66.6% CIDEr.
【Keywords】:
【Paper Link】 【Pages】:8328-8335
【Authors】: Zhanning Gao ; Le Wang ; Qilin Zhang ; Zhenxing Niu ; Nanning Zheng ; Gang Hua
【Abstract】: We propose a temporal action detection by spatial segmentation framework, which simultaneously categorizes actions and temporally localizes action instances in untrimmed videos. The core idea is the conversion of the temporal detection task into a spatial semantic segmentation task. Firstly, the video imprint representation is employed to capture the spatial/temporal interdependences within/among frames and represent them as spatial proximity in a feature space. Subsequently, the obtained imprint representation is spatially segmented by a fully convolutional network. With such segmentation labels projected back to the video space, both temporal action boundary localization and per-frame spatial annotation can be obtained simultaneously. The proposed framework is robust to variable lengths of untrimmed videos, due to the underlying fixed-size imprint representations. The efficacy of the framework is validated on two public action detection datasets.
【Keywords】:
【Paper Link】 【Pages】:8336-8343
【Authors】: Jie Gu ; Gaofeng Meng ; Cheng Da ; Shiming Xiang ; Chunhong Pan
【Abstract】: Opinion-unaware no-reference image quality assessment (NR-IQA) methods have received much interest recently because they do not require images with subjective scores for training. Unfortunately, it is a challenging task, and thus far no opinion-unaware methods have shown consistently better performance than the opinion-aware ones. In this paper, we propose an effective opinion-unaware NR-IQA method based on reinforcement recursive list-wise ranking. We formulate NR-IQA as a recursive list-wise ranking problem that aims to optimize the whole quality ordering directly. During training, the recursive ranking process can be modeled as a Markov decision process (MDP). The ranking list of images can be constructed by taking a sequence of actions, each of which refers to selecting an image for a specific position of the ranking list. Reinforcement learning is adopted to train the model parameters, in which no ground-truth quality scores or ranking lists are necessary for learning. Experimental results demonstrate the superior performance of our approach compared with existing opinion-unaware NR-IQA methods. Furthermore, our approach can compete with the most effective opinion-aware methods. It improves the state-of-the-art by over 2% on the CSIQ benchmark and outperforms most compared opinion-aware models on TID2013.
【Keywords】:
【Paper Link】 【Pages】:8344-8351
【Authors】: Jiaxin Gu ; Ce Li ; Baochang Zhang ; Jungong Han ; Xianbin Cao ; Jianzhuang Liu ; David S. Doermann
【Abstract】: The advancement of deep convolutional neural networks (DCNNs) has driven significant improvement in the accuracy of recognition systems for many computer vision tasks. However, their practical applications are often restricted in resource-constrained environments. In this paper, we introduce projection convolutional neural networks (PCNNs) with a discrete back propagation via projection (DBPP) to improve the performance of binarized neural networks (BNNs). The contributions of our paper include: 1) for the first time, the projection function is exploited to efficiently solve the discrete back propagation problem, which leads to a new family of highly compressed CNNs (termed PCNNs); 2) by exploiting multiple projections, we learn a set of diverse quantized kernels that compress the full-precision kernels more efficiently than those proposed previously; 3) PCNNs achieve the best classification performance compared to other state-of-the-art BNNs on the ImageNet and CIFAR datasets.
【Keywords】:
【Paper Link】 【Pages】:8352-8359
【Authors】: Shanyan Guan ; Shuo Wen ; Dexin Yang ; Bingbing Ni ; Wendong Zhang ; Jun Tang ; Xiaokang Yang
【Abstract】: We present a practical and effective method for human action transfer. Given a source action sequence and limited target information, we aim to transfer motion from source to target. Although recent works based on GANs or VAEs have achieved impressive results for action transfer in 2D, many problems remain, such as distorted and discontinuous human body shapes and blurry cloth textures. In this paper, we address these problems from a novel 3D viewpoint. On the one hand, we design a skeleton-to-3D-mesh generator to generate the 3D model, which achieves a large improvement in appearance reconstruction; we further add a temporal connection to improve the smoothness of the model. On the other hand, instead of directly utilizing the image in RGB space, we transform the target appearance information into UV space for further pose transformation. In particular, unlike the conventional graphics rendering method, which directly projects visible pixels to UV space, our transformation is guided by each pixel's semantic information. We perform experiments on Human3.6M and HumanEva-I to evaluate the performance of the pose generator. Both qualitative and quantitative results show that our method outperforms 2D generation-based methods. Additionally, we compare our rendering method with graphics-based methods on Human3.6M and People-snapshot; the comparison results show that our rendering method is more robust and effective.
【Keywords】:
【Paper Link】 【Pages】:8360-8367
【Authors】: Yuchen Guo ; Guiguang Ding ; Jungong Han ; Xiaohan Ding ; Sicheng Zhao ; Zheng Wang ; Chenggang Yan ; Qionghai Dai
【Abstract】: Zero-shot learning (ZSL) aims to build recognition models for previously unseen target classes that have no labeled training data, by transferring knowledge from related auxiliary source classes with abundant labeled samples, using class attributes as the bridge. The key is to learn a similarity-based ranking function between samples and class labels using the labeled source classes, so that the proper (unseen) class label for a test sample can be identified by the function. To learn this function, a single-view ranking-based loss is widely used, which aims to rank the true label before the other labels for a training sample. However, we argue that ranking can also be performed from the other view, which aims to place the images belonging to a label before images from other classes. Motivated by this, we propose a novel DuAl-view RanKing (DARK) loss for zero-shot learning that simultaneously ranks labels for an image by a point-to-point metric and ranks images for a label by a point-to-set metric, which better models the relationship between images and classes. In addition, we notice that previous ZSL approaches mostly fail to exploit the hardness of training samples well, either using only very hard ones or using all samples indiscriminately. In this work, we also introduce a sample hardness assessment method for ZSL that assigns different weights to training samples based on their hardness, leading to a more accurate and robust ZSL model. Experiments on benchmarks demonstrate that DARK outperforms the state of the art for (generalized) ZSL.
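A minimal PyTorch sketch of the two ranking views described above; the function name, the margin value, and the way view 2 approximates the point-to-set ranking (comparing negative images' similarities to a class against the positive image's) are illustrative assumptions, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def dual_view_ranking_loss(img_emb, cls_emb, labels, margin=0.2):
    """img_emb: (B, D) image embeddings; cls_emb: (C, D) class-attribute
    embeddings; labels: (B,) ground-truth class indices."""
    sim = F.normalize(img_emb, dim=1) @ F.normalize(cls_emb, dim=1).t()  # (B, C)
    idx = torch.arange(len(labels))
    pos = sim[idx, labels].unsqueeze(1)                                  # (B, 1)

    # View 1 (point-to-point): the true label should outrank all other
    # labels for each image.
    label_view = F.relu(margin - pos + sim)
    label_view[idx, labels] = 0.0

    # View 2 (a crude point-to-set stand-in): images of a class should score
    # higher with that class than images from other classes do.
    col = sim[:, labels]                             # col[j, i] = sim(img_j, class y_i)
    neg = (labels.unsqueeze(1) != labels.unsqueeze(0)).float()
    img_view = F.relu(margin - pos.t() + col) * neg

    return label_view.mean() + img_view.mean()
```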
【Keywords】:
【Paper Link】 【Pages】:8368-8375
【Authors】: Yunhui Guo ; Yandong Li ; Liqiang Wang ; Tajana Rosing
【Abstract】: There is a growing interest in designing models that can deal with images from different visual domains. If there exists a universal structure in different visual domains that can be captured via a common parameterization, then we can use a single model for all domains rather than one model per domain. A model aware of the relationships between different domains can also be trained to work on new domains with fewer resources. However, identifying the reusable structure in a model is not easy. In this paper, we propose a multi-domain learning architecture based on depthwise separable convolution. The proposed approach is based on the assumption that images from different domains share cross-channel correlations but have domain-specific spatial correlations. The proposed model is compact and has minimal overhead when being applied to new domains. Additionally, we introduce a gating mechanism to promote soft sharing between different domains. We evaluate our approach on the Visual Decathlon Challenge, a benchmark for testing the ability of multi-domain models. The experiments show that our approach achieves the highest score while requiring only 50% of the parameters compared with the state-of-the-art approaches.
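A minimal sketch of this sharing scheme under the stated assumption: the pointwise (1x1, cross-channel) convolution is shared across domains while the depthwise (spatial) filters are per-domain. The module name and sizes are hypothetical, and the paper's gating mechanism is omitted:

```python
import torch
import torch.nn as nn

class MultiDomainSeparableConv(nn.Module):
    """Depthwise-separable conv where the pointwise (cross-channel) part is
    shared across domains and the depthwise (spatial) part is per-domain."""
    def __init__(self, in_ch, out_ch, num_domains, kernel_size=3):
        super().__init__()
        self.depthwise = nn.ModuleList([
            nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2,
                      groups=in_ch)                    # domain-specific spatial filters
            for _ in range(num_domains)])
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)   # shared cross-channel mixing

    def forward(self, x, domain):
        return self.pointwise(self.depthwise[domain](x))

# usage: y = layer(x, domain=2) applies domain 2's spatial filters,
# then the shared 1x1 convolution.
```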
【Keywords】:
【Paper Link】 【Pages】:8376-8384
【Authors】: Zhizhong Han ; Mingyang Shang ; Yu-Shen Liu ; Matthias Zwicker
【Abstract】: In this paper, we present a novel unsupervised representation learning approach for 3D shapes, which is an important research challenge as it avoids the manual effort required for collecting supervised data. Our method trains an RNN-based neural network architecture to solve multiple view inter-prediction tasks for each shape. Given several nearby views of a shape, we define view inter-prediction as the task of predicting the center view between the input views and reconstructing the input views in a low-level feature space. The key idea of our approach is to implement the shape representation as a shape-specific global memory that is shared between all local view inter-predictions for each shape. Intuitively, this memory enables the system to aggregate information that is useful for better solving the view inter-prediction tasks for each shape, and to leverage the memory as a view-independent shape representation. Our approach, termed VIP-GAN, obtains the best results using a combination of L2 and adversarial losses for the view inter-prediction task. We show that VIP-GAN outperforms state-of-the-art methods in unsupervised 3D feature learning on three large-scale 3D shape benchmarks.
【Keywords】:
【Paper Link】 【Pages】:8385-8392
【Authors】: Yi Hao ; Nannan Wang ; Jie Li ; Xinbo Gao
【Abstract】: Person re-identification (re-ID) has great potential to contribute to video surveillance by automatically searching for and identifying people across different cameras. Heterogeneous person re-identification between thermal (infrared) and visible images is essentially a cross-modality problem and is important for night-time surveillance applications. Current methods usually train a model by combining classification and metric learning algorithms to obtain discriminative and robust feature representations. However, the combined loss function ignores the correlation between the classification subspace and the feature embedding subspace. In this paper, we use Sphere Softmax to learn a hypersphere manifold embedding and constrain the intra-modality and cross-modality variations on this hypersphere. We propose an end-to-end dual-stream hypersphere manifold embedding network (HSME-Net) with both classification and identification constraints. Meanwhile, we design a two-stage training scheme to acquire decorrelated features; we refer to the HSME with decorrelation as D-HSME. We conduct experiments on two cross-modality person re-identification datasets. Experimental results demonstrate that our method outperforms the state-of-the-art methods on both datasets. On the RegDB dataset, rank-1 accuracy is improved from 33.47% to 50.85%, and mAP is improved from 31.83% to 47.00%.
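A hedged sketch of the Sphere Softmax idea as it is commonly formulated: logits are scaled cosine similarities between L2-normalized features and L2-normalized class weights, so features from both modalities are constrained to a shared hypersphere before the usual cross-entropy. The scale value here is an illustrative assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereSoftmax(nn.Module):
    """Produces cosine logits on a hypersphere; feed the output to
    F.cross_entropy as usual."""
    def __init__(self, feat_dim, num_classes, scale=14.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature; a tunable hyperparameter

    def forward(self, feats):
        cos = F.normalize(feats, dim=1) @ F.normalize(self.weight, dim=1).t()
        return self.scale * cos

# loss = F.cross_entropy(SphereSoftmax(2048, 395)(features), identity_labels)
```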
【Keywords】:
【Paper Link】 【Pages】:8393-8400
【Authors】: Dongliang He ; Xiang Zhao ; Jizhou Huang ; Fu Li ; Xiao Liu ; Shilei Wen
【Abstract】: The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate this task as a sequential decision-making problem by learning an agent that progressively regulates the temporal grounding boundaries based on its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, which shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.
【Keywords】:
【Paper Link】 【Pages】:8401-8408
【Authors】: Dongliang He ; Zhichao Zhou ; Chuang Gan ; Fu Li ; Xiao Liu ; Yandong Li ; Limin Wang ; Shilei Wen
【Abstract】: Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global modeling in videos. In particular, StNet stacks N successive video frames into a super-image with 3N channels and applies 2D convolution on the super-images to capture local spatial-temporal relationships. To model global spatial-temporal structure, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet, which employs separate channel-wise and temporal-wise convolutions over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and strikes a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
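The super-image construction is simple to illustrate. The sketch below (shapes and N = 5 are illustrative) shows only the local spatial-temporal modeling step, not the full StNet:

```python
import torch
import torch.nn as nn

def make_super_images(video, n=5):
    """video: (B, T, 3, H, W) with T divisible by n. Stacks each group of n
    consecutive frames along channels, giving (B, T//n, 3n, H, W)."""
    b, t, c, h, w = video.shape
    return video.reshape(b, t // n, n * c, h, w)

# A plain 2D conv then mixes the 3N channels, capturing local (intra-group)
# spatial-temporal structure without any 3D convolution.
local_st = nn.Conv2d(15, 64, kernel_size=3, padding=1)   # 15 = 3 * n for n = 5
clips = make_super_images(torch.randn(2, 20, 3, 112, 112))
feat = local_st(clips.flatten(0, 1))                     # (B * T//n, 64, H, W)
```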
【Keywords】:
【Paper Link】 【Pages】:8409-8416
【Authors】: Tong He ; Stefano Soatto
【Abstract】: We present a method to infer 3D pose and shape of vehicles from a single image. To tackle this ill-posed problem, we optimize two-scale projection consistency between the generated 3D hypotheses and their 2D pseudo-measurements. Specifically, we use a morphable wireframe model to generate a fine-scaled representation of vehicle shape and pose. To reduce its sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse representation which improves robustness. We also integrate three task priors, including unsupervised monocular depth, a ground plane constraint as well as vehicle shape priors, with forward projection errors into an overall energy function.
【Keywords】:
【Paper Link】 【Pages】:8417-8424
【Authors】: Xiang He ; Sibei Yang ; Guanbin Li ; Haofeng Li ; Huiyou Chang ; Yizhou Yu
【Abstract】: Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, its vulnerability to adversarial samples cannot be overlooked. This paper is the first to show that state-of-the-art CNN-based biomedical image segmentation models are all sensitive to adversarial perturbations, which limits the deployment of these methods in safety-critical biomedical fields. In this paper, we discover that the global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, a non-local context encoder (NLCE) is proposed to model short- and long-range spatial dependencies and encode global contexts for strengthening feature activations by channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust enhanced pyramid feature representations with NLCE modules and then integrates the information across different levels. Experiments on both lung and skin lesion segmentation datasets have demonstrated that NLCEN outperforms other state-of-the-art biomedical image segmentation methods against adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods.
【Keywords】:
【Paper Link】 【Pages】:8425-8432
【Authors】: Saihui Hou ; Zilei Wang
【Abstract】: In this work, we propose a novel method named Weighted Channel Dropout (WCD) for the regularization of deep convolutional neural networks (CNNs). Different from Dropout, which randomly sets neurons to zero in the fully-connected layers, WCD operates on the channels in the stack of convolutional layers. Specifically, WCD consists of two steps, i.e., Rating Channels and Selecting Channels, and three modules, i.e., Global Average Pooling, Weighted Random Selection and Random Number Generator. It filters the channels according to their activation status and can be plugged into any two consecutive layers, which unifies the original Dropout and channel-wise Dropout. WCD is entirely parameter-free and is deployed only in the training phase with very slight computational cost; the network in the test phase remains unchanged, so no inference cost is added at all. Besides, when combined with existing networks, it requires no re-pretraining on ImageNet and is thus well suited for application on small datasets. Finally, WCD with VGGNet-16, ResNet-101 and Inception-V3 is experimentally evaluated on multiple datasets. The extensive results demonstrate that WCD brings consistent improvements over the baselines.
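A training-only sketch of the two steps, assuming global average pooling for Rating Channels and an Efraimidis-Spirakis-style weighted random selection for Selecting Channels; the keep ratio and dropout-style rescaling are illustrative choices, not the paper's exact specification:

```python
import torch
import torch.nn as nn

class WeightedChannelDropout(nn.Module):
    """Channels are rated by global average pooling, and a weighted random
    selection keeps roughly `keep_ratio` of them per sample, favoring
    strongly activated channels. Inference leaves the input untouched."""
    def __init__(self, keep_ratio=0.8):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x):                       # x: (B, C, H, W)
        if not self.training:
            return x                            # test-time network is unchanged
        b, c, _, _ = x.shape
        score = x.mean(dim=(2, 3)).clamp(min=1e-6)       # Rating Channels (GAP)
        key = torch.rand_like(score) ** (1.0 / score)    # Weighted Random Selection
        k = max(1, int(self.keep_ratio * c))
        thresh = key.topk(k, dim=1).values[:, -1:]       # per-sample k-th largest key
        mask = (key >= thresh).float()
        return x * mask[:, :, None, None] / self.keep_ratio  # rescale like dropout
```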
【Keywords】:
【Paper Link】 【Pages】:8433-8440
【Authors】: Yuenan Hou ; Zheng Ma ; Chunxiao Liu ; Chen Change Loy
【Abstract】: The training of many existing end-to-end steering angle prediction models relies heavily on steering angles as the only supervisory signal. Without learning from much richer contexts, these methods are susceptible to sharp road curves, challenging traffic conditions, strong shadows, and severe lighting changes. In this paper, we considerably improve the accuracy and robustness of predictions through feature mimicking from heterogeneous auxiliary networks, a new and effective training method that provides much richer contextual signals beyond the steering direction. Specifically, we train our steering angle prediction model by distilling multi-layer knowledge from multiple heterogeneous auxiliary networks that perform related but different tasks, e.g., image segmentation or optical flow estimation. As opposed to multi-task learning, our method does not require expensive annotations of related tasks on the target set. This is made possible by applying contemporary off-the-shelf networks on the target set and mimicking their features in different layers after transformation. The auxiliary networks are discarded after training without affecting the runtime efficiency of our model. Our approach achieves a new state of the art on Udacity and Comma.ai, outperforming the previous best by large margins of 12.8% and 52.1%, respectively. Encouraging results are also shown on the Berkeley Deep Drive (BDD) dataset.
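A minimal sketch of the mimicking objective, assuming a 1x1-conv transformation onto the auxiliary network's feature space and an L2 mimic penalty; the class name and weighting scheme are hypothetical, not the paper's exact losses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMimicLoss(nn.Module):
    """Maps the steering model's intermediate feature onto a frozen auxiliary
    network's feature space and pulls them together; no extra labels needed."""
    def __init__(self, student_ch, teacher_ch):
        super().__init__()
        self.adapt = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # teacher_feat comes from an off-the-shelf segmentation / optical-flow
        # network run on the same frames, detached so no gradient flows back.
        return F.mse_loss(self.adapt(student_feat), teacher_feat.detach())

# total_loss = steering_regression_loss + sum of weighted mimic losses,
# one per auxiliary network and layer; auxiliaries are dropped after training.
```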
【Keywords】:
【Paper Link】 【Pages】:8441-8448
【Authors】: Tao Hu ; Pengwan Yang ; Chiliang Zhang ; Gang Yu ; Yadong Mu ; Cees G. M. Snoek
【Abstract】: Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data. The scarcity of annotated data becomes even more challenging in semantic segmentation, since pixel-level annotation for segmentation is more labor-intensive to acquire. To tackle this issue, we propose an Attention-based Multi-Context Guiding (A-MCG) network, which consists of three branches: the support branch, the query branch, and the feature fusion branch. A key differentiator of A-MCG is the integration of multi-scale context features between the support and query branches, enforcing better guidance from the support set. In addition, we adopt spatial attention along the fusion branch to highlight context information from several scales, enhancing self-supervision in one-shot learning. To address the fusion problem in multi-shot learning, a Conv-LSTM is adopted to collaboratively integrate the sequential support features and elevate the final accuracy. Our architecture obtains state-of-the-art results on unseen classes in a variant of the PASCAL VOC12 dataset and performs favorably against previous work, with large gains of 1.1% and 1.4% mIoU in the 1-shot and 5-shot settings, respectively.
【Keywords】:
【Paper Link】 【Pages】:8449-8456
【Authors】: Jia-Hong Huang ; Cuong Duc Dao ; Modar Alfadly ; Bernard Ghanem
【Abstract】: Deep neural networks have been playing an essential role in many computer vision tasks, including Visual Question Answering (VQA). Until recently, the study of their accuracy was the main focus of research, but there is now a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, adversarial attacks can target the image and/or the proposed main question, yet there is a lack of proper analysis of the latter. In this work, we propose a flexible framework that focuses on the language part of VQA and uses semantically relevant questions, dubbed basic questions, as controllable noise to evaluate the robustness of VQA models. We hypothesize that the level of noise is negatively correlated with the similarity of a basic question to the main question. Hence, to apply noise to any given main question, we rank a pool of basic questions by their similarity, casting this ranking task as a LASSO optimization problem. Then, we propose a novel robustness measure, Rscore, and two large-scale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models.
【Keywords】:
【Paper Link】 【Pages】:8457-8464
【Authors】: Jianglei Huang ; Wengang Zhou
【Abstract】: Target model update plays an important role in visual object tracking. However, performing an optimal model update is challenging. In this work, we propose to achieve an optimal target model by learning a transformation matrix from the last target model to the newly generated one, which results in a minimization objective. This objective faces two challenges. The first is that the newly generated target model is unreliable; to overcome this, we impose a penalty that limits the distance between the learned target model and the last one. The second is that, as time evolves, we cannot decide whether the last target model has been corrupted; to get out of this dilemma, we propose a reinitialization term. Besides, to control the complexity of the transformation matrix, we also add a regularizer. We find that the solution of the optimization problem, with some simplifications, degenerates to an exponential moving average (EMA). Finally, despite its simplicity, extensive experiments conducted on several commonly used benchmarks demonstrate the effectiveness of our proposed approach in relatively long-term scenarios.
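Since the simplified solution degenerates to an exponential moving average, the resulting update is easy to state; the decay value below is an illustrative assumption:

```python
import numpy as np

def ema_update(last_model, new_model, alpha=0.98):
    """Simplified form of the learned update: the maintained target model
    drifts slowly toward each newly generated (possibly unreliable) model."""
    return alpha * last_model + (1.0 - alpha) * new_model

# hypothetical tracking loop over per-frame target models
template = np.zeros(512)
for frame_model in np.random.randn(5, 512):
    template = ema_update(template, frame_model)
```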
【Keywords】:
【Paper Link】 【Pages】:8465-8472
【Authors】: Qiuyuan Huang ; Zhe Gan ; Asli Çelikyilmaz ; Dapeng Oliver Wu ; Jianfeng Wang ; Xiaodong He
【Abstract】: We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
【Keywords】:
【Paper Link】 【Pages】:8473-8480
【Authors】: Wei Huang ; Huimin Yu ; Weiwei Zheng ; Jing Zhang
【Abstract】: A novel coordination framework between segmentation and recognition is proposed to conduct the two tasks collaboratively and iteratively. To accomplish this cooperation, objects are expressed in two aspects, shape and appearance, which are learned and leveraged as constraints on the segmentation so that the object segmentation mask will be consistent with the object regions in the image and with our prior knowledge. For the shape, a bottom-top-bottom pathway is built using an encoder-decoder network with capsule neurons, where the encoder extracts the shape features used for recognition and the decoder generates reference shapes according to these features and the recognition result. During this procedure, capsule neurons can parse the existence of the object and cope with interference in the segmentation. The appearance knowledge is utilized in another pathway to assist the segmentation processing. Both the shape and appearance information depend on the recognition result, thus allowing the classifier to convey object information to the segmenter. Experiments demonstrate the effectiveness of our framework and model in collaboratively segmenting and recognizing objects that can be recognized by their shapes or shape patterns.
【Keywords】:
【Paper Link】 【Pages】:8481-8488
【Authors】: Wenlong Huang ; Brian Lai ; Weijian Xu ; Zhuowen Tu
【Abstract】: In this paper, we study the 3D volumetric modeling problem by adopting the Wasserstein introspective neural networks method (WINN) that was previously applied to 2D static images. We name our algorithm 3DWINN; it enjoys the same properties as WINN in the 2D case, being simultaneously generative and discriminative. Compared to existing 3D volumetric modeling approaches, 3DWINN demonstrates competitive results on several benchmarks in both the generation and the classification tasks. In addition to the standard inception score, the Fréchet Inception Distance (FID) metric is also adopted to measure the quality of 3D volumetric generations. Furthermore, we study adversarial attacks for volumetric data and demonstrate the robustness of 3DWINN against adversarial examples while achieving appealing results in both classification and generation within a single model. 3DWINN is a general framework and can be applied to emerging tasks for 3D object and scene modeling.
【Keywords】:
【Paper Link】 【Pages】:8489-8496
【Authors】: Yan Huang ; Yang Long ; Liang Wang
【Abstract】: Although image and sentence matching has been widely studied, its intrinsic few-shot problem is commonly ignored, which has become a bottleneck for further performance improvement. In this work, we focus on this challenging problem of few-shot image and sentence matching, and propose a Gated Visual-Semantic Embedding (GVSE) model to deal with it. The model consists of three cooperative modules: uncommon VSE, common VSE, and gated metric fusion. The uncommon VSE exploits external auxiliary resources to extract generic features for representing uncommon instances and words in images and sentences, and then integrates them by modeling their semantic relations to obtain global representations for association analysis. To better model the common instances and words in the remaining content of images and sentences, the common VSE learns their discriminative representations directly from scratch. After obtaining two similarity metrics from the two VSE modules with different advantages, the gated metric fusion module adaptively fuses them by automatically balancing their relative importance. Based on the fused metric, we perform extensive experiments on few-shot and conventional image and sentence matching, and demonstrate the effectiveness of the proposed model by achieving state-of-the-art results on two public benchmark datasets.
【Keywords】:
【Paper Link】 【Pages】:8497-8504
【Authors】: Yuanjun Huang ; Xianbin Cao ; Xiantong Zhen ; Jungong Han
【Abstract】: Dynamic scene classification is an important yet challenging problem, especially in the presence of defective or irrelevant frames due to unconstrained imaging conditions such as illumination, camera motion and irrelevant background. In this paper, we propose the attentive temporal pyramid network (ATP-Net) to establish effective representations of dynamic scenes by extracting and aggregating the most informative and discriminative features. The proposed ATP-Net detects informative features of the frames that contain the most scene-relevant information via a temporal pyramid structure with an incorporated attention mechanism. These frame features are effectively fused by a newly designed kernel aggregation layer, based on kernel approximation, into discriminative holistic representations of dynamic scenes. ATP-Net thus leverages the strength of the attention mechanism to select the most relevant frame features, and the ability of kernels to achieve optimal feature fusion for discriminative representations of dynamic scenes. Extensive experiments and comparisons conducted on three benchmark datasets show our superiority over the state-of-the-art methods on all three.
【Keywords】:
【Paper Link】 【Pages】:8505-8512
【Authors】: Zhengyue Huang ; Zhehui Zhao ; Hengguang Zhou ; Xibin Zhao ; Yue Gao
【Abstract】: 3D object retrieval is in compelling demand in the field of computer vision with the rapid development of 3D vision technology and the increasing applications of 3D objects. 3D objects can be described in different ways, such as voxels, point clouds, and multi-view images. Among them, multi-view based approaches proposed in recent years show promising results. Most of them require a fixed, predefined camera position setting that provides a complete and uniform sampling of views for objects in the training stage. However, this causes heavy over-fitting, which makes the models fail to generalize well in free camera setting applications, particularly when insufficient views are provided. Experiments show the performance drops drastically when the number of views is reduced, hindering these methods from practical applications. In this paper, we investigate the over-fitting issue and remove the constraint on the camera setting. First, two basic feature augmentation strategies, Dropout and Dropview, are introduced to address the over-fitting issue, and a more precise and more efficient method named DropMax is proposed after analyzing the drawbacks of the basic ones. Then, with the over-fitting issue reduced, a camera constraint-free multi-view convolutional neural network named DeepCCFV is constructed. Extensive experiments on both single-modal and cross-modal cases demonstrate the effectiveness of the proposed method in free camera settings compared with existing state-of-the-art 3D object retrieval methods.
【Keywords】:
【Paper Link】 【Pages】:8513-8520
【Authors】: Jianwen Jiang ; Di Bao ; Ziqiang Chen ; Xibin Zhao ; Yue Gao
【Abstract】: 3D shape retrieval has attracted much attention and has become a hot topic in the computer vision field recently. With the development of deep learning, 3D shape retrieval has also made great progress, and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better is still a challenging problem. At the same time, the intrinsic hierarchical associations among views have still not been well utilized. To tackle these problems, in this paper we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to conduct 3D shape representation at different scales. At the view level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and an LSTM are utilized for each loop of views to generate loop-level features, which consider the intrinsic associations of the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. Our proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons with the state-of-the-art methods show that the proposed MLVCNN method achieves significant performance improvement on 3D shape retrieval tasks, outperforming the state-of-the-art methods by 4.84% mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN also achieves superior performance compared with recent methods.
【Keywords】:
【Paper Link】 【Pages】:8521-8528
【Authors】: Lai Jiang ; Zhe Wang ; Mai Xu ; Zulin Wang
【Abstract】: The transformed-domain features of images are effective in distinguishing salient and non-salient regions. In this paper, we propose a novel deep complex neural network, named Sal-DCNN, to predict image saliency by learning features in both the pixel and transformed domains. Before proposing Sal-DCNN, we analyze the saliency cues encoded in the discrete Fourier transform (DFT) domain, with the following findings: 1) the phase spectrum encodes most saliency cues; 2) a certain pattern of the amplitude spectrum is important for saliency prediction; 3) the transformed-domain spectrum is robust to noise and down-sampling for saliency prediction. According to these findings, we develop the structure of Sal-DCNN, including two main stages: a complex dense encoder and a three-stream multi-domain decoder. Given the new structure, the saliency maps can be predicted under the supervision of ground-truth fixation maps in both the pixel and transformed domains. Finally, the experimental results show that our Sal-DCNN method outperforms eight other state-of-the-art methods for image saliency prediction on three databases.
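Finding 1) can be illustrated with the classic phase-only reconstruction baseline (not the authors' Sal-DCNN): discarding the amplitude and inverting only the phase spectrum already yields a rough saliency map.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def phase_saliency(gray):
    """gray: 2D float array. Keeps only the DFT phase (unit amplitude),
    reconstructs, squares, and smooths -- a classic phase-spectrum baseline
    illustrating that the phase carries most saliency cues."""
    f = np.fft.fft2(gray)
    phase_only = np.exp(1j * np.angle(f))       # unit amplitude, original phase
    recon = np.abs(np.fft.ifft2(phase_only)) ** 2
    sal = gaussian_filter(recon, sigma=3)
    return sal / sal.max()
```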
【Keywords】:
【Paper Link】 【Pages】:8529-8536
【Authors】: Zhengkai Jiang ; Peng Gao ; Chaoxu Guo ; Qian Zhang ; Shiming Xiang ; Chunhong Pan
【Abstract】: Deep convolutional neural networks have achieved great success on various image recognition tasks. However, it is nontrivial to transfer existing networks to video, since most of them were developed for static images. Frame-by-frame processing is suboptimal because the temporal information that is vital for video understanding is totally abandoned; it is also slow and inefficient, which can hinder practical usage. In this paper, we propose LWDN (Locally-Weighted Deformable Neighbors) for video object detection without utilizing time-consuming optical flow extraction networks. LWDN latently aligns the high-level features between keyframes, and between keyframes and non-keyframes. Inspired by (Zhu et al. 2017a) and (Hetang et al. 2017), who propose to aggregate features across keyframes, we adopt a brain-inspired memory mechanism to propagate and update the memory feature from keyframe to keyframe; we call this process Memory-Guided Propagation. With such a memory mechanism, the discriminative ability of features in both keyframes and non-keyframes is enhanced, which helps to improve detection accuracy. Extensive experiments on the VID dataset demonstrate that our method achieves a superior speed-accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20 fps on a Titan X GPU.
【Keywords】:
【Paper Link】 【Pages】:8537-8544
【Authors】: Yunjae Jung ; Donghyeon Cho ; Dahun Kim ; Sanghyun Woo ; In So Kweon
【Abstract】: In this paper, we address the problem of unsupervised video summarization, which automatically extracts key shots from an input video. Specifically, we tackle two critical issues based on our empirical observations: (i) ineffective feature learning due to flat distributions of output importance scores for each frame, and (ii) training difficulty when dealing with long-length video inputs. To alleviate the first problem, we propose a simple yet effective regularization loss term called variance loss. The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance. For the second problem, we design a novel two-stream network named Chunk and Stride Network (CSNet) that utilizes local (chunk) and global (stride) temporal views of the video features. Our CSNet gives better summarization results for long-length videos compared to existing methods. In addition, we introduce an attention mechanism to handle dynamic information in videos. We demonstrate the effectiveness of the proposed methods through extensive ablation studies and show that our final model achieves new state-of-the-art results on two benchmark datasets.
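A sketch of one plausible form of the variance loss, penalizing the reciprocal of the spread of per-frame scores so flat score distributions become expensive; the median reference and epsilon are assumptions:

```python
import torch

def variance_loss(scores, eps=1e-4):
    """scores: (B, T) per-frame importance scores in [0, 1]. Low spread
    (a flat distribution) yields a large loss, pushing the network toward
    high-discrepancy scores."""
    med = scores.median(dim=1, keepdim=True).values
    var = ((scores - med) ** 2).mean(dim=1)
    return (1.0 / (var + eps)).mean()
```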
【Keywords】:
【Paper Link】 【Pages】:8545-8552
【Authors】: Dahun Kim ; Donghyeon Cho ; In So Kweon
【Abstract】: Self-supervised tasks such as colorization, inpainting and jigsaw puzzles have been utilized for visual representation learning on still images when labeled images are limited or entirely absent. Recently, this worthwhile stream of study has extended to the video domain, where the cost of human labeling is even more expensive. However, most existing methods are still based on 2D CNN architectures that cannot directly capture spatio-temporal information for video applications. In this paper, we introduce a new self-supervised task called Space-Time Cubic Puzzles to train 3D CNNs using large-scale video datasets. This task requires a network to arrange permuted 3D spatio-temporal crops. By completing Space-Time Cubic Puzzles, the network learns both the spatial appearance and the temporal relations of video frames, which is our final goal. In experiments, we demonstrate that our learned 3D representation transfers well to action recognition tasks, and outperforms state-of-the-art 2D CNN-based competitors on the UCF101 and HMDB51 datasets.
【Keywords】:
【Paper Link】 【Pages】:8553-8560
【Authors】: Jogendra Nath Kundu ; Maharshi Gor ; R. Venkatesh Babu
【Abstract】: Human motion prediction models have applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converge to a deterministic, undesired mean of multiple probable outcomes. To avoid this, we propose a novel probabilistic generative approach called Bidirectional Human Motion Prediction GAN, or BiHMP-GAN. To generate multiple probable human-pose sequences conditioned on a given starting sequence, we introduce a random extrinsic factor r, drawn from a predefined prior distribution. Furthermore, to enforce a direct content loss on the predicted motion sequence and to avoid mode collapse, a novel bidirectional framework is incorporated by modifying the usual discriminator architecture. The discriminator is also trained to regress this extrinsic factor r, which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. To further regularize the training, we introduce a novel recursive prediction strategy: despite being in a probabilistic framework, the enhanced discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as conditioning for the prediction of the latter part. The bidirectional setup also provides a new way to evaluate the prediction quality against a given test sequence. For a fair assessment of BiHMP-GAN, we report the performance of the generated motion sequences using (i) a critic model trained to discriminate between real and fake motion sequences, and (ii) an action classifier trained on real human motion dynamics. Both qualitative and quantitative evaluations of the model's probabilistic generations demonstrate the superiority of BiHMP-GAN over previously available methods.
【Keywords】:
【Paper Link】 【Pages】:8561-8568
【Authors】: Bin Li ; Xi Li ; Zhongfei Zhang ; Fei Wu
【Abstract】: Owing to the effectiveness of skeleton representations, skeleton-based human action recognition has received considerable research attention and has a wide range of real applications. In this area, many existing methods typically rely on a fixed physical-connectivity skeleton structure for recognition, which cannot well capture the intrinsic high-order correlations among skeleton joints. In this paper, we propose a novel spatio-temporal graph routing (STGR) scheme for skeleton-based action recognition, which adaptively learns the intrinsic high-order connectivity relationships for physically-apart skeleton joints. Specifically, the scheme is composed of two components: a spatial graph router (SGR) and a temporal graph router (TGR). The SGR aims to discover the connectivity relationships among joints based on sub-group clustering along the spatial dimension, while the TGR explores the structural information by measuring the correlation degrees between temporal joint node trajectories. The proposed scheme is naturally and seamlessly incorporated into the framework of graph convolutional networks (GCNs) to produce a set of skeleton-joint-connectivity graphs, which are further fed into the classification networks. Moreover, an insightful analysis of the receptive field of a graph node is provided to explain the necessity of our method. Experimental results on two benchmark datasets (NTU-RGB+D and Kinetics) demonstrate the effectiveness of our method against the state of the art.
【Keywords】:
【Paper Link】 【Pages】:8569-8576
【Authors】: Bo Li ; Zhengxing Sun ; Yuqi Guo
【Abstract】: Image saliency detection has recently witnessed rapid progress due to deep neural networks. However, many important problems remain in the existing deep learning based methods. Pixel-wise convolutional neural network (CNN) methods suffer from blurry boundaries due to the convolution and pooling operations, while region-based deep learning methods lack spatial consistency since they deal with each region independently. In this paper, we propose a novel salient object detection framework using a superpixel-wise variational autoencoder (SuperVAE) network. We first use the VAE to model the image background and then separate salient objects from the background through the reconstruction residuals. To better capture semantic and spatial context information, we also propose a perceptual loss that takes advantage of deep pre-trained CNNs to train our SuperVAE network. Without the supervision of mask-level annotated data, our method generates high-quality saliency results which better preserve object boundaries and maintain spatial consistency. Extensive experiments on five widely used benchmark datasets show that the proposed method achieves superior or competitive performance compared to other algorithms, including the very recent state-of-the-art supervised methods.
【Keywords】:
【Paper Link】 【Pages】:8577-8584
【Authors】: Buyu Li ; Yu Liu ; Xiaogang Wang
【Abstract】: Despite the great success of two-stage detectors, the single-stage detector remains a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) as a hedge against the disharmonies. The philosophy behind GHM can be easily embedded into both classification loss functions like cross-entropy (CE) and regression loss functions like smooth-L1 (SL1) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balance the gradient flow for anchor classification and bounding box refinement, respectively. An ablation study on MS COCO demonstrates that without laborious hyper-parameter tuning, both GHM-C and GHM-R can bring substantial improvements to a single-stage detector. Without any bells and whistles, the proposed model achieves 41.6 mAP on the COCO test-dev set, surpassing the state-of-the-art method, Focal Loss (FL) + SL1, by 0.8. The code is released to facilitate future research.
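A minimal sketch of GHM-C for binary anchor classification: the gradient norm |sigmoid(x) - y| is binned, and each example is reweighted by the inverse density of its bin, down-weighting both the huge mass of easy negatives and the rare very-hard outliers. Bin count and edge handling are illustrative choices:

```python
import torch

def ghm_c_loss(logits, targets, bins=10):
    """logits, targets: same-shape tensors; targets are float 0/1 labels."""
    p = torch.sigmoid(logits)
    g = (p - targets).abs().detach()          # gradient norm of BCE w.r.t. logits
    weights = torch.zeros_like(g)
    n = g.numel()
    edges = torch.linspace(0, 1, bins + 1, device=g.device)
    for i in range(bins):
        # last bin is closed on the right so g == 1 is counted
        in_bin = (g >= edges[i]) & (g < edges[i + 1] + (1e-6 if i == bins - 1 else 0))
        count = in_bin.sum().item()
        if count > 0:
            weights[in_bin] = n / (count * bins)   # inverse gradient density
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    return (weights * bce).sum() / n
```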
【Keywords】:
【Paper Link】 【Pages】:8585-8593
【Authors】: Chenyang Li ; Xin Zhang ; Lufan Liao ; Lianwen Jin ; Weixin Yang
【Abstract】: Skeleton-based gesture recognition is gaining popularity due to its wide range of possible applications. The key issues are how to extract discriminative features and how to design the classification model. In this paper, we first leverage a robust feature descriptor, the path signature (PS), and propose three PS features to explicitly represent spatial and temporal motion characteristics, i.e., spatial PS (S-PS), temporal PS (T-PS) and temporal-spatial PS (T-S-PS). Considering the significance of fine hand movements in gestures, we propose an "attention on hand" (AOH) principle to define joint pairs for the S-PS and to select single joints for the T-PS. In addition, a dyadic method is employed to extract the T-PS and T-S-PS features that encode the global and local temporal dynamics of the motion. Secondly, without a recurrent strategy, the classification model still faces challenges from temporal variation among different sequences. We propose a new temporal transformer module (TTM) that can align the key frames of sequences by learning a temporal shifting parameter for each input. This is a learning-based module that can be included in a standard neural network architecture. Finally, we design a multi-stream fully-connected-layer based network that treats spatial and temporal features separately and fuses them together for the final result. We have tested our method on three benchmark gesture datasets, i.e., ChaLearn 2016, ChaLearn 2013 and MSRC-12. Experimental results demonstrate that we achieve state-of-the-art performance on skeleton-based gesture recognition with high computational efficiency.
【Keywords】:
【Paper Link】 【Pages】:8594-8601
【Authors】: Guanbin Li ; Xin Zhu ; Yirui Zeng ; Qing Wang ; Liang Lin
【Abstract】: Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs in a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned feature involves both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination changes and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
【Keywords】:
【Paper Link】 【Pages】:8602-8609
【Authors】: Haoliang Li ; Sinno Jialin Pan ; Renjie Wan ; Alex C. Kot
【Abstract】: Heterogeneous Transfer Learning (HTL) aims to solve transfer learning problems where the source domain and the target domain have heterogeneous feature types. Most existing HTL approaches either explicitly learn feature mappings between the heterogeneous domains or implicitly reconstruct heterogeneous cross-domain features based on matrix completion techniques. In this paper, we propose a new HTL method based on a deep matrix completion framework, in which kernel embedding of distributions is trained in an adversarial manner to learn heterogeneous features across domains. We conduct extensive experiments on two different vision tasks to demonstrate the effectiveness of our proposed method compared with a number of baseline methods.
【Keywords】:
【Paper Link】 【Pages】:8610-8617
【Authors】: Hui Li ; Peng Wang ; Chunhua Shen ; Guyu Zhang
【Abstract】: Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty of algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module. Despite its simplicity, the proposed method is robust: it achieves state-of-the-art performance on irregular text recognition benchmarks and comparable results on regular text datasets. The code will be released.
【Keywords】:
【Paper Link】 【Pages】:8618-8625
【Authors】: Jianing Li ; Shiliang Zhang ; Tiejun Huang
【Abstract】: This paper proposes a two-stream convolution network to extract spatial and temporal cues for video-based person re-identification (ReID). The temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN. The resulting M3D convolution network introduces only a fraction of additional parameters into the 2D CNN, but gains the ability of multi-scale temporal feature learning. With this compact architecture, the M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further involves Residual Attention Layers (RAL) to refine the temporal features. By jointly learning spatial-temporal attention masks in a residual manner, RAL identifies the discriminative spatial regions and temporal cues. The other stream in our network is implemented with a 2D CNN for spatial feature extraction. The spatial and temporal features from the two streams are finally fused for video-based person ReID. Evaluations on three widely used benchmark datasets, i.e., MARS, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of our method over existing 3D convolution networks and state-of-the-art methods.
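A hedged sketch of a multi-scale temporal 3D layer in the spirit of M3D: parallel temporal-only convolutions with different dilations are added residually to the 2D features, so multi-scale temporal modeling costs only a small number of extra parameters. The kernel and dilation choices here are assumptions:

```python
import torch
import torch.nn as nn

class M3D(nn.Module):
    """Residual multi-scale temporal layer: 3x1x1 temporal convolutions with
    increasing temporal dilation, summed onto the 2D feature maps."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                      padding=(d, 0, 0), dilation=(d, 1, 1), bias=False)
            for d in dilations])

    def forward(self, x):                       # x: (B, C, T, H, W)
        return x + sum(branch(x) for branch in self.branches)
```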
【Keywords】:
【Paper Link】 【Pages】:8626-8633
【Authors】: Nannan Li ; Zhenzhong Chen ; Shan Liu
【Abstract】: Reinforcement learning (RL) has shown its advantages in image captioning by optimizing the non-differentiable metric directly during reward learning. However, due to the reward hacking problem in RL, maximizing the reward may not lead to better caption quality, especially in terms of propositional content and distinctiveness. In this work, we propose to use a new learning method, meta learning, to utilize supervision from the ground truth whilst optimizing the reward function in RL. To improve the propositional content and the distinctiveness of the generated captions, the proposed model seeks a globally optimal solution by simultaneously taking different gradient steps towards the supervision task and the reinforcement task. Experimental results on MS COCO validate the effectiveness of our approach compared with state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:8634-8641
【Authors】: Qiaozhe Li ; Xin Zhao ; Ran He ; Kaiqi Huang
【Abstract】: Pedestrian attribute recognition in surveillance is a challenging task due to poor image quality, significant appearance variations and the diverse spatial distribution of different attributes. This paper treats pedestrian attribute recognition as a sequential attribute prediction problem and proposes a novel visual-semantic graph reasoning framework to address it. Our framework contains a spatial graph and a directed semantic graph. By performing reasoning with a Graph Convolutional Network (GCN), one graph captures spatial relations between regions while the other learns potential semantic relations between attributes. An end-to-end architecture is presented to perform mutual embedding between these two graphs, guiding the relational learning of each. We verify the proposed framework on three large-scale pedestrian attribute datasets: PETA, RAP, and PA100k. Experiments show the superiority of the proposed method over state-of-the-art methods and the effectiveness of our joint GCN structures for sequential attribute prediction.
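The reasoning step in both graphs can be pictured as a generic GCN propagation; this sketch assumes a row-normalized (possibly learned) adjacency and is not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One generic GCN step, H' = ReLU(A_hat @ H @ W), applicable to both the
    spatial-region graph and the directed attribute graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):                # h: (N, in_dim), a_hat: (N, N)
        return torch.relu(a_hat @ self.lin(h))
```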
【Keywords】:
【Paper Link】 【Pages】:8642-8649
【Authors】: Wenbin Li ; Jinglin Xu ; Jing Huo ; Lei Wang ; Yang Gao ; Jiebo Luo
【Abstract】: Few-shot learning aims to recognize new concepts from very few examples. However, most existing few-shot learning methods concentrate on the first-order statistics of the concept representation or on a fixed metric for the relation between a sample and a concept. In this work, we propose a novel end-to-end deep architecture, named Covariance Metric Networks (CovaMNet). CovaMNet is designed to exploit both a covariance representation and a covariance metric, based on distribution consistency, for few-shot classification tasks. Specifically, we construct an embedded local covariance representation to extract the second-order statistics of each concept and describe its underlying distribution. Upon the covariance representation, we further define a new deep covariance metric to measure the consistency of distributions between query samples and new concepts. Furthermore, we employ the episodic training mechanism to train the entire network end-to-end from scratch. Extensive experiments on two tasks, generic few-shot image classification and fine-grained few-shot image classification, demonstrate the superiority of the proposed CovaMNet. The source code is available at https://github.com/WenbinLee/CovaMNet.git.
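A rough sketch of the second-order machinery: a local covariance representation per concept and a quadratic-form covariance metric for query descriptors. Normalization and averaging details are assumptions, not the paper's exact definitions:

```python
import numpy as np

def covariance_rep(support_feats):
    """support_feats: (M, D) local descriptors pooled from the few shots of
    one concept (M > 1). Returns its second-order (covariance) representation."""
    x = support_feats - support_feats.mean(axis=0, keepdims=True)
    return x.T @ x / max(len(x) - 1, 1)         # (D, D)

def covariance_metric(query_feats, cov):
    """query_feats: (L, D) local descriptors of a query image. Each descriptor
    is scored against the concept distribution by the quadratic form
    q^T Sigma q; scores are averaged into one similarity value."""
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-8)
    return np.einsum('ld,de,le->l', q, cov, q).mean()
```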
【Keywords】:
【Paper Link】 【Pages】:8650-8657
【Authors】: Xiangyang Li ; Shuqiang Jiang ; Jungong Han
【Abstract】: Dense captioning is a challenging task which not only detects visual elements in images but also generates natural language sentences to describe them. Previous approaches do not leverage object information in images for this task. However, objects provide valuable cues for predicting the locations of caption regions, as caption regions often highly overlap with objects (i.e., caption regions are usually parts of objects or combinations of them). Meanwhile, objects also provide important information for describing a target caption region, as the corresponding description not only depicts its properties but also involves its interactions with objects in the image. In this work, we propose a novel scheme with an object context encoding Long Short-Term Memory (LSTM) network to automatically learn complementary object context for each caption region, transferring knowledge from objects to caption regions. All contextual objects are arranged as a sequence and progressively fed into the context encoding module to obtain context features. Then both the learned object context features and region features are used to predict the bounding box offsets and generate the descriptions. The context learning procedure is carried out in conjunction with the optimization of both location prediction and caption generation, thus enabling the object context encoding LSTM to capture and aggregate useful object context. Experiments on benchmark datasets demonstrate the superiority of our proposed approach over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:8658-8665
【Authors】: Xiangpeng Li ; Jingkuan Song ; Lianli Gao ; Xianglong Liu ; Wenbing Huang ; Xiangnan He ; Chuang Gan
【Abstract】: Most of the recent progress on visual question answering is based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often time-consuming and have difficulty modeling long-range dependencies due to the sequential nature of RNNs. We propose a new architecture, Positional Self-Attention with Co-attention (PSAC), which does not require RNNs for video question answering. Specifically, inspired by the success of self-attention in machine translation, we propose a Positional Self-Attention that computes the response at each position by attending to all positions within the same sequence, and then adds representations of the absolute positions. PSAC can therefore exploit the global dependencies of the question and the temporal information in the video, and encode the question and the video in parallel. Furthermore, in addition to attending to the video features relevant to the given question (i.e., video attention), we utilize a co-attention mechanism by simultaneously modeling "what words to listen to" (question attention). To the best of our knowledge, this is the first work to replace RNNs with self-attention for the task of visual question answering. Experimental results on four tasks of the benchmark dataset show that our model significantly outperforms the state of the art on three tasks and attains a comparable result on the Count task. Our model requires less computation time and achieves better performance than the RNN-based methods. An additional ablation study demonstrates the effect of each component of our proposed model.
【Keywords】:
【Paper Link】 【Pages】:8666-8673
【Authors】: Yang Li ; Jianke Zhu ; Steven C. H. Hoi ; Wenjie Song ; Zhefeng Wang ; Hantang Liu
【Abstract】: Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation under large displacements. In order to efficiently search such a large 4-DoF space in real time, we decompose the problem into two 2-DoF sub-problems and apply an efficient Block Coordinate Descent solver to optimize the estimation result. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates, and a variant of the correlation filter to predict the translational motion individually. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with state-of-the-art visual object tracking methods, while retaining the advantages of high efficiency and simplicity of conventional correlation filter-based tracking methods.
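The log-polar phase correlation step is a classic construction and can be sketched as follows: in log-polar coordinates of the Fourier magnitude, scale and rotation become translations, which phase correlation recovers as a peak offset. Grid sizes and sign conventions here are illustrative and may need adjustment:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar(mag, n_r=64, n_t=64):
    """Resample an fftshift-ed magnitude spectrum onto a log-polar grid."""
    h, w = mag.shape
    cy, cx = h / 2.0, w / 2.0
    log_rmax = np.log(min(cy, cx))
    r = np.exp(np.linspace(0, log_rmax, n_r))
    t = np.linspace(0, np.pi, n_t, endpoint=False)   # half plane: spectrum is symmetric
    R, T = np.meshgrid(r, t, indexing="ij")
    return map_coordinates(mag, [cy + R * np.sin(T), cx + R * np.cos(T)], order=1)

def phase_corr_peak(a, b):
    """Location of the phase-correlation peak between two same-size arrays."""
    c = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    c = np.abs(np.fft.ifft2(c / (np.abs(c) + 1e-9)))
    return np.unravel_index(np.argmax(c), c.shape)

def estimate_scale_rotation(img_a, img_b, n_r=64, n_t=64):
    lp = [log_polar(np.abs(np.fft.fftshift(np.fft.fft2(i))), n_r, n_t)
          for i in (img_a, img_b)]
    dr, dt = phase_corr_peak(lp[0], lp[1])
    dr = dr - n_r if dr > n_r // 2 else dr           # wrap circular shifts
    dt = dt - n_t if dt > n_t // 2 else dt
    scale = np.exp(dr * np.log(min(img_a.shape) / 2.0) / n_r)
    return scale, dt * np.pi / n_t                   # (relative scale, rotation in rad)
```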
【Keywords】:
【Paper Link】 【Pages】:8674-8681
【Authors】: Yanghao Li ; Sijie Song ; Yuqi Li ; Jiaying Liu
【Abstract】: Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with existing temporal methods which are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage the factorized bilinear model in linear complexity and a bottleneck network design to build our TB blocks, which also constrains the parameters and computation cost. We consider two schemes for incorporating TB blocks with the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBN). Finally, we perform experiments on several widely adopted datasets including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods.
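A rough illustration of the factorized temporal bilinear idea: a quadratic interaction between the features of adjacent frames is computed through two low-rank projections whose elementwise product is summed over rank factors, keeping the cost linear in the feature dimension. This is a hedged PyTorch sketch; the paper's bottleneck design and exact factorization are not reproduced, and the rank is an illustrative choice.

```python
import torch
import torch.nn as nn

class FactorizedTemporalBilinear(nn.Module):
    """Pairwise quadratic interaction between adjacent frames, low-rank factorized."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.u = nn.Linear(dim, dim * rank, bias=False)
        self.v = nn.Linear(dim, dim * rank, bias=False)
        self.rank = rank

    def forward(self, x):                      # x: (batch, time, dim)
        prev, nxt = x[:, :-1], x[:, 1:]        # adjacent frame pairs
        inter = self.u(prev) * self.v(nxt)     # factorized bilinear term
        bsz, t, _ = inter.shape
        return inter.view(bsz, t, -1, self.rank).sum(-1)  # sum over rank factors
```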
【Keywords】:
【Paper Link】 【Pages】:8682-8689
【Authors】: Zhaoqun Li ; Cheng Xu ; Biao Leng
【Abstract】: How to obtain a desirable representation of a 3D shape, which is discriminative across categories and polymerized within classes, is a significant challenge in 3D shape retrieval. Most existing 3D shape retrieval methods focus on capturing a strongly discriminative shape representation with softmax loss for the classification task, while shape feature learning with a metric loss is neglected for 3D shape retrieval. In this paper, we address this problem based on the intuition that the cosine distance of shape embeddings should be close enough within the same class and far away across categories. Since most 3D shape retrieval tasks use the cosine distance of shape features for measuring shape similarity, we propose a novel metric loss named angular triplet-center loss, which directly optimizes the cosine distances between the features. It inherits the property of the triplet-center loss to achieve a larger inter-class distance and a smaller intra-class distance simultaneously. Unlike previous metric losses utilized in 3D shape retrieval methods, where Euclidean distance is adopted and the margin design is difficult, the proposed method is more convenient for training feature embeddings and more suitable for 3D shape retrieval. Moreover, an angle margin is adopted to replace the cosine margin in order to provide more explicit discriminative constraints on the embedding space. Extensive experimental results on two popular 3D object retrieval benchmarks, ModelNet40 and ShapeNetCore 55, demonstrate the effectiveness of our proposed loss, and our method achieves state-of-the-art results on various 3D shape datasets.
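An angular triplet-center style loss can be approximated as follows: normalize embeddings and class centers, take the angle (arccos of cosine similarity) between each embedding and every center, and apply a triplet margin between the own-class angle and the closest other-class angle. A sketch under these assumptions; the margin value and the rule for updating the centers are not specified here.

```python
import torch
import torch.nn.functional as F

def angular_triplet_center_loss(emb, labels, centers, margin=0.5):
    """Pull each embedding toward its class center and push it away from the
    nearest other center, with distances measured by angle."""
    emb = F.normalize(emb, dim=1)
    centers_n = F.normalize(centers, dim=1)
    cos = emb @ centers_n.t()                              # (batch, num_classes)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))     # angle to every center
    pos = theta.gather(1, labels.view(-1, 1)).squeeze(1)   # angle to own center
    neg = theta.scatter(1, labels.view(-1, 1), float('inf')).min(dim=1).values
    return F.relu(pos + margin - neg).mean()               # angular margin
```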
【Keywords】:
【Paper Link】 【Pages】:8690-8697
【Authors】: Zhihui Li ; Lina Yao ; Xiaoqin Zhang ; Xianzhi Wang ; Salil Kanhere ; Huaxiang Zhang
【Abstract】: Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language description, which aims to simultaneously detect and recognize novel concept instances with textual descriptions. We propose a novel deep learning framework to jointly learn visual units, visual-unit attention and word-level attention, which are combined to achieve word-proposal affinity by an element-wise multiplication. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection for a fair comparison. We conduct extensive experiments on three challenging benchmark datasets. The extensive experimental results confirm the superiority of the proposed model.
【Keywords】:
【Paper Link】 【Pages】:8698-8705
【Authors】: Dong Liang ; Rui Wang ; Xiaowei Tian ; Cong Zou
【Abstract】: Human image generation is a very challenging task since it is affected by many factors. Many human image generation methods focus on generating human images conditioned on a given pose, while the generated backgrounds are often blurred. In this paper, we propose a novel Partition-Controlled GAN to generate human images according to target pose and background. Firstly, human poses in the given images are extracted, and the foreground/background are partitioned for further use. Secondly, we extract and fuse appearance features, pose features and background features to generate the desired images. Experiments on the Market-1501 and DeepFashion datasets show that our model not only generates realistic human images but also produces the human pose and background as we want. Extensive experiments on the COCO and LIP datasets indicate the potential of our method.
【Keywords】:
【Paper Link】 【Pages】:8706-8713
【Authors】: Mingyang Liang ; Xiaoyang Guo ; Hongsheng Li ; Xiaogang Wang ; You Song
【Abstract】: Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any depth or disparity supervision. The estimated depth provides additional information complementary to the original images, which can be helpful for other vision tasks such as tracking, recognition and detection. However, there are large appearance variations between images from different spectral bands, which is a challenge for cross-spectral stereo matching. Existing deep unsupervised stereo matching methods are sensitive to appearance variations and do not perform well on cross-spectral data. We propose a novel unsupervised cross-spectral stereo matching framework based on image-to-image translation. First, a style adaptation network transforms images across different spectral bands by cycle consistency and adversarial learning, during which appearance variations are minimized. Then, a stereo matching network is trained with image pairs from the same spectrum using a view reconstruction loss. Finally, the estimated disparity is utilized to supervise the spectral translation network in an end-to-end way. Moreover, a novel style adaptation network, F-cycleGAN, is proposed to improve the robustness of spectral translation. Our method can tackle appearance variations and enhance the robustness of unsupervised cross-spectral stereo matching. Experimental results show that our method achieves good performance without using depth supervision or explicit semantic information.
【Keywords】:
【Paper Link】 【Pages】:8714-8721
【Authors】: Minghui Liao ; Jian Zhang ; Zhaoyi Wan ; Fengming Xie ; Jiajun Liang ; Pengyuan Lyu ; Cong Yao ; Xiang Bai
【Abstract】: Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect the important fact that text in images is actually distributed in two-dimensional space. This nature is quite different from that of speech, which is essentially a one-dimensional signal. In principle, directly compressing features of text into a one-dimensional form may lose useful information and introduce extra noise. In this paper, we approach scene text recognition from a two-dimensional perspective. A simple yet effective model, called Character Attention Fully Convolutional Network (CA-FCN), is devised for recognizing text of arbitrary shapes. Scene text recognition is realized with a semantic segmentation network, where an attention mechanism for characters is adopted. Combined with a word formation module, CA-FCN can simultaneously recognize the script and predict the position of each character. Experiments demonstrate that the proposed algorithm outperforms previous methods on both regular and irregular text datasets. Moreover, it is proven to be more robust to imprecise localizations in the text detection phase, which are very common in practice.
【Keywords】:
【Paper Link】 【Pages】:8722-8729
【Authors】: Mingbao Lin ; Rongrong Ji ; Hong Liu ; Xiaoshuai Sun ; Yongjian Wu ; Yunsheng Wu
【Abstract】: When facing large-scale image datasets, online hashing serves as a promising solution for online retrieval and prediction tasks. It encodes the online streaming data into compact binary codes, and simultaneously updates the hash functions to renew the codes of the existing dataset. However, the existing methods update hash functions solely based on the new data batch, without investigating the correlation between such new data and the existing dataset. In addition, existing works update the hash functions using a relaxation process in the corresponding approximated continuous space, and it remains an open problem to directly apply discrete optimization in online hashing. In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework. BSODH employs a well-designed hashing algorithm to preserve the similarity between the streaming data and the existing dataset via an asymmetric graph regularization. We further identify the “data-imbalance” problem brought by the constructed asymmetric graph, which restricts the application of discrete optimization in our problem. Therefore, a novel balanced similarity is further proposed, which uses two equilibrium factors to balance the similar and dissimilar weights and eventually enables the usage of discrete optimization. Extensive experiments conducted on three widely-used benchmarks demonstrate the advantages of the proposed method over the state-of-the-art methods.
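The balanced-similarity idea admits a concrete illustration: starting from a {+1, -1} semantic similarity matrix between the streaming batch and the existing data, two equilibrium factors re-weight the similar and dissimilar entries so that the dominant dissimilar pairs do not overwhelm the scarce similar ones. A small NumPy sketch; the factor values are illustrative, not the paper's.

```python
import numpy as np

def balanced_similarity(S, mu_pos=1.2, mu_neg=0.2):
    """Re-weight a {+1, -1} similarity matrix with two equilibrium factors:
    up-weight similar pairs, down-weight dissimilar pairs, easing the
    data-imbalance brought by the asymmetric graph."""
    return np.where(S > 0, mu_pos * S, mu_neg * S)
```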
【Keywords】:
【Paper Link】 【Pages】:8730-8737
【Authors】: Shuyuan Lin ; Guobao Xiao ; Yan Yan ; David Suter ; Hanzi Wang
【Abstract】: Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraphs to represent the complex relationships between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noise and outliers), which significantly increases the computational burden. To overcome this problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can directly apply spectral clustering to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.
【Keywords】:
【Paper Link】 【Pages】:8738-8745
【Authors】: Yutian Lin ; Xuanyi Dong ; Liang Zheng ; Yan Yan ; Yi Yang
【Abstract】: Most person re-identification (re-ID) approaches are based on supervised learning, which requires intensive manual annotation for training data. However, it is not only resource-intensive to acquire identity annotations but also impractical to label large-scale real-world data. To relieve this problem, we propose a bottom-up clustering (BUC) approach to jointly optimize a convolutional neural network (CNN) and the relationships among the individual samples. Our algorithm considers two fundamental facts of the re-ID task, i.e., diversity across different identities and similarity within the same identity. Specifically, our algorithm starts by regarding each individual sample as a different identity, which maximizes the diversity over identities. Then it gradually groups similar samples into one identity, which increases the similarity within each identity. We utilize a diversity regularization term in the bottom-up clustering procedure to balance the data volume of each cluster. Finally, the model achieves an effective trade-off between diversity and similarity. We conduct extensive experiments on large-scale image and video re-ID datasets, including Market-1501, DukeMTMC-reID, MARS and DukeMTMC-VideoReID. The experimental results demonstrate that our algorithm is not only superior to state-of-the-art unsupervised re-ID approaches, but also performs favorably against competing transfer learning and semi-supervised learning methods.
【Keywords】:
【Paper Link】 【Pages】:8746-8753
【Authors】: Hong Liu ; Jie Li ; Yongjian Wu ; Rongrong Ji
【Abstract】: The symmetric positive definite (SPD) matrix has attracted increasing research focus in image/video analysis, owing to its merit of capturing the Riemannian geometry in its structured 2D feature representation. However, computation in the vector space on SPD matrices cannot capture the geometric properties, which corrupts the classification performance. To this end, Riemannian-based deep networks have become a promising solution for SPD matrix classification, because of their excellence in performing non-linear learning over SPD matrices. Besides, Riemannian metric learning typically adopts a kNN classifier that cannot be extended to large-scale datasets, which limits its application in many time-sensitive scenarios. In this paper, we propose a Bag-of-Matrix-Summarization (BoMS) method to be combined with the Riemannian network, which handles the above issues towards highly efficient and scalable SPD feature representation. Our key innovation lies in the idea of summarizing data in a Riemannian geometric space instead of the vector space. First, the whole training set is compressed with a small number of matrix features to ensure high scalability. Second, given such a compressed set, a constant-length vector representation is extracted by efficiently measuring the distribution variations between the summarized data and the latent feature of the Riemannian network. Finally, the proposed BoMS descriptor is integrated into the Riemannian network, upon which the whole framework is end-to-end trained via matrix back-propagation. Experiments on four different classification tasks demonstrate the superior performance of the proposed method over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:8754-8761
【Abstract】: Recently, learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. Most existing learning to hash methods require sufficient training images to learn precise hashing codes. However, in some real-world applications, there are not always sufficient training images in the domain of interest. In addition, some existing supervised approaches need a large amount of labeled data, which is expensive in terms of time, labor and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH), where we borrow the images of a different but related domain, i.e., the source domain, to help learn precise hashing codes for the domain of interest, i.e., the target domain. Besides, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains due to the domain gap. Furthermore, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning to hash methods. The implementation details are available at https://github.com/liuji93/GTH.
【Keywords】:
【Paper Link】 【Pages】:8762-8769
【Authors】: Mengyuan Liu ; Fanyang Meng ; Chen Chen ; Songtao Wu
【Abstract】: Human action recognition aims to classify a given video according to which type of action it contains. Disturbance brought by cluttered backgrounds and unrelated motions makes the task challenging for video frame-based methods. To solve this problem, this paper takes advantage of pose estimation to enhance the performance of video frame features. First, we present a pose feature called dynamic pose image (DPI), which describes a human action as the aggregation of a sequence of joint estimation maps. Different from traditional pose features that use joints alone, DPI suffers less from disturbance and provides richer information about human body shape and movements. Second, we present attention-based dynamic texture images (att-DTIs) as a pose-guided video frame feature. Specifically, a video is treated as a space-time volume, and DTIs are obtained by observing the volume from different views. To alleviate the effect of disturbance on DTIs, we accumulate joint estimation maps as an attention map, and extend DTIs to attention-based DTIs (att-DTIs). Finally, we fuse DPI and att-DTIs with multi-stream deep neural networks and a late fusion scheme for action recognition. Experiments on the NTU RGB+D, UTD-MHAD, and Penn-Action datasets show the effectiveness of DPI and att-DTIs, as well as the complementary property between them.
【Keywords】:
【Paper Link】 【Pages】:8770-8777
【Authors】: Pengpeng Liu ; Irwin King ; Michael R. Lyu ; Jia Xu
【Abstract】: We present DDFlow, a data distillation approach to learning optical flow estimation from unlabeled data. The approach distills reliable predictions from a teacher network, and uses these predictions as annotations to guide a student network to learn optical flow. Unlike existing work relying on handcrafted energy terms to handle occlusion, our approach is data-driven, and learns optical flow for occluded pixels. This enables us to train our model with a much simpler loss function, and achieve a much higher accuracy. We conduct a rigorous evaluation on the challenging Flying Chairs, MPI Sintel, KITTI 2012 and 2015 benchmarks, and show that our approach significantly outperforms all existing unsupervised learning methods, while running at real time.
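The distillation step in this kind of approach can be pictured as a masked regression loss: the teacher's flow predictions serve as annotations for the student only where they are deemed reliable. A hedged PyTorch sketch; how the validity mask is constructed (e.g., from occlusion reasoning and cropping) is the paper's contribution and is simply assumed given here.

```python
import torch

def distillation_loss(student_flow, teacher_flow, valid_mask):
    """Regress the student's flow toward the teacher's predictions on pixels
    where the teacher is trusted (valid_mask == 1); the teacher is frozen."""
    diff = (student_flow - teacher_flow.detach()).abs()
    return (diff * valid_mask).sum() / (valid_mask.sum() + 1e-8)
```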
【Keywords】:
【Paper Link】 【Pages】:8778-8785
【Authors】: Xinhai Liu ; Zhizhong Han ; Yu-Shen Liu ; Matthias Zwicker
【Abstract】: Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.
【Keywords】:
【Paper Link】 【Pages】:8786-8793
【Authors】: Yiheng Liu ; Zhenxun Yuan ; Wengang Zhou ; Houqiang Li
【Abstract】: Video-based person re-identification is a crucial task of matching video sequences of a person across multiple camera views. Generally, features directly extracted from a single frame suffer from occlusion, blur, illumination and posture changes. This leads to false activations or missing activations in some regions, which corrupts the appearance and motion representation. How to explore the abundant spatial-temporal information in video sequences is the key to solving this problem. To this end, we propose a Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses noisy parts of the current frame’s features by referring to historical frames. With RRU, the quality of each frame’s appearance representation is improved. Then we use the Spatial-Temporal clues Integration Module (STIM) to mine the spatial-temporal information from those upgraded features. Meanwhile, a multi-level training objective is used to enhance the capability of RRU and STIM. Through the cooperation of these modules, the spatial and temporal features mutually promote each other and the final spatial-temporal feature representation is more discriminative and robust. Extensive experiments are conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS. The experimental results demonstrate that our approach outperforms existing state-of-the-art methods of video-based person re-identification on iLIDS-VID and MARS and achieves favorable results on PRID-2011.
【Keywords】:
【Paper Link】 【Pages】:8794-8802
【Authors】: Yu-Lun Liu ; Yi-Tung Liao ; Yen-Yu Lin ; Yung-Yu Chuang
【Abstract】: Video frame interpolation algorithms predict intermediate frames to produce videos with higher frame rates and smooth view transitions given two consecutive frames as inputs. We propose that synthesized frames are more reliable if they can be used to reconstruct the input frames with high quality. Based on this idea, we introduce a new loss term, the cycle consistency loss. The cycle consistency loss can better utilize the training data to not only enhance the interpolation results, but also maintain performance better with less training data. It can be integrated into any frame interpolation network and trained in an end-to-end manner. In addition to the cycle consistency loss, we propose two extensions: a motion linearity loss and edge-guided training. The motion linearity loss approximates the motion between two input frames as linear and regularizes the training. By applying edge-guided training, we further improve results by integrating edge information into training. Both qualitative and quantitative experiments demonstrate that our model outperforms the state-of-the-art methods. The source code of the proposed method and more experimental results will be available at https://github.com/alex04072000/CyclicGen.
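The cycle consistency idea can be sketched for a triple of consecutive frames: interpolate middle frames for the two input pairs, interpolate again between the two synthesized frames, and require the result to reconstruct the real middle frame. A minimal sketch, assuming `net(a, b)` returns the frame halfway between its inputs; the paper's exact training configuration may differ.

```python
import torch.nn.functional as F

def cycle_consistency_loss(net, i0, i1, i2):
    """Synthesized frames are reliable if they can reconstruct the inputs."""
    mid01 = net(i0, i1)        # synthesized frame near t = 0.5
    mid12 = net(i1, i2)        # synthesized frame near t = 1.5
    cyc = net(mid01, mid12)    # re-interpolating should land back on i1
    return F.l1_loss(cyc, i1)
```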
【Keywords】:
【Paper Link】 【Pages】:8803-8810
【Authors】: Hao Luo ; Wenxuan Xie ; Xinggang Wang ; Wenjun Zeng
【Abstract】: State-of-the-art object detectors and trackers are developing fast. Trackers are in general more efficient than detectors but bear the risk of drifting. A question is hence raised: how to improve the accuracy of video object detection/tracking by utilizing the existing detectors and trackers within a given time budget? A baseline is frame skipping – detecting every N-th frame and tracking for the frames in between. This baseline, however, is suboptimal since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, which determines whether to detect or track at a certain frame, as a generalization of Siamese trackers. Despite being light-weight and simple in structure, the scheduler network is more effective than the frame skipping baselines and flow-based approaches, as validated on the ImageNet VID dataset for video object detection/tracking.
【Keywords】:
【Paper Link】 【Pages】:8811-8818
【Authors】: Zhixiong Nan ; Yang Liu ; Nanning Zheng ; Song-Chun Zhu
【Abstract】: In this paper, we study the problem of recognizing attribute-object pairs that do not appear in the training dataset, which is called unseen attribute-object pair recognition. Existing methods mainly learn a discriminative classifier or compose multiple classifiers to tackle this problem, and exhibit poor performance on unseen pairs. The key reasons for this failure are that 1) they have not learned an intrinsic attribute-object representation, and 2) the attribute and object are processed either separately or equally, so that the inner relation between the attribute and object has not been explored. To explore the inner relation of attribute and object as well as the intrinsic attribute-object representation, we propose a generative model with an encoder-decoder mechanism that bridges visual and linguistic information in a unified end-to-end network. The encoder-decoder mechanism presents impressive potential to find an intrinsic attribute-object feature representation. In addition, combining visual and linguistic features in a unified model makes it possible to mine the relation of attribute and object. We conducted extensive experiments to compare our method with several state-of-the-art methods on two challenging datasets. The results show that our method outperforms all other methods.
【Keywords】:
【Paper Link】 【Pages】:8819-8826
【Authors】: Navaneet K. L. ; Priyanka Mandikal ; Mayank Agarwal ; R. Venkatesh Babu
【Abstract】: Knowledge of the 3D properties of objects is a necessity for building effective computer vision systems. However, the lack of large-scale 3D datasets can be a major constraint for data-driven approaches to learning such properties. We consider the task of single-image 3D point cloud reconstruction, and aim to utilize multiple foreground masks as our supervisory data to alleviate the need for large-scale 3D datasets. A novel differentiable projection module, called ‘CAPNet’, is introduced to obtain such 2D masks from a predicted 3D point cloud. The key idea is to model the projections as a continuous approximation of the points in the point cloud. To overcome the challenges of sparse projection maps, we propose a loss formulation termed ‘affinity loss’ to generate outlier-free reconstructions. We significantly outperform the existing projection-based approaches on a large-scale synthetic dataset. We show the utility and generalizability of such a 2D-supervised approach through experiments on a real-world dataset, where the lack of 3D data can be a serious concern. To further enhance the reconstructions, we also propose a test-stage optimization procedure to obtain reconstructions that display high correspondence with the observed input image.
【Keywords】:
【Paper Link】 【Pages】:8827-8834
【Authors】: Guozhu Peng ; Shangfei Wang
【Abstract】: Current works on facial action unit (AU) recognition typically require fully AU-labeled training samples. To reduce the reliance on time-consuming manual AU annotations, we propose a novel semi-supervised AU recognition method leveraging two kinds of readily available auxiliary information. The first is the dependencies between AUs and expressions as well as the dependencies among AUs, which are caused by facial anatomy and are therefore embedded in all facial images, independent of their AU annotation status. The other auxiliary information is facial image synthesis given AUs, the dual task of AU recognition from facial images, which therefore has intrinsic probabilistic connections with AU recognition, regardless of AU annotations. Specifically, we propose a dual semi-supervised generative adversarial network for AU recognition from partially AU-labeled and fully expression-labeled facial images. The proposed network consists of an AU classifier C, an image generator G, and a discriminator D. In addition to minimizing the supervised losses of the AU classifier and the face generator for labeled training data, we explore the probabilistic duality between the tasks using adversarial learning to force the face-AU-expression tuples generated from the AU classifier and the face generator to converge to the ground-truth distribution in the labeled data, for all training data. This joint distribution also includes the inherent AU dependencies. Furthermore, we reconstruct the facial image using the output of the AU classifier as the input of the face generator, and create AU labels by feeding the output of the face generator to the AU classifier. We minimize reconstruction losses for all training data, thus exploiting the informative feedback provided by the dual tasks. Within-database and cross-database experiments on three benchmark databases demonstrate the superiority of our method in both AU recognition and face synthesis compared to state-of-the-art works.
【Keywords】:
【Paper Link】 【Pages】:8835-8842
【Authors】: Yuankai Qi ; Shengping Zhang ; Weigang Zhang ; Li Su ; Qingming Huang ; Ming-Hsuan Yang
【Abstract】: In recent years, convolutional neural networks (CNNs) have achieved great success in visual tracking. Most existing methods train or fine-tune a binary classifier to distinguish the target from its background. However, they may suffer from performance degradation due to insufficient training data. In this paper, we show that attribute information (e.g., illumination changes, occlusion and motion) in the context facilitates training an effective classifier for visual tracking. In particular, we design an attribute-based CNN with multiple branches, where each branch is responsible for classifying the target under a specific attribute. Such a design reduces the appearance diversity of the target under each attribute and thus requires less data to train the model. We combine all attribute-specific features via ensemble layers to obtain more discriminative representations for the final target/background classification. The proposed method achieves favorable performance on the OTB100 dataset compared to state-of-the-art tracking methods. After being trained on the VOT datasets, the proposed network also shows a good generalization ability on the UAV-Traffic dataset, which has significantly different attributes and target appearances from the VOT datasets.
【Keywords】:
【Paper Link】 【Pages】:8843-8850
【Authors】: Rui Qian ; Yunchao Wei ; Honghui Shi ; Jiachen Li ; Jiaying Liu ; Thomas S. Huang
【Abstract】: Semantic scene parsing suffers from the fact that pixel-level annotations are hard to collect. To tackle this issue, we propose a Point-based Distance Metric Learning (PDML) approach in this paper. PDML does not require densely annotated masks and only leverages several labeled points, which are much easier to obtain, to guide the training process. Concretely, we leverage the semantic relationship among the annotated points by encouraging the feature representations of the intra- and inter-category points to keep consistent, i.e., points within the same category should have more similar feature representations compared to those from different categories. We formulate such a characteristic into a simple distance metric loss, which collaborates with the point-wise cross-entropy loss to optimize the deep neural networks. Furthermore, to fully exploit the limited annotations, distance metric learning is conducted across different training images instead of simply adopting an image-dependent manner. We conduct extensive experiments on two challenging scene parsing benchmarks, PASCAL-Context and ADE20K, to validate the effectiveness of our PDML, and competitive mIoU scores are achieved.
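The point-based distance metric loss can be illustrated with a contrastive-style formulation over the features of the annotated points: intra-category pairs are pulled together and inter-category pairs are pushed beyond a margin. A sketch under these assumptions; the paper's exact loss and its cross-image sampling scheme are simplified away here.

```python
import torch
import torch.nn.functional as F

def point_metric_loss(feats, labels, margin=1.0):
    """Contrastive-style sketch over labeled-point features: same-category
    points should be close, different-category points at least `margin` apart."""
    d = torch.cdist(feats, feats)                           # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    diag = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    pos = d[same & ~diag]                                    # intra-category pairs
    neg = d[~same]                                           # inter-category pairs
    return pos.mean() + F.relu(margin - neg).mean()
```

This term would be combined with the point-wise cross-entropy loss during training.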
【Keywords】:
【Paper Link】 【Pages】:8851-8858
【Authors】: Zengyi Qin ; Jinglu Wang ; Yan Lu
【Abstract】: Localizing objects in real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a single RGB image due to the geometric information loss during imagery projection. We propose MonoGRNet for amodal 3D object localization from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension. MonoGRNet is a single, unified network composed of four task-specific subnetworks, responsible for 2D object detection, instance depth estimation (IDE), 3D localization and local corner regression. Unlike pixel-level depth estimation that needs per-pixel annotations, we propose a novel IDE method that directly predicts the depth of the target 3D bounding box’s center using sparse supervision. The 3D localization is further achieved by estimating the position in the horizontal and vertical dimensions. Finally, MonoGRNet is jointly learned by optimizing the locations and poses of the 3D bounding boxes in the global context. We demonstrate that MonoGRNet achieves state-of-the-art performance on challenging datasets.
【Keywords】:
【Paper Link】 【Pages】:8859-8867
【Authors】: Youngmin Ro ; Jongwon Choi ; Dae Ung Jo ; Byeongho Heo ; Jongin Lim ; Jin Young Choi
【Abstract】: In the person re-identification (ReID) task, because of the shortage of trainable data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of gradient vanishing in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture.
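The rolling-back strategy admits a compact sketch: between fine-tuning stages, the weights of the designated high-level layers are reset to their pre-trained values, so the next stage's gradients mainly refine the low-level layers. A PyTorch sketch; the layer prefixes below are hypothetical names, and the paper's exact rollback schedule is not reproduced.

```python
def rollback_high_layers(model, pretrained_state, high_layer_prefixes=("layer4", "fc")):
    """Reset high-level layers to their initial pre-trained weights so that
    subsequent fine-tuning concentrates on the low-level layers.
    (Prefixes are hypothetical; adapt to the actual architecture.)"""
    state = model.state_dict()
    for name in state:
        if any(name.startswith(p) for p in high_layer_prefixes):
            state[name] = pretrained_state[name].clone()
    model.load_state_dict(state)
```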
【Keywords】:
【Paper Link】 【Pages】:8868-8875
【Authors】: Deepak Babu Sam ; Neeraj N. Sajjan ; Himanshu Maurya ; R. Venkatesh Babu
【Abstract】: We present an unsupervised learning method for dense crowd count estimation. Marred by large variability in the appearance of people and extreme overlap in crowds, enumerating people proves to be a difficult task even for humans. This implies that creating large-scale annotated crowd data is expensive, which directly takes a toll on the performance of existing CNN-based counting models on account of small datasets. Motivated by these challenges, we develop a Grid Winner-Take-All (GWTA) autoencoder to learn several layers of useful filters from unlabeled crowd images. Our GWTA approach divides a convolution layer spatially into a grid of cells. Within each cell, only the maximally activated neuron is allowed to update the filter. Almost 99.9% of the parameters of the proposed model are trained without any labeled data while the remaining 0.1% are tuned with supervision. The model achieves superior results compared to other unsupervised methods and stays reasonably close to the accuracy of the supervised baseline. Furthermore, we present comparisons and analyses regarding the quality of learned features across various models.
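Grid Winner-Take-All has a particularly direct implementation: max-pool each feature map over a grid of cells, then unpool the winners back to their original locations, zeroing everything else so that only the maximally activated neuron per cell passes activations and gradients. A PyTorch sketch; the cell size is an illustrative choice.

```python
import torch.nn.functional as F

def grid_winner_take_all(x, cell=8):
    """Keep only the maximum activation inside each cell x cell spatial block
    of a (N, C, H, W) feature map; all other activations become zero."""
    pooled, idx = F.max_pool2d(x, cell, return_indices=True)
    return F.max_unpool2d(pooled, idx, cell, output_size=x.shape[-2:])
```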
【Keywords】:
【Paper Link】 【Pages】:8876-8884
【Authors】: Sanket Shah ; Anand Mishra ; Naganand Yadati ; Partha Pratim Talukdar
【Abstract】: Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap in this paper, and introduce KVQA – the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KG) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KG. Further, we also provide baseline performances using state-of-the-art methods on KVQA.
【Keywords】:
【Paper Link】 【Pages】:8885-8892
【Authors】: Lingyun Song ; Jun Liu ; Buyue Qian ; Yihe Chen
【Abstract】: Image captioning and visual language grounding are two important tasks for image understanding, but are seldom considered together. In this paper, we propose a Progressive Attention-Guided Network (PAGNet), which simultaneously generates image captions and predicts bounding boxes for caption words. PAGNet mainly has two distinctive properties: i) It can progressively refine the predictive results of image captioning, by updating the attention map with the predicted bounding boxes. ii) It learns bounding boxes of the words using a weakly supervised strategy, which combines the frameworks of Multiple Instance Learning (MIL) and Markov Decision Process (MDP). By using the attention map generated in the captioning process, PAGNet significantly reduces the search space of the MDP. We conduct experiments on benchmark datasets to demonstrate the effectiveness of PAGNet and results show that PAGNet achieves the best performance.
【Keywords】:
【Paper Link】 【Pages】:8893-8900
【Authors】: Ying Tai ; Yicong Liang ; Xiaoming Liu ; Lei Duan ; Jilin Li ; Chengjie Wang ; Feiyue Huang ; Yu Chen
【Abstract】: In recent years, heatmap regression based models have shown their effectiveness in face alignment and pose estimation. However, Conventional Heatmap Regression (CHR) is neither accurate nor stable when dealing with high-resolution facial videos, since it finds the maximally activated location in heatmaps which are generated from rounded coordinates, and thus leads to quantization errors when scaling back to the original high-resolution space. In this paper, we propose Fractional Heatmap Regression (FHR) for high-resolution video-based face alignment. The proposed FHR can accurately estimate the fractional part according to the 2D Gaussian function by sampling three points in heatmaps. To further stabilize the landmarks among continuous video frames while maintaining precision, we propose a novel stabilization loss that contains two terms to address the time delay and non-smoothness issues, respectively. Experiments on the 300W, 300VW and Talking Face datasets clearly demonstrate that the proposed method is more accurate and stable than the state-of-the-art models.
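The key observation behind fractional heatmap regression is that, for a Gaussian heatmap, the log-ratio of two neighboring samples is linear in the sub-pixel offset of the peak, so the fractional part can be recovered in closed form. A one-axis NumPy sketch of this idea, assuming the heatmap values follow exp(-(x - mu)^2 / (2 sigma^2)); FHR itself samples three points to recover both coordinates.

```python
import numpy as np

def fractional_peak_1d(h, sigma):
    """Sub-pixel peak of a sampled 1D Gaussian slice: for neighbors x0 (max)
    and x1 = x0 +/- 1, log(h[x1]/h[x0]) is linear in the peak offset."""
    x0 = int(np.argmax(h))
    x1 = x0 + 1 if x0 + 1 < len(h) else x0 - 1
    ratio = np.log(h[x1] / h[x0])
    sign = 1.0 if x1 > x0 else -1.0
    return x0 + sign * (sigma ** 2 * ratio + 0.5)   # closed-form fractional peak
```

For example, if the true peak sits at x0 + 0.4, the log-ratio of the two samples yields exactly that offset back, whereas rounding to x0 would incur a 0.4-pixel quantization error.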
【Keywords】:
【Paper Link】 【Pages】:8901-8908
【Authors】: Mehmet Ozgur Turkoglu ; William Thong ; Luuk J. Spreeuwers ; Berkay Kicanaoglu
【Abstract】: The visual world we sense, interpret and interact with every day is a complex composition of interleaved physical entities. Therefore, it is a very challenging task to generate vivid scenes of similar complexity using computers. In this work, we present a scene generation framework based on Generative Adversarial Networks (GANs) to sequentially compose a scene, breaking down the underlying problem into smaller ones. Different from existing approaches, our framework offers explicit control over the elements of a scene through separate background and foreground generators. Starting with an initially generated background, foreground objects then populate the scene one-by-one in a sequential manner. Via quantitative and qualitative experiments on a subset of the MS-COCO dataset, we show that our proposed framework produces not only more diverse images but also copes better with affine transformations and occlusion artifacts of foreground objects than its counterparts.
【Keywords】:
【Paper Link】 【Pages】:8909-8916
【Authors】: Bairui Wang ; Lin Ma ; Wei Zhang ; Wenhao Jiang ; Feng Zhang
【Abstract】: In this paper, we propose a novel model with a hierarchical photo-scene encoder and a reconstructor for the task of album storytelling. The photo-scene encoder contains two sub-encoders, namely the photo and scene encoders, which are stacked together and behave hierarchically to fully exploit the structure information of the photos within an album. Specifically, the photo encoder generates a semantic representation for each photo while exploiting temporal relationships among them. The scene encoder, relying on the obtained photo representations, is responsible for detecting the scene changes and generating scene representations. Subsequently, the decoder dynamically and attentively summarizes the encoded photo and scene representations to generate a sequence of album representations, based on which a story consisting of multiple coherent sentences is generated. In order to fully extract the useful semantic information from an album, a reconstructor is employed to reproduce the summarized album representations based on the hidden states of the decoder. The proposed model can be trained in an end-to-end manner, which results in improved performance over the state of the art on the public visual storytelling (VIST) dataset. Ablation studies further demonstrate the effectiveness of the proposed hierarchical photo-scene encoder and reconstructor.
【Keywords】:
【Paper Link】 【Pages】:8917-8924
【Authors】: Chong Wang ; Zheng-Jun Zha ; Dong Liu ; Hongtao Xie
【Abstract】: High-level semantic knowledge, in addition to low-level visual cues, is crucial for co-saliency detection. This paper proposes a novel end-to-end deep learning approach for robust co-saliency detection by simultaneously learning a high-level group-wise semantic representation as well as deep visual features of a given image group. The inter-image interaction at the semantic level as well as the complementarity between group semantics and visual features are exploited to boost the inference of co-salient regions. Specifically, the proposed approach consists of a co-category learning branch and a co-saliency detection branch. While the former is proposed to learn a group-wise semantic vector using the co-category association of an image group as supervision, the latter infers precise co-salient maps based on the ensemble of group semantic knowledge and deep visual cues. The group semantic vector is broadcast to each spatial location of the multi-scale visual feature maps and is used as top-down semantic guidance for boosting the bottom-up inference of co-saliency. The co-category learning and co-saliency detection branches are jointly optimized in a multi-task learning manner, further improving the robustness of the approach. Moreover, we construct a new large-scale co-saliency dataset, COCO-SEG, to facilitate research on co-saliency detection. Extensive experimental results on COCO-SEG and the widely used benchmark Cosal2015 demonstrate the superiority of the proposed approach over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:8925-8932
【Authors】: Chunyu Wang ; Haibo Qiu ; Alan L. Yuille ; Wenjun Zeng
【Abstract】: Estimating 3D human poses from 2D joint positions is an ill-posed problem, further complicated by the fact that the estimated 2D joints usually have errors to which most 3D pose estimators are sensitive. In this work, we present an approach to refine inaccurate 3D pose estimations. The core idea of the approach is to learn a number of bases to obtain tight approximations of the low-dimensional pose manifold, where a 3D pose is represented by a convex combination of the bases. The representation requires that, globally, the refined poses are close to the pose manifold, thus avoiding the generation of illegitimate poses. Second, the designed bases also guarantee that the distances among the body joints of a pose are within reasonable ranges. Experiments on benchmark datasets show that our approach obtains more legitimate poses than the baselines. In particular, the limb lengths are closer to the ground truth.
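The refinement can be viewed as projecting a noisy pose onto the set of convex combinations of learned bases: solve for non-negative weights that sum to one and best reconstruct the pose. A NumPy sketch under these assumptions; the step size, iteration count, and the clip-and-renormalize simplex step are crude illustrative choices, and the bases themselves are learned in the paper.

```python
import numpy as np

def refine_pose(noisy_pose, bases, steps=200, lr=1e-3):
    """Approximate projection of a pose onto convex combinations of bases
    via projected gradient descent on the reconstruction error."""
    B = bases.reshape(len(bases), -1)       # (num_bases, 3 * num_joints)
    p = noisy_pose.reshape(-1)
    w = np.full(len(B), 1.0 / len(B))       # start at the simplex center
    for _ in range(steps):
        grad = B @ (B.T @ w - p)            # gradient of 0.5 * ||B^T w - p||^2
        w = np.clip(w - lr * grad, 0.0, None)
        w = w / max(w.sum(), 1e-8)          # approximate simplex projection
    return (B.T @ w).reshape(noisy_pose.shape)
```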
【Keywords】:
【Paper Link】 【Pages】:8933-8940
【Authors】: Guangcong Wang ; Jianhuang Lai ; Peigen Huang ; Xiaohua Xie
【Abstract】: Most current person re-identification (ReID) methods neglect the spatial-temporal constraint. Given a query image, conventional methods compute the feature distances between the query image and all the gallery images and return a similarity-ranked table. When the gallery database is very large in practice, these approaches fail to obtain good performance due to appearance ambiguity across different camera views. In this paper, we propose a novel two-stream spatial-temporal person ReID (st-ReID) framework that mines both visual semantic information and spatial-temporal information. To this end, a joint similarity metric with Logistic Smoothing (LS) is introduced to integrate the two kinds of heterogeneous information into a unified framework. To approximate a complex spatial-temporal probability distribution, we develop a fast Histogram-Parzen (HP) method. With the help of the spatial-temporal constraint, the st-ReID model eliminates a large number of irrelevant images and thus narrows the gallery database. Without bells and whistles, our st-ReID method achieves rank-1 accuracy of 98.1% on Market-1501 and 94.4% on DukeMTMC-reID, improving on the baselines of 91.2% and 83.8%, respectively, and outperforming all previous state-of-the-art methods by a large margin.
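The joint similarity metric with Logistic Smoothing can be sketched as the product of two squashed scores, one for appearance and one for the spatial-temporal transition probability; the smoothing keeps a rare camera transition from completely vetoing a strong visual match. The constants below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def logistic_smooth(x, lam, gamma):
    """Map a raw score into (0, 1) while damping extreme values."""
    return 1.0 / (1.0 + lam * np.exp(-gamma * x))

def joint_similarity(visual_score, st_prob, lam=(1.0, 2.0), gamma=(5.0, 5.0)):
    """Fuse appearance similarity with the spatial-temporal transition
    probability (e.g., a histogram-Parzen estimate)."""
    return (logistic_smooth(visual_score, lam[0], gamma[0])
            * logistic_smooth(st_prob, lam[1], gamma[1]))
```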
【Keywords】:
【Paper Link】 【Pages】:8941-8948
【Authors】: Hanqing Wang ; Jiaolong Yang ; Wei Liang ; Xin Tong
【Abstract】: 3D object reconstruction is a fundamental task in many robotics and AI problems. With the aid of deep convolutional neural networks (CNNs), 3D object reconstruction has witnessed significant progress in recent years. However, possibly due to the prohibitively high dimension of the 3D object space, the results from deep CNNs are often prone to missing some shape details. In this paper, we present an approach which aims to preserve more shape details and improve the reconstruction quality. The key idea of our method is to leverage object mask and pose estimation from CNNs to assist the 3D shape learning by constructing a probabilistic single-view visual hull inside the network. Our method works by first predicting a coarse shape as well as the object pose and silhouette using CNNs, followed by a novel 3D refinement CNN which refines the coarse shapes using the constructed probabilistic visual hulls. Experiments on both synthetic data and real images show that embedding a single-view visual hull for shape refinement can significantly improve the reconstruction quality by recovering more shape details and improving shape consistency with the input image.
【Keywords】:
【Paper Link】 【Pages】:8949-8956
【Authors】: Jinglu Wang ; Bo Sun ; Yan Lu
【Abstract】: In this paper, we address the problem of reconstructing an object’s surface from a single image using generative networks. First, we represent a 3D surface with an aggregation of dense point clouds from multiple views. Each point cloud is embedded in a regular 2D grid aligned on the image plane of a viewpoint, making the point cloud convolution-friendly and ordered so as to fit into deep network architectures. The point clouds can be easily triangulated by exploiting the connectivities of the 2D grids to form mesh-based surfaces. Second, we propose an encoder-decoder network that generates such multiple view-dependent point clouds from a single image by regressing their 3D coordinates and visibilities. We also introduce a novel geometric loss that is able to interpret discrepancy over 3D surfaces, as opposed to 2D projective planes, resorting to the surface discretization on the constructed meshes. We demonstrate that the multi-view point regression network outperforms state-of-the-art methods with a significant improvement on challenging datasets.
【Keywords】:
【Paper Link】 【Pages】:8957-8964
【Authors】: Weixuan Wang ; Zhihong Chen ; Haifeng Hu
【Abstract】: Recently, attention mechanisms have been successfully applied in image captioning, but the existing attention methods operate only on low-level spatial features or high-level text features, which limits the richness of captions. In this paper, we propose a Hierarchical Attention Network (HAN) that enables attention to be calculated on a pyramidal hierarchy of features synchronously. The pyramidal hierarchy consists of features on diverse semantic levels, which allows predicting different words according to different features. On the other hand, due to the different modalities of the features, a Multivariate Residual Module (MRM) is proposed to learn joint representations from the features. The MRM is able to model projections and extract relevant relations among different features. Furthermore, we introduce a context gate to balance the contributions of different features. Compared with existing methods, our approach applies hierarchical features and exploits several multimodal integration strategies, which can significantly improve the performance. HAN is verified on the benchmark MSCOCO dataset, and the experimental results indicate that our model outperforms the state-of-the-art methods, achieving a BLEU-1 score of 80.9 and a CIDEr score of 121.7 on Karpathy’s test split.
【Keywords】:
【Paper Link】 【Pages】:8965-8972
【Authors】: Xin Wang ; Jiawei Wu ; Da Zhang ; Yu Su ; William Yang Wang
【Abstract】: Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus, and do not generalize to open-vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, e.g., word selection, semantic construction, and style expression, which poses a great challenge to depicting novel activities without paired training data. Meanwhile, similar activities share some of those aspects in common. Therefore, we propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning, which learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned from seen activities to unseen ones. Besides, we leverage an external topic-related text corpus to construct the topic embedding for each activity, which embodies the most relevant semantic vectors within the topic. Empirical results not only validate the effectiveness of our method in utilizing semantic knowledge for video captioning, but also show its strong generalization ability when describing novel activities.
【Keywords】:
【Paper Link】 【Pages】:8973-8980
【Authors】: Xingxing Wei ; Jun Zhu ; Sha Yuan ; Hang Su
【Abstract】: Although adversarial samples of deep neural networks (DNNs) have been intensively studied on static images, their extensions to videos have never been explored. Compared with images, attacking a video needs to consider not only spatial cues but also temporal cues. Moreover, to improve imperceptibility as well as reduce the computation cost, perturbations should be added on as few frames as possible, i.e., the adversarial perturbations should be temporally sparse. This further motivates the propagation of perturbations, meaning that perturbations added to the current frame can transfer to subsequent frames via their temporal interactions, so that no (or few) extra perturbations are needed on those frames to misclassify them. To this end, we propose the first white-box video attack method, which utilizes an l2,1-norm based optimization algorithm to compute the sparse adversarial perturbations for videos. We choose action recognition as the targeted task, and networks with a CNN+RNN architecture as threat models to verify our method. Thanks to the propagation, we can compute perturbations on a shortened version of the video, and then adapt them to the long version of the video to fool DNNs. Experimental results on the UCF101 dataset demonstrate that even when only one frame in a video is perturbed, the fooling rate can still reach 59.7%.
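The l2,1 norm that induces temporal sparsity is simple to state: take the l2 norm of each frame's perturbation, then the l1 norm (sum) across frames, so whole frames' perturbations are driven to exactly zero. A PyTorch sketch of this regularizer; how it is combined with the attack's misclassification objective is not reproduced here.

```python
import torch

def l21_norm(perturbation):
    """l2,1 norm of a video perturbation of shape (frames, C, H, W):
    l2 per frame, then summed (l1) across frames -> frame-level sparsity."""
    per_frame = perturbation.flatten(1).norm(p=2, dim=1)
    return per_frame.sum()
```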
【Keywords】:
【Paper Link】 【Pages】:8981-8988
【Authors】: Longyin Wen ; Dawei Du ; Shengkun Li ; Xiao Bian ; Siwei Lyu
【Abstract】: The majority of Multi-Object Tracking (MOT) algorithms based on the tracking-by-detection scheme do not use higher order dependencies among objects or tracklets, which makes them less effective in handling complex scenarios. In this work, we present a new near-online MOT algorithm based on non-uniform hypergraph, which can model different degrees of dependencies among tracklets in a unified objective. The nodes in the hypergraph correspond to the tracklets and the hyperedges with different degrees encode various kinds of dependencies among them. Specifically, instead of setting the weights of hyperedges with different degrees empirically, they are learned automatically using the structural support vector machine algorithm (SSVM). Several experiments are carried out on various challenging datasets (i.e., PETS09, ParkingLot sequence, SubwayFace, and MOT16 benchmark), to demonstrate that our method achieves favorable performance against the state-of-the-art MOT methods.
【Keywords】:
【Paper Link】 【Pages】:8989-8996
【Authors】: Yu-Hui Wen ; Lin Gao ; Hongbo Fu ; Fang-Lue Zhang ; Shihong Xia
【Abstract】: The hierarchical structure and different semantic roles of joints in the human skeleton convey important information for action recognition. Conventional graph convolution methods for modeling skeleton structure consider only physically connected neighbors of each joint and joints of the same type, thus failing to capture high-order information. In this work, we propose a novel model with motif-based graph convolution to encode hierarchical spatial structure, and a variable temporal dense block to exploit local temporal information over different ranges of human skeleton sequences. Moreover, we employ a non-local block to capture global dependencies in the temporal domain via an attention mechanism. Our model achieves improvements over the state-of-the-art methods on two large-scale datasets.
【Keywords】:
【Paper Link】 【Pages】:8997-9004
【Authors】: Chenfei Wu ; Jinlai Liu ; Xiaojie Wang ; Ruifan Li
【Abstract】: The task of Visual Question Answering (VQA) has emerged in recent years for its potential applications. To address the VQA task, a model should fuse feature elements from both images and questions efficiently. Existing models fuse an image feature element v_i and a question feature element q_i directly, such as via an element product v_i q_i. Such solutions largely ignore the following two key points: 1) whether v_i and q_i are in the same space, and 2) how to reduce the observation noise in v_i and q_i. We argue that differences between pairs of feature elements, like (v_i − v_j) and (q_i − q_j), are more probably in the same space, and that the difference operation is beneficial for reducing observation noise. To achieve this, we first propose Differential Networks (DN), a novel plug-and-play module which enables differences between pair-wise feature elements. With the tool of DN, we then propose DN based Fusion (DF), a novel model for the VQA task. We achieve state-of-the-art results on four publicly available datasets. Ablation studies also show the effectiveness of the difference operations in the DF model.
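The difference operation at the heart of this module can be sketched as forming all pairwise element differences (x_i − x_j) within a feature vector before fusion. How the resulting difference map is summarized is simplified here into a single mean-plus-projection, which is an assumption rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DifferenceModule(nn.Module):
    """Form pairwise element differences x_i - x_j, then summarize them with
    a learned projection (a sketch; layer sizes are illustrative)."""
    def __init__(self, dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, x):                            # x: (batch, dim)
        diff = x.unsqueeze(2) - x.unsqueeze(1)       # (batch, dim, dim): x_i - x_j
        return self.proj(diff.mean(dim=1))           # summarize the difference map
```

In a DF-style fusion, one such module would transform the image features and another the question features before the two outputs are fused.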
【Keywords】:
【Paper Link】 【Pages】:9005-9012
【Authors】: Xiang Wu ; Huaibo Huang ; Vishal M. Patel ; Ran He ; Zhenan Sun
【Abstract】: Visible (VIS) to near-infrared (NIR) face matching is a challenging problem due to the significant domain discrepancy between the two domains and a lack of sufficient data for training cross-modal matching algorithms. Existing approaches attempt to tackle this problem by either synthesizing visible faces from NIR faces, extracting domain-invariant features from these modalities, or projecting heterogeneous data onto a common latent space for cross-modal matching. In this paper, we take a different approach in which we make use of the Disentangled Variational Representation (DVR) for cross-modal matching. First, we model a face representation with intrinsic identity information and its within-person variations. By exploring the disentangled latent variable space, a variational lower bound is employed to optimize the approximate posterior for NIR and VIS representations. Second, aiming at obtaining a more compact and discriminative disentangled latent space, we impose a minimization of the identity information for the same subject and a relaxed correlation alignment constraint between the NIR and VIS modality variations. An alternating optimization scheme is proposed for the disentangled variational representation part and the heterogeneous face recognition network part. The mutual promotion between these two parts effectively reduces the NIR and VIS domain discrepancy and alleviates over-fitting. Extensive experiments on three challenging NIR-VIS heterogeneous face recognition databases demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:9013-9020
【Authors】: Xuanlu Xiang ; Zhipeng Wang ; Zhicheng Zhao ; Fei Su
【Abstract】: In this paper, aiming at two key problems of instance-level image retrieval, i.e., the distinctiveness of image representation and the generalization ability of the model, we propose a novel deep architecture, the Multiple Saliency and Channel Sensitivity Network (MSCNet). Specifically, to obtain distinctive global descriptors, attention-based multiple saliency learning is first presented to highlight important details of the image, and then a simple but effective channel sensitivity module based on the Gram matrix is designed to boost channel discrimination and suppress redundant information. Additionally, in contrast to most existing feature aggregation methods that employ pre-trained deep networks, MSCNet can be trained in two modes: the first is unsupervised with an instance loss, and the other is supervised, combining classification and ranking losses while relying only on very limited training data. Experimental results on several public benchmark datasets, i.e., Oxford Buildings, Paris Buildings and Holidays, indicate that the proposed MSCNet outperforms the state-of-the-art unsupervised and supervised methods.
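A toy sketch of a Gram-matrix channel statistic in the spirit of the channel sensitivity module; how MSCNet maps the Gram matrix to channel weights is not specified in the abstract, so the mean-pool-plus-sigmoid reweighting below is an assumption:

```python
import numpy as np

def channel_sensitivity(feature_map):
    """Compute channel co-activations via a Gram matrix, then reweight
    channels (assumed pooling/sigmoid; the module's exact form may differ)."""
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    gram = flat @ flat.T / (h * w)                       # (C, C) channel Gram matrix
    weights = 1.0 / (1.0 + np.exp(-gram.mean(axis=1)))   # assumed channel gating
    return feature_map * weights[:, None, None]

x = np.random.randn(16, 8, 8)        # toy C x H x W feature map
print(channel_sensitivity(x).shape)  # (16, 8, 8)
```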
【Keywords】:
【Paper Link】 【Pages】:9021-9029
【Authors】: Xinyu Xiao ; Lingfeng Wang ; Shiming Xiang ; Chunhong Pan
【Abstract】: Image captioning aims to describe an image in natural language as a human would; it has benefited from advances in deep neural networks and achieved substantial progress in performance. However, the perspective of human scene description has not been fully considered in this task. Human descriptions of a scene are tightly related to endogenous knowledge and exogenous salient objects simultaneously, which implies that the content of a description is confined to the known salient objects. Inspired by this observation, this paper proposes a novel framework that explicitly applies the known salient objects to image captioning. Under this framework, the known salient objects serve as themes to guide description generation. According to the properties of a known salient object, a theme is composed of two components: its endogenous concept (what) and an exogenous spatial attention feature (where). Specifically, the prediction of each word is dominated by the concept and spatial attention feature of the corresponding theme during caption prediction. Moreover, we introduce a novel learning method, Distinctive Learning (DL), to make generated captions more specific, as human descriptions are. It formulates two constraints in the theme learning process to encourage distinctiveness between different images. Furthermore, reinforcement learning is introduced into the framework to address the exposure bias problem between the training and testing modes. Extensive experiments on the COCO and Flickr30K datasets achieve superior results compared with the state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:9030-9037
【Authors】: De Xie ; Cheng Deng ; Hao Wang ; Chao Li ; Dapeng Tao
【Abstract】: Two-stream architectures have shown strong performance on the video classification task. The key idea is to learn spatiotemporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream based deep framework for video classification that discovers spatial and temporal information from RGB frames only; moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA enables the network to capture global and local features and generate a comprehensive representation of the video, and the SAL makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our proposed method achieves state-of-the-art results on standard video datasets.
【Keywords】:
【Paper Link】 【Pages】:9038-9045
【Authors】: Enze Xie ; Yuhang Zang ; Shuai Shao ; Gang Yu ; Cong Yao ; Guangyao Li
【Abstract】: Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable amount of false positives when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and a shared FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on
【Keywords】:
【Paper Link】 【Pages】:9046-9053
【Authors】: Lele Xie ; Yuliang Liu ; Lianwen Jin ; Zecheng Xie
【Abstract】: Most current detection methods adopt anchor boxes as regression references. However, detection performance is sensitive to the setting of the anchor boxes, and a proper setting may vary significantly across datasets, which severely limits the universality of the detectors. To improve the adaptivity of detectors, in this paper we present a novel dimension-decomposition region proposal network (DeRPN) that can seamlessly replace the traditional Region Proposal Network (RPN). DeRPN utilizes an anchor string mechanism to independently match object widths and heights, which is conducive to handling objects of varying shape. In addition, a novel scale-sensitive loss is designed to address the imbalanced loss computation across objects of different scales, preventing small objects from being overwhelmed by larger ones. Comprehensive experiments conducted on both general object detection datasets (Pascal VOC 2007, 2012 and MS COCO) and scene text detection datasets (ICDAR 2013 and COCO-Text) show that our DeRPN significantly outperforms RPN. It is worth mentioning that the proposed DeRPN can be employed directly on different models, tasks, and datasets without any modification of hyperparameters or specialized optimization, which further demonstrates its adaptivity. The code has been released at https://github.com/HCIILAB/DeRPN.
【Keywords】:
【Paper Link】 【Pages】:9054-9061
【Authors】: Jingwei Xin ; Nannan Wang ; Xinbo Gao ; Jie Li
【Abstract】: Facial-prior-knowledge based methods have recently achieved great success on the task of face image super-resolution (SR). Combining different types of facial knowledge, e.g., facial attribute information with texture and shape information, can be leveraged to better super-resolve face images. In this paper, we present a novel deep end-to-end network for face super-resolution, named the Residual Attribute Attention Network (RAAN), which realizes efficient feature fusion of various types of facial information. Specifically, we construct a multi-block cascaded network with dense connections. Each block has three branches: a Texture Prediction Network (TPN), a Shape Generation Network (SGN) and an Attribute Analysis Network (AAN). We divide the task of face image reconstruction into three steps: extracting pixel-level representation information from the input very low resolution (LR) image via the TPN and SGN, extracting semantic-level representation information from the input via the AAN, and finally combining the pixel-level and semantic-level information to recover the high resolution (HR) image. Experiments on a benchmark database illustrate that RAAN significantly outperforms the state of the art on the very low-resolution face SR problem, both quantitatively and qualitatively.
【Keywords】:
【Paper Link】 【Pages】:9062-9069
【Authors】: Huijuan Xu ; Kun He ; Bryan A. Plummer ; Leonid Sigal ; Stan Sclaroff ; Kate Saenko
【Abstract】: We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work. First, we inject text features early on when generating clip proposals, to help eliminate unlikely clips and thus speed up processing and boost performance. Second, to learn a fine-grained similarity metric for retrieval, we use visual features to modulate the processing of query sentences at the word level in a recurrent neural network. A multi-task loss is also employed by adding query re-generation as an auxiliary task. Our approach significantly outperforms prior work on two challenging benchmarks: Charades-STA and ActivityNet Captions.
【Keywords】:
【Paper Link】 【Pages】:9070-9078
【Authors】: Yunlu Xu ; Chengwei Zhang ; Zhanzhan Cheng ; Jianwen Xie ; Yi Niu ; Shiliang Pu ; Fei Wu
【Abstract】: This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with supervision of video-level labels only and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels with an attention mechanism that learns class-variable attention weights, which helps relieve noise from the background or other actions. Second, we build temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. To generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM), fused with the attention weights. Experiments on the THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms the state-of-the-art weakly-supervised method, and performs on par with fully-supervised counterparts.
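A minimal sketch of attention-based clip assembly with class-variable weights; the softmax-attention form and all shapes are illustrative assumptions, not the exact STAR design:

```python
import numpy as np

def assemble_clips(clip_feats, class_queries):
    """Weight T clip features per class with softmax attention and pool
    them into one assembled feature per class, suppressing background."""
    scores = clip_feats @ class_queries.T                 # (T, K) clip-class scores
    scores -= scores.max(axis=0, keepdims=True)           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    return weights.T @ clip_feats                         # (K, D) assembled features

clips = np.random.randn(30, 64)      # 30 clips, 64-dim features
queries = np.random.randn(5, 64)     # 5 action classes
print(assemble_clips(clips, queries).shape)  # (5, 64)
```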
【Keywords】:
【Paper Link】 【Pages】:9079-9086
【Authors】: Shipeng Yan ; Songyang Zhang ; Xuming He
【Abstract】: Despite the recent success of deep neural networks, it remains challenging to efficiently learn new visual concepts from limited training data. To address this problem, a prevailing strategy is to build a meta-learner that learns prior knowledge about learning from a small set of annotated data. However, most existing meta-learning approaches rely on a global representation of images and a meta-learner with a complex model structure, which are sensitive to background clutter and difficult to interpret. We propose a novel meta-learning method for few-shot classification based on two simple attention mechanisms: a spatial attention that localizes relevant object regions, and a task attention that selects similar training data for label prediction. We implement our method via a dual-attention network and design a semantic-aware meta-learning loss to train the meta-learner network in an end-to-end manner. We validate our model on three few-shot image classification datasets with an extensive ablation study, and our approach shows competitive performance on these datasets with fewer parameters. To facilitate future research, code and data splits are available at https://github.com/tonysy/STANet-PyTorch
【Keywords】:
【Paper Link】 【Pages】:9087-9094
【Authors】: Fan Yang ; Ryota Hinami ; Yusuke Matsui ; Steven Ly ; Shin'ichi Satoh
【Abstract】: Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and has attracted much attention in recent years. A downside of diffusion is that it is slow in comparison to naive k-NN search, which incurs a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion technique in this paper. In our work, instead of applying diffusion to the query, we precompute the diffusion results of each element in the database, making the online search a simple linear combination on top of the k-NN search process. Our proposed method becomes roughly 10 times faster in terms of online search speed. Moreover, we propose to use late truncation instead of the early truncation of previous works to achieve better retrieval performance.
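A sketch of the offline/online decoupling described above, assuming precomputed diffusion rows and a similarity-weighted linear combination at query time; the random placeholder matrix stands in for the real precomputed diffusion results:

```python
import numpy as np

n_db, dim, k = 1000, 64, 10
db = np.random.randn(n_db, dim)
db /= np.linalg.norm(db, axis=1, keepdims=True)
# Offline stage: the diffusion scores of every database element against all
# others would be precomputed here (random placeholder for illustration).
offline_diffusion = np.random.rand(n_db, n_db)

def search(query):
    """Online stage: plain k-NN search plus a linear combination of the
    precomputed diffusion rows of the k nearest neighbors."""
    query = query / np.linalg.norm(query)
    sims = db @ query
    topk = np.argsort(-sims)[:k]                  # naive k-NN search
    scores = sims[topk] @ offline_diffusion[topk] # weighted combination of rows
    return np.argsort(-scores)[:k]

print(search(np.random.randn(dim)))
```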
【Keywords】:
【Paper Link】 【Pages】:9095-9102
【Authors】: Lingxiao Yang ; David Zhang ; Lei Zhang
【Abstract】: The recent success of deep networks in learning visual trackers relies largely on human-labeled data, which are expensive to annotate. Recently, some unsupervised methods have been proposed to explore the learning of visual trackers without labeled data, but their performance lags far behind that of supervised methods. We identify the main bottleneck of these methods as inconsistent objectives between the off-line training and online tracking stages. To address this problem, we propose a novel unsupervised learning pipeline based on the discriminative correlation filter network. Our method iteratively updates the tracker by alternating between target localization and network optimization. In particular, we propose to learn the network from a single movie, which can be obtained far more easily than thousands of video clips or millions of images. Extensive experiments demonstrate that our approach is insensitive to the choice of movie, and the trained visual tracker achieves leading performance among existing unsupervised learning approaches. Even compared with the same network trained with human-labeled bounding boxes, our tracker achieves similar results on many tracking benchmarks. Code is available at: https://github.com/ZjjConan/UL-Tracker-AAAI2019.
【Keywords】:
【Paper Link】 【Pages】:9103-9110
【Authors】: Jiangchao Yao ; Hao Wu ; Ya Zhang ; Ivor W. Tsang ; Jun Sun
【Abstract】: Learning with noisy labels is imperative in the Big Data era, since it reduces expensive labor spent on accurate annotation. A previous approach, learning with a noise transition, enjoys theoretical guarantees when applied to scenarios with class-conditional noise. However, it critically depends on an accurate pre-estimated noise transition, which is usually impractical. A subsequent improvement adapts the pre-estimation in the form of a Softmax layer along with the training progress. However, the parameters in the Softmax layer are highly tweaked for fragile performance and easily get stuck in undesired local minima. To overcome this issue, we propose a Latent Class-Conditional Noise model (LCCN) that models the noise transition in a Bayesian form. By projecting the noise transition into a Dirichlet-distributed space, learning is constrained to a simplex instead of some ad-hoc parametric space. Furthermore, we deduce a dynamic label regression method for LCCN to iteratively infer the latent true labels and jointly train the classifier and model the noise. Our approach theoretically safeguards a bounded update of the noise transition, which avoids arbitrary tuning via a batch of samples. Extensive experiments have been conducted on controllable noisy data based on the CIFAR-10 and CIFAR-100 datasets, and on the agnostic noisy data of the Clothing1M and WebVision17 datasets. The experimental results demonstrate that the proposed model outperforms several state-of-the-art methods.
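A small illustration of constraining the noise transition to the probability simplex via Dirichlet draws, as the Bayesian formulation implies; the symmetric concentration value and the toy shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10
alpha = np.ones(num_classes)  # assumed symmetric concentration

# Each row of the transition matrix T (true class -> observed class) is a
# Dirichlet draw, so learning stays on the simplex rather than wandering
# through an ad-hoc parametric space.
T = np.stack([rng.dirichlet(alpha) for _ in range(num_classes)])

clean_posterior = rng.dirichlet(np.ones(num_classes))  # classifier output p(y|x)
noisy_label_dist = clean_posterior @ T                 # implied p(observed label|x)
print(T.sum(axis=1))           # every row sums to 1
print(noisy_label_dist.sum())  # ~1.0
```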
【Keywords】:
【Paper Link】 【Pages】:9111-9118
【Authors】: Weidong Yin ; Ziwei Liu ; Chen Change Loy
【Abstract】: We address the problem of instance-level facial attribute transfer without paired training data, e.g., faithfully transferring the exact mustache from a source face to a target face. This is a more challenging task than conventional semantic-level attribute transfer, which only preserves the generic attribute style rather than instance-level traits. We propose the use of geometry-aware flow, which serves as a well-suited representation for modeling the transformation between instance-level facial attributes. Specifically, we leverage facial landmarks as geometric guidance to learn differentiable flows automatically, despite the large pose gap. Geometry-aware flow is able to warp the source face attribute into the target face context and generate a warp-and-blend result. To compensate for the potential appearance gap between the source and target faces, we propose a hallucination sub-network that produces an appearance residual to further refine the warp-and-blend result. Finally, a cycle-consistency framework consisting of both an attribute transfer module and an attribute removal module is designed, so that abundant unpaired face images can be used as training data. Extensive evaluations validate the capability of our approach in transferring instance-level facial attributes faithfully across large pose and appearance gaps. Thanks to the flow representation, our approach can readily be applied to generate realistic details on high-resolution images.
【Keywords】:
【Paper Link】 【Pages】:9119-9126
【Authors】: Haoxuan You ; Yifan Feng ; Xibin Zhao ; Changqing Zou ; Rongrong Ji ; Yue Gao
【Abstract】: Three-dimensional (3D) shape recognition has drawn much research attention in the field of computer vision. Advances in deep learning have encouraged various deep models for 3D feature representation. For point cloud and multi-view data, two popular 3D data modalities, different models have been proposed with remarkable performance. However, the relation between point clouds and views has rarely been investigated. In this paper, we introduce the Point-View Relation Network (PVRNet), an effective network designed to fuse the view features and the point cloud feature through a proposed relation score module. More specifically, based on the relation score module, a point-single-view fusion feature is first extracted by fusing the point cloud feature with each single view feature using the point-single-view relation; then a point-multi-view fusion feature is extracted by fusing the point cloud feature with the features of different numbers of views using the point-multi-view relation. Finally, the point-single-view and point-multi-view fusion features are combined to achieve a unified representation of a 3D shape. Our proposed PVRNet has been evaluated on the ModelNet40 dataset for 3D shape classification and retrieval. Experimental results indicate that our model achieves significant performance improvement compared with the state-of-the-art models.
【Keywords】:
【Paper Link】 【Pages】:9127-9134
【Authors】: Zhou Yu ; Dejing Xu ; Jun Yu ; Ting Yu ; Zhou Zhao ; Yueting Zhuang ; Dacheng Tao
【Abstract】: Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale, fully annotated benchmark datasets exist, VideoQA datasets are small in scale and often automatically generated, which restricts their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated, large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.
【Keywords】:
【Paper Link】 【Pages】:9135-9142
【Authors】: Jinyang Yuan ; Bin Li ; Xiangyang Xue
【Abstract】: Humans perceive the seemingly chaotic world in a structured and compositional way, given the prerequisite ability to segregate conceptual entities from complex visual scenes. The mechanism of grouping basic visual elements of scenes into conceptual entities is termed perceptual grouping. In this work, we propose a new type of spatial mixture model with learnable priors for perceptual grouping. Different from existing methods, the proposed method disentangles the representation of an object into “shape” and “appearance”, which are modeled separately by the mixture weights and the conditional probability distributions. More specifically, each object in the visual scene is modeled by one mixture component, whose mixture weights and conditional probability distribution parameters are generated by two neural networks, respectively. The mixture weights focus on modeling spatial dependencies (i.e., shape) and the conditional probability distributions deal with intra-object variations (i.e., appearance). In addition, the background is separately modeled as a special component complementary to the foreground objects. Our extensive empirical tests on two perceptual grouping datasets demonstrate that the proposed method outperforms the state-of-the-art methods under most experimental configurations. The learned conceptual entities generalize to novel visual scenes and are insensitive to the diversity of objects.
【Keywords】:
【Paper Link】 【Pages】:9143-9150
【Authors】: Li Yuan ; Francis E. H. Tay ; Ping Li ; Li Zhou ; Jiashi Feng
【Abstract】: In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bi-directional LSTM network that learns video representations embedding the long-range relationships among video frames. The evaluator defines a learnable information preserving metric between the original video and the summary video and “supervises” the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN learns to reconstruct the original video from the summary video while the backward GAN learns to invert the process. The consistency between the outputs of such cycle learning is adopted as the information preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and this cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines.
【Keywords】:
【Paper Link】 【Pages】:9151-9158
【Authors】: Longhao Yuan ; Chao Li ; Danilo P. Mandic ; Jianting Cao ; Qibin Zhao
【Abstract】: In tensor completion tasks, the traditional low-rank tensor decomposition models suffer from the laborious model selection problem due to their high model sensitivity. In particular, for tensor ring (TR) decomposition, the number of model possibilities grows exponentially with the tensor order, which makes it rather challenging to find the optimal TR decomposition. In this paper, by exploiting the low-rank structure of the TR latent space, we propose a novel tensor completion method which is robust to model selection. In contrast to imposing the low-rank constraint on the data space, we introduce nuclear norm regularization on the latent TR factors, resulting in the optimization step using singular value decomposition (SVD) being performed at a much smaller scale. By leveraging the alternating direction method of multipliers (ADMM) scheme, the latent TR factors with optimal rank and the recovered tensor can be obtained simultaneously. Our proposed algorithm is shown to effectively alleviate the burden of TR-rank selection, thereby greatly reducing the computational cost. The extensive experimental results on both synthetic and real-world data demonstrate the superior performance and efficiency of the proposed approach against the state-of-the-art algorithms.
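The nuclear norm proximal step on a latent factor reduces to singular value thresholding, which is where the small-scale SVD mentioned above comes in; this is a generic SVT sketch, not the authors' full ADMM loop:

```python
import numpy as np

def svt(factor, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear
    norm. Applied to a (small) latent TR factor, so the SVD cost stays low."""
    u, s, vt = np.linalg.svd(factor, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # soft-threshold the singular values
    return (u * s) @ vt

g = np.random.randn(20, 15)        # toy unfolded TR factor
low_rank_g = svt(g, tau=2.0)
print(np.linalg.matrix_rank(low_rank_g))  # rank drops after thresholding
```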
【Keywords】:
【Paper Link】 【Pages】:9159-9166
【Authors】: Yitian Yuan ; Tao Mei ; Wenwu Zhu
【Abstract】: We have witnessed the tremendous growth of videos over the Internet, where most of these videos are typically paired with abundant sentence descriptions, such as video titles, captions and comments. Therefore, it has been increasingly crucial to associate specific video segments with the corresponding informative text descriptions, for a deeper understanding of video content. This motivates us to explore an overlooked problem in the research community: temporal sentence localization in video, which aims to automatically determine the start and end points of a given sentence within a paired video. For solving this problem, we face three critical challenges: (1) preserving the intrinsic temporal structure and global context of video to locate accurate positions over the entire video sequence; (2) fully exploring the sentence semantics to give clear guidance for localization; (3) ensuring the efficiency of the localization method to adapt to long videos. To address these issues, we propose a novel Attention Based Location Regression (ABLR) approach to localize sentence descriptions in videos in an efficient end-to-end manner. Specifically, to preserve the context information, ABLR first encodes both video and sentence via Bi-directional LSTM networks. Then, a multi-modal co-attention mechanism is presented to generate both video and sentence attentions. The former reflects the global video structure, while the latter highlights the sentence details for temporal localization. Finally, a novel attention based location prediction network is designed to regress the temporal coordinates of the sentence from the previous attentions. We evaluate the proposed ABLR approach on two public datasets, ActivityNet Captions and TACoS. Experimental results show that ABLR significantly outperforms the existing approaches in both effectiveness and efficiency.
【Keywords】:
【Paper Link】 【Pages】:9167-9175
【Authors】: Yuan Yuan ; Dong Wang ; Qi Wang
【Abstract】: Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics. To model these two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are adopted in most existing successful methods for recognizing actions. However, CNN based methods are limited in modeling long-term motion dynamics. RNNs are able to learn temporal motion dynamics but lack effective ways to tackle unsteady dynamics in long-duration motion. In this work, we propose a memory-augmented temporal dynamic learning network, which learns to write the most evident information into an external memory module and ignore irrelevant information. In particular, we present a differentiable memory controller that makes a discrete decision on whether the external memory module should be updated with the current feature. The discrete memory controller takes the memory history, context embedding and current feature as inputs and controls the information flow into the external memory module. Additionally, we train this discrete memory controller using the straight-through estimator. We evaluate this end-to-end system on benchmark datasets (UCF101 and HMDB51) for human action recognition. The experimental results show consistent improvements on both datasets over prior works and our baselines.
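A generic straight-through estimator sketch for a hard write gate, in PyTorch; the paper's controller also conditions on the memory history and a context embedding, which this minimal version omits:

```python
import torch

class STEGate(torch.autograd.Function):
    """Hard 0/1 write decision in the forward pass, identity gradient
    in the backward pass (straight-through estimator)."""

    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()   # discrete write/skip decision

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # pass gradients straight through

logits = torch.randn(4, requires_grad=True)
gate = STEGate.apply(logits)
update = gate * torch.ones(4)         # memory written only where gate == 1
update.sum().backward()
print(logits.grad)                    # gradients flow despite the hard gate
```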
【Keywords】:
【Paper Link】 【Pages】:9176-9184
【Authors】: Yuan Yuan ; Zhitong Xiong ; Qi Wang
【Abstract】: RGB image classification has achieved significant performance improvement with the resurgence of deep convolutional neural networks. However, mono-modal deep models for RGB images still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distribution, so object-level local features should be considered in addition to the global scene representation. 2) Multi-modal features in RGB-D scene classification are still under-utilized; simply combining modal-specific features suffers from the semantic gaps between different modalities. 3) Most existing methods neglect the complex relationships among multiple modality features. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. To make better use of modal-specific cues, the approach mines the intra-modality relationships among selected local features from one modality. To leverage multi-modal knowledge more effectively, the proposed approach models the inter-modality relationships between two modalities through a cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets, SUN-RGBD and NYUD V2, and it achieves state-of-the-art performance.
【Keywords】:
【Paper Link】 【Pages】:9185-9194
【Authors】: Ji Zhang ; Yannis Kalantidis ; Marcus Rohrbach ; Manohar Paluri ; Ahmed Elgammal ; Mohamed Elhoseiny
【Abstract】: Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of 〈subject, relation, object〉 triples. In real-world scenarios with large numbers of objects and relations, some are seen very commonly while others are barely seen. We develop a new relationship detection model that embeds objects and relations into two vector spaces where both discriminative capability and semantic affinity are preserved. We learn a visual module and a semantic module that map features from the two modalities into a shared space, where matched pairs of features have to discriminate against unmatched ones, but also maintain close distances to semantically similar ones. Benefiting from this, our model can achieve superior performance even when the visual entity categories scale up to more than 80,000, with an extremely skewed class distribution. We demonstrate the efficacy of our model on a large and imbalanced benchmark based on Visual Genome that comprises 53,000+ objects and 29,000+ relations, a scale at which no previous work has been evaluated. We show the superiority of our model over competitive baselines on the original Visual Genome dataset with 80,000+ categories. We also show state-of-the-art performance on the VRD dataset and on the scene graph dataset, a subset of Visual Genome with 200 categories.
【Keywords】:
【Paper Link】 【Pages】:9195-9202
【Authors】: Jianfu Zhang ; Yuanyuan Huang ; Yaoyi Li ; Weijie Zhao ; Liqing Zhang
【Abstract】: Recent studies show significant progress on the image-to-image translation task, especially facilitated by Generative Adversarial Networks, which can synthesize highly realistic images and alter attribute labels of images. However, these works employ attribute vectors to specify the target domain, which diminishes image-level attribute diversity. In this paper, we propose a novel model that formulates disentangled representations by projecting images to latent units, i.e., grouped feature channels of a convolutional neural network, to disassemble the information between different attributes. Thanks to the disentangled representation, we can transfer attributes according to the attribute labels and moreover retain diversity beyond the labels, namely, the styles inside each image. This is achieved by specifying some attributes and swapping the corresponding latent units to “swap” the attributes' appearance, or by applying channel-wise interpolation to blend different attributes. To verify the motivation of our proposed model, we train and evaluate it on the face dataset CelebA. Furthermore, evaluation on another facial expression dataset, RaFD, demonstrates the generalizability of our proposed model.
【Keywords】:
【Paper Link】 【Pages】:9203-9210
【Authors】: Kaihao Zhang ; Wenhan Luo ; Lin Ma ; Hongdong Li
【Abstract】: We study the problem of sketch image recognition. This problem is plagued by two major challenges: 1) sketch images are often scarce in contrast to the abundance of natural images, rendering the training task difficult, and 2) the significant domain gap between a sketch image and its natural image counterpart makes bridging the two domains challenging. To overcome these challenges, in this paper we propose to transfer the knowledge of a network learned from natural images to a sketch network, a new deep net architecture that we term the cousin network. This network guides a sketch-recognition network to extract more relevant features that are close to those of natural images, via adversarial training. Moreover, to enhance the transfer ability of the classification model, a sketch-to-image attribute warehouse is constructed to approximate the transformation between the sketch domain and the real image domain. Extensive experiments conducted on the TU-Berlin dataset show that the proposed model is able to efficiently distill knowledge from natural images and achieves superior performance to the current state of the art.
【Keywords】:
【Paper Link】 【Pages】:9211-9218
【Authors】: Xiaobing Zhang ; Haigang Gong ; Xili Dai ; Fan Yang ; Nianbo Liu ; Ming Liu
【Abstract】: With the breakthrough of deep learning, lip reading technologies are making extraordinarily rapid progress. It is well known that Chinese is the most widely spoken language in the world. Unlike alphabetic languages, it involves more than 1,000 pronunciations as Pinyin, and nearly 90,000 pictographic characters as Hanzi, which makes lip reading of Chinese very challenging. In this paper, we implement visual-only Chinese lip reading of unconstrained sentences in a two-step end-to-end architecture (LipCH-Net), in which two deep neural network models perform the recognition of Picture-to-Pinyin (mouth motion pictures to pronunciations) and of Pinyin-to-Hanzi (pronunciations to texts) respectively, before being jointly optimized to improve the overall performance. In addition, two modules in the Pinyin-to-Hanzi model are pre-trained separately with large auxiliary data in advance of sequence-to-sequence training, to make the best use of long sequence matching and avoid ambiguity. We collect 6 months of daily news broadcasts from the China Central Television (CCTV) website, and semi-automatically label them into a 20.95 GB dataset with 20,495 natural Chinese sentences. When trained on the CCTV dataset, the LipCH-Net model outperforms all state-of-the-art lip reading frameworks. According to the results, our scheme not only accelerates training and reduces overfitting, but also overcomes the syntactic ambiguity of Chinese, providing a baseline for future relevant work.
【Keywords】:
【Paper Link】 【Pages】:9219-9226
【Authors】: Xiaopeng Zhang ; Yang Yang ; Jiashi Feng
【Abstract】: This paper addresses Weakly Supervised Object Localization (WSOL) with only image-level supervision. We model the missing object locations as latent variables, and contribute a novel self-directed optimization strategy to infer them. With this strategy, our Self-Directed Localization Network (SD-LocNet) is able to localize object instances whose initial locations are noisy. The self-directed inference hinges on an adaptive sampling method that identifies reliable object instances by measuring their localization stability scores. In this way, the resulting model is robust to noisily initialized object locations, which we find is important in WSOL. Furthermore, we introduce a reliability-induced prior propagation strategy to transfer object priors from reliable instances to unreliable ones by promoting their feature similarity, which effectively refines the unreliable object instances for better localization. The proposed SD-LocNet achieves 70.9% CorLoc and 51.3% mAP on PASCAL VOC 2007, surpassing the state of the art by a large margin.
【Keywords】:
【Paper Link】 【Pages】:9227-9234
【Authors】: Xiaoyu Zhang ; Haichao Shi ; Changsheng Li ; Kai Zheng ; Xiaobin Zhu ; Lixin Duan
【Abstract】: Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed into short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming. In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames and recognize actions in untrimmed videos. Our proposed framework consists of two major components. First, for action frame localization, we take advantage of the self-attention mechanism to weight each frame, such that the influence of background frames can be effectively eliminated. Second, considering that trimmed videos are publicly available and contain useful information to leverage, we present an additional module to transfer knowledge from trimmed videos to improve classification performance on untrimmed ones. Extensive experiments are conducted on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.
【Keywords】:
【Paper Link】 【Pages】:9235-9242
【Authors】: Yaqing Zhang ; Xi Li ; Zhongfei Zhang
【Abstract】: Person re-identification (Re-ID) is typically cast as a problem of semantic representation and alignment, which requires precisely discovering and modeling the inherent spatial structure of person images. Motivated by this observation, we propose a Key-Value Memory Matching Network (KVM-MN) that consists of key-value memory representation and key-value co-attention matching. The proposed KVM-MN model is capable of building an effective local-position-aware person representation that encodes spatial feature information in the form of a multi-head key-value memory. Furthermore, the proposed KVM-MN model makes use of multi-head co-attention to automatically learn a number of cross-person matching patterns, resulting in more robust and interpretable matching results. Finally, we build a set-wise learning mechanism that implements a more generalized query-to-gallery-image-set learning procedure. Experimental results demonstrate the effectiveness of the proposed model against the state of the art.
【Keywords】:
【Paper Link】 【Pages】:9243-9250
【Authors】: Yingying Zhang ; Qiaoyong Zhong ; Liang Ma ; Di Xie ; Shiliang Pu
【Abstract】: Person re-identification (ReID) aims to match people across multiple non-overlapping video cameras deployed at different locations. To address this challenging problem, many metric learning approaches have been proposed, among which the triplet loss is one of the state of the art. In this work, we explore the margin between positive and negative pairs of triplets and show that a large margin is beneficial. In particular, we propose a novel multi-stage training strategy that learns an incremental triplet margin and improves the triplet loss effectively. Multiple levels of feature maps are exploited to make the learned features more discriminative. In addition, we introduce a global hard identity searching method to sample hard identities when generating a training batch. Extensive experiments on Market-1501, CUHK03, and DukeMTMC-reID show that our approach yields a performance boost and outperforms most existing state-of-the-art methods.
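A toy schedule for the incremental triplet margin; the concrete margin values are illustrative assumptions, since the abstract only states that the margin grows stage by stage:

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin):
    """Standard triplet loss with Euclidean distances."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(1)
for stage, margin in enumerate([0.3, 0.5, 0.7]):   # assumed incremental schedule
    a, p, n = rng.standard_normal((3, 128))        # toy embeddings per stage
    print(f"stage {stage}: margin={margin}, loss={triplet_loss(a, p, n, margin):.3f}")
```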
【Keywords】:
【Paper Link】 【Pages】:9251-9258
【Authors】: Jian Zhao ; Yu Cheng ; Yi Cheng ; Yang Yang ; Fang Zhao ; Jianshu Li ; Hengzhu Liu ; Shuicheng Yan ; Jiashi Feng
【Abstract】: Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intra-class variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition or first synthesize a face that matches the target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture jointly performing cross-age face synthesis and recognition in a mutually boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, we develop effective and novel training strategies for end-to-end learning of the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Extensive experiments on several cross-age datasets (MORPH, CACD and FG-NET) demonstrate the superiority of the proposed AIM model over the state of the art. Benchmarking our model on one of the most popular unconstrained face recognition datasets, IJB-C, additionally verifies the promising generalizability of AIM in recognizing faces in the wild.
【Keywords】:
【Paper Link】 【Pages】:9259-9266
【Authors】: Qijie Zhao ; Tao Sheng ; Yongtao Wang ; Zhi Tang ; Ying Chen ; Ling Cai ; Haibin Ling
【Abstract】: Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors achieve encouraging results, they have limitations because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e., multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules, and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, M2Det, by integrating MLFPN into the SSD architecture, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy, and an AP of 44.2 with a multi-scale inference strategy, which are the new state-of-the-art results among one-stage detectors. The code will be made available at https://github.com/qijiezhao/M2Det.
【Keywords】:
【Paper Link】 【Pages】:9267-9274
【Authors】: Xin Zhao ; Zhe Liu ; Ruolan Hu ; Kaiqi Huang
【Abstract】: 3D object detection plays an important role in a large number of real-world applications. It requires us to estimate the localizations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture that utilizes front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is utilized to improve the performance of 3D segmentation; it captures information from different orientations in space and is robust to shapes at different scales. On the other hand, our network extracts useful features and suppresses features with less information via a SENet module, which reweights channel features and estimates the 3D bounding boxes more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results illustrate that our method achieves better performance than state-of-the-art methods, especially when point clouds are highly sparse.
【Keywords】:
【Paper Link】 【Pages】:9275-9282
【Authors】: Xin Zhao ; Liufang Sang ; Guiguang Ding ; Jungong Han ; Na Di ; Chenggang Yan
【Abstract】: Pedestrian attribute recognition predicts attribute labels of pedestrians from surveillance images, which is a very challenging task for computer vision due to poor imaging quality and small training datasets. It is observed that many semantic pedestrian attributes to be recognized tend to show spatial locality and semantic correlations by which they can be grouped, while previous works mostly ignore this phenomenon. Inspired by the strong capability of Recurrent Neural Networks (RNNs) to learn context correlations and of attention models to highlight regions of interest on a feature map, this paper proposes end-to-end Recurrent Convolutional (RC) and Recurrent Attention (RA) models, which are complementary to each other. The RC model mines correlations among different attribute groups with a convolutional LSTM unit, while the RA model takes advantage of intra-group spatial locality and inter-group attention correlation to improve the performance of pedestrian attribute recognition. Our RA method combines recurrent learning and an attention model to highlight spatial positions on the feature map and mines attention correlations among different attribute groups to obtain more precise attention. Extensive empirical evidence shows that our recurrent model frameworks achieve state-of-the-art results on standard pedestrian attribute datasets, i.e., the PETA and RAP datasets.
【Keywords】:
【Paper Link】 【Pages】:9283-9290
【Authors】: Mingmin Zhen ; Jinglu Wang ; Lei Zhou ; Tian Fang ; Long Quan
【Abstract】: Semantic segmentation is pixel-wise classification that retains critical spatial information. “Feature map reuse” has been commonly adopted in CNN based approaches to take advantage of feature maps from early layers for later spatial reconstruction. Along this direction, we go a step further by proposing a fully dense neural network with an encoder-decoder structure, abbreviated FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated and fed forward as input. On the one hand, this reconstructs spatial boundaries accurately; on the other hand, it learns more efficiently thanks to more effective gradient backpropagation. In addition, we propose a boundary-aware loss function that focuses more attention on pixels near boundaries, which boosts the labeling of “hard examples”. FDNet achieves the best performance over previous works on two benchmark datasets, PASCAL VOC 2012 and NYUDv2, when not considering training on other datasets.
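A small sketch of boundary-aware weighting for a per-pixel loss; the neighborhood width and boost factor are illustrative assumptions, since the abstract only states that pixels near boundaries receive more attention:

```python
import numpy as np

def boundary_weights(mask, width=2, boost=5.0):
    """Return per-pixel weights that up-weight pixels whose neighborhood
    contains a different label, i.e., pixels near a label boundary."""
    weights = np.ones(mask.shape, dtype=float)
    for dy in range(-width, width + 1):
        for dx in range(-width, width + 1):
            shifted = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
            weights[shifted != mask] = boost   # mark boundary neighborhoods
    return weights

labels = np.zeros((8, 8), dtype=int)
labels[2:6, 2:6] = 1                   # toy square object
w = boundary_weights(labels)
# a per-pixel cross-entropy would then be multiplied by w before averaging
print(w.astype(int))
```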
【Keywords】:
【Paper Link】 【Pages】:9291-9298
【Authors】: Xiawu Zheng ; Rongrong Ji ; Xiaoshuai Sun ; Baochang Zhang ; Yongjian Wu ; Feiyue Huang
【Abstract】: Recent advances in fine-grained image retrieval favor learning a convolutional neural network (CNN) with a loss function designed on a specific fully-connected layer for discriminative feature representation. Essentially, such a loss should establish a robust metric to efficiently distinguish high-dimensional features within and outside fine-grained categories. To this end, the existing loss functions are deficient in two aspects: (a) the feature relationship is encoded inside the training batch, and such a local scope leads to low accuracy; (b) the error is established by the mean square, which needs pairwise distance computation over the training set and results in low efficiency. In this paper, we propose a novel metric learning scheme, termed Normalize-Scale Layer and Decorrelated Global Centralized Ranking Loss, which achieves extremely efficient and discriminative learning, i.e., a 5× speedup over the triplet loss and a 12% recall boost on CARS196. Our method originates from the classic softmax loss, which has a global structure but does not directly optimize the distance metric or the inter/intra-class distances. We tackle this issue through a hypersphere layer and a global centralized ranking loss with pairwise decorrelated learning. In particular, we first propose a Normalize-Scale Layer to eliminate the gap between the metric distance (for measuring distance in retrieval) and the dot product (for dimension reduction in classification). Second, the relationship between features is encoded under a global centralized ranking loss, which optimizes the metric distance globally and accelerates the learning procedure. Finally, the centers are further decorrelated by the Gram-Schmidt process, leading to extreme efficiency (20 epochs of training) and discriminability in feature learning. We have conducted quantitative evaluations on two fine-grained retrieval benchmarks. The superior performance demonstrates the merits of the proposed approach over the state of the art.
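A minimal PyTorch sketch of the Normalize-Scale idea: L2-normalize features onto a hypersphere, then multiply by a learnable scale so the softmax dot product aligns with the metric distance used at retrieval time. The initial scale value is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizeScale(nn.Module):
    """Project features onto a hypersphere, then apply a learnable scale."""

    def __init__(self, init_scale=16.0):  # assumed initialization
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        return self.scale * F.normalize(x, p=2, dim=1)

feats = torch.randn(4, 128)
out = NormalizeScale()(feats)
print(out.norm(dim=1))  # every row now has norm equal to the scale
```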
【Keywords】:
【Paper Link】 【Pages】:9299-9306
【Authors】: Hang Zhou ; Yu Liu ; Ziwei Liu ; Ping Luo ; Xiaogang Wang
【Abstract】: Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and the semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct subject-specific face appearance models or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that a talking face sequence is actually a composition of subject-related information and speech-related information. These two spaces are explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has the advantage that both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval.
【Keywords】:
【Paper Link】 【Pages】:9307-9315
【Authors】: Lipu Zhou ; Yi Yang ; Montiel Abello ; Michael Kaess
【Abstract】: This paper proposes a novel algorithm to solve the pose estimation problem from 2D/3D line correspondences, known as the Perspective-n-Line (PnL) problem. It is widely known that minimizing a geometric distance generally yields more accurate results than minimizing an algebraic distance. However, the rational form of the reprojection distance of a line yields a complicated cost function, which makes solving the first-order optimality conditions infeasible. Furthermore, iterative algorithms based on the reprojection distance are time-consuming for large-scale problems. In contrast to previous works that minimize a cost function based on an algebraic distance that may not approximate the reprojection distance of the line, we design two simple algebraic distances to gradually approximate the reprojection distance. This speeds up the computation and maintains the robustness of the geometric distance. The two algebraic distances yield two polynomial cost functions, which can be solved efficiently. We directly solve the first-order optimality conditions of the first problem with a novel hidden variable method. This algorithm makes use of the specific structure of the resulting polynomial system and is therefore more stable than a general Gröbner basis polynomial solver. Then, we minimize the second polynomial cost function by damped Newton iteration, starting from the solution of the first cost function. Experimental results show that the first step of our algorithm is already superior to state-of-the-art algorithms in terms of accuracy and applicability, and faster than algorithms based on Gröbner basis polynomial solvers. The second step yields results comparable to those from minimizing the reprojection distance, but is much more efficient. Owing to its speed, our algorithm is applicable to real-time applications.
【Keywords】:
【Paper Link】 【Pages】:9316-9323
【Authors】: Yiyi Zhou ; Rongrong Ji ; Jinsong Su ; Xiangming Li ; Xiaoshuai Sun
【Abstract】: In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which commonly exists in most VQA models and forces the models to rely mainly on the question content to “guess” the answer, without regard to the visual information. Such an issue not only impairs the performance of VQA models, but also greatly reduces the credibility of the answer prediction. Simply highlighting the visual features in the model is not feasible, since the prediction is built upon the joint modeling of the two modalities and is largely influenced by the data distribution. In this paper, we propose Pairwise Inconformity Learning (PIL) to tackle the issue of knowledge inertia. In particular, PIL takes full advantage of the similar image pairs with diverse answers to an identical question provided in the VQA2.0 dataset. It builds a multi-modal embedding space to project pos./neg. feature pairs, upon which word vectors of answers are modeled as anchors. By doing so, PIL strengthens the importance of visual features in prediction with a novel dynamic-margin based triplet loss that efficiently increases the semantic discrepancies between pos./neg. image pairs. To verify the proposed PIL, we plug it into a baseline VQA model as well as a set of recent VQA models, and conduct extensive experiments on two benchmark datasets, i.e., VQA1.0 and VQA2.0. Experimental results show that PIL can boost the accuracy of existing VQA models (1.56%-2.93% gain) with a negligible increase in parameters (0.85%-5.4%). Qualitative results also reveal the elimination of knowledge inertia in the existing VQA models after implementing our PIL.
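A sketch of a dynamic-margin triplet loss in PyTorch, where the margin grows with the semantic gap between the answers' word vectors; the cosine-based margin function is an assumption, since the abstract only says the margin is dynamic:

```python
import torch
import torch.nn.functional as F

def dynamic_margin_triplet(anchor, pos, neg, ans_pos, ans_neg):
    """Push pos./neg. image pairs apart by a margin that scales with how
    different their answers are in word-vector space."""
    margin = 1.0 - F.cosine_similarity(ans_pos, ans_neg, dim=0)  # assumed form
    d_pos = F.pairwise_distance(anchor, pos)
    d_neg = F.pairwise_distance(anchor, neg)
    return F.relu(d_pos - d_neg + margin).mean()

a, p, n = torch.randn(3, 2, 64)      # batched joint embeddings
w_pos, w_neg = torch.randn(2, 300)   # answer word vectors (the anchors)
print(dynamic_margin_triplet(a, p, n, w_pos, w_neg))
```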
【Keywords】:
【Paper Link】 【Pages】:9324-9331
【Authors】: Yiyi Zhou ; Rongrong Ji ; Jinsong Su ; Xiaoshuai Sun ; Weiqiu Chen
【Abstract】: In visual question answering (VQA), recent advances have well advocated the use of attention mechanisms to precisely link the question to potential answer areas. As the difficulty of the question increases, more VQA models adopt multiple attention layers to capture deeper visual-linguistic correlations. But a negative consequence is the explosion of parameters, which makes the model vulnerable to over-fitting, especially when limited training examples are given. In this paper, we propose an extremely compact alternative to this static multi-layer architecture for accurate yet efficient attention modeling, termed Dynamic Capsule Attention (CapsAtt). Inspired by the recent work on Capsule Networks, CapsAtt treats visual features as capsules and obtains the attention output via dynamic routing, which updates the attention weights by calculating coupling coefficients between the underlying and output capsules. Meanwhile, CapsAtt discards redundant projection matrices to make the model much more compact. We evaluate CapsAtt on three benchmark VQA datasets, i.e., COCO-QA, VQA1.0 and VQA2.0. Compared to a traditional multi-layer attention model, CapsAtt achieves significant improvements of up to 4.1%, 5.2% and 2.2% on the three datasets, respectively. Moreover, with much fewer parameters, our approach also yields competitive results compared to the latest VQA models. To further verify the generalization ability of CapsAtt, we also deploy it on another challenging multi-modal task, image captioning, where state-of-the-art performance is achieved with a simple network structure.
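A minimal single-output sketch of attention via dynamic routing in NumPy; the squash-like normalization and the single output capsule are simplifying assumptions relative to the full CapsAtt design:

```python
import numpy as np

def routing_attention(capsules, iters=3):
    """Refine coupling coefficients between input capsules (visual
    features) and one output capsule by iterative agreement."""
    b = np.zeros(capsules.shape[0])           # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum()       # coupling coefficients (softmax)
        s = (c[:, None] * capsules).sum(0)    # weighted sum of capsules
        v = s / (1.0 + np.linalg.norm(s))     # squash-like normalization
        b = b + capsules @ v                  # update logits by agreement
    return v, c

feats = np.random.randn(49, 32)               # e.g. a 7x7 grid of visual features
attended, weights = routing_attention(feats)
print(attended.shape, round(weights.sum(), 6))  # (32,) 1.0
```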
【Keywords】:
【Paper Link】 【Pages】:9332-9339
【Authors】: Hongyuan Zhu ; Xi Peng ; Joey Tianyi Zhou ; Songfan Yang ; Vijay Chanderasekh ; Liyuan Li ; Joo-Hwee Lim
【Abstract】: Single image rain-streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. Previous works solve this problem using various hand-designed priors or by explicitly mapping synthetic rain to paired clean images in a supervised way. In practice, however, the pre-defined priors are easily violated and the paired training data are hard to collect. To overcome these limitations, in this work, we propose RainRemoval-GAN (RRGAN), the first end-to-end adversarial model that generates realistic rain-free images using only unpaired supervision. Our approach alleviates the paired training constraints by introducing a physical model which explicitly learns recovered images and corresponding rain streaks from the differentiable programming perspective. The proposed network consists of a novel multiscale attention memory generator and a novel multiscale deeply supervised discriminator. The multiscale attention memory generator uses a memory with an attention mechanism to capture the latent rain-streak context at different stages to recover the clean images. The deeply supervised multiscale discriminator imposes constraints on the recovered output in terms of local details and global appearance with respect to the clean image set. Together with the learned rain streaks, a reconstruction constraint is employed to ensure that the appearance is consistent with the input image. Experimental results on public benchmarks demonstrate our promising performance compared with nine state-of-the-art methods in terms of PSNR, SSIM, visual quality and running time.
【Keywords】:
【Paper Link】 【Pages】:9340-9347
【Authors】: Yun-Zhi Zhuge ; Yu Zeng ; Huchuan Lu
【Abstract】: Benefiting from the rapid development of Convolutional Neural Networks (CNNs), some salient object detection methods have achieved remarkable results by utilizing multi-level convolutional features. However, saliency training datasets are of limited scale due to the high cost of pixel-level labeling, which limits the generalization of the trained model to new scenarios during testing. Besides, some FCN-based methods directly integrate multi-level features, ignoring the fact that the noise in some features is harmful to saliency detection. In this paper, we propose a novel approach that transforms prior information into an embedding space to select attentive features and filter out outliers for salient object detection. Our network first generates a coarse prediction map through an encoder-decoder structure. Then a Feature Embedding Network (FEN) is trained to embed each pixel of the coarse map into a metric space, which incorporates attentive features that highlight salient regions and suppress the response of non-salient regions. Further, the embedded features are refined through a deep-to-shallow Recursive Feature Integration Network (RFIN) to improve the details of the prediction maps. Moreover, to alleviate blurred boundaries, we propose a Guided Filter Refinement Network (GFRN) to jointly optimize the predicted results and the learnable guidance maps. Extensive experiments on five benchmark datasets demonstrate that our method outperforms the state-of-the-art. Our proposed method is end-to-end and achieves a real-time speed of 38 FPS.
【Keywords】:
【Paper Link】 【Pages】:9348-9355
【Authors】: Li'an Zhuo ; Baochang Zhang ; Chen Chen ; Qixiang Ye ; Jianzhuang Liu ; David S. Doermann
【Abstract】: In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as expensive to compute as the true gradient in many scenarios. This paper introduces a calibrated stochastic gradient descent (CSGD) algorithm for deep neural network optimization. A theorem is developed to prove that an unbiased estimator for the network variables can be obtained in a probabilistic way based on the Lipschitz hypothesis. Our work is significantly distinct from existing gradient optimization methods, by providing a theoretical framework for unbiased variable estimation in the deep learning paradigm to optimize the model parameter calculation. In particular, we develop a generic gradient calibration layer which can be easily used to build convolutional neural networks (CNNs). Experimental results demonstrate that CNNs with our CSGD optimization scheme can improve the state-of-the-art performance for natural image classification, digit recognition, ImageNet object classification, and object detection tasks. This work opens new research directions for developing more efficient SGD updates and analyzing the backpropagation algorithm.
【Keywords】:
【Paper Link】 【Pages】:9357-9364
【Authors】: Joshua Eckroth ; Eric Schoen
【Abstract】: This paper describes the genetic algorithm used to select news stories about artificial intelligence for AAAI’s weekly AIAlert, emailed to nearly 11,000 subscribers. Each week, about 1,500 news stories covering various aspects of artificial intelligence and machine learning are discovered by i2k Connect’s NewsFinder agent. Our challenge is to select just 10 stories from this collection that represent the important news about AI. Since stories and topics do not necessarily repeat in later weeks, we cannot use click tracking and supervised learning to predict which stories or topics are most preferred by readers. Instead, we must build a representative selection of stories a priori, using information about each story’s topics, content, publisher, date of publication, and other features. This paper describes a genetic algorithm that achieves this task. We demonstrate its effectiveness by comparing several engagement metrics from six months of “A/B testing” experiments that compare random story selection vs. a simple scoring algorithm vs. our new genetic algorithm.
【Keywords】:
【Paper Link】 【Pages】:9365-9372
【Authors】: Christopher Lesner ; Alexander Ran ; Marko Rukonic ; Wei Wang
【Abstract】: A major part of financial accounting involves tracking and organizing business transactions month after month, and hence automation of this task is of significant value to the users of accounting software. In this paper we present a large-scale recommendation system that successfully recommends company-specific categories for several million small businesses in the US, UK, Australia, Canada, India and France and handles billions of financial transactions each year. Our system uses machine learning to combine fragments of information from millions of users in a manner that allows us to accurately recommend user-specific Chart of Accounts categories. Accounts are handled even if named using abbreviations or in a foreign language. Transactions are handled even if a given user has never categorized a transaction like that before. The development of such a system and its testing at scale over billions of transactions is a first in the financial industry.
【Keywords】:
【Paper Link】 【Pages】:9373-9380
【Authors】: Marc Maier ; Hayley Carlotto ; Freddie Sanchez ; Sherriff Balogun ; Sears Merritt
【Abstract】: Life insurance provides trillions of dollars of financial security for hundreds of millions of individuals and families worldwide. Life insurance companies must accurately assess individual-level mortality risk to simultaneously maintain financial strength and price their products competitively. The traditional underwriting process used to assess this risk is based on manually examining an applicant’s health, behavioral, and financial profile. The existence of large historical data sets provides an unprecedented opportunity for artificial intelligence and machine learning to transform underwriting in the life insurance industry. We present an overview of how a rich application data set and survival modeling were combined to develop a life score that has been deployed in an algorithmic underwriting system at MassMutual, an American mutual life insurance company serving millions of clients. Through a novel evaluation framework, we show that the life score outperforms traditional underwriting by 6% on the basis of claims. We describe how engagement with actuaries, medical doctors, underwriters, and reinsurers was paramount to building an algorithmic underwriting system with a predictive model at its core. Finally, we provide details of the deployed system and highlight its value, which includes saving millions of dollars in operational efficiency while driving the decisions behind tens of billions of dollars of benefits.
【Keywords】:
【Paper Link】 【Pages】:9381-9388
【Authors】: Atri Mandal ; Nikhil Malhotra ; Shivali Agarwal ; Anupama Ray ; Giriprasad Sridhara
【Abstract】: Ticket assignment/dispatch is a crucial part of the service delivery business, with a lot of scope for automation and optimization. In this paper, we present an end-to-end automated helpdesk email ticket assignment system, which is also offered as a service. The objective of the system is to determine the nature of the problem mentioned in an incoming email ticket and then automatically dispatch it to an appropriate resolver group (or team) for resolution. The proposed system uses an ensemble classifier augmented with a configurable rule engine. While designing an accurate classifier is one of the main challenges, we also need to address the need to design a system that is robust and adaptive to changing business needs. We discuss some of the main design challenges associated with email ticket assignment automation and how we solve them. The design decisions for our system are driven by high accuracy, coverage, business continuity, scalability and optimal usage of computational resources. Our system has been deployed in production for three major service providers and currently assigns over 90,000 emails per month on average, with an accuracy close to 90% and coverage of at least 90% of email tickets. This translates to achieving human-level accuracy and results in a net saving of more than 50,000 man-hours of effort per annum. To date, our deployed system has served more than 700,000 tickets in production.
【Keywords】:
【Paper Link】 【Pages】:9389-9396
【Authors】: Rohit Takhar ; Varun Aggarwal
【Abstract】: Evaluators wish to test candidates on their ability to propose the correct algorithmic approach to solve programming problems. Recently, several automated systems for grading programs have been proposed, but none of them address uncompilable code. We present the first approach to grade uncompilable code and provide semantic feedback on it using machine learning. We propose two methods that allow us to derive informative semantic features from programs. One approach makes the program compilable by correcting errors, while the other relaxes syntax/grammar rules to help parse uncompilable code. We compare the relative efficacy of these approaches towards grading. We finally combine them to build an algorithm that rivals the accuracy of experts in grading programs. Additionally, we show that the models learned for compilable code can be reused for uncompilable code. We present case studies where companies are able to hire more efficiently by deploying our technology.
【Keywords】:
【Paper Link】 【Pages】:9398-9403
【Authors】: Aaron Adler ; Peter Samouelian ; Michael Atighetchi ; Yat Fu
【Abstract】: Boundary Protection Devices (BPDs) are used by US Government mission partners to regulate the flow of information across networks of differing security levels. BPDs provide several critical functions, including preventing unauthorized sharing, sanitizing information, and preventing cyber attacks. Their application in national security and critical infrastructure environments (e.g., military missions, nuclear power plants, clean water distribution systems) calls for a comprehensive load monitoring system that provides resilience and scalability, as well as an automated and vendor neutral configuration management system that can efficiently respond to security threats at machine speed. Their design as one-way traffic control systems, however, presents challenges for dynamic load adaptation techniques that require access to application server performance metrics across network boundaries. Moreover, the structured review and approval process that regulates their configuration and use presents two significant challenges: (1) Adaptation techniques that alter the configuration of BPDs must be predictable, understandable, and pre-approved by administrators, and (2) Software can be installed on BPDs only after completing a stringent accreditation process. These challenges often lead to manual configuration management practices, which are inefficient or ineffective in many cases. The Hammerhead prototype, developed as part of the SHARC project, addresses these challenges using knowledge representation, a rule-oriented adaptation bundle format, and an extensible, open-source constraint solver.
【Keywords】:
【Paper Link】 【Pages】:9404-9409
【Authors】: José Luis Ambite ; Jonathan Gordon ; Lily Fierro ; Gully Burns ; Joel Mathew
【Abstract】: The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.
【Keywords】:
【Paper Link】 【Pages】:9410-9415
【Authors】: Akinori Asahara ; Hidekazu Morita ; Chiharu Mitsumata ; Kanta Ono ; Masao Yano ; Tetsuya Shoji
【Abstract】: This paper describes a new machine-learning application for speeding up small-angle neutron scattering (SANS) experiments, and a method for it based on probabilistic modeling. SANS is a scattering experiment for observing the microstructure of materials; in it, two-dimensional patterns on a plane (SANS patterns) are obtained as measurements. It takes a long time to obtain accurate experimental results because the SANS pattern is a histogram of detected neutrons. To shorten the measurement time, we propose an early-stopping method based on Gaussian mixture modeling with a prior generated from B-spline regression results. An experiment using actual SANS data was carried out to examine the accuracy of the method. It confirmed that the accuracy of the proposed method converged 4 minutes after starting the experiment (normal SANS takes about 20 minutes).
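The following Python sketch illustrates the general recipe of stopping a histogram measurement early once a Gaussian mixture, initialized from a B-spline fit, stabilizes. It is a 1-D simplification of the 2-D SANS setting, and the prior construction and convergence test are assumptions, not the paper's method.

    import numpy as np
    from scipy.interpolate import splrep, splev
    from sklearn.mixture import GaussianMixture

    def fit_with_spline_prior(counts_stream, n_components=2, tol=1e-3):
        """Accumulate detector counts and stop once the GMM means stabilize.
        The B-spline fit supplies initial means for the mixture (our guess
        at how the prior could be built from regression results)."""
        prev_means, acc, gmm = None, None, None
        for frame in counts_stream:                 # one histogram per time step
            acc = frame if acc is None else acc + frame
            x = np.arange(len(acc))
            smooth = splev(x, splrep(x, acc, s=len(acc)))  # B-spline regression
            peaks = x[np.argsort(smooth)[-n_components:]]  # crude prior means
            samples = np.repeat(x, acc.astype(int)).reshape(-1, 1)
            gmm = GaussianMixture(n_components, means_init=peaks.reshape(-1, 1))
            gmm.fit(samples)                        # assumes enough counts so far
            means = np.sort(gmm.means_.ravel())
            if prev_means is not None and np.abs(means - prev_means).max() < tol:
                return gmm                          # measurement can stop early
            prev_means = means
        return gmm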
【Keywords】:
【Paper Link】 【Pages】:9416-9421
【Authors】: Sebastian Blank ; Florian Wilhelm ; Hans-Peter Zorn ; Achim Rettinger
【Abstract】: Almost all of today’s knowledge is stored in databases and thus can only be accessed with the help of domain-specific query languages, strongly limiting the number of people who can access the data. In this work, we demonstrate an end-to-end trainable question answering (QA) system that allows a user to query an external NoSQL database using natural language. A major challenge of such a system is the non-differentiability of database operations, which we overcome by applying policy-based reinforcement learning. We evaluate our approach on Facebook’s bAbI Movie Dialog dataset and achieve a competitive score of 84.2% compared to several benchmark models. We conclude that our approach excels with regard to real-world scenarios where knowledge resides in external databases and intermediate labels are too costly to gather for non-end-to-end trainable QA systems.
【Keywords】:
【Paper Link】 【Pages】:9422-9427
【Authors】: Joseph Bockhorst ; Devin Conathan ; Glenn M. Fung
【Abstract】: We present an approach for designing conversational interfaces (chatbots) that users interact with to determine whether or not a business rule applies in a context possessing uncertainty (from the point of view of the chatbot) as to the value of input facts. Our approach relies on Bayesian network models that bring together a business rule’s logical, deterministic aspects with its probabilistic components in a common framework. Our probabilistic-logic bots (PL-bots) evaluate business rules by iteratively prompting users to provide the values of unknown facts. The order in which facts are solicited is dynamic: it depends on known facts and is chosen using mutual information as a heuristic so as to minimize the number of interactions with the user. We have created a web-based content creation and editing tool that quickly enables subject matter experts to create and validate PL-bots with minimal training and without requiring a deep understanding of logic or probability. To date, domain experts at a well-known insurance company have successfully created and deployed over 80 PL-bots to help insurance agents determine customer eligibility for policy discounts and endorsements.
【Keywords】:
【Paper Link】 【Pages】:9428-9433
【Authors】: Andrea Borghesi ; Andrea Bartolini ; Michele Lombardi ; Michela Milano ; Luca Benini
【Abstract】: Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs machine learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High-Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called an autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This is different from previous approaches, which were based on learning the abnormal conditions, for which there are much smaller datasets (since they are very hard to identify to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amounts of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with very good accuracy (values ranging between 88% and 96%).
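A minimal PyTorch sketch of the key idea (train an autoencoder on healthy data only, then flag inputs with unusually high reconstruction error); the network size, the 64-channel telemetry dimensionality, and the 3-sigma threshold are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    # train a small autoencoder on healthy node telemetry only
    ae = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 64))
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    healthy = torch.randn(1000, 64)          # stand-in for real monitoring data
    for _ in range(200):
        opt.zero_grad()
        loss = ((ae(healthy) - healthy) ** 2).mean()
        loss.backward()
        opt.step()

    # calibrate a threshold from the reconstruction error on healthy data
    with torch.no_grad():
        err = ((ae(healthy) - healthy) ** 2).mean(dim=1)
    thresh = err.mean() + 3 * err.std()      # assumed 3-sigma decision rule

    def is_anomalous(x):                     # x: (64,) telemetry vector
        with torch.no_grad():
            return bool(((ae(x) - x) ** 2).mean() > thresh)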
【Keywords】:
【Paper Link】 【Pages】:9434-9439
【Authors】: Anna Paola Carrieri ; Will P. M. Rowe ; Martyn D. Winn ; Edward O. Pyzer-Knapp
【Abstract】: Research on the microbiome is an emerging and crucial science that finds many applications in healthcare, food safety, precision agriculture and environmental studies. Huge amounts of DNA from microbial communities are being sequenced and analyzed by scientists interested in extracting meaningful biological information from this big data. Analyzing massive microbiome sequencing datasets, which embed the functions and interactions of thousands of different bacterial, fungal and viral species, is a significant computational challenge. Artificial intelligence has the potential for building predictive models that can provide insights for specific cutting-edge applications such as guiding diagnostics and developing personalised treatments, as well as maintaining soil health and fertility. Current machine learning workflows that predict traits of host organisms from their commensal microbiome do not take into account the whole genetic material constituting the microbiome, instead basing the analysis on specific marker genes. In this paper, to the best of our knowledge, we introduce the first machine learning workflow that efficiently performs host phenotype prediction from whole shotgun metagenomes by computing similarity-preserving compact representations of the genetic material. Our workflow enables prediction tasks, such as classification and regression, from terabytes of raw sequencing data without necessitating any pre-processing through expensive bioinformatics pipelines. We compare the performance in terms of time, accuracy and uncertainty of predictions for four different classifiers. More precisely, we demonstrate that our ML workflow can efficiently classify real data with high accuracy, using examples from dog and human metagenomic studies, representing a step forward towards real-time diagnostics and a potential for cloud applications.
【Keywords】:
【Paper Link】 【Pages】:9440-9445
【Authors】: Srija Chakraborty ; Subhasish Das ; Ayan Banerjee ; Sandeep K. S. Gupta ; Philip R. Christensen
【Abstract】: Instruments onboard spacecraft acquire large amounts of data which is to be transmitted over a very low bandwidth. Consequently for some missions, the volume of data collected greatly exceeds the volume that can be downlinked before the next orbit. This necessitates the introduction of an intelligent autonomous decision making module that maximizes the return of the most scientifically relevant dataset over the low bandwidth for experts to analyze further. We propose an iterative rule based approach, guided by expert knowledge, to represent scientifically interesting geological landforms with respect to expert selected attributes. The rules are utilized to assign a priority based on how novel a test instance is with respect to its rule. High priority instances from the test set are used to iteratively update the learned rules. We then determine the effectiveness of the proposed approach on images acquired by a Mars orbiter and observe an expert-acceptable prioritization order generated by the rules that can potentially increase the return of scientifically relevant observations.
【Keywords】:
【Paper Link】 【Pages】:9446-9451
【Authors】: Amir Elmishali ; Roni Stern ; Meir Kalech
【Abstract】: In this paper, we present the DeBGUer tool, a web-based tool for prediction and isolation of software bugs. DeBGUer is a partial implementation of the Learn, Diagnose, and Plan (LDP) paradigm, which is a recently introduced paradigm for integrating Artificial Intelligence (AI) in the software bug detection and correction process. In LDP, a diagnosis (DX) algorithm is used to suggest possible explanations – diagnoses – for an observed bug. If needed, a test planning algorithm is subsequently used to suggest further testing. Both diagnosis and test planning algorithms consider a fault prediction model, which associates each software component (e.g., class or method) with the likelihood that it contains a bug. DeBGUer implements the first two components of LDP, bug prediction (Learn) and bug diagnosis (Diagnose). It provides an easy-to-use web interface, and has been successfully tested on 12 projects.
【Keywords】:
【Paper Link】 【Pages】:9452-9459
【Authors】: Natalie Fridman ; Doron Amir ; Yinon Douchan ; Noa Agmon
【Abstract】: There is a growing need for coverage of large maritime areas, mainly in the exclusive economic zone (EEZ). Due to the difficulty of accessing such large areas, the use of satellite-based sensors is the most efficient and cost-effective way to perform this task. Vessel behavior prediction is a necessary ability for the detection of moving vessels with satellite imagery. In this paper we present an algorithm for selecting the best satellite observation window to detect a moving object. First, we describe a model for vessel behavior prediction and compare its performance to two base models. We use real marine traffic data (AIS) to compare their ability to predict vessel behavior in a time frame of 1–24 hours. Then, we present KINGFISHER, a maritime intelligence system which uses our algorithm to track suspected vessels with a satellite sensor. We also present the results of the algorithm in operational scenarios of KINGFISHER.
【Keywords】:
【Paper Link】 【Pages】:9460-9465
【Authors】: Yu Gong ; Xusheng Luo ; Kenny Q. Zhu ; Wenwu Ou ; Zhao Li ; Lu Duan
【Abstract】: This paper studies the problem of automatically extracting a short title from a manually written longer description of E-commerce products for display on mobile devices. It is a new extractive summarization problem on short text inputs, for which we propose a feature-enriched network model, combining three different categories of features in parallel. Experimental results show that our framework significantly outperforms several baselines by a substantial gain of 4.5%. Moreover, we produce an extractive summarization dataset for E-commerce short texts and will release it to the research community.
【Keywords】:
【Paper Link】 【Pages】:9466-9471
【Authors】: Kyoohyung Han ; Seungwan Hong ; Jung Hee Cheon ; Daejun Park
【Abstract】: Machine learning on (homomorphically) encrypted data is a cryptographic method for analyzing private and/or sensitive data while preserving privacy. In the training phase, it takes encrypted training data as input and outputs an encrypted model without ever decrypting. In the prediction phase, it uses the encrypted model to predict results on new encrypted data. In each phase, no decryption key is needed, and thus data privacy is ultimately guaranteed. It has many applications in areas such as finance, education, genomics, and medicine that have sensitive private data. While several studies have been reported on the prediction phase, few studies have been conducted on the training phase. In this paper, we present an efficient algorithm for logistic regression on homomorphically encrypted data, and evaluate our algorithm on real financial data consisting of 422,108 samples over 200 features. Our experiment shows that an encrypted model with a sufficient Kolmogorov-Smirnov statistic value can be obtained in ∼17 hours on a single machine. We also evaluate our algorithm on the public MNIST dataset, and it takes ∼2 hours to learn an encrypted model with 96.4% accuracy. Considering the inefficiency of homomorphic encryption, our result is encouraging and demonstrates the practical feasibility of logistic regression training on large encrypted data, for the first time to the best of our knowledge.
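Homomorphic schemes evaluate additions and multiplications, hence polynomials, but not the sigmoid's exponential; a standard workaround in HE logistic-regression work is a low-degree polynomial approximation of the sigmoid. The sketch below shows that idea in plaintext NumPy only; it is not an HE implementation, and the degree-3 fit on [-8, 8] is an assumption rather than this paper's approximation.

    import numpy as np

    # degree-3 least-squares fit of the sigmoid on [-8, 8]
    t = np.linspace(-8, 8, 400)
    c = np.polyfit(t, 1 / (1 + np.exp(-t)), deg=3)   # polynomial coefficients

    def poly_sigmoid(z):
        return np.polyval(c, z)

    # plain gradient descent written with only +, *, and the polynomial,
    # i.e., operations an HE scheme could in principle carry out on ciphertexts
    def train(X, y, lr=0.01, epochs=50):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            grad = X.T @ (poly_sigmoid(X @ w) - y) / len(y)
            w -= lr * grad
        return w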
【Keywords】:
【Paper Link】 【Pages】:9472-9477
【Authors】: Ramin M. Hasani ; Guodong Wang ; Radu Grosu
【Abstract】: This paper studies an intelligent technique for the health monitoring and prognostics of common rotary machine components, with regard to bearings in particular. During a run-to-failure experiment, rich unsupervised features are extracted from vibration sensory data by a trained sparse autoencoder. Then, the correlation of the initial (presumably healthy) samples with the successive samples is calculated and passed through a moving-average filter. The normalized output, which is referred to as the auto-encoder correlation-based (AEC) rate, determines an informative attribute of the system, depicting its health status. AEC automatically identifies the degradation starting point in the machine component. We show that the AEC rate generalizes well across several run-to-failure tests. We demonstrate the superiority of the AEC over many other state-of-the-art approaches for the health monitoring of machine bearings.
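One plausible reading of the AEC rate, sketched in NumPy; the size of the healthy baseline window, the moving-average width, and the normalization are assumptions, not the authors' exact procedure.

    import numpy as np

    def aec_rate(features, n_healthy=50, window=11):
        """features: (T, d) sparse-autoencoder codes of vibration snapshots.
        Correlate each snapshot with the early (presumably healthy) profile,
        smooth with a moving average, and normalize."""
        baseline = features[:n_healthy].mean(axis=0)
        corr = np.array([np.corrcoef(baseline, f)[0, 1] for f in features])
        kernel = np.ones(window) / window
        smoothed = np.convolve(corr, kernel, mode="same")  # moving-average filter
        return smoothed / smoothed[:n_healthy].mean()      # stays near 1 while healthy

    # degradation onset: roughly the first index where the rate drops below
    # a chosen threshold, e.g. np.argmax(aec_rate(F) < 0.9)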
【Keywords】:
【Paper Link】 【Pages】:9478-9483
【Authors】: Samuel Jero ; Maria Leonor Pacheco ; Dan Goldwasser ; Cristina Nita-Rotaru
【Abstract】: Grammar-based fuzzing is a technique used to find software vulnerabilities by injecting well-formed inputs generated following rules that encode application semantics. Most grammar-based fuzzers for network protocols rely on human experts to manually specify these rules. In this work we study automated learning of protocol rules from textual specifications (i.e. RFCs). We evaluate the automatically extracted protocol rules by applying them to a state-of-the-art fuzzer for transport protocols and show that it leads to a smaller number of test cases while finding the same attacks as the system that uses manually specified rules.
【Keywords】:
【Paper Link】 【Pages】:9484-9491
【Authors】: Hannah Rae Kerner ; Danika F. Wellington ; Kiri L. Wagstaff ; James F. Bell ; Chiman Kwan ; Heni Ben Amor
【Abstract】: In this work, we present a system based on convolutional autoencoders for detecting novel features in multispectral images. We introduce SAMMIE: Selections based on Autoencoder Modeling of Multispectral Image Expectations. Previous work using autoencoders employed the scalar reconstruction error to classify new images as novel or typical. We show that a spatial-spectral error map can enable both accurate classification of novelty in multispectral images as well as human-comprehensible explanations of the detection. We apply our methodology to the detection of novel geologic features in multispectral images of the Martian surface collected by the Mastcam imaging system on the Mars Science Laboratory Curiosity rover.
【Keywords】:
【Paper Link】 【Pages】:9492-9497
【Authors】: Saiprasad Koturwar ; Soma Shiraishi ; Kota Iwamoto
【Abstract】: As an alternative to bar-code scanning, we are developing a real-time retail product detector for point-of-sale automation. The major challenges associated with image-based object detection arise from occlusion and the presence of other objects in close proximity. For robust product detection under such conditions, it is crucial to train the detector on a rich set of images with varying degrees of occlusion and proximity between the products, which fairly represents a wide range of customer tendencies of placing products together. However, generating a fairly large database of such images traditionally requires a large amount of human effort. On the other hand, acquiring individual object images with their corresponding masks is a relatively easy task. We propose a realistic image synthesis approach which uses individual object images and their corresponding masks to create training images with desired properties (occlusion and congestion among the products). We train our product detector on images thus generated and achieve a consistent performance improvement across different types of test data. With the proposed approach, the detector achieves improvements of 46.2% (from 0.67 to 0.98) in precision and 40% (from 0.60 to 0.84) in recall, compared to using a basic training dataset containing one product per image.
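A toy sketch of synthesizing training scenes from individual product images and their masks with Pillow; the placement loop and the occlusion-rejection threshold are assumptions made for illustration, not the paper's synthesis procedure.

    from PIL import Image
    import random

    def box_overlap(a, b):
        # intersection area of two (x0, y0, x1, y1) boxes
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        return max(w, 0) * max(h, 0)

    def synthesize(background, items, max_tries=20, max_overlap_frac=0.5):
        """Paste (image, mask) product pairs onto a background at random
        positions, resampling a position when it would occlude an earlier
        product too heavily; falls back to the last try if none qualifies."""
        scene, boxes = background.copy(), []
        for img, mask in items:
            for _ in range(max_tries):
                x = random.randint(0, scene.width - img.width)
                y = random.randint(0, scene.height - img.height)
                box = (x, y, x + img.width, y + img.height)
                area = img.width * img.height
                if all(box_overlap(box, b) < max_overlap_frac * area for b in boxes):
                    break
            scene.paste(img, (x, y), mask)   # mask keeps only the product pixels
            boxes.append(box)
        return scene, boxes

    # toy usage with synthetic images standing in for product crops
    bg = Image.new("RGB", (640, 480), "white")
    item = Image.new("RGB", (100, 80), "red")
    mask = Image.new("L", (100, 80), 255)    # fully opaque toy mask
    scene, boxes = synthesize(bg, [(item, mask), (item, mask)])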
【Keywords】:
【Paper Link】 【Pages】:9498-9503
【Authors】: Abhinav Kumar ; Aishwarya Gupta ; Bishal Santra ; K. S. Lalitha ; Manasa Kolla ; Mayank Gupta ; Rishabh Singh
【Abstract】: High Occupancy Vehicle/High Occupancy Tolling (HOV/HOT) lanes are operated based on voluntary HOV declarations by drivers. A majority of these declarations are false, made to illegally take advantage of faster HOV lane speeds. It is a herculean task to manually regulate HOV lanes and identify these violators. Therefore, an automated way of counting the number of people in a car is prudent for fair tolling and for violator detection. In this paper, we propose a Vehicle Passenger Detection System (VPDS) which works by capturing images through Near Infrared (NIR) cameras on the toll lanes and processing them using deep Convolutional Neural Network (CNN) models. Our system has been deployed in 3 cities over a span of two years and has served roughly 30 million vehicles with an accuracy of 97%, which is a remarkable improvement over manual review, which is 37% accurate. Our system can generate an accurate report of HOV lane usage, which helps policy makers pave the way towards de-congestion.
【Keywords】:
【Paper Link】 【Pages】:9504-9509
【Authors】: Ugur Kuter ; Brian Kettler ; Katherine Guo ; Martin O. Hofmann ; Valerie Champagne ; Kurt Lachevet ; Jennifer Lautenschlager ; Robert P. Goldman ; Luis Asencios ; Josh Hamell
【Abstract】: Degraded communications are expected in large-scale disaster response and military operations, which nevertheless require rapid, concerted actions by distributed decision makers, each with limited visibility into the changing situation and in charge of a limited set of resources. We describe LAPLATA, a novel architecture that addresses these challenges by separating mission planning from allocation/scheduling for scalability but at the cost of some negotiation. We describe formal algorithms that achieve near-optimal performance according to mission completion percentage and subject matter expert review: assumption-based planning and replanning, profile-assisted cooperative allocation, and schedule negotiation. We validate our approach on a realistic problem specification and compare results against subject matter expert solutions.
【Keywords】:
【Paper Link】 【Pages】:9510-9515
【Authors】: Gilbert Lim ; Zhan Wei Lim ; Dejiang Xu ; Daniel Shu Wei Ting ; Tien Yin Wong ; Mong-Li Lee ; Wynne Hsu
【Abstract】: Ischemic stroke is a leading cause of death and long-term disability that is difficult to predict reliably. Retinal fundus photography has been proposed for stroke risk assessment, due to its non-invasiveness and the similarity between retinal and cerebral microcirculations, with past studies claiming a correlation between venular caliber and stroke risk. However, it may be that other retinal features are more appropriate. In this paper, extensive experiments with deep learning on six retinal datasets are described. Feature isolation involving segmented vascular tree images is applied to establish the effectiveness of vessel caliber and shape alone for stroke classification, and dataset ablation is applied to investigate model generalizability on unseen sources. The results suggest that vessel caliber and shape could be indicative of ischemic stroke, and source-specific features could influence model performance.
【Keywords】:
【Paper Link】 【Pages】:9516-9521
【Authors】: Zhan Wei Lim ; Mong-Li Lee ; Wynne Hsu ; Tien Yin Wong
【Abstract】: Though deep learning systems have achieved high accuracy in detecting diseases from medical images, few such systems have been deployed in highly automated disease screening settings due to a lack of trust in how well these systems can generalize to out-of-distribution datasets. We propose to use uncertainty estimates of the deep learning system’s predictions to know when to accept or to disregard its prediction. We evaluate the effectiveness of using such estimates in a real-life application for the screening of diabetic retinopathy. We also generate visual explanations of the deep learning system to convey the pixels in the image that influence its decision. Together, these reveal the deep learning system’s competency and limits to the human, and in turn the human can know when to trust the deep learning system.
【Keywords】:
【Paper Link】 【Pages】:9522-9527
【Authors】: Yu Lu ; Xi Zhang ; Xianghua Fu ; Fangxiong Chen ; Kelvin K. L. Wong
【Abstract】: Obstetric ultrasound examination of physiological parameters has mainly been used to estimate fetal weight during pregnancy and baby weight before labour, to monitor fetal growth and reduce prenatal morbidity and mortality. However, ultrasound estimation of fetal weight is subject to population differences, strict operating requirements for sonographers, and poor access to ultrasound in low-resource areas. Inaccurate estimations may lead to negative perinatal outcomes. We consider that machine learning can provide an accurate estimation for obstetricians alongside traditional clinical practices, as well as an efficient and effective support tool for pregnant women for self-monitoring. We present a robust methodology using a data set comprising 4,212 intrapartum recordings. The cubic spline function is used to fit the curves of several key characteristics that are extracted from ultrasound reports. A number of simple and powerful machine learning algorithms are trained, and their performance is evaluated on real test data. We also propose a novel evaluation performance index called intersection-over-union (IoU) for our study. The results are encouraging using an ensemble model consisting of Random Forest, XGBoost, and LightGBM algorithms. The experimental results show an IoU of 0.64 between the predicted range of fetal weight at any gestational age from the ensemble model and that from ultrasound. Compared with the ultrasound method, the estimation accuracy is improved by 12%, and the mean relative error is reduced by 3%.
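Reading the paper's IoU index as standard intersection-over-union on 1-D weight intervals gives a very small metric; the sketch below reflects that reading, which may differ from the authors' exact definition.

    def interval_iou(pred, ref):
        """IoU between two predicted fetal-weight ranges, e.g. (2900, 3400) g."""
        (a0, a1), (b0, b1) = pred, ref
        inter = max(0.0, min(a1, b1) - max(a0, b0))
        union = (a1 - a0) + (b1 - b0) - inter
        return inter / union if union > 0 else 0.0

    print(interval_iou((2900, 3400), (3050, 3600)))  # 0.5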
【Keywords】:
【Paper Link】 【Pages】:9528-9533
【Authors】: Neil Mallinar ; Abhishek Shah ; Rajendra Ugrani ; Ayush Gupta ; Manikandan Gurusankar ; Tin Kam Ho ; Q. Vera Liao ; Yunfeng Zhang ; Rachel K. E. Bellamy ; Robert Yates ; Chris Desmarais ; Blake McGregor
【Abstract】: Many conversational agents in the market today follow a standard bot development framework which requires training intent classifiers to recognize user input. The need to create a proper set of training examples is often the bottleneck in the development process. On many occasions, agent developers have access to historical chat logs that can provide good quantity as well as coverage of training examples. However, the cost of labeling them with tens to hundreds of intents often prohibits taking full advantage of these chat logs. In this paper, we present a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision. The framework reduces hours to days of labeling effort down to minutes of work by using a search engine to find examples, then relies on a data programming approach to automatically expand the labels. We report on a user study that shows positive user feedback for this new approach to building conversational agents, and demonstrates the effectiveness of using data programming for auto-labeling. While the system is developed for training conversational agents, the framework has broader application in significantly reducing labeling effort for training text classifiers.
【Keywords】:
【Paper Link】 【Pages】:9534-9540
【Authors】: Stéphane Martin ; Boi Faltings ; Vincent Schickel
【Abstract】: We describe the selection, implementation and online evaluation of two e-commerce recommender systems developed with our partner company, Prediggo. The first one is based on the novel method of Bayesian Variable-order Markov Modeling (BVMM). The second, SSAGD, is a novel variant of the Matrix-Factorization technique (MF), which is considered state-of-the-art in the recommender literature. We discuss the offline tests we carried out to select the best MF variant, and present the results of two A/B tests performed on live e-commerce websites after the deployment of the new algorithms. Comparing the new recommenders with Prediggo’s proprietary algorithm of Ontology Filtering, we show that the BVMM significantly outperforms the two others in terms of CTR and prediction speed, and leads to a strong increase in recommendation-mediated sales. Although MF exhibits reasonably good accuracy, the BVMM is still significantly more accurate and avoids the high memory requirements of MF. This scalability is essential for its application in online businesses.
【Keywords】:
【Paper Link】 【Pages】:9541-9546
【Authors】: Nicholas McCarthy ; Mohammad Karzand ; Freddy Lécué
【Abstract】: Flight delays impact airlines, airports and passengers. Delay prediction is crucial during the decision-making process for all players in commercial aviation, and in particular for airlines to meet their on-time performance objectives. Although many machine learning approaches have been experimented with, they fail at (i) predicting delays in minutes with low errors (less than 15 minutes), and (ii) being applicable to small carriers, i.e., low-cost companies characterized by a small amount of data. This work presents a Long Short-Term Memory (LSTM) approach to predicting flight delay, modeled as a sequence of flights across multiple airports for a particular aircraft throughout the day. We then suggest a transfer learning approach between heterogeneous feature spaces to train a prediction model for a given smaller airline using the data from another larger airline. Our approach is demonstrated to be robust and accurate for low-cost airlines in Europe.
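A minimal PyTorch sketch of the sequence formulation, treating an aircraft's day as a sequence of legs scored by an LSTM; the feature and hidden sizes, and the transfer recipe in the closing comment, are assumptions rather than the paper's architecture.

    import torch
    import torch.nn as nn

    class DelayLSTM(nn.Module):
        """Predict the delay (in minutes) of each flight leg in an
        aircraft's day, modeled as a sequence."""
        def __init__(self, n_features=16, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, legs):                # legs: (batch, n_legs, n_features)
            out, _ = self.lstm(legs)
            return self.head(out).squeeze(-1)   # (batch, n_legs) delay estimates

    model = DelayLSTM()
    pred = model(torch.randn(4, 6, 16))         # 4 aircraft-days of 6 legs each

    # one plausible transfer recipe: train on the large airline, freeze the
    # LSTM, and fine-tune only a new input projection/head on the small carrier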
【Keywords】:
【Paper Link】 【Pages】:9547-9551
【Authors】: Shekoofeh Mokhtari ; Ahmad Mahmoody ; Dragomir Yankov ; Ning Xie
【Abstract】: Map search is a major vertical in all popular search engines. It also plays an important role in personal assistants on mobile, home or desktop devices. A significant fraction of map search traffic is comprised of “address queries” - queries where either the entire query or some terms in it refer to an address or part of an address (road segment, intersection etc.). Here we demonstrate that correctly understanding and tagging address queries are critical for map search engines to fulfill them. We describe several recurrent sequence architectures for tagging such queries. We compare their performance on two subcategories of address queries - single entity (aka single point) addresses and multi entity (aka multi point) addresses, and finish by providing guidance on the best practices when dealing with each of these subcategories.
【Keywords】:
【Paper Link】 【Pages】:9552-9557
【Authors】: Srikanth Mujjiga ; Vamsi Krishna ; Kalyan Chakravarthi ; Vijayananda J
【Abstract】: Clinical documents are vital resources for radiologists when they have to consult or refer while studying similar cases. In large healthcare facilities where millions of reports are generated, searching for relevant documents is quite challenging. With abundant interchangeable words in the clinical domain, understanding the semantics of the words in clinical documents is vital to improving the search results. This paper details an end-to-end semantic search application that addresses the large-scale information retrieval problem of clinical reports. The paper specifically focuses on the challenge of identifying semantics in clinical reports to facilitate search at the semantic level. The semantic search works by mapping the documents into a concept space, and the search is performed in that concept space. A unique approach of framing the concept mapping problem as a language translation problem is proposed in this paper. The concept mapper is modelled using a neural machine translation (NMT) model based on an encoder-decoder architecture with attention. The regular-expression-based concept mapper takes approximately 3 seconds to extract UMLS concepts from a single document, whereas the trained NMT does the same in approximately 30 milliseconds. The NMT-based model further enables the incorporation of negation detection to identify whether a concept is negated or not, facilitating search for negated queries.
【Keywords】:
【Paper Link】 【Pages】:9558-9564
【Authors】: Hadi NekoeiQachkanloo ; Benyamin Ghojogh ; Ali Saheb Pasand ; Mark Crowley
【Abstract】: This paper proposes a novel trading system which plays the role of an artificial counselor for stock investment. In this paper, the future stock prices (technical features) are predicted using Support Vector Regression. Thereafter, the predicted prices are used to recommend which portions of the budget an investor should invest in different existing stocks to obtain an optimum expected profit considering their level of risk tolerance. Two different methods are used for suggesting the best portions: Markowitz portfolio theory and a fuzzy investment counselor. The first approach is an optimization-based method which considers merely technical features, while the second approach is based on fuzzy logic, taking into account both technical and fundamental features of the stock market. The experimental results on the New York Stock Exchange (NYSE) show the effectiveness of the proposed system.
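For the Markowitz step, a standard mean-variance allocation can be written as a small constrained optimization; the risk-tolerance parameterization below is one common textbook formulation, assumed rather than taken from the paper, with expected returns standing in for the SVR price predictions.

    import numpy as np
    from scipy.optimize import minimize

    def markowitz_weights(mu, cov, risk_tolerance=1.0):
        """Budget fractions maximizing expected return minus a risk penalty:
        max_w mu.w - (1/(2*rt)) * w' C w, with w >= 0 summing to 1."""
        n = len(mu)
        objective = lambda w: -(mu @ w) + (w @ cov @ w) / (2 * risk_tolerance)
        res = minimize(objective, np.ones(n) / n,
                       bounds=[(0, 1)] * n,
                       constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
        return res.x

    # mu would come from the predicted prices; here, three toy stocks
    w = markowitz_weights(np.array([0.02, 0.05, 0.03]), np.eye(3) * 0.04)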
【Keywords】:
【Paper Link】 【Pages】:9565-9572
【Authors】: Galia Nordon ; Gideon Koren ; Varda Shalev ; Eric Horvitz ; Kira Radinsky
【Abstract】: We present a system that jointly harnesses large-scale electronic health records data and a concept graph mined from the medical literature to guide drug repurposing—the process of applying known drugs in new ways to treat diseases. Our study is unique in methods and scope, per the scale of the concept graph and the quantity of data. We harness 10 years of nation-wide medical records of more than 1.5 million people and extract medical knowledge from all of PubMed, the world’s largest corpus of online biomedical literature. We employ links on the concept graph to provide causal signals to prioritize candidate influences between medications and target diseases. We show results of the system on studies of drug repurposing for hypertension and diabetes. In both cases, we present drug families identified by the algorithm which were previously unknown. We verify the results via clinical expert opinion and by prospective clinical trials on hypertension.
【Keywords】:
【Paper Link】 【Pages】:9573-9580
【Authors】: Maleeha Qazi ; Srinivas Tunuguntla ; Peng Lee ; Teja Kanchinadam ; Glenn Fung ; Neeraj Arora
【Abstract】: In the insurance industry, timely and effective interactions with customers are at the core of everyday operations and processes that are key to a satisfactory customer experience. These interactions often result in sequences of data derived from events that occur over time. Such recurrent patterns can provide valuable information that can be used in a variety of ways to improve customer-related workflows. In this paper we demonstrate the application of a recently proposed algorithm that uncovers such time patterns, taking into account the time between events to form the patterns. We use temporal customer data generated from two different use cases (satisfaction and fraud) to show that this algorithm successfully detects patterns that occur in the insurance context.
【Keywords】:
【Paper Link】 【Pages】:9581-9588
【Authors】: Xin Qiu ; Risto Miikkulainen
【Abstract】: Conversion rate optimization means designing web interfaces such that more visitors perform a desired action (such as registering or purchasing) on the site. One promising approach, implemented in Sentient Ascend, is to optimize the design using evolutionary algorithms, evaluating each candidate design online with actual visitors. Because such evaluations are costly and noisy, several challenges emerge: How can available visitor traffic be used most efficiently? How can good solutions be identified most reliably? How can a high conversion rate be maintained during optimization? This paper proposes a new technique to address these issues. Traffic is allocated to candidate solutions using a multi-armed bandit algorithm, spending more traffic on the evaluations that are most useful. In a best-arm identification mode, the best candidate can be identified reliably at the end of evolution, and in a campaign mode, the overall conversion rate can be optimized throughout the entire evolution process. Multi-armed bandit algorithms thus improve the performance and reliability of machine discovery in noisy real-world environments.
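A Beta-Bernoulli Thompson sampler is the textbook instance of the traffic-allocation idea described here (the paper's exact bandit algorithm may differ); the uniform Beta(1, 1) priors are an assumption.

    import random

    class ThompsonSampler:
        """Route each visitor to the candidate design whose sampled
        conversion rate is highest, so traffic concentrates on strong
        or still-uncertain candidates."""
        def __init__(self, n_designs):
            self.wins = [1] * n_designs      # Beta(1, 1) uniform priors
            self.losses = [1] * n_designs

        def choose(self):
            draws = [random.betavariate(w, l)
                     for w, l in zip(self.wins, self.losses)]
            return max(range(len(draws)), key=draws.__getitem__)

        def update(self, design, converted):
            if converted:
                self.wins[design] += 1
            else:
                self.losses[design] += 1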
【Keywords】:
【Paper Link】 【Pages】:9589-9594
【Authors】: Sudeshna Roy ; Meghana Madhyastha ; Sheril Lawrence ; Vaibhav Rajan
【Abstract】: The Internet has rich and rapidly increasing sources of high-quality educational content. Inferring prerequisite relations between educational concepts is required for modern large-scale online educational technology applications such as personalized recommendations and automatic curriculum creation. We present PREREQ, a new supervised learning method for inferring concept prerequisite relations. PREREQ is designed using latent representations of concepts obtained from the Pairwise Latent Dirichlet Allocation model, and a neural network based on the Siamese network architecture. PREREQ can learn unknown concept prerequisites from course prerequisites and labeled concept prerequisite data. It outperforms state-of-the-art approaches on benchmark datasets and can effectively learn from very little training data. PREREQ can also use unlabeled video playlists, a steadily growing source of training data, to learn concept prerequisites, thus obviating the need for manual annotation of course prerequisites.
【Keywords】:
【Paper Link】 【Pages】:9595-9600
【Authors】: Tárik S. Salem ; Karan Kathuria ; Heri Ramampiaro ; Helge Langseth
【Abstract】: Keeping electricity production in balance with the actual demand is becoming a difficult and expensive task in spite of the involvement of experienced human operators. This is due to the increasing complexity of the electric power grid system, with intermittent renewable production as one of the contributors. Advance information about an impending imbalance can help the transmission system operator to adjust the production plans, and thus ensure a high security of supply by reducing the use of costly balancing reserves, and consequently reduce undesirable fluctuations of the 50 Hz power system frequency. In this paper, we introduce the relatively new problem of intra-hour imbalance forecasting for the transmission system operator (TSO). We focus on the use case of the Norwegian TSO, Statnett. We present a complementary imbalance forecasting tool that is able to support the TSO in determining the trend of future imbalances, and show the potential to proactively alleviate imbalances with a higher accuracy compared to the contemporary solution.
【Keywords】:
【Paper Link】 【Pages】:9601-9606
【Authors】: Athar Sefid ; Jian Wu ; Allen C. Ge ; Jing Zhao ; Lu Liu ; Cornelia Caragea ; Prasenjit Mitra ; C. Lee Giles
【Abstract】: Automatically extracted metadata from scholarly documents in PDF formats is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision. The existing solution, which is based on information retrieval and string similarity on titles, works well only if the titles are clean. We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset. The blocking function uses the classic BM25 algorithm to find matching candidates from the reference data, which has been indexed by ElasticSearch. The core components use supervised methods which combine features extracted from all available metadata fields. The system also leverages available citation information to match entities. The combination of metadata and citations achieves high accuracy that significantly outperforms the baseline method on the same test dataset. We apply this system to match the database of CiteSeerX against Web of Science, PubMed, and DBLP. This method will be deployed in the CiteSeerX system to clean metadata and link records to other scholarly big datasets.
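For reference, the BM25 ranking used in the blocking step can be written from scratch in a few lines (the deployed system relies on ElasticSearch's implementation instead); the parameter defaults k1=1.2 and b=0.75 are the conventional ones, and the whitespace tokenization is a simplification.

    import math
    from collections import Counter

    def bm25_scores(query_title, docs, k1=1.2, b=0.75):
        """Score reference titles against a noisy query title; the top-k
        become candidates for the supervised matcher."""
        docs_tok = [d.lower().split() for d in docs]
        avg_len = sum(len(d) for d in docs_tok) / len(docs_tok)
        df = Counter(t for d in docs_tok for t in set(d))  # document frequencies
        scores = []
        for d in docs_tok:
            tf = Counter(d)
            s = 0.0
            for t in query_title.lower().split():
                idf = math.log(1 + (len(docs_tok) - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (
                    tf[t] + k1 * (1 - b + b * len(d) / avg_len))
            scores.append(s)
        return scores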
【Keywords】:
【Paper Link】 【Pages】:9607-9612
【Authors】: Lijing Wang ; Jiangzhuo Chen ; Madhav Marathe
【Abstract】: Influenza-like illness (ILI) is among the most common diseases worldwide. Producing timely, well-informed, and reliable forecasts for ILI is crucial for preparedness and optimal interventions. In this work, we focus on short-term but high-resolution forecasting and propose DEFSI (Deep Learning Based Epidemic Forecasting with Synthetic Information), an epidemic forecasting framework that integrates the strengths of artificial neural networks and causal methods. In DEFSI, we build a two-branch neural network structure to take both within-season observations and between-season observations as features. The model is trained on geographically high-resolution synthetic data. It enables detailed forecasting when high-resolution surveillance data is not available. Furthermore, the model is provided with better generalizability and physical consistency. Our method achieves comparable/better performance than state-of-the-art methods for short-term ILI forecasting at the state level. For high-resolution forecasting at the county level, DEFSI significantly outperforms the other methods.
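A minimal PyTorch sketch of a two-branch structure in the spirit of DEFSI, with one encoder for within-season observations and one for between-season observations (the same weeks of past seasons); all dimensions and the fusion by concatenation are assumptions, not the paper's specification.

    import torch
    import torch.nn as nn

    class TwoBranchILI(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.within = nn.LSTM(1, hidden, batch_first=True)
            self.between = nn.LSTM(1, hidden, batch_first=True)
            self.head = nn.Linear(2 * hidden, 1)

        def forward(self, x_within, x_between):   # each: (batch, weeks, 1)
            _, (h_w, _) = self.within(x_within)
            _, (h_b, _) = self.between(x_between)
            # fuse the two branch summaries and predict next-week ILI
            return self.head(torch.cat([h_w[-1], h_b[-1]], dim=1))

    y = TwoBranchILI()(torch.randn(8, 10, 1), torch.randn(8, 5, 1))  # (8, 1)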
【Keywords】:
【Paper Link】 【Pages】:9613-9618
【Authors】: Ming-Che Wu ; Mei-Chen Yeh
【Abstract】: A major problem in metropolitan areas is finding parking spaces. Existing parking guidance systems often adopt fixed sensors or cameras that cannot provide information from the driver’s point of view. Motivated by the advent of dashboard cameras (dashcams), we develop neural-network-based methods for detecting vacant parking spaces in videos recorded by a dashcam. Detecting vacant parking spaces in dashcam videos enables early detection of spaces. Different from conventional object detection methods, we leverage the monotonicity of the detection confidence with respect to the distance to the approaching target parking space and propose a new loss function, which can not only yield improved detection results but also enable early detection. To evaluate our detection method, we create a new large dataset containing 5,800 dashcam videos captured from 22 indoor and outdoor parking lots. To the best of our knowledge, this is the first and largest driver’s-view video dataset that supports parking space detection and provides parking space occupancy annotations.
【Keywords】:
【Paper Link】 【Pages】:9619-9625
【Authors】: Haoran Zhang ; Ahmed Magooda ; Diane J. Litman ; Richard Correnti ; Elaine Wang ; Lindsay Clare Matsumura ; Emily Howe ; Rafael Quintana
【Abstract】: Writing a good essay typically involves students revising an initial paper draft after receiving feedback. We present eRevise, a web-based writing and revising environment that uses natural language processing features generated for rubric-based essay scoring to trigger formative feedback messages regarding students’ use of evidence in response-to-text writing. By helping students understand the criteria for using text evidence during writing, eRevise empowers students to better revise their paper drafts. In a pilot deployment of eRevise in 7 classrooms spanning grades 5 and 6, the quality of text evidence usage in writing improved after students received formative feedback and then engaged in paper revision.
【Keywords】:
【Paper Link】 【Pages】:9627-9634
【Authors】: Shuto Araki ; Juan Pablo Arenas Uribe ; Zach Wilkerson ; Steven Bogaerts ; Chad Byers
【Abstract】: Birds of a Feather is a single-player card game in which cards are arranged in a grid. The player attempts to combine stacks of cards under certain rules, with the goal being to combine all cards into a single stack. This paper highlights several approaches for efficiently classifying whether a randomly chosen state has a single-stack solution. These approaches use graph theory and machine learning concepts to prune a state’s search space, resulting in significant reductions in runtime relative to a baseline search.
【Keywords】:
【Paper Link】 【Pages】:9635-9643
【Authors】: Trevor Bihl ; Todd Jenkins ; Chadwick Cox ; Ashley DeMange ; Kerry Hill ; Edmund Zelnio
【Abstract】: As research and development (R&D) in autonomous systems progresses further, more interdisciplinary knowledge is needed from domains as diverse as artificial intelligence (AI), biology, psychology, modeling and simulation (M&S), and robotics. Such R&D efforts are necessarily interdisciplinary in nature and require technical as well as further soft skills of teamwork, communication and integration. In this paper, we introduce a 14-week, summer-long internship for developing these skills in undergraduate science and engineering interns through R&D. The internship was designed to be modular and divided into three parts: training, innovation, and application/integration. The end result of the internship was 1) the development of an M&S ecosystem for autonomy concepts, 2) development and robotics testing of reasoning methods through both Bayesian methods and cognitive models of the basal ganglia, and 3) a process for future internships within the modular construct. Through collaboration with full-time professional staff, who actively learned with the interns, this internship incorporates a feedback loop to educate and perform fundamental R&D. Future iterations of this internship can leverage the M&S ecosystem and adapt the modular internship framework to focus on different innovations, learning paradigms, and/or applications.
【Keywords】:
【Paper Link】 【Pages】:9644-9647
【Authors】: Eric Eaton
【Abstract】: After years of taking a trial-and-error approach to managing a moderate-size academic research group, I settled on using a set of online tools and protocols that seem effective, require relatively little effort to use and maintain, and are inexpensive. This paper discusses this approach to communication, project management, document and code management, and logistics. It is my hope that other researchers, especially new faculty and research scientists, might find this set of tools and protocols useful when determining how to manage their own research group. This paper is targeted toward research groups based in mathematics and engineering, although faculty in other disciplines may find inspiration in some of these ideas.
【Keywords】:
【Paper Link】 【Pages】:9648-9655
【Authors】: Richard Hoshino ; Max Notarangelo
【Abstract】: In this paper, we analyze Birds of a Feather (BoaF), a perfect-information one-player card game that is the subject of the 2019 EAAI Undergraduate Research Challenge. We prove that the generalized N × N BoaF game is NP-complete, and then explore the one million deals in the 4×4 BoaF testbed. We present several graph-theoretic algorithms to prove that 1880 of these million deals are unsolvable, and conclude the paper with two search algorithms that efficiently show that all of the remaining 998,120 deals are in fact solvable.
【Keywords】:
【Paper Link】 【Pages】:9656-9661
【Authors】: Bryon Kucharski ; Azad Deihim ; Mehmet Ergezer
【Abstract】: This research was conducted by an interdisciplinary team of two undergraduate students and a faculty member to explore solutions to the Birds of a Feather (BoF) Research Challenge. BoF is a newly-designed perfect-information solitaire-type game. The focus of the study was to design and implement different algorithms and evaluate their effectiveness. The team compared the provided depth-first search (DFS) to heuristic algorithms such as Monte Carlo tree search (MCTS), as well as a novel heuristic search algorithm guided by machine learning. Since all of the studied algorithms converge to a solution from a solvable deal, the effectiveness of each approach was measured by how quickly a solution was reached and how many nodes were traversed until a solution was reached. The employed methods have the potential to provide artificial intelligence enthusiasts with a better understanding of BoF and novel ways to solve perfect-information games and puzzles in general. The results indicate that the proposed heuristic search algorithms guided by machine learning provide a significant improvement, in terms of the number of nodes traversed, over the provided DFS algorithm.
【Keywords】:
【Paper Link】 【Pages】:9662-9669
【Authors】: Yaman Kumar ; Swati Aggarwal ; Debanjan Mahata ; Rajiv Ratn Shah ; Ponnurangam Kumaraguru ; Roger Zimmermann
【Abstract】: In the era of MOOCs, online exams are taken by millions of candidates, and scoring their short answers is an integral part; evaluating these responses with human graders is intractable. Thus, a generic automated system capable of grading these responses should be designed and deployed. In this paper, we present a fast, scalable, and accurate approach to automated Short Answer Scoring (SAS). We propose and explain the design and development of a system for SAS, namely AutoSAS. Given a question along with its graded samples, AutoSAS can learn to grade that prompt successfully. This paper further lays down the features, such as lexical diversity, Word2Vec, prompt, and content overlap, that play a pivotal role in building our proposed model. We also present a methodology for indicating the factors responsible for scoring an answer. The trained model is evaluated on an extensively used public dataset, namely the Automated Student Assessment Prize Short Answer Scoring (ASAP-SAS) dataset. AutoSAS shows state-of-the-art performance, improving results by over 8% on some of the question prompts as measured by Quadratic Weighted Kappa (QWK), and performing comparably to humans.
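QWK, the evaluation measure named in the abstract, penalizes score disagreements by the square of their distance on the score scale. As a reference for the metric itself (not the authors' code), a minimal Python sketch, assuming integer scores and at least two score levels:

    import numpy as np

    def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
        """QWK between two integer score vectors on [min_score, max_score]."""
        n = max_score - min_score + 1
        observed = np.zeros((n, n))              # confusion matrix of scores
        for a, b in zip(rater_a, rater_b):
            observed[a - min_score, b - min_score] += 1
        hist_a = observed.sum(axis=1)            # marginal score histograms
        hist_b = observed.sum(axis=0)
        expected = np.outer(hist_a, hist_b) / observed.sum()
        idx = np.arange(n)                       # quadratic disagreement weights
        weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    # Four answers scored on a 0-3 scale; one near-miss pulls QWK below 1.
    print(quadratic_weighted_kappa([1, 2, 3, 3], [1, 2, 3, 2], 0, 3))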
【Keywords】:
【Paper Link】 【Pages】:9670-9677
【Authors】: Pat Langley
【Abstract】: Modern introductory courses on AI do not train students to create intelligent systems or provide broad coverage of this complex field. In this paper, we identify problems with common approaches to teaching artificial intelligence and suggest alternative principles that courses should adopt instead. We illustrate these principles in a proposed course that teaches students not only about component methods, such as pattern matching and decision making, but also about their combination into higher-level abilities for reasoning, sequential control, plan generation, and integrated intelligent agents. We also present a curriculum that instantiates this organization, including sample programming exercises and a project that requires system integration. Participants also gain experience building knowledge-based agents that use their software to produce intelligent behavior.
【Keywords】:
【Paper Link】 【Pages】:9678-9685
【Authors】: Weiming Lu ; Yangfan Zhou ; Jiale Yu ; Chenhao Jia
【Abstract】: Prerequisite relations among concepts are crucial for educational applications. However, it is difficult to automatically extract domain-specific concepts and learn the prerequisite relations among them without labeled data. In this paper, we first extract high-quality phrases from a set of educational data and identify the domain-specific concepts by a graph-based ranking method. Then, we propose an iterative prerequisite relation learning framework, called iPRL, which combines a learning-based model and a recovery-based model to leverage both concept pair features and dependencies among learning materials. In experiments, we evaluated our approach on two real-world datasets, a Textbook Dataset and a MOOC Dataset, and validated that our approach achieves better performance than existing methods. Finally, we also illustrate some examples of our approach.
【Keywords】:
【Paper Link】 【Pages】:9686-9692
【Authors】: Todd W. Neller ; Connor Berson ; Jivan Kharel ; Ryan Smolik
【Abstract】: In this article, we describe the lessons learned in creating an efficient solver for the solitaire game Birds of a Feather. We introduce a new variant of depth-first search that we call best-n depth-first search, which achieved a 99.56% reduction in search time over 100,000 puzzle seeds. We evaluate a number of potential node-ordering search features and pruning tests, perform an analysis of solvability prediction with such search features, and consider possible future research directions suggested by the most computationally expensive puzzle seeds encountered in our testing.
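The abstract does not spell out best-n depth-first search; one plausible reading is a DFS that orders each node's children by a heuristic and recurses only into the n best. A hypothetical Python sketch under that assumption (the names and the pruning choice are illustrative, not the authors' definition):

    def best_n_dfs(state, successors, heuristic, is_goal, n):
        """Depth-first search that expands only the n highest-scoring
        children of each node (one plausible reading of best-n DFS).
        Assumes every move strictly shrinks the state, as in Birds of
        a Feather where each move merges two stacks, so it terminates."""
        if is_goal(state):
            return [state]
        children = sorted(successors(state), key=heuristic, reverse=True)
        for child in children[:n]:      # prune all but the n best moves
            path = best_n_dfs(child, successors, heuristic, is_goal, n)
            if path is not None:
                return [state] + path
        return None                     # no solution under this pruning

Note that pruning to n children sacrifices completeness whenever n is smaller than the branching factor; a complete variant would fall back to the remaining children on failure.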
【Keywords】:
【Paper Link】 【Pages】:9693-9699
【Authors】: Todd W. Neller ; Daniel Ziegler
【Abstract】: In this article, we describe a computer-aided design process for generating high-quality Birds of a Feather solitaire card puzzles. In each iteration, we generate puzzles via combinatorial optimization of an objective function. After solving and subjectively rating such puzzles, we compute objective puzzle features and regress our ratings onto such features to provide insight for objective function improvements. Through this iterative improvement process, we demonstrate the importance of the halfway solvability ratio in quality puzzle design. We relate our observations to recent work on tension in puzzle design, and suggest next steps for more efficient puzzle generation.
【Keywords】:
【Paper Link】 【Pages】:9700-9705
【Authors】: Christian Roberson ; Katarina Sperduto
【Abstract】: Artificial intelligence in games serves as an excellent platform for facilitating collaborative research with undergraduates. This paper explores several aspects of a research challenge proposed for a newly-developed variant of a solitaire game. We present multiple classes of game states that can be identified as solvable or unsolvable. We present a heuristic for quickly finding goal states in a game state search tree. Finally, we introduce a Monte Carlo Tree Search-based player for the solitaire variant that can win almost any solvable starting deal efficiently.
【Keywords】:
【Paper Link】 【Pages】:9706-9712
【Authors】: Benjamin Sang ; Sejong Yoon
【Abstract】: Birds of a Feather is a single-player, perfect-information card game. The game can have multiple board sizes, with larger boards introducing search spaces that grow exponentially. In this paper, we investigate the solvability of the game, aiming to build a machine learning method to automatically classify whether a given board state has a solution path or not. We propose a method based on image-based features of the board state and a deep neural network. Experimental results show that the proposed method can make reasonable predictions of the solvability of a game at an arbitrary stage of the game.
【Keywords】:
【Paper Link】 【Pages】:9713-9720
【Authors】: Anjali Singh ; Ruhi Sharma Mittal ; Shubham Atreja ; Mourvi Sharma ; Seema Nagar ; Prasenjit Dey ; Mohit Jain
【Abstract】: Images are an essential tool for communicating with children, particularly at younger ages when they are still developing their emergent literacy skills. Hence, assessments that use images to assess their conceptual knowledge and visual literacy are an important component of their learning process. Creating assessments at scale is a challenging task, which has led to several techniques being proposed for the automatic generation of textual assessments. However, none of them focuses on generating image-based assessments. To understand the manual process of creating visual assessments, we interviewed primary school teachers. Based on the findings from this preliminary study, we present a novel approach that uses image semantics to generate visual multiple choice questions (VMCQs) for young learners, wherein options are presented in the form of images. We propose a metric to measure the semantic similarity between two images, which we use to identify the four options – one answer and three distractor images – for a given question. We also use this metric for generating VMCQs at two difficulty levels – easy and hard. Through a quantitative evaluation, we show that the system-generated VMCQs are comparable to VMCQs created by experts, hence establishing the effectiveness of our approach.
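The paper's similarity metric is its own contribution and is not specified in the abstract; a common stand-in for image semantic similarity is cosine similarity between embeddings from a pretrained CNN. A hypothetical Python sketch in that spirit (requires torchvision 0.13+; the model choice and preprocessing are illustrative, not the authors' metric):

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Stand-in metric: cosine similarity of pretrained ResNet-18 embeddings.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Identity()   # drop the classifier, keep the embedding
    model.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

    def similarity(path_a, path_b):
        with torch.no_grad():
            a = model(preprocess(Image.open(path_a).convert("RGB")).unsqueeze(0))
            b = model(preprocess(Image.open(path_b).convert("RGB")).unsqueeze(0))
        return torch.nn.functional.cosine_similarity(a, b).item()

Under a metric like this, plausible distractors would be images scoring moderately high against the answer image: similar enough to be confusable, but not near-duplicates.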
【Keywords】:
【Paper Link】 【Pages】:9721-9728
【Authors】: Abhijit Suresh ; Tamara Sumner ; Jennifer Jacobs ; Bill Foland ; Wayne Ward
【Abstract】: Our work builds on advances in deep learning for natural language processing to automatically analyze transcribed classroom discourse and reliably generate information about teachers’ uses of specific discursive strategies called “talk moves.” Talk moves can be used by both teachers and learners to construct conversations in which students share their thinking, actively consider the ideas of others, and engage in sustained reasoning. Currently, providing teachers with detailed feedback about the talk moves in their lessons requires either highly trained observers who hand-code transcripts of classroom recordings and analyze the talk moves, or one-on-one expert coaching, a time-consuming and expensive process that is unlikely to scale. We created a bidirectional long short-term memory (bi-LSTM) network that can automate the annotation process. We have demonstrated the feasibility of this deep learning approach by reliably identifying a set of teacher talk moves at the sentence level with an F1 measure of 65%.
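A minimal sketch of the model family the paper uses: a bi-LSTM classifying each sentence into a talk-move category. This is illustrative only; the vocabulary size, hidden sizes, number of talk-move classes, and pooling choice are assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class TalkMoveClassifier(nn.Module):
        """Sentence-level bi-LSTM tagger (illustrative hyperparameters)."""
        def __init__(self, vocab_size, embed_dim=100, hidden=128, n_moves=7):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_moves)

        def forward(self, token_ids):                 # (batch, seq_len)
            h, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, 2*hidden)
            sentence = h.mean(dim=1)                  # average-pool over tokens
            return self.out(sentence)                 # logits over talk moves

    # Dummy forward pass: two sentences of twenty token ids each.
    logits = TalkMoveClassifier(vocab_size=5000)(torch.randint(1, 5000, (2, 20)))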
【Keywords】:
【Paper Link】 【Pages】:9729-9736
【Authors】: Randi Williams ; Hae Won Park ; Lauren Oh ; Cynthia Breazeal
【Abstract】: PopBots is a hands-on toolkit and curriculum designed to help young children learn about artificial intelligence (AI) by building, programming, training, and interacting with a social robot. Today’s children encounter AI in the forms of smart toys and computationally curated educational and entertainment content. However, children have not yet been empowered to understand or create with this technology. Existing computational thinking platforms have made ideas like sequencing and conditionals accessible to young learners. Going beyond this, we seek to make AI concepts accessible. We designed PopBots to address the specific learning needs of children ages four to seven by adapting constructionist ideas into an AI curriculum. This paper describes how we designed the curriculum and evaluated its effectiveness with 80 Pre-K and Kindergarten children. We found that the use of a social robot as a learning companion and programmable artifact was effective in helping young children grasp AI concepts. We also identified teaching approaches that had the greatest impact on students’ learning. Based on these findings, we make recommendations for future modules and iterations of the PopBots platform.
【Keywords】:
【Paper Link】 【Pages】:9737-9744
【Authors】: Yuanlin Zhang ; Jianlan Wang ; Fox Bolduc ; William G. Murray ; Wendy Staffen
【Abstract】: This paper presents a framework to integrate Science and Computing teaching using Logic Programming. We developed two modules: one for chemistry and the other for chemistry and physics. They are implemented in an elective course for 8th graders. Through clinical interviews, videotaped class observations, exit interviews, and our own experiences with the class, we found that the Logic Programming based approach is accessible to the students.
【Keywords】:
【Paper Link】 【Pages】:9746-9747
【Authors】: Andrea Danyluk ; Scott Buck
【Abstract】: In August 2017, the ACM Education Council initiated a task force to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force is seeking to define what the computing contributions are to this new field, in order to provide guidance for computer science or similar departments offering data science programs of study at the undergraduate level. The ACM Data Science Task Force has completed the initial draft of a curricular report. The computing-knowledge areas identified in the report are drawn from across computing disciplines and include several sub-areas of AI. This short paper describes the overall project, highlights AI-relevant areas, and seeks to open a dialog about the AI competencies that are to be considered central to a data science undergraduate curriculum.
【Keywords】:
【Paper Link】 【Pages】:9748-9749
【Authors】: Richard Hoshino ; Maximilian Kahn
【Abstract】: In this paper, we analyze Birds of a Feather (BoaF), a solitaire game played with 16 cards. While the large majority of deals are solvable, the unsolvable deals share certain characteristics that can be determined from the adjacency matrix of the corresponding “compatibility graph”. We create a binary decision tree based on just three variables to predict whether a given deal is solvable. Our predictive model, tested on 30,000 random deals, correctly classifies over 99.9% of our data.
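One simple graph-theoretic test of this kind: if the compatibility graph is disconnected, the deal cannot be reduced to a single stack. A Python sketch, assuming the standard BoaF compatibility rule (two cards are compatible when they share a suit or their ranks are equal or adjacent, aces low, no wraparound) and ignoring the same-row/column movement restriction, so this is a necessary but not sufficient condition for solvability:

    RANKS = "A23456789TJQK"

    def compatible(card_a, card_b):
        """Assumed BoaF rule: same suit, or ranks equal or adjacent.
        Cards are (rank, suit) tuples, e.g. ("A", "S")."""
        (ra, sa), (rb, sb) = card_a, card_b
        return sa == sb or abs(RANKS.index(ra) - RANKS.index(rb)) <= 1

    def is_connected(cards):
        """Deals with a disconnected compatibility graph are unsolvable."""
        adj = {c: {d for d in cards if d != c and compatible(c, d)}
               for c in cards}
        seen, stack = {cards[0]}, [cards[0]]
        while stack:                      # depth-first traversal
            for nxt in adj[stack.pop()] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return len(seen) == len(cards)

    deal = [("A", "S"), ("2", "S"), ("7", "H"), ("8", "H")]
    print(is_connected(deal))   # False: nothing bridges the two pairs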
【Keywords】:
【Paper Link】 【Pages】:9751-9753
【Authors】: Todd W. Neller ; Raja Sooriamurthi ; Michael Guerzhoy ; Lisa Zhang ; Paul Talaga ; Christopher Archibald ; Adam Summerville ; Joseph C. Osborn ; Cinjon Resnick ; Avital Oliver ; Surya Bhupatiraju ; Kumar Krishna Agrawal ; Nate Derbinsky ; Elena Strange ; Marion Neumann ; Jonathan Chen ; Zac Christensen ; Michael Wollowski ; Oscar Youngquist
【Abstract】: The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of ten AI assignments from the 2019 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs. Assignment specifications and supporting resources may be found at http://modelai.gettysburg.edu.
【Keywords】:
【Paper Link】 【Pages】:9755-9759
【Authors】: Vincent Conitzer
【Abstract】: Research in artificial intelligence, as well as in economics and other related fields, generally proceeds from the premise that each agent has a well-defined identity, well-defined preferences over outcomes, and well-defined beliefs about the world. However, as we design AI systems, we in fact need to specify where the boundaries between one agent and another in the system lie, what objective functions these agents aim to maximize, and to some extent even what belief formation processes they use. The premise of this paper is that as AI is being broadly deployed in the world, we need well-founded theories of, and methodologies and algorithms for, how to design preferences, identities, and beliefs. This paper lays out an approach to address these problems from a rigorous foundation in decision theory, game theory, social choice theory, and the algorithmic and computational aspects of these fields.
【Keywords】:
【Paper Link】 【Pages】:9760-9764
【Authors】: Ian Davidson ; Peter B. Walker
【Abstract】: Most applications of machine intelligence have focused on demonstrating crystallized intelligence. Crystallized intelligence relies on accessing problem-specific knowledge, skills and experience stored in long term memory. In this paper, we challenge the AI community to design AIs that can take complete tests of fluid intelligence, which assess the ability to solve novel problems using problem-independent solving skills. Tests of fluid intelligence such as the NNAT are used extensively by schools to determine entry into gifted education programs. We explain the differences between crystallized and fluid intelligence and the importance and capabilities of machines demonstrating fluid intelligence, and we pose several challenges to the AI community, including building a machine whose score on such a test would qualify it as gifted by school districts in the state of California. Importantly, we show that existing work in seemingly related fields such as transfer, zero-shot, life-long and meta learning (in their current form) is not directly capable of demonstrating fluid intelligence but instead provides task-transductive mechanisms.
【Keywords】:
【Paper Link】 【Pages】:9765-9769
【Authors】: Eugene C. Freuder
【Abstract】: As AI becomes more ubiquitous there is increasing interest in computers being able to provide explanations for their conclusions. This paper proposes exploring the relationship between the structure of a problem and its explanation. The nature of this challenge is introduced through a series of simple constraint satisfaction problems.
【Keywords】:
【Paper Link】 【Pages】:9770-9774
【Authors】: David Harel ; Assaf Marron ; Ariel Rosenfeld ; Moshe Y. Vardi ; Gera Weiss
【Abstract】: Artificial intelligence (AI) techniques, including, e.g., machine learning, multi-agent collaboration, planning, and heuristic search, are emerging as ever-stronger tools for solving hard problems in real-world applications. Executable specification techniques (ES), including, e.g., Statecharts and scenario-based programming, are a promising development approach, offering intuitiveness, ease of enhancement, compositionality, and amenability to formal analysis. We propose an approach for integrating AI and ES techniques in developing complex intelligent systems, which can greatly simplify agile/spiral development and maintenance processes. The approach calls for automated detection of whether certain goals and sub-goals are met; a clear division between sub-goals solved with AI and those solved with ES; compositional and incremental addition of AI-based or ES-based components, each focusing on a particular gap between a current capability and a well-stated goal; and iterative refinement of sub-goals solved with AI into smaller sub-sub-goals where some are solved with ES, and some with AI. We describe the principles of the approach and its advantages, as well as key challenges and suggestions for how to tackle them.
【Keywords】:
【Paper Link】 【Pages】:9775-9779
【Authors】: Pat Langley
【Abstract】: In this paper, we pose a new challenge for AI researchers – to develop intelligent systems that support justified agency. We illustrate this ability with examples and relate it to two more basic topics that are receiving increased attention – agents that explain their decisions and ones that follow societal norms. In each case, we describe the target abilities, consider design alternatives, note some open questions, and review prior research. After this, we return to justified agency, offering a hypothesis about its relation to explanatory and normative behavior. We conclude by proposing testbeds and experiments to evaluate this empirical claim and encouraging other researchers to contribute to this crucial area.
【Keywords】:
【Paper Link】 【Pages】:9780-9784
【Authors】: Dino Pedreschi ; Fosca Giannotti ; Riccardo Guidotti ; Anna Monreale ; Salvatore Ruggieri ; Franco Turini
【Abstract】: Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box.
【Keywords】:
【Paper Link】 【Pages】:9785-9789
【Authors】: Francesca Rossi ; Nicholas Mattei
【Abstract】: The more AI agents are deployed in scenarios with possibly unexpected situations, the more they need to be flexible, adaptive, and creative in achieving the goal we have given them. Thus, a certain level of freedom to choose the best path to the goal is inherent in making AI robust and flexible enough. At the same time, however, the pervasive deployment of AI in our life, whether AI is autonomous or collaborating with humans, raises several ethical challenges. AI agents should be aware of and follow appropriate ethical principles and should thus exhibit properties such as fairness or other virtues. These ethical principles should define the boundaries of AI’s freedom and creativity. However, it is still a challenge to understand how to specify and reason with ethical boundaries in AI agents and how to combine them appropriately with subjective preferences and goal specifications. Some initial attempts employ either a data-driven example-based approach for both, or a symbolic rule-based approach for both. We envision a modular approach where any AI technique can be used for any of these essential ingredients in decision making or decision support systems, paired with a contextual approach to define their combination and relative weight. In a world where neither humans nor AI systems work in isolation, but are tightly interconnected, e.g., the Internet of Things, we also envision a compositional approach to building ethically bounded AI, where the ethical properties of each component can be fruitfully exploited to derive those of the overall system. In this paper we define and motivate the notion of ethically-bounded AI, we describe two concrete examples, and we outline some outstanding challenges.
【Keywords】:
【Paper Link】 【Pages】:9790-9794
【Authors】: Barry Smyth
【Abstract】: We propose endurance sports as a rich and novel domain for recommender systems and machine learning research. As sports like marathon running, triathlons, and mountain biking become more and more popular among recreational athletes, there exists a growing opportunity to develop solutions to a number of interesting prediction, classification, and recommendation challenges, to better support the complex training and competition needs of athletes. Such solutions have the potential to improve the health and well-being of large populations of users, by promoting and optimising exercise as part of a productive and healthy lifestyle.
【Keywords】:
【Paper Link】 【Pages】:9795-9799
【Authors】: David S. Touretzky ; Christina Gardner-McCune ; Fred Martin ; Deborah W. Seehorn
【Abstract】: The ubiquity of AI in society means the time is ripe to consider what educated 21st century digital citizens should know about this subject. In May 2018, the Association for the Advancement of Artificial Intelligence (AAAI) and the Computer Science Teachers Association (CSTA) formed a joint working group to develop national guidelines for teaching AI to K-12 students. Inspired by CSTA's national standards for K-12 computing education, the AI for K-12 guidelines will define what students in each grade band should know about artificial intelligence, machine learning, and robotics. The AI for K-12 working group is also creating an online resource directory where teachers can find AI-related videos, demos, software, and activity descriptions they can incorporate into their lesson plans. This blue sky talk invites the AI research community to reflect on the big ideas in AI that every K-12 student should know, and how we should communicate with the public about advances in AI and their future impact on society. It is a call to action for more AI researchers to become AI educators, creating resources that help teachers and students understand our work.
【Keywords】:
【Paper Link】 【Pages】:9801-9807
【Authors】: Terrance E. Boult ; Steve Cruz ; Akshay Raj Dhamija ; Manuel Günther ; James Henrydoss ; Walter J. Scheirer
【Abstract】: As science attempts to close the gap between man and machine by building systems capable of learning, we must embrace the importance of the unknown. The ability to differentiate between known and unknown can be considered a critical element of any intelligent self-learning system. The ability to reject uncertain inputs has a very long history in machine learning, as does including a background or garbage class to account for inputs that are not of interest. This paper explains why neither of these is genuinely sufficient for handling unknown inputs – uncertain is not unknown, and unknowns need not appear to be uncertain to a learning system. The past decade has seen the formalization and development of many open set algorithms, which provably bound the risk from unknown classes. We summarize the state of the art, core ideas, and results and explain why, despite the efforts to date, the current techniques are genuinely insufficient for handling unknown inputs, especially for deep networks.
【Keywords】:
【Paper Link】 【Pages】:9808-9814
【Authors】: Peter Flach
【Abstract】: This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference.
【Keywords】:
【Paper Link】 【Pages】:9815-9822
【Authors】: Hui Lin ; Vincent Ng
【Abstract】: The focus of automatic text summarization research has exhibited a gradual shift from extractive methods to abstractive methods in recent years, owing in part to advances in neural methods. Originally developed for machine translation, neural methods provide a viable framework for obtaining an abstract representation of the meaning of an input text and generating informative, fluent, and human-like summaries. This paper surveys existing approaches to abstractive summarization, focusing on the recently developed neural approaches.
【Keywords】:
【Paper Link】 【Pages】:9823-9829
【Authors】: Héctor Muñoz-Avila ; Dustin Dannenhauer ; Noah Reifsnyder
【Abstract】: In part motivated by topics such as agency safety, there is an increasing interest in goal reasoning, a form of agency where the agents formulate their own goals. One of the crucial aspects of goal reasoning agents is their ability to detect if the execution of their courses of actions meet their own expectations. We present a taxonomy of different forms of expectations as used by goal reasoning agents when monitoring their own execution. We summarize and contrast the current understanding of how to define and check expectations based on different knowledge sources used. We also identify gaps in our understanding of expectations.
【Keywords】:
【Paper Link】 【Pages】:9830-9836
【Authors】: Jörg Rothe
【Abstract】: Borda Count is one of the earliest and most important voting rules. Going far beyond voting, we summarize recent advances related to Borda in computational social choice and, more generally, in collective decision making. We first present a variety of well known attacks modeling strategic behavior in voting—including manipulation, control, and bribery—and discuss how resistant Borda is to them in terms of computational complexity. We then describe how Borda can be used to maximize social welfare when indivisible goods are to be allocated to agents with ordinal preferences. Finally, we illustrate the use of Borda in forming coalitions of players in a certain type of hedonic game. All these approaches are central to applications in artificial intelligence.
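Since the Borda rule itself is fixed, a minimal reference sketch in Python (the helper name is mine): with m candidates, a candidate in position i on a ballot (best first, 0-indexed) earns m - 1 - i points, and the highest total wins.

    from collections import defaultdict

    def borda(ballots):
        """Borda count over complete ranked ballots."""
        scores = defaultdict(int)
        for ballot in ballots:
            m = len(ballot)
            for position, candidate in enumerate(ballot):
                scores[candidate] += m - 1 - position   # m-1 for first place
        return max(scores, key=scores.get), dict(scores)

    ballots = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"]]
    print(borda(ballots))   # ('a', {'a': 4, 'b': 3, 'c': 2})

Manipulation, control, and bribery attacks then amount to asking whether some strategic change to the ballots, voters, or candidates can alter this outcome, which is where the complexity questions the paper surveys arise.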
【Keywords】:
【Paper Link】 【Pages】:9837-9843
【Authors】: Victor S. Sheng ; Jing Zhang
【Abstract】: With crowdsourcing systems, labels can be obtained at low cost, which facilitates the creation of training sets for prediction model learning. However, the labels obtained from crowdsourcing are often imperfect, which brings great challenges to model learning. Since 2008, the machine learning community has noticed the great opportunities brought by crowdsourcing and has developed a large number of techniques to deal with inaccuracy, randomness, and uncertainty issues when learning with crowdsourcing. This paper summarizes the technical progress in this field during the past eleven years. We focus on two fundamental issues: the data (label) quality and the prediction model quality. For data quality, we summarize ground truth inference methods and some machine learning based methods to further improve data quality. For prediction model quality, we summarize several learning paradigms developed under the crowdsourcing scenario. Finally, we discuss several promising future research directions to attract researchers to make contributions in crowdsourcing.
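Among the ground truth inference methods such a survey covers, the simplest baseline is majority voting; EM-style methods in this literature, such as Dawid-Skene, refine it by jointly estimating worker reliabilities. A minimal Python sketch of the baseline:

    from collections import Counter

    def majority_vote(labels_per_item):
        """Infer a ground-truth label per item as the most common
        crowd label (ties go to the label encountered first)."""
        return {item: Counter(labels).most_common(1)[0][0]
                for item, labels in labels_per_item.items()}

    crowd = {"img1": ["cat", "cat", "dog"], "img2": ["dog", "dog", "dog"]}
    print(majority_vote(crowd))   # {'img1': 'cat', 'img2': 'dog'}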
【Keywords】:
【Paper Link】 【Pages】:9845-9846
【Authors】: David Allen ; Rahul R. Divekar ; Jaimie Drozdal ; Lilit Balagyozyan ; Shuyue Zheng ; Ziyi Song ; Huang Zou ; Jeramey Tyler ; Xiangyang Mou ; Rui Zhao ; Helen Zhou ; Jianling Yue ; Jeffrey O. Kephart ; Hui Su
【Abstract】: The Rensselaer Mandarin Project enables a group of foreign language students to improve functional understanding, pronunciation and vocabulary in Mandarin Chinese through authentic speaking situations in a virtual visit to China. Students use speech, gestures, and combinations thereof to navigate an immersive, mixed reality, stylized realism game experience through interaction with AI agents, immersive technologies, and game mechanics. The environment was developed in a black box theater equipped with a human-scale 360° panoramic screen (140h, 200r), arrays of markerless motion tracking sensors, and speakers for spatial audio.
【Keywords】:
【Paper Link】 【Pages】:9847-9848
【Authors】: Alberto Barrón-Cedeño ; Giovanni Da San Martino ; Israa Jaradat ; Preslav Nakov
【Abstract】: We present proppy, the first publicly available real-world, real-time propaganda detection system for online news, which aims at raising awareness, thus potentially limiting the impact of propaganda and helping fight disinformation. The system constantly monitors a number of news sources, deduplicates and clusters the news into events, and organizes the articles about an event on the basis of the likelihood that they contain propagandistic content. The system is trained on known propaganda sources using a variety of stylistic features. The evaluation results on a standard dataset show state-of-the-art results for propaganda detection.
【Keywords】:
【Paper Link】 【Pages】:9849-9850
【Authors】: Tathagata Chakraborti ; Christian Muise ; Shubham Agarwal ; Luis A. Lastras
【Abstract】: The state of the art in automated conversational agents for enterprise (e.g. for customer support) requires a lengthy design process with experts in the loop who have to figure out and specify complex conversation patterns. This demonstration looks at a prototype interface that aims to bring down both the expertise required to design such agents and the time taken to do so. Specifically, we will focus on how a meta-writer can assist the domain-writer during the design process and how complex conversation patterns can be derived from simplifying abstractions at the interface level.
【Keywords】:
【Paper Link】 【Pages】:9851-9852
【Authors】: Fahim Dalvi ; Avery Nortonsmith ; Anthony Bau ; Yonatan Belinkov ; Hassan Sajjad ; Nadir Durrani ; James R. Glass
【Abstract】: We present a toolkit to facilitate the interpretation and understanding of neural network models. The toolkit provides several methods to identify salient neurons with respect to the model itself or an external task. A user can visualize selected neurons, ablate them to measure their effect on the model accuracy, and manipulate them to control the behavior of the model at test time. Such an analysis has the potential to serve as a springboard in various research directions, such as understanding the model, better architectural choices, model distillation, and controlling data biases. The toolkit is available for download.
【Keywords】:
【Paper Link】 【Pages】:9853-9854
【Authors】: Chi-Han Du ; Yi-Shyuan Chiang ; Kun-Che Tsai ; Liang-Chih Liu ; Ming-Feng Tsai ; Chuan-Ju Wang
【Abstract】: We present FRIDAYS, a financial risk information detecting and analyzing system that enables financial professionals to efficiently comprehend financial reports in terms of risk and domain-specific sentiment cues. Our system is designed to integrate multiple NLP models trained on financial reports but on different levels (i.e., word, multi-word, and sentence levels) and to illustrate the prediction results generated by the models. The system is available online at https://cfda.csie.org/FRIDAYS/.
【Keywords】:
【Paper Link】 【Pages】:9855-9856
【Authors】: Yining Hong ; Jialu Wang ; Yuting Jia ; Weinan Zhang ; Xinbing Wang
【Abstract】: We present Academic Reader, a system which can read academic literature and answer relevant questions for researchers. Academic Reader leverages machine reading comprehension techniques, which have been successfully applied in many fields but have not previously been applied to academic literature reading. An interactive platform is established to demonstrate the functions of Academic Reader. Pieces of academic literature and relevant questions are input to our system, which then outputs answers. The system can also gather users’ revised answers and perform active learning to continuously improve its performance. A case study is provided presenting the performance of our system on all papers accepted at KDD 2018, which demonstrates how our system facilitates massive academic literature reading.
【Keywords】:
【Paper Link】 【Pages】:9857-9858
【Authors】: Zhuoxuan Jiang ; Jie Ma ; Jingyi Lu ; GuangYuan Yu ; Yipeng Yu ; Shaochun Li
【Abstract】: We propose a general framework for a goal-driven conversation assistant based on Planning methods. It aims to rapidly build a dialogue agent with less handcrafting and to make dialogue management more interpretable and efficient in various scenarios. By employing the Planning method, dialogue actions can be defined efficiently and reused, and transitions of the dialogue are managed by a Planner. The proposed framework consists of a pipeline of Natural Language Understanding (intent labeler), Planning of Actions (with a World Model), and Natural Language Generation (learned by an attention-based neural network). We demonstrate our approach by creating conversational agents for several independent domains.
【Keywords】:
【Paper Link】 【Pages】:9859-9860
【Authors】: Yuta Kobayashi ; Hiroyuki Shindo ; Yuji Matsumoto
【Abstract】: We present a browser-based scientific article search system with graphical visualization. This system is based on triples of distributed representations of articles, each triple representing a scientific discourse facet (Objective, Method, or Result) using both text and citation information. Because each facet of an article is encoded as a separate vector, the similarity between articles can be measured by considering the articles not only in their entirety but also on a facet-by-facet basis. Our system provides three search options: a similarity ranking search, a citation graph with facet-labeled edges, and a scatter plot visualization with facets as the axes.
【Keywords】:
【Paper Link】 【Pages】:9861-9862
【Authors】: Gyeongbok Lee ; Sungdong Kim ; Seung-won Hwang
【Abstract】: Question answering (QA), which extracts answers from text for a question posed in natural language, has been actively studied, and existing models have shown promise of outperforming human performance when trained and evaluated on the SQuAD dataset. However, such performance may not be replicated in practical settings, and diagnosing the cause is non-trivial due to the complexity of the models. We thus propose a web-based UI that shows how each model contributes to QA performance, by integrating visualization and analysis tools for model explanation. We expect this framework can help QA model researchers refine and improve their models.
【Keywords】:
【Paper Link】 【Pages】:9863-9864
【Authors】: Daniel Leidner ; Peter Schmaus ; Florian Schmidt ; Benedikt Pleintinger ; Ralph Bayer ; Adrian S. Bauer ; Thomas Krüger ; Neal Y. Lii
【Abstract】: Intelligent robotic coworkers are considered a valuable addition in many application areas. This applies not only to terrestrial domains, but also to the exploration of our solar system. As humankind moves toward an ever increasing presence in space, infrastructure has to be constructed and maintained on distant planets such as Mars. AI-enabled robots will play a major role in this scenario. The space agencies envisage robotic co-workers being deployed to set up habitats, energy systems, and return vessels for future human scientists. By leveraging AI planning methods, this vision has already come one step closer to reality. In the METERON SUPVIS Justin experiment, the intelligent robotic coworker Rollin’ Justin was controlled by astronauts aboard the International Space Station (ISS) in order to maintain a Martian mock-up solar panel farm located on Earth, demonstrating the technology readiness of the developed methods. For this work, the system is demonstrated at AAAI 2019 by controlling Rollin’ Justin, located in Munich, Germany, from Honolulu, Hawaii.
【Keywords】:
【Paper Link】 【Pages】:9865-9866
【Authors】: Simone Mellace ; Jérôme Guzzi ; Alessandro Giusti ; Luca Maria Gambardella
【Abstract】: We showcase a model to generate a soundscape from a camera stream in real time. The approach relies on a training video with an associated meaningful audio track; a granular synthesizer generates a novel sound by randomly sampling and mixing audio data from such video, favoring timestamps whose frame is similar to the current camera frame; the semantic similarity between frames is computed by a pretrained neural network. The demo is interactive: a user points a mobile phone to different objects and hears how the generated sound changes.
【Keywords】:
【Paper Link】 【Pages】:9867-9868
【Authors】: Mirko Nava ; Jérôme Guzzi ; R. Omar Chavez-Garcia ; Luca Maria Gambardella ; Alessandro Giusti
【Abstract】: We demonstrate a self-supervised approach which learns to detect long-range obstacles from video: it automatically obtains training labels by associating the camera frames acquired at a given pose to short-range sensor readings acquired at a different pose.
【Keywords】:
【Paper Link】 【Pages】:9869-9870
【Authors】: Sadegh Nobari
【Abstract】: We introduce the Dynamic Bandit Algorithm (DBA), a practical solution that addresses a shortcoming of the pervasively employed reinforcement learning algorithm called the Multi-Armed Bandit, aka Bandit. Bandit makes real-time decisions based on prior observations. However, Bandit is so heavily biased toward its priors that it cannot quickly adapt to a changing trend, and it therefore cannot make profitable decisions quickly enough when the trend shifts. Unlike Bandit, DBA focuses on quickly adapting to detect these trends early enough. Furthermore, DBA remains almost as light as Bandit in terms of computation, so it can be easily deployed in production as a light process, similar to Bandit. We demonstrate how critical and beneficial the main focus of DBA, i.e., the ability to quickly find the most profitable option in real time, is over its state-of-the-art competitors. Our experiments are augmented with a visualization mechanism that explains, with animations, the profitability of the decisions made by each algorithm at each step. Finally, we observe that DBA can outperform the original Bandit by close to 3 times on a set Key Performance Indicator (KPI) in a three-arm setting.
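The abstract leaves DBA's mechanism unspecified, so the following is not DBA: it is a generic illustration of one standard way to make a bandit track a changing trend, by estimating each arm's value over a sliding window of recent rewards instead of the full history. A Python sketch:

    import random
    from collections import deque

    class SlidingWindowBandit:
        """Epsilon-greedy bandit over a sliding window of recent rewards
        (a generic non-stationary baseline, not the DBA algorithm)."""
        def __init__(self, n_arms, window=100, epsilon=0.1):
            self.rewards = [deque(maxlen=window) for _ in range(n_arms)]
            self.epsilon = epsilon

        def select(self):
            if random.random() < self.epsilon:
                return random.randrange(len(self.rewards))
            means = [sum(r) / len(r) if r else float("inf")  # try unseen arms first
                     for r in self.rewards]
            return max(range(len(means)), key=means.__getitem__)

        def update(self, arm, reward):
            self.rewards[arm].append(reward)

Because old rewards fall out of the window, an arm that was profitable under an earlier trend loses its advantage within roughly one window of pulls once the trend shifts.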
【Keywords】:
【Paper Link】 【Pages】:9871-9872
【Authors】: Daniel Rotman ; Dror Porat ; Yevgeny Burshtein ; Udi Barzelay
【Abstract】: With the increasing popularity of video content, automatic video understanding is becoming more and more important for streamlining video content consumption and reuse. In this work, we present TVAN—temporal video analyzer—a system for temporal video analysis aimed at enabling efficient and robust video description and search. Its main components include: temporal video segmentation, compact scene representation for efficient visual recognition, and concise scene description generation. We provide a technical overview of the system, as well as demonstrate its usefulness for the task of video search and navigation.
【Keywords】:
【Paper Link】 【Pages】:9873-9874
【Authors】: Yu Zhang ; Morteza Saberi ; Min Wang ; Elizabeth Chang
【Abstract】: As the volume of scientific papers grows rapidly in size, knowledge management for scientific publications is greatly needed. Information extraction and knowledge fusion techniques have been proposed to obtain information from scholarly publications and build knowledge repositories. However, retrieving the knowledge of problem/solution from academic papers to support users on solving specific research problems is rarely seen in the state of the art. Therefore, to remedy this gap, a knowledge-driven solution support system (K3S) is proposed in this paper to extract the information of research problems and proposed solutions from academic papers, and integrate them into knowledge maps. With the bibliometric information of the papers, K3S is capable of providing recommended solutions for any extracted problems. The subject of intrusion detection is chosen for demonstration in which required information is extracted with high accuracy, a knowledge map is constructed properly, and solutions to address intrusion problems are recommended.
【Keywords】:
【Paper Link】 【Pages】:9876-9877
【Authors】: David Abel
【Abstract】: Reinforcement learning presents a challenging problem: agents must generalize experiences, efficiently explore the world, and learn from feedback that is delayed and often sparse, all while making use of a limited computational budget. Abstraction is essential to all of these endeavors. Through abstraction, agents can form concise models of both their surroundings and behavior, supporting effective decision making in diverse and complex environments. To this end, the goal of my doctoral research is to characterize the role abstraction plays in reinforcement learning, with a focus on state abstraction. I offer three desiderata articulating what it means for a state abstraction to be useful, and introduce classes of state abstractions that provide a partial path toward satisfying these desiderata. Collectively, I develop theory for state abstractions that can 1) preserve near-optimal behavior, 2) be learned and computed efficiently, and 3) can lower the time or data needed to make effective decisions. I close by discussing extensions of these results to an information theoretic paradigm of abstraction, and an extension to hierarchical abstraction that enjoys the same desirable properties.
【Keywords】:
【Paper Link】 【Pages】:9878-9879
【Authors】: Nikhil Bhargava
【Abstract】: Multi-agent coordination is not a simple problem. While significant research has gone into computing plans efficiently and managing competing preferences, the execution of multi-agent plans can still fail even when the plan space is small and agent goals are universally aligned. The reason for this difficulty is that in order to guarantee successful execution of a plan, effective multi-agent coordination requires communication to ensure that all actors have accurate beliefs about the state of the world. My thesis will focus on the problem of characterizing, modeling, and providing efficient algorithms for planning and execution when agents cannot maintain perfect communication.
【Keywords】:
【Paper Link】 【Pages】:9880-9881
【Authors】: Christopher K. Fourie
【Abstract】: This thesis work intends to explore the development of a shared mental model between an autonomous agent and a human, where we aim to promote fluency in continuing interactions defined by repetitive tasks. That is, with repetitive actions, experimentation, and increasing iterations, we wish the robot to learn how its own behavior affects that of its partner. To accomplish this, we propose a model that encodes both human and robot actions in a probabilistic space describing the temporal transition points between activities. The purpose of such a model lies not only in passive predictive power (understanding the future actions of an associate), but also in encoding the latent effect of a robot’s action on the future actions of the associate.
【Keywords】:
【Paper Link】 【Pages】:9882-9883
【Authors】: Rick Goldstein
【Abstract】: Traffic congestion is a widespread annoyance throughout global metropolitan areas. It causes increased travel time, increased emissions, inefficient usage of gasoline, and driver frustration. Inefficient signal patterns at traffic lights are one major cause of such congestion. Intersection scheduling strategies that make real-time decisions to extend or end a green signal based on real-time traffic data offer one opportunity to reduce congestion and its negative impacts. My research proposes Expressive Real-time Intersection Scheduling (ERIS). ERIS is a decentralized, schedule-driven control method that makes a decision every second, based on current traffic conditions, to reduce congestion.
【Keywords】:
【Paper Link】 【Pages】:9884-9885
【Authors】: Ana Valeria González-Garduño
【Abstract】: In this thesis, I focus on language independent methods of improving utterance understanding and response generation and attempt to tackle some of the issues surrounding current systems. The aim is to create a unified approach to dialogue generation inspired by developments in both goal oriented and open ended dialogue systems. The main contributions in this thesis are: 1) Introducing hybrid approaches to dialogue generation using retrieval and encoder-decoder architectures to produce fluent but precise utterances in dialogues, 2) Proposing supervised, semi-supervised and Reinforcement Learning methods for domain adaptation in goal oriented dialogue and 3) Introducing models that can adapt cross lingually.
【Keywords】:
【Paper Link】 【Pages】:9886-9887
【Authors】: Negar Hassanpour
【Abstract】: To identify the appropriate action to take, an intelligent agent must infer the causal effects of every possible action choice. A prominent example is precision medicine, which attempts to identify which medical procedure will benefit each individual patient the most. This requires answering counterfactual questions such as: "Would this patient have lived longer, had she received an alternative treatment?". In my PhD, I attempt to explore ways to address the challenges associated with causal effect estimation, with a focus on devising methods that enhance performance according to individual-based measures (as opposed to population-based measures).
【Keywords】:
【Paper Link】 【Pages】:9888-9889
【Authors】: Emmanuel Johnson
【Abstract】: Negotiation is an integral part of our daily lives, regardless of occupation. Although it is ubiquitous in our experience, we are never taught to negotiate. This lack of training has many consequences, from unfair salary negotiations to geopolitical ramifications. The ability to resolve conflicts and negotiate is becoming more critical due to the rise of automated systems that look to replace various repetitive-task jobs. In hopes of improving human negotiation skills, my work seeks to develop automated negotiation agents capable of providing personalized feedback. In this paper, I provide an overview of my past, current, and future work.
【Keywords】:
【Paper Link】 【Pages】:9890-9891
【Authors】: Khimya Khetarpal
【Abstract】: Learning temporal abstractions that are partial solutions to a task and can be reused for other similar or even more complicated tasks is intuitively an ingredient that can help agents plan, learn, and reason efficiently at multiple resolutions of perception and time. Just as humans acquire skills and build on top of existing skills to solve more complicated tasks, AI agents should be able to learn and develop skills continually, hierarchically, and incrementally over time. In my research, I aim to answer the following question: How should an agent efficiently represent, learn, and use knowledge of the world in continual tasks? My work builds on the options framework, but provides novel extensions driven by this question. We introduce the notion of interest functions. Analogous to temporally extended actions, we propose learning temporally extended perception. The key idea is to learn temporal abstractions unifying both action and perception.
【Keywords】:
【Paper Link】 【Pages】:9892-9893
【Authors】: Neeti Pokhriyal
【Abstract】: Many data analytics problems involve data coming from multiple sources, sensors, modalities, or feature spaces that describe the object of interest in a unique way and typically exhibit heterogeneous properties. The varied data sources are termed views, and the task of learning from such multi-view data is known as multi-view learning. In my thesis, I target the problem of poverty prediction and mapping from multi-source data. Currently, poverty is estimated through intensive household surveys, which are costly and time consuming. The need is to timely and accurately predict poverty and map it to spatially fine-grained baseline data. The primary aim of my thesis is to develop novel multi-view algorithms that combine disparate data sources for poverty mapping. Another aim of my work is to relax the core assumptions made by existing multi-view learning algorithms and to produce factorized subspaces.
【Keywords】:
【Paper Link】 【Pages】:9894-9895
【Authors】: Sathya N. Ravi
【Abstract】: The impact of numerical optimization on modern data analysis has been quite significant. Today, these methods lie at the heart of most statistical machine learning applications in domains spanning genomics, finance and medicine. The expanding scope of these applications (and the complexity of the associated data) has continued to raise the expectations of various criteria associated with the underlying algorithms. Broadly speaking, my research work can be classified into two AI categories: Optimization in ML (Opt-ML) and Optimization in CV (Opt-CV).
【Keywords】:
【Paper Link】 【Pages】:9896-9897
【Authors】: Sandhya Saisubramanian
【Abstract】: This thesis aims to provide a foundation for risk-aware decision making. Decision making under uncertainty is a core capability of an autonomous agent, and a cornerstone of long-term autonomy and safety is risk-aware decision making. A risk-aware model fully accounts for a known set of risks in the environment with respect to the problem under consideration, and the process of decision making using such a model is risk-aware decision making. Formulating risk-aware models is critical for robust reasoning under uncertainty, since the impact of using less accurate models may be catastrophic in extreme cases due to an overly optimistic view of the problem. I propose adaptive modeling, a framework that helps balance the trade-off between model simplicity and risk awareness, for different notions of risk, while remaining computationally tractable.
【Keywords】:
【Paper Link】 【Pages】:9898-9899
【Authors】: Atena M. Tabakhi
【Abstract】: The key assumption in Weighted Constraint Satisfaction Problems (WCSPs) is that all constraints are specified a priori. This assumption does not hold in some applications that involve user preferences. Incomplete WCSPs (IWCSPs) extend WCSPs by allowing some constraints to be partially specified. Unfortunately, existing IWCSP approaches either guarantee to return optimal solutions or provide no quality guarantees on the solutions found. To bridge the two extremes, we propose a number of parameterized heuristics that allow users to find boundedly-suboptimal solutions, where the error bound depends on user-defined parameters. These heuristics thus allow users to trade off solution quality for fewer elicited preferences and faster computation times.
【Keywords】:
【Paper Link】 【Pages】:9900-9901
【Authors】: Faraz Torabi
【Abstract】: Humans and other animals have a natural ability to learn skills from observation, often simply from seeing the effects of these skills: without direct knowledge of the underlying actions being taken. For example, after observing an actor doing a jumping jack, a child can copy it despite not knowing anything about what's going on inside the actor's brain and nervous system. The main focus of this thesis is extending this ability to artificial autonomous agents, an endeavor recently referred to as "imitation learning from observation." Imitation learning from observation is especially relevant today due to the accessibility of many online videos that can be used as demonstrations for robots. Meanwhile, advances in deep learning have enabled us to solve increasingly complex control tasks mapping visual input to motor commands. This thesis contributes algorithms that learn control policies from state-only demonstration trajectories. Two types of algorithms are considered. The first type begins by recovering the missing action information from demonstrations and then leverages existing imitation learning algorithms on the full state-action trajectories. Our preliminary work has shown that learning an inverse dynamics model of the agent in a self-supervised fashion and then inferring the actions performed by the demonstrator enables sufficient action recovery for this purpose. The second type of algorithm uses model-free end-to-end learning. Our preliminary results indicate that iteratively optimizing a policy based on the closeness of the imitator's and expert's state transitions leads to a policy that closely mimics the demonstrator's trajectories.
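The first algorithm family described here recovers the demonstrator's missing actions with a self-supervised inverse dynamics model before applying standard imitation learning. A minimal Python sketch of that idea, with all dimensions and the network architecture assumed for illustration:

    import torch
    import torch.nn as nn

    class InverseDynamics(nn.Module):
        """Predicts the action that caused a (state, next_state) transition."""
        def __init__(self, state_dim, action_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, action_dim))

        def forward(self, s, s_next):
            return self.net(torch.cat([s, s_next], dim=-1))

    model = InverseDynamics(state_dim=4, action_dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Self-supervised training on the agent's own experience (s, a, s').
    s, a, s_next = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 4)
    loss = nn.functional.mse_loss(model(s, s_next), a)
    opt.zero_grad(); loss.backward(); opt.step()

    # Inferred actions for state-only demonstration pairs can then feed
    # any standard imitation learning algorithm (e.g., behavioral cloning).
    demo_actions = model(torch.randn(10, 4), torch.randn(10, 4))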
【Keywords】:
【Paper Link】 【Pages】:9902-9903
【Authors】: Abhinav Verma
【Abstract】: We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks, and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge for guiding the policy search. The interpretability and verifiability of these policies provides the opportunity to deploy RL based solutions in safety critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.
【Keywords】:
【Paper Link】 【Pages】:9904-9905
【Authors】: Christabel Wayllace
【Abstract】: Given an environment and a set of allowed modifications, the task of goal recognition design (GRD) is to select a valid set of modifications that minimizes the maximal number of steps an agent can take before its goal is revealed to an observer. This document presents an extension of GRD to the stochastic domain: Stochastic Goal Recognition Design (S-GRD). The S-GRD framework aims to account for: (1) stochastic agent action outcomes; (2) partial observability of agent states and actions; and (3) suboptimal agents. In this abstract we present the progress made towards the final objective, as well as a projected timeline to completion.
【Keywords】:
【Paper Link】 【Pages】:9906-9907
【Authors】: Ruohan Zhang
【Abstract】: We propose a framework that uses a learned human visual attention model to guide the learning process of an imitation learning or reinforcement learning agent. We have collected high-quality human action and eye-tracking data from people playing Atari games in a carefully controlled experimental setting. We have shown that incorporating a learned human gaze model into deep imitation learning yields promising results.
【Keywords】:
【Paper Link】 【Pages】:9909-9910
【Authors】: Prince M. Abudu
【Abstract】: Applications that require heterogeneous sensor deployments continue to face practical challenges owing to resource constraints within their operating environments (i.e., energy efficiency, computational power, and reliability). This has motivated the need for effective ways of selecting a sensing strategy that maximizes detection accuracy for events of interest using available resources and data-driven approaches. Motivated by these limitations, we ask a fundamental question: can state-of-the-art Recurrent Neural Networks observe different series of data and communicate their hidden states to collectively solve an objective in a distributed fashion? We answer it by conducting a series of systematic analyses of a Communicating Recurrent Neural Network architecture across varying time-steps, objective functions, and numbers of nodes. The experimental setup we employ models tasks synonymous with those in Wireless Sensor Networks. We show that Recurrent Neural Networks can communicate through their hidden states, and we achieve promising results.
【Keywords】:
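【Code Sketch】: A hypothetical sketch of the communicating-RNN idea in the abstract above: two GRU cells, one per node, each conditioning its update on the peer's previous hidden state so the pair can be trained on a shared objective. The architecture details (sizes, the joint head, the two-node topology) are our own illustrative assumptions.
```python
import torch
import torch.nn as nn

class CommRNNPair(nn.Module):
    def __init__(self, in_dim=4, hid=32):
        super().__init__()
        self.hid = hid
        # Each cell reads its own sensor stream plus the peer's hidden state.
        self.cell_a = nn.GRUCell(in_dim + hid, hid)
        self.cell_b = nn.GRUCell(in_dim + hid, hid)
        self.head = nn.Linear(2 * hid, 1)   # joint prediction from both nodes

    def forward(self, xs_a, xs_b):          # each: (T, batch, in_dim)
        B = xs_a.shape[1]
        h_a = xs_a.new_zeros(B, self.hid)
        h_b = xs_b.new_zeros(B, self.hid)
        for t in range(xs_a.shape[0]):
            # Communication step: both updates use the peer's *previous* state
            # (the tuple on the right is fully evaluated before assignment).
            h_a, h_b = (self.cell_a(torch.cat([xs_a[t], h_b], -1), h_a),
                        self.cell_b(torch.cat([xs_b[t], h_a], -1), h_b))
        return self.head(torch.cat([h_a, h_b], -1))

model = CommRNNPair()
out = model(torch.randn(20, 8, 4), torch.randn(20, 8, 4))
print(out.shape)                            # torch.Size([8, 1])
```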
【Paper Link】 【Pages】:9911-9912
【Authors】: Emily Alfs ; Doina Caragea ; Nathan Albin ; Pietro Poggi-Corradini
【Abstract】: The proliferation of Android apps has resulted in many malicious apps entering the market and causing significant damage. Robust techniques that determine whether an app is malicious are greatly needed. We propose a network-based approach to effectively separate malicious from benign apps, based on a small labeled dataset. The apps in our dataset come from the Google Play Store and have been scanned for malicious behavior using VirusTotal to produce a ground-truth dataset with the labels “malicious” or “benign”. The apps in the resulting dataset are represented by binary feature vectors (where the features represent permissions, intent actions, discriminative APIs, obfuscation signatures, and native code signatures). We use these feature vectors to build a weighted network that captures the “closeness” between apps. We propagate labels from the labeled apps to unlabeled apps, and evaluate the effectiveness of the proposed approach using the F1-measure. We conducted experiments comparing three variants of label propagation on datasets that include increasingly larger amounts of labeled data. The results show that a variant proposed in this study gives the best results overall.
【Keywords】:
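【Code Sketch】: An illustrative sketch, not the authors' exact variant: semi-supervised label propagation over app similarities computed from binary feature vectors, with scikit-learn's LabelSpreading standing in for the propagation step and synthetic data standing in for the app corpus.
```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 60)).astype(float)  # apps x binary features
y_true = rng.integers(0, 2, size=500)                  # 1 = malicious, 0 = benign

# Only a small subset is labeled; -1 marks unlabeled apps.
y = np.full(500, -1)
labeled = rng.choice(500, size=50, replace=False)
y[labeled] = y_true[labeled]

# RBF affinities play the role of the weighted "closeness" between apps.
model = LabelSpreading(kernel="rbf", gamma=0.5, max_iter=100)
model.fit(X, y)

unlabeled = y == -1
print("F1 on unlabeled apps:",
      f1_score(y_true[unlabeled], model.transduction_[unlabeled]))
```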
【Paper Link】 【Pages】:9913-9914
【Authors】: Wenjun Bai ; Changqin Quan ; Zhi-Wei Luo
【Abstract】: Learning flexible latent representations of observed data is an important precursor for most downstream AI applications. To this end, we propose a novel form of variational encoder, encapsulated variational encoders (EVE), to exert direct control over encoded latent representations, along with its learning algorithm, an EVE-compatible automatic variational differentiation inference algorithm. Armed with this property, our EVE is capable of learning converged and diverged latent representations. Using CIFAR-10 as an example, we show that learning converged latent representations brings a considerable improvement in the discriminative performance of the semi-supervised EVE. Using MNIST as a demonstration, the generative modelling performance of the EVE-induced variational auto-encoder (EVAE) can be largely enhanced with the help of learned diverged latent representations.
【Keywords】:
【Paper Link】 【Pages】:9915-9916
【Authors】: Diana Benavides Prado
【Abstract】: In our research, we study the problem of learning a sequence of supervised tasks, a long-standing challenge in machine learning. Our work relies on transfer of knowledge between hypotheses learned with Support Vector Machines. Transfer occurs in two directions: forward and backward. We have proposed to selectively transfer forward support vector coefficients from previous hypotheses as upper bounds on support vector coefficients to be learned on a target task. We have also proposed a novel method for refining existing hypotheses by transferring backward knowledge from a target hypothesis learned recently, and we have improved this method through a hypothesis refinement approach that refines while encouraging retention of knowledge. Our contributions are embodied in a long-term learning framework for binary classification tasks that arrive sequentially, one at a time.
【Keywords】:
【Paper Link】 【Pages】:9917-9918
【Authors】: Theodora Bendlin ; Hadi Hosseini
【Abstract】: We introduce a new manipulation strategy available to women in the men-proposing stable matching algorithm, called manipulation through an accomplice. In this strategy, a woman teams up with a potential male “accomplice” who manipulates on her behalf to obtain a better match for her. We investigate the stability of the matching obtained after this manipulation, provide an algorithm to compute such strategies, and show their benefit compared to single-woman manipulation strategies.
【Keywords】:
【Paper Link】 【Pages】:9919-9920
【Authors】: Umang Bhatt ; Pradeep Ravikumar ; José M. F. Moura
【Abstract】: Developing human-machine trust is a prerequisite for the adoption of machine learning systems in decision-critical settings (e.g., healthcare and governance). Users develop appropriate trust in these systems when they understand how the systems make their decisions. Interpretability not only helps users understand what a system learns but also helps users contest that system to align it with their intuition. We propose an algorithm, AVA: Aggregate Valuation of Antecedents, that generates a consensus feature attribution, retrieving local explanations and capturing global patterns learned by a model. Our empirical results show that AVA rivals current benchmarks.
【Keywords】:
【Paper Link】 【Pages】:9921-9922
【Authors】: Arpita Biswas ; Siddharth Barman
【Abstract】: We consider the problem of fairly allocating a set of indivisible goods among a group of homogeneous agents under matroid constraints and additive valuations. We propose a novel algorithm that computes a fair allocation for instances with additive and identical valuations, even under matroid constraints. Our result provides a computational anchor for the existential result of Biswas and Barman on the fairness notion EF1 (envy-freeness up to one good) in this setting. We further provide examples to show that fairness notions stronger than EF1 do not always exist in this setting.
【Keywords】:
【Paper Link】 【Pages】:9923-9924
【Authors】: Narayan Changder ; Samir Aknine ; Animesh Dutta
【Abstract】: Optimal Coalition Structure Generation (CSG) is a significant research problem that remains difficult to solve. Given n agents, the ODP-IP algorithm (Michalak et al. 2016) achieves the current lowest worst-case time complexity of O(3^n). We devise an Imperfect Dynamic Programming (ImDP) algorithm for CSG with runtime O(n2^n). “Imperfect” means that there are some contrived inputs for which the algorithm fails to give the optimal result. Experimental results confirm that ImDP performs better for several data distributions, and for some it dramatically improves on ODP-IP. For example, given 27 agents with an agent-based uniform distribution, ImDP's time gain is 91% (i.e., 49 minutes).
【Keywords】:
【Paper Link】 【Pages】:9925-9926
【Authors】: Qiwei Chen ; Cheng Wu ; Yiming Wang
【Abstract】: A method based on the Robust Principal Component Analysis (RPCA) technique is proposed to detect small targets in infrared images. Using the low-rank characteristic of the background and the sparse characteristic of the target, the observed image is regarded as the sum of a low-rank background matrix and a sparse outlier matrix, and the decomposition is then solved by RPCA. The infrared small target is extracted from the single-frame image or multi-frame sequence. To obtain a more efficient algorithm, the iteration process in the augmented Lagrange multiplier method is improved. Simulation results show that the method can detect small targets precisely and efficiently.
【Keywords】:
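【Code Sketch】: A minimal sketch of the decomposition described in the abstract above: RPCA splits an image matrix D into a low-rank background L and a sparse outlier (target) component S, here via a standard augmented Lagrange multiplier scheme with singular value thresholding. The default parameters follow common RPCA practice, not the paper's tuned values.
```python
import numpy as np

def shrink(X, tau):                         # soft-thresholding operator
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(D, n_iter=200, tol=1e-7):
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard sparsity weight
    mu = 0.25 * m * n / np.abs(D).sum()
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        # Low-rank step: singular value thresholding of D - S + Y/mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # Sparse step: entrywise shrinkage isolates the small bright target.
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y += mu * R                         # dual (multiplier) update
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S

D = np.random.rand(64, 64)                  # toy infrared background
D[30:32, 40:42] += 3.0                      # implant a tiny "target"
L, S = rpca(D)
print(np.unravel_index(np.abs(S).argmax(), S.shape))  # ideally near (30, 40)
```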
【Paper Link】 【Pages】:9927-9928
【Authors】: Sixuan Chen ; Shuai Yu
【Abstract】: Attention-based recurrent neural network models for joint intent detection and slot filling have achieved state-of-the-art performance. Most previous works exploited semantic-level information to calculate the attention weights. However, few works have taken the importance of word-level information into consideration. In this paper, we propose WAIS, word attention for joint intent detection and slot filling. Considering that intent detection and slot filling have a strong relationship, we further propose a fusion gate that integrates word-level and semantic-level information for jointly training the two tasks. Extensive experiments show that the proposed model has robust superiority over its competitors and sets the state of the art.
【Keywords】:
【Paper Link】 【Pages】:9929-9930
【Authors】: Yifeng Chen ; Cheng Wu ; Yiming Wang
【Abstract】: For large-scale iris recognition tasks, the determination of classification thresholds remains challenging, especially in practical applications where the sample space is growing rapidly. Due to the complexity of iris samples, the classification threshold is difficult to determine as the number of samples increases. The key to solving such threshold determination problems is to obtain iris feature vectors with stronger discriminative power. We therefore train deep convolutional neural networks on a large number of iris samples to extract iris features. More importantly, an optimized center loss function, referred to as the Tight Center (T-Center) loss, is used to solve the problem of insufficient discrimination caused by the Softmax loss function. To evaluate the effectiveness of our proposed method, we use cosine similarity to estimate the similarity between features on the published datasets CASIA-IrisV4 and IITD 2.0. Our experimental results demonstrate that the T-Center loss can minimize intra-class variance and maximize inter-class variance, achieving strong performance on the benchmark experiments.
【Keywords】:
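【Code Sketch】: A hedged sketch of a center-loss-style objective: the paper's T-Center loss is a tightened variant whose exact form we do not reproduce here; this shows only the generic pattern of combining Softmax cross-entropy with a term that pulls features toward their class centers to shrink intra-class variance.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        # Learnable per-class feature centers.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # Mean squared distance of each feature to its own class center.
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

feat_dim, num_classes, lam = 128, 100, 0.01
features = torch.randn(32, feat_dim, requires_grad=True)  # CNN iris features
classifier = nn.Linear(feat_dim, num_classes)
labels = torch.randint(0, num_classes, (32,))

center_loss = CenterLoss(num_classes, feat_dim)
# Joint objective: cross-entropy separates classes; the center term tightens them.
loss = F.cross_entropy(classifier(features), labels) \
       + lam * center_loss(features, labels)
loss.backward()
```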
【Paper Link】 【Pages】:9931-9932
【Authors】: Yuxin Chen ; Tengjiao Wang ; Wei Chen ; Qiang Li ; Zhen Qiu
【Abstract】: Lacking a sequence-preserving mechanism, existing heterogeneous information network (HIN) embedding methods discard essential type sequence information during embedding. We propose a Type Sequence Preserving HIN Embedding model (SeqHINE) which extends HIN embedding to the sequence level. SeqHINE incorporates type sequence information via a type-aware GRU and preserves representative sequence information via a decay function. Extensive experiments show that SeqHINE can outperform the state of the art even with 50% less labeled data.
【Keywords】:
【Paper Link】 【Pages】:9933-9934
【Authors】: Ameer Dharamshi ; Rosie Yuyan Zou
【Abstract】: Facial verification is a core problem studied by researchers in computer vision. Recently published one-to-one comparison models have achieved accuracy results that surpass human abilities. A natural extension of the one-to-one facial verification problem is one-to-many classification. In this abstract, we present our exploration of different methods for performing one-to-many facial verification using low-resolution images. The CSEye model introduces a direct comparison between the features extracted from each of the candidate images and the suspect before performing the classification task. Initial experiments using 10-to-1 comparisons of faces from the Labeled Faces in the Wild dataset yield promising results.
【Keywords】:
【Paper Link】 【Pages】:9935-9936
【Authors】: Wenyu Du ; Baocheng Li ; Min Yang ; Qiang Qu ; Ying Shen
【Abstract】: In this paper, we propose a Multi-Task learning approach for Answer Selection (MTAS), motivated by the fact that humans have no difficulty performing such tasks because they possess capabilities across multiple domains (tasks). Specifically, MTAS consists of two key components: (i) a category classification model that learns rich category-aware document representations; (ii) an answer selection model that provides matching scores for question-answer pairs. These two tasks work on a shared document encoding layer, and they cooperate to learn a high-quality answer selection system. In addition, a multi-head attention mechanism is proposed to learn important information from different representation subspaces at different positions. We manually annotated the first Chinese question answering dataset in the law domain (denoted LawQA) to evaluate the effectiveness of our model. The experimental results show that our model MTAS consistently outperforms the compared methods.
【Keywords】:
【Paper Link】 【Pages】:9937-9938
【Authors】: Amir Erfan Eshratifar ; Mohammad Saeed Abrishami ; David Eigen ; Massoud Pedram
【Abstract】: Transfer-learning and meta-learning are two effective methods for applying knowledge learned from large data sources to new tasks. In few-class, few-shot target task settings (i.e., when there are only a few classes and training examples available in the target task), meta-learning approaches that optimize for future task learning have outperformed the typical transfer approach of initializing model weights from a pretrained starting point. But as we experimentally show, meta-learning algorithms that work well in the few-class setting do not generalize well in many-shot and many-class cases. In this paper, we propose a joint training approach that combines both transfer-learning and meta-learning. Benefiting from the advantages of each, our method obtains improved generalization performance on unseen target tasks in both few- and many-class and few- and many-shot scenarios.
【Keywords】:
【Paper Link】 【Pages】:9939-9940
【Authors】: Víctor Gallego ; Roi Naveiro ; David Ríos Insua
【Abstract】: In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. However, in such non-stationary environments, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
【Keywords】:
【Paper Link】 【Pages】:9941-9942
【Authors】: Mononito Goswami ; Shiven Mian ; Jack Mostow
【Abstract】: Intelligent Tutoring Systems (ITS) have great potential to change the educational landscape by bringing scientifically tested one-to-one tutoring to remote and under-served areas. However, effective ITSs are too complex to perfect. Instead, a practical guiding principle for ITS development and improvement is to fix what’s most broken. In this paper we present SPOT (Statistical Probe of Tutoring): a tool that mines data logged by an Intelligent Tutoring System to identify the ‘hot spots’ most detrimental to its efficiency and effectiveness in terms of its software reliability, usability, task difficulty, student engagement, and other criteria. SPOT uses heuristics and machine learning to discover, characterize, and prioritize such hot spots in order to focus ITS refinement on what matters most. We applied SPOT to data logged by RoboTutor, an ITS that teaches children basic reading, writing and arithmetic.
【Keywords】:
【Paper Link】 【Pages】:9943-9944
【Authors】: Zijian Hu ; Fuli Luo ; Yutong Tan ; Wenxin Zeng ; Zhifang Sui
【Abstract】: Word Sense Disambiguation (WSD), a difficult task in Natural Language Processing (NLP), aims to identify the correct sense of an ambiguous word in a given context. There are two mainstream approaches to WSD. Supervised methods mainly utilize labeled contexts to train a classifier which generates the right probability distribution over word senses. Meanwhile, knowledge-based (unsupervised) methods, which focus on glosses (word sense definitions), calculate the similarity of each context-gloss pair as a score to find the right word sense. In this paper, we propose a generative adversarial framework, WSD-GAN, which combines the two mainstream methods in WSD. The generative model, based on supervised methods, tries to generate a probability distribution over the word senses. Meanwhile, the discriminative model, based on knowledge-based methods, focuses on predicting the relevancy of the context-gloss pairs and identifies the correct pairs over the others. Furthermore, in order to optimize both models, we leverage policy gradient to mutually enhance their performance. Our experimental results show that WSD-GAN achieves competitive results on several English all-words WSD datasets.
【Keywords】:
【Paper Link】 【Pages】:9945-9946
【Authors】: Ling Huang ; Chang-Dong Wang ; Hong-Yang Chao
【Abstract】: In this paper, we define a new problem of multi-layer network community detection, namely higher-order multi-layer community detection. A multi-layer motif (M-Motif) approach is proposed, which discovers communities with good intra-layer higher-order community quality while preserving inter-layer higher-order community consistency. Experimental results confirm the superiority of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:9947-9948
【Authors】: Sein Jang ; Young-Ho Park ; Aziz Nasridinov
【Abstract】: Visual surveillance through closed-circuit television (CCTV) can help to prevent crime. In this paper, we propose an automatic visual surveillance network (AVS-Net), which simultaneously performs image processing and object detection to determine the danger of situations captured by CCTV. In addition, we add a relation module to infer the relationships among the objects in the images. Experimental results show that the relation module greatly improves classification accuracy, even when the available information is limited.
【Keywords】:
【Paper Link】 【Pages】:9949-9950
【Authors】: Younkook Kang ; Sungwon Lyu ; Jeeyung Kim ; Bongjoon Park ; Sungzoon Cho
【Abstract】: In automated material handling systems (AMHS), delivery time is an important issue directly associated with the production cost and the quality of the product. In this paper, we propose a dynamic routing strategy to shorten delivery time and delay. We set the target of control by analyzing traffic flows and selecting the region with the highest flow rate and congestion frequency. Then, we impose a routing cost in order to dynamically reflect real-time changes in traffic states. Our deep reinforcement learning model consists of a Q-learning step and a recurrent neural network, through which traffic states and action values are predicted. Experimental results show that the proposed method decreases manufacturing costs while increasing productivity. Additionally, we find evidence that the reinforcement learning structure proposed in this study can autonomously and dynamically adjust to changes in traffic patterns.
【Keywords】:
【Paper Link】 【Pages】:9951-9952
【Authors】: Raghav Kapoor ; Yaman Kumar ; Kshitij Rajput ; Rajiv Ratn Shah ; Ponnurangam Kumaraguru ; Roger Zimmermann
【Abstract】: In multilingual societies like the Indian subcontinent, the use of code-switched languages is popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e., Hinglish), the most widely spoken such pair. The task is made difficult by the non-fixed grammar, vocabulary, semantics, and spellings of the Hinglish language. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the performance of the current best models, establishing itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.
【Keywords】:
【Paper Link】 【Pages】:9953-9954
【Authors】: Kherroubi Zine el Abidine ; Samir Aknine ; Bacha Rebiha
【Abstract】: This work explores the design of a central collaborative driving strategy between connected cars with the objective of improving road safety in highway on-ramp merging scenarios. Building on a method suitable for predicting vehicle motion and behavior within a central collaborative strategy, we propose a dynamic Bayesian network that predicts drivers' intentions on highway on-ramps. The method was validated using real data of detailed vehicle trajectories on a segment of Interstate 80 in Emeryville, California.
【Keywords】:
【Paper Link】 【Pages】:9955-9956
【Authors】: Khimya Khetarpal ; Doina Precup
【Abstract】: Learning temporal abstractions which are partial solutions to a task and can be reused for solving other tasks is an ingredient that can help agents plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different regions of the state space by proposing a notion of interest functions, which generalizes initiation sets from the options framework to function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
【Keywords】:
【Paper Link】 【Pages】:9957-9958
【Authors】: Igor Kiselev
【Abstract】: Previously proposed variational techniques for approximate MMAP inference in complex graphical models with high-order factors relax a dual variational objective function to obtain a tractable approximation, and then perform MMAP inference in the resulting simplified graphical model, where the sub-graph over decision variables is assumed to be a disconnected forest. In contrast, we develop novel variational MMAP inference algorithms and proximal convergent solvers that improve approximation accuracy while better preserving the original MMAP query, by designing a dual variational objective function in which an upper-bound approximation is applied only to the entropy of the decision variables. We evaluate the proposed algorithms on both simulated synthetic datasets and diagnostic Bayesian networks from the UAI inference challenge, and our solvers outperform other variational algorithms in a majority of reported cases. Additionally, we demonstrate an important real-life application of the proposed variational approaches: solving complex policy optimization tasks by MMAP inference, for which the performance of the implemented approximation algorithms is compared. We show that the original task of optimizing POMDP controllers can be approached by reformulating it as the equivalent problem of marginal-MAP inference in a novel single-DBN generative model, which guarantees that the control policies computed by probabilistic inference over this model are optimal in the traditional sense. Our motivation for approaching the planning problem through probabilistic inference in graphical models is that, by transforming a Markovian planning problem into the task of probabilistic inference (a marginal MAP problem) and applying belief propagation techniques in generative models, we can reduce computational complexity from PSPACE-complete or NEXP-complete to NP^PP-complete, in comparison to solving the POMDP and Dec-POMDP models (respectively) by search or dynamic programming.
【Keywords】:
【Paper Link】 【Pages】:9959-9960
【Authors】: In-Seok Lee ; Jun-Geol Baek
【Abstract】: In manufacturing, process monitoring is very important. Real-time contrast (RTC) control charts outperform existing monitoring methods; however, the performance of an RTC control chart depends on its classifier. Existing RTC charts use random forests (RF), support vector machines (SVM), or kernel linear discriminant analysis (KLDA) as the classifier. The RF classifier can find the causes of faults, but its performance is lower than the others'. We therefore suggest a data representation method to improve the RF-based RTC control chart. Symbolic aggregate approximation (SAX) is a well-known method for improving the performance of classification and clustering. We convert the input data using SAX, and we vary the parameters of SAX, such as the alphabet size and breakpoints, to improve performance. Experiments show that the represented data is an effective means of improving the performance of the RTC control chart.
【Keywords】:
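【Code Sketch】: An illustrative SAX (symbolic aggregate approximation) sketch: z-normalize a window, reduce it with piecewise aggregate approximation (PAA), then map segment means to letters using Gaussian breakpoints. The alphabet size and breakpoints are the tunable parameters the abstract refers to; this is textbook SAX, not the authors' code.
```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=8, alphabet_size=4):
    x = (series - series.mean()) / (series.std() + 1e-12)  # z-normalize
    segments = np.array_split(x, n_segments)               # PAA segments
    means = np.array([seg.mean() for seg in segments])
    # Breakpoints cut the standard normal into equiprobable regions.
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    symbols = np.searchsorted(breakpoints, means)
    return "".join(chr(ord("a") + s) for s in symbols)

signal = np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * np.random.randn(128)
print(sax(signal))  # e.g. 'dcbabcdc' -- the word fed to the classifier
```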
【Paper Link】 【Pages】:9961-9962
【Authors】: Seung-Geon Lee ; Jaedeok Kim ; Hyun-Joo Jung ; Yoonsuck Choe
【Abstract】: Estimating the relative importance of each sample in a training set has important practical and theoretical value, such as in importance sampling or curriculum learning. This kind of focus on individual samples invokes the concept of sample-wise learnability: how easy is it to correctly learn each sample (cf. PAC learnability)? In this paper, we approach the sample-wise learnability problem within a deep learning context. We propose a measure of the learnability of a sample with a given deep neural network (DNN) model. The basic idea is to train the given model on the training set, and for each sample, aggregate the hits and misses over the entire set of training epochs. Our experiments show that the sample-wise learnability measure collected this way is highly linearly correlated across different DNN models (ResNet-20, VGG-16, and MobileNet), suggesting that such a measure can provide deep general insights on the data's properties. We expect our method to help develop better curricula for training, and help us better understand the data itself.
【Keywords】:
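【Code Sketch】: A minimal sketch of the measure as described in the abstract above: train a model normally and, at each epoch, record per-sample hits (correct predictions); the hit rate over all epochs is the sample-wise learnability score. The toy model and data are illustrative.
```python
import torch
import torch.nn as nn

X = torch.randn(1000, 20)
y = (X[:, 0] + 0.5 * torch.randn(1000) > 0).long()   # noisy binary labels
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

epochs = 30
hits = torch.zeros(len(X))
for _ in range(epochs):
    opt.zero_grad()
    logits = model(X)
    loss_fn(logits, y).backward()
    opt.step()
    hits += (logits.argmax(1) == y).float()          # aggregate per-sample hits

learnability = hits / epochs                         # in [0, 1]; higher = easier
print(learnability[:5])
```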
【Paper Link】 【Pages】:9963-9964
【Abstract】: In this paper, we propose a new feature extraction method called hvnLBP-TOP for video-based sentiment analysis. Furthermore, we use principal component analysis (PCA) and bidirectional long short term memory (bi-LSTM) for dimensionality reduction and classification. We achieved an average recognition accuracy of 71.1% on the MOUD dataset and 63.9% on the CMU-MOSI dataset.
【Keywords】:
【Paper Link】 【Pages】:9965-9966
【Authors】: Lile Li ; Quan Do ; Wei Liu
【Abstract】: Data across many business domains can be represented by two or more coupled data sets. Correlations among these coupled data sets have been studied in the literature for making more accurate cross-domain recommender systems. However, existing cross-domain recommendation methods mostly assume the coupled modes of the data sets share identical latent factors, which limits the discovery of potentially useful domain-specific properties of the original data. In this paper, we propose a novel cross-domain recommendation method called Coupled Factorization Machine (CoFM) that addresses this limitation. Compared to existing models, ours is the first to use factorization machines to capture the common characteristics of coupled domains while simultaneously preserving the differences between them. Our experiments with real-world datasets confirm the advantages of our method in making cross-domain recommendations.
【Keywords】:
【Paper Link】 【Pages】:9967-9968
【Authors】: Lingge Li ; Nitish Nayak ; Jianming Bian ; Pierre Baldi
【Abstract】: Many experiments have been set up to accurately measure the parameters governing neutrino oscillation probabilities, with implications for the fundamental structure of the universe. Very often, this involves inference from tiny samples of data that have complicated dependencies on multiple oscillation parameters simultaneously. This is typically carried out using the unified approach of Feldman and Cousins, which is very computationally expensive, on the order of tens of millions of CPU hours. In this work, we propose an iterative method using a Gaussian Process to efficiently find a confidence contour for the oscillation parameters, and we show that it produces the same results at a fraction of the computational cost.
【Keywords】:
【Paper Link】 【Pages】:9969-9970
【Authors】: Mingming Li ; Jiao Dai ; Fuqing Zhu ; Liangjun Zang ; Songlin Hu ; Jizhong Han
【Abstract】: In recommender systems, users' uncertain preferences result in unexpected ratings. This paper makes an initial attempt to integrate the influence of users' uncertainty into the matrix factorization framework. Specifically, a fuzzy set of “like” is defined for each user, and a membership function is used to measure the degree to which an item belongs to this fuzzy set. Furthermore, to improve computation on sparse matrices, the uncertain preference is formulated as side information for fusion. Experimental results on three real-world datasets show that the proposed approach produces stable improvements over others.
【Keywords】:
【Paper Link】 【Pages】:9971-9972
【Authors】: Yanran Li ; Wenjie Li
【Abstract】: We propose a chatbot, namely MOCHA, to make good use of relevant entities when generating responses. Augmented with meta-path information, MOCHA is able to mention proper entities following the conversation flow.
【Keywords】:
【Paper Link】 【Pages】:9973-9974
【Authors】: Zhaohui Li ; Yue Feng ; Jun Xu ; Jiafeng Guo ; Yanyan Lan ; Xueqi Cheng
【Abstract】: Machine reading comprehension, whose goal is to find answers in the candidate passages for a given question, has attracted a lot of research effort in recent years. One of the key challenges in machine reading comprehension is how to identify the main content from a large, redundant, and overlapping set of candidate sentences. In this paper, we propose to tackle the challenge with a Markov Decision Process in which main content identification is formalized as sequential decision making and each action corresponds to selecting a sentence. Policy gradient is used to learn the model parameters. Experimental results on MS MARCO show that the proposed model, called MC-MDP, can select high-quality main content and significantly improve the performance of answer span prediction.
【Keywords】:
【Paper Link】 【Pages】:9975-9976
【Authors】: Zhijie Lin ; Kaiyang Lin ; Shiling Chen ; Linlin Li ; Zhou Zhao
【Abstract】: End-to-end deep learning approaches for Automatic Speech Recognition (ASR) have become a new trend. In these approaches, the language model can be considered an important and effective method for semantic error correction. Many existing systems use one language model. In this paper, however, multiple language models (LMs) are applied in decoding: one LM is used for selecting appropriate answers, and the others, considering both context and grammar, are used for further decisions. Experiments on a general location-based dataset show the effectiveness of our method.
【Keywords】:
【Paper Link】 【Pages】:9977-9978
【Authors】: Shengchao Liu ; Yingyu Liang ; Anthony Gitter
【Abstract】: In settings with related prediction tasks, integrated multi-task learning models can often improve performance relative to independent single-task models. However, even when the average task performance improves, individual tasks may experience negative transfer, in which the multi-task model's predictions are worse than the single-task model's. We show the prevalence of negative transfer in a computational chemistry case study with 128 tasks and introduce a framework that provides a foundation for reducing negative transfer in multi-task models. Our Loss-Balanced Task Weighting approach dynamically updates task weights during model training to control the influence of individual tasks.
【Keywords】:
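【Code Sketch】: A hedged sketch of Loss-Balanced Task Weighting as we read the abstract above: each task's weight is derived from the ratio of its current loss to its initial loss, raised to a power alpha, so tasks that have already improved are down-weighted during training. The alpha hyperparameter and the exact update rule are our illustrative assumptions.
```python
import torch

def lbtw_weights(current_losses, initial_losses, alpha=0.5):
    ratios = torch.stack(current_losses) / torch.stack(initial_losses)
    return ratios.detach() ** alpha        # detach: weights carry no gradient

# Inside one training step over two toy tasks:
task_losses = [torch.tensor(0.9, requires_grad=True),   # lagging task
               torch.tensor(0.2, requires_grad=True)]   # well-learned task
initial = [torch.tensor(1.0), torch.tensor(1.0)]        # first-batch losses

w = lbtw_weights(task_losses, initial)
total = (w * torch.stack(task_losses)).sum()  # lagging task dominates the update
total.backward()
print(w)  # tensor([0.9487, 0.4472])
```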
【Paper Link】 【Pages】:9979-9980
【Authors】: Xiaoming Liu ; Zhixiong Xu ; Lei Cao ; Xiliang Chen ; Kai Kang
【Abstract】: The balance between exploration and exploitation has always been a core challenge in reinforcement learning. This paper proposes the “past-success exploration strategy combined with Softmax action selection” (PSE-Softmax), an adaptive control method that exploits the characteristics of the agent's online learning process to adapt exploration parameters dynamically. The proposed strategy is tested on OpenAI Gym with discrete and continuous control tasks, and the experimental results show that PSE-Softmax delivers better performance than deep reinforcement learning algorithms with basic exploration strategies.
【Keywords】:
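【Code Sketch】: Illustrative Softmax (Boltzmann) action selection with a temperature adapted online; the paper's past-success schedule is paraphrased here as a simple running-reward heuristic, an assumption on our part rather than the exact PSE-Softmax rule.
```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_action(q_values, temperature):
    z = q_values / max(temperature, 1e-6)
    p = np.exp(z - z.max())                 # numerically stable softmax
    return rng.choice(len(q_values), p=p / p.sum())

q = np.zeros(4)
avg_reward = 0.0
for step in range(1, 1001):
    # Toy adaptation: more past success -> lower temperature -> greedier policy.
    tau = 1.0 / (1.0 + max(avg_reward, 0.0))
    a = softmax_action(q, tau)
    r = float(rng.normal(loc=[0.0, 0.2, 0.5, 1.0][a]))  # hidden 4-armed bandit
    q[a] += 0.1 * (r - q[a])                             # incremental Q update
    avg_reward += (r - avg_reward) / step                # past-success signal
print(q.round(2))                                        # should favor arm 3
```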
【Paper Link】 【Pages】:9981-9982
【Authors】: Xingbo Liu ; Xiushan Nie ; Yingxin Wang ; Yilong Yin
【Abstract】: Hashing can compress heterogeneous high-dimensional data into compact binary codes while preserving similarity to facilitate efficient retrieval and storage, and hashing has thus recently received much attention from information retrieval researchers. Most existing hashing methods first predefine a fixed length (e.g., 32, 64, or 128 bits) for the hash codes before learning them with this fixed length. However, one sample can be represented by various hash codes with different lengths, and there must therefore be associations and relationships among these different hash codes because they represent the same sample. Harnessing these relationships can thus boost the performance of hashing methods. Inspired by this possibility, we propose a new model, jointly multiple hash learning (JMH), which can learn hash codes with multiple lengths simultaneously. In the proposed JMH method, three types of information are used for hash learning: hash codes with different lengths, the original features of the samples, and the labels. In contrast to existing hashing methods, JMH can learn hash codes with different lengths in one step, and users can select appropriate hash codes for their retrieval tasks according to their accuracy and complexity requirements. To the best of our knowledge, JMH is one of the first attempts to learn multi-length hash codes simultaneously. In addition, discrete, closed-form solutions for the variables can be obtained by cyclic coordinate descent, making the proposed model much faster to train. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed method.
【Keywords】:
【Paper Link】 【Pages】:9983-9984
【Authors】: Zelei Liu ; Han Yu ; Leye Wang ; Liang Hu ; Qiang Yang
【Abstract】: We consider the problem of mobilizing community effort to reposition indiscriminately parked shared bikes in urban environments through crowdsourcing. We propose an ethically aligned incentive optimization approach, WSLS, which maximizes the rate of success for bike repositioning while minimizing cost and prioritizing users' wellbeing. Realistic simulations based on a dataset from Singapore demonstrate that WSLS significantly outperforms existing approaches.
【Keywords】:
【Paper Link】 【Pages】:9985-9986
【Authors】: Jianjie Lu ; Kai-yu Tong
【Abstract】: A weakly-supervised framework is proposed that can not only make class inferences but also provide a reasonable decision basis in bone X-ray images. We implement it in three progressive stages: (1) design a classification network and use positive class activation maps (PCAM) for attention location; (2) generate masks from attention maps and lead the model to make classification predictions from the activation areas; (3) label lesions in very few images and guide the model to learn simultaneously. We test the proposed method on a bone X-ray dataset. Results show that it achieves significant improvements in lesion localization.
【Keywords】:
【Paper Link】 【Pages】:9987-9988
【Authors】: Yao Lu ; Linqing Liu ; Zhile Jiang ; Min Yang ; Randy Goebel
【Abstract】: We propose a Multi-task learning approach for Abstractive Text Summarization (MATS), motivated by the fact that humans have no difficulty performing such tasks because they have capabilities across multiple domains. Specifically, MATS consists of three components: (i) a text categorization model that learns rich category-specific text representations using a bi-LSTM encoder; (ii) a syntax labeling model that learns to improve the syntax-aware LSTM decoder; and (iii) an abstractive text summarization model that shares its encoder and decoder with the text categorization and syntax labeling tasks, respectively. In particular, the abstractive text summarization model benefits significantly from the additional text categorization and syntax knowledge. Our experimental results show that MATS outperforms the competitors.
【Keywords】:
【Paper Link】 【Pages】:9989-9990
【Authors】: Yingjing Lu
【Abstract】: The Mean Square Error (MSE) has shown its strength when applied in deep generative models such as Auto-Encoders to model reconstruction loss. However, especially in the image domain, the limitation of MSE is obvious: it assumes pixel independence and ignores the spatial relationships among samples. This contradicts most Auto-Encoder architectures, which use convolutional layers to extract spatially dependent features. We build on the structural similarity (SSIM) metric and propose a novel level-weighted structural similarity (LWSSIM) loss for convolutional Auto-Encoders. Experiments on common datasets with various Auto-Encoder variants show that our loss outperforms the MSE loss and the vanilla SSIM loss. We also provide reasons why our model succeeds in cases where the standard SSIM loss fails.
【Keywords】:
【Paper Link】 【Pages】:9991-9992
【Authors】: Anna Lukina
【Abstract】: The main focus of this work is an optimization-based framework for control of multi-agent systems that synthesizes actions steering a given system towards a specified state. The primary motivation for the research presented is a fascination with birds, which save energy on long-distance flights by forming a V-shape. We ask the following question: are V-formations the result of solving an optimization problem, and can this concept be utilized in multi-agent systems, particularly in drone swarms, to increase their safety and resilience? We demonstrate that our framework can be applied to any system modeled as a controllable Markov decision process with a cost (reward) function. A key feature of the proposed procedure is its automatic adaptation to the performance of optimization towards a given global objective. Combining model-predictive control and ideas from sequential Monte-Carlo methods, we introduce a performance-based adaptive horizon and implicitly build a Lyapunov function that guarantees convergence. We use statistical model-checking to verify the algorithm and assess its reliability.
【Keywords】:
【Paper Link】 【Pages】:9993-9994
【Authors】: Junyu Luo ; Min Yang ; Ying Shen ; Qiang Qu ; Haixia Chai
【Abstract】: In this paper, we propose a Document Embedding Network (DEN) to learn document embeddings in an unsupervised manner. Our model uses the encoder-decoder architecture as its backbone and tries to reconstruct the input document from an encoded document embedding. Unlike the standard decoder for text reconstruction, we randomly block some words in the input document and use the incomplete context information together with the encoded document embedding to predict the blocked words, inspired by the crossword game. Thus, our decoder can keep the balance between known and unknown information, and consider both global and partial information when decoding the missing words. We evaluate the learned document embeddings on two tasks: document classification and document retrieval. The experimental results show that our model substantially outperforms the compared methods.
【Keywords】:
【Paper Link】 【Pages】:9995-9996
【Authors】: Daoming Lyu ; Fangkai Yang ; Bo Liu ; Daesub Yoon
【Abstract】: Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options. This framework features a planner – controller – meta-controller architecture, which takes charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.
【Keywords】:
【Paper Link】 【Pages】:9997-9998
【Authors】: Gaku Morio ; Katsuhide Fujita
【Abstract】: This paper focuses on fundamental research that combines syntactic knowledge with neural approaches, utilizing syntactic information in argument component identification and classification (AC-I/C) tasks in argument mining (AM). Our paper's contributions are as follows: 1) we propose a way of incorporating a syntactic GCN into multi-task learning models for AC-I/C tasks; 2) we demonstrate the effectiveness of our proposed syntactic GCN in controlled experiments on several datasets. We also find that syntactic GCNs are promising for lexically independent scenarios. Our experimental code is available for reproducibility.
【Keywords】:
【Paper Link】 【Pages】:9999-10000
【Authors】: Daniel Muller
【Abstract】: Oversubscription planning (OSP) is the problem of choosing an action sequence that reaches a state with high utility, given a budget for total action cost. This formulation allows us to handle situations with under-constrained resources, which do not allow all possible goal facts to be achieved. In optimal OSP, the task is further constrained to finding a path that achieves a state with maximal utility. An incremental BFBB search algorithm with landmark-based approximations was proposed for OSP heuristic search to address tasks with non-negative and 0-binary utility functions. Incremental BFBB maintains the best solution so far and a set of reference states, extended with all the non-redundant value-carrying states discovered during the search; each iteration requires restarting the search in order to exploit the new knowledge obtained along the way. Recent work proposed an approach of relative estimation of achievements with value-driven landmarks to address arbitrary utility functions, which incrementally improves the best existing solution and eliminates the need to maintain a set of reference states. We now propose a progressive frontier search algorithm, which alleviates the need to restart from scratch once new information is acquired: it captures the frontier reached at the end of each iteration and uses it as a dynamic reference point to continue the search, leading to improved search efficiency.
【Keywords】:
【Paper Link】 【Pages】:10001-10002
【Authors】: Mengdie Nie ; Zhi-Jie Wang ; Chunjing Gan ; Zhe Quan ; Bin Yao ; Jian Yin
【Abstract】: Nearest neighbor search is a fundamental computational tool with wide applications. In past decades, many data structures have been developed to speed up this operation. In this paper, we propose a novel hierarchical data structure for nearest neighbor search in moderately high dimensions. Our proposed method maintains good runtime guarantees, and it outperforms several state-of-the-art methods in practice.
【Keywords】:
【Paper Link】 【Pages】:10003-10004
【Authors】: Kaushal Paneri ; Vishnu TV ; Pankaj Malhotra ; Lovekesh Vig ; Gautam Shroff
【Abstract】: Deep neural networks are prone to overfitting, especially in small training data regimes. Often, these networks are overparameterized and the resulting learned weights tend to have strong correlations. However, convolutional networks in general, and fully convolution neural networks (FCNs) in particular, have been shown to be relatively parameter efficient, and have recently been successfully applied to time series classification tasks. In this paper, we investigate the application of different regularizers on the correlation between the learned convolutional filters in FCNs using Batch Normalization (BN) as a regularizer for time series classification (TSC) tasks. Results demonstrate that despite orthogonal initialization of the filters, the average correlation across filters (especially for filters in higher layers) tends to increase as training proceeds, indicating redundancy of filters. To mitigate this redundancy, we propose a strong regularizer, using simple yet effective filter decorrelation. Our proposed method yields significant gains in classification accuracy for 44 diverse time series datasets from the UCR TSC benchmark repository.
【Keywords】:
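【Code Sketch】: A minimal sketch in the spirit of the abstract above: flatten each convolutional filter, compute pairwise correlations across filters, and penalize the off-diagonal magnitude alongside the task loss. The penalty weight and the toy FCN-style layer are illustrative assumptions, not the paper's exact regularizer.
```python
import torch
import torch.nn as nn

def filter_decorrelation(conv: nn.Conv1d) -> torch.Tensor:
    w = conv.weight.flatten(1)                    # (out_channels, in_ch * k)
    w = w - w.mean(dim=1, keepdim=True)
    w = w / (w.norm(dim=1, keepdim=True) + 1e-8)  # unit-norm rows
    corr = w @ w.t()                              # pairwise filter correlations
    off_diag = corr - torch.eye(corr.size(0))
    return (off_diag ** 2).sum() / corr.numel()

conv = nn.Conv1d(1, 16, kernel_size=8)            # FCN-style temporal filters
x = torch.randn(4, 1, 64)                         # (batch, channel, time)
task_loss = conv(x).mean()                        # stand-in for the TSC loss
loss = task_loss + 0.1 * filter_decorrelation(conv)
loss.backward()
```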
【Paper Link】 【Pages】:10005-10006
【Authors】: So-Hyun Park ; Sun-Young Ihm ; Aziz Nasridinov ; Young-Ho Park
【Abstract】: This study proposes a method to reduce the playing-related musculoskeletal disorders (PRMDs) that often occur among pianists. Specifically, we propose a feasibility test that evaluates several state-of-the-art deep learning algorithms for preventing pianists' injuries. For this, we propose (1) a C3P dataset including various piano playing postures, and (2) show the application of four learning algorithms, which have demonstrated their superiority in video classification, to the proposed C3P dataset. To our knowledge, this is the first study to apply the deep learning paradigm to reducing PRMDs in pianists. The experimental results show a classification accuracy of 80% on average, supporting the hypothesis that deep learning algorithms can be effective in preventing pianists' injuries.
【Keywords】:
【Paper Link】 【Pages】:10007-10008
【Authors】: Rey Pocius ; Lawrence Neal ; Alan Fern
【Abstract】: Commonly used sequential decision-making tasks such as the games in the Arcade Learning Environment (ALE) provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks, which are of limited use for the development of explainable artificial intelligence (XAI) due to the fine temporal resolution of the tasks. Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide for both strategic decision-making and rich observation spaces are either difficult to simulate or intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.
【Keywords】:
【Paper Link】 【Pages】:10009-10010
【Authors】: Jacob Rafati ; David C. Noelle
【Abstract】: Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. We present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences of the agent. When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, suitable for large-scale applications. We demonstrate the efficiency of our method on a variant of the rooms environment.
【Keywords】:
【Paper Link】 【Pages】:10011-10012
【Authors】: Jinfu Ren ; Yang Liu ; Jiming Liu
【Abstract】: In this paper, we propose a novel oversampling strategy dubbed Entropy-based Wasserstein Generative Adversarial Network (EWGAN) to generate data samples for minority classes in imbalanced learning. First, we construct an entropy-weighted label vector for each class to characterize the data imbalance across classes. We then concatenate this entropy-weighted label vector with the original feature vector of each data sample and feed it into the WGAN model to train the generator. After the generator is trained, we concatenate the entropy-weighted label vector with random noise vectors and feed them into the generator to generate data samples for the minority classes. Experimental results on two benchmark datasets show that the samples generated by the proposed oversampling strategy can help improve classification performance when the data are highly imbalanced. Furthermore, the proposed strategy outperforms other state-of-the-art oversampling algorithms in terms of classification accuracy.
【Keywords】:
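【Code Sketch】: A sketch of the conditioning step as we read the abstract above: build an entropy-weighted label vector from the class proportions, concatenate it with each sample's features (or with noise at generation time), and feed the result to a WGAN. The WGAN itself is omitted, and the exact weighting is our illustrative assumption.
```python
import numpy as np

rng = np.random.default_rng(0)
counts = np.array([900, 80, 20])                 # highly imbalanced classes
p = counts / counts.sum()
entropy_w = -p * np.log(p)                       # per-class entropy terms
                                                 # characterizing the imbalance

def entropy_label_vector(c, n_classes=3):
    v = np.zeros(n_classes)
    v[c] = entropy_w[c]                          # one-hot scaled by entropy weight
    return v

X = rng.normal(size=(1000, 16))                  # original feature vectors
y = rng.choice(3, size=1000, p=p)
X_cond = np.hstack([X, np.stack([entropy_label_vector(c) for c in y])])
print(X_cond.shape)                              # (1000, 19): generator training input

# At sampling time, pair the minority class's vector with random noise:
z = rng.normal(size=(64, 16))
gen_in = np.hstack([z, np.tile(entropy_label_vector(2), (64, 1))])
```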
【Paper Link】 【Pages】:10013-10014
【Authors】: Hanan Rosemarin ; Ariel Rosenfeld ; Sarit Kraus
【Abstract】: Emergency Departments (EDs) provide an imperative source of medical care. Central to the ED workflow is patient-caregiver scheduling, directed at getting the right patient to the right caregiver at the right time. Unfortunately, common ED scheduling practices are based on ad-hoc heuristics which may not be aligned with the complex and partially conflicting objectives of the ED. In this paper, we propose a novel online deep-learning scheduling approach for the automatic assignment and scheduling of medical personnel to arriving patients. Our approach allows for the optimization of explicit, hospital-specific multi-variate objectives and takes advantage of available data, without altering the existing workflow of the ED. In an extensive empirical evaluation, using real-world data, we show that our approach can significantly improve an ED's performance metrics.
【Keywords】:
【Paper Link】 【Pages】:10015-10016
【Authors】: Gaetano Rossiello ; Alfio Gliozzo ; Michael R. Glass
【Abstract】: We propose a novel approach to learning representations of relations expressed by their textual mentions. In our assumption, if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings encoding relational information through the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model, outperforming state-of-the-art methods on a relation extraction task.
【Keywords】:
【Paper Link】 【Pages】:10017-10018
【Authors】: Eitan Rothberg ; Tingting Chen ; Hao Ji
【Abstract】: As technology and society grow increasingly dependent on computer vision, it becomes important to ensure that these technologies are secure. However, even today's state-of-the-art classifiers are easily fooled by carefully manipulated images. The only solutions that have increased robustness against these manipulated images have come at the expense of accuracy on natural inputs. In this work, we propose a new training technique, localized adversarial training, that results in more accurate classification of both natural and adversarial images, by as much as 6.5% and 99.7%, respectively.
【Keywords】:
【Paper Link】 【Pages】:10019-10020
【Authors】: Pramit Saha ; Sidney S. Fels
【Abstract】: We propose a mixed deep neural network strategy, incorporating a parallel combination of Convolutional (CNN) and Recurrent Neural Networks (RNN), cascaded with deep autoencoders and fully connected layers, towards automatic identification of imagined speech from EEG. Instead of utilizing raw EEG channel data, we compute the joint variability of the channels in the form of a covariance matrix that provides spatio-temporal representations of EEG. The networks are trained hierarchically, and the extracted features are passed on to the next network hierarchy until the final classification. Using a publicly available EEG-based speech imagery database, we demonstrate around a 23.45% improvement in accuracy over the baseline method. Our approach demonstrates the promise of a mixed DNN approach for complex spatio-temporal classification problems.
【Keywords】:
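【Code Sketch】: An illustrative computation of the covariance-based input mentioned in the abstract above: instead of raw channel data, the joint variability of the EEG channels over sliding windows becomes a sequence of covariance "images" for the network hierarchy. Channel counts, window sizes, and the random data are assumptions for the sketch.
```python
import numpy as np

rng = np.random.default_rng(0)
eeg = rng.normal(size=(64, 500))        # 64 channels x 500 time samples
window, step = 100, 50

cov_maps = np.stack([
    np.cov(eeg[:, t:t + window])        # (64, 64) channel covariance matrix
    for t in range(0, eeg.shape[1] - window + 1, step)
])
print(cov_maps.shape)                   # (9, 64, 64): inputs to the CNN/RNN stack
```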
【Paper Link】 【Pages】:10021-10022
【Authors】: Lakshay Sahni ; Debasrita Chakraborty ; Ashish Ghosh
【Abstract】: Latest developments in the field of power-efficient neural interface circuits provide an excellent platform for applications where power consumption is the primary concern. Developing neural networks to achieve pattern recognition on such hardware remains a daunting task owing to substantial computational complexity. We propose and demonstrate a Spiking Neural Network (SNN) with biologically reasonable time constants to implement basic Boolean Logic Gates. The same network can be further applied to more complex problem statements. We employ a frequency spike encoding for data representation in the model, and a simplified and computationally efficient model of a neuron with exponential synapses and Spike Timing Dependent Plasticity (STDP).
【Keywords】:
【Paper Link】 【Pages】:10023-10024
【Authors】: Khwaja Mohd. Salik ; Swati Aggarwal ; Yaman Kumar ; Rajiv Ratn Shah ; Rohit Jain ; Roger Zimmermann
【Abstract】: Lipreading is the process of understanding and interpreting speech by observing a speaker's lip movements. In the past, most of the work in lipreading has been limited to classifying silent videos into a fixed number of text classes. However, this limits the applications of lipreading, since human language cannot be bound to a fixed set of words or languages. The aim of this work is to reconstruct intelligible acoustic speech signals from silent videos across various poses of a person, including poses Lipper has never seen before. Lipper is therefore a vocabulary- and language-agnostic, speaker-independent, near real-time model that deals with a variety of speaker poses. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech for a speaker. It uses a deep learning based STCNN+BiGRU architecture to achieve this goal. We evaluate speech reconstruction for speaker-independent scenarios and demonstrate the speech output by overlaying the audios reconstructed by Lipper on the corresponding videos.
【Keywords】:
【Paper Link】 【Pages】:10025-10026
【Authors】: Ivan Samborskii ; Aleksandr Farseev ; Andrey Filchenkov ; Tat-Seng Chua
【Abstract】: Nowadays, video games play a very important role in human life and are no longer purely associated with escapism or entertainment. In fact, gaming has become an essential part of our daily routines, which gives rise to the exponential growth of various online game platforms. By participating in such platforms, individuals generate a multitude of game data points, which can be further used, for example, for automatic user profiling and recommendation applications. However, the literature on automatic learning from game data is relatively sparse, which inspired us to tackle the problem of player profiling in this first preliminary study. Specifically, in this work we approach the task of player gender prediction based on various types of game data. Our initial experimental results motivate further research on user profiling in the game domain.
【Keywords】:
【Paper Link】 【Pages】:10027-10028
【Authors】: Sebastin Santy ; Wazeer Zulfikar ; Rishabh Mehrotra ; Emine Yilmaz
【Abstract】: We consider the problem of understanding real-world tasks depicted in visual images. While most existing image captioning methods excel at producing natural language descriptions of visual scenes involving human tasks, there is often a need for an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real-world task understanding systems and propose a framework composed of convolutional neural networks and an external hierarchical task ontology to produce task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way into many applications, including image alt-text generation.
【Keywords】:
【Paper Link】 【Pages】:10029-10030
【Authors】: Eitan Sapiro-Gheiler
【Abstract】: This work shows the value of word-level statistical data from the US Congressional Record for studying the ideological positions and dynamic behavior of senators. Using classification techniques from machine learning, we predict senators’ party with near-perfect accuracy. We also develop text-based ideology scores to embed a politician’s ideological position in a one-dimensional policy space. Using these scores, we find that speech that diverges from voting positions may result in higher vote totals. To explain this behavior, we show that politicians use speech to move closer to their party’s average position. These results not only provide empirical support for political economy models of commitment, but also add to the growing literature of machine-learning-based text analysis in social science contexts.
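The abstract does not name a specific classifier, so the following is only a hedged illustration of the word-level text classification setup it describes, with toy stand-in data rather than the Congressional Record:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for speeches and party labels (not the real data).
speeches = ["cut taxes and reduce regulation",
            "expand healthcare coverage for families",
            "secure the border and support our troops",
            "invest in renewable energy and education"]
parties = ["R", "D", "R", "D"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(speeches, parties)
print(clf.predict(["healthcare and education funding"]))
```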
【Keywords】:
【Paper Link】 【Pages】:10031-10032
【Authors】: Mingyue Shang ; Zhenxin Fu ; Hongzhi Yin ; Bo Tang ; Dongyan Zhao ; Rui Yan
【Abstract】: Natural language understanding is a challenging problem that covers a wide range of tasks. While previous methods generally train each task separately, we consider combining cross-task features to enhance task performance. In this paper, we incorporate logic information, with the help of the Natural Language Inference (NLI) task, into the Story Cloze Test (SCT). Previous work on SCT considered various kinds of semantic information, such as sentiment and topic, but lacked the logic information between sentences, which is an essential element of stories. Thus, we propose to extract the logic information over the course of the story to improve the understanding of the whole story. The logic information is modeled with the help of the NLI task. Experimental results demonstrate the strength of the logic information.
【Keywords】:
【Paper Link】 【Pages】:10033-10034
【Authors】: Abhishek Singh ; Debojyoti Dutta ; Amit Saha
【Abstract】: The majority of advancements in deep learning (DL) have occurred in domains such as computer vision and natural language processing, where abundant training data is available. A major obstacle to leveraging DL techniques for malware analysis is the lack of sufficiently large labeled datasets. In this paper, we take the first steps towards building a model which can synthesize labeled datasets of malware images using a GAN. Such a model can be utilized to perform data augmentation for training a classifier. Furthermore, the model can be shared publicly for the community to reap the benefits of the dataset without sharing the original dataset. First, we show the underlying idiosyncrasies of malware images and why existing data augmentation techniques, as well as traditional GAN training, fail to produce quality artificial samples. Next, we propose a new method for training GANs where we explicitly embed prior domain knowledge about the dataset into the training procedure. We show improvements in training stability and sample quality, assessed on different metrics. Our experiments show substantial improvement over baselines and promise for using such a generative model for malware visualization systems.
【Keywords】:
【Paper Link】 【Pages】:10035-10036
【Authors】: Kacper Sokol ; Peter A. Flach
【Abstract】: Explanations in machine learning come in many forms, but a consensus regarding their desired properties is still emerging. In our work we collect and organise these explainability desiderata and discuss how they can be used to systematically evaluate the properties and quality of an explainable system, using the case of class-contrastive counterfactual statements. This leads us to propose a novel method for explaining the predictions of a decision tree with counterfactuals. We show that our model-specific approach exploits all the theoretical advantages of counterfactual explanations and hence improves decision tree interpretability by decoupling the quality of the interpretation from the depth and width of the tree.
【Keywords】:
【Paper Link】 【Pages】:10037-10038
【Authors】: Helge Spieker
【Abstract】: This paper proposes a framework for solving constraint problems with reinforcement learning (RL) and sequence-to-sequence recurrent neural networks. We approach constraint solving as a declarative machine learning problem, where for a variable-length input sequence a variable-length output sequence has to be predicted. Using randomly generated instances and the number of constraint violations as a reward function, a problem-specific RL agent is trained to solve the problem. The solution candidate predicted by the RL agent is verified and repaired by constraint-based local search (CBLS) to ensure solutions that satisfy the constraint model. We introduce the framework and its components and discuss early results and future applications.
【Keywords】:
【Paper Link】 【Pages】:10039-10040
【Authors】: Shirly Stephen ; Torsten Hahmann
【Abstract】: Satisfiability of first-order logic (FOL) ontologies is typically verified by translation to propositional satisfiability (SAT) problems, which are then tackled by a SAT solver. Unfortunately, SAT solvers often experience scalability issues when reasoning with FOL ontologies and even moderately sized datasets. While SAT solvers have been found to capably handle complex axiomatizations, finding models of datasets becomes considerably more complex and time-intensive as the number of clauses increases exponentially with the number of individuals and the axiomatic complexity. We identify FOL definitions as a specific bottleneck and demonstrate via experiments that the presence of many defined terms of the highest arity significantly slows down model finding. We also show that removing optional definitions and substituting these terms with their definiens leads to a reduction in the number of clauses, which makes SAT-based model finding practical for over 100 individuals in an FOL theory.
【Keywords】:
【Paper Link】 【Pages】:10041-10042
【Authors】: Lixin Su ; Jiafeng Guo ; Yixing Fan ; Yanyan Lan ; Ruqing Zhang ; Xueqi Cheng
【Abstract】: In Conversational Question Answering (CoQA), humans pose a series of questions to satisfy their information needs. Based on our preliminary analysis, there are two major types of questions, namely verification questions and knowledge-seeking questions. The first type verifies existing facts, while the latter seeks new knowledge about some specific object. These two types of questions differ significantly in how they are answered. However, existing methods usually treat them uniformly, which may easily be biased by the dominant type of questions and lead to inferior overall performance. In this work, we propose an adaptive framework that handles these two types of questions in different ways based on their own characteristics. We conduct experiments on the recently released CoQA benchmark dataset, and the results demonstrate that our method outperforms the state-of-the-art baseline methods.
【Keywords】:
【Paper Link】 【Pages】:10043-10044
【Authors】: Jiankai Sun ; Srinivasan Parthasarathy
【Abstract】: In this paper, we propose to solve the directed graph embedding problem via a two-stage approach: in the first stage, the graph is symmetrized in one of several possible ways, and in the second stage, the resulting symmetrized graph is embedded using any state-of-the-art (undirected) graph embedding algorithm. Note that it is not the objective of this paper to propose a new (undirected) graph embedding algorithm or to discuss the strengths and weaknesses of existing ones; our point is that whichever graph embedding algorithm is suitable, it will fit into the proposed symmetrization framework.
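A minimal sketch of the first stage, showing two of the "several possible ways" to symmetrize that the abstract alludes to; the mode names are ours, not the paper's:

```python
import numpy as np

def symmetrize(A, mode="average"):
    """Stage 1: turn a directed adjacency matrix into an undirected one.

    "union"   keeps an edge if it exists in either direction;
    "average" weights each undirected edge by the mean of both directions.
    """
    if mode == "union":
        return np.maximum(A, A.T)
    if mode == "average":
        return (A + A.T) / 2.0
    raise ValueError(mode)

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)  # a small directed path graph
A_sym = symmetrize(A)  # stage 2 embeds A_sym with any undirected method
```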
【Keywords】:
【Paper Link】 【Pages】:10045-10046
【Authors】: Atena M. Tabakhi
【Abstract】: The key assumption in Weighted Constraint Satisfaction Problems (WCSPs) is that all constraints are specified a priori. This assumption does not hold in some applications that involve user preferences. Incomplete WCSPs (IWCSPs) extend WCSPs by allowing some constraints to be partially specified. Unfortunately, existing IWCSP approaches either guarantee optimal solutions or provide no quality guarantees on the solutions found. To bridge these two extremes, we propose a number of parameterized heuristics that allow users to find boundedly suboptimal solutions, where the error bound depends on user-defined parameters. These heuristics thus allow users to trade off solution quality for fewer elicited preferences and faster computation times.
【Keywords】:
【Paper Link】 【Pages】:10047-10048
【Authors】: Eichi Takaya ; Yusuke Takeichi ; Mamiko Ozaki ; Satoshi Kurihara
【Abstract】: The research field of connectomics aims to investigate the structure and connectivity of the neural systems in the brains and sensory organs of living things. Earlier studies have proposed methods to assist experts with the labeling required for three-dimensional reconstruction, an important process for observing tiny neuronal structures in detail. In this paper, we propose a semi-supervised learning method that performs pseudo-labeling, which makes it possible to automatically segment neuronal regions using only a small amount of labeled data. Experimental results show that our method outperforms standard supervised learning when few labeled samples are available, although the accuracy is not yet sufficient.
【Keywords】:
【Paper Link】 【Pages】:10049-10050
【Authors】: Hongyao Tang ; Jianye Hao ; Li Wang ; Tim Baarslag ; Zan Wang
【Abstract】: Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of the dynamics in real-world MASs are currently missing. First, the network topology can dynamically change during the course of interaction. Second, the interaction utilities between each pair of agents may not be identical and may not be known a priori. Both issues increase the difficulty of coordination. In this paper, we consider multiagent social learning in a dynamic environment in which agents can alter their connections and interact with randomly chosen neighbors whose utilities are unknown beforehand. We propose an optimal rewiring strategy to select the most beneficial peers so as to maximize the accumulated payoffs in long-run interactions. We empirically demonstrate the effectiveness of our approach in large-scale MASs.
【Keywords】:
【Paper Link】 【Pages】:10051-10052
【Authors】: Prayag Tiwari ; Massimo Melucci
【Abstract】: Machine Learning (ML) helps us to recognize patterns in raw data. ML is used in numerous domains, e.g., biomedicine, agriculture, and food technology. Despite recent technological advancements, there is still room for substantial improvement in prediction. Current ML models are based on classical theories of probability and statistics, which can now be replaced by Quantum Theory (QT) with the aim of improving the effectiveness of ML. In this paper, we propose the Binary Classifier Inspired by Quantum Theory (BCIQT) model, which outperforms state-of-the-art classifiers in terms of recall for every category.
【Keywords】:
【Paper Link】 【Pages】:10053-10054
【Authors】: Manan Tomar ; Akhil Sathuluri ; Balaraman Ravindran
【Abstract】: Generating a curriculum for guided learning involves subjecting the agent to easier goals first and then gradually increasing their difficulty. This work takes a similar direction and proposes a dual curriculum scheme for solving robotic manipulation tasks with sparse rewards, called MaMiC. It includes a macro curriculum scheme, which divides the task into multiple subtasks, followed by a micro curriculum scheme, which enables the agent to learn between such discovered subtasks. We show how combining macro and micro curriculum strategies helps in overcoming the major exploratory constraints present in robot manipulation tasks without having to engineer any complex rewards, and we illustrate the meaning and usage of the individual curricula. The performance of the scheme is analysed on the Fetch environments.
【Keywords】:
【Paper Link】 【Pages】:10055-10056
【Authors】: Ziye Tong ; Junwei Zhou ; Yanchao Yang ; Lee-Ming Cheng
【Abstract】: In this paper, we propose a two-stage cascaded pose regression method for facial landmark localization under occlusion. In the first stage, a global cascaded pose regression with robust initialization is performed to obtain localization results for the original face image and its mirror image. The localization difference between the original image and the mirror image is used to determine whether the localization of each landmark is reliable, and unreliable localizations with large differences are adjusted. In the second stage, the global results are divided into four parts, which are further refined by local regressions. Finally, the four refined local results are integrated and adjusted to produce the final output.
【Keywords】:
【Paper Link】 【Pages】:10057-10058
【Authors】: Ashwini Tonge ; Cornelia Caragea
【Abstract】: With millions of images shared online, privacy concerns are on the rise. In this paper, we propose an approach to image privacy prediction by dynamically identifying powerful features corresponding to objects, scene context, and image tags derived from Convolutional Neural Networks for each test image. Specifically, our approach identifies the set of most “competent” features on the fly, according to each test image whose privacy has to be predicted. Experimental results on thousands of Flickr images show that our approach predicts the sensitive (or private) content more accurately than the models trained on each individual feature set (object, scene, and tags alone) or their combination.
【Keywords】:
【Paper Link】 【Pages】:10059-10060
【Authors】: Ziyu Wan ; Yan Li ; Min Yang ; Junge Zhang
【Abstract】: In this paper, we propose a Visual Center Adaptation Method (VCAM) to address the domain shift problem in zero-shot learning. For the seen classes in the training data, VCAM builds an embedding space by learning the mapping from the semantic space to visual centers. For unseen classes in the test data, the construction of the embedding space is constrained by a symmetric Chamfer-distance term, aiming to adapt the distribution of the synthetic visual centers to that of the real cluster centers. Therefore, the learned embedding space generalizes well to the unseen classes. Experiments on two widely used datasets demonstrate that our model significantly outperforms state-of-the-art methods.
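A minimal sketch of a symmetric Chamfer-distance term of the kind described, aligning synthetic visual centers with real cluster centers; dimensions and names are illustrative, not from the paper:

```python
import numpy as np

def chamfer(X, Y):
    """Symmetric Chamfer distance between two point sets.

    For each point, take the squared distance to its nearest neighbour
    in the other set; average over each set and sum both directions.
    """
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

X = np.random.randn(10, 5)  # e.g., synthetic visual centers
Y = np.random.randn(12, 5)  # e.g., real cluster centers
loss = chamfer(X, Y)        # the adaptation term to be minimized
```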
【Keywords】:
【Paper Link】 【Pages】:10061-10062
【Authors】: Yueyang Wang ; Ziheng Duan ; Binbing Liao ; Fei Wu ; Yueting Zhuang
【Abstract】: Network embedding, which assigns nodes in networks to low-dimensional representations, has received increasing attention in recent years. However, most existing approaches, especially the spectral-based methods, only consider the attributes in homogeneous networks. They are weak for heterogeneous attributed networks, which involve different node types as well as rich node attributes and are common in real-world scenarios. In this paper, we propose HANE, a novel network embedding method based on Graph Convolutional Networks that leverages both the heterogeneity and the node attributes to generate high-quality embeddings. The experiments on the real-world dataset show the effectiveness of our method.
【Keywords】:
【Paper Link】 【Pages】:10063-10064
【Authors】: Frank J. Wouda ; Matteo Giuberti ; Giovanni Bellusci ; Bert-Jan F. van Beijnum ; Peter H. Veltink
【Abstract】: Previous research has shown that estimating full-body poses from a minimal sensor set using a trained ANN, without explicitly enforcing time coherence, results in output pose sequences that occasionally show undesired jitter. To mitigate this effect, we propose to improve the ANN output by combining it with a state prediction using a Kalman filter. Preliminary results are promising, as the jitter effects are diminished; however, the overall error does not decrease substantially.
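A minimal sketch of this kind of combination, here a 1-D constant-velocity Kalman filter smoothing one jittery pose coordinate; the state model and noise values are our assumptions for illustration, not the paper's:

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=1e-2, dt=1.0):
    """Filter a jittery 1-D pose coordinate with a constant-velocity model.

    q: process noise (trust in the motion model);
    r: measurement noise (trust in the ANN output).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])             # we observe position only
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.array([z[0], 0.0]), np.eye(2)
    out = []
    for zk in z:
        x, P = F @ x, F @ P @ F.T + Q                 # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        x = x + (K @ (zk - H @ x)).ravel()            # correct with ANN output
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

jittery = np.cumsum(np.random.randn(200)) * 0.01 + np.random.randn(200) * 0.1
smoothed = kalman_smooth(jittery)
```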
【Keywords】:
【Paper Link】 【Pages】:10065-10066
【Authors】: Jianjun Wu ; Ying Sha ; Bo Jiang ; Jianlong Tan
【Abstract】: Structural representations of user social influence are critical for a variety of applications such as viral marketing and product recommendation. However, existing studies only focus on capturing and preserving the structure of relations, and ignore the diversity of influence relation patterns among users. To this end, we propose a deep structural influence learning model that learns the social influence structure by mining rich features of each user and fuses information from the aligned self-network component to preserve the global and local structure of the influence relations among users. Experiments on two real-world datasets demonstrate that the proposed model outperforms state-of-the-art algorithms for learning rich representations in a multi-label classification task.
【Keywords】:
【Paper Link】 【Pages】:10067-10068
【Authors】: Jingyun Xu ; Yi Cai
【Abstract】: Some text classification methods do not work well on short texts due to data sparsity; moreover, they do not fully exploit context-relevant knowledge. To tackle these problems, we propose a neural network that incorporates context-relevant knowledge into a convolutional neural network for short text classification. Our model consists of two modules. The first module utilizes two layers to extract concept and context features respectively, and then employs an attention layer to extract the context-relevant concepts. The second module utilizes a convolutional neural network to extract high-level features from the word and context-relevant concept features. Experimental results on three datasets show that our proposed model outperforms state-of-the-art models.
【Keywords】:
【Paper Link】 【Pages】:10069-10070
【Authors】: Hansheng Xue ; Jiajie Peng ; Xuequn Shang
【Abstract】: Multi-network integration methods have achieved prominent performance on many network-based tasks, but these approaches often suffer from information loss. In this paper, we propose a novel multi-network representation learning method based on a semi-supervised autoencoder, termed DeepMNE, which captures the complex topological structure of each network and takes the correlation among multiple networks into account. The experimental results on two real-world datasets indicate that DeepMNE outperforms existing state-of-the-art algorithms.
【Keywords】:
【Paper Link】 【Pages】:10071-10072
【Authors】: Lingkai Yang ; Yi-Nan Guo ; Jian Cheng
【Abstract】: Over-sampling techniques for handling the class imbalance problem generate more minority samples to balance the dataset sizes of the different classes. However, sampling in the original data space is ineffective when the data from different classes overlap or are disjoint. Based on this observation, we generate new minority samples in terms of manifold distance rather than Euclidean distance. Overlapping majority and minority samples tend to lie in fully disjoint subspaces from the perspective of manifold learning. Moreover, this avoids generating samples between minority points that are far apart in the manifold space. Experiments on 23 UCI datasets show that the proposed method achieves better classification accuracy.
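One plausible reading of "manifold distance" is the Isomap-style geodesic distance computed on a k-nearest-neighbour graph; a hedged sketch of over-sampling under that assumption (function names, parameters, and the interpolation rule are ours, not the paper's):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def manifold_distances(X, k=5):
    """Isomap-style manifold distance: shortest paths on a k-NN graph."""
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    return shortest_path(G, directed=False)

def oversample(X_min, D_min, rng):
    """Interpolate a new minority sample between a point and a
    manifold-close neighbour, avoiding partners that are far away
    along the manifold."""
    i = rng.integers(len(X_min))
    j = np.argsort(D_min[i])[1]  # nearest other minority point
    lam = rng.random()
    return X_min[i] + lam * (X_min[j] - X_min[i])

rng = np.random.default_rng(0)
X_min = rng.standard_normal((30, 4))  # toy minority-class samples
D_min = manifold_distances(X_min)
new_sample = oversample(X_min, D_min, rng)
```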
【Keywords】:
【Paper Link】 【Pages】:10073-10074
【Authors】: Yuhang Yao ; Xiao Zeng ; Tianyue Cao ; Luoyi Fu ; Xinbing Wang
【Abstract】: Because little attention has been given to anonymity protection against eavesdropping attacks in the Bitcoin network, this paper proposes a solution for Bitcoin anonymization based on network structure. We first present a general adversarial network model for formalizing deanonymization attacks, and then present a novel propagation method, APRP (Adaptive PageRank Propagation), which adopts PageRank as a propagation delay factor and constantly adjusts the PR-values of nodes to adapt to network dynamics. Experiments on both simulated and real Bitcoin networks confirm the superiority of APRP, with 20-50% performance improvements under various deanonymization attacks.
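A hedged sketch of the core idea of using PR-values as a propagation delay factor; the specific PageRank-to-delay mapping below is our assumption for illustration, and the paper's APRP additionally re-adjusts the values as the network changes:

```python
import networkx as nx

def pagerank_delays(G, base_delay=1.0):
    """Map each node's PR-value to a message propagation delay.

    Assumption (ours): better-connected, higher-PR nodes relay with
    longer delays, blurring the timing signal an eavesdropper would
    use to locate a transaction's source.
    """
    pr = nx.pagerank(G)
    return {v: base_delay * (1.0 + pr[v] * len(G)) for v in G}

G = nx.erdos_renyi_graph(100, 0.05, seed=1)  # stand-in for a P2P topology
delays = pagerank_delays(G)  # per-node delays used when relaying messages
```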
【Keywords】:
【Paper Link】 【Pages】:10075-10076
【Authors】: Salah Zaiem ; Fatiha Sadat
【Abstract】: As far as we are aware, using sequence-to-sequence algorithms for query expansion has not yet been explored in the Information Retrieval literature. We try to fill this gap with a custom query expansion system trained and tested on open datasets. One specificity of our engine compared to classic ones is that it does not need the documents to expand the introduced query. We test our expansions on two different tasks: Information Retrieval and answer preselection. Our method yielded a slight improvement in performance on both tasks. Our main contributions are:
• Starting from open datasets, we built a query expansion training set using sentence-embeddings-based keyword extraction.
• We assess the ability of sequence-to-sequence neural networks to capture expanding relations in the word embeddings’ space.
We afterwards began a quantitative and qualitative analysis of the weights learned by our network. In the second part, we discuss what is learned by a Recurrent Neural Network compared to what we know about human language learning.
【Keywords】:
【Paper Link】 【Pages】:10077-10078
【Authors】: Michal Zajac ; Konrad Zolna ; Negar Rostamzadeh ; Pedro O. Pinheiro
【Abstract】: Neural networks are prone to adversarial attacks. In general, such attacks deteriorate the quality of the input by either slightly modifying most of its pixels or by occluding it with a patch. In this paper, we propose a method that keeps the image unchanged and only adds an adversarial framing on the border of the image. We show empirically that our method is able to successfully attack state-of-the-art methods on both image and video classification problems. Notably, the proposed method results in a universal attack which is very fast at test time. Source code can be found at github.com/zajaczajac/adv_framing.
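A minimal sketch of applying such a framing: only a border of the given width is overwritten, while the interior of the image stays unchanged. The optimisation of the framing pattern itself, which is the attack proper, is omitted here:

```python
import numpy as np

def apply_framing(images, framing, width=4):
    """Overlay a framing of the given width on the image border.

    images:  (N, H, W, C) batch; interior pixels are left untouched.
    framing: (H, W, C) border pattern; the attack itself would optimise
             this pattern to fool the classifier (omitted here).
    """
    framed = images.copy()
    mask = np.zeros(images.shape[1:3], dtype=bool)
    mask[:width, :] = mask[-width:, :] = True  # top and bottom strips
    mask[:, :width] = mask[:, -width:] = True  # left and right strips
    framed[:, mask, :] = framing[mask, :]
    return framed

imgs = np.random.rand(8, 32, 32, 3)
frame = np.random.rand(32, 32, 3)  # placeholder for a learned framing
attacked = apply_framing(imgs, frame)
```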
【Keywords】:
【Paper Link】 【Pages】:10079-10080
【Authors】: Zhiwei Zeng ; Chunyan Miao ; Cyril Leung ; Zhiqi Shen ; Jing Jih Chin
【Abstract】: The process of arguing is also the process of justifying and explaining. Here, we focus on argumentative explanations in Abstract Bipolar Argumentation. We propose new defence and acceptability semantics, which operate on both attack and support relations, and use them to formalize two types of explanations: concise and strong explanations. We also show how to compute these explanations with Bipolar Dispute Trees.
【Keywords】:
【Paper Link】 【Pages】:10081-10082
【Authors】: Zongliang Zhang ; Hongbin Zeng ; Jonathan Li ; Yiping Chen ; Chenhui Yang ; Cheng Wang
【Abstract】: This paper deals with geometric multi-model fitting from noisy, unstructured point set data (e.g., laser-scanned point clouds). We formulate the multi-model fitting problem as a sequential decision-making process. We then use a deep reinforcement learning algorithm to learn the optimal decisions towards the best fitting result. We compare our method against the state of the art on simulated data. The results demonstrate that our approach significantly reduces the number of fitting iterations.
【Keywords】:
【Paper Link】 【Pages】:10083-10084
【Authors】: Weichan Zhong ; Xiaojun Chen ; Guowen Yuan ; Yiqin Li ; Feiping Nie
【Abstract】: In this paper, we propose a novel Adaptive Discriminant Analysis for semi-supervised feature selection, namely SADA. Instead of computing fixed similarities before performing feature selection, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative method. In each iteration, S is computed from the projected distances under the learned W, and W is computed with the learned S. SADA can therefore learn a better projection matrix W by weakening the effect of noisy features with the adaptive similarity matrix. Experimental results on 4 datasets show the superiority of SADA compared to 5 semi-supervised feature selection methods.
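A hedged sketch of the alternating scheme: S from distances in the projected space, then W from the learned S. The concrete updates below are a generic locality-preserving instantiation of this alternation, not SADA's actual objective, which additionally incorporates discriminant analysis and partial supervision:

```python
import numpy as np

def update_S(X, W, sigma=1.0):
    """Similarity from distances in the projected space (Gaussian kernel)."""
    Z = X @ W
    d2 = ((Z[:, None] - Z[None, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    return S / S.sum(axis=1, keepdims=True)

def update_W(X, S, k=2):
    """Projection keeping similar points close: bottom eigenvectors of
    the Laplacian quadratic form X^T L X (a locality-preserving criterion)."""
    A = (S + S.T) / 2                    # symmetrized similarity
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    _, vecs = np.linalg.eigh(X.T @ L @ X)
    return vecs[:, :k]

X = np.random.randn(50, 10)
W = np.linalg.qr(np.random.randn(10, 2))[0]  # random orthonormal init
for _ in range(5):                           # alternate the two updates
    S = update_S(X, W)
    W = update_W(X, S)
```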
【Keywords】:
【Paper Link】 【Pages】:10085-10086
【Authors】: Yuqian Zhou ; Jianbo Jiao ; Haibin Huang ; Jue Wang ; Thomas Huang
【Abstract】: Discriminative learning-based denoising models trained with Additive White Gaussian Noise (AWGN) perform well on synthesized noise. However, realistic noise can be spatially variant, signal-dependent, and a mixture of complicated noise types. In this paper, we explore multiple strategies for applying an AWGN-based denoiser to realistic noise. Specifically, we train a deep network integrating noise estimation and denoising with mixed Gaussian (AWGN) and Random Value Impulse Noise (RVIN). To adapt the model to realistic noise, we investigate multi-channel, multi-scale, and super-resolution approaches. Our preliminary results demonstrate the effectiveness of the newly proposed noise model and adaptation strategies.
【Keywords】:
【Paper Link】 【Pages】:10087-10088
【Authors】: Konrad Zolna ; Krzysztof J. Geras ; Kyunghyun Cho
【Abstract】: Extracting saliency maps, which indicate the parts of an image important for classification, requires many tricks to achieve satisfactory performance when using classifier-dependent methods. Instead, we propose classifier-agnostic saliency map extraction, which allows us to find all parts of the image that any classifier could use, not just one given in advance. In this way, we extract much higher quality saliency maps.
【Keywords】: