30. AAAI 2017: San Francisco, California, USA

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. AAAI Press

Paper Num: 786 || Session Num: 36

Applications 11

1. SnapNETS: Automatic Segmentation of Network Sequences with Node Labels.

【Pages】:3-9

【Authors】: Sorour E. Amiri ; Liangzhe Chen ; B. Aditya Prakash

【Abstract】: Given a sequence of snapshots of flu propagating over a population network, can we find a segmentation when the patterns of the disease spread change, possibly due to interventions? In this paper, we study the problem of segmenting graph sequences with labeled nodes. Memes on the Twitter network, diseases over a contact network, movie-cascades over a social network, etc. are all graph sequences with labeled nodes. Most related work is on plain graphs (and hence ignores the label dynamics), fixes parameters, or requires much feature engineering. Instead, we propose SnapNETS, to automatically find segmentations of such graph sequences, with different characteristics of nodes of each label in adjacent segments. It satisfies all the desired properties (being parameter-free, comprehensive and scalable) by leveraging a principled, multi-level, flexible framework which maps the problem to a path optimization problem over a weighted DAG. Extensive experiments on several diverse real datasets show that it finds cut points matching ground-truth or meaningful external signals, outperforming non-trivial baselines. We also show that SnapNETS scales near-linearly with the size of the input.

【Keywords】: Segmentation; Graph; Sequence
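
The path-optimization idea mentioned in the abstract above can be illustrated with a small sketch. This is not the SnapNETS algorithm: it is a simplified stand-in that assumes each snapshot is summarized by one number (a hypothetical `infected_fraction`), that a candidate segment costs its within-segment squared error plus a hypothetical fixed `penalty`, and that the best segmentation is the cheapest path from node 0 to node n in a DAG whose edge (i, j) stands for the segment covering snapshots i..j-1. The paper's method is parameter-free; the penalty here is only an assumption that keeps the example short.

```python
from typing import Callable, List, Sequence


def segment_by_dag_path(n: int, cost: Callable[[int, int], float]) -> List[int]:
    """Cheapest path 0 -> n in the DAG with an edge (i, j) for every i < j.

    Edge (i, j) represents the segment of snapshots i..j-1.
    Returns the interior cut points of the cheapest segmentation.
    """
    best = [float("inf")] * (n + 1)  # best[j]: cheapest cost to reach node j
    prev = [-1] * (n + 1)            # prev[j]: predecessor of j on that path
    best[0] = 0.0
    for j in range(1, n + 1):        # nodes are already in topological order
        for i in range(j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    cuts = []
    j = n
    while j > 0:                     # walk back from n to 0, collecting cuts
        j = prev[j]
        if j > 0:
            cuts.append(j)
    return cuts[::-1]


def make_variance_cost(values: Sequence[float], penalty: float) -> Callable[[int, int], float]:
    """Hypothetical segment cost: squared error around the segment mean plus a penalty."""
    def cost(i: int, j: int) -> float:
        seg = values[i:j]
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg) + penalty
    return cost


# Toy data: the summarized behaviour changes between snapshot 2 and snapshot 3.
infected_fraction = [0.10, 0.11, 0.10, 0.50, 0.52, 0.51]
cuts = segment_by_dag_path(
    len(infected_fraction),
    make_variance_cost(infected_fraction, penalty=0.05),
)
print(cuts)
```

On this toy input the cheapest path uses two segments, with a single cut before snapshot 3.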

2. Taming the Matthew Effect in Online Markets with Social Influence.

【Pages】:10-16

【Authors】: Franco Berbeglia ; Pascal Van Hentenryck

【Abstract】: Social influence has been shown to create a Matthew effect in online markets, increasing inequalities and leading to “winner-take-all” phenomena. Matthew effects have been observed for numerous market policies, including when the products are presented to consumers by popularity or quality. This paper studies how to reduce Matthew effects, while keeping markets efficient and predictable when social influence is used. It presents a market strategy based on randomization and segmentation that ensures that the best products, if they are close in quality, will have reasonably close market shares. The benefits of this market strategy are justified both theoretically and empirically and the loss in market efficiency is shown to be acceptable.

【Keywords】: Computational Social Science

3. A Leukocyte Detection Technique in Blood Smear Images Using Plant Growth Simulation Algorithm.

【Pages】:17-23

【Authors】: Deblina Bhattacharjee ; Anand Paul

【Abstract】: For quite some time, the analysis of leukocyte images has drawn significant attention from the fields of medicine and computer vision alike where various techniques have been used to automate the manual analysis and classification of such images. Analysing such samples manually for detecting leukocytes is time-consuming and prone to error as the cells have different morphological features. Therefore, in order to automate and optimize the process, the nature-inspired Plant Growth Simulation Algorithm (PGSA) has been applied in this paper. An automated detection technique of white blood cells embedded in obscured, stained and smeared images of blood samples has been presented in this paper which is based on a random bionic algorithm and makes use of a fitness function that measures the similarity of the generated candidate solution to an actual leukocyte. As the proposed algorithm proceeds the set of candidate solutions evolves, guaranteeing their fit with the actual leukocytes outlined in the edge map of the image. The experimental results of the stained images and the empirical results reported validate the higher precision and sensitivity of the proposed method than the existing methods. Further, the proposed method reduces the feasible sets of candidate points in each iteration, thereby decreasing the required run time of load flow, objective function evaluation, thus reaching the goal state in minimum time and within the desired constraints.

【Keywords】: Computer Vision; Medical Image Classification

4. Partitioned Sampling of Public Opinions Based on Their Social Dynamics.

【Pages】:24-30

【Authors】: Weiran Huang ; Liang Li ; Wei Chen

【Abstract】: Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals' opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic algorithm to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing.

【Keywords】: sampling; social networks; opinion evolution dynamics
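
The estimator behind partitioned sampling, as described in the abstract above, can be sketched as follows. This is a minimal illustration that assumes the usual size-weighted stratified estimator and a "simple partition" with one sample per group; the function name `partitioned_estimate` and the toy data are hypothetical, and the paper's partitioning algorithms and opinion-similarity computation are not reproduced.

```python
import random
from typing import List, Sequence


def partitioned_estimate(
    opinions: Sequence[float],
    groups: List[List[int]],
    rng: random.Random,
) -> float:
    """Estimate the mean opinion by drawing one individual from every group.

    Each sampled opinion is weighted by the share of the population that
    its group represents (standard stratified weighting, assumed here).
    """
    n = sum(len(g) for g in groups)
    estimate = 0.0
    for group in groups:
        sampled = rng.choice(group)          # one sample per group
        estimate += len(group) / n * opinions[sampled]
    return estimate


# Toy population: individuals 0-2 agree with each other, as do 3-4.
opinions = [1.0, 1.0, 1.0, 0.0, 0.0]
groups = [[0, 1, 2], [3, 4]]
estimate = partitioned_estimate(opinions, groups, random.Random(0))
print(estimate)
```

When every group is perfectly homogeneous, as in this toy case, two samples recover the population mean exactly whichever individuals are drawn. That is the intuition for grouping individuals with similar opinions.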

5. Novel Geometric Approach for Global Alignment of PPI Networks.

【Pages】:31-37

【Authors】: Yangwei Liu ; Hu Ding ; Danyang Chen ; Jinhui Xu

【Abstract】: In this paper we present a novel geometric method for the problem of global pairwise alignment of protein-protein interaction (PPI) networks. A PPI network can be viewed as a node-edge graph and its alignment often needs to solve some generalized version of the subgraph isomorphism problem which is notoriously challenging and NP-hard. All existing research has focused on designing algorithms with good practical performance. In this paper we propose a two-step algorithm for the global pairwise PPI network alignment which consists of a Geometric Step and an MCMF Step. Our algorithm first applies a graph embedding technique that preserves the topological structure of the original PPI networks and maps the problem from graph domain to geometric domain, and computes a rigid transformation for one of the embedded PPI networks so as to minimize its Earth Mover's Distance (EMD) to the other PPI network. It then solves a Min-Cost Max-Flow problem using the (scaled) inverse of sequence similarity scores as edge weight. By using the flow values from the two steps (i.e., EMD and Min-Cost Max-Flow) as the matching scores, we are able to combine the two matching results to obtain the desired alignment. Unlike other popular alignment algorithms which are either greedy or incremental, our algorithm globally optimizes the problem to yield an alignment with better quality.

【Keywords】:

6. Towards Better Understanding the Clothing Fashion Styles: A Multimodal Deep Learning Approach.

【Pages】:38-44

【Authors】: Yihui Ma ; Jia Jia ; Suping Zhou ; Jingtian Fu ; Yejun Liu ; Zijian Tong

【Abstract】: In this paper, we aim to better understand the clothing fashion styles. There remain two challenges for us: 1) how to quantitatively describe the fashion styles of various clothing, 2) how to model the subtle relationship between visual features and fashion styles, especially considering the clothing collocations. Using the words that people usually use to describe clothing fashion styles on shopping websites, we build a Fashion Semantic Space (FSS) based on Kobayashi's aesthetics theory to describe clothing fashion styles quantitatively and universally. Then we propose a novel fashion-oriented multimodal deep learning based model, Bimodal Correlative Deep Autoencoder (BCDA), to capture the internal correlation in clothing collocations. Employing the benchmark dataset we build with 32133 full-body fashion show images, we use BCDA to map the visual features to the FSS. The experiment results indicate that our model outperforms (+13% in terms of MSE) several alternative baselines, confirming that our model can better understand the clothing fashion styles. To further demonstrate the advantages of our model, we conduct some interesting case studies, including fashion trends analyses of brands, clothing collocation recommendation, etc.

【Keywords】: Fashion style; Multimodal deep learning; Clothing collocation

7. Profit-Driven Team Grouping in Social Networks.

【Pages】:45-51

【Authors】: Shaojie Tang

【Abstract】: In this paper, we investigate the profit-driven team grouping problem in social networks. We consider a setting in which people possess different skills and compatibility among these individuals is captured by a social network. Here, we assume a collection of tasks, where each task requires a specific set of skills, and yields a different profit upon completion. Active and qualified individuals may collaborate with each other in the form of teams to accomplish a set of tasks. Our goal is to find a grouping method that maximizes the total profit of the tasks that these teams can complete. Any feasible grouping must satisfy the following three conditions: (i) each team possesses all skills required by the task, (ii) individuals within the same team are socially compatible, and (iii) each individual is not overloaded. We refer to this as the Team Grouping problem. Our work presents a detailed analysis of the computational complexity of the problem, and proposes an LP-based approximation algorithm to tackle it and its variants. Although we focus on team grouping in this paper, our results apply to a broad range of optimization problems that can be formulated as a cover decomposition problem.

【Keywords】: team grouping; social networks; cover decomposition

8. Gated Neural Networks for Option Pricing: Rationality by Design.

【Pages】:52-58

【Authors】: Yongxin Yang ; Yu Zheng ; Timothy M. Hospedales

【Abstract】: We propose a neural network approach to price EU call options that significantly outperforms some existing pricing models and comes with guarantees that its predictions are economically reasonable. To achieve this, we introduce a class of gated neural networks that automatically learn to divide-and-conquer the problem space for robust and accurate pricing. We then derive instantiations of these networks that are 'rational by design' in terms of naturally encoding a valid call option surface that enforces no-arbitrage principles. This integration of human insight within data-driven learning provides significantly better generalisation in pricing performance due to the encoded inductive bias in the learning, guarantees sanity in the model's predictions, and provides econometrically useful byproducts such as the risk-neutral density.

【Keywords】: Option Pricing

9. Local Discriminant Hyperalignment for Multi-Subject fMRI Data Alignment.

【Pages】:59-65

【Authors】: Muhammad Yousefnezhad ; Daoqiang Zhang

【Abstract】: Multivariate Pattern (MVP) classification can map different cognitive states to the brain tasks. One of the main challenges in MVP analysis is validating the generated results across subjects. However, analyzing multi-subject fMRI data requires accurate functional alignments between neuronal activities of different subjects, which can rapidly increase the performance and robustness of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods, which can be mathematically formulated by the Canonical Correlation Analysis (CCA) methods. Since HA mostly uses the unsupervised CCA techniques, its solution may not be optimized for MVP analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel supervised HA method, which can provide better functional alignment for MVP analysis. Indeed, the locality is defined based on the stimuli categories in the train-set, where the correlation between all stimuli in the same category will be maximized and the correlation between distinct categories of stimuli approaches zero. Experimental studies on multi-subject MVP analysis confirm that the LDHA method achieves superior performance to other state-of-the-art HA algorithms.

【Keywords】: Hyperalignment; multi-subject fMRI; Canonical Correlation Analysis; Local Discriminant Hyperalignment

10. Volumetric ConvNets with Mixed Residual Connections for Automated Prostate Segmentation from 3D MR Images.

【Pages】:66-72

【Authors】: Lequan Yu ; Xin Yang ; Hao Chen ; Jing Qin ; Pheng-Ann Heng

【Abstract】: Automated prostate segmentation from 3D MR images is very challenging due to large variations of prostate shape and indistinct prostate boundaries. We propose a novel volumetric convolutional neural network (ConvNet) with mixed residual connections to cope with this challenging problem. Compared with previous methods, our volumetric ConvNet has two compelling advantages. First, it is implemented in a 3D manner and can fully exploit the 3D spatial contextual information of input data to perform efficient, precise and volume-to-volume prediction. Second and more important, the novel combination of residual connections (i.e., long and short) can greatly improve the training efficiency and discriminative capability of our network by enhancing the information propagation within the ConvNet both locally and globally. While the forward propagation of location information can improve the segmentation accuracy, the smooth backward propagation of gradient flow can accelerate the convergence speed and enhance the discrimination capability. Extensive experiments on the open MICCAI PROMISE12 challenge dataset corroborated the effectiveness of the proposed volumetric ConvNet with mixed residual connections. Our method ranked first in the challenge, outperforming other competitors by a large margin with respect to most evaluation metrics. The proposed volumetric ConvNet is general enough and can be easily extended to other medical image analysis tasks, especially ones with limited training data.

【Keywords】: 3D MRI; prostate segmentation; deep learning; CNNs

11. StructInf: Mining Structural Influence from Social Streams.

【Pages】:73-80

【Authors】: Jing Zhang ; Jie Tang ; Yuanyi Zhong ; Yuchen Mo ; Juanzi Li ; Guojie Song ; Wendy Hall ; Jimeng Sun

【Abstract】: Social influence is a fundamental issue in social network analysis and has attracted tremendous attention with the rapid growth of online social networks. However, existing research mainly focuses on studying peer influence. This paper introduces a novel notion of structural influence and studies how to efficiently discover structural influence patterns from social streams. We present three sampling algorithms with theoretical unbiased guarantee to speed up the discovery process. Experiments on a big microblogging dataset show that the proposed sampling algorithms can achieve a 10 times speedup compared to the exact influence pattern mining algorithm, with an average error rate of only 1.0%. The extracted structural influence patterns have many applications. We apply them to predict retweet behavior, with performance being significantly improved.

【Keywords】: Social Networks

Artificial Intelligence and the Web 26

12. Transitive Hashing Network for Heterogeneous Multimedia Retrieval.

【Pages】:81-87

【Authors】: Zhangjie Cao ; Mingsheng Long ; Jianmin Wang ; Qiang Yang

【Abstract】: Hashing is widely applied to large-scale multimedia retrieval due to the storage and retrieval efficiency. Cross-modal hashing enables efficient retrieval of one modality from database relevant to a query of another modality. Existing work on cross-modal hashing assumes that heterogeneous relationship across modalities is available for learning to hash. This paper relaxes this strict assumption by only requiring heterogeneous relationship in some auxiliary dataset different from the query or database domain. We design a novel hybrid deep architecture, transitive hashing network (THN), to jointly learn cross-modal correlation from the auxiliary dataset, and align the data distributions of the auxiliary dataset with that of the query or database domain, which generates compact transitive hash codes for efficient cross-modal retrieval. Comprehensive empirical evidence validates that the proposed THN approach yields state of the art retrieval performance on standard multimedia benchmarks, i.e. NUS-WIDE and ImageNet-YahooQA.

【Keywords】: Deep Hashing; Transitive Hashing

13. Marrying Uncertainty and Time in Knowledge Graphs.

【Pages】:88-94

【Authors】: Melisachew Wudage Chekol ; Giuseppe Pirrò ; Joerg Schoenfisch ; Heiner Stuckenschmidt

【Abstract】: The management of uncertainty is crucial when harvesting structured content from unstructured and noisy sources. Knowledge Graphs (KGs) are a prominent example. KGs maintain both numerical and non-numerical facts, with the support of an underlying schema. These facts are usually accompanied by a confidence score that witnesses how likely it is for them to hold. Despite their popularity, most existing KGs focus on static data, thus impeding the availability of timewise knowledge. What is missing is a comprehensive solution for the management of uncertain and temporal data in KGs. The goal of this paper is to fill this gap. We rely on two main ingredients. The first is a numerical extension of Markov Logic Networks (MLNs) that provide the necessary underpinning to formalize the syntax and semantics of uncertain temporal KGs. The second is a set of Datalog constraints with inequalities that extend the underlying schema of the KGs and help to detect inconsistencies. From a theoretical point of view, we discuss the complexity of two important classes of queries for uncertain temporal KGs: maximum a-posteriori and conditional probability inference. Due to the hardness of these problems and the fact that MLN solvers do not scale well, we also explore the usage of Probabilistic Soft Logics (PSL) as a practical tool to support our reasoning tasks. We report on an experimental evaluation comparing the MLN and PSL approaches.

【Keywords】: knowledge graphs; temporal; markov logic network

14. TweetFit: Fusing Multiple Social Media and Sensor Data for Wellness Profile Learning.

【Pages】:95-101

【Authors】: Aleksandr Farseev ; Tat-Seng Chua

【Abstract】: Wellness is a widely popular concept that is commonly applied to fitness and self-help products or services. Inference of personal wellness-related attributes, such as body mass index or disease tendency, as well as understanding of global dependencies between wellness attributes and users' behavior is of crucial importance to various applications in personal and public wellness domains. Meanwhile, the emergence of social media platforms and wearable sensors makes it feasible to perform wellness profiling for users from multiple perspectives. However, research efforts on wellness profiling and integration of social media and sensor data are relatively sparse, and this study represents one of the first attempts in this direction. Specifically, to infer personal wellness attributes, we propose a multi-source individual user profile learning framework named "TweetFit". "TweetFit" can handle data incompleteness and perform wellness attributes inference from sensor and social media data simultaneously. Our experimental results show that the integration of the data from sensors and multiple social media sources can substantially boost the wellness profiling performance.

【Keywords】: Multi-Task Learning; Multi-Source Learning; Multi-View Learning; Wellness; User Profiling; Sensors; Social Networks

15. POI2Vec: Geographical Latent Representation for Predicting Future Visitors.

【Pages】:102-108

【Authors】: Shanshan Feng ; Gao Cong ; Bo An ; Yeow Meng Chee

【Abstract】: With the increasing popularity of location-aware social media applications, Point-of-Interest (POI) recommendation has recently been extensively studied. However, most of the existing studies explore from the users' perspective, namely recommending POIs for users. In contrast, we consider a new research problem of predicting users who will visit a given POI in a given future period. The challenge of the problem lies in the difficulty to effectively learn POI sequential transition and user preference, and integrate them for prediction. In this work, we propose a new latent representation model POI2Vec that is able to incorporate the geographical influence, which has been shown to be very important in modeling user mobility behavior. Note that existing representation models fail to incorporate the geographical influence. We further propose a method to jointly model the user preference and POI sequential transition influence for predicting potential visitors for a given POI. We conduct experiments on 2 real-world datasets to demonstrate the superiority of our proposed approach over the state-of-the-art algorithms for both next POI prediction and future user prediction.

【Keywords】: POI2vec; embedding; POI

16. A Dependency-Based Neural Reordering Model for Statistical Machine Translation.

【Pages】:109-115

【Authors】: Christian Hadiwinoto ; Hwee Tou Ng

【Abstract】: In machine translation (MT) that involves translating between two languages with significant differences in word order, determining the correct word order of translated words is a major challenge. The dependency parse tree of a source sentence can help to determine the correct word order of the translated words. In this paper, we present a novel reordering approach utilizing a neural network and dependency-based embeddings to predict whether the translations of two source words linked by a dependency relation should remain in the same order or should be swapped in the translated sentence. Experiments on Chinese-to-English translation show that our approach yields a statistically significant improvement of 0.57 BLEU point on benchmark NIST test sets, compared to our prior state-of-the-art statistical MT system that uses sparse dependency-based reordering features.

【Keywords】: statistical machine translation; reordering; dependency parse tree

17. Joint Identification of Network Communities and Semantics via Integrative Modeling of Network Topologies and Node Contents.

【Pages】:116-124

【Authors】: Dongxiao He ; Zhiyong Feng ; Di Jin ; Xiaobao Wang ; Weixiong Zhang

【Abstract】: The objective of discovering network communities, an essential step in complex systems analysis, is two-fold: identification of functional modules and their semantics at the same time. However, most existing community-finding methods have focused on finding communities using network topologies, and the problem of extracting module semantics has not been well studied and node contents, which often contain semantic information of nodes and networks, have not been fully utilized. We considered the problem of identifying network communities and module semantics at the same time. We introduced a novel generative model with two closely correlated parts, one for communities and the other for semantics. We developed a co-learning strategy to jointly train the two parts of the model by combining a nested EM algorithm and belief propagation. By extracting the latent correlation between the two parts, our new method is not only robust for finding communities and semantics, but also able to provide more than one semantic explanation to a community. We evaluated the new method on artificial benchmarks and analyzed the semantic interpretability by a case study. We compared the new method with eight state-of-the-art methods on ten real-world networks, showing its superior performance over the existing methods.

【Keywords】: complex networks; community detection; generative model; belief propagation

18. Random-Radius Ball Method for Estimating Closeness Centrality.

【Pages】:125-131

【Authors】: Wataru Inariba ; Takuya Akiba ; Yuichi Yoshida

【Abstract】: In the analysis of real-world complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many distance-based centrality measures have issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the random-radius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on real-world networks.

【Keywords】: Closeness centrality; Network analysis; Approximation algorithm
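
For reference, the quantity that the abstract above sets out to estimate can be computed exactly on small graphs. The sketch below is the plain breadth-first-search baseline, not the RRB method: it assumes an undirected, unweighted graph stored as an adjacency dictionary and uses the standard definition of harmonic centrality, the sum of 1/d(v, u) over all other vertices u, with unreachable vertices contributing 0. The function name and toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List


def harmonic_centrality(adj: Dict[Hashable, List[Hashable]], source: Hashable) -> float:
    """Exact harmonic centrality of `source` via one breadth-first search.

    Vertices that BFS never reaches add nothing to the sum, which is why
    the measure stays well defined on disconnected graphs.
    """
    dist = {source: 0}
    queue = deque([source])
    total = 0.0
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                total += 1.0 / dist[w]
                queue.append(w)
    return total


# Toy undirected graph: a path a - b - c plus an isolated vertex d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
scores = {v: harmonic_centrality(graph, v) for v in graph}
print(scores)
```

Running one BFS per vertex costs time proportional to the number of vertices times the number of edges, which is the computational difficulty on very large graphs that motivates estimation methods.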

19. Read the Silence: Well-Timed Recommendation via Admixture Marked Point Processes.

【Pages】:132-139

【Authors】: Hideaki Kim ; Tomoharu Iwata ; Yasuhiro Fujiwara ; Naonori Ueda

【Abstract】: Everything has its time, which is also true in the point-of-interest (POI) recommendation task. A truly intelligent recommender system, even if you don't visit any sites or remain silent, should draw hints of your next destination from the "silence", and revise its recommendations as needed. In this paper, we construct a well-timed POI recommender system that updates its recommendations in accordance with the silence, the temporal period in which no visits are made. To achieve this, we propose a novel probabilistic model to predict the joint probabilities of the user visiting POIs and their time-points, by using the admixture or mixed-membership structure to extend marked point processes. With the admixture structure, the proposed model obtains a low dimensional representation for each user, leading to robust recommendation against sparse observations. We also develop an efficient and easy-to-implement estimation algorithm for the proposed model based on collapsed Gibbs and slice sampling. We apply the proposed model to synthetic and real-world check-in data, and show that it performs well in the well-timed recommendation task.

【Keywords】: marked point process; timely recommendation; admixture model; user modeling

20. Treatment Effect Estimation with Data-Driven Variable Decomposition.

【Pages】:140-146

【Authors】: Kun Kuang ; Peng Cui ; Bo Li ; Meng Jiang ; Shiqiang Yang ; Fei Wang

【Abstract】: One fundamental problem in causal inference is the treatment effect estimation in observational studies when variables are confounded. Control for confounding effects is generally handled by the propensity score. But it treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that the adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in the scenarios of high dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data driven approach, and 2) simultaneously estimate treatment effect in observational studies with high dimensional variables. Under standard assumptions, we show experimentally that the proposed D$^2$VD algorithm can automatically separate the variables precisely, and estimate treatment effect more accurately and with tighter confidence intervals than the state-of-the-art methods on both synthetic data and a real online advertising dataset.

【Keywords】: causal inference; treatment effect; variables decomposition

21. A Declarative Approach to Data-Driven Fact Checking.

【Pages】:147-153

【Authors】: Julien Leblay

【Abstract】: Fact checking is an essential part of any investigative work. For linguistic, psychological and social reasons, it is an inherently human task. Yet, modern media make it increasingly difficult for experts to keep up with the pace at which information is produced. Hence, we believe there is value in tools to assist them in this process. Much of the effort on Web data research has been focused on coping with incompleteness and uncertainty. Comparatively, dealing with context has received less attention, although it is crucial in judging the validity of a claim. For instance, what holds true in a US state, might not in its neighbors, e.g., due to obsolete or superseded laws. In this work, we address the problem of checking the validity of claims in multiple contexts. We define a language to represent and query facts across different dimensions. The approach is non-intrusive and allows relatively easy modeling, while capturing incompleteness and uncertainty. We describe the syntax and semantics of the language. We present algorithms to demonstrate its feasibility, and we illustrate its usefulness through examples.

【Keywords】: Datalog+/-; Markov Logic; Fact Checking

【Pages】:154-160

【Authors】: Zemin Liu ; Vincent W. Zheng ; Zhou Zhao ; Fanwei Zhu ; Kevin Chen-Chuan Chang ; Minghui Wu ; Jing Ying

【Abstract】: Many real-world networks have a rich collection of objects. The semantics of these objects allows us to capture different classes of proximities, thus enabling an important task of semantic proximity search. As the core of semantic proximity search, we have to measure the proximity on a heterogeneous graph, whose nodes are various types of objects. Most of the existing methods rely on engineering features about the graph structure between two nodes to measure their proximity. With recent development on graph embedding, we see a good chance to avoid feature engineering for semantic proximity search. There is very little work on using graph embedding for semantic proximity search. We also observe that graph embedding methods typically focus on embedding nodes, which is an "indirect" approach to learn the proximity. Thus, we introduce a new concept of proximity embedding, which directly embeds the network structure between two possibly distant nodes. We also design our proximity embedding, so as to flexibly support both symmetric and asymmetric proximities. Based on the proximity embedding, we can easily estimate the proximity score between two nodes and enable search on the graph. We evaluate our proximity embedding method on three real-world public data sets, and show it outperforms the state-of-the-art baselines.

【Keywords】: semantic proximity search; heterogeneous graph; proximity embedding

23. Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.

【Pages】:161-167

【Authors】: Yishuang Ning ; Jia Jia ; Zhiyong Wu ; Runnan Li ; Yongsheng An ; Yanfeng Wang ; Helen M. Meng

【Abstract】: Speech interaction systems have been gaining popularity in recent years. The main purpose of these systems is to generate more satisfactory responses according to users' speech utterances, in which the most critical problem is to analyze user intention. Research shows that user intention conveyed through speech is not only expressed by content, but also closely related to users' speaking manners (e.g. with or without acoustic emphasis). How to incorporate these heterogeneous attributes to infer user intention remains an open problem. In this paper, we define Intention Prominence (IP) as the semantic combination of focus by text and emphasis by speech, and propose a multi-task deep learning framework to predict IP. Specifically, we first use long short-term memory (LSTM) which is capable of modeling long short-term contextual dependencies to detect focus and emphasis, and incorporate the tasks for focus and emphasis detection with multi-task learning (MTL) to reinforce the performance of each other. We then employ Bayesian network (BN) to incorporate multimodal features (focus, emphasis, and location reflecting users' dialect conventions) to predict IP based on feature correlations. Experiments on a data set of 135,566 utterances collected from real-world Sogou Voice Assistant illustrate that our method can outperform the comparison methods by 6.9-24.5% in terms of F1-measure. Moreover, a real practice in the Sogou Voice Assistant indicates that our method can improve the performance on user intention understanding by 7%.

【Keywords】: Intention prominence; User intention understanding; Long Short-Term Memory (LSTM); Multi-task

24. Understanding the Semantic Structures of Tables with a Hybrid Deep Neural Network Architecture.

Paper Link】 【Pages】:168-174

【Authors】: Kyosuke Nishida ; Kugatsu Sadamitsu ; Ryuichiro Higashinaka ; Yoshihiro Matsuo

【Abstract】: We propose a new deep neural network architecture, TabNet, for table type classification. Table type is essential information for exploring the power of Web tables, and it is important to understand the semantic structures of tables in order to classify them correctly. A table is a matrix of texts, analogous to an image, which is a matrix of pixels, and each text consists of a sequence of tokens. Our hybrid architecture mirrors the structure of tables: its recurrent neural network (RNN) encodes a sequence of tokens for each cell to create a 3d table volume like image data, and its convolutional neural network (CNN) captures semantic features, e.g., the existence of rows describing properties, to classify tables. Experiments using Web tables with various structures and topics demonstrated that TabNet achieved considerable improvements over state-of-the-art methods specialized for table classification and other deep neural network architectures.

【Keywords】: Web Tables; Classification; Deep Learning; Recurrent Neural Networks; Convolutional Neural Networks

25. Radon - Rapid Discovery of Topological Relations.

Paper Link】 【Pages】:175-181

【Authors】: Mohamed Ahmed Sherif ; Kevin Dreßler ; Panayiotis Smeros ; Axel-Cyrille Ngonga Ngomo

【Abstract】: Geospatial data is at the core of the Semantic Web, of which the largest knowledge base contains more than 30 billion facts. Reasoning on these large amounts of geospatial data requires efficient methods for the computation of links between the resources contained in these knowledge bases. In this paper, we present Radon, an efficient solution for the discovery of topological relations between geospatial resources according to the DE9-IM standard. Our evaluation shows that we outperform the state of the art significantly and by several orders of magnitude.

【Keywords】: Link Discovery; Topological relation; Optimization; Linked Data

26. Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity.

Paper Link】 【Pages】:182-188

【Authors】: Xiaoshuai Sun ; Jiewei Cao ; Chao Li ; Lei Zhu ; Heng Tao Shen

【Abstract】: In this paper, we present an automatic approach for on-line discovery of visual-lingual semantic fragments from weakly labeled Internet images. Instead of learning region-entity correspondences from well-labeled image-sentence pairs, our approach directly collects and enhances the weakly labeled visual contents from the Web and constructs an adaptive visual representation which automatically links generic lingual phrases to their related visual contents. To ensure reliable and efficient semantic discovery, we adopt non-parametric density estimation to re-rank the related visual instances and propose a fast self-similarity-based quality assessment method to identify the high-quality semantic fragments. The discovered semantic fragments provide an adaptive joint representation for texts and images, based on which lingual-visual similarity can be defined for further co-analysis of heterogeneous multimedia data. Experimental results on semantic fragment quality assessment, sentence-based image retrieval, and automatic multimedia insertion and ordering demonstrate the effectiveness of the proposed framework. The experiments show that the proposed methods can make effective use of Web knowledge, and are able to generate competitive results compared to state-of-the-art approaches in various tasks.

【Keywords】: Web Knowledge Mining; Lingual-Visual Analysis; Automatic Multimedia Insertion and Ordering

27. Exploiting both Vertical and Horizontal Dimensions of Feature Hierarchy for Effective Recommendation.

Paper Link】 【Pages】:189-195

【Authors】: Zhu Sun ; Jie Yang ; Jie Zhang ; Alessandro Bozzon

【Abstract】: Feature hierarchy (FH) has proven to be effective to improve recommendation accuracy. Prior work mainly focuses on the influence of vertically affiliated features (i.e. child-parent) on user-item interactions. The relationships of horizontally organized features (i.e. siblings and cousins) in the hierarchy, however, have received little attention. We show in real-world datasets that feature relationships in the horizontal dimension can help explain and further model user-item interactions. To fully exploit FH, we propose a unified recommendation framework that seamlessly incorporates both vertical and horizontal dimensions for effective recommendation. Our model further considers two types of semantically rich feature relationships in the horizontal dimension, i.e. complementary and alternative relationships. Extensive validation on four real-world datasets demonstrates the superiority of our approach against the state of the art. An additional benefit of our model is to provide better interpretations of the generated recommendations.

【Keywords】:

28. Phrase-Based Presentation Slides Generation for Academic Papers.

Paper Link】 【Pages】:196-202

【Authors】: Sida Wang ; Xiaojun Wan ; Shikang Du

【Abstract】: Automatic generation of presentation slides for academic papers is a very challenging task. Previous methods for addressing this task are mainly based on document summarization techniques and they extract document sentences to form presentation slides, which are not well-structured and concise. In this study, we propose a phrase-based approach to generate well-structured and concise presentation slides for academic papers. Our approach first extracts phrases from the given paper, and then learns both the saliency of each phrase and the hierarchical relationship between a pair of phrases. Finally a greedy algorithm is used to select and align the salient phrases in order to form the well-structured presentation slides. Evaluation results on a real dataset verify the efficacy of our proposed approach.

【Keywords】: Presentation Slides Generation; Document Summarization; Text Mining

29. Community Preserving Network Embedding.

Paper Link】 【Pages】:203-209

【Authors】: Xiao Wang ; Peng Cui ; Jing Wang ; Jian Pei ; Wenwu Zhu ; Shiqiang Yang

【Abstract】: Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize an NMF based representation learning model and a modularity based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over state-of-the-art methods.

【Keywords】: Network embedding; community structure; nonnegative matrix factorization

30. CLARE: A Joint Approach to Label Classification and Tag Recommendation.

Paper Link】 【Pages】:210-216

【Authors】: Yilin Wang ; Suhang Wang ; Jiliang Tang ; Guo-Jun Qi ; Huan Liu ; Baoxin Li

【Abstract】: Data classification and tag recommendation are both important and challenging tasks in social media. These two tasks are often considered independently and most efforts have been made to tackle them separately. However, labels in data classification and tags in tag recommendation are inherently related. For example, a Youtube video annotated with NCAA, stadium, pac12 is likely to be labeled as football, while a video/image with the class label of coast is likely to be tagged with beach, sea, water and sand. The existence of relations between labels and tags motivates us to jointly perform classification and tag recommendation for social media data in this paper. In particular, we provide a principled way to capture the relations between labels and tags, and propose a novel framework CLARE, which fuses data CLAssification and tag REcommendation into a coherent model. With experiments on three social media datasets, we demonstrate that the proposed framework CLARE achieves superior performance on both tasks compared to the state-of-the-art methods.

【Keywords】: classification;tag;recommendation

31. Multiple Source Detection without Knowing the Underlying Propagation Model.

Paper Link】 【Pages】:217-223

【Authors】: Zheng Wang ; Chaokun Wang ; Jisheng Pei ; Xiaojun Ye

【Abstract】: Information source detection, which is the reverse problem of information diffusion, has attracted considerable research effort recently. Most existing approaches assume that the underlying propagation model is fixed and given as input, which may limit their application range. In this paper, we study the multiple source detection problem when the underlying propagation model is unknown. Our basic idea is source prominence, namely the nodes surrounded by larger proportions of infected nodes are more likely to be infection sources. As such, we propose a multiple source detection method called Label Propagation based Source Identification (LPSI). Our method lets infection status iteratively propagate in the network as labels, and finally uses local peaks of the label propagation result as source nodes. In addition, both the convergent and iterative versions of LPSI are given. Extensive experiments are conducted on several real-world datasets to demonstrate the effectiveness of the proposed method.

【Keywords】: social network; source detection; information diffusion

32. Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network.

Paper Link】 【Pages】:224-230

【Authors】: Jufeng Yang ; Ming Sun ; Xiaoxiao Sun

【Abstract】: Visual sentiment analysis is attracting more and more attention with the increasing tendency to express emotions through images. While most existing works assign a single dominant emotion to each image, we address the sentiment ambiguity by label distribution learning (LDL), which is motivated by the fact that an image usually evokes multiple emotions. Two new algorithms are developed based on the conditional probability neural network (CPNN). First, we propose BCPNN, which encodes the image label into a binary representation to replace the signless integers used in CPNN, and employ it as a part of the input for the neural network. Then, we train our ACPNN model by adding noise to the ground-truth labels and augmenting affective distributions. Since current datasets are mostly annotated for single-label learning, we build two new datasets, one of which is relabeled on the popular Flickr dataset and the other is collected from Twitter. These datasets contain 20,745 images with multiple affective labels, which are over ten times larger than the existing ones. Experimental results show that the proposed methods outperform the state-of-the-art works on our large-scale datasets and other publicly available benchmarks.

【Keywords】:

33. Visual Sentiment Analysis by Attending on Local Image Regions.

Paper Link】 【Pages】:231-237

【Authors】: Quanzeng You ; Hailin Jin ; Jiebo Luo

【Abstract】: Visual sentiment analysis, which studies the emotional response of humans to visual stimuli such as images and videos, has been an interesting and challenging problem. It tries to understand the high-level content of visual data. The success of current models can be attributed to the development of robust algorithms from computer vision. Most of the existing models try to solve the problem by proposing either robust features or more complex models. In particular, visual features from the whole image or video are the main proposed inputs. Little attention has been paid to local areas, which we believe are highly relevant to the human emotional response to the whole image. In this work, we study the impact of local image regions on visual sentiment analysis. Our proposed model utilizes the recently studied attention mechanism to jointly discover the relevant local regions and build a sentiment classifier on top of these local regions. The experimental results suggest that 1) our model is capable of automatically discovering sentimental local regions of given images and 2) it outperforms existing state-of-the-art algorithms for visual sentiment analysis.

【Keywords】: Sentiment Analysis; Visual Sentiment Analysis; Localization; Visual attention

34. Correlated Cascades: Compete or Cooperate.

Paper Link】 【Pages】:238-244

【Authors】: Ali Zarezade ; Ali Khodadadi ; Mehrdad Farajtabar ; Hamid R. Rabiee ; Hongyuan Zha

【Abstract】: In real-world social networks, there are multiple cascades which are rarely independent. They usually compete or cooperate with each other. Motivated by the reinforcement theory in sociology, we leverage the fact that a user's adoption of any behavior is modeled by the aggregation of the behaviors of its neighbors. We use a multidimensional marked Hawkes process to model users' product adoption and consequently the spread of cascades in social networks. The resulting inference problem is proved to be convex and is solved in parallel by using the barrier method. The advantage of the proposed model is twofold: it models correlated cascades and also learns the latent diffusion network. Experimental results on synthetic and two real datasets gathered from Twitter, URL shortening and music streaming services illustrate the superior performance of the proposed model over the alternatives.

【Keywords】: Social Network; Diffusion Process; Hawkes process; Reinforcement theory

35. Finding Critical Users for Social Network Engagement: The Collapsed k-Core Problem.

Paper Link】 【Pages】:245-251

【Authors】: Fan Zhang ; Ying Zhang ; Lu Qin ; Wenjie Zhang ; Xuemin Lin

【Abstract】: In social networks, the departure of critical users may significantly break network engagement, i.e., lead a large number of other users to drop out. A popular model to measure social network engagement is the k-core, the maximal induced subgraph in which every vertex has at least k neighbors. To identify critical users for social network engagement, we propose the collapsed k-core problem: given a graph G, a positive integer k and a budget b, we aim to find b vertices in G such that the deletion of the b vertices leads to the smallest k-core. We prove the problem is NP-hard. Then, an efficient algorithm is proposed, which significantly reduces the number of candidate vertices to speed up the computation. Our comprehensive experiments on 9 real-life social networks demonstrate the effectiveness and efficiency of our proposed method.

【Keywords】: k-core;graph;social network
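
The k-core and collapsed k-core definitions in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: `k_core` uses the standard peeling procedure, and `collapsed_k_core_bruteforce` is a hypothetical exhaustive search that is only practical on tiny graphs (the abstract states the problem is NP-hard). All function and parameter names are assumptions made for illustration.

```python
from itertools import combinations


def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    # Repeatedly peel off vertices whose remaining degree is below k.
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


def collapsed_k_core_bruteforce(adj, k, b):
    """Try every set of b vertices and keep the first one that yields
    the smallest k-core. Exponential in b; illustration only."""
    best = None
    for removed in combinations(sorted(adj), b):
        gone = set(removed)
        sub = {v: adj[v] - gone for v in adj if v not in gone}
        core = k_core(sub, k)
        if best is None or len(core) < len(best[1]):
            best = (removed, core)
    return best


# A 4-clique {0, 1, 2, 3} with a pendant vertex 4 attached to vertex 0.
example_graph = {
    0: {1, 2, 3, 4},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2},
    4: {0},
}
```

In `example_graph`, the 3-core is the clique, and deleting any single clique vertex leaves the other three with only two neighbors each, so the 3-core collapses entirely.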

36. Efficient Delivery Policy to Minimize User Traffic Consumption in Guaranteed Advertising.

Paper Link】 【Pages】:252-258

【Authors】: Jia Zhang ; Zheng Wang ; Qian Li ; Jialin Zhang ; Yanyan Lan ; Qiang Li ; Xiaoming Sun

【Abstract】: In this work, we study the guaranteed delivery model which is widely used in online advertising. In the guaranteed delivery scenario, ad exposures (which are also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible. Different from previous works which usually minimize the penalty of unsatisfied contracts and some other cost (e.g. representativeness), we propose the novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near optimal method to deliver ads for users. The main advantage of our method is that it consumes nearly as little user traffic as possible to satisfy all contracts, so more contracts can be accepted to produce more revenue. It also enables the publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of traffic in bulk in the exchange market. Furthermore, it is robust with regard to prior knowledge of the user type distribution. Finally, the simulation shows that our method outperforms the traditional state-of-the-art methods.

【Keywords】: online advertising; guaranteed delivery; near optimal delivery policy; minimum user traffic consumption; maximum flow

37. Expectile Matrix Factorization for Skewed Data Analysis.

Paper Link】 【Pages】:259-266

【Authors】: Rui Zhu ; Di Niu ; Linglong Kong ; Zongpeng Li

【Abstract】: Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose expectile matrix factorization by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.

【Keywords】: Matrix Factorization; Expectile Regression; Nonconvex Optimization
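
The asymmetric least squares idea mentioned in the abstract above can be sketched for the scalar case. This is only the loss from expectile regression applied to a single sample, not the paper's matrix factorization algorithm; the function names, the residual convention (observed minus estimate) and the fixed-point iteration are assumptions chosen for illustration.

```python
def asymmetric_squared_loss(residual, tau):
    """Asymmetric squared loss: positive residuals are weighted by tau,
    negative residuals by 1 - tau, with 0 < tau < 1."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual * residual


def sample_expectile(values, tau, iterations=100):
    """Minimize sum(asymmetric_squared_loss(x - m, tau)) over m by
    repeatedly taking a weighted mean with weights fixed at the current m."""
    m = sum(values) / len(values)
    for _ in range(iterations):
        weights = [tau if x > m else 1.0 - tau for x in values]
        new_m = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < 1e-12:
            return new_m
        m = new_m
    return m


skewed_sample = [1.0, 2.0, 3.0, 10.0]
mean_like = sample_expectile(skewed_sample, 0.5)   # tau = 0.5 recovers the mean
upper_tail = sample_expectile(skewed_sample, 0.9)  # pulled toward the large value
```

With tau = 0.5 the loss is ordinary least squares up to a constant factor, so the estimate is the sample mean; larger tau shifts the estimate toward the upper tail of skewed data.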

Cognitive Modeling and Cognitive Systems 2

38. Associative Memory Using Dictionary Learning and Expander Decoding.

Paper Link】 【Pages】:267-273

【Authors】: Arya Mazumdar ; Ankit Singh Rawat

【Abstract】: An associative memory is a framework of content-addressable memory that stores a collection of message vectors (or a dataset) over a neural network while enabling a neurally feasible mechanism to recover any message in the dataset from its noisy version. Designing an associative memory requires addressing two main tasks: 1) learning phase: given a dataset, learn a concise representation of the dataset in the form of a graphical model (or a neural network), 2) recall phase: given a noisy version of a message vector from the dataset, output the correct message vector via a neurally feasible algorithm over the network learnt during the learning phase. This paper studies the problem of designing a class of neural associative memories which learns a network representation for a large dataset that ensures correction against a large number of adversarial errors during the recall phase. Specifically, the associative memories designed in this paper can store a dataset containing exp(n) n-length message vectors over a network with O(n) nodes and can tolerate Ω(n / polylog) adversarial errors. This paper carries out this memory design by mapping the learning phase and recall phase to the tasks of dictionary learning with a square dictionary and iterative error correction in an expander code, respectively.

【Keywords】:

39. An Integrated Model for Effective Saliency Prediction.

Paper Link】 【Pages】:274-281

【Authors】: Xiaoshuai Sun ; Zi Huang ; Hongzhi Yin ; Heng Tao Shen

【Abstract】: In this paper, we propose an integrated model of both semantic-aware and contrast-aware saliency (SCA) combining both bottom-up and top-down cues for effective eye fixation prediction. The proposed SCA model contains two pathways. The first pathway is a deep neural network customized for semantic-aware saliency, which aims to capture the semantic information in images, especially the presence of meaningful objects and object parts. The second pathway is based on on-line feature learning and information maximization, which learns an adaptive representation for the input and discovers the high contrast salient patterns within the image context. The two pathways characterize both long-term and short-term attention cues and are integrated using maxima normalization. Experimental results on artificial images and several benchmark datasets demonstrate the superior performance and better plausibility of the proposed model over both classic approaches and recent deep models.

【Keywords】: Saliency; Semantic; Contrast; Integrated Model

Game Playing and Interactive Entertainment 1

40. The Efficiency of the HyperPlay Technique Over Random Sampling.

Paper Link】 【Pages】:282-290

【Authors】: Michael John Schofield ; Michael Thielscher

【Abstract】: We show that the HyperPlay technique, which maintains a bag of updatable models for sampling an imperfect-information game, is more efficient than taking random samples of play sequences. Also, we demonstrate that random sampling may become impossible under the practical constraints of a game. We show the HyperPlay sample can become biased and not uniformly distributed across an information set and present a remedy for this bias, showing the impact on game results for biased and unbiased samples. We extrapolate the use of the technique beyond General Game Playing and in particular for enhanced security games with in-game percepts to facilitate a flexible defense response.

【Keywords】: General Game Playing; Imperfect Information; Security Games; Information Set Sampling

Game Theory and Economic Paradigms 65

41. Market Pricing for Data Streams.

Paper Link】 【Pages】:291-297

【Authors】: Melika Abolhassani ; Hossein Esfandiari ; MohammadTaghi Hajiaghayi ; Brendan Lucier ; Hadi Yami

【Abstract】: Internet-enabled marketplaces such as Amazon deal with huge datasets registering transactions of merchandise between many buyers and sellers. It is important that algorithms become more time and space efficient as the size of datasets increases. An algorithm that runs in polynomial time may not have a reasonable running time for such large datasets. Here, we study the development of pricing algorithms that are appropriate for use with massive datasets. We especially focus on the streaming setting, the common model for big data analysis. We present an envy-free mechanism for the social welfare maximization problem in the streaming setting using O(k^2 l) space, where k is the number of different goods and l is the number of available items of each good. We also provide an α-approximation mechanism for revenue maximization in this setting given that an α-approximation mechanism for the corresponding offline problem exists. Moreover, we provide mechanisms to approximate the optimum social welfare (or revenue) within a 1 - ε factor, in space independent of l, which would be favorable in case l is large compared to k. Finally, we present hardness results showing that approximation of optimal prices that maximize social welfare (or revenue) in the streaming setting needs Ω(l) space. We achieve our results by developing a powerful sampling technique for bipartite networks. The simplicity of our sampling technique empowers us to maintain the sample over the input sequence. Indeed, one can construct this sample in the distributed setting (a.k.a. MapReduce) and get the same results in two rounds of computations, or one may simply apply this sampling technique to provide faster offline algorithms.

【Keywords】: Auction; Big Data; Streaming Algorithms, Envy-Free Mechanisms

42. Automated Design of Robust Mechanisms.

Paper Link】 【Pages】:298-304

【Authors】: Michael Albert ; Vincent Conitzer ; Peter Stone

【Abstract】: We introduce a new class of mechanisms, robust mechanisms, that is an intermediary between ex-post mechanisms and Bayesian mechanisms. This new class of mechanisms allows the mechanism designer to incorporate imprecise estimates of the distribution over bidder valuations in a way that provides strong guarantees that the mechanism will perform at least as well as ex-post mechanisms, while in many cases performing better. We further extend this class to mechanisms that are with high probability incentive compatible and individually rational, ε-robust mechanisms. Using techniques from automated mechanism design and robust optimization, we provide an algorithm polynomial in the number of bidder types to design robust and ε-robust mechanisms. We show experimentally that this new class of mechanisms can significantly outperform traditional mechanism design techniques when the mechanism designer has an estimate of the distribution and the bidder’s valuation is correlated with an externally verifiable signal.

【Keywords】: Mechanism Design; Correlated distributions; Prior dependent; Auction Theory; Automated Mechanism Design; Game Theory

43. Incentivising Monitoring in Open Normative Systems.

Paper Link】 【Pages】:305-311

【Authors】: Natasha Alechina ; Joseph Y. Halpern ; Ian A. Kash ; Brian Logan

【Abstract】: We present an approach to incentivising monitoring for norm violations in open multi-agent systems such as Wikipedia. In such systems, there is no crisp definition of a norm violation; rather, it is a matter of judgement whether an agent's behaviour conforms to generally accepted standards of behaviour. Agents may legitimately disagree about borderline cases. Using ideas from scrip systems and peer prediction, we show how to design a mechanism that incentivises agents to monitor each other's behaviour for norm violations. The mechanism keeps the probability of undetected violations (submissions that the majority of the community would consider not conforming to standards) low, and is robust against collusion by the monitoring agents.

【Keywords】: mechanism design; scrip; peer prediction; monitoring for norm violations

44. Envy-Free Mechanisms with Minimum Number of Cuts.

Paper Link】 【Pages】:312-318

【Authors】: Reza Alijani ; Majid Farhadi ; Mohammad Ghodsi ; Masoud Seddighin ; Ahmad S. Tajik

【Abstract】: We study the problem of fair division of a heterogeneous resource among strategic players. Given a divisible heterogeneous cake, we wish to divide the cake among n players in a way that meets the following criteria: (I) every player (weakly) prefers his allocated cake to any other player’s share (such notion is known as envy-freeness), (II) the mechanism is strategy-proof (truthful), and (III) the number of cuts made on the cake is minimal. We provide methods, namely expansion process and expansion process with unlocking, for dividing the cake under different assumptions on the valuation functions of the players.

【Keywords】: envy-free; mechanism design; truthful; cake cutting; fair division

45. Strategic Signaling and Free Information Disclosure in Auctions.

Paper Link】 【Pages】:319-327

【Authors】: Shani Alkoby ; David Sarne ; Igal Milchtaich

【Abstract】: With the increasing interest in the role information providers play in multi-agent systems, much effort has been dedicated to analyzing strategic information disclosure and signaling by such agents. This paper analyzes the problem in the context of auctions (specifically for second-price auctions). It provides an equilibrium analysis of the case where the information provider can use signaling according to some pre-committed scheme before introducing its regular (costly) information selling offering. The signal provided publicly discloses (for free) some of the information held by the information provider. Providing the signaling is thus somewhat counterintuitive, as the information provider ultimately attempts to maximize her gain from selling the information she holds. Still, we show that such signaling capability can be highly beneficial for the information provider and even improve social welfare. Furthermore, the examples provided demonstrate various other possible beneficial behaviors available to the different players as well as to a market designer, such as paying the information provider to leave the system or commit to a specific signaling scheme. Finally, the paper provides an extension of the underlying model, related to the use of mixed signaling strategies.

【Keywords】: Auctions; Strategic Information Provider; Free Information Disclosure

46. Complexity of Manipulating Sequential Allocation.

Paper Link】 【Pages】:328-334

【Authors】: Haris Aziz ; Sylvain Bouveret ; Jérôme Lang ; Simon Mackenzie

【Abstract】: Sequential allocation is a simple allocation mechanism in which agents are given pre-specified turns in which they take one item among those that are still available. It has long been known that sequential allocation is not strategyproof. This raises the question of the complexity of computing a preference report that yields a higher utility than the truthful preference. We show that the problem is NP-complete for one manipulating agent with additive utilities and several non-manipulating agents. In doing so, we correct a wrong claim made in a previous paper. We then give two additional results. First, we present a polynomial-time algorithm for optimal manipulation when the manipulator has additive binary utilities. Second, we consider a stronger notion of manipulation whereby the untruthful outcome yields more utility than the truthful outcome for all utilities consistent with the ordinal preferences; for this notion, we show that a manipulation, if any, can be computed in polynomial time.

【Keywords】: social choice; resource allocation; sequential allocation; manipulation; game theory
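
The sequential allocation mechanism described in the abstract above, and the fact that it is not strategyproof, can be shown with a toy sketch. The function names, the three-item instance and the utility numbers are all hypothetical and chosen for illustration; the paper's complexity results are not reproduced here.

```python
def sequential_allocation(items, picking_sequence, reported_rankings):
    """Each agent, on its turn, takes its highest-ranked item that is
    still available. Assumes the sequence is no longer than the item list."""
    available = set(items)
    bundles = {agent: [] for agent in reported_rankings}
    for agent in picking_sequence:
        choice = next(item for item in reported_rankings[agent]
                      if item in available)
        available.remove(choice)
        bundles[agent].append(choice)
    return bundles


def additive_utility(bundle, utilities):
    """Additive utility: the sum of the per-item utilities."""
    return sum(utilities[item] for item in bundle)


items = ["a", "b", "c"]
sequence = [0, 1, 0]                      # agent 0 picks first and last
true_utilities_0 = {"a": 3, "b": 2, "c": 1}
ranking_1 = ["b", "c", "a"]               # agent 1 reports truthfully

truthful = sequential_allocation(
    items, sequence, {0: ["a", "b", "c"], 1: ranking_1})
manipulated = sequential_allocation(
    items, sequence, {0: ["b", "a", "c"], 1: ranking_1})

truthful_value = additive_utility(truthful[0], true_utilities_0)
manipulated_value = additive_utility(manipulated[0], true_utilities_0)
```

In this instance agent 0 gains by ranking "b" first: agent 1 does not compete for "a", so agent 0 can safely take "b" early and still collect "a" on its last turn.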

47. Algorithms for Max-Min Share Fair Allocation of Indivisible Chores.

Paper Link】 【Pages】:335-341

【Authors】: Haris Aziz ; Gerhard Rauchecker ; Guido Schryen ; Toby Walsh

【Abstract】: We consider Max-min Share (MmS) fair allocations of indivisible chores (items with negative utilities). We show that allocation of chores and classical allocation of goods (items with positive utilities) have some fundamental connections but also differences which prevent a straightforward application of algorithms for goods in the chores setting and vice-versa. We prove that an MmS allocation does not need to exist for chores and computing an MmS allocation - if it exists - is strongly NP-hard. In view of these non-existence and complexity results, we present a polynomial-time 2-approximation algorithm for MmS fairness for chores. We then introduce a new fairness concept called optimal MmS that represents the best possible allocation in terms of MmS that is guaranteed to exist. We use connections to parallel machine scheduling to give (1) a polynomial-time approximation scheme for computing an optimal MmS allocation when the number of agents is fixed and (2) an effective and efficient heuristic with an ex-post worst-case analysis.

【Keywords】: multi-agent systems; resource allocation; fair division

48. Nash Stability in Social Distance Games.

Paper Link】 【Pages】:342-348

【Authors】: Alkida Balliu ; Michele Flammini ; Giovanna Melideo ; Dennis Olivetti

【Abstract】: We consider Social Distance Games (SDGs), that is, cluster formation games in which agent utilities are proportional to their harmonic centralities in the respective coalitions, i.e., to the average inverse distance from the other agents. We adopt Nash stable outcomes, that is, states in which no agent can improve her utility by unilaterally changing her coalition, as the target solution concept. Although SDGs always admit a Nash equilibrium, we prove that it is NP-hard to find a social welfare maximizing one and obtain a negative result concerning the game convergence. We then focus on the performance of Nash equilibria and provide matching upper and lower bounds on the price of anarchy of Θ(n), where n is the number of nodes of the underlying graph, and a lower bound on the price of stability of 6/5 - ε. Finally, we characterize the price of stability of SDGs for graphs with girth 4 and girth at least 5.

【Keywords】: Algorithmic Game Theory, Coalition Formation, Social Distance Games, Nash Stability
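The utility described in the abstract above (average inverse distance to the other coalition members) can be sketched as follows. Two points are assumptions rather than statements from the abstract: distances are measured inside the subgraph induced by the coalition, and the sum is normalized by the coalition size. The name `sdg_utility` and the adjacency-dict input format are hypothetical.

```python
from collections import deque


def sdg_utility(adj, coalition, agent):
    """Harmonic-centrality utility of `agent` inside `coalition`.

    adj maps each node to an iterable of neighbours (undirected graph).
    Distances are computed by BFS restricted to coalition members;
    unreachable members contribute 0 (inverse of an infinite distance).
    """
    members = set(coalition)
    if agent not in members:
        raise ValueError("agent must belong to the coalition")
    dist = {agent: 0}
    queue = deque([agent])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in members and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(1.0 / d for node, d in dist.items() if node != agent)
    # Assumed normalization: divide by the coalition size.
    return total / len(members)
```

On the path 0-1-2 with all three agents in one coalition, the middle agent gets (1 + 1)/3 and each endpoint gets (1 + 1/2)/3, so central agents are better off.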

49. On Pareto Optimality in Social Distance Games.

Paper Link】 【Pages】:349-355

【Authors】: Alkida Balliu ; Michele Flammini ; Dennis Olivetti

【Abstract】: We investigate Pareto stability in Social Distance Games, that is, coalition formation games in which agents' utilities are proportional to their harmonic centralities in the respective coalitions, i.e., to the average inverse distance from the other agents. Pareto optimal solutions have already been considered in the literature as outcomes arising from the strategic interaction of the agents. In particular, they are stable under the deviation of the grand coalition, as they do not permit a simultaneous deviation by all the agents making all of them weakly better off and some strictly better off. We first show that, while computing a Pareto stable solution maximizing the social welfare is NP-hard in bounded degree graphs, a 2 min{Δ, √n}-approximating one can be determined in polynomial time, where n is the number of agents and Δ the maximum node degree. We then determine asymptotically tight bounds on the Price of Pareto Optimality for several classes of social graphs arising from the following combinations: unbounded and bounded node degree, undirected and directed edges, unweighted and weighted edges.

【Keywords】: Algorithmic Game Theory, Coalition Formation, Social Distance Games, Pareto Stability

50. Team-Maxmin Equilibrium: Efficiency Bounds and Algorithms.

Paper Link】 【Pages】:356-362

【Authors】: Nicola Basilico ; Andrea Celli ; Giuseppe De Nittis ; Nicola Gatti

【Abstract】: The Team-maxmin equilibrium prescribes the optimal strategies for a team of rational players sharing the same goal and without the capability of correlating their strategies in strategic games against an adversary. This solution concept can capture situations in which an agent controls multiple resources - corresponding to the team members - that cannot communicate. It is known that such an equilibrium always exists and is unique (except in degenerate cases), and these properties make it a credible solution concept to be used in real-world applications, especially in security scenarios. Nevertheless, to the best of our knowledge, the Team-maxmin equilibrium is almost completely unexplored in the literature. In this paper, we investigate bounds of (in)efficiency of the Team-maxmin equilibrium w.r.t. the Nash equilibria and w.r.t. the Maxmin equilibrium when the team members can play correlated strategies. Furthermore, we study a number of algorithms to find and/or approximate an equilibrium, discussing their theoretical guarantees and evaluating their performance by using a standard testbed of game instances.

【Keywords】: Game Theory; Equilibrium

51. A Study of Compact Reserve Pricing Languages.

Paper Link】 【Pages】:363-368

【Authors】: MohammadHossein Bateni ; Hossein Esfandiari ; Vahab S. Mirrokni ; Saeed Seddighin

【Abstract】: Online advertising allows advertisers to implement fine-tuned targeting of users. While such precise targeting leads to more effective advertising, it introduces challenging multidimensional pricing and bidding problems for publishers and advertisers. In this context, advertisers and publishers need to deal with an exponential number of possibilities. As a result, designing efficient and compact multidimensional bidding and pricing systems and algorithms is practically important for online advertisement. Compact bidding languages have already been studied in the context of multiplicative bidding. In this paper, we study the compact pricing problem.

【Keywords】: Reserve Price Compressed

52. Faster and Simpler Algorithm for Optimal Strategies of Blotto Game.

Paper Link】 【Pages】:369-375

【Authors】: Soheil Behnezhad ; Sina Dehghani ; Mahsa Derakhshan ; MohammadTaghi Hajiaghayi ; Saeed Seddighin

【Abstract】: In the Colonel Blotto game, which was initially introduced by Borel in 1921, two colonels simultaneously distribute their troops across different battlefields. The winner of each battlefield is determined independently by a winner-take-all rule. The ultimate payoff of each colonel is the number of battlefields he wins. This game is commonly used for analyzing a wide range of applications such as the U.S. presidential election, innovative technology competitions, advertisements, etc. There have been persistent efforts to find the optimal strategies for the Colonel Blotto game. After almost a century, Ahmadinejad, Dehghani, Hajiaghayi, Lucier, Mahini, and Seddighin provided a poly-time algorithm for finding the optimal strategies. They first model the problem by a Linear Program (LP) with an exponential number of constraints and use the Ellipsoid method to solve it. However, despite the theoretical importance of their algorithm, it is highly impractical. In general, even the Simplex method (despite its exponential running-time) performs better than the Ellipsoid method in practice. In this paper, we provide the first polynomial-size LP formulation of the optimal strategies for the Colonel Blotto game. We use linear extension techniques. Roughly speaking, we project the strategy space polytope to a higher dimensional space, which results in a lower number of facets for the polytope. We use this polynomial-size LP to provide a novel, simpler and significantly faster algorithm for finding the optimal strategies for the Colonel Blotto game. We further show this representation is asymptotically tight in terms of the number of constraints. We also extend our approach to multi-dimensional Colonel Blotto games, and implement our algorithm to observe interesting properties of Colonel Blotto; for example, we observe the behavior of players in the discrete model is very similar to the previously studied continuous model.

【Keywords】: Blotto Nash polytime

53. Preference Elicitation For Participatory Budgeting.

Paper Link】 【Pages】:376-382

【Authors】: Gerdus Benade ; Swaprava Nath ; Ariel D. Procaccia ; Nisarg Shah

【Abstract】: Participatory budgeting enables the allocation of public funds by collecting and aggregating individual preferences; it has already had a sizable real-world impact. But making the most of this new paradigm requires a rethinking of some of the basics of computational social choice, including the very way in which individuals express their preferences. We analytically compare four preference elicitation methods -- knapsack votes, rankings by value or value for money, and threshold approval votes -- through the lens of implicit utilitarian voting, and find that threshold approval votes are qualitatively superior. This conclusion is supported by experiments using data from real participatory budgeting elections.

【Keywords】: Participatory budgeting; Implicit utilitarian voting; Distortion; Regret

54. Exclusion Method for Finding Nash Equilibrium in Multiplayer Games.

Paper Link】 【Pages】:383-389

【Authors】: Kimmo Berg ; Tuomas Sandholm

【Abstract】: We present a complete algorithm for finding an epsilon-Nash equilibrium, for arbitrarily small epsilon, in games with more than two players. The method improves the best-known upper bound with respect to the number of players n, and it is the first implemented algorithm, to our knowledge, that manages to solve all instances. The main components of our tree-search-based method are a node-selection strategy, an exclusion oracle, and a subdivision scheme. The node-selection strategy determines the next region (of the strategy profile probability vector space) to be explored, based on the region's size and an estimate of whether the region contains an equilibrium. The exclusion oracle provides a provably correct sufficient condition for there not to exist an equilibrium in the region. The subdivision scheme determines how the region is split if it cannot be excluded. Unlike the well-known incomplete methods, our method does not need to proceed locally, which avoids it getting stuck in a local minimum (in the space of players' regrets) that may be far from any actual equilibrium. The run time grows rapidly with the game size; this reflects the dimensionality of this difficult problem. That suggests a hybrid scheme where one of the relatively fast prior incomplete algorithms is run, and if it fails to find an equilibrium, then our method is used.

【Keywords】: Nash equilibrium; equilibrium finding; multi-player games; exclusion oracle; noncooperative games; computation; game theory

55. Teams in Online Scheduling Polls: Game-Theoretic Aspects.

Paper Link】 【Pages】:390-396

【Authors】: Robert Bredereck ; Jiehua Chen ; Rolf Niedermeier ; Svetlana Obraztsova ; Nimrod Talmon

【Abstract】: Consider an important meeting to be held in a team-based organization. Taking availability constraints into account, an online scheduling poll is being used in order to decide upon the exact time of the meeting. Decisions are to be taken during the meeting, therefore each team would like to maximize its relative attendance (i.e. the proportional number of its team members attending the meeting). We introduce a corresponding game, where each team can declare a lower total availability in the scheduling poll in order to improve its relative attendance—the pay-off. We are especially interested in situations where teams can form coalitions. We provide an efficient algorithm that, given a coalition, finds an optimal way for each team in a coalition to improve its pay-off. In contrast, we show that deciding whether such a coalition exists is NP-hard. We also study the existence of Nash equilibria: Finding Nash equilibria for various small sizes of teams and coalitions can be done in polynomial time while it is coNP-hard if the coalition size is unbounded.

【Keywords】: Algorithmic Game Theory; Computational Complexity Analysis; Algorithms

56. Probably Approximately Efficient Combinatorial Auctions via Machine Learning.

Paper Link】 【Pages】:397-405

【Authors】: Gianluca Brero ; Benjamin Lubin ; Sven Seuken

【Abstract】: A well-known problem in combinatorial auctions (CAs) is that the value space grows exponentially in the number of goods, which often puts a large burden on the bidders and on the auctioneer. In this paper, we introduce a new design paradigm for CAs based on machine learning (ML). Bidders report their values (bids) to a proxy agent by answering a small number of value queries. The proxy agent then uses an ML algorithm to generalize from those bids to the whole value space, and the efficient allocation is computed based on the generalized valuations. We introduce the concept of "probably approximate efficiency (PAE)" to measure the efficiency of the new ML-based auctions, and we formally show how the generalizability of an ML algorithm relates to the efficiency loss incurred by the corresponding ML-based auction. To instantiate our paradigm, we use support vector regression (SVR) as our ML algorithm, which enables us to keep the winner determination problem of the CA tractable. Different parameters of the SVR algorithm allow us to trade off the expressiveness, economic efficiency, and computational efficiency of the CA. Finally, we demonstrate experimentally that, even with a small number of bids, our ML-based auctions are highly efficient with high probability.

【Keywords】: combinatorial auctions; machine learning; mechanism design

57. Phragmén's Voting Methods and Justified Representation.

Paper Link】 【Pages】:406-413

【Authors】: Markus Brill ; Rupert Freeman ; Svante Janson ; Martin Lackner

【Abstract】: In the late 19th century, Lars Edvard Phragmén proposed a load-balancing approach for selecting committees based on approval ballots. We consider three committee voting rules resulting from this approach: two optimization variants (one minimizing the maximal load and one minimizing the variance of loads) and a sequential variant. We study Phragmén's methods from an axiomatic point of view, focussing on justified representation and related properties that have recently been introduced by Aziz et al. (2015a) and Sánchez-Fernández et al. (2017). We show that the sequential variant satisfies proportional justified representation, making it the first known polynomial-time computable method with this property. Moreover, we show that the optimization variants satisfy perfect representation. We also analyze the computational complexity of Phragmén's methods and provide mixed-integer programming based algorithms for computing them.

【Keywords】: committee voting; load balancing; approval ballots; representation
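The sequential variant mentioned in the abstract above is usually described as follows: each elected candidate creates one unit of load, spread over the voters who approve it so that the resulting maximum load is as small as possible. The sketch below follows that common description. The function name `seq_phragmen`, the alphabetical tie-breaking, and the input format (a list of approval sets) are assumptions for illustration only.

```python
from fractions import Fraction


def seq_phragmen(ballots, candidates, k):
    """Sequential Phragmen rule on approval ballots (illustrative sketch).

    ballots: list of sets of approved candidates, one per voter.
    Returns the committee in order of election; ties are broken
    alphabetically (an assumption made for this example).
    """
    loads = [Fraction(0)] * len(ballots)
    committee = []
    for _ in range(k):
        best_c, best_s = None, None
        for c in sorted(candidates):
            if c in committee:
                continue
            supporters = [i for i, b in enumerate(ballots) if c in b]
            if not supporters:
                continue
            # Load each supporter would carry if c were elected now.
            s = (1 + sum(loads[i] for i in supporters)) / len(supporters)
            if best_s is None or s < best_s:
                best_c, best_s = c, s
        if best_c is None:
            break  # no remaining candidate has any supporter
        committee.append(best_c)
        for i, b in enumerate(ballots):
            if best_c in b:
                loads[i] = best_s
    return committee
```

With three voters approving {a, b} and two approving {c}, a committee of size 2 is [a, c]: after a is elected its supporters already carry load 1/3, so c (new load 1/2) is cheaper than b (new load 2/3).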

58. Multiwinner Approval Rules as Apportionment Methods.

Paper Link】 【Pages】:414-420

【Authors】: Markus Brill ; Jean-François Laslier ; Piotr Skowron

【Abstract】: We establish a link between multiwinner elections and apportionment problems by showing how approval-based multiwinner election rules can be interpreted as methods of apportionment. We consider several multi-winner rules and observe that some, but not all, of them induce apportionment methods that are well established in the literature and in the actual practice of proportional representation. For instance, we show that Proportional Approval Voting induces the D'Hondt method and that Monroe's rule induces the largest remainder method. We also consider properties of apportionment methods and exhibit multiwinner rules that induce apportionment methods satisfying these properties.

【Keywords】:
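As a reminder of the apportionment side of the abstract above, here is a minimal sketch of the D'Hondt method: seats are handed out one at a time to the party with the largest quotient votes / (seats already won + 1). The function name `dhondt` and the alphabetical tie-breaking are assumptions. The sketch covers only the apportionment method and does not implement Proportional Approval Voting.

```python
from fractions import Fraction


def dhondt(votes, seats):
    """Allocate `seats` among parties with the D'Hondt divisor method.

    votes: dict mapping party name to its vote count.
    Ties between equal quotients go to the alphabetically first party.
    """
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # max() keeps the first maximal element, so sorting fixes ties.
        winner = max(
            sorted(votes),
            key=lambda p: Fraction(votes[p], allocation[p] + 1),
        )
        allocation[winner] += 1
    return allocation
```

For votes A = 100, B = 80, C = 30, D = 20 and 8 seats, the eight largest quotients are 100, 80, 50, 40, 33.3, 30, 26.7 and 25, giving A 4 seats, B 3, C 1 and D 0.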

59. Dynamic Thresholding and Pruning for Regret Minimization.

Paper Link】 【Pages】:421-429

【Authors】: Noam Brown ; Christian Kroer ; Tuomas Sandholm

【Abstract】: Regret minimization is widely used in determining strategies for imperfect-information games and in online learning. In large games, computing the regrets associated with a single iteration can be slow. For this reason, pruning — in which parts of the decision tree are not traversed in every iteration — has emerged as an essential method for speeding up iterations in large games. The ability to prune is a primary reason why the Counterfactual Regret Minimization (CFR) algorithm using regret matching has emerged as the most popular iterative algorithm for imperfect-information games, despite its relatively poor convergence bound. In this paper, we introduce dynamic thresholding, in which a threshold is set at every iteration such that any action in the decision tree with probability below the threshold is set to zero probability. This enables pruning for the first time in a wide range of algorithms. We prove that dynamic thresholding can be applied to Hedge while increasing its convergence bound by only a constant factor in terms of number of iterations. Experiments demonstrate a substantial improvement in performance for Hedge as well as the excessive gap technique.

【Keywords】: extensive-form game; equilibrium computation; regret minimization; convex optimization
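The thresholding step described in the abstract above can be sketched for a single decision point. The abstract does not say how the threshold is chosen at each iteration, so `threshold` is a hypothetical caller-supplied parameter here. Renormalizing the surviving probabilities is also an assumption of this sketch.

```python
import math


def hedge_strategy(cumulative_rewards, eta):
    """Hedge (multiplicative weights): probabilities proportional to
    exp(eta * cumulative reward), computed in a numerically stable way."""
    m = max(cumulative_rewards)
    weights = [math.exp(eta * (r - m)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def apply_threshold(strategy, threshold):
    """Zero out actions whose probability is below `threshold` and
    renormalize the rest (renormalization is an assumption)."""
    kept = [p if p >= threshold else 0.0 for p in strategy]
    total = sum(kept)
    if total == 0.0:
        # Nothing survived; fall back to the unthresholded strategy.
        return list(strategy)
    return [p / total for p in kept]
```

Actions that end up with probability exactly zero are the ones whose subtrees a traversal could skip in that iteration, which is what makes pruning possible for an algorithm such as Hedge that otherwise always plays every action with positive probability.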

60. Optimizing Positional Scoring Rules for Rank Aggregation.

Paper Link】 【Pages】:430-436

【Authors】: Ioannis Caragiannis ; Xenophon Chatzigeorgiou ; George A. Krimpas ; Alexandros A. Voudouris

【Abstract】: Nowadays, several crowdsourcing projects exploit social choice methods for computing an aggregate ranking of alternatives given individual rankings provided by workers. Motivated by such systems, we consider a setting where each worker is asked to rank a fixed (small) number of alternatives and, then, a positional scoring rule is used to compute the aggregate ranking. Among the apparently infinite such rules, what is the best one to use? To answer this question, we assume that we have partial access to an underlying true ranking. Then, the important optimization problem to be solved is to compute the positional scoring rule whose outcome, when applied to the profile of individual rankings, is as close as possible to the part of the underlying true ranking we know. We study this fundamental problem from a theoretical point of view and present positive and negative complexity results. Furthermore, we complement our theoretical findings with experiments on real-world and synthetic data.

【Keywords】: social choice; rank aggregation; positional scoring rules; crowdsourcing; approximation algorithms; computational complexity
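To make the object being optimized concrete, the sketch below applies a positional scoring rule to a profile of rankings: an alternative in position i of a ranking earns `score_vector[i]` points, and alternatives are ordered by total points. The function names and the alphabetical tie-breaking are assumptions. The sketch does not address the paper's optimization problem of choosing the score vector.

```python
def positional_scores(rankings, score_vector):
    """Total score of each alternative under a positional scoring rule."""
    totals = {}
    for ranking in rankings:
        for position, alternative in enumerate(ranking):
            totals[alternative] = totals.get(alternative, 0) + score_vector[position]
    return totals


def aggregate_ranking(rankings, score_vector):
    """Alternatives sorted by decreasing total score (ties alphabetical)."""
    totals = positional_scores(rankings, score_vector)
    return sorted(totals, key=lambda a: (-totals[a], a))
```

Different score vectors can give different aggregate rankings on the same profile. With three rankings a > b > c and two rankings b > c > a, the vector (1, 0, 0) ranks a first, while (2, 1, 0) ranks b first.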

61. On Markov Games Played by Bayesian and Boundedly-Rational Players.

Paper Link】 【Pages】:437-443

【Authors】: Muthukumaran Chandrasekaran ; Yingke Chen ; Prashant Doshi

【Abstract】: We present a new game-theoretic framework in which Bayesian players with bounded rationality engage in a Markov game and each has private but incomplete information regarding other players' types. Instead of utilizing Harsanyi's abstract types and a common prior, we construct intentional player types whose structure is explicit and induces a finite-level belief hierarchy. We characterize an equilibrium in this game and establish the conditions for existence of the equilibrium. The computation of finding such equilibria is formalized as a constraint satisfaction problem and its effectiveness is demonstrated on two cooperative domains.

【Keywords】: Markov games; Bayesian players; Bounded rationality; Equilibria

62. Bounded Rationality of Restricted Turing Machines.

Paper Link】 【Pages】:444-450

【Authors】: Lijie Chen ; Pingzhong Tang ; Ruosong Wang

【Abstract】: Bounded rationality aims to understand how limited rationality affects decision-making. The traditional models in game theory and multiagent system research, such as finite automata or unrestricted Turing machines, fall short of capturing how intelligent agents make decisions in realistic applications. To address this problem, we model bounded rational agents as restricted Turing machines: restrictions on running time and on storage space. We study our model under the context of two-person repeated games. In the case where the running time of Turing machines is restricted, we show that computing the best response of a given strategy is much harder than the strategy itself. In the case where the storage space of the Turing machines is restricted, we show the best response of a space restricted strategy cannot be implemented by machines within the same size (up to a constant factor). Finally, we study how these restrictions affect the set of Nash equilibria in infinitely repeated games. We show restricting the agent’s computational resources will give rise to new Nash equilibria.

【Keywords】: bounded rationality; repeated games; restricted Turing machine

63. Winner Determination in Huge Elections with MapReduce.

Paper Link】 【Pages】:451-458

【Authors】: Theresa Csar ; Martin Lackner ; Reinhard Pichler ; Emanuel Sallinger

【Abstract】: In computational social choice, we are concerned with the development of methods for joint decision making. A central problem in this field is the winner determination problem, which aims at identifying the most preferred alternative(s). With the rise of modern e-business platforms, processing of huge amounts of preference data has become an issue. In this work, we apply the MapReduce framework - which has been specifically designed for dealing with big data - to various versions of the winner determination problem. We obtain efficient and highly parallel algorithms and provide a theoretical analysis and experimental evaluation.

【Keywords】: Winner Determination; Social Choice; Computational Social Choice; MapReduce; Parallel Computation; Voting; Cloud Computing
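The map/reduce decomposition behind the abstract above can be illustrated in plain Python, standing in for an actual MapReduce framework. Each vote is mapped independently to pairwise-preference counts, the counts are merged by an associative reduce step, and a winner is read off the merged table. Using the Condorcet winner as the winner concept is an assumption made for this example, and all function names are hypothetical. This is not one of the paper's algorithms.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_vote(vote):
    """Map step: one ranking -> counts of (preferred, other) pairs."""
    return Counter(
        (vote[i], vote[j]) for i, j in combinations(range(len(vote)), 2)
    )


def reduce_counts(left, right):
    """Reduce step: merge two partial pairwise-count tables."""
    return left + right


def condorcet_winner(votes):
    """Alternative that beats every other one in a pairwise majority
    contest, or None if no such alternative exists."""
    pairwise = reduce(reduce_counts, map(map_vote, votes), Counter())
    alternatives = {a for vote in votes for a in vote}
    for a in sorted(alternatives):
        if all(pairwise[(a, b)] > pairwise[(b, a)]
               for b in alternatives if b != a):
            return a
    return None
```

Because the reduce step is associative and commutative, the votes can be split across workers in any way and the partial tables merged in any order.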

64. Approximation and Parameterized Complexity of Minimax Approval Voting.

Paper Link】 【Pages】:459-465

【Authors】: Marek Cygan ; Lukasz Kowalik ; Arkadiusz Socala ; Krzysztof Sornat

【Abstract】: We present three results on the complexity of MINIMAX APPROVAL VOTING. First, we study MINIMAX APPROVAL VOTING parameterized by the Hamming distance d from the solution to the votes. We show MINIMAX APPROVAL VOTING admits no algorithm running in time $O^{\star}(2^{o(d \log d)})$, unless the Exponential Time Hypothesis (ETH) fails. This means that the $O^{\star}(d^{2d})$ algorithm of Misra et al. (AAMAS 2015) is essentially optimal. Motivated by this, we then show a parameterized approximation scheme, running in time $O^{\star}((3/\varepsilon)^{2d})$, which is essentially tight assuming ETH. Finally, we get a new polynomial-time randomized approximation scheme for MINIMAX APPROVAL VOTING, which runs in time $n^{O(1/\varepsilon^{2} \cdot \log(1/\varepsilon))} \cdot \mathrm{poly}(m)$, almost matching the running time of the fastest known PTAS for CLOSEST STRING due to Ma and Sun (SIAM J. Comp. 2009).

【Keywords】: minimax approval voting; computational social choice; lower bound; parameterized complexity; ptas
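For readers unfamiliar with the problem, a brute-force sketch follows. It assumes the usual formulation in which a committee of exactly k candidates is sought that minimizes the maximum Hamming distance to any approval ballot. The function name `minimax_approval` and the set-based input format are hypothetical. The enumeration is exponential in general, which is the motivation for the parameterized and approximation results in the abstract.

```python
from itertools import combinations


def minimax_approval(votes, candidates, k):
    """Exhaustive search for a size-k committee minimizing the maximum
    Hamming distance to the approval ballots.

    votes: non-empty list of sets of approved candidates.
    Returns (committee, distance); the first optimum in sorted
    enumeration order is kept.
    """
    best_committee, best_radius = None, None
    for committee in combinations(sorted(candidates), k):
        chosen = set(committee)
        # Hamming distance between two sets = size of symmetric difference.
        radius = max(len(chosen ^ vote) for vote in votes)
        if best_radius is None or radius < best_radius:
            best_committee, best_radius = chosen, radius
    return best_committee, best_radius
```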

65. The Computational Complexity of Weighted Greedy Matching.

Paper Link】 【Pages】:466-474

【Authors】: Argyrios Deligkas ; George B. Mertzios ; Paul G. Spirakis

【Abstract】: Motivated by the fact that in several cases a matching in a graph is stable if and only if it is produced by a greedy algorithm, we study the problem of computing a maximum weight greedy matching on weighted graphs, termed GREEDYMATCHING. In sharp contrast to the maximum weight matching problem, for which many efficient algorithms are known, we prove that GREEDYMATCHING is strongly NP-hard and APX-complete, and thus it does not admit a PTAS unless P=NP, even on graphs with maximum degree at most 3 and with at most three different integer edge weights. Furthermore we prove that GREEDYMATCHING is strongly NP-hard if the input graph is in addition bipartite. Moreover we consider three natural parameters of the problem, for which we establish a sharp threshold behavior between NP-hardness and computational tractability. On the positive side, we present a randomized approximation algorithm (RGMA) for GREEDYMATCHING on a special class of weighted graphs, called bush graphs. We highlight an unexpected connection between RGMA and the approximation of maximum cardinality matching in unweighted graphs via randomized greedy algorithms. We show that, if the approximation ratio of RGMA is ρ, then for every ε > 0 the randomized MRG algorithm of (Aronson et al. 1995) gives a (ρ − ε)-approximation for the maximum cardinality matching. We conjecture that a tight bound for ρ is 2/3; we prove our conjecture true for four subclasses of bush graphs. Proving a tight bound for the approximation ratio of MRG on unweighted graphs (and thus also proving a tight value for ρ) is a long-standing open problem (Poloczek and Szegedy 2012). This unexpected relation of our RGMA algorithm with the MRG algorithm may provide new insights for solving this problem.

【Keywords】: Greedy weighted matching; maximum cardinality matching; NP-hard; approximation; randomized algorithm
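The greedy procedure underlying the abstract above can be sketched as: repeatedly take a heaviest edge whose endpoints are both still unmatched. In this sketch, ties among equal-weight edges are broken by input order, which is an assumption made to keep the example deterministic. The function name `greedy_matching` and the edge-tuple format are hypothetical.

```python
def greedy_matching(edges):
    """Greedy matching on a weighted graph.

    edges: list of (u, v, weight) tuples. Edges are scanned by
    decreasing weight; the sort is stable, so equal-weight edges keep
    their input order, which acts as the tie-breaking rule.
    """
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching
```

On the unit-weight path a-b-c-d, scanning a-b first yields a matching of weight 2, while scanning b-c first yields weight 1. The outcome therefore depends on how ties are broken, and choosing the tie-breaking that maximizes total weight is the hard problem the abstract refers to.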

66. Disarmament Games.

Paper Link】 【Pages】:473-479

【Authors】: Yuan Deng ; Vincent Conitzer

【Abstract】: Much recent work in the AI community concerns algorithms for computing optimal mixed strategies to commit to, as well as the deployment of such algorithms in real security applications. Another possibility is to commit not to play certain actions. If only one player makes such a commitment, then this is generally less powerful than completely committing to a single mixed strategy. However, if players can alternatingly commit not to play certain actions and thereby iteratively reduce their strategy spaces, then desirable outcomes can be obtained that would not have been possible with just a single player committing to a mixed strategy. We refer to such a setting as a disarmament game. In this paper, we study disarmament for two-player normal-form games. We show that deciding whether an outcome can be obtained with disarmament is NP-complete (even for a fixed number of rounds), if only pure strategies can be removed. On the other hand, for the case where mixed strategies can be removed, we provide a folk theorem that shows that all desirable utility profiles can be obtained, and give an efficient algorithm for (approximately) obtaining them.

【Keywords】: Disarmament

67. The Complexity of Stable Matchings under Substitutable Preferences.

Paper Link】 【Pages】:480-486

【Authors】: Yuan Deng ; Debmalya Panigrahi ; Bo Waggoner

【Abstract】: In various matching market settings, such as hospital-doctor matching markets (Hatfield and Milgrom 2005), the existence of stable outcomes depends on substitutability of preferences. But can these stable matchings be computed efficiently, as in the one-to-one matching case? The algorithm of (Hatfield and Milgrom 2005) requires efficient implementation of a choice function over substitutable preferences. We show that even given efficient access to a value oracle or preference relation satisfying substitutability, exponentially many queries may be required in the worst case to implement a choice function. Indeed, this extends to examples where a stable matching requires exponential time to compute. We characterize the computational complexity of stable matchings by showing that efficient computation of a choice function is equivalent to efficient verification—determining whether or not, for a given set, the most preferred subset is the entire set itself. Clearly, verification is necessary for computation, but we show that it is also sufficient: specifically, given a verifier, we design a polynomial-time algorithm for computing a choice function, implying an efficient algorithm for stable matching. We then show that a verifier can be implemented efficiently for various classes of functions, such as submodular functions, implying efficient stable matching algorithms for a broad range of settings. We also investigate the effect of ties in the preference order, which causes complications both in defining substitutes and in computation. In this case, we tightly connect the computational complexity of the choice function to a measure on the number of ties.

【Keywords】: Complexity, Stable Matching, Substitute

68. Small Representations of Big Kidney Exchange Graphs.

Paper Link】 【Pages】:487-493

【Authors】: John P. Dickerson ; Aleksandr M. Kazachkov ; Ariel D. Procaccia ; Tuomas Sandholm

【Abstract】: Kidney exchanges are organized markets where patients swap willing but incompatible donors. In the last decade, kidney exchanges grew from small and regional to large and national — and soon, international. This growth results in more lives saved, but exacerbates the empirical hardness of the NP-complete problem of optimally matching patients to donors. State-of-the-art matching engines use integer programming techniques to clear fielded kidney exchanges, but these methods must be tailored to specific models and objective functions, and may fail to scale to larger exchanges. In this paper, we observe that if the kidney exchange compatibility graph can be encoded by a constant number of patient and donor attributes, the clearing problem is solvable in polynomial time. We give necessary and sufficient conditions for losslessly shrinking the representation of an arbitrary compatibility graph. Then, using real compatibility graphs from the UNOS US-wide kidney exchange, we show how many attributes are needed to encode real graphs. The experiments show that, indeed, small numbers of attributes suffice.

【Keywords】: Matching; kidney exchange; combinatorial optimization; barter exchange

69. What Do Multiwinner Voting Rules Do? An Experiment Over the Two-Dimensional Euclidean Domain.

Paper Link】 【Pages】:494-501

【Authors】: Edith Elkind ; Piotr Faliszewski ; Jean-François Laslier ; Piotr Skowron ; Arkadii Slinko ; Nimrod Talmon

【Abstract】: We visualize aggregate outputs of popular multiwinner voting rules — SNTV, STV, Bloc, k-Borda, Monroe, Chamberlin–Courant, and PAV — for elections generated according to the two-dimensional Euclidean model. We consider three applications of multiwinner voting, namely, parliamentary elections, portfolio/movie selection, and shortlisting, and use our results to understand which of our rules seem to be best suited for each application. In particular, we show that STV (one of the few nontrivial rules used in real high-stake elections) exhibits excellent performance, whereas the Bloc rule (also often used in practice) performs poorly.

【Keywords】: multiwinner elections; Euclidean preferences; shortlisting; proportional representation

70. Extensive-Form Perfect Equilibrium Computation in Two-Player Games.

Paper Link】 【Pages】:502-508

【Authors】: Gabriele Farina ; Nicola Gatti

【Abstract】: We study the problem of computing an Extensive-Form Perfect Equilibrium (EFPE) in 2-player games. This equilibrium concept refines the Nash equilibrium requiring resilience with respect to a specific vanishing perturbation, representing mistakes of the players at each decision node. The scientific challenge is intrinsic to the EFPE definition: it requires a perturbation over the agent form, but the agent form is computationally inefficient due to the presence of highly nonlinear constraints. We show that the sequence form can be exploited in a non-trivial way and that, for general-sum games, finding an EFPE is equivalent to solving a suitably perturbed linear complementarity problem. We prove that Lemke's algorithm can be applied, showing that computing an EFPE is PPAD-complete. In the notable case of zero-sum games, the problem is in FP and can be solved by linear programming. Our algorithms also allow one to find a Nash equilibrium when players cannot perfectly control their moves, being subject to a given execution uncertainty, as is the case in most realistic physical settings.

【Keywords】: Complexity; Sequence Form; Nash Equilibria

71. Selfish Knapsack.

Paper Link】 【Pages】:509-515

【Authors】: Itai Feigenbaum ; Matthew P. Johnson

【Abstract】: We consider a strategic variant of the knapsack problem: the items are owned by agents, and agents can misrepresent their sets of items, either by hiding items (understating) or by reporting fake ones (overstating). Each agent's utility equals the total value of her items included in the knapsack. We wish to maximize social welfare, and attempt to design mechanisms that lead to small worst-case approximation ratios at equilibrium. We provide a randomized mechanism with attractive strategic properties: it has a price of anarchy of 2 for Bayes-Nash and coarse correlated equilibria. For overstating-only agents, it becomes strategyproof, and has a matching lower bound. For the case of two understating-only agents, we provide a specialized randomized strategyproof 1.522-approximate mechanism, and a lower bound of 1.09. When all agents but one are honest, we provide a deterministic strategyproof 1.618-approximate mechanism with a matching lower bound. The latter two mechanisms are also useful in problems beyond the one in consideration.

【Keywords】: Social Choice / Voting; Game Theory; Equilibrium

72. Obvious Strategyproofness Needs Monitoring for Good Approximations.

Paper Link】 【Pages】:516-522

【Authors】: Diodato Ferraioli ; Carmine Ventre

【Abstract】: Obvious strategyproofness (OSP) is an appealing concept as it allows incentive compatibility to be maintained even in the presence of agents that are not fully rational, e.g., those who struggle with contingent reasoning (Li 2015). However, it has been shown to impose some limitations, e.g., no OSP mechanism can return a stable matching (Ashlagi and Gonczarowski 2015). We here deepen the study of the limitations of OSP mechanisms by looking at their approximation guarantees for basic optimization problems paradigmatic of the area, i.e., machine scheduling and facility location. We prove a number of bounds on the approximation guarantee of OSP mechanisms, which show that OSP can come at a significant cost. However, rather surprisingly, we prove that OSP mechanisms can return optimal solutions when they use monitoring, a novel mechanism design paradigm that introduces a mild level of scrutiny on agents’ declarations (Kovacs, Meyer, and Ventre 2015).

【Keywords】: Obvious strategyproofness; machine scheduling; facility location

73. Crowdsourced Outcome Determination in Prediction Markets.

Paper Link】 【Pages】:523-529

【Authors】: Rupert Freeman ; Sébastien Lahaie ; David M. Pennock

【Abstract】: A prediction market is a useful means of aggregating information about a future event. To function, the market needs a trusted entity who will verify the true outcome in the end. Motivated by the recent introduction of decentralized prediction markets, we introduce a mechanism that allows for the outcome to be determined by the votes of a group of arbiters who may themselves hold stakes in the market. Despite the potential conflict of interest, we derive conditions under which we can incentivize arbiters to vote truthfully by using funds raised from market fees to implement a peer prediction mechanism. Finally, we investigate what parameter values could be used in a real-world implementation of our mechanism.

【Keywords】: Peer prediction; Prediction market; Market scoring rule; Crowdsourcing predictions; Decentralized computation

74. Security Games on a Plane.

Paper Link】 【Pages】:530-536

【Authors】: Jiarui Gan ; Bo An ; Yevgeniy Vorobeychik ; Brian Gauch

【Abstract】: Most existing models of Stackelberg security games ignore the underlying topology of the space in which targets and defence resources are located. As a result, allocation of resources is restricted to a discrete collection of exogenously defined targets. However, in many practical security settings, defense resources can be located on a continuous plane. Better defense solutions could therefore be potentially achieved by placing resources in a space outside of actual targets (e.g., between targets). To address this limitation, we propose a model called Security Game on a Plane (SGP) in which targets are distributed on a 2-dimensional plane, and security resources, to be allocated on the same plane, protect targets within a certain effective distance. We investigate the algorithmic aspects of SGP. We find that computing a strong Stackelberg equilibrium of an SGP is NP-hard even for zero-sum games, and these are inapproximable in general. On the positive side, we find an exact solution technique for general SGPs based on an existing approach, and develop a PTAS (polynomial-time approximation scheme) for zero-sum SGP to more fundamentally overcome the computational obstacle. Our experiments demonstrate the value of considering SGP and effectiveness of our algorithms.

【Keywords】: Game Theory; Stackelberg Security Game; Plane

75. Engineering Agreement: The Naming Game with Asymmetric and Heterogeneous Agents.

Paper Link】 【Pages】:537-543

【Authors】: Jie Gao ; Bo Li ; Grant Schoenebeck ; Fang-Yi Yu

【Abstract】: Being popular in language evolution, cognitive science, and culture dynamics, the Naming Game has been widely used to analyze how agents reach global consensus via communications in multi-agent systems. Most prior work considered networks that are symmetric and homogeneous (e.g., vertex transitive). In this paper we consider asymmetric or heterogeneous settings that complement the current literature: 1) we show that increasing asymmetry in network topology can improve convergence rates. The star graph empirically converges faster than all previously studied graphs; 2) we consider graph topologies that are particularly challenging for the naming game, such as disjoint cliques or multi-level trees, and ask how much extra homogeneity (random edges) is required to allow convergence or fast convergence. We provide theoretical analysis, which is confirmed by simulations; 3) we analyze how consensus can be manipulated when stubborn nodes are introduced at different points of the process. Early introduction of stubborn nodes can easily influence the outcome in certain families of networks, while late introduction of stubborn nodes has much less power.

【Keywords】: dynamic system;consensus;naming game

76. Vote Until Two of You Agree: Mechanisms with Small Distortion and Sample Complexity.

Paper Link】 【Pages】:544-550

【Authors】: Stephen Gross ; Elliot Anshelevich ; Lirong Xia

【Abstract】: To design social choice mechanisms with desirable utility properties, normative properties, and low sample complexity, we propose a new randomized mechanism called 2-Agree. This mechanism asks random voters for their top alternatives until at least two voters agree, at which point it selects that alternative as the winner. We prove that, despite its simplicity and low sample complexity, 2-Agree achieves almost optimal distortion on a metric space when the number of alternatives is not large, satisfies anonymity, neutrality, ex-post Pareto efficiency, and very strong SD-participation, and is approximately truthful. We further show that 2-Agree works well for a larger number of alternatives with decisive agents.

【Keywords】: social choice, spatial preferences, distortion
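
The sampling loop described in the abstract of entry 76 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `two_agree`, the representation of voters as a list of top alternatives, and sampling voters uniformly with replacement are all assumptions made for this example.

```python
import random
from collections import Counter


def two_agree(top_choices, rng=None):
    """Sketch of the 2-Agree idea: query random voters until two agree.

    top_choices: non-empty list; top_choices[i] is voter i's top alternative.
    rng: optional random.Random instance (hypothetical parameter, used here
         only to make runs reproducible).
    Assumption: voters are drawn uniformly at random WITH replacement.
    Returns (winner, number_of_voters_queried).
    """
    if not top_choices:
        raise ValueError("need at least one voter")
    rng = rng or random.Random()
    seen = Counter()   # how often each alternative has been named so far
    queried = 0
    while True:
        voter = rng.randrange(len(top_choices))
        choice = top_choices[voter]
        seen[choice] += 1
        queried += 1
        if seen[choice] == 2:
            # Two sampled voters named the same alternative: it wins.
            return choice, queried


winner, queried = two_agree(["a", "b", "a", "c", "a"], random.Random(7))
print(winner, queried)
```

Under these assumptions the loop stops after at most m + 1 queries when the voters name m distinct top alternatives, by the pigeonhole principle, which is one way to see why the sample complexity does not grow with the number of voters.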

77. Computing Least Cores of Supermodular Cooperative Games.

Paper Link】 【Pages】:551-557

【Authors】: Daisuke Hatano ; Yuichi Yoshida

【Abstract】: One of the goals of a cooperative game is to compute a value division to the players from which they have no incentive to deviate. This concept is formalized as the notion of the core. To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomial-time algorithms that compute value divisions in the strong and weak least cores.

【Keywords】: Cooperative game; Supermodular function; Crossing submodular

Paper Link】 【Pages】:558-564

【Authors】: Karel Horák ; Branislav Bosanský ; Michal Pechoucek

【Abstract】: Security problems can be modeled as two-player partially observable stochastic games with one-sided partial observability and infinite horizon (one-sided POSGs). We seek optimal strategies of player 1 that correspond to robust strategies against the worst-case opponent (player 2), who is assumed to have perfect information about the game. We present a novel algorithm for approximately solving one-sided POSGs based on the heuristic search value iteration (HSVI) for POMDPs. Our results include (1) theoretical properties of one-sided POSGs and their value functions, (2) guarantees showing the convergence of our algorithm to optimal strategies, and (3) practical demonstration of applicability and scalability of our algorithm on three different domains: pursuit-evasion, patrolling, and search games.

【Keywords】:

79. Group Activity Selection on Social Networks.

Paper Link】 【Pages】:565-571

【Authors】: Ayumi Igarashi ; Dominik Peters ; Edith Elkind

【Abstract】: We propose a new variant of the group activity selection problem (GASP), where the agents are placed on a social network and activities can only be assigned to connected subgroups. We show that if multiple groups can simultaneously engage in the same activity, finding a stable outcome is easy as long as the network is acyclic. In contrast, if each activity can be assigned to a single group only, finding stable outcomes becomes computationally intractable, even if the underlying network is very simple: the problem of determining whether a given instance of a GASP admits a Nash stable outcome turns out to be NP-hard when the social network is a path, a star, or if the size of each connected component is bounded by a constant. On the other hand, we obtain fixed-parameter tractability results for this problem with respect to the number of activities.

【Keywords】: game theory; social networks; hedonic games; group activity selection; parameterised complexity

80. Resource Graph Games: A Compact Representation for Games with Structured Strategy Spaces.

Paper Link】 【Pages】:572-578

【Authors】: Albert Xin Jiang ; Hau Chan ; Kevin Leyton-Brown

【Abstract】: In many real-world systems, strategic agents' decisions can be understood as complex - i.e., consisting of multiple sub-decisions - and hence can give rise to an exponential number of pure strategies. Examples include network congestion games, simultaneous auctions, and security games. However, agents' sets of strategies are often structured, allowing them to be represented compactly. There currently exists no general modeling language that captures a wide range of commonly seen strategy structure and utility structure. We propose Resource Graph Games (RGGs), the first general compact representation for games with structured strategy spaces, which is able to represent a wide range of games studied in literature. We leverage recent results about multilinearity, a key property of games that allows us to represent the mixed strategies compactly, and, as a result, to compute various equilibrium concepts efficiently. While not all RGGs are multilinear, we provide a general method of converting RGGs to those that are multilinear, and identify subclasses of RGGs whose converted version allow efficient computation.

【Keywords】: Games; Compact representation; Action graph; multilinear; mixed strategies; computation; examples; structured space; structured utility; Nash equilibrium; coarse correlated equilibrium; modeling

81. Complexity of the Stable Invitations Problem.

Paper Link】 【Pages】:579-585

【Authors】: Hooyeon Lee ; Vassilevska Williams

【Abstract】: We study the Stable Invitations Problem (SIP) in which an event organizer is to invite a subset of agents (from a group of agents) to an event, subject to certain rationality criteria. In SIP, the agents have friends, enemies, and preferences on the number of attendees at the event; an agent is willing to attend the event if all friends of the agent attend, no enemy of the agent attends, and the number of attendees is acceptable to the agent. We consider two solution concepts: (1) individual rationality (everyone who is invited is willing to attend) and (2) (Nash) stability (no agent wants to deviate from the given invitation). It is known that finding an invitation of given size for either concept is NP-complete. In this work, we study the complexity of SIP on a finer scale, through the lens of parameterized complexity. For the two solution concepts and the special cases where the number of friends and/or enemies is bounded above by a constant, we show that the problems belong to different complexity classes when parameterized by the size of solutions. For instance, finding an individually rational invitation of size k is W[1]-complete, yet finding a stable invitation is W[2]-complete. Moreover, when all friend and enemy relations are symmetric, finding a solution of either of the concepts becomes fixed-parameter tractable unless agents have unbounded number(s) of enemies.

【Keywords】: fixed parameter tractability; stable invitation; group assignment
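
The willingness condition and the individual-rationality concept stated in the abstract of entry 81 can be illustrated with a small sketch. The data layout (dictionaries mapping each agent to a set of friends, a set of enemies, and a set of acceptable attendee counts), the function names, and the brute-force search are assumptions made for this example; they are not taken from the paper, and the exhaustive search is only meant for tiny instances.

```python
from itertools import combinations


def is_willing(agent, invited, friends, enemies, acceptable_sizes):
    """An agent is willing to attend if all of its friends attend, none of
    its enemies attend, and the number of attendees is acceptable to it."""
    return (friends[agent] <= invited
            and not (enemies[agent] & invited)
            and len(invited) in acceptable_sizes[agent])


def is_individually_rational(invited, friends, enemies, acceptable_sizes):
    """Individual rationality: everyone who is invited is willing to attend."""
    return all(is_willing(a, invited, friends, enemies, acceptable_sizes)
               for a in invited)


def find_ir_invitation(agents, k, friends, enemies, acceptable_sizes):
    """Brute-force search over all invitations of size k (exponential in
    general; an illustrative baseline, not an algorithm from the paper)."""
    for subset in combinations(sorted(agents), k):
        invited = frozenset(subset)
        if is_individually_rational(invited, friends, enemies,
                                    acceptable_sizes):
            return invited
    return None


# Hypothetical three-agent instance.
agents = {"a", "b", "c"}
friends = {"a": {"b"}, "b": set(), "c": set()}
enemies = {"a": set(), "b": {"c"}, "c": set()}
acceptable_sizes = {"a": {2}, "b": {1, 2}, "c": {1, 2, 3}}

print(find_ir_invitation(agents, 2, friends, enemies, acceptable_sizes))
```

In this instance the only individually rational invitation of size 2 is {a, b}: inviting a without b violates a's friendship requirement, and inviting b together with c violates b's enemy requirement.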

82. Mechanism Design in Social Networks.

Paper Link】 【Pages】:586-592

【Authors】: Bin Li ; Dong Hao ; Dengji Zhao ; Tao Zhou

【Abstract】: This paper studies an auction design problem for a seller to sell a commodity in a social network, where each individual (the seller or a buyer) can only communicate with her neighbors. The challenge to the seller is to design a mechanism to incentivize the buyers, who are aware of the auction, to further propagate the information to their neighbors so that more buyers will participate in the auction and hence, the seller will be able to make a higher revenue. We propose a novel auction mechanism, called information diffusion mechanism (IDM), which incentivizes the buyers to not only truthfully report their valuations on the commodity to the seller, but also further propagate the auction information to all their neighbors. In comparison, the direct extension of the well-known Vickrey-Clarke-Groves (VCG) mechanism in social networks can also incentivize the information diffusion, but it will decrease the seller's revenue or even lead to a deficit sometimes. The formalization of the problem has not yet been addressed in the literature of mechanism design and our solution is very significant in the presence of large-scale online social networks.

【Keywords】: Mechanism Design; Information Diffusion; Auctions

83. Optimal Personalized Defense Strategy Against Man-In-The-Middle Attack.

Paper Link】 【Pages】:593-599

【Authors】: Xiaohong Li ; Shuxin Li ; Jianye Hao ; Zhiyong Feng ; Bo An

【Abstract】: The Man-In-The-Middle (MITM) attack is one of the most common attacks employed in the network hacking. MITM attackers can successfully invoke attacks such as denial of service (DoS) and port stealing, and lead to surprisingly harmful consequences for users in terms of both financial loss and security issues. The conventional defense approaches mainly consider how to detect and eliminate those attacks or how to prevent those attacks from being launched in the first place. This paper proposes a game-theoretic defense strategy from a different perspective, which aims at minimizing the loss that the whole system sustains given that the MITM attacks are inevitable. We model the interaction between the attacker and the defender as a Stackelberg security game and adopt the Strong Stackelberg Equilibrium (SSE) as the defender's strategy. Since the defender's strategy space is infinite in our model, we employ a novel method to reduce the searching space of computing the optimal defense strategy. Finally, we empirically evaluate our optimal defense strategy by comparing it with non-strategic defense strategies. The results indicate that our game-theoretic defense strategy significantly outperforms other non-strategic defense strategies in terms of decreasing the total losses against MITM attacks.

【Keywords】: Stackelberg security game;Strong Stackelberg Equilibrium;Man-In-The-Middle attack

84. Network, Popularity and Social Cohesion: A Game-Theoretic Approach.

Paper Link】 【Pages】:600-606

【Authors】: Jiamou Liu ; Ziheng Wei

【Abstract】: In studies of social dynamics, cohesion refers to a group's tendency to stay in unity, which, as argued in sociometry, arises from the network topology of interpersonal ties. We follow this idea and propose a game-based model of cohesion that not only relies on the social network, but also reflects individuals' social needs. In particular, our model is a type of cooperative game where players may gain popularity by strategically forming groups. A group is socially cohesive if the grand coalition is core stable. We study social cohesion in some special types of graphs and draw a link between social cohesion and a classical notion of structural cohesion by White and Harary. We then focus on the problem of deciding whether a given social network is socially cohesive and show that this problem is coNP-complete. Nevertheless, we give two efficient heuristics for coalition structures where players enjoy high popularity and experimentally evaluate their performances.

【Keywords】: Computational Social Science;Social Networks;Coordination and Collaboration;Equilibrium;Game Theory

85. Sequential Peer Prediction: Learning to Elicit Effort using Posted Prices.

Paper Link】 【Pages】:607-613

【Authors】: Yang Liu ; Yiling Chen

【Abstract】: Peer prediction mechanisms are often adopted to elicit truthful contributions from crowd workers when no ground-truth verification is available. Recently, mechanisms of this type have been developed to incentivize effort exertion, in addition to truthful elicitation. In this paper, we study a sequential peer prediction problem where a data requester wants to dynamically determine the reward level to optimize the trade-off between the quality of information elicited from workers and the total expected payment. In this problem, workers have homogeneous expertise and heterogeneous cost for exerting effort, both unknown to the requester. We propose a sequential posted-price mechanism to dynamically learn the optimal reward level from workers' contributions and to incentivize effort exertion and truthful reporting. We show that (1) in our mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully is an equilibrium that returns highest utility for every worker, and (2) the regret of our learning mechanism w.r.t. offering the optimal reward (price) is upper bounded by Õ(T^{3/4}), where T is the learning horizon. We further show the power of our learning approach when the reports of workers do not necessarily follow the game-theoretic equilibrium.

【Keywords】: crowdsourcing; peer prediction; sequential learning

86. An Ambiguity Aversion Model for Decision Making under Ambiguity.

Paper Link】 【Pages】:614-621

【Authors】: Wenjun Ma ; Xudong Luo ; Yuncheng Jiang

【Abstract】: In real life, decisions are often made under ambiguity, where it is difficult to estimate accurately the probability of each single possible consequence of a choice. However, this problem has not been solved well in existing work for the following two reasons. (i) Some existing models cannot cover the Ellsberg paradox and the Machina paradox. Thus, the choices that they predict could be inconsistent with empirical observations. (ii) Some of them rely on parameter tuning without offering explanations for the reasonability of setting such bounds of parameters. Thus, the prediction of such a model in new decision making problems is doubtful. To this end, this paper proposes a new decision making model based on D-S theory and the emotion of ambiguity aversion. Some insightful properties of our model and its validation on two famous paradoxes show that our model is indeed a better alternative for decision making under ambiguity.

【Keywords】: Decision Making under Ambiguity; D-S Theory; Ellsberg Paradox; Machina Paradox; Ambiguity Aversion

87. Optimal Pricing for Submodular Valuations with Bounded Curvature.

Paper Link】 【Pages】:622-628

【Authors】: Takanori Maehara ; Yasushi Kawase ; Hanna Sumita ; Katsuya Tono ; Ken-ichi Kawarabayashi

【Abstract】: The optimal pricing problem is a fundamental problem that arises in combinatorial auctions. Suppose that there is one seller who has indivisible items and multiple buyers who want to purchase a combination of the items. The seller wants to sell his items for the highest possible prices, and each buyer wants to maximize his utility (i.e., valuation minus payment) as long as his payment does not exceed his budget. The optimal pricing problem seeks a price of each item and an assignment of items to buyers such that every buyer achieves the maximum utility under the prices. The goal of the problem is to maximize the total payment from buyers. In this paper, we consider the case that the valuations are submodular. We show that the problem is computationally hard even if there exists only one buyer. Then we propose approximation algorithms for the unlimited budget case. We also extend the algorithm for the limited budget case when there exists one buyer and multiple buyers collaborate with each other.

【Keywords】: optimal pricing; combinatorial auction; submodular function; curvature

88. On Covering Codes and Upper Bounds for the Dimension of Simple Games.

Paper Link】 【Pages】:629-634

【Authors】: Martin Olsen

【Abstract】: Consider a situation with n agents or players, where some of the players form a coalition with a certain collective objective. Simple games are used to model systems that can decide whether coalitions are successful (winning) or not (losing). A simple game can be viewed as a monotone boolean function. The dimension of a simple game is the smallest positive integer d such that the simple game can be expressed as the intersection of d threshold functions, where each threshold function uses a threshold and n weights. Taylor and Zwicker have shown that d is bounded from above by the number of maximal losing coalitions. We present two new upper bounds both containing the Taylor-Zwicker bound as a special case. The Taylor-Zwicker bound implies an upper bound of (n choose n/2). We improve this upper bound significantly by showing constructively that d is bounded from above by the cardinality of any binary covering code with length n and covering radius 1. This result supplements a recent result where Olsen et al. showed how to construct simple games with dimension |C| for any binary constant weight SECDED code C with length n. Our result represents a major step in the attempt to close the dimensionality gap for simple games.

【Keywords】: Simple Games; Dimension; Weighted Games; Binary Codes

89. Tractable Algorithms for Approximate Nash Equilibria in Generalized Graphical Games with Tree Structure.

Paper Link】 【Pages】:635-641

【Authors】: Luis E. Ortiz ; Mohammad Tanvir Irfan

【Abstract】: We provide the first fully polynomial time approximation scheme (FPTAS) for computing an approximate mixed-strategy Nash equilibrium in graphical multi-hypermatrix games (GMhGs), which are generalizations of normal-form games, graphical games, graphical polymatrix games, and hypergraphical games. Computing an exact mixed-strategy Nash equilibrium in graphical polymatrix games is PPAD-complete and thus generally believed to be intractable. In contrast, to the best of our knowledge, we are the first to establish an FPTAS for tree polymatrix games as well as tree graphical games when the number of actions is bounded by a constant. As a corollary, we give a quasi-polynomial time approximation scheme (quasi-PTAS) when the number of actions is bounded by a logarithm of the number of players.

【Keywords】: game theory; Nash equilibria; approximation algorithms; graphical games; polymatrix games

90. Recognising Multidimensional Euclidean Preferences.

Paper Link】 【Pages】:642-648

【Authors】: Dominik Peters

【Abstract】: Euclidean preferences are a widely studied preference model, in which decision makers and alternatives are embedded in d-dimensional Euclidean space. Decision makers prefer those alternatives closer to them. This model, also known as multidimensional unfolding, has applications in economics, psychometrics, marketing, and many other fields. We study the problem of deciding whether a given preference profile is d-Euclidean. For the one-dimensional case, polynomial-time algorithms are known. We show that, in contrast, for every other fixed dimension d > 1, the recognition problem is equivalent to the existential theory of the reals (ETR), and so in particular NP-hard. We further show that some Euclidean preference profiles require exponentially many bits in order to specify any Euclidean embedding, and prove that the domain of d-Euclidean preferences does not admit a finite forbidden minor characterisation for any d > 1. We also study dichotomous preferences and the behaviour of other metrics, and survey a variety of related work.

【Keywords】: social choice; voting; single-peaked preferences; spatial preferences; recognition problem; computational complexity, ETR; forbidden subprofiles; multidimensional unfolding
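
The model described in the abstract of entry 90 can be illustrated in the easy direction: given an embedding, each decision maker ranks alternatives by Euclidean distance. The sketch below is only an illustration of that definition. The function name, the dictionary-based input format, and the tie-breaking by alternative name are assumptions for this example. It does not address the recognition problem studied in the paper, which asks the reverse question of whether a given profile admits such an embedding.

```python
import math


def euclidean_profile(voters, alternatives):
    """Derive the preference profile induced by an embedding.

    voters, alternatives: dicts mapping names to coordinate tuples of the
    same dimension d. Each voter ranks alternatives from nearest to
    farthest; ties are broken by name (an arbitrary choice for this sketch).
    """
    profile = {}
    for voter, position in voters.items():
        profile[voter] = sorted(
            alternatives,
            key=lambda a: (math.dist(position, alternatives[a]), a),
        )
    return profile


# Hypothetical one-dimensional instance (d = 1).
voters = {"v1": (0.0,), "v2": (3.0,)}
alternatives = {"x": (1.0,), "y": (2.5,), "z": (6.0,)}

print(euclidean_profile(voters, alternatives))
```

A profile is d-Euclidean exactly when some choice of coordinates in d dimensions reproduces it in this way; the sketch only evaluates one fixed choice of coordinates.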

91. Preferences Single-Peaked on a Circle.

Paper Link】 【Pages】:649-655

【Authors】: Dominik Peters ; Martin Lackner

【Abstract】: We introduce the domain of preferences that are single-peaked on a circle, which is a generalization of the well-studied single-peaked domain. This preference restriction is useful, e.g., for scheduling decisions, and for one-dimensional decisions in the presence of extremist preferences. We give a fast recognition algorithm of this domain, provide a characterisation by finitely many forbidden subprofiles, and show that many popular single- and multi-winner voting rules are polynomial-time computable on this domain. In contrast, Kemeny's rule remains hard to evaluate, and several impossibility results from social choice theory can be proved using only profiles that are single-peaked on a circle.

【Keywords】: social choice; voting; single-peaked preferences; winner determination; consecutive ones

92. Psychological Forest: Predicting Human Behavior.

Paper Link】 【Pages】:656-662

【Authors】: Ori Plonsky ; Ido Erev ; Tamir Hazan ; Moshe Tennenholtz

【Abstract】: We introduce a synergetic approach incorporating psychological theories and data science in service of predicting human behavior. Our method harnesses psychological theories to extract rigorous features to a data science algorithm. We demonstrate that this approach can be extremely powerful in a fundamental human choice setting. In particular, a random forest algorithm that makes use of psychological features that we derive, dubbed psychological forest, leads to prediction that significantly outperforms best practices in a choice prediction competition. Our results also suggest that this integrative approach is vital for data science tools to perform reasonably well on the data. Finally, we discuss how social scientists can learn from using this approach and conclude that integrating social and data science practices is a highly fruitful path for future research of human behavior.

【Keywords】: Human choice prediction, Psychological features, Cognition and data science

93. Revenue Maximization for Finitely Repeated Ad Auctions.

Paper Link】 【Pages】:663-669

【Authors】: Jiang Rong ; Tao Qin ; Bo An ; Tie-Yan Liu

【Abstract】: Reserve price is an effective tool for revenue maximization in ad auctions. The optimal reserve price depends on bidders' value distributions, which, however, are generally unknown to auctioneers. A common practice for auctioneers is to first collect information about the value distributions by a sampling procedure and then apply the reserve price estimated with the sampled bids to the following auctions. In order to maximize the total revenue over finite auctions, it is important for the auctioneer to find a proper sample size to trade off between the cost of the sampling procedure and the optimality of the estimated reserve price. We investigate the sample size optimization problem for Generalized Second Price auctions, which is the most widely-used mechanism in ad auctions, and make three main contributions along this line. First, we bound the revenue losses in the form of competitive ratio during and after sampling. Second, we formulate the problem of finding the optimal sample size as a non-convex mixed integer optimization problem. Then we characterize the properties of the problem and prove the uniqueness of the optimal sample size. Third, we relax the integer optimization problem to a continuous form and develop an efficient algorithm based on the properties to solve it. Experimental results show that our approach can significantly improve the revenue for the auctioneer in finitely repeated ad auctions.

【Keywords】: Ad Auctions;Revenue Maximization;Reserve Price

94. Proportional Justified Representation.

Paper Link】 【Pages】:670-676

【Authors】: Luis Sánchez Fernández ; Edith Elkind ; Martin Lackner ; Norberto Fernández García ; Jesús Arias-Fisteus ; Pablo Basanta-Val ; Piotr Skowron

【Abstract】: The goal of multi-winner elections is to choose a fixed-size committee based on voters’ preferences. An important concern in this setting is representation: large groups of voters with cohesive preferences should be adequately represented by the election winners. Recently, Aziz et al. proposed two axioms that aim to capture this idea: justified representation (JR) and its strengthening extended justified representation (EJR). In this paper, we extend the work of Aziz et al. in several directions. First, we answer an open question of Aziz et al., by showing that Reweighted Approval Voting satisfies JR for k = 3, 4, 5, but fails it for k ≥ 6. Second, we observe that EJR is incompatible with the Perfect Representation criterion, which is important for many applications of multi-winner voting, and propose a relaxation of EJR, which we call Proportional Justified Representation (PJR). PJR is more demanding than JR, but, unlike EJR, it is compatible with perfect representation, and a committee that provides PJR can be computed in polynomial time if the committee size divides the number of voters. Moreover, just like EJR, PJR can be used to characterize the classic PAV rule in the class of weighted PAV rules. On the other hand, we show that EJR provides stronger guarantees with respect to average voter satisfaction than PJR does.

【Keywords】: representation, multi-winner voting rules, approval ballots

95. Achieving Sustainable Cooperation in Generalized Prisoner's Dilemma with Observation Errors.

Paper Link】 【Pages】:677-683

【Authors】: Fuuki Shigenaka ; Tadashi Sekiguchi ; Atsushi Iwasaki ; Makoto Yokoo

【Abstract】: A repeated game is a formal model for analyzing cooperation in long-term relationships, e.g., in the prisoner's dilemma. Although the case where each player observes her opponent's action with some observation errors (imperfect private monitoring) is difficult to analyze, a special type of an equilibrium called belief-free equilibrium is identified to make the analysis in private monitoring tractable. However, existing works using a belief-free equilibrium show that cooperative relations can be sustainable only in ideal situations. We deal with a generic problem that can model both the prisoner's dilemma and the team production problem. We examine a situation with an additional action that is dominated by another action. To our surprise, by adding this seemingly irrelevant action, players can achieve sustainable cooperative relations far beyond the ideal situations. More specifically, we identify a class of strategies called one-shot punishment strategy that can constitute a belief-free equilibrium in a wide range of parameters. Moreover, for a two-player case, the obtained welfare matches a theoretical upper bound.

【Keywords】: repeated game; private monitoring; belief-free equilibrium

96. Mechanism Design for Multi-Type Housing Markets.

Paper Link】 【Pages】:684-690

【Authors】: Sujoy Sikdar ; Sibel Adali ; Lirong Xia

【Abstract】: We study multi-type housing markets, where there are p ≥ 2 types of items, each agent is initially endowed one item of each type, and the goal is to design mechanisms without monetary transfer to (re)allocate items to the agents based on their preferences over bundles of items, such that each agent gets one item of each type. In sharp contrast to classical housing markets, previous studies in multi-type housing markets have been hindered by the lack of natural solution concepts, because the strict core might be empty. We break the barrier in the literature by leveraging AI techniques and making natural assumptions on agents’ preferences. We show that when agents’ preferences are lexicographic, even with different importance orders, the classical top-trading-cycles mechanism can be extended while preserving most of its nice properties. We also investigate computational complexity of checking whether an allocation is in the strict core and checking whether the strict core is empty. Our results convey an encouragingly positive message: it is possible to design good mechanisms for multi-type housing markets under natural assumptions on preferences.

【Keywords】: Multi-type housing markets; Top-trading-cycles; Lexicographic preferences; CP-nets

97. Constrained Pure Nash Equilibria in Polymatrix Games.

Paper Link】 【Pages】:691-697

【Authors】: Sunil Simon ; Dominik Wojtczak

【Abstract】: We study the problem of checking for the existence of constrained pure Nash equilibria in a subclass of polymatrix games defined on weighted directed graphs. The payoff of a player is defined as the sum of nonnegative rational weights on incoming edges from players who picked the same strategy augmented by a fixed integer bonus for picking a given strategy. These games capture the idea of coordination within a local neighbourhood in the absence of globally common strategies. We study the decision problem of checking whether a given set of strategy choices for a subset of the players is consistent with some pure Nash equilibrium or, alternatively, with all pure Nash equilibria. We identify the most natural tractable cases and show NP- or coNP-completeness of these problems already for unweighted DAGs.

【Keywords】: Constrained Nash equilibria; Equilibrium computation; Polymatrix games; Network coordination games
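
The payoff rule and the "consistent with some pure Nash equilibrium" question described in the abstract can be made concrete with a brute-force sketch. It enumerates all strategy profiles, so it only illustrates the definitions on tiny games and is not one of the paper's algorithms. The data layout (`weights` keyed by `(source, target)` edges, `bonus` keyed by `(player, strategy)`) and all function names are assumptions made for this example.

```python
from itertools import product


def payoff(player, profile, weights, bonus):
    """Weights on incoming edges from same-strategy players, plus the strategy bonus."""
    strategy = profile[player]
    total = bonus.get((player, strategy), 0)
    for (src, dst), weight in weights.items():
        if dst == player and src != player and profile[src] == strategy:
            total += weight
    return total


def is_pure_nash(profile, strategies, weights, bonus):
    """True if no player gains by unilaterally switching strategy."""
    for player, options in strategies.items():
        current = payoff(player, profile, weights, bonus)
        for alternative in options:
            deviated = {**profile, player: alternative}
            if payoff(player, deviated, weights, bonus) > current:
                return False
    return True


def consistent_with_some_nash(partial, strategies, weights, bonus):
    """Brute force: does some pure Nash equilibrium extend the partial assignment?"""
    players = list(strategies)
    for choice in product(*(strategies[p] for p in players)):
        profile = dict(zip(players, choice))
        if all(profile[p] == s for p, s in partial.items()) and is_pure_nash(
            profile, strategies, weights, bonus
        ):
            return True
    return False


strategies = {"a": [0, 1], "b": [0, 1]}
weights = {("a", "b"): 1, ("b", "a"): 1}
```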

98. Axiomatic Characterization of Game-Theoretic Network Centralities.

Paper Link】 【Pages】:698-705

【Authors】: Oskar Skibski ; Tomasz P. Michalak ; Talal Rahwan

【Abstract】: One of the fundamental research challenges in network science is the centrality analysis, i.e., identifying the nodes that play the most important roles in the network. In this paper, we focus on the game-theoretic approach to centrality analysis. While various centrality indices have been proposed based on this approach, it is still unknown what distinguishes this family of indices from the more classical ones. In this paper, we answer this question by providing the first axiomatic characterization of game-theoretic centralities. Specifically, we show that every centrality can be obtained following the game-theoretic approach, and show that two natural classes of game-theoretic centrality can be characterized by two intuitive properties pertaining to Myerson's notion of Fairness.

【Keywords】: Coalitional Games; Network Centralities; Axioms

99. Social Choice Under Metric Preferences: Scoring Rules and STV.

Paper Link】 【Pages】:706-712

【Authors】: Piotr Krzysztof Skowron ; Edith Elkind

【Abstract】: We consider voting under metric preferences: both voters and candidates are associated with points in a metric space, and each voter prefers candidates that are closer to her to ones that are further away. In this setting, it is often desirable to select a candidate that minimizes the sum of distances to the voters. However, common voting rules operate on voters' preference rankings and therefore may be unable to identify the best candidate. A relevant measure of the quality of a voting rule is then its distortion, defined as the worst-case ratio between the performance of a candidate selected by the rule and that of an optimal candidate. Anshelevich, Bhardwaj and Postl show that some popular rules such as Borda and Plurality do badly in this regard: their distortion scales linearly with the number of candidates. On the positive side, Anshelevich et al. identify a few voting rules whose distortion is bounded by a constant; however, these rules are rarely used in practice. In this paper, we analyze the distortion of two widely used (classes of) voting rules, namely, scoring rules and Single Transferable Vote (STV). We show that all scoring rules have super-constant distortion, answering a question that was left open by Anshelevich et al.; however, we identify a scoring rule whose distortion is asymptotically better than that of Plurality and Borda. For STV, we obtain an upper bound of O(log m), where m is the number of candidates, as well as a super-constant lower bound; thus, STV is a reasonable, though not a perfect rule from this perspective.

【Keywords】:
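
To make the cost ratio behind distortion concrete, here is a small sketch for voters and candidates placed on the integer line. It computes the ratio for one fixed instance only, which is a lower bound on the worst-case distortion rather than the distortion itself. The function names and the example positions are assumptions chosen for illustration.

```python
def social_cost(candidate, voters):
    """Sum of distances from the candidate to all voters (points on a line)."""
    return sum(abs(v - candidate) for v in voters)


def plurality_winner(voters, candidates):
    """Each voter votes for her closest candidate; ties go to the earlier candidate."""
    counts = {c: 0 for c in candidates}
    for v in voters:
        favourite = min(candidates, key=lambda c: abs(v - c))
        counts[favourite] += 1
    return max(candidates, key=lambda c: counts[c])


def cost_ratio(voters, candidates):
    """Cost of the plurality winner divided by the cost of an optimal candidate."""
    winner = plurality_winner(voters, candidates)
    optimal = min(candidates, key=lambda c: social_cost(c, voters))
    return social_cost(winner, voters) / social_cost(optimal, voters)


voters = [0, 0, 0, 10, 10, 12, 12]
candidates = [0, 10, 12]
ratio = cost_ratio(voters, candidates)
```

Here plurality elects the candidate at 0 (three first-place votes), with cost 44, although the candidate at 10 has cost 34.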

100. Fans Economy and All-Pay Auctions with Proportional Allocations.

Paper Link】 【Pages】:713-719

【Authors】: Pingzhong Tang ; Yulong Zeng ; Song Zuo

【Abstract】: In this paper, we analyze an emerging economic form, called fans economy, in which a fan donates money to the host and receives an allocation proportional to the amount of his donation (normalized by the overall amount of donation). Fans economy is the major way live streaming apps monetize and includes a number of popular economic forms ranging from crowdfunding to mutual funds. We propose an auction game, coined all-pay auctions with proportional allocation (APAPA), to model the fans economy and analyze the auction from the perspective of revenue. Compared with the standard all-pay auction, which normally has no pure Nash-Equilibrium in the complete information setting, we solve the pure Nash-Equilibrium of the APAPA in closed form and prove its uniqueness. Motivated by practical concerns, we then analyze the case where APAPA is equipped with a reserve and show that there might be multiple equilibria in this case. We give an efficient algorithm to compute all equilibria in this case. For either case, with or without reserve, we show that APAPA always extracts revenue that 2-approximates the second-highest valuation. Furthermore, we conduct experiments to show how revenue changes with respect to different reserves.

【Keywords】: fans economy, all-pay auction with proportional allocation, pure Nash-Equilibrium
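
The proportional-allocation rule can be illustrated with a short sketch. It assumes a quasi-linear utility, value times share minus bid, which the abstract does not spell out, so treat the utility form and the function names as assumptions. Under that assumption, maximizing v·b/(b + B) − b against a total opposing bid B > 0 gives the best response b = √(vB) − B, clipped at zero.

```python
import math


def utility(value, bid, others_total):
    """Assumed quasi-linear utility: value times proportional share, minus the bid paid."""
    total = bid + others_total
    share = bid / total if total > 0 else 0.0
    return value * share - bid


def best_response(value, others_total):
    """Maximizer of utility(value, b, others_total) over b >= 0, for others_total > 0."""
    if others_total <= 0:
        raise ValueError("no best response exists when nobody else bids")
    return max(0.0, math.sqrt(value * others_total) - others_total)


# Two fans who both value the item at 4: bidding 1 each is a mutual best response.
symmetric_bid = best_response(4.0, 1.0)
```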

101. The Positronic Economist: A Computational System for Analyzing Economic Mechanisms.

Paper Link】 【Pages】:720-727

【Authors】: David R. M. Thompson ; Neil Newman ; Kevin Leyton-Brown

【Abstract】: Computational mechanism analysis is a recent approach to economic analysis in which a mechanism design setting is analyzed entirely by a computer. For games with non-trivial numbers of players and actions, the approach is only feasible when these games can be encoded compactly, e.g., as Action-Graph Games. Such encoding is currently a manual process requiring expert knowledge; our aim is to simplify and automate it. Our contribution, the Positronic Economist, is a software system having two parts: (1) a Python-based language for succinctly describing mechanisms; and (2) a system that takes such descriptions as input, automatically identifies computationally useful structure, and produces a compact Action-Graph Game.

【Keywords】: action-graph games; compact games; mechanism analysis; computational game theory

102. Non-Additive Security Games.

Paper Link】 【Pages】:728-735

【Authors】: Sinong Wang ; Fang Liu ; Ness B. Shroff

【Abstract】: Security agencies have found security games to be useful models to understand how to better protect their assets. The key practical elements in this work are: (i) the attacker can simultaneously attack multiple targets, and (ii) different targets exhibit different types of dependencies based on the assets being protected (e.g., protection of critical infrastructure, network security, etc.). However, little is known about the computational complexity of these problems, especially when there exist dependencies among the targets. Moreover, previous security game models do not in general scale well. In this paper, we investigate a general security game where the utility function is defined on a collection of subsets of all targets, and provide a novel theoretical framework to show how to compactly represent such a game, efficiently compute the optimal (minimax) strategies, and characterize the complexity of this problem. We apply our theoretical framework to the network security game. We characterize settings under which we find a polynomial time algorithm for computing optimal strategies. In other settings we prove the problem is NP-hard and provide an approximation algorithm.

【Keywords】: Game theory; Security; Algorithms; Complexity

103. The Dollar Auction with Spiteful Players.

Paper Link】 【Pages】:736-742

【Authors】: Marcin Waniek ; Long Tran-Thanh ; Tomasz P. Michalak ; Nicholas R. Jennings

【Abstract】: The dollar auction is an auction model used to analyse the dynamics of conflict escalation. In this paper, we analyse the course of an auction when participating players are spiteful, i.e., they are motivated not only by their own profit, but also by the desire to hurt the opponent. We investigate this model for the complete information setting, both for the standard scenario and for the situation where the auction starts with non-zero bids. Our results give us insight into the possible effects of meanness on conflict escalation.

【Keywords】: dollar auction; all-pay auction; spitefulness

104. Proper Proxy Scoring Rules.

Paper Link】 【Pages】:743-749

【Authors】: Jens Witkowski ; Pavel Atanasov ; Lyle H. Ungar ; Andreas Krause

【Abstract】: Proper scoring rules can be used to incentivize a forecaster to truthfully report her private beliefs about the probabilities of future events and to evaluate the relative accuracy of forecasters. While standard scoring rules can score forecasts only once the associated events have been resolved, many applications would benefit from instant access to proper scores. In forecast aggregation, for example, it is known that using weighted averages, where more weight is put on more accurate forecasters, outperforms simple averaging of forecasts. We introduce proxy scoring rules, which generalize proper scoring rules and, given access to an appropriate proxy, allow for immediate scoring of probabilistic forecasts. In particular, we suggest a proxy-scoring generalization of the popular quadratic scoring rule, and characterize its incentive and accuracy evaluation properties theoretically. Moreover, we thoroughly evaluate it experimentally using data from a large real world geopolitical forecasting tournament, and show that it is competitive with proper scoring rules when the number of questions is small.

【Keywords】: Information Elicitation; Forecasting; Proper Scoring Rules; Peer Prediction; Mechanism Design; Incentive Schemes

105. Randomized Mechanisms for Selling Reserved Instances in Cloud Computing.

Paper Link】 【Pages】:750-757

【Authors】: Jia Zhang ; Weidong Ma ; Tao Qin ; Xiaoming Sun ; Tie-Yan Liu

【Abstract】: Selling reserved instances (or virtual machines) is a basic service in cloud computing. In this paper, we consider a more flexible pricing model for instance reservation, in which a customer can propose the time length and number of resources of her request, while in today's industry, customers can only choose from several predefined reservation packages. Under this model, we design randomized mechanisms for customers coming online to optimize social welfare and providers' revenue. We first consider a simple case, where the requests from the customers do not vary too much in terms of both length and value density. We design a randomized mechanism that achieves a competitive ratio 1/42 for both social welfare and revenue, which is an improvement as there is usually no revenue guarantee in previous works such as (Azar et al. 2015; Wang et al. 2015). This ratio can be improved up to 1/11 when we impose a realistic constraint on the maximum number of resources used by each request. On the hardness side, we show an upper bound 1/3 on competitive ratio for any randomized mechanism. We then extend our mechanism to the general case and achieve a competitive ratio 1/42⌈log k ⌉ log T for both social welfare and revenue, where T is the ratio of the maximum request length to the minimum request length and k is the ratio of the maximum request value density to the minimum request value density. This result outperforms the previous upper bound 1/CkT for deterministic mechanisms (Wang et al. 2015). We also prove an upper bound 2/log 8 kT for any randomized mechanism. All the mechanisms we provide are in a greedy style. They are truthful and easy to integrate into practical cloud systems.

【Keywords】: online scheduling; cloud computing; truthful mechanism design; randomized algorithm

106. Embedded Bandits for Large-Scale Black-Box Optimization.

Paper Link】 【Pages】:758-764

【Authors】: Abdullah Al-Dujaili ; Sundaram Suresh

【Abstract】: Random embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This paper proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism in the face of uncertainty principle and breaking away from the multiple-run framework in which random embedding has been conventionally applied similar to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm expands optimistically a partitioning tree over a low-dimensional (equal to the effective dimension of the problem) search space based on a bounded number of random embeddings of sampled points from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm presents a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low.

【Keywords】: Large-Scale Optimization; Black-Box Optimization; Derivative-Free Optimization; Machine Learning; Big Data

Paper Link】 【Pages】:765-772

【Authors】: Carlos Ansótegui ; Josep Pon ; Meinolf Sellmann ; Kevin Tierney

【Abstract】: Metaheuristics have been developed to provide general purpose approaches for solving hard combinatorial problems. While these frameworks often serve as the starting point for the development of problem-specific search procedures, they very rarely work efficiently in their default state. We combine the ideas of reactive search, which adjusts key parameters during search, and algorithm configuration, which fine-tunes algorithm parameters for a given set of problem instances, for the automatic compilation of a portfolio of highly reactive dialectic search heuristics for MaxSAT. Even though the dialectic search metaheuristic knows nothing more about MaxSAT than how to evaluate the cost of a truth assignment, our automatically generated solver defines a new state of the art for random weighted partial MaxSAT instances. Moreover, when combined with an industrial MaxSAT solver, the self-assembled reactive portfolio was able to win four out of nine gold medals at the recent 2016 MaxSAT Evaluation on random, crafted, and industrial partial and weighted-partial MaxSAT instances.

【Keywords】: heuristic search; reactive search; optimization; maxsat

108. Efficient Parameter Importance Analysis via Ablation with Surrogates.

Paper Link】 【Pages】:773-779

【Authors】: Andre Biedenkapp ; Marius Thomas Lindauer ; Katharina Eggensperger ; Frank Hutter ; Chris Fawcett ; Holger H. Hoos

【Abstract】: To achieve peak performance, it is often necessary to adjust the parameters of a given algorithm to the class of problem instances to be solved; this is known to be the case for popular solvers for a broad range of AI problems, including AI planning, propositional satisfiability (SAT) and answer set programming (ASP). To avoid tedious and often highly sub-optimal manual tuning of such parameters by means of ad-hoc methods, general-purpose algorithm configuration procedures can be used to automatically find performance-optimizing parameter settings. While impressive performance gains are often achieved in this manner, additional, potentially costly parameter importance analysis is required to gain insights into what parameter changes are most responsible for those improvements. Here, we show how the running time cost of ablation analysis, a well-known general-purpose approach for assessing parameter importance, can be reduced substantially by using regression models of algorithm performance constructed from data collected during the configuration process. In our experiments, we demonstrate speed-up factors between 33 and 14 727 for ablation analysis on various configuration scenarios from AI planning, SAT, ASP and mixed integer programming (MIP).

【Keywords】: Algorithm Configuration; Parameter Importance; Performance Prediction

109. Problem Difficulty and the Phase Transition in Heuristic Search.

Paper Link】 【Pages】:780-786

【Authors】: Eldan Cohen ; J. Christopher Beck

【Abstract】: In recent years, there has been significant work on the difficulty of heuristic search problems, identifying different problem instance characteristics that can have a significant impact on search effort. Phase transitions in the solubility of random problem instances have proved useful in the study of problem difficulty for other classes of computational problems, notably SAT and CSP, and it has been shown that the hardest problems typically occur during this rapid transition. In this work, we perform the first empirical investigation of the phase transition phenomena for heuristic search. We establish the existence of a rapid transition in the solubility of an abstract model of heuristic search problems and show that, for greedy best first search, the hardest instances are associated with the phase transition region. We then perform a novel investigation of the behavior of heuristics of different strength across the solubility spectrum. Finally, we demonstrate that the behavior of our abstract model carries over to commonly used benchmark problems including the Pancake Problem, Grid Navigation, TopSpin, and the Towers of Hanoi. An interesting deviation is observed and explained in the Sliding Puzzle.

【Keywords】: Heuristic search, Phase transition, Greedy Best First Search, GBFS, Problem Hardness

110. Automatic Logic-Based Benders Decomposition with MiniZinc.

Paper Link】 【Pages】:787-793

【Authors】: Toby O. Davies ; Graeme Gange ; Peter J. Stuckey

【Abstract】: Logic-based Benders decomposition (LBBD) is a powerful hybrid optimisation technique that can combine the strong dual bounds of mixed integer programming (MIP) with the combinatorial search strengths of constraint programming (CP). A major drawback of LBBD is that it is a far more involved process to implement an LBBD solution to a problem than the "model-and-run" approach provided by both CP and MIP. We propose an automated approach that accepts an arbitrary MiniZinc model and solves it using LBBD with no additional intervention on the part of the modeller. The design of this approach also reveals an interesting duality between LBBD and large neighborhood search (LNS). We compare our implementation of this approach to CP and MIP solvers on 4 different problem classes where LBBD has been applied before.

【Keywords】: Combinatorial Optimisation; Constraint Programming; Logic-Based Benders Decomposition

111. Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization.

Paper Link】 【Pages】:794-800

【Authors】: Cong Fang ; Zhouchen Lin

【Abstract】: Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands for modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems. Analysis on nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an asymptotic O(\frac{1}{\epsilon^2}) rate (convergence to the stationary points) on nonconvex problems. In this paper, we study Stochastic Variance Reduced Gradient (SVRG) in the asynchronous setting. We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems. We develop two schemes for ASVRG, depending on whether the parameters are updated as an atom or not. We prove that both schemes can achieve linear speed up (a non-asymptotic O(\frac{n^\frac{2}{3}}{\epsilon}) rate to the stationary points) for nonconvex problems when the delay parameter \tau\leq n^{\frac{1}{3}}, where n is the number of training samples. We also establish a non-asymptotic O(\frac{n^\frac{2}{3}\tau^\frac{1}{3}}{\epsilon}) rate (convergence to the stationary points) for our algorithm without assumptions on \tau. This further demonstrates that even with asynchronous updating, SVRG requires fewer Incremental First-order Oracle (IFO) calls than Stochastic Gradient Descent and Gradient Descent. We also experiment on a shared memory multi-core system to demonstrate the efficiency of our algorithm.

【Keywords】: Asynchronous; Variance Reduction; Nonconvex

112. A Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search.

Paper Link】 【Pages】:801-807

【Authors】: Tobias Friedrich ; Timo Kötzing ; Markus Wagner

【Abstract】: A common strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. However, while specific restart strategies have been developed for specific problems (and specific algorithms), restarts are typically not regarded as a general tool to speed up an optimization algorithm. In fact, many optimization algorithms do not employ restarts at all. Recently, "bet-and-run" was introduced in the context of mixed-integer programming, where first a number of short runs with randomized initial conditions is made, and then the most promising run of these is continued. In this article, we consider two classical NP-complete combinatorial optimization problems, traveling salesperson and minimum vertex cover, and study the effectiveness of different bet-and-run strategies. In particular, our restart strategies do not take any problem knowledge into account, nor are tailored to the optimization algorithm. Therefore, they can be used off-the-shelf. We observe that state-of-the-art solvers for these problems can benefit significantly from restarts on standard benchmark instances.

【Keywords】: heuristic search; erraticism
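
The bet-and-run scheme described in the abstract (several short runs from randomized starts, then continuing only the most promising one) can be sketched generically. The solver below is a toy hill climber and the objective is a one-dimensional quadratic, both chosen purely for illustration. All function and parameter names are assumptions, not the paper's setup.

```python
import random


def local_search(state, steps, rng, cost, neighbor):
    """Toy hill climber: accept a random neighbor whenever it is not worse."""
    best = state
    for _ in range(steps):
        candidate = neighbor(best, rng)
        if cost(candidate) <= cost(best):
            best = candidate
    return best


def bet_and_run(init, cost, neighbor, runs, short_steps, total_steps, seed=0):
    """Phase 1: several short runs. Phase 2: spend the remaining budget on the best one."""
    rng = random.Random(seed)
    candidates = [
        local_search(init(rng), short_steps, rng, cost, neighbor) for _ in range(runs)
    ]
    most_promising = min(candidates, key=cost)
    remaining = max(0, total_steps - runs * short_steps)
    return local_search(most_promising, remaining, rng, cost, neighbor)


def toy_cost(x):
    return (x - 3) ** 2


result = bet_and_run(
    init=lambda rng: rng.randint(-10, 10),
    cost=toy_cost,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    runs=4,
    short_steps=20,
    total_steps=1080,
)
```

The strategy uses no problem knowledge: it only needs a way to start, step, and score a run, which is what makes it usable off the shelf.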

113. The Simultaneous Maze Solving Problem.

Paper Link】 【Pages】:808-814

【Authors】: Stefan Funke ; André Nusser ; Sabine Storandt

【Abstract】: A grid maze is a binary matrix where fields containing a 0 are accessible while fields containing a 1 are blocked. A movement sequence consists of relative movements up, down, left, right – moving to a blocked field results in non-movement. The simultaneous maze solving problem asks for the shortest movement sequence starting in the upper left corner and visiting the lower right corner for all mazes of size n × m (for which a path from the upper left to the lower right corner exists at all). We present a theoretical problem analysis, including hardness results and a cubic upper bound on the sequence length. In addition, we describe several approaches to practically compute solving sequences and lower bounds despite the high combinatorial complexity of the problem.

【Keywords】: conformant planning; maze; universal traversal sequence; solving sequence
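
The movement semantics from the abstract can be simulated directly. The sketch below checks whether one movement sequence solves a given set of grid mazes. Encoding moves as the letters U, D, L, R and treating a move off the grid as non-movement are assumptions made for this example.

```python
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def solves(maze, sequence):
    """True if the sequence, started in the upper left, visits the lower right corner."""
    rows, cols = len(maze), len(maze[0])
    goal = (rows - 1, cols - 1)
    r, c = 0, 0
    if (r, c) == goal:
        return True
    for step in sequence:
        dr, dc = MOVES[step]
        nr, nc = r + dr, c + dc
        # Moving onto a blocked field (1) or off the grid results in non-movement.
        if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0:
            r, c = nr, nc
        if (r, c) == goal:
            return True
    return False


def solves_all(mazes, sequence):
    """True if the same sequence solves every maze in the collection."""
    return all(solves(maze, sequence) for maze in mazes)


mazes = [
    [[0, 0], [1, 0]],  # only the path right-then-down is open
    [[0, 1], [0, 0]],  # only the path down-then-right is open
]
```

For the two 2 × 2 mazes above, "RDR" works for both, while "RD" fails on the second maze because the initial R is blocked.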

114. Going Beyond Primal Treewidth for (M)ILP.

Paper Link】 【Pages】:815-821

【Authors】: Robert Ganian ; Sebastian Ordyniak ; M. S. Ramanujan

【Abstract】: Integer Linear Programming (ILP) and its mixed variant (MILP) are archetypical examples of NP-complete optimization problems which have a wide range of applications in various areas of artificial intelligence. However, we still lack a thorough understanding of which structural restrictions make these problems tractable. Here we focus on structure captured via so-called decompositional parameters, which have been highly successful in fields such as boolean satisfiability and constraint satisfaction but have not yet reached their full potential in the ILP setting. In particular, primal treewidth (an established decompositional parameter) can only be algorithmically exploited to solve ILP under restricted circumstances. Our main contribution is the introduction and algorithmic exploitation of two new decompositional parameters for ILP and MILP. The first, torso-width, is specifically tailored to the linear programming setting and is the first decompositional parameter which can also be used for MILP. The latter, incidence treewidth, is a concept which originates from boolean satisfiability but has not yet been used in the ILP setting; here we obtain a full complexity landscape mapping the precise conditions under which incidence treewidth can be used to obtain efficient algorithms. Both of these parameters overcome previous shortcomings of primal treewidth for ILP in unique ways, and consequently push the frontiers of tractability for these important problems.

【Keywords】: (mixed) integer linear programming; (torso/incidence) treewidth; parameterized complexity; complexity landscape

115. Efficient Hyperparameter Optimization for Deep Learning Algorithms Using Deterministic RBF Surrogates.

Paper Link】 【Pages】:822-829

【Authors】: Ilija Ilievski ; Taimoor Akhtar ; Jiashi Feng ; Christine Annette Shoemaker

【Abstract】: Automatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires many fewer function evaluations. HORD does well in low dimensions but it is exceptionally better in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate HORD significantly outperforms the well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters.

【Keywords】: hyperparameter optimization; surrogate optimization; neural networks

116. An Exact Algorithm for the Maximum Weight Clique Problem in Large Graphs.

Paper Link】 【Pages】:830-838

【Authors】: Hua Jiang ; Chu-Min Li ; Felip Manyà

【Abstract】: We describe an exact branch-and-bound algorithm for the maximum weight clique problem (MWC), called WLMC, that is especially suited for large vertex-weighted graphs. WLMC incorporates two original contributions: a preprocessing to derive an initial vertex ordering and to reduce the size of the graph, and incremental vertex-weight splitting to reduce the number of branches in the search space. Experiments on representative large graphs from real-world applications show that WLMC greatly outperforms relevant exact and heuristic MWC algorithms, and refute the prevailing hypothesis that exact MWC algorithms are less adequate for large graphs than heuristic algorithms.

【Keywords】: Maximum Weight Clique Problem; Branch-and-Bound; Incremental Search; Exact Algorithm

117. Learning to Prune Dominated Action Sequences in Online Black-Box Planning.

Paper Link】 【Pages】:839-845

【Authors】: Yuu Jinnai ; Alex S. Fukunaga

【Abstract】: Black-box domains, where the successor states that result from applying an action are generated by a completely opaque simulator, pose a challenge for domain-independent planning. The main computational bottleneck in search-based planning for such domains is the number of calls to the black-box simulation. We propose a method for significantly reducing the number of calls to the simulator by the search algorithm by detecting and pruning sequences of actions which are dominated by others. We apply our pruning method to Iterated Width and breadth-first search in domain-independent black-box planning for Atari 2600 games in the Arcade Learning Environment (ALE); adding our pruning method significantly improves upon the baseline algorithms.

【Keywords】: Black-box Planning; Online Search; Arcade Learning Environment

Paper Link】 【Pages】:846-852

【Authors】: Maximilian Katzmann ; Christian Komusiewicz

【Abstract】: We investigate the potential of exhaustively exploring larger neighborhoods in local search algorithms for Minimum Vertex Cover. More precisely, we study whether, for moderate values of k, it is feasible and worthwhile to determine, given a graph G with vertex cover C, if there is a k-swap S such that (C ∖ S) ∪ (S ∖ C) is a smaller vertex cover of G. First, we describe an algorithm running in ∆^{O(k)} ⋅ n time for searching the k-swap neighborhood on n-vertex graphs with maximum degree ∆. Then, we demonstrate that, by devising additional pruning rules that decrease the size of the search space, this algorithm can be implemented so that it solves the problem quickly for k ≈ 20. Finally, we show that it is worthwhile to consider moderately-sized k-swap neighborhoods. For our benchmark data set, we show that when combining our algorithm with a hill-climbing approach, the solution quality improves quickly with the radius k of the local search neighborhood and that in most cases optimal solutions can be found by setting k = 21.

【Keywords】: NP-hard problems; parameterized algorithm; graph problem
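
The k-swap neighborhood in the abstract can be illustrated by exhaustive enumeration: a swap S turns the cover C into the symmetric difference (C ∖ S) ∪ (S ∖ C). The sketch below tries every vertex subset of size at most k, so it is a naive baseline for tiny graphs and not the ∆^{O(k)} ⋅ n algorithm from the paper. Function names are assumptions.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


def improving_swap(edges, cover, k):
    """Return a swap S with |S| <= k yielding a smaller vertex cover, or None."""
    vertices = sorted({v for edge in edges for v in edge})
    for size in range(1, k + 1):
        for swap in combinations(vertices, size):
            candidate = cover ^ set(swap)  # (C \ S) ∪ (S \ C)
            if len(candidate) < len(cover) and is_vertex_cover(edges, candidate):
                return set(swap)
    return None


# Star graph: center 0 with leaves 1, 2, 3. The cover {1, 2, 3} is locally
# optimal for 1- and 2-swaps, but a 3-swap brings in the center and drops two leaves.
star = [(0, 1), (0, 2), (0, 3)]
```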

119. New Lower Bound for the Minimum Sum Coloring Problem.

Paper Link】 【Pages】:853-859

【Authors】: Clément Lecat ; Corinne Lucet ; Chu-Min Li

【Abstract】: The Minimum Sum Coloring Problem (MSCP) is an NP-Hard problem derived from the graph coloring problem (GCP) and has practical applications in different domains such as VLSI design, distributed resource allocation, and scheduling. There exist few exact solutions for MSCP, probably because its search space is much more elusive than that of GCP. By contrast, much effort has been spent in the literature to develop upper and lower bounds for MSCP. In this paper, we borrow a notion called motif, that was used in a recent work for upper bounding the minimum number of colors in an optimal solution of MSCP, to develop a new algebraic lower bound called for MSCP. Experiments on standard benchmarks for MSCP and GCP show that this new lower bound is substantially better than the existing lower bounds for several families of graphs.

【Keywords】: graph coloring; sum coloring; scheduling

Paper Link】 【Pages】:860-867

【Authors】: Qi Lou ; Rina Dechter ; Alexander T. Ihler

【Abstract】: Bounding the partition function is a key inference task in many graphical models. In this paper, we develop an anytime anyspace search algorithm taking advantage of AND/OR tree structure and optimized variational heuristics to tighten deterministic bounds on the partition function. We study how our priority-driven best-first search scheme can improve on state-of-the-art variational bounds in an anytime way within limited memory resources, as well as the effect of the AND/OR framework to exploit conditional independence structure within the search process within the context of summation. We compare our resulting bounds to a number of existing methods, and show that our approach offers a number of advantages on real-world problem instances taken from recent UAI competitions.

【Keywords】: graphical models, AND/OR search, partition function, anytime, anyspace

121. Dancing with Decision Diagrams: A Combined Approach to Exact Cover.

Paper Link】 【Pages】:868-874

【Authors】: Masaaki Nishino ; Norihito Yasuda ; Shin-ichi Minato ; Masaaki Nagata

【Abstract】: Exact cover is the problem of finding subfamilies, S*, of a family of sets, S, over universe U, where S* forms a partition of U. It is a popular NP-hard problem appearing in a wide range of computer science studies. Knuth's algorithm DLX, a backtracking-based depth-first search implemented with the data structure called dancing links, is known as state-of-the-art for finding all exact covers. We propose a method to accelerate DLX. Our method constructs a Zero-suppressed Binary Decision Diagram (ZDD) that represents the set of solutions while running depth-first search in DLX. Constructing ZDDs enables the efficient use of memo cache to speed up the search. Moreover, our method has a virtue that it outputs ZDDs; we can perform several useful operations with them. Experiments confirm that the proposed method is up to several orders of magnitude faster than DLX.

【Keywords】: Exact Cover; Dancing Links; Enumeration; Zero-suppressed Binary Decision Diagrams
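
To make the problem statement concrete, here is a minimal sketch of enumerating all exact covers by plain backtracking. It is not DLX (it uses ordinary Python sets rather than dancing links) and it builds no ZDD or memo cache; the function name `exact_covers` and the small instance are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Iterator, List, Set


def exact_covers(universe: Set[int], family: Dict[str, Set[int]]) -> Iterator[List[str]]:
    """Yield every subfamily of `family` whose sets partition `universe`.

    Assumes every set in `family` is a subset of `universe`.
    """

    def search(remaining: Set[int], candidates: List[str], chosen: List[str]) -> Iterator[List[str]]:
        if not remaining:
            yield list(chosen)
            return
        # Branch on the element covered by the fewest candidate sets.
        element = min(remaining, key=lambda e: sum(e in family[name] for name in candidates))
        for name in candidates:
            if element not in family[name]:
                continue
            chosen.append(name)
            # Only sets disjoint from the chosen one can still be used.
            rest = [other for other in candidates if not (family[other] & family[name])]
            yield from search(remaining - family[name], rest, chosen)
            chosen.pop()

    yield from search(set(universe), sorted(family), [])


# Small illustrative instance over the universe {1, ..., 7}.
family = {
    "A": {1, 4, 7},
    "B": {1, 4},
    "C": {4, 5, 7},
    "D": {3, 5, 6},
    "E": {2, 3, 6, 7},
    "F": {2, 7},
}
solutions = [sorted(cover) for cover in exact_covers(set(range(1, 8)), family)]
```

On this instance the only exact cover is the subfamily B, D, F. Branching on the least-covered element keeps the search tree small, which is the same heuristic idea usually paired with dancing links.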

122. Solving High-Dimensional Multi-Objective Optimization Problems with Low Effective Dimensions.

Paper Link】 【Pages】:875-881

【Authors】: Hong Qian ; Yang Yu

【Abstract】: Multi-objective (MO) optimization problems require simultaneously optimizing two or more objective functions. An MO algorithm needs to find solutions that reach different optimal balances of the objective functions, i.e., the optimal Pareto front. Therefore, high dimensionality of the solution space can hurt MO optimization much more severely than single-objective optimization, an issue that was little addressed in previous studies. This paper proposes a general, theoretically-grounded yet simple approach ReMO, which can scale current derivative-free MO algorithms to the high-dimensional non-convex MO functions with low effective dimensions, using random embedding. We prove the conditions under which an MO function has a low effective dimension, and for such functions, we prove that ReMO possesses the desirable properties of optimal Pareto front preservation, time complexity reduction, and rotation perturbation invariance. Experimental results indicate that ReMO is effective for optimizing the high-dimensional MO functions with low effective dimensions, and is even effective for the high-dimensional MO functions where all dimensions are effective but most only have a small and bounded effect on the function value.

【Keywords】: multi-objective optimization; high-dimensional optimization; derivative-free optimization; random embedding

123. Automated Data Extraction Using Predictive Program Synthesis.

Paper Link】 【Pages】:882-890

【Authors】: Mohammad Raza ; Sumit Gulwani

【Abstract】: In recent years there has been rising interest in the use of programming-by-example techniques to assist users in data manipulation tasks. Such techniques rely on an explicit input-output examples specification from the user to automatically synthesize programs. However, in a wide range of data extraction tasks it is easy for a human observer to predict the desired extraction by just observing the input data itself. Such predictive intelligence has not yet been explored in program synthesis research, and is what we address in this work. We describe a predictive program synthesis algorithm that infers programs in a general form of extraction DSLs (domain specific languages) given input-only examples. We describe concrete instantiations of such DSLs and the synthesis algorithm in the two practical application domains of text extraction and web extraction, and present an evaluation of our technique on a range of extraction tasks encountered in practice.

【Keywords】: programming-by-example

124. Grid Pathfinding on the 2^k Neighborhoods.

Paper Link】 【Pages】:891-897

【Authors】: Nicolas Rivera ; Carlos Hernández ; Jorge A. Baier

【Abstract】: Grid pathfinding, an old AI problem, is central for the development of navigation systems for autonomous agents. A surprising fact about the vast literature on this problem is that very limited neighborhoods have been studied. Indeed, only the 4- and 8-neighborhoods are usually considered, and rarely the 16-neighborhood. This paper describes three contributions that enable the construction of effective grid path planners for extended 2^k-neighborhoods. First, we provide a simple recursive definition of the 2^k-neighborhood in terms of the 2^(k-1)-neighborhood. Second, we derive distance functions, for any k > 1, which allow us to propose admissible heuristics which are perfect for obstacle-free grids. Third, we describe a canonical ordering which allows us to implement a version of A* whose performance scales well when increasing k. Our empirical evaluation shows that the heuristics we propose are superior to the Euclidean distance (ED) when regular A* is used. For grids beyond 64 the overhead of computing the heuristic yields decreased time performance compared to the ED. We also found that a configuration of our A*-based implementation, without canonical orders, is competitive with the "any-angle" path planner Theta* both in terms of solution quality and runtime.

【Keywords】: 16-neighborhood; 32-neighborhood; A*; canonical orderings
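
As an illustration of what "perfect for obstacle-free grids" means, the sketch below covers only the two familiar cases, the 4-neighborhood (k = 2) and the 8-neighborhood (k = 3), using the well-known Manhattan and octile distances. It does not reproduce the paper's general distance functions for larger k; the helper names and the Dijkstra check are assumptions added here for illustration.

```python
import heapq
import math
from typing import List, Tuple

Cell = Tuple[int, int]
MOVES_4: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1)]
MOVES_8: List[Cell] = MOVES_4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def manhattan_distance(dx: int, dy: int) -> float:
    """Exact obstacle-free distance for the 4-neighborhood."""
    return float(abs(dx) + abs(dy))


def octile_distance(dx: int, dy: int) -> float:
    """Exact obstacle-free distance for the 8-neighborhood."""
    dx, dy = abs(dx), abs(dy)
    return (max(dx, dy) - min(dx, dy)) + math.sqrt(2) * min(dx, dy)


def grid_distance(size: int, moves: List[Cell], start: Cell, goal: Cell) -> float:
    """Dijkstra on an obstacle-free size x size grid; move cost is Euclidean length."""
    dist = {start: 0.0}
    frontier = [(0.0, start)]
    while frontier:
        d, (x, y) = heapq.heappop(frontier)
        if (x, y) == goal:
            return d
        if d > dist[(x, y)]:
            continue  # stale queue entry
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                nd = d + math.hypot(dx, dy)
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    heapq.heappush(frontier, (nd, nxt))
    return math.inf


# The heuristic value should match the true shortest-path cost.
true_8 = grid_distance(8, MOVES_8, (0, 0), (3, 5))
true_4 = grid_distance(8, MOVES_4, (0, 0), (3, 5))
```

Because each heuristic equals the true cost when no obstacles are present, it never overestimates once obstacles are added, so it stays admissible.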

125. Non-Monotone DR-Submodular Function Maximization.

Paper Link】 【Pages】:898-904

【Authors】: Tasuku Soma ; Yuichi Yoshida

【Abstract】: We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing non-monotone DR-submodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a 1/(2+ε)-approximation algorithm with a running time of roughly O(n/ε log^2 B), where n is the size of the ground set, B is the maximum value of a coordinate, and ε > 0 is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

【Keywords】: Submodular Function, Approximate Algorithms
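
The diminishing return property on the integer lattice can be checked by brute force on a small box. The sketch below tests the standard definition, f(x + e_i) - f(x) >= f(y + e_i) - f(y) whenever x <= y coordinate-wise; it is only a definition check, not the approximation algorithm from the paper, and the function name and the two example functions are assumptions chosen for illustration.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def is_dr_submodular(f: Callable[[Point], int], n: int, B: int) -> bool:
    """Brute-force check of the diminishing return property on {0, ..., B}^n."""
    points = list(product(range(B + 1), repeat=n))
    for x in points:
        for y in points:
            if any(a > b for a, b in zip(x, y)):
                continue  # only compare pairs with x <= y coordinate-wise
            for i in range(n):
                if y[i] == B:
                    continue  # y + e_i would leave the box
                x_up = x[:i] + (x[i] + 1,) + x[i + 1:]
                y_up = y[:i] + (y[i] + 1,) + y[i + 1:]
                if f(x_up) - f(x) < f(y_up) - f(y):
                    return False
    return True


def concave_of_sum(x: Point) -> int:
    """s * (6 - s) with s = sum(x): non-monotone, with diminishing returns."""
    s = sum(x)
    return s * (6 - s)


def convex_of_sum(x: Point) -> int:
    """s^2 with s = sum(x): marginal gains grow, so the property fails."""
    return sum(x) ** 2


dr_holds = is_dr_submodular(concave_of_sum, n=2, B=3)
dr_fails = is_dr_submodular(convex_of_sum, n=2, B=3)
```

The check enumerates all pairs of points, so it is only practical for tiny n and B; it is meant to clarify the definition rather than to scale.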

126. Regret Ratio Minimization in Multi-Objective Submodular Function Maximization.

Paper Link】 【Pages】:905-911

【Authors】: Tasuku Soma ; Yuichi Yoshida

【Abstract】: Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective functions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable trade-off between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

【Keywords】: submodular function, multiobjective optimization

127. Value Compression of Pattern Databases.

Paper Link】 【Pages】:912-918

【Authors】: Nathan R. Sturtevant ; Ariel Felner ; Malte Helmert

【Abstract】: One common pattern database compression technique is to merge adjacent database entries and store the minimum of merged entries to maintain heuristic admissibility. In this paper we propose a compression technique that preserves every entry, but reduces the number of bits used to store each entry, therefore limiting the values that can be represented. Even when this technique throws away low values in the heuristic, it can still have better performance than the traditional approach. We develop a theoretical basis for selecting which values to keep and show improved performance in both unidirectional and bidirectional search.

【Keywords】: heuristic; pattern database; compression
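
The contrast between the two compression styles can be sketched on a toy table. `min_compress` follows the traditional approach described above (merge adjacent entries, keep the minimum). `value_compress` is one plausible reading of the proposed idea, under the assumption that each entry is rounded down to the nearest representable value so the heuristic stays admissible; the function names, the choice of kept values, and the toy table are all hypothetical and not taken from the paper.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def min_compress(pdb: Sequence[int], factor: int) -> List[int]:
    """Traditional style: merge runs of `factor` adjacent entries, store the minimum."""
    return [min(pdb[i:i + factor]) for i in range(0, len(pdb), factor)]


def value_compress(pdb: Sequence[int], kept_values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Keep every entry, but store only an index into a small table of kept values.

    Assumption: each entry is rounded DOWN to the largest kept value that does
    not exceed it, so the decoded heuristic never overestimates.
    """
    kept = sorted(set(kept_values))
    if kept[0] > min(pdb):
        raise ValueError("kept_values must include a value <= the smallest entry")
    codes = [bisect_right(kept, value) - 1 for value in pdb]
    return codes, kept


def lookup(codes: Sequence[int], kept: Sequence[int], index: int) -> int:
    """Decode one entry of the value-compressed table."""
    return kept[codes[index]]


pdb = [0, 1, 2, 3, 4, 5, 6, 7]                       # toy table, 3 bits per entry
merged = min_compress(pdb, 2)                        # 4 entries at full width
codes, kept = value_compress(pdb, [0, 3, 5, 7])      # 8 entries, 2 bits per entry
decoded = [lookup(codes, kept, i) for i in range(len(pdb))]
```

With four kept values, each code fits in 2 bits, and every decoded value is at most the original entry. Which values to keep is exactly the selection question the abstract says the paper addresses theoretically; the choice here is arbitrary.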

128. A Fast Algorithm to Compute Maximum k-Plexes in Social Network Analysis.

Paper Link】 【Pages】:919-925

【Authors】: Mingyu Xiao ; Weibo Lin ; Yuanshun Dai ; Yifeng Zeng

【Abstract】: A clique model is one of the most important techniques for cohesive subgraph detection; however, its applications are rather limited due to restrictive conditions of the model. Hence much research resorts to the k-plex (a graph in which any vertex is adjacent to all but at most k vertices), which is a relaxation model of the clique. In this paper, we study the maximum k-plex problem and propose a fast algorithm to compute maximum k-plexes by exploiting structural properties of the problem. In an n-vertex graph, the algorithm computes optimal solutions in c^n n^O(1) time for a constant c < 2 depending only on k. To the best of our knowledge, this is the first algorithm that breaks the trivial theoretical bound of 2^n for each k ≥ 3. We also provide experimental results over multiple real-world social network instances in support.

【Keywords】: Exact algorithms; Social network; k-plex
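
The k-plex definition can be checked directly. The sketch below assumes the common convention that a vertex counts itself among the at most k vertices it is not adjacent to, so each vertex needs at least |S| - k neighbors inside S and a 1-plex is a clique. It only verifies a given vertex set; it is not the paper's maximum k-plex algorithm, and the function name and example graph are illustrative.

```python
from typing import Dict, Iterable, Set


def is_k_plex(adj: Dict[str, Set[str]], vertices: Iterable[str], k: int) -> bool:
    """Return True if `vertices` induces a k-plex in the graph `adj`.

    Convention assumed here: every vertex has at least |S| - k neighbors
    inside S, i.e. it misses at most k vertices counting itself.
    """
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


# A 4-cycle a-b-c-d-a: every vertex misses itself and the opposite vertex.
cycle = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
cycle_is_clique = is_k_plex(cycle, "abcd", 1)
cycle_is_2_plex = is_k_plex(cycle, "abcd", 2)
```

The 4-cycle is not a clique but is a 2-plex, which shows how the relaxation admits cohesive subgraphs that the clique model rejects.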

129. A Unified Convex Surrogate for the Schatten-p Norm.

Paper Link】 【Pages】:926-932

【Authors】: Chen Xu ; Zhouchen Lin ; Hongbin Zha

【Abstract】: The Schatten-p norm (0 < p < 1) has been widely used to replace the nuclear norm for better approximating the rank function. However, existing methods are either 1) not scalable for large scale problems due to relying on singular value decomposition (SVD) in every iteration, or 2) specific to some p values, e.g., 1/2, and 2/3. In this paper, we show that for any p, p_1, and p_2 > 0 satisfying 1/p = 1/p_1 + 1/p_2, there is an equivalence between the Schatten-p norm of one matrix and the Schatten-p_1 and the Schatten-p_2 norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any p > 0. In contrast, the original Schatten-p norm for 0 < p < 1 is non-convex and non-smooth. As an example we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.

【Keywords】: Schatten-$p$ norm; Matrix Factorization; Matrix Completion; PALM

130. Efficient Stochastic Optimization for Low-Rank Distance Metric Learning.

Paper Link】 【Pages】:933-940

【Authors】: Jie Zhang ; Lijun Zhang

【Abstract】: Although distance metric learning has been successfully applied to many real-world applications, learning a distance metric from large-scale and high-dimensional data remains a challenging problem. Due to the PSD constraint, the computational complexity of previous algorithms per iteration is at least O(d^2) where d is the dimensionality of the data. In this paper, we develop an efficient stochastic algorithm for a class of distance metric learning problems with nuclear norm regularization, referred to as low-rank DML. By utilizing the low-rank structure of the intermediate solutions and stochastic gradients, the complexity of our algorithm has a linear dependence on the dimensionality d. The key idea is to maintain all the iterates in factorized representations and construct stochastic gradients that are low-rank. In this way, the projection onto the PSD cone can be implemented efficiently by incremental SVD. Experimental results on several data sets validate the effectiveness and efficiency of our method.

【Keywords】: Stochastic Optimization; Low-Rank DML; Incremental SVD

Human-Aware Artificial Intelligence 3

131. Examples-Rules Guided Deep Neural Network for Makeup Recommendation.

Paper Link】 【Pages】:941-947

【Authors】: Taleb Alashkar ; Songyao Jiang ; Shuyang Wang ; Yun Fu

【Abstract】: In this paper, we consider a fully automatic makeup recommendation system and propose a novel examples-rules guided deep neural network approach. The framework consists of three stages. First, makeup-related facial traits are classified into structured coding. Second, these facial traits are fed into an examples-rules guided deep neural recommendation model which makes joint use of pairs of Before-After images and makeup artist knowledge. Finally, to visualize the recommended makeup style, an automatic makeup synthesis system is developed as well. To this end, a new Before-After facial makeup database is collected and labeled manually, and the knowledge of makeup artists is modeled by a knowledge base system. The performance of this framework is evaluated through extensive experimental analyses. The experiments validate the automatic facial traits classification, the recommendation effectiveness in statistical and perceptual ways, and the makeup synthesis accuracy, which outperforms the state-of-the-art methods by a large margin. It is also worth noting that the proposed framework is, to the best of our knowledge, a pioneering fully automatic makeup recommendation system.

【Keywords】: makeup recommendation; deep learning; neural network; rule-based system

132. Predicting Latent Narrative Mood Using Audio and Physiologic Data.

Paper Link】 【Pages】:948-954

【Authors】: Tuka Waddah AlHanai ; Mohammad Mahdi Ghassemi

【Abstract】: Inferring the latent emotive content of a narrative requires consideration of para-linguistic cues (e.g. pitch), linguistic content (e.g. vocabulary) and the physiological state of the narrator (e.g. heart-rate). In this study we utilized a combination of auditory, text, and physiological signals to predict the mood (happy or sad) of 31 narrations from subjects engaged in personal story-telling. We extracted 386 audio and 222 physiological features (using the Samsung Simband) from the data. A subset of 4 audio, 1 text, and 5 physiologic features was identified using Sequential Forward Selection (SFS) for inclusion in a Neural Network (NN). These features included subject movement, cardiovascular activity, energy in speech, probability of voicing, and linguistic sentiment (i.e. negative or positive). We explored the effects of introducing our selected features at various layers of the NN and found that the location of these features in the network topology had a significant impact on model performance. To ensure the real-time utility of the model, classification was performed over 5 second intervals. We evaluated our model’s performance using leave-one-subject-out cross-validation and compared the performance to 20 baseline models and a NN with all features included in the input layer.

【Keywords】: emotion; neural networks; physiology; feature selection; acoustics

133. Collaborative Planning with Encoding of Users' High-Level Strategies.

Paper Link】 【Pages】:955-962

【Authors】: Joseph Kim ; Christopher J. Banks ; Julie A. Shah

【Abstract】: The generation of near-optimal plans for multi-agent systems with numerical states and temporal actions is computationally challenging. Current off-the-shelf planners can take a very long time before generating a near-optimal solution. In an effort to reduce plan computation time, increase the quality of the resulting plans, and make them more interpretable by humans, we explore collaborative planning techniques that actively involve human users in plan generation. Specifically, we explore a framework in which users provide high-level strategies encoded as soft preferences to guide the low-level search of the planner. Through human subject experimentation, we empirically demonstrate that this approach results in statistically significant improvements to plan quality, without substantially increasing computation time. We also show that the resulting plans achieve greater similarity to those generated by humans with regard to the produced sequences of actions, as compared to plans that do not incorporate user-provided strategies.

【Keywords】: Collaborative Planning, Human-Aware AI, Human-Robot Interaction

Human Computation and Crowd Sourcing 3

134. Long-Term Trends in the Public Perception of Artificial Intelligence.

Paper Link】 【Pages】:963-969

【Authors】: Ethan Fast ; Eric Horvitz

【Abstract】: Analyses of text corpora over time can reveal trends in beliefs, interest, and sentiment about a topic. We focus on views expressed about artificial intelligence (AI) in the New York Times over a 30-year period. General interest, awareness, and discussion about AI have waxed and waned since the field was founded in 1956. We present a set of measures that captures levels of engagement, measures of pessimism and optimism, the prevalence of specific hopes and concerns, and topics that are linked to discussions about AI over decades. We find that discussion of AI has increased sharply since 2009, and that these discussions have been consistently more optimistic than pessimistic. However, when we examine specific concerns, we find that worries of loss of control of AI, ethical concerns for AI, and the negative impact of AI on work have grown in recent years. We also find that hopes for AI in healthcare and education have increased over time.

【Keywords】: artificial intelligence; news analysis; crowdsourcing

135. A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution.

Paper Link】 【Pages】:970-976

【Authors】: Arya Mazumdar ; Barna Saha

【Abstract】: Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Due to inherent ambiguity of data representation and poor data quality, ER is a challenging task for any automated process. As a remedy, human-powered ER via crowdsourcing has become popular in recent years. Using the crowd to answer queries is costly and time consuming. Furthermore, crowd-answers can often be faulty. Therefore, crowd-based ER methods aim to minimize human participation without sacrificing the quality and use a computer generated similarity matrix actively. While some of these methods perform well in practice, no theoretical analysis exists for them, and further their worst case performances do not reflect the experimental findings. This creates a disparity in the understanding of the popular heuristics for this problem. In this paper, we make the first attempt to close this gap. We provide a thorough analysis of the prominent heuristic algorithms for crowd-based ER. We justify experimental observations with our analysis and information theoretic lower bounds.

【Keywords】: Clustering; Crowdsourcing; Algorithms; Entity Resolution

136. Pairwise HITS: Quality Estimation from Pairwise Comparisons in Creator-Evaluator Crowdsourcing Process.

Paper Link】 【Pages】:977-984

【Authors】: Takeru Sunahase ; Yukino Baba ; Hisashi Kashima

【Abstract】: A common technique for improving the quality of crowdsourcing results is to assign the same task to multiple workers redundantly, and then to aggregate the results to obtain a higher-quality result; however, this technique is not applicable to complex tasks such as article writing since there is no obvious way to aggregate the results. Instead, we can use a two-stage procedure consisting of a creation stage and an evaluation stage, where we first ask workers to create artifacts, and then ask other workers to evaluate the artifacts to estimate their quality. In this study, we propose a novel quality estimation method for the two-stage procedure where pairwise comparison results for pairs of artifacts are collected at the evaluation stage. Our method is based on an extension of Kleinberg's HITS algorithm to pairwise comparison, which takes into account the ability of evaluators as well as the ability of creators. Experiments using actual crowdsourcing tasks show that our methods outperform baseline methods especially when the number of evaluators per artifact is small.

【Keywords】: crowdsourcing; quality estimation

Humans and Artificial Intelligence 6

137. The Benefit in Free Information Disclosure When Selling Information to People.

Paper Link】 【Pages】:985-992

【Authors】: Shani Alkoby ; David Sarne

【Abstract】: This paper studies the benefit for information providers in free public information disclosure in settings where the prospective information buyers are people. The underlying model, which applies to numerous real-life situations, considers a standard decision making setting where the decision maker is uncertain about the outcomes of her decision. The information provider can fully disambiguate this uncertainty and wishes to maximize her profit from selling such information. We use a series of AMT-based experiments with people to test the benefit for the information provider from reducing some of the uncertainty associated with the decision maker's problem, for free. Free information disclosure of this kind can be proved to be ineffective when the buyer is a fully-rational agent. Yet, when it comes to people we manage to demonstrate that a substantial improvement in the information provider's profit can be achieved with such an approach. The analysis of the results reveals that the primary reason for this phenomenon is people's failure to consider the strategic nature of the interaction with the information provider. People's inability to properly calculate the value of information is found to be secondary in its influence.

【Keywords】: Free Information Disclosure; Strategic Information Provider; Human-Computer Interaction

138. Psychologically Based Virtual-Suspect for Interrogative Interview Training.

Paper Link】 【Pages】:993-1000

【Authors】: Moshe Bitan ; Galit Nahari ; Zvi Nisin ; Ariel Roth ; Sarit Kraus

【Abstract】: In this paper, we present a Virtual-Suspect system which can be used to train inexperienced law enforcement personnel in interrogation strategies. The system supports different scenario configurations based on historical data. The responses presented by the Virtual-Suspect are selected based on the psychological state of the suspect, which can be configured as well. Furthermore, each interrogator's statement affects the Virtual-Suspect's current psychological state, which may lead the interrogation in different directions. In addition, the model takes into account the context in which the statements are made. Experiments with 24 subjects demonstrate that the Virtual-Suspect's behavior is similar to that of a human who plays the role of the suspect.

【Keywords】: Virtual Human, Training, Interrogative, Interviewing, Human Studies

139. PIVE: Per-Iteration Visualization Environment for Real-Time Interactions with Dimension Reduction and Clustering.

Paper Link】 【Pages】:1001-1009

【Authors】: Hannah Kim ; Jaegul Choo ; Changhyun Lee ; Hanseung Lee ; Chandan K. Reddy ; Haesun Park

【Abstract】: One of the key advantages of visual analytics is its capability to leverage both humans' visual perception and the power of computing. A big obstacle in integrating machine learning with visual analytics is its high computing cost. To tackle this problem, this paper presents PIVE (Per-Iteration Visualization Environment) that supports real-time interactive visualization with machine learning. By immediately visualizing the intermediate results from algorithm iterations, PIVE enables users to quickly grasp insights and interact with the intermediate output, which then affects subsequent algorithm iterations. In addition, we propose a widely-applicable interaction methodology that allows efficient incorporation of user feedback into virtually any iterative computational method without introducing additional computational cost. We demonstrate the application of PIVE for various dimension reduction algorithms such as multidimensional scaling and t-SNE and clustering and topic modeling algorithms such as k-means and latent Dirichlet allocation.

【Keywords】: Real-time interaction; Large-scale visualization; Multi-threading; Clustering; Dimension reduction; Visual analytics

140. JAG: A Crowdsourcing Framework for Joint Assessment and Peer Grading.

Paper Link】 【Pages】:1010-1016

【Authors】: Igor Labutov ; Christoph Studer

【Abstract】: Generation and evaluation of crowdsourced content is commonly treated as two separate processes, performed at different times and by two distinct groups of people: content creators and content assessors. As a result, most crowdsourcing tasks follow this template: one group of workers generates content and another group of workers evaluates it. In an educational setting, for example, content creators are traditionally students that submit open-response answers to assignments (e.g., a short answer, a circuit diagram, or a formula) and content assessors are instructors that grade these submissions. Despite the considerable success of peer-grading in massive open online courses (MOOCs), the process of test-taking and grading are still treated as two distinct tasks which typically occur at different times, and require an additional overhead of grader training and incentivization. Inspired by this problem in the context of education, we propose a general crowdsourcing framework that fuses open-response test-taking (content generation) and assessment into a single, streamlined process that appears to students in the form of an explicit test, but where everyone also acts as an implicit grader. The advantages offered by our framework include: a common incentive mechanism for both the creation and evaluation of content, and a probabilistic model that jointly models the processes of contribution and evaluation, facilitating efficient estimation of the quality of the contributions and the competency of the contributors. We demonstrate the effectiveness and limits of our framework via simulations and a real-world user study.

【Keywords】: crowdsourcing; peer grading; graphical model

141. On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems.

Paper Link】 【Pages】:1017-1025

【Authors】: Besmira Nushi ; Ece Kamar ; Eric Horvitz ; Donald Kossmann

【Abstract】: We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple points in the execution workflow. Moreover, errors can propagate, become amplified or be suppressed, making blame assignment difficult. We propose a human-in-the-loop methodology which leverages human intellect for troubleshooting system failures. The approach simulates potential component fixes through human computation tasks and measures the expected improvements in the holistic behavior of the system. The method provides guidance to designers about how they can best improve the system. We demonstrate the effectiveness of the approach on an automated image captioning system that has been pressed into real-world use.

【Keywords】: troubleshooting; machine learning; crowdsourcing

142. Capturing Dependencies among Labels and Features for Multiple Emotion Tagging of Multimedia Data.

Paper Link】 【Pages】:1026-1033

【Authors】: Shan Wu ; Shangfei Wang ; Qiang Ji

【Abstract】: In this paper, we tackle the problem of emotion tagging of multimedia data by modeling the dependencies among multiple emotions in both the feature and label spaces. These dependencies, which carry crucial top-down and bottom-up evidence for improving multimedia affective content analysis, have not been thoroughly exploited yet. To this end, we propose two hierarchical models that independently and dependently learn the shared features and global semantic relationships among emotion labels to jointly tag multiple emotion labels of multimedia data. Efficient learning and inference algorithms of the proposed models are also developed. Experiments on three benchmark emotion databases demonstrate the superior performance of our methods to existing methods.

【Keywords】:

Knowledge Representation and Reasoning 33

143. On the Computation of Paracoherent Answer Sets.

Paper Link】 【Pages】:1034-1040

【Authors】: Giovanni Amendola ; Carmine Dodaro ; Wolfgang Faber ; Nicola Leone ; Francesco Ricca

【Abstract】: Answer Set Programming (ASP) is a well-established formalism for nonmonotonic reasoning. An ASP program can have no answer set due to cyclic default negation. In this case, it is not possible to draw any conclusion, even if this is not intended. Recently, several paracoherent semantics have been proposed that address this issue, and several potential applications for these semantics have been identified. However, paracoherent semantics have essentially been inapplicable in practice, due to the lack of efficient algorithms and implementations. In this paper, this lack is addressed, and several different algorithms to compute semi-stable and semi-equilibrium models are proposed and implemented into an answer set solving framework. An empirical performance comparison among the new algorithms on benchmarks from ASP competitions is given as well.

【Keywords】: Answer Set Programming; Paracoherent Answer Set; Inconsistency Tolerance; Nonmonotonic Reasoning

144. Polynomially Bounded Logic Programs with Function Symbols: A New Decidable.

Paper Link】 【Pages】:1041-1047

【Authors】: Vernon Asuncion ; Yan Zhang ; Heng Zhang

【Abstract】: A logic program with function symbols is called finitely ground if there is a finite propositional logic program whose stable models are exactly the same as the stable models of this program. Finite groundability is an important property for logic programs with function symbols because it makes it feasible to compute such a program’s stable models using traditional ASP solvers. In this paper, we introduce a new decidable class of finitely ground programs called POLY-bounded programs, which, to the best of our knowledge, strictly contains all decidable classes of finitely ground programs discovered so far in the literature. We also study the related complexity property for this class of programs. We prove that deciding whether a program is POLY-bounded is EXPTIME-complete.

【Keywords】: Knowledge representation and reasoning, logic programs, stable model semantics, complexity, decidability

145. Abstraction in Situation Calculus Action Theories.

Paper Link】 【Pages】:1048-1055

【Authors】: Bita Banihashemi ; Giuseppe De Giacomo ; Yves Lespérance

【Abstract】: We develop a general framework for agent abstraction based on the situation calculus and the ConGolog agent programming language. We assume that we have a high-level specification and a low-level specification of the agent, both represented as basic action theories. A refinement mapping specifies how each high-level action is implemented by a low-level ConGolog program and how each high-level fluent can be translated into a low-level formula. We define a notion of sound abstraction between such action theories in terms of the existence of a suitable bisimulation between their respective models. Sound abstractions have many useful properties that ensure that we can reason about the agent's actions (e.g., executability, projection, and planning) at the abstract level, and refine and concretely execute them at the low level. We also characterize the notion of complete abstraction where all actions (including exogenous ones) that the high level thinks can happen can in fact occur at the low level.

【Keywords】: Reasoning About Action; Situation Calculus; Abstraction; Agent; Bisimulation; Refinement Mapping

146. Source Information Disclosure in Ontology-Based Data Integration.

Paper Link】 【Pages】:1056-1062

【Authors】: Michael Benedikt ; Bernardo Cuenca Grau ; Egor V. Kostylev

【Abstract】: Ontology-based data integration systems allow users to effectively access data sitting in multiple sources by means of queries over a global schema described by an ontology. In practice, datasources often contain sensitive information that the data owners want to keep inaccessible to users. In this paper, we formalize and study the problem of determining whether a given data integration system discloses a source query to an attacker. We consider disclosure on a particular dataset, and also whether a schema admits a dataset on which disclosure occurs. We provide lower and upper bounds on disclosure analysis, in the process introducing a number of techniques for analyzing logical privacy issues in ontology-based data integration.

【Keywords】: logic; ontology; rule

147. Ontology-Mediated Queries for Probabilistic Databases.

Paper Link】 【Pages】:1063-1069

【Authors】: Stefan Borgwardt ; Ismail Ilkan Ceylan ; Thomas Lukasiewicz

【Abstract】: Probabilistic databases (PDBs) are usually incomplete, e.g., containing only the facts that have been extracted from the Web with high confidence. However, missing facts are often treated as being false, which leads to unintuitive results when querying PDBs. Recently, open-world probabilistic databases (OpenPDBs) were proposed to address this issue by allowing probabilities of unknown facts to take any value from a fixed probability interval. In this paper, we extend OpenPDBs by Datalog+/- ontologies, under which both upper and lower probabilities of queries become even more informative, enabling us to distinguish queries that were indistinguishable before. We show that the dichotomy between P and PP in (Open)PDBs can be lifted to the case of first-order rewritable positive programs (without negative constraints); and that the problem can become NP^PP-complete, once negative constraints are allowed. We also propose an approximating semantics that circumvents the increase in complexity caused by negative constraints.

【Keywords】: probabilistic databases; imprecise probabilities; ontology based data access; data complexity dichotomy; open-world reasoning

148. Ontology-Based Data Access with a Horn Fragment of Metric Temporal Logic.

Paper Link】 【Pages】:1070-1076

【Authors】: Sebastian Brandt ; Elem Güzel Kalayci ; Roman Kontchakov ; Vladislav Ryzhikov ; Guohui Xiao ; Michael Zakharyaschev

【Abstract】: We advocate datalogMTL, a datalog extension of a Horn fragment of the metric temporal logic MTL, as a language for ontology-based access to temporal log data. We show that datalogMTL is EXPSPACE-complete even with punctual intervals, in which case MTL is known to be undecidable. Nonrecursive datalogMTL turns out to be PSPACE-complete for combined complexity and in AC0 for data complexity. We demonstrate by two real-world use cases that nonrecursive datalogMTL programs can express complex temporal concepts from typical user queries and thereby facilitate access to log data. Our experiments with Siemens turbine data and MesoWest weather data show that datalogMTL ontology-mediated queries are efficient and scale on large datasets of up to 11GB.

【Keywords】: ontology-based data access; metric temporal logic; datalog

149. Solving Advanced Argumentation Problems with Answer-Set Programming.

Paper Link】 【Pages】:1077-1083

【Authors】: Gerhard Brewka ; Martin Diller ; Georg Heissenberger ; Thomas Linsbichler ; Stefan Woltran

【Abstract】: Powerful formalisms for abstract argumentation have been proposed. Their complexity is often located beyond NP and ranges up to the third level of the polynomial hierarchy. The combined complexity of Answer-Set Programming (ASP) exactly matches this complexity when programs are restricted to predicates of bounded arity. In this paper, we exploit this coincidence and present novel efficient translations from abstract dialectical frameworks (ADFs) and GRAPPA to ASP. We also empirically compare our approach to other systems for ADF reasoning and report promising results.

【Keywords】: argumentation; answer set programming; abstract dialectical frameworks

150. Checking the Consistency of Combined Qualitative Constraint Networks.

Paper Link】 【Pages】:1084-1090

【Authors】: Quentin Cohen-Solal ; Maroua Bouzid ; Alexandre Niveau

【Abstract】: We study the problem of consistency checking for constraint networks over combined qualitative formalisms. We propose a framework which encompasses loose integrations and a form of spatio-temporal reasoning. In particular, we identify sufficient conditions ensuring the polynomiality of consistency checking, and we use them to find tractable subclasses.

【Keywords】: Qualitative Constraint Networks; Consistency Checking; Loose Integration; Tractable Subclass; Spatial Reasoning; Temporal Reasoning; Qualitative Reasoning

151. Add Data into Business Process Verification: Bridging the Gap between Theory and Practice.

Paper Link】 【Pages】:1091-1099

【Authors】: Riccardo De Masellis ; Chiara Di Francescomarino ; Chiara Ghidini ; Marco Montali ; Sergio Tessaris

【Abstract】: The need to extend business process languages with the capability to model complex data objects along with the control flow perspective has led to significant practical and theoretical advances in the field of Business Process Modeling (BPM). On the practical side, there are several suites for control flow and data modeling; nonetheless, when it comes to formal verification, the data perspective is abstracted away due to the intrinsic difficulty of handling unbounded data. On the theoretical side, there is significant literature providing decidability results for expressive data-aware processes. However, these results struggle to produce a concrete impact because they are far from real BPM architectures and, most of all, do not provide actual verification tools. In this paper we aim at bridging this gap: we provide a concrete framework which, on the one hand, being based on Petri Nets and relational models, is close to the widely used BPM suites, and on the other is grounded on a solid formal basis that allows formal verification tasks to be performed. Moreover, we show how to encode our framework in an action language so as to perform reachability analysis using virtually any state-of-the-art planner.

【Keywords】: data-aware business process modeling; workflow nets; relational databases; formal verification; planning

152. Practical TBox Abduction Based on Justification Patterns.

Paper Link】 【Pages】:1100-1106

【Authors】: Jianfeng Du ; Hai Wan ; Huaguan Ma

【Abstract】: TBox abduction explains why an observation is not entailed by a TBox, by computing multiple sets of axioms, called explanations, such that each explanation does not entail the observation alone while appending an explanation to the TBox renders the observation entailed but does not introduce incoherence. Considering that practical explanations in TBox abduction are likely to mimic minimal explanations for TBox entailments, we introduce admissible explanations which are subsets of those justifications for the observation that are instantiated from a finite set of justification patterns. A justification pattern is obtained from a minimal set of axioms responsible for a certain atomic concept inclusion by replacing all concept (resp. role) names with concept (resp. role) variables. The number of admissible explanations is finite but can still be so large that computing all admissible explanations is impractical. Thus, we introduce a variant of subset-minimality, written ⊆_ds-minimality, which prefers fresh (concept or role) names to existing names. We propose efficient methods for computing all admissible ⊆_ds-minimal explanations and for computing all justification patterns, respectively. Experimental results demonstrate that combining the proposed methods yields a practical approach to TBox abduction.

【Keywords】: abductive reasoning; description logics; justification pattern; TBox abduction; explanation

153. The Unusual Suspects: Deep Learning Based Mining of Interesting Entity Trivia from Knowledge Graphs.

Paper Link】 【Pages】:1107-1113

【Authors】: Nausheen Fatma ; Manoj Kumar Chinnakotla ; Manish Shrivastava

【Abstract】: Trivia is any fact about an entity which is interesting due to its unusualness, uniqueness or unexpectedness. Trivia could be successfully employed to promote user engagement in various product experiences featuring the given entity. A Knowledge Graph (KG) is a semantic network which encodes various facts about entities and their relationships. In this paper, we propose a novel approach called DBpedia Trivia Miner (DTM) to automatically mine trivia for entities of a given domain in KGs. The essence of DTM lies in learning an Interestingness Model (IM), for a given domain, from human annotated training data provided in the form of interesting facts from the KG. The IM thus learnt is applied to extract trivia for other entities of the same domain in the KG. We propose two different approaches for learning the IM - a) A Convolutional Neural Network (CNN) based approach and b) Fusion Based CNN (F-CNN) approach which combines both hand-crafted and CNN features. Experiments across two different domains - Bollywood Actors and Music Artists reveal that CNN automatically learns features which are relevant to the task and shows competitive performance relative to hand-crafted feature based baselines whereas F-CNN significantly improves the performance over the baseline approaches which use hand-crafted features alone. Overall, DTM achieves an F1 score of 0.81 and 0.65 in Bollywood Actors and Music Artists domains respectively.

【Keywords】: Trivia Mining; Interesting Facts; Knowledge Graphs; Deep Learning; Convolutional Neural Networks; Supervised Learning; Entity Fact Classification

154. Ontology Materialization by Abstraction Refinement in Horn SHOIF.

Paper Link】 【Pages】:1114-1120

【Authors】: Birte Glimm ; Yevgeny Kazakov ; Trung-Kien Tran

【Abstract】: Abstraction refinement is a recently introduced technique using which reasoning over large ABoxes is reduced to reasoning over small abstract ABoxes. Although the approach is sound for any classical Description Logic such as SROIQ, it is complete only for Horn ALCHOI. In this paper, we propose an extension of this method that is now complete for Horn SHOIF and also handles role- and equality-materialization. To show completeness, we use a tailored set of materialization rules that loosely decouple the ABox from the TBox. An empirical evaluation demonstrates that, despite the new features, the abstractions are still significantly smaller than the original ontologies and the materialization can be computed efficiently.

【Keywords】: Ontology materialization; Abstraction refinement; Ontology reasoning

155. Number Restrictions on Transitive Roles in Description Logics with Nominals.

Paper Link】 【Pages】:1121-1127

【Authors】: Víctor Gutiérrez-Basulto ; Yazmin Angélica Ibáñez-García ; Jean Christoph Jung

【Abstract】: We study description logics (DLs) supporting number restrictions on transitive roles. We first take a look at SOQ and SON with binary and unary coding of numbers, and provide algorithms for the satisfiability problem and tight complexity bounds ranging from EXPTIME to NEXPTIME. We then show that by allowing for counting only up to one (functionality), inverse roles and role inclusions can be added without losing decidability. We finally investigate DLs of the DL-Lite-family, and show that, in the presence of role inclusions, the core fragment becomes undecidable.

【Keywords】: Number Restrictions; Transitive Roles; Nominals; Satisfiability; Computational Complexity

156. Strategic Sequences of Arguments for Persuasion Using Decision Trees.

Paper Link】 【Pages】:1128-1134

【Authors】: Emmanuel Hadoux ; Anthony Hunter

【Abstract】: Persuasion is an activity that involves one party (the persuader) trying to induce another party (the persuadee) to believe or do something. For this, it can be advantageous for the persuader to have a model of the persuadee. Recently, some proposals in the field of computational models of argument have been made for probabilistic models of what the persuadee knows about, or believes. However, these developments have not systematically harnessed established notions in decision theory for maximizing the outcome of a dialogue. To address this, we present a general framework for representing persuasion dialogues as a decision tree, and for using decision rules for selecting moves. Furthermore, we provide some empirical results showing how some well-known decision rules perform, and make observations about their general behaviour in the context of dialogues where there is uncertainty about the accuracy of the user model.

【Keywords】: KRR: Argumentation; Persuasion dialogue; KRR: Reasoning with Beliefs

157. Preferential Structures for Comparative Probabilistic Reasoning.

Paper Link】 【Pages】:1135-1141

【Authors】: Matthew Harrison-Trainor ; Wesley H. Holliday ; Thomas F. Icard III

【Abstract】: Qualitative and quantitative approaches to reasoning about uncertainty can lead to different logical systems for formalizing such reasoning, even when the language for expressing uncertainty is the same. In the case of reasoning about relative likelihood, with statements of the form φ ≥ ψ expressing that φ is at least as likely as ψ, a standard qualitative approach using preordered preferential structures yields a dramatically different logical system than a quantitative approach using probability measures. In fact, the standard preferential approach validates principles of reasoning that are incorrect from a probabilistic point of view. However, in this paper we show that a natural modification of the preferential approach yields exactly the same logical system as a probabilistic approach, not using single probability measures, but rather sets of probability measures. Thus, the same preferential structures used in the study of non-monotonic logics and belief revision may be used in the study of comparative probabilistic reasoning based on imprecise probabilities.

【Keywords】: comparative probability; qualitative probability; imprecise probability; preferential structures; logic; complexity

158. Query Answering in DL-Lite with Datatypes: A Non-Uniform Approach.

Paper Link】 【Pages】:1142-1148

【Authors】: André Hernich ; Julio Lemos ; Frank Wolter

【Abstract】: Adding datatypes to ontology-mediated queries (OMQs) often makes query answering hard. As a consequence, the use of datatypes in OWL 2 QL has been severely restricted. In this paper we propose a new, non-uniform, way of analyzing the data-complexity of OMQ answering with datatypes. Instead of restricting the ontology language we aim at a classification of the patterns of datatype atoms in OMQs into those that can occur in non-tractable OMQs and those that only occur in tractable OMQs. To this end we establish a close link between OMQ answering with datatypes and constraint satisfaction problems over the datatypes. In a case study we apply this link to prove a P/coNP-dichotomy for OMQs over DL-Lite extended with the datatype (Q, ≤). The proof employs a recent dichotomy result by Bodirsky and Kára for temporal constraint satisfaction problems.

【Keywords】: datatypes; conjunctive queries; data complexity; dichotomy

159. Diagnosability Planning for Controllable Discrete Event Systems.

Paper Link】 【Pages】:1149-1155

【Authors】: Hassan Ibrahim ; Philippe Dague ; Alban Grastien ; Lina Ye ; Laurent Simon

【Abstract】: In this paper, we propose an approach to ensure the diagnosability of a partially controllable system. Given a model of correct and faulty behaviors of a partially observable discrete event system, equipped with a set of elementary actions that do not intertwine with autonomous events, we search for a diagnosability plan, i.e., a sequence of applicable actions that leads the system from an initial belief state (a set of potentially current states) to a diagnosable belief state, in which the system is then left to run freely. This helps in reducing the diagnosis interaction with running systems and can be applied, e.g., on the output of a repair plan, like in power networks. The two successive stages of this approach keep diagnosability planning, including diagnosability tests, in PSpace in comparison to the Exptime test for the more complex active diagnosability usually used in such cases. For this, we propose to construct incrementally the twin plant structure of the given system and to exploit its parts already constructed while testing the candidate plans and constructing its next parts. This helps in pruning the twin plant constructions and many non-diagnosability plan tests. We have created a special benchmark and tested three proposed methods, according to the recycling level of twin plants construction, with one cost function used for plan optimality and an optional heuristic.

【Keywords】: DES; Diagnosability; Planning; Twin Plant Recycling

160. Entropic Causal Inference.

Paper Link】 【Pages】:1156-1162

【Authors】: Murat Kocaoglu ; Alexandros G. Dimakis ; Sriram Vishwanath ; Babak Hassibi

【Abstract】: We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low H0 entropy (cardinality) in the true direction, it must have high H0 entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum entropy exogenous variable. We show that the problem of finding the exogenous variable with minimum H1 entropy (Shannon entropy) is equivalent to the problem of finding minimum joint entropy given n marginal distributions, also known as the minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem, that for n=2 provably finds a local optimum. This gives a greedy algorithm for finding the exogenous variable with minimum Shannon entropy. Our greedy entropy-based causal inference algorithm has similar performance to the state-of-the-art additive noise models in real datasets. One advantage of our approach is that we make no use of the values of random variables but only their distributions. Our method can therefore be used for causal inference for both ordinal and categorical data, unlike additive noise models.

【Keywords】: causal inference; entropy minimization; cause-effect pairs; categorical variables
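
The minimum entropy coupling problem mentioned in the abstract can be illustrated with a small greedy sketch. This is one plausible greedy scheme, not necessarily the authors' exact algorithm: each round takes the largest remaining mass in every marginal and assigns the smallest of those maxima to a single joint cell. The names `greedy_coupling` and `entropy` are hypothetical and introduced here only for illustration.

```python
import math


def greedy_coupling(marginals):
    """Greedily build a joint distribution consistent with the given marginals.

    Assumption: all marginals are lists of non-negative floats with equal sums.
    Returns a dict mapping index tuples (one index per marginal) to mass.
    """
    remaining = [list(p) for p in marginals]
    joint = {}
    eps = 1e-12
    while True:
        # Position of the largest remaining mass in each marginal.
        idx = tuple(max(range(len(p)), key=p.__getitem__) for p in remaining)
        # The smallest of those maxima is the mass we can place in one cell.
        mass = min(p[i] for p, i in zip(remaining, idx))
        if mass <= eps:
            break
        joint[idx] = joint.get(idx, 0.0) + mass
        for p, i in zip(remaining, idx):
            p[i] -= mass  # at least one entry becomes exactly zero
    return joint


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


joint = greedy_coupling([[0.6, 0.4], [0.5, 0.5]])
joint_entropy = entropy(joint.values())
```

Each round zeroes at least one marginal entry, so the loop ends after at most as many rounds as there are entries in total. Concentrating mass in few large cells is what keeps the joint entropy low.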

161. SAT Encodings for Distance-Based Belief Merging Operators.

Paper Link】 【Pages】:1163-1169

【Authors】: Sébastien Konieczny ; Jean-Marie Lagniez ; Pierre Marquis

【Abstract】: We present SAT encoding schemes for distance-based belief merging operators relying on the (possibly weighted) drastic distance or the Hamming distance between interpretations, and using sum, GMax (leximax) or GMin (leximin) as aggregation function. In order to evaluate these encoding schemes, we generated benchmarks of a time-tabling problem and translated them into belief merging instances. Then, taking advantage of these schemes, we compiled the merged bases of the resulting instances into query-equivalent CNF formulae. Experiments have shown the benefits which can be gained by considering the SAT encoding schemes we pointed out. Especially, thanks to them, we succeeded in computing query-equivalent formulae for merging instances based on hundreds of variables, which are out of reach of previous implementations.

【Keywords】: belief merging; SAT encoding; knowledge compilation

162. LPMLN, Weak Constraints, and P-log.

Paper Link】 【Pages】:1170-1177

【Authors】: Joohyung Lee ; Zhun Yang

【Abstract】: LPMLN is a recently introduced formalism that extends answer set programs by adopting the log-linear weight scheme of Markov Logic. This paper investigates the relationships between LPMLN and two other extensions of answer set programs: weak constraints to express a quantitative preference among answer sets, and P-log to incorporate probabilistic uncertainty. We present a translation of LPMLN into programs with weak constraints and a translation of P-log into LPMLN, which complement the existing translations in the opposite directions. The first translation allows us to compute the most probable stable models (i.e., MAP estimates) of LPMLN programs using standard ASP solvers. This result can be extended to other formalisms, such as Markov Logic, ProbLog, and Pearl's Causal Models, that are shown to be translatable into LPMLN. The second translation tells us how probabilistic nonmonotonicity (the ability of the reasoner to change his probabilistic model as a result of new information) of P-log can be represented in LPMLN, which yields a way to compute P-log using standard ASP solvers and MLN solvers.

【Keywords】: Answer Set Programming; Markov Logic

163. Graph-Based Wrong IsA Relation Detection in a Large-Scale Lexical Taxonomy.

Paper Link】 【Pages】:1178-1184

【Authors】: Jiaqing Liang ; Yanghua Xiao ; Yi Zhang ; Seung-won Hwang ; Haixun Wang

【Abstract】: Knowledge bases (KBs) play an important role in artificial intelligence. Much effort has been devoted to constructing web-scale knowledge bases both manually and automatically. Compared with manually constructed KBs, automatically constructed KBs are broader but noisier. In this paper, we study the problem of improving the quality of automatically constructed web-scale knowledge bases, in particular, lexical taxonomies of isA relationships. We find that these taxonomies usually contain cycles, which are often introduced by incorrect isA relations. Inspired by this observation, we introduce two kinds of models to detect incorrect isA relations from cycles. The first one eliminates cycles by extracting directed acyclic graphs, and the other one eliminates cycles by grouping nodes into different levels. We implement our models on Probase, a state-of-the-art, automatically constructed, web-scale taxonomy. After processing tens of millions of relations, our models eliminate 74 thousand wrong relations with 91% accuracy.

【Keywords】:
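
The observation that cycles in an isA taxonomy signal suspicious relations can be sketched with a plain depth-first search. This is only an illustration of cycle elimination in general, not the models proposed in the paper: it removes one arbitrary edge per detected cycle (a DFS back edge), whereas the abstract describes models that aim to pick out the edge that is actually wrong. The function name `find_back_edges` and the toy taxonomy are hypothetical.

```python
def find_back_edges(graph):
    """Return DFS back edges; removing them leaves the graph acyclic.

    `graph` maps each term to the list of terms it is declared to be
    (hyponym -> hypernyms). Recursive, so only suited to small inputs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    color = {n: WHITE for n in nodes}
    back_edges = []

    def visit(u):
        color[u] = GRAY  # u is on the current DFS path
        for v in graph.get(u, ()):
            if color[v] == GRAY:
                # v is an ancestor of u on the current path: edge closes a cycle
                back_edges.append((u, v))
            elif color[v] == WHITE:
                visit(v)
        color[u] = BLACK  # fully explored

    for n in sorted(nodes):  # sorted for a deterministic traversal order
        if color[n] == WHITE:
            visit(n)
    return back_edges


# Toy taxonomy with one cycle caused by the bogus relation "organism isA dog".
taxonomy = {"dog": ["animal"], "animal": ["organism"], "organism": ["dog"]}
suspects = find_back_edges(taxonomy)
```

Which edge of a cycle gets flagged depends purely on traversal order here, so a correct relation may be reported instead of the bogus one. That limitation is exactly why a scoring model over candidate edges is needed in practice.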

164. On the Transitivity of Hypernym-Hyponym Relations in Data-Driven Lexical Taxonomies.

Paper Link】 【Pages】:1185-1191

【Authors】: Jiaqing Liang ; Yi Zhang ; Yanghua Xiao ; Haixun Wang ; Wei Wang ; Pinpin Zhu

【Abstract】: Taxonomy is indispensable in understanding natural language. A variety of large scale, usage-based, data-driven lexical taxonomies have been constructed in recent years. The hypernym-hyponym relationship, which is considered the backbone of lexical taxonomies, can not only be used to categorize the data but also enables generalization. In particular, we focus on one of the most prominent properties of the hypernym-hyponym relationship, namely, transitivity, which has a significant implication for many applications. We show that, unlike human crafted ontologies and taxonomies, transitivity does not always hold in data-driven lexical taxonomies. We introduce a supervised approach to detect whether transitivity holds for any given pair of hypernym-hyponym relationships. Besides solving the inferencing problem, we also use the transitivity to derive new hypernym-hyponym relationships for data-driven lexical taxonomies. We conduct extensive experiments to show the effectiveness of our approach.

【Keywords】:

165. Don't Forget the Quantifiable Relationship between Words: Using Recurrent Neural Network for Short Text Topic Discovery.

Paper Link】 【Pages】:1192-1198

【Authors】: Hengyang Lu ; Lu-Yao Xie ; Ning Kang ; Chong-Jun Wang ; Jun-Yuan Xie

【Abstract】: In our daily life, short texts are everywhere, especially since the emergence of social networks. There are countless short texts in online media like Twitter, online Q&A sites and so on. Discovering topics is quite valuable in various application domains such as content recommendation and text characterization. Traditional topic models like LDA are widely applied to all sorts of tasks, but when it comes to the short text scenario, these models may get stuck due to the lack of words. Recently, a popular model named BTM has used the word co-occurrence relationship to solve the sparsity problem and has proved effective. However, both BTM and extended models ignore the inside relationship between words. From our perspective, more related words should appear in the same topic. Based on this idea, we propose a model named RIBS-TM which makes use of RNN for relationship learning and IDF for filtering high-frequency words. Experiments on two real-world short text datasets show great utility of our model.

【Keywords】: short text; topic model; RNN

166. The Symbolic Interior Point Method.

Paper Link】 【Pages】:1199-1205

【Authors】: Martin Mladenov ; Vaishak Belle ; Kristian Kersting

【Abstract】: Numerical optimization is arguably the most prominent computational framework in machine learning and AI. It can be seen as an assembly language for hard combinatorial problems ranging from classification and regression in learning, to computing optimal policies and equilibria in decision theory, to entropy minimization in information sciences. Unfortunately, specifying such problems in complex domains involving relations, objects and other logical dependencies is cumbersome at best, requiring considerable expert knowledge, and solvers require models to be painstakingly reduced to standard forms. To overcome this, we introduce a rich modeling framework for optimization problems that allows convenient codification of symbolic structure. Rather than reducing this symbolic structure to a sparse or dense matrix, we represent and exploit it directly using algebraic decision diagrams (ADDs). Combining efficient ADD-based matrix-vector algebra with a matrix-free interior-point method, we develop an engine that can fully leverage the structure of symbolic representations to solve convex linear and quadratic optimization problems. We demonstrate the flexibility of the resulting symbolic-numeric optimizer on decision making and compressed sensing tasks with millions of non-zero entries.

【Keywords】: probabilistic relational models; algebraic decision diagrams; matrix-free optimization; quadratic programs; interior-point solver; symbolic numerical; optimization

167. Small Is Beautiful: Computing Minimal Equivalent EL Concepts.

Paper Link】 【Pages】:1206-1212

【Authors】: Nadeschda Nikitina ; Patrick Koopmann

【Abstract】: In this paper, we present an algorithm and a tool for computing minimal, equivalent EL concepts wrt. a given ontology. Our tool can provide valuable support in manual development of ontologies and improve the quality of ontologies automatically generated by processes such as uniform interpolation, ontology learning, rewriting ontologies into simpler DLs, abduction and knowledge revision. Deciding whether there exist equivalent EL concepts of size less than k is known to be an NP-complete problem. We propose a minimisation algorithm that achieves reasonable computational performance also for larger ontologies and complex concepts. We evaluate our tool on several bio-medical ontologies with promising results.

【Keywords】:

168. Compiling Graph Substructures into Sentential Decision Diagrams.

Paper Link】 【Pages】:1213-1221

【Authors】: Masaaki Nishino ; Norihito Yasuda ; Shin-ichi Minato ; Masaaki Nagata

【Abstract】: The Zero-suppressed Sentential Decision Diagram (ZSDD) is a recently discovered tractable representation of Boolean functions. ZSDD subsumes the Zero-suppressed Binary Decision Diagram (ZDD) as a strict subset, and similar to ZDD, it can perform several useful operations like model counting and Apply operations. We propose a top-down compilation algorithm for ZSDD that represents sets of specific graph substructures, e.g., matchings and simple paths of a graph. We experimentally confirm that the proposed algorithm is faster than other construction methods including bottom-up methods and top-down methods for ZDDs, and the resulting ZSDDs are smaller than ZDDs representing the same graph substructures. We also show that the size of constructed ZSDDs can be bounded by the branch-width of the graph. This bound is tighter than that of ZDDs.

【Keywords】: Knowledge Compilation, Propositional Knowledge Base, Graph, Sentential Decision Diagrams

169. Efficient Evaluation of Answer Set Programs with External Sources Based on External Source Inlining.

Paper Link】 【Pages】:1222-1228

【Authors】: Christoph Redl

【Abstract】: HEX-programs are an extension of answer set programming (ASP) towards external sources. To this end, external atoms provide a bidirectional interface between the program and an external source. Traditionally, HEX-programs are evaluated using a rewriting to ordinary ASP programs which guess truth values of external atoms; this yields answer set candidates whose guesses are verified by evaluating the source. Despite the integration of learning techniques into this approach, which reduce the number of candidates and of necessary verification calls, the remaining external calls are still expensive. In this paper we present an alternative approach based on inlining of external atoms, motivated by existing but less general approaches for specialized formalisms such as DL-programs. External atoms are then compiled away such that no verification calls are necessary. To this end, we make use of support sets, which describe conditions on input atoms that are sufficient to satisfy an external atom. The approach is implemented in the DLVHEX reasoner. Experiments show a significant performance gain.

【Keywords】:

170. On Equivalence and Inconsistency of Answer Set Programs with External Sources.

Paper Link】 【Pages】:1229-1235

【Authors】: Christoph Redl

【Abstract】: HEX-programs extend answer-set programs (ASP) with external sources. In previous work, notions of equivalence of ASP programs under extensions have been developed. Most well-known are strong equivalence, which is given for programs P and Q if P ∪ R and Q ∪ R have the same answer sets for arbitrary programs R, and uniform equivalence, which is given if this is guaranteed for sets R of facts. More fine-grained approaches exist, which restrict the set of atoms in the added program R. In this paper we provide a characterization of equivalence of HEX-programs. Since well-known ASP extensions (e.g. constraint ASP) amount to special cases of HEX, the results are interesting beyond the particular formalism. Based on this, we further characterize inconsistency of programs wrt. program extensions. We then discuss possible applications of the results for algorithmic improvements.

【Keywords】:

171. ProjE: Embedding Projection for Knowledge Graph Completion.

Paper Link】 【Pages】:1236-1242

【Authors】: Baoxu Shi ; Tim Weninger

【Abstract】: With the large volume of new information created every day, determining the validity of information in a knowledge graph and filling in its missing parts are crucial tasks for many researchers and practitioners. To address this challenge, a number of knowledge graph completion methods have been developed using low-dimensional graph embeddings. Although researchers continue to improve these models using an increasingly complex feature space, we show that simple changes in the architecture of the underlying model can outperform state-of-the-art models without the need for complex feature engineering. In this work, we present a shared variable neural network model called ProjE that fills in missing information in a knowledge graph by learning joint embeddings of the knowledge graph’s entities and edges, and through subtle, but important, changes to the standard loss function. In doing so, ProjE has a parameter size that is smaller than 11 out of 15 existing methods while performing 37% better than the current-best method on standard datasets. We also show, via a new fact checking task, that ProjE is capable of accurately determining the veracity of many declarative statements.

【Keywords】: knowledge graph completion; graph embedding; link prediction

Paper Link】 【Pages】:1243-1249

【Authors】: Yi Tay ; Anh Tuan Luu ; Siu Cheung Hui

【Abstract】: Knowledge graphs play a significant role in many intelligent systems such as semantic search and recommendation systems. Recent works in this area of knowledge graph embeddings such as TransE, TransH and TransR have shown extremely competitive and promising results in relational learning. In this paper, we propose a novel extension of the translational embedding model to solve three main problems of the current models. Firstly, translational models are highly sensitive to hyperparameters such as margin and learning rate. Secondly, the translation principle only allows one spot in vector space for each golden triplet. Thus, congestion of entities and relations in vector space may reduce precision. Lastly, the current models are not able to handle dynamic data especially the introduction of new unseen entities/relations or removal of triplets. In this paper, we propose Parallel Universe TransE (puTransE), an adaptable and robust adaptation of the translational model. Our approach non-parametrically estimates the energy score of a triplet from multiple embedding spaces of structurally and semantically aware triplet selection. Our proposed approach is simple, robust and parallelizable. Our experimental results show that our proposed approach outperforms TransE and many other embedding methods for link prediction on knowledge graphs on both public benchmark dataset and a real world dynamic dataset.

【Keywords】: Knowledge Graphs; Representational Learning; Link Prediction

173. Causal Discovery Using Regression-Based Conditional Independence Tests.

Paper Link】 【Pages】:1250-1256

【Authors】: Hao Zhang ; Shuigeng Zhou ; Kun Zhang ; Jihong Guan

【Abstract】: Conditional independence (CI) testing is an important tool in causal discovery. Generally, by using CI tests, a set of Markov equivalence classes w.r.t. the observed data can be estimated by checking whether each pair of variables x and y is d-separated, given a set of variables Z. Due to the curse of dimensionality, CI tests often fail to return a reliable result for high-dimensional Z. In this paper, we propose a regression-based CI test to relax the test of x ⊥ y | Z to simpler unconditional independence tests of x − f(Z) ⊥ y − g(Z), and x − f(Z) ⊥ Z or y − g(Z) ⊥ Z, under the assumption that the data-generating procedure follows additive noise models (ANMs). When the ANM is identifiable, we prove that x − f(Z) ⊥ y − g(Z) ⇒ x ⊥ y | Z. We also show that 1) f and g can be easily estimated by regression, 2) our test is more powerful than the state-of-the-art kernel CI tests, and 3) existing causal learning algorithms can infer many more causal directions by using the proposed method.

【Keywords】: Causality discovery; Conditional independent test; Regression
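The reduction described in the abstract above (regress x and y on Z, then test the residuals) can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes linear f and g fitted by ordinary least squares, and it uses the Pearson correlation of the residuals as a crude stand-in for a proper unconditional independence test. The function names `residuals` and `residual_correlation` are hypothetical.

```python
import numpy as np

def residuals(target, Z):
    """Residuals of an ordinary least-squares fit of target on Z (with intercept)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def residual_correlation(x, y, Z):
    """Pearson correlation between x - f(Z) and y - g(Z), assuming linear f and g."""
    rx = residuals(x, Z)
    ry = residuals(y, Z)
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
# x and y share the common cause Z but have independent additive noise,
# so x and y are dependent marginally and independent given Z.
x = Z @ np.array([1.0, -2.0, 0.5]) + rng.uniform(-1, 1, size=n)
y = Z @ np.array([0.5, 1.0, 1.5]) + rng.uniform(-1, 1, size=n)

marginal_corr = float(np.corrcoef(x, y)[0, 1])
partial_corr = residual_correlation(x, y, Z)
print(f"marginal correlation: {marginal_corr:.3f}")
print(f"residual correlation: {partial_corr:.3f}")
```

Correlation only detects linear dependence, so a near-zero residual correlation is weaker evidence than a genuine independence test would provide; the sketch only shows how conditioning on Z is replaced by regression followed by an unconditional check.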

174. An Improved Algorithm for Learning to Perform Exception-Tolerant Abduction.

Paper Link】 【Pages】:1257-1265

【Authors】: Mengxue Zhang ; Tushar Mathew ; Brendan A. Juba

【Abstract】: Inference from an observed or hypothesized condition to a plausible cause or explanation for this condition is known as abduction. For many tasks, the acquisition of the necessary knowledge by machine learning has been widely found to be highly effective. However, the semantics of learned knowledge are weaker than the usual classical semantics, and this necessitates new formulations of many tasks. We focus on a recently introduced formulation of the abductive inference task that is thus adapted to the semantics of machine learning. A key problem is that we cannot expect that our causes or explanations will be perfect, and they must tolerate some error due to the world being more complicated than our formalization allows. This is a version of the qualification problem, and in machine learning, this is known as agnostic learning. In the work by Juba that introduced the task of learning to make abductive inferences, an algorithm is given for producing k-DNF explanations that tolerates such exceptions: if the best possible k-DNF explanation fails to justify the condition with probability ε, then the algorithm is promised to find a k-DNF explanation that fails to justify the condition with probability at most O(n^k ε), where n is the number of propositional attributes used to describe the domain. Here, we present an improved algorithm for this task. When the best k-DNF fails with probability ε, our algorithm finds a k-DNF that fails with probability at most Õ(n^(k/2) ε) (i.e., suppressing logarithmic factors in n and 1/ε). We also examine the empirical advantage of this new algorithm over the previous algorithm in two test domains, one of explaining conditions generated by a “noisy” k-DNF rule, and another of explaining conditions that are actually generated by a linear threshold rule.

【Keywords】:
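To make the gap between the two guarantees in the abstract above concrete, the following sketch evaluates the leading terms n^k ε and n^(k/2) ε for sample values. It is only an arithmetic illustration: hidden constants and the logarithmic factors suppressed by the Õ notation are ignored, and the function names and sample values are hypothetical.

```python
def old_bound(n, k, eps):
    """Leading term n^k * eps of the earlier guarantee (constants ignored)."""
    return (n ** k) * eps

def new_bound(n, k, eps):
    """Leading term n^(k/2) * eps of the improved guarantee (log factors ignored)."""
    return (n ** (k / 2)) * eps

# Sample values chosen purely for illustration.
n, k, eps = 100, 2, 1e-6
improvement = old_bound(n, k, eps) / new_bound(n, k, eps)
print(old_bound(n, k, eps), new_bound(n, k, eps), improvement)
```

The ratio between the two leading terms is n^(k/2), so the improvement grows with both the number of attributes and the term width k.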

175. Trust-Sensitive Evolution of DL-Lite Knowledge Bases.

Paper Link】 【Pages】:1266-1273

【Authors】: Dmitriy Zheleznyakov ; Evgeny Kharlamov ; Ian Horrocks

【Abstract】: Evolution of Knowledge Bases (KBs) consists of incorporating new information in an existing KB. Previous studies assume that the new information should be fully trusted and thus completely incorporated in the old knowledge. We suggest a setting where the new knowledge can be partially trusted and develop model-based approaches (MBAs) to KB evolution that rely on this assumption. Under MBAs the result of evolution is a set of interpretations and thus two core problems for MBAs are closure, i.e., whether evolution result can be axiomatised with a KB, and approximation, i.e., whether it can be (maximally) approximated with a KB. We show that DL-Lite is not closed under a wide range of trust-sensitive MBAs. We introduce a notion of s-approximation that improves the previously proposed approximations and show how to compute it for various trust-sensitive MBAs.

【Keywords】:

Machine Learning Applications 57

176. Explicit Defense Actions Against Test-Set Attacks.

Paper Link】 【Pages】:1274-1280

【Authors】: Scott Alfeld ; Xiaojin Zhu ; Paul Barford

【Abstract】: Automated learning and decision making systems in public-facing applications are vulnerable to malicious attacks. Examples of such systems include spam detectors, credit card fraud detectors, and network intrusion detection systems. These systems are at further risk of attack when money is directly involved, such as market forecasters or decision systems used in determining insurance or loan rates. In this paper, we consider the setting where a predictor Bob has a fixed model, and an unknown attacker Alice aims to perturb (or poison) future test instances so as to alter Bob's prediction to her benefit. We focus specifically on Bob's optimal defense actions to limit Alice's effectiveness. We define a general framework for determining Bob's optimal defense action against Alice's worst-case attack. We then demonstrate our framework by considering linear predictors, where we provide tractable methods of determining the optimal defense action. Using these methods, we perform an empirical investigation of optimal defense actions for a particular class of linear models -- autoregressive forecasters -- and find that for ten real world futures markets, the optimal defense action reduces Bob's loss by between 78 and 97%.

【Keywords】: Adversarial Learning; Autoregressive Forecasting; Machine Learning

177. Multidimensional Scaling on Multiple Input Distance Matrices.

Paper Link】 【Pages】:1281-1287

【Authors】: Song Bai ; Xiang Bai ; Longin Jan Latecki ; Qi Tian

【Abstract】: Multidimensional Scaling (MDS) is a classic technique that seeks vectorial representations for data points, given the pairwise distances between them. In recent years, data are usually collected from diverse sources or have multiple heterogeneous representations. However, how to do multidimensional scaling on multiple input distance matrices is still unsolved to our best knowledge. In this paper, we first define this new task formally. Then, we propose a new algorithm called Multi-View Multidimensional Scaling (MVMDS) by considering each input distance matrix as one view. The proposed algorithm can learn the weights of views (i.e., distance matrices) automatically by exploring the consensus information and complementary nature of views. Experimental results on synthetic as well as real datasets demonstrate the effectiveness of MVMDS. We hope that our work encourages a wider consideration in many domains where MDS is needed.

【Keywords】: Multidimensional Scaling;Image Retrieval and Clustering;Multi-view Learning
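As background for the task described in the abstract above, the sketch below shows classical (Torgerson) MDS applied to a fixed-weight combination of several distance matrices. It is a naive baseline under stated assumptions, not MVMDS: the abstract says MVMDS learns the view weights automatically, whereas here the weights are supplied by hand. The function names `pairwise_distances`, `classical_mds` and `combined_mds` are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def classical_mds(D, dim):
    """Embed points from a distance matrix D via classical (Torgerson) MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]      # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def combined_mds(distance_matrices, weights, dim):
    """Classical MDS on a weighted average of squared distances (weights fixed by hand)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    D2 = sum(wi * Di ** 2 for wi, Di in zip(w, distance_matrices))
    return classical_mds(np.sqrt(D2), dim)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = pairwise_distances(X)
# Two identical "views" with unequal weights: the embedding should reproduce D
# up to rotation, reflection and translation.
Y = combined_mds([D, D], weights=[1.0, 3.0], dim=2)
```

With genuinely different views, the choice of weights determines which distance matrix dominates the embedding, which is the part a method such as MVMDS aims to learn rather than fix.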

178. ICU Mortality Prediction: A Classification Algorithm for Imbalanced Datasets.

Paper Link】 【Pages】:1288-1294

【Authors】: Sakyajit Bhattacharya ; Vaibhav Rajan ; Harsh Shrivastava

【Abstract】: Determining mortality risk is important for critical decisions in Intensive Care Units (ICU). The need for machine learning models that provide accurate patient-specific prediction of mortality is well recognized. We present a new algorithm for ICU mortality prediction that is designed to address the problem of imbalance, which occurs, in the context of binary classification, when one of the two classes is significantly under-represented in the data. We take a fundamentally new approach in exploiting the class imbalance through a feature transformation such that the transformed features are easier to classify. Hypothesis testing is used for classification with a test statistic that follows the distribution of the difference of two chi-squared random variables, for which there is no analytic expression, so we derive an accurate approximation. Experiments on a benchmark dataset of 4000 ICU patients show that our algorithm surpasses the best competing methods for mortality prediction.

【Keywords】: Mortality Prediction; Supervised Classification; Class Imbalance

179. GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering.

Paper Link】 【Pages】:1295-1301

【Authors】: Chao Chen ; Dongsheng Li ; Qin Lv ; Junchi Yan ; Li Shang ; Stephen M. Chu

【Abstract】: Recommender systems have achieved great success in recent years, and matrix approximation (MA) is one of the most popular techniques for collaborative filtering (CF) based recommendation. However, a major issue is that MA methods perform poorly at detecting strong localized associations among closely related users and items. Recently, some MA-based CF methods have adopted clustering methods to discover meaningful user-item subgroups and perform ensemble learning over different clusterings to improve the recommendation accuracy. However, ensemble learning suffers from lower efficiency due to the increased overall computation overhead. In this paper, we propose GLOMA, a new clustering-based matrix approximation method, which can embed global information in local matrix approximation models to improve recommendation accuracy. In GLOMA, an MA model is first trained on the entire data to capture global information. The global MA model is then utilized to guide the training of cluster-based local MA models, such that the local models can detect strong localized associations shared within clusters and at the same time preserve global associations shared among all users/items. Evaluation results using MovieLens and Netflix datasets demonstrate that, by integrating global information in local models, GLOMA can outperform five state-of-the-art MA-based CF methods in recommendation accuracy while achieving decent efficiency.

【Keywords】: matrix approximation, recommender systems, collaborative filtering

180. Predicting Soccer Highlights from Spatio-Temporal Match Event Streams.

Paper Link】 【Pages】:1302-1308

【Authors】: Tom Decroos ; Vladimir Dzyuba ; Jan Van Haaren ; Jesse Davis

【Abstract】: Sports broadcasters are continuously seeking to make their live coverages of soccer matches more attractive. A recent innovation is the “highlight channel,” which shows the most interesting events from multiple matches played at the same time. However, switching between matches at the right time is challenging in fast-paced sports like soccer, where interesting situations often evolve as quickly as they disappear again. This paper presents the POGBA algorithm for automatically predicting highlights in soccer matches, which is an important task that has not yet been addressed. POGBA leverages spatio-temporal event streams collected during matches to predict the probability that a particular game state will lead to a goal. An empirical evaluation on a real-world dataset shows that POGBA outperforms the baseline algorithms in terms of both precision and recall.

【Keywords】: Spatiotemporal Data; Sports Analytics; Highlight Prediction

181. A Hybrid Collaborative Filtering Model with Deep Structure for Recommender Systems.

Paper Link】 【Pages】:1309-1315

【Authors】: Xin Dong ; Lei Yu ; Zhonghuo Wu ; Yuxia Sun ; Lingfeng Yuan ; Fangxi Zhang

【Abstract】: Collaborative filtering (CF) is a widely used approach in recommender systems to solve many real-world problems. Traditional CF-based methods learn to make recommendations from the user-item matrix, which encodes the individual preferences of users for items. In real applications, the rating matrix is usually very sparse, causing CF-based methods to degrade significantly in recommendation performance. In this case, some improved CF methods utilize the increasing amount of side information to address the data sparsity problem as well as the cold start problem. However, the learned latent factors may not be effective due to the sparse nature of the user-item matrix and the side information. To address this problem, we utilize advances in learning effective representations in deep learning, and propose a hybrid model which jointly performs deep learning of users' and items' latent factors from side information and collaborative filtering from the rating matrix. Extensive experimental results on three real-world datasets show that our hybrid model outperforms other methods in effectively utilizing side information and achieves performance improvement.

【Keywords】: Recommender System; Collaborative Filtering; Deep Learning

182. Collaborative Dynamic Sparse Topic Regression with User Profile Evolution for Item Recommendation.

Paper Link】 【Pages】:1316-1322

【Authors】: Li Gao ; Jia Wu ; Chuan Zhou ; Yue Hu

【Abstract】: In many time-aware item recommender systems, accurately modeling the evolution of both user profiles and the contents of items over time is essential. However, most existing methods focus on learning users' dynamic interests, where the contents of items are assumed to be stable over time. They thus fail to capture the dynamic changes in the items' contents. In this paper, we present a novel method, CDUE, for time-aware item recommendation, which captures the evolution of both users' interests and items' content information via topic dynamics. Specifically, we propose a dynamic sparse topic model to track the evolution of topics for changes in items' contents over time and adapt a vector autoregressive model to profile users' dynamic interests. The items' topics, the users' interests, and their evolution are learned collaboratively and simultaneously within a unified learning framework. Experimental results on two real-world data sets demonstrate the quality and effectiveness of the proposed method and show that our method can be used to make better future recommendations.

【Keywords】: recommender system; dynamic sparse topic modeling; user profile evolution

183. Event Video Mashup: From Hundreds of Videos to Minutes of Skeleton.

Paper Link】 【Pages】:1323-1330

【Authors】: Lianli Gao ; Peng Wang ; Jingkuan Song ; Zi Huang ; Jie Shao ; Heng Tao Shen

【Abstract】: The explosive growth of video content on the Web has been revolutionizing the way people share, exchange and perceive information, such as events. While an individual video usually concerns a specific aspect of an event, the videos that are uploaded by different users at different locations and times can embody different emphases and complement each other in describing the event. Combining these videos from different sources can unveil a more complete picture of the event. Simply concatenating videos together is an intuitive solution, but it may degrade user experience since it is time-consuming and tedious to view such highly redundant, noisy and disorganized content. Therefore, we develop a novel approach, termed event video mashup (EVM), to automatically generate a unified short video from a collection of Web videos to describe the storyline of an event. We propose a submodular-based content selection model that embodies both importance and diversity to depict the event from comprehensive aspects in an efficient way. Importantly, the video content is organized temporally and semantically, conforming to the event evolution. We evaluate our approach on a real-world YouTube event dataset collected by ourselves. The extensive experimental results demonstrate the effectiveness of the proposed framework.

【Keywords】: video summarization; near duplicate; video mashup

184. Soft Video Parsing by Label Distribution Learning.

Paper Link】 【Pages】:1331-1337

【Authors】: Xin Geng ; Miaogen Ling

【Abstract】: In this paper, we tackle the problem of segmenting out a sequence of actions from videos. The videos contain background and actions which are usually composed of ordered sub-actions. We refer to the sub-actions and the background as semantic units. Considering the possible overlap between two adjacent semantic units, we utilize label distributions to annotate the various segments in the video. The label distribution covers a certain number of semantic unit labels, representing the degree to which each label describes the video segment. The mapping from a video segment to its label distribution is then learned by a Label Distribution Learning (LDL) algorithm. Based on the LDL model, a soft video parsing method with segmental regular grammars is proposed to construct a tree structure for the video. Each leaf of the tree stands for a video clip of background or sub-action. The proposed method shows promising results on the THUMOS'14 and MSR-II datasets, and its computational complexity is much lower than that of the state-of-the-art method.

【Keywords】: video parsing;label distribution learning;grammar tree;sliding window;action localization;

185. Active Learning with Cross-Class Similarity Transfer.

Paper Link】 【Pages】:1338-1344

【Authors】: Yuchen Guo ; Guiguang Ding ; Yue Gao ; Jungong Han

【Abstract】: How to save labeling efforts for training supervised classifiers is an important research topic in the machine learning community. Active learning (AL) and transfer learning (TL) are two useful tools to achieve this goal, and their combination, i.e., transfer active learning (T-AL), has also attracted considerable research interest. However, existing T-AL approaches consider transferring knowledge from a source/auxiliary domain which has the same class labels as the target domain, but ignore the relationship among classes. In this paper, we investigate a more practical setting where the classes in the source domain are related/similar to but different from the target domain classes. Specifically, we propose a novel cross-class T-AL approach to simultaneously transfer knowledge from the source domain and actively annotate the most informative samples in the target domain so that we can train satisfactory classifiers with as few labeled samples as possible. In particular, based on the class-class similarity and sample-sample similarity, we adopt a similarity propagation to find the source domain samples that can well capture the characteristics of a target class and then transfer the similar samples as the (pseudo) labeled data for the target class. In turn, the labeled and transferred samples are used to train classifiers and actively select new samples for annotation. Extensive experiments on three datasets demonstrate that the proposed approach significantly outperforms the state-of-the-art related approaches.

【Keywords】: active learning, cross-class transfer

186. DeepFix: Fixing Common C Language Errors by Deep Learning.

Paper Link】 【Pages】:1345-1351

【Authors】: Rahul Gupta ; Soham Pal ; Aditya Kanade ; Shirish Shevade

【Abstract】: The problem of automatically fixing programming errors is a very active research topic in software engineering. This is a challenging problem as fixing even a single error may require analysis of the entire program. In practice, a number of errors arise due to a programmer's inexperience with the programming language or lack of attention to detail. We call these common programming errors. These are analogous to grammatical errors in natural languages. Compilers detect such errors, but their error messages are usually inaccurate. In this work, we present an end-to-end solution, called DeepFix, that can fix multiple such errors in a program without relying on any external tool to locate or fix them. At the heart of DeepFix is a multi-layered sequence-to-sequence neural network with attention which is trained to predict erroneous program locations along with the required correct statements. On a set of 6971 erroneous C programs written by students for 93 programming tasks, DeepFix could fix 1881 (27%) programs completely and 1338 (19%) programs partially.

【Keywords】: common programming errors; program repair; deep learning; fault localization; programming education

187. Question Difficulty Prediction for READING Problems in Standard Tests.

Paper Link】 【Pages】:1352-1359

【Authors】: Zhenya Huang ; Qi Liu ; Enhong Chen ; Hongke Zhao ; Mingyong Gao ; Si Wei ; Yu Su ; Guoping Hu

【Abstract】: Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to qualify the difficulty contribution of each sentence to questions. Considering the incomparability of question difficulties in different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights to track the attention information for questions.

【Keywords】: Educational mining; Question difficulty prediction; Standard tests; Convolutional neural network; Pairwise learning strategy

188. Additional Multi-Touch Attribution for Online Advertising.

Paper Link】 【Pages】:1360-1366

【Authors】: Wendi Ji ; Xiaoling Wang

【Abstract】: Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, as it allows marketers to assign credits for conversions to different advertising channels and optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two obvious assumptions: (1) the effect of an ad exposure fades with time and (2) the effects of ad exposures on the browsing path of a user are additive. AMTA borrows techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we take both the conversion time and the intrinsic conversion rate of users into consideration. Experimental results on a large real-world advertising dataset illustrate that our proposed method is superior to state-of-the-art techniques in conversion rate prediction and that the credit allocation based on AMTA is reasonable.

【Keywords】: computational advertising, multi-touch attribution, survival analysis
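The two assumptions named in the abstract above (effects fade with time and add up across exposures) can be illustrated with a small hazard-rate sketch. Everything specific here is an assumption made for illustration: the exponential decay form, the per-channel parameters `strength` and `decay`, and the proportional credit rule are not taken from the paper, and the intrinsic conversion rate mentioned in the abstract is left out.

```python
import math
from collections import defaultdict

def exposure_effects(exposures, t, strength, decay):
    """Decayed effect of each past exposure at time t (assumed exponential fading)."""
    effects = []
    for channel, t_i in exposures:
        if t_i <= t:  # only exposures that have already happened contribute
            effect = strength[channel] * math.exp(-decay[channel] * (t - t_i))
            effects.append((channel, effect))
    return effects

def hazard(exposures, t, strength, decay):
    """Additive hazard rate at time t: the sum of all decayed exposure effects."""
    return sum(e for _, e in exposure_effects(exposures, t, strength, decay))

def channel_shares(exposures, t_conv, strength, decay):
    """Split credit for a conversion at t_conv in proportion to each channel's effect."""
    effects = exposure_effects(exposures, t_conv, strength, decay)
    total = sum(e for _, e in effects)
    if total == 0:
        return {}
    shares = defaultdict(float)
    for channel, e in effects:
        shares[channel] += e / total
    return dict(shares)

# Hypothetical browsing path: (channel, exposure time in days).
exposures = [("search", 0.0), ("display", 2.0), ("search", 3.0)]
strength = {"search": 0.2, "display": 0.1}
decay = {"search": 0.5, "display": 1.0}

shares = channel_shares(exposures, 4.0, strength, decay)
print(shares)
```

In a fitted model the strength and decay parameters would be estimated from conversion logs with survival-analysis techniques; here they are hand-picked only to show how additivity and fading interact.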

189. Multitask Dyadic Prediction and Its Application in Prediction of Adverse Drug-Drug Interaction.

Paper Link】 【Pages】:1367-1373

【Authors】: Bo Jin ; Haoyu Yang ; Cao Xiao ; Ping Zhang ; Xiaopeng Wei ; Fei Wang

【Abstract】: Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality around the world. Identifying potential DDIs during the drug design process is critical in guiding targeted clinical drug safety testing. Although detection of adverse DDIs is conducted during Phase IV clinical trials, there are still a large number of new DDIs found by accident after the drugs are put on the market. With the arrival of the big data era, more and more pharmaceutical research and development data are becoming available, which provides an invaluable resource for extracting insights that can potentially be leveraged in early prediction of DDIs. Many computational approaches have been proposed in recent years for DDI prediction. However, most of them focused on binary prediction (with or without DDI), despite the fact that each DDI is associated with a different type. Predicting the actual DDI type will help us better understand the DDI mechanism and identify proper ways to prevent it. In this paper, we formulate the DDI type prediction problem as a multitask dyadic regression problem, where the prediction of each specific DDI type is treated as a task. Compared with conventional matrix completion approaches which can only impute the missing entries in the DDI matrix, our approach can directly regress those dyadic relationships (DDIs) and thus can be extended to new drugs more easily. We developed an effective proximal gradient method to solve the problem. Evaluation on real world datasets is presented to demonstrate the effectiveness of the proposed approach.

【Keywords】: multitask learning, biomedical application

190. Semi-Supervised Multi-View Correlation Feature Learning with Application to Webpage Classification.

Paper Link】 【Pages】:1374-1381

【Authors】: Xiao-Yuan Jing ; Fei Wu ; Xiwei Dong ; Shiguang Shan ; Songcan Chen

【Abstract】: Webpage classification has attracted a lot of research interest. Webpage data is often multi-view and high-dimensional, and the webpage classification application is usually semi-supervised. Due to these characteristics, using semi-supervised multi-view feature learning (SMFL) techniques to deal with the webpage classification problem has recently received much attention. However, there is still room for improvement for this kind of feature learning technique. How to effectively utilize the correlation information among the multiple views of webpage data is an important research topic. Correlation analysis on multi-view data can facilitate extraction of the complementary information. In this paper, we propose a novel SMFL approach, named semi-supervised multi-view correlation feature learning (SMCFL), for webpage classification. SMCFL seeks a discriminant common space by learning a multi-view shared transformation in a semi-supervised manner. In the discriminant space, the correlation between intra-class samples is maximized, and the correlation between inter-class samples and the global correlation among both labeled and unlabeled samples are minimized simultaneously. We transform the matrix-variable based nonconvex objective function of SMCFL into a convex quadratic programming problem with one real variable, and can achieve a globally optimal solution. Experiments on widely used datasets demonstrate the effectiveness and efficiency of the proposed approach.

【Keywords】:

191. Contextual RNN-GANs for Abstract Reasoning Diagram Generation.

Paper Link】 【Pages】:1382-1388

【Authors】: Viveka Kulharia ; Arnab Ghosh ; Amitabha Mukerjee ; Vinay P. Namboodiri ; Mohit Bansal

【Abstract】: Understanding object motions and transformations is a core problem in computer science. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting or simulation. Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence. For this, we develop a novel Contextual Generative Adversarial Network based on Recurrent Neural Networks (Context-RNN-GANs), where both the generator and the discriminator modules are based on contextual history and the adversarial discriminator guides the generator to produce realistic images for the particular time step in the image sequence. We employ the Context-RNN-GAN model (and its variants) on a novel dataset of Diagrammatic Abstract Reasoning as well as perform initial evaluations on a next-frame prediction task of videos. Empirically, we show that our Context-RNN-GAN model performs competitively with 10th-grade human performance but there is still scope for interesting improvements as compared to college-grade human performance.

【Keywords】: Generative Adversarial Networks

192. A Framework for Minimal Clustering Modification via Constraint Programming.

Paper Link】 【Pages】:1389-1395

【Authors】: Chia-Tung Kuo ; S. S. Ravi ; Thi-Bich-Hanh Dao ; Christel Vrain ; Ian Davidson

【Abstract】: Consider the situation where your favorite clustering algorithm applied to a data set returns a good clustering but there are a few undesirable properties. One ad hoc way to fix this is to re-run the clustering algorithm and hope to find a better variation. Instead, we propose to not run the algorithm again but minimally modify the existing clustering to remove the undesirable properties. We formulate the minimal clustering modification problem where we are given an initial clustering produced by any algorithm. The clustering is then modified to: i) remove the undesirable properties and ii) be minimally different from the given clustering. We show the underlying feasibility sub-problem can be intractable and demonstrate the flexibility of our constraint programming formulation. We empirically validate its usefulness through experiments on social network and medical imaging data sets.

【Keywords】: clustering; constraint programming;

193. Knowing What to Ask: A Bayesian Active Learning Approach to the Surveying Problem.

Paper Link】 【Pages】:1396-1402

【Authors】: Yoad Lewenberg ; Yoram Bachrach ; Ulrich Paquet ; Jeffrey S. Rosenschein

【Abstract】: We examine the surveying problem, where we attempt to predict how a target user is likely to respond to questions by iteratively querying that user, collaboratively based on the responses of a sample set of users. We focus on an active learning approach, where the next question we select to ask the user depends on their responses to the previous questions. We propose a method for solving the problem based on a Bayesian dimensionality reduction technique. We empirically evaluate our method, contrasting it to benchmark approaches based on augmented linear regression, and show that it achieves much better predictive performance, and is much more robust when there is missing data.

【Keywords】: Probabilistic Graphical Models; Active Surveying

194. ERMMA: Expected Risk Minimization for Matrix Approximation-based Recommender Systems.

Paper Link】 【Pages】:1403-1409

【Authors】: Dongsheng Li ; Chao Chen ; Qin Lv ; Li Shang ; Stephen M. Chu ; Hongyuan Zha

【Abstract】: Matrix approximation (MA) is one of the most popular techniques in today's recommender systems. In most MA-based recommender systems, the problem of risk minimization should be defined, and how to achieve minimum expected risk in model learning is one of the most critical problems to recommendation accuracy. This paper addresses the expected risk minimization problem, in which expected risk can be bounded by the sum of optimization error and generalization error. Based on the uniform stability theory, we propose an expected risk minimized matrix approximation method (ERMMA), which is designed to achieve better tradeoff between optimization error and generalization error in order to reduce the expected risk of the learned MA models. Theoretical analysis shows that ERMMA can achieve lower expected risk bound than existing MA methods. Experimental results on the MovieLens and Netflix datasets demonstrate that ERMMA outperforms six state-of-the-art MA-based recommendation methods in both rating prediction problem and item ranking problem.

【Keywords】: matrix approximation, recommender systems, collaborative filtering

195. Learning with Feature Network and Label Network Simultaneously.

Paper Link】 【Pages】:1410-1416

【Authors】: Yingming Li ; Ming Yang ; Zenglin Xu ; Zhongfei (Mark) Zhang

【Abstract】: For many supervised learning problems, limited training samples and incomplete labels are two difficult challenges, which usually lead to degraded performance on label prediction. To improve the generalization performance, in this paper, we propose Doubly Regularized Multi-Label learning (DRML) by exploiting feature network and label network regularization simultaneously. In more detail, the proposed algorithm first constructs a feature network and a label network with a marginalized linear denoising autoencoder in the data feature set and label set, respectively, and then learns a robust predictor with the feature network and the label network regularization simultaneously. While DRML is a general method for multi-label learning, in the evaluations we focus on the specific application of multi-label text tagging. Extensive evaluations on three benchmark data sets demonstrate that DRML stands out with superior performance in comparison with some existing multi-label learning methods.

【Keywords】: Multi-label learning; feature network; Label network

196. Collaborative Company Profiling: Insights from an Employee's Perspective.

Paper Link】 【Pages】:1417-1423

【Authors】: Hao Lin ; Hengshu Zhu ; Yuan Zuo ; Chen Zhu ; Junjie Wu ; Hui Xiong

【Abstract】: Company profiling is an analytical process to build an in-depth understanding of a company's fundamental characteristics. It serves as an effective way to gain vital information about the target company and acquire business intelligence. Traditional approaches for company profiling rely heavily on the availability of rich financial information about the company, such as financial reports and SEC filings, which may not be readily available for many private companies. However, the rapid spread of online employment services enables a new paradigm: obtaining a variety of company information from employees' online ratings and comments. This, in turn, raises the challenge of developing company profiles from an employee's perspective. To this end, in this paper, we propose a method named Company Profiling based Collaborative Topic Regression (CPCTR), for learning the latent structural patterns of companies. By formulating a joint optimization framework, CPCTR is able to collaboratively model both textual (e.g., reviews) and numerical information (e.g., salaries and ratings). Indeed, with the identified patterns, including the positive/negative opinions and the latent variable that influences salary, we can effectively carry out opinion analysis and salary prediction. Extensive experiments were conducted on a real-world data set to validate the effectiveness of CPCTR. The results show that our method provides a comprehensive understanding of company characteristics and delivers a more effective prediction of salaries than other baselines.

【Keywords】: Company Profiling; Collaborative Topic Regression; Salary Prediction

197. ESPACE: Accelerating Convolutional Neural Networks via Eliminating Spatial and Channel Redundancy.

Paper Link】 【Pages】:1424-1430

【Authors】: Shaohui Lin ; Rongrong Ji ; Chao Chen ; Feiyue Huang

【Abstract】: Recent years have witnessed the extensive popularity of convolutional neural networks (CNNs) in various computer vision and artificial intelligence applications. However, the performance gains have come at the cost of substantially higher computational complexity, which prohibits their usage in resource-limited applications like mobile or embedded devices. While increasing attention has been paid to the acceleration of internal network structure, the redundancy of visual input is rarely considered. In this paper, we make the first attempt at reducing spatial and channel redundancy directly from the visual input for CNN acceleration. The proposed method, termed ESPACE (Elimination of SPAtial and Channel rEdundancy), works in the following three steps: First, the 3D channel redundancy of convolutional layers is reduced by a set of low-rank approximations of convolutional filters. Second, a novel mask-based selective processing scheme is proposed, which further speeds up the convolution operations by skipping unsalient spatial locations of the visual input. Third, the accelerated network is fine-tuned using the training data via back-propagation. The proposed method is evaluated on ImageNet 2012 with implementations on two widely adopted CNNs, i.e. AlexNet and GoogLeNet. In comparison to several recent methods of CNN acceleration, the proposed scheme has demonstrated new state-of-the-art acceleration performance, with speedups of 5.48 and 4.12 on AlexNet and GoogLeNet, respectively, and a minimal decrease in classification accuracy.

【Keywords】:

198. A Sparse Dictionary Learning Framework to Discover Discriminative Source Activations in EEG Brain Mapping.

Paper Link】 【Pages】:1431-1437

【Authors】: Feng Liu ; Shouyi Wang ; Jay Rosenberger ; Jianzhong Su ; Hanli Liu

【Abstract】: Electroencephalography (EEG) source analysis is one of the most important noninvasive human brain imaging tools that provides millisecond temporal accuracy. However, discovering essential activated brain sources associated with different brain states is still a challenging problem. In this study, we propose for the first time that the ill-posed EEG inverse problem can be formulated and solved as a sparse over-complete dictionary learning problem. In particular, a novel supervised sparse dictionary learning framework was developed for EEG source reconstruction. A revised version of the discriminative K-SVD (DK-SVD) algorithm is exploited to solve the formulated supervised dictionary learning problem. As the proposed learning framework incorporates the EEG label information of different brain states, it is capable of learning a sparse representation that reveals the most discriminative brain activity sources among different brain states. Compared to state-of-the-art EEG source analysis methods, the proposed sparse dictionary learning framework achieved significantly superior performance in both computing speed and accuracy for the challenging EEG source reconstruction problem in extensive numerical experiments. More importantly, the experimental results also validated that the proposed sparse learning framework is effective in discovering the discriminative task-related brain activation sources, which shows the potential to advance high-resolution EEG source analysis for real-time non-invasive brain imaging research.

【Keywords】: EEG; inverse problem; dictionary learning; discriminative source

199. On Predictive Patent Valuation: Forecasting Patent Citations and Their Types.

Paper Link】 【Pages】:1438-1444

【Authors】: Xin Liu ; Junchi Yan ; Shuai Xiao ; Xiangfeng Wang ; Hongyuan Zha ; Stephen M. Chu

【Abstract】: Patents are widely regarded as a proxy for inventive output which is valuable and can be commercialized by various means. Individual patent information such as technology field, classification, claims, and application jurisdictions is increasingly available as released by different venues. This work has relied on a long-standing hypothesis that the citations received by a patent are a proxy for knowledge flows or impacts of the patent and thus are directly related to patent value. This paper does not fall into the line of intensive existing work that tests or applies this hypothesis; rather, we aim to address the limitation of using so-far received citations for patent valuation. By devising a point process based patent citation type aware (self-citation and non-self-citation) prediction model which incorporates the various information of a patent, we open up the possibility of performing predictive patent valuation, which can be especially useful for newly granted patents with emerging technology. A study on real-world data corroborates the efficacy of our approach. Our initiative may also have policy implications for technology markets, patent systems and all other stakeholders. The code and curated data will be available to the research community.

【Keywords】:

200. Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks.

Paper Link】 【Pages】:1445-1452

【Authors】: Yu Liu ; Jianlong Fu ; Tao Mei ; Chang Wen Chen

【Abstract】: Automatic generation of natural language description for individual images (a.k.a. image captioning) has attracted extensive research attention. In this paper, we take one step further to investigate the generation of a paragraph to describe a photo stream for the purpose of storytelling. This task is even more challenging than individual image description due to the difficulty in modeling the large visual variance in an ordered photo collection and in preserving the long-term language coherence among multiple sentences. To deal with these challenges, we formulate the task as a sequence-to-sequence learning problem and propose a novel joint learning model by leveraging the semantic coherence in a photo stream. Specifically, to reduce visual variance, we learn a semantic space by jointly embedding each photo with its corresponding contextual sentence, so that the semantically related photos and their correlations are discovered. Then, to preserve language coherence in the paragraph, we learn a novel Bidirectional Attention-based Recurrent Neural Network (BARNN) model, which can attend on the discovered semantic relation to produce a sentence sequence and maintain its consistence with the photo stream. We integrate the two-step learning components into one single optimization formulation and train the network in an end-to-end manner. Experiments on three widely-used datasets (NYC/Disney/SIND) show that the proposed approach outperforms state-of-the-art methods with large margins for both retrieval and paragraph generation tasks. We also show the subjective preference of the machine-generated stories by the proposed approach over the baselines through a user study with 40 human subjects.

【Keywords】: Bidirectional Attention-based Recurrent Neural Network; Multi-modality embedding; Deep Learning; Text Paragraph Generation for Storytelling

201. Data-Driven Approximations to NP-Hard Problems.

Paper Link】 【Pages】:1453-1459

【Authors】: Anton Milan ; Seyed Hamid Rezatofighi ; Ravi Garg ; Anthony R. Dick ; Ian D. Reid

【Abstract】: There exist a number of problem classes for which obtaining the exact solution becomes exponentially expensive with increasing problem size. The quadratic assignment problem (QAP) and the travelling salesman problem (TSP) are just two examples of such NP-hard problems. In practice, approximate algorithms are employed to obtain a suboptimal solution, where one must face a trade-off between computational complexity and solution quality. In this paper, we propose to learn to solve these problems from approximate examples, using recurrent neural networks (RNNs). Surprisingly, such architectures are capable of producing highly accurate solutions at minimal computational cost. Moreover, we introduce a simple, yet effective technique for improving the initial (weak) training set by incorporating the objective cost into the training procedure. We demonstrate the functionality of our approach on three exemplar applications: marginal distributions of a joint matching space, feature point matching and the travelling salesman problem. We show encouraging results on synthetic and real data in all three cases.

【Keywords】: Long short-term memory, data association, matching

202. Predicting Demographics of High-Resolution Geographies with Geotagged Tweets.

Paper Link】 【Pages】:1460-1466

【Authors】: Omar Montasser ; Daniel Kifer

【Abstract】: In this paper, we consider the problem of predicting demographics of geographic units given geotagged Tweets that are composed within these units. Traditional survey methods that offer demographics estimates are usually limited in terms of geographic resolution, geographic boundaries, and time intervals. Thus, it would be highly useful to develop computational methods that can complement traditional survey methods by offering demographics estimates at finer geographic resolutions, with flexible geographic boundaries (i.e. not confined to administrative boundaries), and at different time intervals. While prior work has focused on predicting demographics and health statistics at relatively coarse geographic resolutions such as the county-level or state-level, we introduce an approach to predict demographics at finer geographic resolutions such as the blockgroup-level. For the task of predicting gender and race/ethnicity counts at the blockgroup-level, an approach adapted from prior work to our problem achieves an average correlation of 0.389 (gender) and 0.569 (race) on a held-out test dataset. Our approach outperforms this prior approach with an average correlation of 0.671 (gender) and 0.692 (race).

【Keywords】: geography-based demographics prediction; geotagged social media; computational social science

Paper Link】 【Pages】:1467-1473

【Authors】: Arun Reddy Nelakurthi ; Jingrui He

【Abstract】: With the emergence of online forums associated with major diseases, such as diabetes mellitus, many patients are increasingly dependent on such disease-specific social networks to gain access to additional resources. Among these patients, it is common for them to stick to one disease-specific social network, although their desired resources might be spread over multiple social networks, such as patients with similar questions and concerns. Motivated by this application, in this paper, we focus on cross network link recommendation, which aims to identify similar users across multiple heterogeneous social networks. The problem setting is different from existing work on cross network link prediction, which either tries to link accounts of the same user from different social networks, or aims to match users with complementary expertise or interest. To approach the problem of cross network link recommendation, we propose to jointly decompose the user-keyword matrices from multiple social networks, while requiring them to share the same topics and user group-topic association matrices. This constraint comes from the fact that social networks dedicated to the same disease tend to share the same topics as well as the interests of users groups in certain topics. Based on this intuition, we construct a generic optimization framework, provide four instantiations and an iterative optimization algorithm with performance analysis. In the experiments, we demonstrate the superiority of the proposed algorithm over state-of-the-art techniques on various real-world data sets.

【Keywords】: link-prediction, social networks, matrix factorization and co-clustering

204. FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation.

Paper Link】 【Pages】:1474-1480

【Authors】: Yulei Niu ; Zhiwu Lu ; Songfang Huang ; Xin Gao ; Ji-Rong Wen

【Abstract】: We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels have a high probability of sharing the same set of labels, i.e., they tend to have a common combination of predefined classes. By taking these two observations into consideration, semantic segmentation is formulated as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve this optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with the state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.

【Keywords】:

205. Learning Implicit Tasks for Patient-Specific Risk Modeling in ICU.

Paper Link】 【Pages】:1481-1487

【Authors】: Nozomi Nori ; Hisashi Kashima ; Kazuto Yamashita ; Susumu Kunisawa ; Yuichi Imanaka

【Abstract】: Accurate assessment of the severity of a patient’s condition plays a fundamental role in acute hospital care such as that provided in an intensive care unit (ICU). ICU clinicians are required to make sense of a large amount of clinical data in a limited time to estimate the severity of a patient’s condition, which ultimately leads to the planning of appropriate care. The ICU is an especially demanding environment for clinicians because of the diversity of patients who mostly suffer from multiple diseases of various types. In this paper, we propose a mortality risk prediction method for ICU patients. The method is intended to enhance the severity assessment by considering the diversity of patients. Our method produces patient-specific risk models that reflect the collection of diseases associated with the patient. Specifically, we assume a small number of latent basis tasks, where each latent task is associated with its own parameter vector; a parameter vector for a specific patient is constructed as a linear combination of these. The latent representation of a patient, namely, the coefficients of the combination, is learned based on the collection of diseases associated with the patient. Our method could be considered a multi-task learning method where latent tasks are learned based on the collection of diseases. We demonstrate the effectiveness of our proposed method using a dataset collected from a hospital. Our method achieved higher predictive performance compared with a single-task learning method, the “de facto standard,” and several multi-task learning methods including a recently proposed method for ICU mortality risk prediction. Furthermore, our proposed method could be used not only for predictions but also for uncovering patient-specificity from different viewpoints.

【Keywords】: multi-task learning; mortality modeling; ICU
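
The abstract above describes building a patient-specific parameter vector as a linear combination of a few latent basis-task vectors. The following is a minimal sketch of that one step only, not the authors' model: the function names, the toy numbers, and the logistic link used to turn a score into a risk are all assumptions made for illustration, and the learning of the coefficients from a patient's diseases is not shown.

```python
import math


def patient_parameters(basis_tasks, coefficients):
    """Combine latent basis-task vectors into one patient-specific vector.

    basis_tasks:  list of K parameter vectors, each of length D (hypothetical values).
    coefficients: list of K weights, the patient's latent representation.
    """
    dim = len(basis_tasks[0])
    return [
        sum(c * task[d] for c, task in zip(coefficients, basis_tasks))
        for d in range(dim)
    ]


def mortality_risk(theta, features):
    """Assumed logistic link: map a linear score to a probability in (0, 1)."""
    score = sum(t * x for t, x in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical latent tasks over two clinical features.
basis = [[1.0, 0.0], [0.0, 2.0]]
theta = patient_parameters(basis, [0.5, 0.5])
risk = mortality_risk(theta, [0.2, 0.1])
```

In this toy setup, patients with different coefficient vectors get different risk models while sharing the same small set of basis tasks.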

206. Enabling Dark Energy Science with Deep Generative Models of Galaxy Images.

Paper Link】 【Pages】:1488-1494

【Authors】: Siamak Ravanbakhsh ; Francois Lanusse ; Rachel Mandelbaum ; Jeff G. Schneider ; Barnabás Póczos

【Abstract】: Understanding the nature of dark energy, the mysterious force driving the accelerated expansion of the Universe, is a major challenge of modern cosmology. The next generation of cosmological surveys, specifically designed to address this issue, rely on accurate measurements of the apparent shapes of distant galaxies. However, shape measurement methods suffer from various unavoidable biases and therefore will rely on a precise calibration to meet the accuracy requirements of the science analysis. This calibration process remains an open challenge as it requires large sets of high quality galaxy images. To this end, we study the application of deep conditional generative models in generating realistic galaxy images. In particular we consider variations on conditional variational autoencoder and introduce a new adversarial objective for training of conditional generative networks. Our results suggest a reliable alternative to the acquisition of expensive high quality observations for generating the calibration data needed by the next generation of cosmological surveys.

【Keywords】: deep learning; cosmology; weak lensing; galaxy image; generative model; conditional VAE; conditional GAN; generative adversarial network; conditional variational autoencoder

207. Unsupervised Deep Learning for Optical Flow Estimation.

Paper Link】 【Pages】:1495-1501

【Authors】: Zhe Ren ; Junchi Yan ; Bingbing Ni ; Bin Liu ; Xiaokang Yang ; Hongyuan Zha

【Abstract】: Recent work has shown that optical flow estimation can be formulated as a supervised learning problem. Moreover, convolutional networks have been successfully applied to this task. However, supervised flow learning is hampered by the shortage of labeled training data. As a consequence, existing methods have to turn to large synthetic datasets for easily computer-generated ground truth. In this work, we explore whether a deep network for flow estimation can be trained without supervision. Using image warping by the estimated flow, we devise a simple yet effective unsupervised method for learning optical flow, by directly minimizing photometric consistency. We demonstrate that a flow network can be trained end-to-end using our unsupervised scheme. In some cases, our results come tantalizingly close to the performance of methods trained with full supervision.

【Keywords】:

208. Low-Rank Linear Cold-Start Recommendation from Social Data.

Paper Link】 【Pages】:1502-1508

【Authors】: Suvash Sedhain ; Aditya Krishna Menon ; Scott Sanner ; Lexing Xie ; Darius Braziunas

【Abstract】: The cold-start problem involves recommendation of content to new users of a system, for whom there is no historical preference information available. This proves a challenge for collaborative filtering algorithms that inherently rely on such information. Recent work has shown that social metadata, such as users' friend groups and page likes, can strongly mitigate the problem. However, such approaches either lack an interpretation as optimising some principled objective, involve iterative non-convex optimisation with limited scalability, or require tuning several hyperparameters. In this paper, we first show how three popular cold-start models are special cases of a linear content-based model, with implicit constraints on the weights. Leveraging this insight, we propose Loco, a new model for cold-start recommendation based on three ingredients: (a) linear regression to learn an optimal weighting of social signals for preferences, (b) a low-rank parametrisation of the weights to overcome the high dimensionality common in social data, and (c) scalable learning of such low-rank weights using randomised SVD. Experiments on four real-world datasets show that Loco yields significant improvements over state-of-the-art cold-start recommenders that exploit high-dimensional social network metadata.

【Keywords】: Recommender Systems, Cold Start, Social Recommendation

209. Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units.

Paper Link】 【Pages】:1509-1516

【Authors】: Wenling Shang ; Justin Chiu ; Kihyuk Sohn

【Abstract】: Deep Residual Networks (ResNets) have recently achieved state-of-the-art results on many challenging computer vision tasks. In this work we analyze the role of Batch Normalization (BatchNorm) layers on ResNets in the hope of improving the current architecture and better incorporating other normalization techniques, such as Normalization Propagation (NormProp), into ResNets. Firstly, we verify that BatchNorm helps distribute representation learning to residual blocks at all layers, as opposed to a plain ResNet without BatchNorm where learning happens mostly in the latter part of the network. We also observe that BatchNorm well regularizes Concatenated ReLU (CReLU) activation scheme on ResNets, whose magnitude of activation grows by preserving both positive and negative responses when going deeper into the network. Secondly, we investigate the use of NormProp as a replacement for BatchNorm in ResNets. Though NormProp theoretically attains the same effect as BatchNorm on generic convolutional neural networks, the identity mapping of ResNets invalidates its theoretical promise and NormProp exhibits a significant performance drop when naively applied. To bridge the gap between BatchNorm and NormProp in ResNets, we propose a simple modification to NormProp and employ the CReLU activation scheme. We experiment on visual object recognition benchmark datasets such as CIFAR-10/100 and ImageNet and demonstrate that 1) the modified NormProp performs better than the original NormProp but is still not comparable to BatchNorm and 2) CReLU improves the performance of ResNets with or without normalizations.

【Keywords】:
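
The abstract above notes that Concatenated ReLU (CReLU) preserves both positive and negative responses. As a minimal sketch of that activation on a flat list of values (the function name and the list-based representation are assumptions for illustration; in a real network the concatenation would be along the channel axis):

```python
def crelu(values):
    """Concatenated ReLU: ReLU of the inputs followed by ReLU of the negated inputs.

    The output is twice as long as the input, so no sign information is discarded.
    """
    positive_part = [max(0.0, v) for v in values]
    negative_part = [max(0.0, -v) for v in values]
    return positive_part + negative_part


activations = crelu([1.0, -2.0])
```

Because every input contributes a non-zero value to one of the two halves (unless it is exactly zero), the output magnitude is not clipped the way it is with a plain ReLU, which matches the abstract's remark that the activation magnitude grows with depth unless it is regularized.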

210. Portfolio Selection via Subset Resampling.

Paper Link】 【Pages】:1517-1523

【Authors】: Weiwei Shen ; Jun Wang

【Abstract】: As the cornerstone of the modern portfolio theory, Markowitz's mean-variance optimization is a major model adopted in portfolio management. However, the estimation errors in its input parameters substantially deteriorate its performance in practice. Specifically, loss could be huge when the number of assets for investment is not much smaller than the sample size of historical data. To hasten the applicability of Markowitz's portfolio optimization to large portfolios, in this paper, we propose a new portfolio strategy via subset resampling. Through resampling subsets of the original large universe of assets, we construct the associated subset portfolios with more accurately estimated parameters without requiring additional data. By aggregating a number of constructed subset portfolios, we attain a well-diversified portfolio of all assets. To investigate its performance, we first analyze its corresponding efficient frontiers by simulation, provide analysis on the hyperparameter selection, and then empirically compare its out-of-sample performance with those of various competing strategies on diversified datasets. Experimental results corroborate that the proposed portfolio strategy has marked superiority in extensive evaluation criteria.

【Keywords】: Portfolio; Resampling
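
The abstract above outlines a resample-then-aggregate scheme: draw subsets of the asset universe, build a portfolio on each subset, and average the subset portfolios. The sketch below illustrates only that outer loop. It swaps in inverse-variance weighting as a simple stand-in for the per-subset mean-variance optimization, and all names, parameters, and numbers are hypothetical.

```python
import random


def inverse_variance_weights(variances):
    """Stand-in subset rule (an assumption): weight each asset by 1 / variance."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def subset_resampling_portfolio(variances, subset_size, num_subsets, seed=0):
    """Average many subset portfolios into one portfolio over all assets."""
    rng = random.Random(seed)
    n = len(variances)
    aggregate = [0.0] * n
    for _ in range(num_subsets):
        # Draw a random subset of asset indices without replacement.
        subset = rng.sample(range(n), subset_size)
        weights = inverse_variance_weights([variances[i] for i in subset])
        # Assets outside the subset implicitly get weight zero in this round.
        for asset, weight in zip(subset, weights):
            aggregate[asset] += weight / num_subsets
    return aggregate


# Hypothetical per-asset return variances for a five-asset universe.
portfolio = subset_resampling_portfolio([0.04, 0.09, 0.01, 0.16, 0.25], 3, 200)
```

Each subset portfolio sums to one, so the average does as well. The subset size and the number of subsets play the role of the hyperparameters mentioned in the abstract.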

211. Beyond IID: Learning to Combine Non-IID Metrics for Vision Tasks.

Paper Link】 【Pages】:1524-1531

【Authors】: Yinghuan Shi ; Wenbin Li ; Yang Gao ; Longbing Cao ; Dinggang Shen

【Abstract】: Metric learning has been widely employed, especially in various computer vision tasks, with the fundamental assumption that all samples (e.g., regions/superpixels in images/videos) are independent and identically distributed (IID). However, since the samples are usually spatially connected or temporally correlated with their physically connected neighbours, they are not IID (non-IID for short) and cannot be directly handled by existing methods. Thus, we propose to learn and integrate non-IID metrics (NIME). To incorporate the non-IID spatial/temporal relations, instead of directly using non-IID features and metric learning as in previous methods, NIME first builds several non-IID representations on the original (non-IID) features using various graph kernel functions, and then automatically learns the metric under the best combination of the various non-IID representations. NIME is applied to solve two typical computer vision tasks: interactive image segmentation and histology image identification. The results show that learning and integrating non-IID metrics improves the performance, compared to the IID methods. Moreover, our method achieves results comparable to or better than those of state-of-the-art methods.

【Keywords】:

212. Fast Inverse Reinforcement Learning with Interval Consistent Graph for Driving Behavior Prediction.

Paper Link】 【Pages】:1532-1538

【Authors】: Masamichi Shimosaka ; Junichi Sato ; Kazuhito Takenaka ; Kentarou Hitomi

【Abstract】: Maximum entropy inverse reinforcement learning (MaxEnt IRL) is an effective approach for learning the underlying rewards of demonstrated human behavior, but it is intractable in high-dimensional state spaces due to the exponential growth of calculation cost. In recent years, a few works on approximating MaxEnt IRL in large state spaces by graphs have provided successful results; however, the types of state space models are quite limited. In this work, we extend them to more generic large state space models with graphs where time interval consistency of Markov decision processes is guaranteed. We validate our proposed method in the context of driving behavior prediction. Experimental results using actual driving data confirm the superiority of our algorithm in both prediction performance and computational cost over other existing IRL frameworks.

【Keywords】:

213. Neural Programming by Example.

Paper Link】 【Pages】:1539-1545

【Authors】: Chengxun Shu ; Hongyu Zhang

【Abstract】: Programming by Example (PBE) aims at automatically inferring a computer program for accomplishing a certain task from sample input and output. In this paper, we propose a deep neural network (DNN) based PBE model called Neural Programming by Example (NPBE), which can learn from input-output strings and induce programs that solve string manipulation problems. Our NPBE model has four neural network based components: a string encoder, an input-output analyzer, a program generator, and a symbol selector. We demonstrate the effectiveness of NPBE by training it end-to-end to solve some common string manipulation problems in spreadsheet systems. The results show that our model can induce string manipulation programs effectively. Our work is one step towards teaching DNNs to generate computer programs.

【Keywords】: Programming by Example; Deep Learning; Program Induction

214. Simultaneous Clustering and Ensemble.

Paper Link】 【Pages】:1546-1552

【Authors】: Zhiqiang Tao ; Hongfu Liu ; Yun Fu

【Abstract】: Ensemble Clustering (EC) has gained a great deal of attention throughout the fields of data mining and machine learning, since it emerged as an effective and robust clustering framework. Typically, EC methods try to fuse multiple basic partitions (BPs) into a consensus one, where each BP is obtained by performing a traditional clustering method on the same dataset. One promising direction for ensemble clustering is to derive pairwise similarity from BPs, and then transform it into a graph partition problem. However, these graph-based methods may suffer from information loss when computing the similarity between data points, because they only utilize the categorical data provided by multiple BPs, yet neglect rich information from raw features. This problem can badly undermine the underlying cluster structure in the original feature space, and thus degrade the clustering performance. In light of this, we propose a novel Simultaneous Clustering and Ensemble (SCE) framework to alleviate this detrimental effect, which employs the similarity matrix from raw features to enhance the co-association matrix summarized by multiple BPs. Two neat closed-form solutions given by eigenvalue decomposition are provided for SCE. Experiments conducted on 16 real-world datasets demonstrate the effectiveness of the proposed SCE over traditional clustering and state-of-the-art ensemble clustering methods. Moreover, several impact factors that may affect our method are also explored extensively.

【Keywords】: Ensemble Clustering; Co-association Matrix; Spectral Clustering
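
The abstract above refers to a co-association matrix summarized from multiple basic partitions. As a minimal sketch of that building block only (it does not implement SCE or its use of raw-feature similarity, and the function name and toy partitions are assumptions), each entry can be taken as the fraction of basic partitions that place two points in the same cluster:

```python
def co_association(partitions):
    """Build the co-association matrix from a list of basic partitions.

    partitions: list of label lists, one per basic partition, all of length n.
    Entry [i][j] is the fraction of partitions in which points i and j
    share a cluster label.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


# Two hypothetical basic partitions of four data points.
basic_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
similarity = co_association(basic_partitions)
```

The matrix depends only on the cluster labels, which is the information loss the abstract points out: two points with very different raw features look identical here as long as the basic partitions group them together.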

215. A Deep Hierarchical Approach to Lifelong Learning in Minecraft.

Paper Link】 【Pages】:1553-1561

【Authors】: Chen Tessler ; Shahar Givony ; Tom Zahavy ; Daniel J. Mankowitz ; Shie Mannor

【Abstract】: We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al. 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.

【Keywords】: Reinforcement Learning; Deep Learning; Lifelong Learning; Minecraft; Skills

216. Learning Attributes from the Crowdsourced Relative Labels.

Paper Link】 【Pages】:1562-1568

【Authors】: Tian Tian ; Ning Chen ; Jun Zhu

【Abstract】: Finding semantic attributes to describe related concepts is typically a hard problem. The commonly used attributes in most fields are designed by domain experts, which is expensive and time-consuming. In this paper we propose an efficient method to learn human comprehensible attributes with crowdsourcing. We first design an analogical interface to collect relative labels from the crowds. Then we propose a hierarchical Bayesian model, as well as an efficient initialization strategy, to aggregate labels and extract concise attributes. Our experimental results demonstrate promise on discovering diverse and convincing attributes, which significantly improve the performance of the challenging zero-shot learning tasks.

【Keywords】: Crowdsourcing; Attributes; Graphical Model

217. Coupling Implicit and Explicit Knowledge for Customer Volume Prediction.

Paper Link】 【Pages】:1569-1575

【Authors】: Jingyuan Wang ; Yating Lin ; Junjie Wu ; Zhong Wang ; Zhang Xiong

【Abstract】: Customer volume prediction, which predicts the volume from a customer source to a service place, is a very important technique for location selection, market investigation, and other related applications. Most traditional methods make use of only partial information for either supervised or unsupervised modeling, and so cannot integrate all of the available knowledge well. In this paper, we propose a method titled GR-NMF for jointly modeling both implicit correlations hidden inside customer volumes and explicit geographical knowledge via an integrated probabilistic framework. The effectiveness of GR-NMF in coupling all-round knowledge is verified over a real-life outpatient dataset under different scenarios. GR-NMF shows particularly evident advantages over all baselines in location selection with the cold-start challenge.

【Keywords】:

218. Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation.

Paper Link】 【Pages】:1576-1582

【Authors】: Jinzhuo Wang ; Wenmin Wang ; Ronggang Wang ; Wen Gao

【Abstract】: Monte Carlo tree search (MCTS), which determines each action by enormous simulations in a broad and deep search tree, is extremely popular in computer Go. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search of millions of future interactions. In this paper, we propose a computer Go system that follows experts’ way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates for the next move. Compared with existing deep convolutional neural networks (DCNN), DANN inserts a recurrent layer after each convolutional layer and stacks them in an alternating manner. We show that such a setting can preserve more context of local features and their evolution, which is beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from the move predictor. This is consistent with human experts’ nature of playing, since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253,233 professional records. Experiments on the GoGoD and PGD datasets show that DANN can substantially improve the performance of move prediction over pure DCNN. When combined with LTE, our system outperforms most relevant approaches and open engines based on MCTS.

【Keywords】: Computer Go; Deep Alternative Neural Network; Long-Term Evaluation; POMDP; Pattern Learning; Move Prediction

219. Multiset Feature Learning for Highly Imbalanced Data Classification.

Paper Link】 【Pages】:1583-1589

【Authors】: Fei Wu ; Xiao-Yuan Jing ; Shiguang Shan ; Wangmeng Zuo ; Jing-Yu Yang

【Abstract】: With the expansion of data, increasingly imbalanced data has emerged. When the imbalance ratio of data is high, most existing imbalanced learning methods decline in classification performance. To address this problem, a few highly imbalanced learning methods have been presented. However, most of them are still sensitive to the high imbalance ratio. This work aims to provide an effective solution for the highly imbalanced data classification problem. We conduct highly imbalanced learning from the perspective of feature learning. We partition the majority class into multiple blocks, each balanced to the minority class, and combine each block with the minority class to construct a balanced sample set. Multiset feature learning (MFL) is performed on these sets to learn discriminant features. We thus propose an uncorrelated cost-sensitive multiset learning (UCML) approach. UCML provides a multiple sets construction strategy, incorporates the cost-sensitive factor into MFL, and designs a weighted uncorrelated constraint to remove the correlation among multiset features. Experiments on five highly imbalanced datasets indicate that UCML outperforms state-of-the-art imbalanced learning methods.

【Keywords】:
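
The set construction step described in the abstract above (partition the majority class into blocks balanced to the minority class, then pair each block with the minority class) can be sketched as follows. This is only an illustrative sketch: the function name `build_balanced_sets`, the random shuffling, and the toy data are assumptions, and the cost-sensitive and uncorrelated feature learning parts of UCML are not shown.

```python
import random


def build_balanced_sets(majority, minority, seed=0):
    """Split the majority class into blocks roughly the size of the minority
    class and pair every block with the full minority class.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)  # assumption: blocks are formed at random
    block_size = max(1, len(minority))
    blocks = [shuffled[i:i + block_size]
              for i in range(0, len(shuffled), block_size)]
    return [(block, list(minority)) for block in blocks]


# Toy data: 100 majority samples and 10 minority samples (imbalance ratio 10).
majority = list(range(100))
minority = list(range(100, 110))
balanced_sets = build_balanced_sets(majority, minority)
print(len(balanced_sets))  # number of balanced sample sets
```

Each resulting pair is a balanced two-class sample set on which a feature learner could then be trained.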

220. Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation.

Paper Link】 【Pages】:1590-1596

【Authors】: Cao Xiao ; Ping Zhang ; W. Art Chaovalitwongse ; Jianying Hu ; Fei Wang

【Abstract】: Adverse drug reaction (ADR) is a major burden for patients and the healthcare industry. It usually causes preventable hospitalizations and deaths, and is associated with a huge amount of cost. Traditional preclinical in vitro safety profiling and clinical safety trials are restricted in terms of small scale, long duration, huge financial costs and limited statistical significance. The availability of large amounts of drug and ADR data potentially allows ADR predictions during the drugs’ early preclinical stage with data analytics methods to inform more targeted clinical safety tests. Despite their initial success, existing methods have trade-offs among interpretability, predictive power and efficiency. This urges us to explore methods that could have all these strengths and provide practical solutions for real world ADR predictions. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent certain biochemical mechanisms that relate ADRs with drug structures. Based on LDA, we designed an equivalent regularization term to incorporate the hierarchical ADR domain knowledge. Finally, we developed a mixed input model leveraging a fast collapsed Gibbs sampling method in which the complexity of each iteration is proportional only to the number of positive ADRs. Experiments on real world data show our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives.

【Keywords】: machine learning, data mining, LDA, adverse drug reaction

221. Modeling the Intensity Function of Point Process Via Recurrent Neural Networks.

Paper Link】 【Pages】:1597-1603

【Authors】: Shuai Xiao ; Junchi Yan ; Xiaokang Yang ; Hongyuan Zha ; Stephen M. Chu

【Abstract】: Event sequences, asynchronously generated with random timestamps, are ubiquitous among applications. The precise and arbitrary timestamps can carry important clues about the underlying dynamics, which makes event data fundamentally different from time series, where the series is indexed with fixed and equal time intervals. One expressive mathematical tool for modeling events is the point process. The intensity functions of many point processes involve two components: the background and the effect of the history. Due to its inherent spontaneousness, the background can be treated as a time series, while the other component needs to handle the history events. In this paper, we model the background by a Recurrent Neural Network (RNN) with its units aligned with time series indexes, while the history effect is modeled by another RNN whose units are aligned with asynchronous events to capture the long-range dynamics. The whole model, with event type and timestamp prediction output layers, can be trained end-to-end. Our approach takes an RNN perspective on point processes, and models their background and history effect. For utility, our method allows a black-box treatment for modeling the intensity, which is often a pre-defined parametric form in point processes. Meanwhile, end-to-end training opens the avenue for reusing existing rich techniques in deep networks for point process modeling. We apply our model to the predictive maintenance problem using a log dataset from more than 1000 ATMs of a global bank headquartered in North America.

【Keywords】:

222. Progressive Prediction of Student Performance in College Programs.

Paper Link】 【Pages】:1604-1610

【Authors】: Jie Xu ; Yuli Han ; Daniel Marcu ; Mihaela van der Schaar

【Abstract】: Accurately predicting students' future performance based on their tracked academic records in college programs is crucial for effectively carrying out necessary pedagogical interventions to ensure students' on-time graduation. Although there is a rich literature on predicting student performance in solving problems and studying courses using data-driven approaches, predicting student performance in completing college programs is much less studied and faces new challenges, mainly due to the diversity of courses selected by students and the requirement of continuous tracking and incorporation of students' evolving progress. In this paper, we develop a novel algorithm that enables progressive prediction of students' performance by adapting ensemble learning techniques and utilizing education-specific domain knowledge. We prove its prediction performance guarantee and show its performance improvement against benchmark algorithms on a real-world student dataset from UCLA.

【Keywords】: Machine learning for education

223. Bridging Video Content and Comments: Synchronized Video Description with Temporal Summarization of Crowdsourced Time-Sync Comments.

Paper Link】 【Pages】:1611-1617

【Authors】: Linli Xu ; Chao Zhang

【Abstract】: With the rapid growth of online sharing media, we are facing a huge collection of videos. In the meantime, due to the volume and complexity of video data, it can be tedious and time consuming to index or annotate videos. In this paper, we propose to generate temporal descriptions of videos by exploiting the information of crowdsourced time-sync comments which are receiving increasing popularity on many video sharing websites. In this framework, representative and interesting comments of a video are selected and highlighted along the timeline, which provide an informative description of the video in a time-sync manner. The challenge of the proposed application comes from the extremely informal and noisy nature of the comments, which are usually short sentences and on very different topics. To resolve these issues, we propose a novel temporal summarization model based on the data reconstruction principle, where representative comments are selected in order to best reconstruct the original corpus at the text level as well as the topic level while incorporating the temporal correlations of the comments. Experimental results on real-world data demonstrate the effectiveness of the proposed framework and justify the idea of exploiting crowdsourced time-sync comments as a bridge to describe videos.

【Keywords】: video description; temporal summarization; time-sync comments

224. Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval.

Paper Link】 【Pages】:1618-1625

【Authors】: Erkun Yang ; Cheng Deng ; Wei Liu ; Xianglong Liu ; Dacheng Tao ; Xinbo Gao

【Abstract】: With benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing methods on cross-modal hashing cannot obtain powerful hash codes due to directly utilizing hand-crafted features or ignoring heterogeneous correlations across different modalities, which will greatly degrade the retrieval performance. In this paper, we propose a novel deep cross-modal hashing method to generate compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.

【Keywords】:

225. Discriminative Semi-Supervised Dictionary Learning with Entropy Regularization for Pattern Classification.

Paper Link】 【Pages】:1626-1632

【Authors】: Meng Yang ; Lin Chen

【Abstract】: Dictionary learning has played an important role in the success of sparse representation, which has triggered the rapid development of unsupervised and supervised dictionary learning methods. However, in most practical applications, there are usually quite limited labeled training samples, while it is relatively easy to acquire abundant unlabeled training samples. Thus semi-supervised dictionary learning, which aims to effectively explore the discrimination of unlabeled training data, has attracted much attention from researchers. Although various regularizations have been introduced in the prevailing semi-supervised dictionary learning methods, how to design an effective unified model of dictionary learning and unlabeled-data class estimation, and how to explore the discrimination in the labeled and unlabeled data well, are still open questions. In this paper, we propose a novel discriminative semi-supervised dictionary learning model (DSSDL) by introducing discriminative representation, a coding of unlabeled data identical to the coding of testing data in the final classification, and an entropy regularization term. The coding strategy for unlabeled data can not only avoid the effect of incorrect class estimation, but also allow the learned discrimination to be well exploited in the final classification. The introduced entropy regularization can avoid overemphasizing some uncertain estimated classes for unlabeled samples. Apart from the enhanced discrimination in the learned dictionary by the discriminative representation, an extended dictionary is used to mainly explore the discrimination embedded in the unlabeled data. Extensive experiments on face recognition, digit recognition and texture classification show the effectiveness of the proposed method.

【Keywords】: semi-supervised learning; discriminative dictionary learning; pattern classification

226. Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images.

Paper Link】 【Pages】:1633-1639

【Authors】: Xin Yang ; Lequan Yu ; Lingyun Wu ; Yi Wang ; Dong Ni ; Jing Qin ; Pheng-Ann Heng

【Abstract】: Boundary incompleteness raises great challenges to automatic prostate segmentation in ultrasound images. Shape priors can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address those issues with a novel framework. The proposed framework can seamlessly integrate feature extraction and shape prior exploration, and estimate the complete boundary in a sequential manner. Our framework is composed of three key modules. Firstly, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with the biologically plausible Recurrent Neural Networks (RNNs). This module is corroborated to be effective in dealing with the boundary incompleteness. Secondly, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Thirdly, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation when compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks, where boundary incompleteness is one of the main challenges.

【Keywords】: Prostate segmentation; Ultrasound image; Recurrent Neural Networks; Auto-Context

227. Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay.

Paper Link】 【Pages】:1640-1646

【Authors】: Haiyan Yin ; Sinno Jialin Pan

【Abstract】: The process of transferring the knowledge of multiple reinforcement learning policies into a single multi-task policy via a distillation technique is known as policy distillation. When policy distillation is performed under a deep reinforcement learning setting, due to the giant parameter size and the huge state space for each task domain, it requires extensive computational effort to train the multi-task policy network. In this paper, we propose a new policy distillation architecture for deep reinforcement learning, where we assume that each task uses its task-specific high-level convolutional features as the inputs to the multi-task policy network. Furthermore, we propose a new sampling framework termed hierarchical prioritized experience replay to selectively choose experiences from the replay memories of each task domain to perform learning on the network. With the above two attempts, we aim to accelerate the learning of the multi-task policy network while guaranteeing a good performance. We use Atari 2600 games as the testing environment to demonstrate the efficiency and effectiveness of our proposed solution for policy distillation.

【Keywords】: deep reinforcement learning; policy distillation; multitask learning; deep neural networks

228. Personalized Donor-Recipient Matching for Organ Transplantation.

Paper Link】 【Pages】:1647-1654

【Authors】: Jinsung Yoon ; Ahmed M. Alaa ; Martin Cadeiras ; Mihaela van der Schaar

【Abstract】: Organ transplants can improve the life expectancy and quality of life for the recipient but carry the risk of serious post-operative complications, such as septic shock and organ rejection. The probability of a successful transplant depends in a very subtle fashion on compatibility between the donor and the recipient - but current medical practice is short of domain knowledge regarding the complex nature of recipient-donor compatibility. Hence a data-driven approach for learning compatibility has the potential for significant improvements in match quality. This paper proposes a novel system (ConfidentMatch) that is trained using data from electronic health records. ConfidentMatch predicts the success of an organ transplant (in terms of the 3-year survival rates) on the basis of clinical and demographic traits of the donor and recipient. ConfidentMatch captures the heterogeneity of the donor and recipient traits by optimally dividing the feature space into clusters and constructing different optimal predictive models to each cluster. The system controls the complexity of the learned predictive model in a way that allows for assuring more granular and accurate predictions for a larger number of potential recipient-donor pairs, thereby ensuring that predictions are "personalized" and tailored to individual characteristics to the finest possible granularity. Experiments conducted on the UNOS heart transplant dataset show the superiority of the prognostic value of ConfidentMatch to other competing benchmarks; ConfidentMatch can provide predictions of success with 95% accuracy for 5,489 patients of a total population of 9,620 patients, which corresponds to 410 more patients than the most competitive benchmark algorithm (DeepBoost).

【Keywords】: Organ Transplant; Personalization; Ensemble Learning; Health Informatics

229. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction.

Paper Link】 【Pages】:1655-1661

【Authors】: Junbo Zhang ; Yu Zheng ; Dekang Qi

【Abstract】: Forecasting the flow of crowds is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, such as inter-region traffic, events, and weather. We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the inflow and outflow of crowds in each and every region of a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. More specifically, we employ the residual neural network framework to model the temporal closeness, period, and trend properties of crowd traffic. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. Experiments on two types of crowd flows in Beijing and New York City (NYC) demonstrate that the proposed ST-ResNet outperforms six well-known methods.

【Keywords】: deep learning; urban computing; spatio-temporal data

230. Robust Manifold Matrix Factorization for Joint Clustering and Feature Extraction.

Paper Link】 【Pages】:1662-1668

【Authors】: Lefei Zhang ; Qian Zhang ; Bo Du ; Dacheng Tao ; Jane You

【Abstract】: Low-rank matrix approximation has been widely used for data subspace clustering and feature representation in many computer vision and pattern recognition applications. However, in order to enhance the discriminability, most matrix approximation based feature extraction algorithms generate the cluster labels by a certain clustering algorithm (e.g., k-means) and then perform the matrix approximation guided by such label information. In addition, the noise and outliers in the dataset with large reconstruction errors will easily dominate the objective function under the conventional ℓ2-norm based squared residue minimization. In this paper, we propose a novel clustering and feature extraction algorithm based on a unified low-rank matrix factorization framework, which suggests that the observed data matrix can be approximated by the product of a projection matrix and a low-dimensional representation, where the low-dimensional representation can in turn be approximated by the cluster indicator and latent feature matrix simultaneously. Furthermore, we propose using the ℓ2,1-norm and integrating manifold regularization to further improve the proposed model. A novel Augmented Lagrangian Method (ALM) based procedure is designed to effectively and efficiently seek the optimal solution of the problem. The experimental results from both clustering and feature extraction perspectives demonstrate the superior performance of the proposed method.

【Keywords】:

231. Discrete Personalized Ranking for Fast Collaborative Filtering from Implicit Feedback.

Paper Link】 【Pages】:1669-1675

【Authors】: Yan Zhang ; Defu Lian ; Guowu Yang

【Abstract】: Personalized ranking is usually considered as an ultimate goal of recommendation systems, but it suffers from efficiency issues when making recommendations. To this end, we propose a learning-based hashing framework called Discrete Personalized Ranking (DPR), to map users and items to a Hamming space, where user-item affinity can be efficiently calculated via Hamming distance. Due to the existence of discrete constraints, it is possible to exploit a two-stage learning procedure for learning binary codes, as in most existing methods. This two-stage procedure consists of relaxed optimization by discarding discrete constraints and subsequent binary quantization. However, such a procedure has been shown to result in a large quantization loss, so that longer binary codes would be required. To this end, DPR directly tackles the discrete optimization problem of personalized ranking. The balance and decorrelation constraints on binary codes are imposed to derive compact but informative binary codes. Based on the evaluation on several datasets, the proposed framework shows consistent superiority over the competing baselines even when using shorter binary codes.

【Keywords】: recommendation; discrete hashing; personalized ranking; implicit feedback
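
To illustrate why mapping users and items to a Hamming space makes recommendation cheap, the following minimal sketch ranks items by Hamming distance to a user's binary code. The 8-bit codes, the item names, and the helper functions are made up for illustration; learning the codes, which is the actual contribution of DPR, is not shown.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(code_a ^ code_b).count("1")


def rank_items(user_code: int, item_codes: dict) -> list:
    """Return item ids ordered from smallest to largest Hamming distance."""
    return sorted(item_codes,
                  key=lambda item: hamming_distance(user_code, item_codes[item]))


# Hypothetical 8-bit codes; real codes would come from a learned hashing model.
user_code = 0b10110010
item_codes = {
    "item_a": 0b10110011,  # differs from the user code in 1 bit
    "item_b": 0b01001101,  # differs in all 8 bits
    "item_c": 0b10100000,  # differs in 2 bits
}
ranking = rank_items(user_code, item_codes)
print(ranking)
```

A smaller distance is treated here as higher affinity, so the XOR and bit count replace the floating-point inner product used with real-valued embeddings.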

232. Catch'Em All: Locating Multiple Diffusion Sources in Networks with Partial Observations.

Paper Link】 【Pages】:1676-1683

【Authors】: Kai Zhu ; Zhen Chen ; Lei Ying

【Abstract】: This paper studies the problem of locating multiple diffusion sources in networks with partial observations. We propose a new source localization algorithm, named Optimal-Jordan-Cover (OJC). The algorithm first extracts a subgraph using a candidate selection algorithm that selects source candidates based on the number of observed infected nodes in their neighborhoods. Then, in the extracted subgraph, OJC finds a set of nodes that "cover" all observed infected nodes with the minimum radius. The set of nodes is called the Jordan cover, and is regarded as the set of diffusion sources. Considering the heterogeneous susceptible-infected-recovered (SIR) diffusion in the Erdős-Rényi (ER) random graph, we prove that OJC can locate all sources with probability one asymptotically with partial observations. OJC is a polynomial-time algorithm in terms of network size. However, the computational complexity increases exponentially in m, the number of sources. We further propose a low-complexity heuristic based on K-Means for approximating the Jordan cover, named Approximate-Jordan-Cover (AJC). Simulations on random graphs and real networks demonstrate that both AJC and OJC significantly outperform other heuristic algorithms.

【Keywords】:
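
The Jordan cover idea in the abstract above (pick m nodes that cover all observed infected nodes with the minimum radius) can be sketched with a brute-force search. This is an illustrative sketch under assumptions, not the OJC algorithm itself: the candidate selection step is skipped, hop distance on an unweighted graph is assumed, and the function names and toy path graph are hypothetical. Enumerating all size-m subsets also shows why the cost grows exponentially in m.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def jordan_cover(graph, candidates, infected, m):
    """Try every set of m candidates and keep the one whose largest
    distance from an infected node to its nearest member is smallest."""
    dist = {c: bfs_distances(graph, c) for c in candidates}
    best_set, best_radius = None, float("inf")
    for subset in combinations(candidates, m):
        radius = max(min(dist[c].get(v, float("inf")) for c in subset)
                     for v in infected)
        if radius < best_radius:
            best_set, best_radius = subset, radius
    return best_set, best_radius


# Toy example: a path graph 0-1-2-3-4-5-6 with node 3 not observed as infected.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
infected = [0, 1, 2, 4, 5, 6]
best_set, best_radius = jordan_cover(graph, list(range(7)), infected, m=2)
print(best_set, best_radius)
```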

Machine Learning Methods 182

233. Learning Bayesian Networks with Incomplete Data by Augmentation.

Paper Link】 【Pages】:1684-1690

【Authors】: Tameem Adel ; Cassio Polpo de Campos

【Abstract】: We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with this new approach.

【Keywords】:

234. Unsupervised Domain Adaptation with a Relaxed Covariate Shift Assumption.

Paper Link】 【Pages】:1691-1697

【Authors】: Tameem Adel ; Han Zhao ; Alexander Wong

【Abstract】: Domain adaptation addresses learning tasks where training is performed on data from one domain whereas testing is performed on data belonging to a different but related domain. Assumptions about the relationship between the source and target domains should lead to tractable solutions on the one hand, and be realistic on the other hand. Here we propose a generative domain adaptation model that allows for modelling different assumptions about this relationship, among which is a newly introduced assumption that replaces covariate shift with a possibly more realistic assumption without losing tractability due to the efficient variational inference procedure developed. In addition to the ability to model less restrictive relationships between source and target, modelling can be performed without any target labeled data (unsupervised domain adaptation). We also provide a Rademacher complexity bound of the proposed algorithm. We evaluate the model on the Amazon reviews and the CVC pedestrian detection datasets.

【Keywords】: Domain adaptation; variational inference

235. Scalable Optimization of Multivariate Performance Measures in Multi-instance Multi-label Learning.

Paper Link】 【Pages】:1698-1704

【Authors】: Apoorv Aggarwal ; Sandip Ghoshal ; Ankith M. S. Shetty ; Suhit Sinha ; Ganesh Ramakrishnan ; Purushottam Kar ; Prateek Jain

【Abstract】: The problem of multi-instance multi-label learning (MIML) requires a bag of instances to be assigned a set of labels most relevant to the bag as a whole. The problem finds numerous applications in machine learning, computer vision, and natural language processing settings where only partial or distant supervision is available. We present a novel method for optimizing multivariate performance measures in the MIML setting. Our approach MIML-perf uses a novel plug-in technique and offers a seamless way to optimize a vast variety of performance measures such as macro and micro-F measure, average precision, which are performance measures of choice in multi-label learning domains. MIML-perf offers two key benefits over the state of the art. Firstly, across a diverse range of benchmark tasks, ranging from relation extraction to text categorization and scene classification, MIML-perf offers superior performance as compared to state of the art methods designed specifically for these tasks. Secondly, MIML-perf operates with significantly reduced running times as compared to other methods, often by an order of magnitude or more.

【Keywords】: Distant Supervision; Relation Extraction; Multi-instance Learning; Macro-F Measure; Plug-in Classifiers

236. The Bernstein Mechanism: Function Release under Differential Privacy.

Paper Link】 【Pages】:1705-1711

【Authors】: Francesco Aldà ; Benjamin I. P. Rubinstein

【Abstract】: We address the problem of general function release under differential privacy, by developing a functional mechanism that applies under the weak assumptions of oracle access to target function evaluation and sensitivity. These conditions permit treatment of functions described explicitly or implicitly as algorithmic black boxes. We achieve this result by leveraging the iterated Bernstein operator for polynomial approximation of the target function, and polynomial coefficient perturbation. Under weak regularity conditions, we establish fast rates on utility measured by high-probability uniform approximation. We provide a lower bound on the utility achievable for any functional mechanism that is epsilon-differentially private. The generality of our mechanism is demonstrated by the analysis of a number of example learners, including naive Bayes, non-parametric estimators and regularized empirical risk minimization. Competitive rates are demonstrated for kernel density estimation; and epsilon-differential privacy is achieved for a broader class of support vector machines than known previously.

【Keywords】: Bernstein operator; Bernstein mechanism
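
The two ingredients named in the abstract above, Bernstein polynomial approximation and coefficient perturbation, can be sketched in one dimension as follows. This is a rough sketch under stated assumptions: it uses the plain (not iterated) Bernstein operator on [0, 1], and `noise_scale` is a placeholder supplied by the caller rather than the paper's sensitivity-based calibration, so the code makes no privacy guarantee as written. All function names are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    if scale == 0:
        return 0.0
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_bernstein_coefficients(f, degree, noise_scale, seed=0):
    """Evaluate f on the lattice j/degree and perturb each value.

    Assumption: noise_scale stands in for a properly calibrated scale.
    """
    rng = random.Random(seed)
    return [f(j / degree) + laplace_noise(noise_scale, rng)
            for j in range(degree + 1)]


def evaluate_bernstein(coeffs, x):
    """Evaluate sum_j coeffs[j] * C(k, j) * x^j * (1 - x)^(k - j) on [0, 1]."""
    k = len(coeffs) - 1
    return sum(c * math.comb(k, j) * x ** j * (1 - x) ** (k - j)
               for j, c in enumerate(coeffs))


def target(t):
    return t  # toy target function; the Bernstein operator reproduces it exactly


exact_coeffs = noisy_bernstein_coefficients(target, degree=10, noise_scale=0.0)
noisy_coeffs = noisy_bernstein_coefficients(target, degree=10, noise_scale=0.05)
print(evaluate_bernstein(exact_coeffs, 0.3), evaluate_bernstein(noisy_coeffs, 0.3))
```

Only the perturbed coefficients would be released; anyone can then evaluate the polynomial at arbitrary points without further access to the data.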

237. Heavy-Tailed Analogues of the Covariance Matrix for ICA.

Paper Link】 【Pages】:1712-1718

【Authors】: Joseph Anderson ; Navin Goyal ; Anupama Nandi ; Luis Rademacher

【Abstract】: Independent Component Analysis (ICA) is the problem of learning a square matrix A, given samples of X = AS, where S is a random vector with independent coordinates. Most existing algorithms are provably efficient only when each S_i has finite and moderately valued fourth moment. However, there are practical applications where this assumption need not be true, such as speech and finance. Algorithms have been proposed for heavy-tailed ICA, but they are not practical, using random walks and the full power of the ellipsoid algorithm multiple times. The main contributions of this paper are (1) A practical algorithm for heavy-tailed ICA that we call HTICA. We provide theoretical guarantees and show that it outperforms other algorithms in some heavy-tailed regimes, both on real and synthetic data. Like the current state-of-the-art, the new algorithm is based on the centroid body (a first moment analogue of the covariance matrix). Unlike the state-of-the-art, our algorithm is practically efficient. To achieve this, we use explicit analytic representations of the centroid body, which bypasses the use of the ellipsoid method and random walks. (2) We study how heavy tails affect different ICA algorithms, including HTICA. Somewhat surprisingly, we show that some algorithms that use the covariance matrix or higher moments can successfully solve a range of ICA instances with infinite second moment. We study this theoretically and experimentally, with both synthetic and real-world heavy-tailed data.

【Keywords】: Machine Learning Methods, Machine Learning Applications, Independent Component Analysis, Centroid Body, Heavy-Tailed Distributions

238. Fast Generalized Distillation for Semi-Supervised Domain Adaptation.

Paper Link】 【Pages】:1719-1725

【Authors】: Shuang Ao ; Xiang Li ; Charles X. Ling

【Abstract】: Semi-supervised domain adaptation (SDA) is a typical setting when we face the problem of domain adaptation in real applications. How to effectively utilize the unlabeled data is an important issue in SDA. Previous work requires access to the source data to measure the data distribution mismatch, which is ineffective when the size of the source data is relatively large. In this paper, we propose a new paradigm, called Generalized Distillation Semi-supervised Domain Adaptation (GDSDA). We show that without accessing the source data, GDSDA can effectively utilize the unlabeled data to transfer the knowledge from the source models. Then we propose GDSDA-SVM which uses SVM as the base classifier and can efficiently solve the SDA problem. Experimental results show that GDSDA-SVM can effectively utilize the unlabeled data to transfer the knowledge between different domains under the SDA setting.

【Keywords】: Domain Adaptation; Generalized Distillation

239. The Option-Critic Architecture.

Paper Link】 【Pages】:1726-1734

【Authors】: Pierre-Luc Bacon ; Jean Harb ; Doina Precup

【Abstract】: Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup and Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.

【Keywords】: temporal abstraction; options; policy gradient methods

240. Label Efficient Learning by Exploiting Multi-Class Output Codes.

Paper Link】 【Pages】:1735-1741

【Authors】: Maria-Florina Balcan ; Travis Dick ; Yishay Mansour

【Abstract】: We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between the success of these methods and the existence of label-efficient learning procedures. We show that in both the realizable and agnostic cases, if output codes are successful at learning from labeled data, they implicitly assume structure on how the classes are related. By making that structure explicit, we design learning algorithms to recover the classes with low label complexity. We provide results for the commonly studied cases of one-vs-all learning and when the codewords of the classes are well separated. We additionally consider the more challenging case where the codewords are not well separated, but satisfy a boundary features condition that captures the natural intuition that every bit of the codewords should be significant.

【Keywords】: label efficient; output codes; multi-class; one-vs-all

241. Robust Partially-Compressed Least-Squares.

Paper Link】 【Pages】:1742-1748

【Authors】: Stephen Becker ; Ban Kawas ; Marek Petrik

【Abstract】: Randomized matrix compression techniques, such as the Johnson-Lindenstrauss transform, have emerged as an effective and practical way of solving large-scale problems efficiently. Focusing on computational efficiency, however, comes at the cost of solution quality and accuracy. In this paper, we investigate compressed least-squares problems and propose new models and algorithms that address the issue of error and noise introduced by compression. While maintaining computational efficiency, our models provide robust solutions that are more accurate than those of classical compressed variants. We introduce tools from robust optimization together with a form of partial compression to improve the error-time trade-offs of compressed least-squares solvers. We develop an efficient solution algorithm for our Robust Partially-Compressed (RPC) model based on a reduction to a one-dimensional search.

【Keywords】: least-squares regression; sketching; randomized projections; robust optimization

242. Learning Residual Alternating Automata.

Paper Link】 【Pages】:1749-1755

【Authors】: Sebastian Berndt ; Maciej Liskiewicz ; Matthias Lutter ; Rüdiger Reischuk

【Abstract】: Residuality plays an essential role in learning finite automata. While residual deterministic and non-deterministic automata are quite well understood, fundamental questions concerning alternating automata (AFA) remain open. Recently, Angluin, Eisenstat, and Fisman (2015) have initiated a systematic study of residual AFAs and proposed an algorithm called AL*, an extension of the popular L* algorithm, to learn AFAs. Based on computer experiments they have conjectured that AL* produces residual AFAs, but have not been able to give a proof. In this paper we disprove this conjecture by constructing a counterexample. As our main positive result we design an efficient learning algorithm, named AL**, and give a proof that it outputs residual AFAs only. In addition, we investigate the succinctness of these different FA types in more detail.

【Keywords】: learning finite automata; active learning; exact learning; alternating finite automata; membership and equivalence queries

243. Resource Constrained Structured Prediction.

Paper Link】 【Pages】:1756-1762

【Authors】: Tolga Bolukbasi ; Kai-Wei Chang ; Joseph Wang ; Venkatesh Saligrama

【Abstract】: We study the problem of structured prediction under test-time budget constraints. We propose a novel approach based on selectively acquiring computationally costly features during test-time in order to reduce the computational cost of prediction with minimal performance degradation. We formulate a novel empirical risk minimization (ERM) for policy learning. We show that policy learning can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition and dependency parsing, and show significant reduction in the feature costs without degrading accuracy.

【Keywords】: Resource Efficient Prediction

244. Cross-Domain Kernel Induction for Transfer Learning.

Paper Link】 【Pages】:1763-1769

【Authors】: Wei-Cheng Chang ; Yuexin Wu ; Hanxiao Liu ; Yiming Yang

【Abstract】: The key question in transfer learning (TL) research is how to make model induction transferable across different domains. Common methods so far require source and target domains to have a shared/homogeneous feature space, or the projection of features from heterogeneous domains onto a shared space. This paper proposes a novel framework, which does not require a shared feature space but instead uses a parallel corpus to calibrate domain-specific kernels into a unified kernel, to leverage graph-based label propagation in cross-domain settings, and to optimize semi-supervised learning based on labeled and unlabeled data in both source and target domains. Our experiments on benchmark datasets show advantageous performance of the proposed method over that of other state-of-the-art TL methods.

【Keywords】: transfer learning; graph Laplacian

245. Informative Subspace Learning for Counterfactual Inference.

Paper Link】 【Pages】:1770-1776

【Authors】: Yale Chang ; Jennifer G. Dy

【Abstract】: Inferring causal relations from observational data is widely used for knowledge discovery in healthcare and economics. To investigate whether a treatment can affect an outcome of interest, we focus on answering counterfactual questions of this type: what would a patient’s blood pressure be had he/she received a different treatment? Nearest neighbor matching (NNM) sets the counterfactual outcome of any treatment (control) sample to be equal to the factual outcome of its nearest neighbor in the control (treatment) group. Although simple, flexible and interpretable, most NNM approaches can easily be misled by variables that do not affect the outcome. In this paper, we address this challenge by learning subspaces that are predictive of the outcome variable for both the treatment group and the control group. Applying NNM in the learned subspaces leads to more accurate estimation of the counterfactual outcomes and therefore of treatment effects. We introduce an informative subspace learning algorithm that maximizes the nonlinear dependence between the candidate subspace and the outcome variable, as measured by the Hilbert-Schmidt Independence Criterion (HSIC). We propose a scalable estimator of HSIC, called HSIC-RFF, that reduces the quadratic computational and storage complexities (with respect to the sample size) of the naive HSIC implementation to linear by constructing random Fourier features. We also prove an upper bound on the approximation error of the HSIC-RFF estimator. Experimental results on simulated and real-world datasets demonstrate that our proposed approach outperforms existing NNM approaches and other commonly used regression-based methods for counterfactual inference.

【Keywords】: Counterfactual; Causality; Subspace Learning

246. PAC Identification of a Bandit Arm Relative to a Reward Quantile.

Paper Link】 【Pages】:1777-1783

【Authors】: Arghya Roy Chaudhuri ; Shivaram Kalyanakrishnan

【Abstract】: We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixed tolerance of the m-th highest mean. This setup generalises a previous formulation with m = 1, and differs from yet another one which requires m such arms to be identified. The key implication of our proposed approach is the ability to derive upper bounds on the sample complexity that depend on n/m in place of n. Consequently, even when the number of arms is infinite, we only need a finite number of samples to identify an arm that compares favourably with a fixed reward quantile. This facility makes our approach attractive to applications such as drug discovery, wherein the number of arms (molecular configurations) may run into a few thousand. We present sampling algorithms for both the finite- and infinite-armed cases, and validate their efficiency through theoretical and experimental analysis. We also present a lower bound on the worst case sample complexity of PAC algorithms for our problem, which matches our upper bound up to a logarithmic factor.

【Keywords】: Multi-armed bandit; PAC algorithm; Quantile
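
The abstract's claim that a finite number of samples suffices even for infinitely many arms can be illustrated with a short calculation. The sketch below is not the paper's algorithm; it only assumes arms are drawn uniformly at random and computes how many draws are needed so that, with probability at least 1 - delta, at least one drawn arm lies in the top rho fraction (rho playing the role of m/n). The function name and parameters are hypothetical.

```python
import math


def arms_to_sample(rho: float, delta: float) -> int:
    """Smallest k such that (1 - rho)**k <= delta.

    Each uniformly drawn arm misses the top-rho fraction with probability
    (1 - rho), so k independent draws all miss with probability (1 - rho)**k.
    """
    if not (0.0 < rho < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("rho and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - rho))


if __name__ == "__main__":
    # The count depends on rho (i.e. on n/m) and delta, not on n itself.
    for rho in (0.1, 0.01):
        print(rho, arms_to_sample(rho, delta=0.05))
```

The resulting count grows roughly like (1/rho) ln(1/delta), which is consistent with bounds that scale with n/m rather than n; identifying a good arm among the drawn ones still requires a separate sampling procedure.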

247. Classification with Minimax Distance Measures.

Paper Link】 【Pages】:1784-1790

【Authors】: Morteza Haghir Chehreghani

【Abstract】: Minimax distance measures provide an effective way to capture the unknown underlying patterns and classes of the data in a non-parametric way. We develop a general-purpose framework to employ Minimax distances with any classification method that performs on numerical data. For this purpose, we establish a two-step strategy. First, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space are equal to their Minimax distances in the original space. We also consider the cases where multiple pairwise Minimax matrices are given, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. We experimentally validate our framework on different synthetic and real-world datasets.

【Keywords】: classification; feature selection; Minimax distances
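
The first step described in the abstract, computing pairwise Minimax distances through a minimum spanning tree, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a complete graph with Euclidean edge weights and uses a Kruskal-style merge, where the edge that first joins two components is the largest edge on the tree path between any pair of points across them. The function name is hypothetical, and the embedding step is not shown.

```python
import math
from itertools import combinations


def minimax_distances(points):
    """Pairwise Minimax distances over the complete Euclidean graph.

    The Minimax distance between i and j is the smallest possible value of
    the largest edge on any path from i to j, which equals the largest edge
    on their minimum-spanning-tree path.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    root = list(range(n))                   # component label of each point
    members = {i: [i] for i in range(n)}    # component label -> its points
    dist = [[0.0] * n for _ in range(n)]

    for weight, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # edge would close a cycle, so it is not in the MST
        # This edge is the bottleneck for every pair across the two components.
        for a in members[ri]:
            for b in members[rj]:
                dist[a][b] = dist[b][a] = weight
        # Merge the smaller component into the larger one.
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return dist


if __name__ == "__main__":
    d = minimax_distances([(0.0,), (1.0,), (2.0,), (10.0,)])
    print(d[0][2], d[0][3])
```

For the four collinear points in the usage example, the first and third points are Euclidean distance 2 apart, but their Minimax distance is 1 because the path through the middle point only uses edges of length 1.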

248. Latent Discriminant Analysis with Representative Feature Discovery.

Paper Link】 【Pages】:1791-1797

【Authors】: Gang Chen

【Abstract】: Linear Discriminant Analysis (LDA) is a well-known method for dimension reduction and classification with a focus on discriminative feature selection. However, how to discover discriminative as well as representative features in the LDA model has not been explored. In this paper, we propose a latent Fisher discriminant model with representative feature discovery in a semi-supervised manner. Specifically, our model leverages the advantages of both discriminative and generative models by generalizing LDA with a data-driven prior over the latent variables. Thus, our method combines multi-class, latent variables and dimension reduction in a unified Bayesian framework. We test our method on MUSK and Corel datasets and obtain competitive results compared to baselines. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates that our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.

【Keywords】: semi-supervised classification; latent variables; semantic keyframe extraction

249. Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting.

Paper Link】 【Pages】:1798-1804

【Authors】: Lin Chen ; Seyed Hamed Hassani ; Amin Karbasi

【Abstract】: In this paper, we consider the problem of actively learning a linear classifier through query synthesis where the learner can construct artificial queries in order to estimate the true decision boundaries. This problem has recently gained a lot of interest in automated science and adversarial reverse engineering for which only heuristic algorithms are known. In such applications, queries can be constructed de novo to elicit information (e.g., automated science) or to evade detection with minimal cost (e.g., adversarial reverse engineering). We develop a general framework, called dimension coupling (DC), that 1) reduces a d-dimensional learning problem to d-1 low dimensional sub-problems, 2) solves each sub-problem efficiently, 3) appropriately aggregates the results and outputs a linear classifier, and 4) provides a theoretical guarantee for all possible schemes of aggregation. The proposed method is proved resilient to noise. We show that the DC framework avoids the curse of dimensionality: its computational complexity scales linearly with the dimension. Moreover, we show that the query complexity of DC is near optimal (within a constant factor of the optimum algorithm). To further support our theoretical analysis, we compare the performance of DC with existing work. We observe that DC consistently outperforms prior art in terms of query complexity while often running orders of magnitude faster.

【Keywords】: active learning; halfspace learning; query synthesis

250. Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis.

Paper Link】 【Pages】:1805-1811

【Authors】: Zhourong Chen ; Nevin L. Zhang ; Dit-Yan Yeung ; Peixian Chen

【Abstract】: We are interested in exploring the possibility and benefits of structure learning for deep models. As the first step, this paper investigates the matter for Restricted Boltzmann Machines (RBMs). We conduct the study with Replicated Softmax, a variant of RBMs for unsupervised text analysis. We present a method for learning what we call Sparse Boltzmann Machines, where each hidden unit is connected to a subset of the visible units instead of all of them. Empirical results show that the method yields models with significantly improved model fit and interpretability as compared with RBMs where each hidden unit is connected to all visible units.

【Keywords】: RBM; Structure Learning; Sparse Boltzmann Machines; Text analysis; Neural Network

251. Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features.

Paper Link】 【Pages】:1812-1818

【Authors】: Zihao Chen ; Luo Luo ; Zhihua Zhang

【Abstract】: Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.

【Keywords】: distributed convex optimization, communication, lower bound

252. OFFER: Off-Environment Reinforcement Learning.

Paper Link】 【Pages】:1819-1825

【Authors】: Kamil Andrzej Ciosek ; Shimon Whiteson

【Abstract】: Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.

【Keywords】: Markov Decision Process; Policy Gradient; Variance Reduction; Actor-Critic; REINFORCE

253. Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests.

Paper Link】 【Pages】:1826-1832

【Authors】: Zachary Alan Daniels ; Dimitris N. Metaxas

【Abstract】: The multi-label classification problem involves finding a model that maps a set of input features to more than one output label. Class imbalance is a serious issue in multi-label classification. We introduce an extension of structured forests, a type of random forest used for structured prediction, called Sparse Oblique Structured Hellinger Forests (SOSHF). We explore using structured forests in the general multi-label setting and propose a new imbalance-aware formulation by altering how the splitting functions are learned in two ways. First, we account for cost-sensitivity when converting the multi-label problem to a single-label problem at each node in the tree. Second, we introduce a new objective function for determining oblique splits based on the Hellinger distance, a splitting criterion that has been shown to be robust to class imbalance. We empirically validate our method on a number of benchmarks against standard and state-of-the-art multi-label classification algorithms with improved results.

【Keywords】: Classification; Multi-Label Classification; Imbalanced Data; Random Forest; Imbalance-Aware Learning; Hellinger Distance Decision Trees; Structured Forest; Oblique Decision Trees
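
To make the imbalance-robustness of the Hellinger distance concrete, the sketch below scores a binary split the way Hellinger distance decision trees commonly do: it compares how the positive class and the negative class distribute over the two branches, using within-class fractions only. This is an illustrative formulation under stated assumptions, not the SOSHF objective for oblique splits from the paper; the function name and count-based interface are hypothetical.

```python
import math


def hellinger_split_score(pos_left, pos_right, neg_left, neg_right):
    """Hellinger distance between the branch distributions of two classes.

    Each class is normalised by its own total, so the score does not change
    when one class is made much larger while its branch proportions stay fixed.
    """
    pos_total = pos_left + pos_right
    neg_total = neg_left + neg_right
    if pos_total == 0 or neg_total == 0:
        return 0.0  # a class is absent, so the split cannot separate anything
    branches = ((pos_left, neg_left), (pos_right, neg_right))
    return math.sqrt(
        sum(
            (math.sqrt(p / pos_total) - math.sqrt(q / neg_total)) ** 2
            for p, q in branches
        )
    )


if __name__ == "__main__":
    # A perfectly separating split scores sqrt(2) regardless of class ratio.
    print(hellinger_split_score(5, 0, 0, 100))
    print(hellinger_split_score(5, 0, 0, 100000))
    # A split that sends half of each class to each branch scores 0.
    print(hellinger_split_score(5, 5, 50, 50))
```

Because the class priors never enter the formula, a rare positive class carries the same weight in the score as the majority class, which is the property the abstract relies on.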

254. Nonlinear Dynamic Boltzmann Machines for Time-Series Prediction.

Paper Link】 【Pages】:1833-1839

【Authors】: Sakyasingha Dasgupta ; Takayuki Osogami

【Abstract】: The dynamic Boltzmann machine (DyBM) has been proposed as a stochastic generative model of multi-dimensional time series, with an exact learning rule that maximizes the log-likelihood of a given time series. The DyBM, however, is defined only for binary valued data, without any nonlinear hidden units. Here, in our first contribution, we extend the DyBM to deal with real valued data. We present a formulation called Gaussian DyBM, that can be seen as an extension of a vector autoregressive (VAR) model. This uses, in addition to standard (explanatory) variables, components that capture long term dependencies in the time series. In our second contribution, we extend the Gaussian DyBM model with a recurrent neural network (RNN) that controls the bias input to the DyBM units. We derive a stochastic gradient update rule such that the output weights from the RNN can also be trained online along with other DyBM parameters. Furthermore, this acts as a nonlinear hidden layer extending the capacity of DyBM and allows it to model nonlinear components in a given time series. Numerical experiments with synthetic datasets show that the RNN-Gaussian DyBM improves predictive accuracy upon standard VAR by up to 35%. On real multi-dimensional time-series prediction tasks with high nonlinearity and non-stationarity, we demonstrate that this nonlinear DyBM model achieves significant improvement upon state-of-the-art baseline methods like VAR and long short-term memory (LSTM) networks at a reduced computational cost.

【Keywords】: Boltzmann machines; Time-series prediction; Artificial Neural Networks; Online learning

255. Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems.

Paper Link】 【Pages】:1840-1846

【Authors】: Carlo D'Eramo ; Alessandro Nuara ; Matteo Pirotta ; Marcello Restelli

【Abstract】: This paper is about the estimation of the maximum expected value of an infinite set of random variables. This estimation problem is relevant in many fields, such as Reinforcement Learning (RL). In RL it is well known that, in some stochastic environments, a bias in the estimation error can increase the approximation error step by step, leading to large overestimates of the true action values. Recently, some approaches have been proposed to reduce such bias in order to get better action-value estimates, but they are limited to finite problems. In this paper, we leverage the recently proposed weighted estimator and Gaussian process regression to derive a new method that is able to natively handle infinitely many random variables. We show how these techniques can be used to face both continuous-state and continuous-action RL problems. To evaluate the effectiveness of the proposed approach we perform empirical comparisons with related approaches.

【Keywords】: continuous reinforcement learning; maximum expected value; reinforcement learning

256. Scalable Multitask Policy Gradient Reinforcement Learning.

Paper Link】 【Pages】:1847-1853

【Authors】: Salam El Bsat ; Haitham Bou-Ammar ; Matthew E. Taylor

【Abstract】: Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving the scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k) with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.

【Keywords】: Transfer Learning; Multi-Task Learning; Reinforcement Learning; Scalable MTL

257. From Shared Subspaces to Shared Landmarks: A Robust Multi-Source Classification Approach.

Paper Link】 【Pages】:1854-1860

【Authors】: Sarah M. Erfani ; Mahsa Baktashmotlagh ; Masud Moshtaghi ; Vinh Nguyen ; Christopher Leckie ; James Bailey ; Kotagiri Ramamohanarao

【Abstract】: Training machine learning algorithms on augmented data from different related sources is a challenging task. This problem arises in several applications, such as the Internet of Things (IoT), where data may be collected from devices with different settings. The learned model on such datasets can generalize poorly due to distribution bias. In this paper we consider the problem of classifying unseen datasets, given several labeled training samples drawn from similar distributions. We exploit the intrinsic structure of samples in a latent subspace and identify landmarks, a subset of training instances from different sources that should be similar. Incorporating subspace learning and landmark selection enhances generalization by alleviating the impact of noise and outliers, as well as improving efficiency by reducing the size of the data. However, since addressing the two issues simultaneously results in an intractable problem, we relax the objective function by leveraging the theory of nonlinear projection and solve a tractable convex optimisation. Through comprehensive analysis, we show that our proposed approach outperforms state-of-the-art results on several benchmark datasets, while keeping the computational complexity low.

【Keywords】: Multi-Source Classification; Domain Generalization; Maximum Mean Discrepancy

258. A Nearly-Black-Box Online Algorithm for Joint Parameter and State Estimation in Temporal Models.

Paper Link】 【Pages】:1861-1869

【Authors】: Yusuf Bugra Erol ; Yi Wu ; Lei Li ; Stuart J. Russell

【Abstract】: Online joint parameter and state estimation is a core problem for temporal models. Most existing methods are either restricted to a particular class of models (e.g., the Storvik filter) or computationally expensive (e.g., particle MCMC). We propose a novel nearly-black-box algorithm, the Assumed Parameter Filter (APF), a hybrid of particle filtering for state variables and assumed density filtering for parameter variables. It has the following advantages: (a) it is online and computationally efficient; (b) it is applicable to both discrete and continuous parameter spaces with arbitrary transition dynamics. On a variety of toy and real models, APF generates more accurate results within a fixed computation budget compared to several standard algorithms from the literature.

【Keywords】: state space model; joint parameter and state estimate; probabilistic programming; assumed parameter filter

259. Structure Regularized Unsupervised Discriminant Feature Analysis.

Paper Link】 【Pages】:1870-1876

【Authors】: Mingyu Fan ; Xiaojun Chang ; Dacheng Tao

【Abstract】: Feature selection is an important technique in machine learning research. An effective and robust feature selection method should simultaneously identify the informative features and eliminate the noisy ones. In this paper, we consider the unsupervised feature selection problem, which is particularly difficult as there are no class labels to guide the search for relevant features. To solve this, we propose a novel algorithmic framework which performs unsupervised feature selection. Firstly, the proposed framework implements structure learning, where the data structures (including the intrinsic distribution structure and the data segmentation) are found via a combination of alternating optimization and clustering. Then, both the intrinsic data structure and the data segmentation are formulated as regularization terms for discriminant feature selection. The results of the feature selection also affect the structure learning step in the following iterations. By leveraging the interactions between structure learning and feature selection, we are able to capture a more accurate structure of the data and select more informative features. Clustering and classification experiments on real world image data sets demonstrate the effectiveness of our method.

【Keywords】: Feature Selection; Unsupervised Learning; Image Feature Learning

260. Self-Paced Learning: An Implicit Regularization Perspective.

Paper Link】 【Pages】:1877-1883

【Authors】: Yanbo Fan ; Ran He ; Jian Liang ; Bao-Gang Hu

【Abstract】: Self-paced learning (SPL) mimics the cognitive mechanism of humans and animals, which gradually learn from easy to hard samples. One key issue in SPL is to obtain a better weighting strategy, which is determined by the minimizer function. Existing methods usually pursue this by artificially designing the explicit form of the SPL regularizer. In this paper, we study a group of new regularizers (named self-paced implicit regularizers) that are deduced from robust loss functions. Based on convex conjugacy theory, the minimizer function for a self-paced implicit regularizer can be directly learned from the latent loss function, even when the analytic form of the regularizer is unknown. A general framework (named SPL-IR) for SPL is developed accordingly. We demonstrate that the learning procedure of SPL-IR is associated with latent robust loss functions, and thus can provide some theoretical insights into its working mechanism. We further analyze the relation between SPL-IR and half-quadratic optimization and provide a group of self-paced implicit regularizers. Finally, we apply SPL-IR to both supervised and unsupervised tasks, and experimental results corroborate our ideas and demonstrate the correctness and effectiveness of implicit regularizers.

【Keywords】: self-paced learning; implicit regularizer; half-quadratic optimization

261. Deep MIML Network.

Paper Link】 【Pages】:1884-1890

【Authors】: Ji Feng ; Zhi-Hua Zhou

【Abstract】: In many real world applications, the objects of concern have multiple labels and can be represented as a bag of instances. Multi-instance Multi-label (MIML) learning provides a framework for handling such tasks and has exhibited excellent performance in various domains. In a MIML setting, the feature representation of instances usually has a big impact on the final performance; inspired by recent deep learning studies, in this paper we propose the DeepMIML network, which exploits deep neural network formation to generate instance representations for MIML. The sub-concept learning component of the DeepMIML structure preserves the instance-label relation discovery ability of MIML algorithms; that is, it can automatically locate the key input patterns that trigger the labels. The effectiveness of the DeepMIML network is validated by experiments on various domains of data.

【Keywords】: deep learning; multi-instance multi-label learning

262. Modeling Skewed Class Distributions by Reshaping the Concept Space.

Paper Link】 【Pages】:1891-1897

【Authors】: Kyle D. Feuz ; Diane J. Cook

【Abstract】: We introduce an approach to learning from imbalanced class distributions that does not change the underlying data distribution. The ICC algorithm decomposes majority classes into smaller sub-classes that create a more balanced class distribution. In this paper, we explain how ICC can not only address the class imbalance problem but may also increase the expressive power of the hypothesis space. We validate ICC and analyze alternative decomposition methods on well-known machine learning datasets as well as new problems in pervasive computing. Our results indicate that ICC performs as well as or better than existing approaches to handling class imbalance.

【Keywords】: Supervised Machine Learning, Class Imbalance, Clustering, Intra-Class Clustering

263. On Learning High Dimensional Structured Single Index Models.

Paper Link】 【Pages】:1898-1904

【Authors】: Ravi Ganti ; Nikhil S. Rao ; Laura Balzano ; Rebecca Willett ; Robert D. Nowak

【Abstract】: Single Index Models (SIMs) are simple yet flexible semi-parametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efficient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models, and achieves results comparable to or better than single layer feedforward neural networks with significantly less computational cost.

【Keywords】: high dimensional statistics; single index models; prediction

264. Local Centroids Structured Non-Negative Matrix Factorization.

Paper Link】 【Pages】:1905-1911

【Authors】: Hongchang Gao ; Feiping Nie ; Heng Huang

【Abstract】: Non-negative Matrix Factorization (NMF) has attracted much attention and been widely used in real-world applications. As a clustering method, it fails to handle the case where data points lie in a complicated geometric structure. Existing methods adopt a single global centroid for each cluster, failing to capture the manifold structure. In this paper, we propose a novel local centroids structured NMF to address this drawback. Instead of using a single centroid for each cluster, we introduce multiple local centroids for each cluster such that the manifold structure can be captured by the local centroids. Such a novel NMF method can improve the clustering performance effectively. Furthermore, a novel bipartite graph is incorporated to obtain the clustering indicator directly without any post-processing. Experiments on both toy datasets and real-world datasets have verified the effectiveness of the proposed method.

【Keywords】: Non-negative Matrix Factorization; Clustering

265. Low-Rank Factorization of Determinantal Point Processes.

Paper Link】 【Pages】:1912-1918

【Authors】: Mike Gartrell ; Ulrich Paquet ; Noam Koenigstein

【Abstract】: Determinantal point processes (DPPs) have garnered attention as an elegant probabilistic model of set diversity. They are useful for a number of subset selection tasks, including product recommendation. DPPs are parametrized by a positive semi-definite kernel matrix. In this work we present a new method for learning the DPP kernel from observed data using a low-rank factorization of this kernel. We show that this low-rank factorization enables a learning algorithm that is nearly an order of magnitude faster than previous approaches, while also providing for a method for computing product recommendation predictions that is far faster (up to 20x faster or more for large item catalogs) than previous techniques that involve a full-rank DPP kernel. Furthermore, we show that our method provides equivalent or sometimes better test log-likelihood than prior full-rank DPP approaches.

【Keywords】: Stochastic Processes; Determinantal Point Processes; Recommender Systems

266. Robust Loss Functions under Label Noise for Deep Neural Networks.

Paper Link】 【Pages】:1919-1925

【Authors】: Aritra Ghosh ; Himanshu Kumar ; P. S. Sastry

【Abstract】: In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.

【Keywords】: Label Noise; Loss Function; Deep Neural Networks; Robust Risk Minimization
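A small numerical sketch of the robustness idea in the abstract above. It assumes the sufficient condition is symmetry of the loss, meaning that for a fixed prediction the losses summed over all candidate labels give a constant; the abstract does not spell this condition out, so treat it as an assumption. The function names are illustrative and not taken from the paper.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mae_loss(probs, label):
    """L1 distance between the one-hot label vector and the predicted probabilities."""
    return sum(abs((1.0 if k == label else 0.0) - p) for k, p in enumerate(probs))

def cce_loss(probs, label):
    """Categorical cross-entropy for a single example."""
    return -math.log(probs[label])

def loss_sum_over_labels(loss, probs):
    """Sum of the loss over every possible label for one fixed prediction."""
    return sum(loss(probs, k) for k in range(len(probs)))

# Three arbitrary 3-class predictions (hypothetical scores).
predictions = [
    softmax([2.0, 0.5, -1.0]),
    softmax([0.1, 0.2, 0.3]),
    softmax([-3.0, 4.0, 0.0]),
]

# For MAE the sum is 2K - 2 regardless of the prediction (here K = 3);
# for cross-entropy it varies with the prediction.
mae_sums = [loss_sum_over_labels(mae_loss, p) for p in predictions]
cce_sums = [loss_sum_over_labels(cce_loss, p) for p in predictions]
```

With a probability-vector output, the MAE against a one-hot label equals 2 - 2p for the labeled class, so the sum over labels is constant, whereas the cross-entropy sum depends on the prediction.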

267. Exploring Commonality and Individuality for Multi-Modal Curriculum Learning.

Paper Link】 【Pages】:1926-1933

【Authors】: Chen Gong

【Abstract】: Curriculum Learning (CL) mimics the cognitive process of humans and favors a learning algorithm to follow the logical learning sequence from simple examples to more difficult ones. Recent studies show that selecting the simplest curriculum examples from different modalities for graph-based label propagation can yield better performance than simply leveraging a single modality. However, they forcibly require the curriculums generated by all modalities to be identical to a common curriculum, which discards the individuality of every modality and produces an inaccurate curriculum for the subsequent learning. Therefore, this paper proposes a novel multi-modal CL algorithm by comprehensively investigating both the individuality and commonality of different modalities. By considering the curriculums of multiple modalities altogether, their common preference on selecting the simplest examples can be explored by a row-sparse matrix, and their distinct opinions are captured by a sparse noise matrix. As a consequence, a "soft" fusion of multiple curriculums from different modalities is achieved and the propagation quality can thus be improved. Comprehensive empirical studies reveal that our method can generate higher accuracy than the state-of-the-art multi-modal CL approach and label propagation algorithms on various image classification tasks.

【Keywords】:

268. MPGL: An Efficient Matching Pursuit Method for Generalized LASSO.

Paper Link】 【Pages】:1934-1940

【Authors】: Dong Gong ; Mingkui Tan ; Yanning Zhang ; Anton van den Hengel ; Qinfeng Shi

【Abstract】: Unlike traditional LASSO enforcing sparsity on the variables, Generalized LASSO (GL) enforces sparsity on a linear transformation of the variables, gaining flexibility and success in many applications. However, many existing GL algorithms do not scale up to high-dimensional problems, and/or only work well for a specific choice of the transformation. We propose an efficient Matching Pursuit Generalized LASSO (MPGL) method, which overcomes these issues, and is guaranteed to converge to a global optimum. We formulate the GL problem as a convex quadratically constrained linear programming (QCLP) problem and tailor-make a cutting plane method. More specifically, our MPGL iteratively activates a subset of nonzero elements of the transformed variables, and solves a subproblem involving only the activated elements, thus gaining significant speed-up. Moreover, MPGL is less sensitive to the choice of the trade-off hyper-parameter between data fitting and regularization, and mitigates the long-standing hyper-parameter tuning issue in many existing methods. Experiments demonstrate the superior efficiency and accuracy of the proposed method over the state of the art in both classification and image processing tasks.

【Keywords】: generalized lasso; fused lasso; matching pursuit; convex programming; sparsity

269. Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected.

Paper Link】 【Pages】:1941-1947

【Authors】: Aditya Gopalan ; Prashanth L. A. ; Michael Fu ; Steve Marcus

【Abstract】: Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic K-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate cost distortions, and exhibit sublinear regret assuming Hölder continuous weight distortion functions. For the K-armed setting, we show that the algorithm, called W-UCB, achieves problem-dependent regret $O(L^2 M^2 \log n / \Delta^{2/\alpha - 1})$, where $n$ is the number of plays, $\Delta$ is the gap in distorted expected value between the best and next best arm, $L$ and $\alpha$ are the Hölder constants for the distortion function, and $M$ is an upper bound on costs, and a problem-independent regret bound of $O((K L^2 M^2)^{\alpha/2} n^{(2 - \alpha)/2})$. We also present a matching lower bound on the regret, showing that the regret of W-UCB is essentially unimprovable over the class of Hölder-continuous weight distortions. For the linearly parameterized setting, we develop a new algorithm, a variant of the Optimism in the Face of Uncertainty Linear bandit (OFUL) algorithm called WOFUL (Weight-distorted OFUL), and show that it has regret $O(d \sqrt{n}\, \mathrm{polylog}(n))$ with high probability, for sub-Gaussian cost distributions.

【Keywords】: Multi-armed bandits; Linear Bandits; Cumulative Prospect Theory

270. Efficient Sparse Low-Rank Tensor Completion Using the Frank-Wolfe Algorithm.

Paper Link】 【Pages】:1948-1954

【Authors】: Xiawei Guo ; Quanming Yao ; James Tin-Yau Kwok

【Abstract】: Most tensor problems are NP-hard, and low-rank tensor completion is much more difficult than low-rank matrix completion. In this paper, we propose a time- and space-efficient low-rank tensor completion algorithm by using the scaled latent nuclear norm for regularization and the Frank-Wolfe (FW) algorithm for optimization. We show that all the steps can be performed efficiently. In particular, FW's linear subproblem has a closed-form solution which can be obtained from rank-one SVD. By utilizing sparsity of the observed tensor, we only need to maintain sparse tensors and a set of small basis matrices. Experimental results show that the proposed algorithm is more accurate, much faster and more scalable than the state of the art.

【Keywords】: Optimization; Tensor completion; Frank-Wolfe algorithm; Link prediction
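The closed-form linear subproblem mentioned in the abstract above can be illustrated in the simpler matrix case. The sketch below is not the paper's tensor algorithm: it assumes a plain nuclear-norm ball of radius `tau` (a hypothetical parameter) and uses power iteration to obtain the leading singular vectors, so the minimizer of the inner product with the gradient `G` is the rank-one matrix `-tau * u v^T`.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(x):
    """Return the unit vector and the original Euclidean norm (x must be nonzero)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x], norm

def top_singular_triplet(G, iters=200):
    """Power iteration on G^T G for the leading singular triplet (u, sigma, v).

    Assumes G is nonzero and the all-ones start vector is not orthogonal
    to the leading right singular vector.
    """
    Gt = transpose(G)
    v, _ = normalize([1.0] * len(G[0]))
    for _ in range(iters):
        v, _ = normalize(matvec(Gt, matvec(G, v)))
    u, sigma = normalize(matvec(G, v))
    return u, sigma, v

def nuclear_ball_lmo(G, tau):
    """Minimizer of <S, G> over {S : ||S||_* <= tau}, namely -tau * u v^T."""
    u, _, v = top_singular_triplet(G)
    return [[-tau * ui * vj for vj in v] for ui in u]

def frank_wolfe_step(X, G, tau, t):
    """One FW update X <- (1 - gamma) X + gamma S with step size 2 / (t + 2)."""
    S = nuclear_ball_lmo(G, tau)
    gamma = 2.0 / (t + 2.0)
    return [[(1.0 - gamma) * x + gamma * s for x, s in zip(xr, sr)]
            for xr, sr in zip(X, S)]

# Toy gradient with singular values 3 and 1.
G_example = [[3.0, 0.0], [0.0, 1.0]]
S_example = nuclear_ball_lmo(G_example, tau=2.0)
```

Each iterate is a convex combination of rank-one matrices, which is why only a leading singular pair, rather than a full SVD, is needed per step.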

271. Convex Co-Embedding for Matrix Completion with Predictive Side Information.

Paper Link】 【Pages】:1955-1961

【Authors】: Yuhong Guo

【Abstract】: Matrix completion as a common problem in many application domains has received increasing attention in the machine learning community. Previous matrix completion methods have mostly focused on exploiting the matrix low-rank property to recover missing entries. Recently, it has been noticed that side information that describes the matrix items can help to improve the matrix completion performance. In this paper, we propose a novel matrix completion approach that exploits side information within a principled co-embedding framework. This framework integrates a low-rank matrix factorization model and a label embedding based prediction model together to derive a convex co-embedding formulation with nuclear norm regularization. We develop a fast proximal gradient descent algorithm to solve this co-embedding problem. The effectiveness of the proposed approach is demonstrated on two types of real world application problems.

【Keywords】: Machine Learning; Matrix Completion

272. Continuous Conditional Dependency Network for Structured Regression.

Paper Link】 【Pages】:1962-1968

【Authors】: Chao Han ; Mohamed F. Ghalwash ; Zoran Obradovic

【Abstract】: Structured regression on graphs aims to predict response variables from multiple nodes by discovering and exploiting the dependency structure among response variables. This problem is challenging since dependencies among response variables are always unknown, and the associated prior knowledge is non-symmetric. In previous studies, various promising solutions were proposed to improve structured regression by utilizing symmetric prior knowledge, learning sparse dependency structure among response variables, or learning representations of attributes of multiple nodes. However, none of them are capable of efficiently learning dependency structure while incorporating non-symmetric prior knowledge. To achieve these objectives, we propose the Continuous Conditional Dependency Network (CCDN) for structured regression. The intuitive idea behind this model is that each response variable is not only dependent on attributes from the same node, but also on response variables from all other nodes. This results in a joint modeling of local conditional probabilities. The parameter learning is formulated as a convex optimization problem and an effective sampling algorithm is proposed for inference. CCDN is flexible in absorbing non-symmetric prior knowledge. The performance of CCDN on multiple datasets provides evidence of its structure recovery ability and superior effectiveness and efficiency as compared to the state-of-the-art alternatives.

【Keywords】: Structured Regression; Structure Learning

273. Bilateral k-Means Algorithm for Fast Co-Clustering.

Paper Link】 【Pages】:1969-1975

【Authors】: Junwei Han ; Kun Song ; Feiping Nie ; Xuelong Li

【Abstract】: With the development of information technology, the amount of data, e.g. text, image and video, has increased rapidly. Efficiently clustering such large-scale data sets is a challenge. To address this problem, this paper proposes a novel co-clustering method named bilateral k-means algorithm (BKM) for fast co-clustering. Different from traditional k-means algorithms, the proposed method has two indicator matrices P and Q and a diagonal matrix S to be solved, which represent the cluster memberships of samples and features, and the co-cluster centres, respectively. Therefore, it can perform different clustering tasks on the samples and features simultaneously. We also introduce an effective approach to solve the proposed method, which involves fewer multiplications. The computational complexity is analyzed. Extensive experiments on various types of data sets are conducted. Compared with the state-of-the-art clustering methods, the proposed BKM not only has faster computational speed, but also achieves promising clustering results.

【Keywords】: co-clustering; fast co-clustering; diagonal co-cluster

274. Alternating Back-Propagation for Generator Network.

Paper Link】 【Pages】:1976-1984

【Authors】: Tian Han ; Yang Lu ; Song-Chun Zhu ; Ying Nian Wu

【Abstract】: This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a non-linear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code in common. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.

【Keywords】: Generator Network, Alternating Back-propagation, Unsupervised Learning

275. Enumerate Lasso Solutions for Feature Selection.

Paper Link】 【Pages】:1985-1991

【Authors】: Satoshi Hara ; Takanori Maehara

【Abstract】: We propose an algorithm for enumerating solutions to the Lasso regression problem. In ordinary Lasso regression, one global optimum is obtained and the resulting features are interpreted as task-relevant features. However, this can overlook possibly relevant features not selected by the Lasso. With the proposed method, we can enumerate many possible feature sets for human inspection, thus recording all the important features. We prove that by enumerating solutions, we can recover a true feature set exactly under less restrictive conditions compared with the ordinary Lasso. We also confirm our theoretical results in numerical simulations. Finally, on gene expression and text data, we demonstrate that the proposed method can enumerate a wide variety of meaningful feature sets, which are overlooked by the global optima.

【Keywords】:

276. Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling.

Paper Link】 【Pages】:1992-1999

【Authors】: Daisuke Hatano ; Takuro Fukunaga ; Takanori Maehara ; Ken-ichi Kawarabayashi

【Abstract】: We propose a scalable and efficient algorithm for co-clustering a higher-order tensor. Viewing tensors as hypergraphs, we propose formulating the co-clustering of a tensor as a problem of partitioning the corresponding hypergraph. Our algorithm is based on the random sampling technique, which has been successfully applied to graph cut problems. We extend a random sampling algorithm for the graph multiway cut problem to hypergraphs, and design a co-clustering algorithm based on it. Each iteration of our algorithm runs in time polynomial in the size of the hypergraph, and thus it performs well even for higher-order tensors, which are difficult to deal with for state-of-the-art algorithms.

【Keywords】: Co-clustering; Graph partitioning; Karger and Stein's algorithm

277. Learning Invariant Deep Representation for NIR-VIS Face Recognition.

Paper Link】 【Pages】:2000-2006

【Authors】: Ran He ; Xiang Wu ; Zhenan Sun ; Tieniu Tan

【Abstract】: Visual versus near infrared (VIS-NIR) face recognition is still a challenging heterogeneous task due to the large appearance difference between VIS and NIR modalities. This paper presents a deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space. The low-level layers of this network are trained only on large-scale VIS data. Each convolutional layer is implemented by the simplest case of the maxout operator. The high-level layer is divided into two orthogonal subspaces that contain modality-invariant identity information and modality-variant spectrum information respectively. Our joint formulation leads to an alternating minimization approach for deep representation at training time and an efficient computation for heterogeneous data at testing time. Experimental evaluations show that our method achieves a 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset. Compared with state-of-the-art methods, it reduces the error rate by 58% with only a compact 64-D representation.

【Keywords】: deep learning; face recognition; heterogeneous; near infrared; CNN

278. A Generalized Stochastic Variational Bayesian Hyperparameter Learning Framework for Sparse Spectrum Gaussian Process Regression.

Paper Link】 【Pages】:2007-2014

【Authors】: Quang Minh Hoang ; Trong Nghia Hoang ; Kian Hsiang Low

【Abstract】: While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of low-rank GP approximations that exploit the sparse spectral representation of a GP kernel. This paper presents such an effort to advance the state of the art of sparse spectrum GP models to achieve competitive predictive performance for massive datasets. Our generalized framework of stochastic variational Bayesian sparse spectrum GP (sVBSSGP) models addresses their shortcomings by adopting a Bayesian treatment of the spectral frequencies to avoid overfitting, modeling these frequencies jointly in its variational distribution to enable their interaction a posteriori, and exploiting local data for boosting the predictive performance. However, such structural improvements result in a variational lower bound that is intractable to be optimized. To resolve this, we exploit a variational parameterization trick to make it amenable to stochastic optimization. Interestingly, the resulting stochastic gradient has a linearly decomposable structure that can be exploited to refine our stochastic optimization method to incur constant time per iteration while preserving its property of being an unbiased estimator of the exact gradient of the variational lower bound. Empirical evaluation on real-world datasets shows that sVBSSGP outperforms state-of-the-art stochastic implementations of sparse GP models.

【Keywords】: Gaussian process; stochastic variational inference; scalability

279. Semi-Supervised Adaptive Label Distribution Learning for Facial Age Estimation.

Paper Link】 【Pages】:2015-2021

【Authors】: Peng Hou ; Xin Geng ; Zeng-Wei Huo ; Jiaqi Lv

【Abstract】: Lack of sufficient training data with exact ages is still a challenge for facial age estimation. To deal with such a problem, a method called Label Distribution Learning (LDL) was proposed to utilize the neighboring ages while learning a particular age. Later, an adaptive version of LDL called ALDL was proposed to generate a proper label distribution for each age. However, the adaptation process requires more training data, which creates a dilemma between the performance of ALDL and the training data. In this paper, we propose an algorithm called Semi-supervised Adaptive Label Distribution Learning (SALDL) to solve the dilemma and improve the performance using unlabeled data for facial age estimation. On the one hand, the utilization of unlabeled data helps to improve the adaptation process. On the other hand, the adapted label distributions conversely reinforce the semi-supervised process. As a result, they can promote each other to get better performance. Experimental results show that SALDL performs remarkably better than state-of-the-art algorithms when only limited accurately labeled data are available.

【Keywords】: age estimation; label distribution learning; semi-supervised

280. Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems.

Paper Link】 【Pages】:2022-2028

【Authors】: Brian Hrolenok ; Byron Boots ; Tucker Hybinette Balch

【Abstract】: Modeling stochastic multiagent behavior such as fish schooling is challenging for fixed-estimate prediction techniques because they fail to reliably reproduce the stochastic aspects of the agents’ behavior. We show how standard fixed-estimate predictors fit within a probabilistic framework, and suggest the reason they work for certain classes of behaviors and not others. We quantify the degree of mismatch and offer alternative sampling-based modeling techniques. We are specifically interested in building executable models (as opposed to statistical or descriptive models) because we want to reproduce and study multiagent behavior in simulation. Such models can be used by biologists, sociologists, and economists to explain and predict individual and group behavior in novel scenarios, and to test hypotheses regarding group behavior. Developing models from observation of real systems is an obvious application of machine learning. Learning directly from data eliminates expensive hand processing and tuning, but introduces unique challenges that violate certain assumptions common in standard machine learning approaches. Our framework suggests a new class of sampling-based methods, which we implement and apply to simulated deterministic and stochastic schooling behaviors, as well as the observed schooling behavior of real fish. Experimental results show that our implementation performs comparably with standard learning techniques for deterministic behaviors, and better on stochastic behaviors.

【Keywords】: Multiagent Systems, Learning Time Series Models, Modeling Multiagent Behavior

281. Sequential Classification-Based Optimization for Direct Policy Search.

Paper Link】 【Pages】:2029-2035

【Authors】: Yi-Qi Hu ; Hong Qian ; Yang Yu

【Abstract】: Classification-based optimization is a recently developed framework for derivative-free optimization, which has been shown to be effective for non-convex optimization problems with many local optima. This framework requires sampling a batch of solutions for every update of the search model. However, in reinforcement learning, direct policy search often offers only sequential policy evaluation. Thus, classification-based optimization is not efficient for direct policy search where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the batch from reused historical solutions. Experiments on a helicopter hovering control task and reinforcement learning benchmark tasks in OpenAI Gym show that the new algorithm is superior to state-of-the-art derivative-free optimization approaches.

【Keywords】: derivative-free optimization; direct policy search; sequential optimization

282. A Riemannian Network for SPD Matrix Learning.

Paper Link】 【Pages】:2036-2042

【Authors】: Zhiwu Huang ; Luc J. Van Gool

【Abstract】: Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting Riemannian geometry of underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of SPD matrix non-linear learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices to more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. For training the proposed deep network, we exploit a new backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be simply trained and outperform existing SPD matrix learning and state-of-the-art methods in three typical visual classification tasks.

【Keywords】: Riemannian network; SPD matrix learning

283. Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization.

Paper Link】 【Pages】:2043-2049

【Authors】: Zhouyuan Huo ; Heng Huang

【Abstract】: We provide the first theoretical analysis on the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization. Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it is proved to converge with a rate of $O(1/\sqrt{T})$ for non-convex optimization. Recently, variance reduction techniques have been proposed and proved to greatly accelerate the convergence of SGD. It has been shown that asynchronous SGD with variance reduction has a linear convergence rate when the problem is strongly convex. However, there is still no work analyzing the convergence rate of this method for non-convex problems. In this paper, we consider two asynchronous parallel implementations of the mini-batch gradient descent method with variance reduction: one on a distributed-memory architecture and the other on a shared-memory architecture. We prove that both methods converge with a rate of $O(1/T)$ for non-convex optimization, and that linear speedup is achievable when we increase the number of workers. We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and the experimental results support our theoretical analysis.

【Keywords】: Non-Convex Optimization; Asynchronous Mini-batch Gradient Descent; Variance Reduction

284. Learning Unitary Operators with Help From u(n).

Paper Link】 【Pages】:2050-2058

【Authors】: Stephanie L. Hyland ; Gunnar Rätsch

【Abstract】: A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using $n^2$ real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks.

【Keywords】: recurrent neural network; lie algebra; lie group; deep learning
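The parametrization described in the abstract above can be sketched for n = 2, where the Lie algebra has n² = 4 real coefficients. The basis choice, the truncated Taylor series for the matrix exponential, and all function names below are illustrative assumptions, not the authors' implementation.

```python
def u2_basis():
    """Four skew-Hermitian 2x2 matrices spanning the Lie algebra u(2)."""
    i = 1j
    return [
        [[i, 0], [0, 0]],
        [[0, 0], [0, i]],
        [[0, 1], [-1, 0]],
        [[0, i], [i, 0]],
    ]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def identity(n):
    return [[(1.0 + 0j) if r == c else 0j for c in range(n)] for r in range(n)]

def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    n = len(A)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

def unitary_from_coeffs(coeffs):
    """Map 4 real coefficients to a 2x2 unitary matrix through the exponential map."""
    basis = u2_basis()
    L = [[sum(c * T[r][s] for c, T in zip(coeffs, basis)) for s in range(2)]
         for r in range(2)]
    return expm(L)

def unitarity_error(U):
    """Largest entry of |U^H U - I|."""
    n = len(U)
    Uh = [[U[c][r].conjugate() for c in range(n)] for r in range(n)]
    P = matmul(Uh, U)
    I = identity(n)
    return max(abs(P[r][c] - I[r][c]) for r in range(n) for c in range(n))

coeffs = [0.3, -1.2, 0.7, 0.5]
U = unitary_from_coeffs(coeffs)

# An additive update of the coefficients (a stand-in for a gradient step with
# made-up gradient values) stays inside the Lie algebra, so the result is again unitary.
updated = [c - 0.1 * g for c, g in zip(coeffs, [1.0, -2.0, 0.5, 3.0])]
U_updated = unitary_from_coeffs(updated)
```

Because any real linear combination of skew-Hermitian matrices is skew-Hermitian, no projection or re-normalization step is needed after the update.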

285. Denoising Criterion for Variational Auto-Encoding Framework.

Paper Link】 【Pages】:2059-2065

【Authors】: Daniel Jiwoong Im ; Sungjin Ahn ; Roland Memisevic ; Yoshua Bengio

【Abstract】: Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer that encourages this noise injection. In this paper, we show that injecting noise both in input and in the stochastic hidden layer can be advantageous and we propose a modified variational lower bound as an improved objective function in this setup. When input is corrupted, then the standard VAE lower bound involves marginalizing the encoder conditional distribution over the input noise, which makes the training criterion intractable. Instead, we propose a modified training criterion which corresponds to a tractable bound when input is corrupted. Experimentally, we find that the proposed denoising variational autoencoder (DVAE) yields better average log-likelihood than the VAE and the importance weighted autoencoder on the MNIST and Frey Face datasets.

【Keywords】: Variational Auto-encoder, Deep generative models

286. Recovering True Classifier Performance in Positive-Unlabeled Learning.

Paper Link】 【Pages】:2066-2072

【Authors】: Shantanu Jain ; Martha White ; Predrag Radivojac

【Abstract】: A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic curve, or the precision recall curve obtained on such data can be corrected with the knowledge of class priors; i.e., the proportions of the positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative and show that the correction also requires the knowledge of the proportion of noisy examples in the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the proportion of noise, we experimentally evaluate two correction approaches and demonstrate their efficacy on real-life data.

【Keywords】: ROC curve; AUC; Precision Recall curve; asymmetric noise; positive unlabeled learning; class prior estimation
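As a rough illustration of the kind of correction this abstract describes, the sketch below assumes the standard mixture view of positive-unlabeled data: the unlabeled set contains a fraction `alpha` of positives, and the labeled positives are representative of the positive class. Under those assumptions, the rate at which a classifier flags unlabeled points is `alpha * TPR + (1 - alpha) * FPR`, so the false positive rate against true negatives can be solved for once `alpha` is known. The function name and the numbers are hypothetical, and this is not claimed to be the estimator used in the paper.

```python
def corrected_fpr(fpr_pu, tpr, alpha):
    """Recover the false positive rate measured against true negatives.

    fpr_pu : fraction of *unlabeled* points the classifier flags as positive
    tpr    : fraction of labeled positives the classifier flags as positive
    alpha  : assumed proportion of positives hidden in the unlabeled data
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Unlabeled data is a mixture: fpr_pu = alpha * tpr + (1 - alpha) * fpr
    fpr = (fpr_pu - alpha * tpr) / (1.0 - alpha)
    # Estimation noise can push the value slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, fpr))


# Hypothetical numbers: 30% of the unlabeled points are positives,
# the classifier has TPR 0.8, and it flags 31% of the unlabeled points.
example_fpr = corrected_fpr(fpr_pu=0.31, tpr=0.8, alpha=0.3)
```

Applying the same correction at every decision threshold would shift an uncorrected ROC curve toward the one that would be obtained with fully labeled data, under the stated assumptions.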

287. Generalized Ambiguity Decompositions for Classification with Applications in Active Learning and Unsupervised Ensemble Pruning.

Paper Link】 【Pages】:2073-2079

【Authors】: Zhengshen Jiang ; Hongzhi Liu ; Bin Fu ; Zhonghai Wu

【Abstract】: Error decomposition analysis is a key problem for ensemble learning. Two commonly used error decomposition schemes, the classic Ambiguity Decomposition and Bias-Variance-Covariance decomposition, are only suitable for regression tasks with square loss. We generalized the classic Ambiguity Decomposition from regression problems with square loss to classification problems with any loss functions that are twice differentiable, including the logistic loss in Logistic Regression, the exponential loss in Boosting methods, and the 0-1 loss in many other classification tasks. We further proved several important properties of the Ambiguity term, armed with which the Ambiguity terms of logistic loss, exponential loss and 0-1 loss can be explicitly computed and optimized. We further discussed the relationship between margin theory, "good" and "bad" diversity theory and our theoretical results, and provided some new insights for ensemble learning. We demonstrated the applications of our theoretical results in active learning and unsupervised ensemble pruning, and the experimental results confirmed the effectiveness of our methods.

【Keywords】: Ensemble Learning; Ambiguity Decomposition; Classification; Active Learning; Ensemble Pruning

288. Twin Learning for Similarity and Clustering: A Unified Kernel Approach.

Paper Link】 【Pages】:2080-2086

【Authors】: Zhao Kang ; Chong Peng ; Qiang Cheng

【Abstract】: Many similarity-based clustering methods work in two separate steps: similarity matrix computation and subsequent spectral clustering. However, similarity measurement is challenging because it is usually impacted by many factors, e.g., the choice of similarity metric, neighborhood size, scale of data, noise and outliers. Thus the learned similarity matrix is often not suitable, let alone optimal, for the subsequent clustering. In addition, nonlinear similarity often exists in real-world data, yet it has not been effectively considered by most existing methods. To tackle these two challenges, we propose a model to simultaneously learn the cluster indicator matrix and similarity information in kernel spaces in a principled way. We show theoretical relationships to kernel k-means, k-means, and spectral clustering methods. Then, to address the practical issue of how to select the most suitable kernel for a particular clustering task, we further extend our model with a multiple kernel learning ability. With this joint model, we can automatically accomplish three subtasks of finding the best cluster indicator matrix, the most accurate similarity relations and the optimal combination of multiple kernels. By leveraging the interactions between these three subtasks in a joint framework, each subtask can be iteratively boosted by using the results of the others towards an overall optimal solution. Extensive experiments are performed to demonstrate the effectiveness of our method.

【Keywords】: Clustering; Similarity Learning; Kernel Method

289. Tunable Sensitivity to Large Errors in Neural Network Training.

Paper Link】 【Pages】:2087-2093

【Authors】: Gil Keren ; Sivan Sabato ; Björn W. Schuller

【Abstract】: When humans learn a new concept, they might ignore examples that they cannot make sense of at first, and only later focus on such examples, when they are more useful for learning. We propose incorporating this idea of tunable sensitivity for hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method. The generalized gradient is parameterized by a value that controls the sensitivity of the training process to harder training examples. We tested our method on several benchmark datasets. We propose, and corroborate in our experiments, that the optimal level of sensitivity to hard examples is positively correlated with the depth of the network. Moreover, the test prediction error obtained by our method is generally lower than that of the vanilla cross-entropy gradient learner. We therefore conclude that tunable sensitivity can be helpful for neural network learning.

【Keywords】: Machine Learning; Deep Learning

290. Binary Embedding with Additive Homogeneous Kernels.

Paper Link】 【Pages】:2094-2100

【Authors】: Saehoon Kim ; Seungjin Choi

【Abstract】: Binary embedding transforms vectors in Euclidean space into the vertices of Hamming space such that Hamming distance between binary codes reflects a particular distance metric. In machine learning, the similarity metrics induced by Mercer kernels are frequently used, leading to the development of binary embedding with Mercer kernels (BE-MK) where the approximate nearest neighbor search is performed in a reproducing kernel Hilbert space (RKHS). Kernelized locality-sensitive hashing (KLSH), which is a representative BE-MK method, uses kernel PCA to embed data points into a Euclidean space, followed by the random hyperplane binary embedding. In general, it works well when the query and data points in the database follow the same probability distribution. The streaming data environment, however, continuously requires KLSH to update the leading eigenvectors of the Gram matrix, which can be costly or hard to carry out in practice. In this paper we present a completely randomized binary embedding to work with a family of additive homogeneous kernels, referred to as BE-AHK. The proposed algorithm is easy to implement, built on Vedaldi and Zisserman's work on explicit feature maps for additive homogeneous kernels. We show that our BE-AHK is able to preserve kernel values by developing upper and lower bounds on its Hamming distance, which guarantee that approximate nearest neighbor search can be solved efficiently. Numerical experiments demonstrate that BE-AHK actually yields similarity-preserving binary codes in terms of additive homogeneous kernels and is superior to existing methods when training data and queries are generated from different distributions. Moreover, in cases where a large code size is allowed, the performance of BE-AHK is comparable to that of KLSH in general cases.

【Keywords】: Binary embedding; Locality-sensitive hashing; Randomized algorithm
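The random hyperplane binary embedding mentioned in this abstract can be sketched in a few lines. The example below works in plain Euclidean space only: it does not include kernel PCA or the explicit feature maps for additive homogeneous kernels, and all function names are illustrative assumptions rather than the authors' code. Each bit records which side of a random Gaussian hyperplane a vector falls on, so the fraction of differing bits between two codes approximates the angle between the vectors divided by pi.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits Gaussian normal vectors, one hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(x, planes):
    """Bit i is 1 when x lies on the non-negative side of hyperplane i."""
    return [1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) >= 0 else 0
            for w in planes]


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


def estimated_angle(x, y, planes):
    """Estimate the angle between x and y from the normalized Hamming distance."""
    distance = hamming(binary_code(x, planes), binary_code(y, planes))
    return math.pi * distance / len(planes)


planes = random_hyperplanes(dim=2, n_bits=4096)
# Orthogonal vectors: the estimate should be close to pi / 2.
angle = estimated_angle([1.0, 0.0], [0.0, 1.0], planes)
```

Longer codes reduce the variance of the estimate, which is one way to read the abstract's remark about performance when a large code size is allowed.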

291. Structured Inference Networks for Nonlinear State Space Models.

Paper Link】 【Pages】:2101-2109

【Authors】: Rahul G. Krishnan ; Uri Shalit ; David Sontag

【Abstract】: Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.

【Keywords】: Unsupervised Learning; Deep Learning; Time Series; Hidden Markov Models

292. Estimating Uncertainty Online Against an Adversary.

Paper Link】 【Pages】:2110-2116

【Authors】: Volodymyr Kuleshov ; Stefano Ermon

【Abstract】: Assessing uncertainty is an important step towards ensuring the safety and reliability of machine learning systems. Existing uncertainty estimation techniques may fail when their modeling assumptions are not met, e.g. when the data distribution differs from the one seen at training time. Here, we propose techniques that assess a classification algorithm’s uncertainty via calibrated probabilities (i.e. probabilities that match empirical outcome frequencies in the long run) and which are guaranteed to be reliable (i.e. accurate and calibrated) on out-of-distribution input, including input generated by an adversary. This represents an extension of classical online learning that handles uncertainty in addition to guaranteeing accuracy under adversarial assumptions. We establish formal guarantees for our methods, and we validate them on two real-world problems: question answering and medical diagnosis from genomic data.

【Keywords】: online learning; calibration; uncertainty estimation
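To make the notion of calibration in this abstract concrete, the sketch below bins predicted probabilities and compares each bin's mean prediction with the empirical frequency of positive outcomes. It is an offline, batch check with hypothetical function names; it does not implement the online or adversarial procedures the paper proposes.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted probability, empirical frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frequency = sum(y for _, y in members) / len(members)
            rows.append((mean_p, frequency, len(members)))
    return rows


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Count-weighted average gap between predicted and observed frequencies."""
    rows = calibration_table(probs, outcomes, n_bins)
    return sum(count * abs(mean_p - frequency)
               for mean_p, frequency, count in rows) / len(probs)


# Forecasts of 0.25 that come true one time in four are perfectly calibrated.
well_calibrated = expected_calibration_error([0.25] * 4, [1, 0, 0, 0])
# Forecasts of 0.9 that never come true are badly miscalibrated.
overconfident = expected_calibration_error([0.9] * 4, [0, 0, 0, 0])
```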

293. Learning Non-Linear Dynamics of Decision Boundaries for Maintaining Classification Performance.

Paper Link】 【Pages】:2117-2123

【Authors】: Atsutoshi Kumagai ; Tomoharu Iwata

【Abstract】: We propose a method that involves a probabilistic model for learning future classifiers for tasks in which decision boundaries nonlinearly change over time. In certain applications, such as spam-mail classification, the decision boundary dynamically changes over time. Accordingly, the performance of the classifiers will deteriorate quickly unless the classifiers are updated using additional data. However, collecting such data can be expensive or impossible. The proposed model alleviates this deterioration in performance without additional data by modeling the non-linear dynamics of the decision boundary using Gaussian processes. The method also involves our developed learning algorithm for our model based on empirical variational Bayesian inference by which uncertainty of dynamics can be incorporated for future classification. The effectiveness of the proposed method was demonstrated through experiments using synthetic and real-world data sets.

【Keywords】: Transfer, Adaptation, Multitask Learning; Classification; Time-Series/Data Streams

294. Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration.

Paper Link】 【Pages】:2124-2132

【Authors】: Himabindu Lakkaraju ; Ece Kamar ; Rich Caruana ; Eric Horvitz

【Abstract】: Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned by the predictive model, and then utilizes an explore-exploit strategy for discovering unknown unknowns across these partitions. We demonstrate the efficacy of our framework by varying the underlying causes of unknown unknowns across various applications. To the best of our knowledge, this paper presents the first algorithmic approach to the problem of discovering unknown unknowns of predictive models.

【Keywords】: debugging predictive models; bandit algorithms; unknown unknowns

295. Dynamic Action Repetition for Deep Reinforcement Learning.

Paper Link】 【Pages】:2133-2139

【Authors】: Aravind S. Lakshminarayanan ; Sahil Sharma ; Balaraman Ravindran

【Abstract】: One of the long standing goals of Artificial Intelligence (AI) is to build cognitive agents which can perform complex tasks from raw sensory inputs without explicit supervision. Recent progress in combining Reinforcement Learning objective functions and Deep Learning architectures has achieved promising results for such tasks. An important aspect of such sequential decision making problems, which has largely been neglected, is for the agent to decide on the duration of time for which to commit to actions. Such action repetition is important for computational efficiency, which is necessary for the agent to respond in real-time to events (in applications such as self-driving cars). Action Repetition arises naturally in real life as well as simulated environments. The time scale of executing an action enables an agent (both humans and AI) to decide the granularity of control during task execution. Current state of the art Deep Reinforcement Learning models, whether they are off-policy or on-policy, consist of a framework with a static action repetition paradigm, wherein the action decided by the agent is repeated for a fixed number of time steps regardless of the contextual state while executing the task. In this paper, we propose a new framework - Dynamic Action Repetition which changes Action Repetition Rate (the time scale of repeating an action) from a hyper-parameter of an algorithm to a dynamically learnable quantity. At every decision-making step, our models allow the agent to commit to an action and the time scale of executing the action. We show empirically that such a dynamic time scale mechanism improves the performance on relatively harder games in the Atari 2600 domain, independent of the underlying Deep Reinforcement Learning algorithm used.

【Keywords】: Deep Reinforcement Learning; Dynamic Action Repetition; Game Intelligence
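The control loop implied by this abstract, in which each decision yields both an action and the number of time steps to repeat it, can be sketched as follows. The toy environment, the hand-written policy and all names are hypothetical stand-ins; the paper learns these choices with deep reinforcement learning agents on Atari 2600 games, which this sketch does not attempt.

```python
def run_episode(env_step, policy, state, max_steps):
    """Run one episode where the policy returns (action, repeat_count) pairs.

    Returns (total reward, environment steps taken, decisions made).
    """
    total_reward, steps, decisions = 0.0, 0, 0
    while steps < max_steps:
        action, repeat = policy(state)
        decisions += 1
        # Commit to the chosen action for `repeat` consecutive time steps.
        for _ in range(repeat):
            state, reward, done = env_step(state, action)
            total_reward += reward
            steps += 1
            if done or steps >= max_steps:
                return total_reward, steps, decisions
    return total_reward, steps, decisions


def toy_env_step(state, action):
    """Hypothetical 1-D walk: reaching position 5 ends the episode with reward 1."""
    new_state = state + action
    done = new_state >= 5
    return new_state, (1.0 if done else 0.0), done


def toy_policy(state):
    """Far from the goal, commit for 3 steps; near it, act one step at a time."""
    return (1, 3) if state < 3 else (1, 1)


result = run_episode(toy_env_step, toy_policy, state=0, max_steps=20)
```

In this toy run the agent reaches the goal in five environment steps but makes only three decisions, which is the kind of saving in decision-making effort that motivates action repetition.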

296. Playing FPS Games with Deep Reinforcement Learning.

Paper Link】 【Pages】:2140-2146

【Authors】: Guillaume Lample ; Devendra Singh Chaplot

【Abstract】: Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as average humans in deathmatch scenarios.

【Keywords】: Deep Learning; Reinforcement Learning; Machine Learning; Artificial Intelligence; Doom AI

297. Transfer Reinforcement Learning with Shared Dynamics.

Paper Link】 【Pages】:2147-2153

【Authors】: Romain Laroche ; Merwan Barlier

【Abstract】: This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas. The first is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and, for each sample, the reward entry is replaced by its reward estimate. The second idea consists in adopting the optimism in the face of uncertainty principle and using upper bound reward estimates. Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.

【Keywords】: Transfer Learning; Reinforcement Learning; Multi-Task Reinforcement Learning

298. Transfer Learning for Deep Learning on Graph-Structured Data.

Paper Link】 【Pages】:2154-2160

【Authors】: Jaekoo Lee ; Hyunjae Kim ; Jongsun Lee ; Sungroh Yoon

【Abstract】: Graphs provide a powerful means for representing complex interactions between entities. Recently, new deep learning approaches have emerged for representing and modeling graph-structured data while the conventional deep learning methods, such as convolutional neural networks and recurrent neural networks, have mainly focused on the grid-structured inputs of image and audio. Leveraged by representation learning capabilities, deep learning-based techniques can detect structural characteristics of graphs, giving promising results for graph applications. In this paper, we attempt to advance deep learning for graph-structured data by incorporating another component: transfer learning. By transferring the intrinsic geometric information learned in the source domain, our approach can construct a model for a new but related task in the target domain without collecting new data and without training a new model from scratch. We thoroughly tested our approach with large-scale real-world text data and confirmed the effectiveness of the proposed transfer learning framework for deep learning on graphs. According to our experiments, transfer learning is most effective when the source and target domains bear a high level of structural similarity in their graph representations.

【Keywords】: Deep learning; graph analysis; transfer learning

299. Efficient Online Model Adaptation by Incremental Simplex Tableau.

Paper Link】 【Pages】:2161-2167

【Authors】: Zhixian Lei ; Xuehan Ye ; Yongcai Wang ; Deying Li ; Jia Xu

【Abstract】: Online multi-kernel learning is promising in the era of mobile computing, in which a combined classifier with multiple kernels is trained offline and adapts online to personalized features for serving the end user precisely and smartly. The online adaptation is mainly carried out at the end-devices, which requires the adaptation algorithms to be light, efficient and accurate. Previous results focused mainly on efficiency. This paper proposes a novel online model adaptation framework for not only efficiency but also optimal online adaptation. At first, an online optimal incremental simplex tableau (IST) algorithm is proposed, which approaches the model adaptation by linear programming and produces the optimized model update in each step when a personalized training sample is collected. But keeping online optimal in each step is expensive and may cause over-fitting, especially when the online data is noisy. A Fast-IST approach is therefore proposed, which measures the deviation between the training data and the current model. It schedules updating only when enough deviation is detected. The efficiency of each update is further enhanced by running IST for only a limited number of iterations, which bounds the computational complexity. Theoretical analysis and extensive evaluations show that Fast-IST saves computation cost greatly, while achieving speedy and accurate model adaptation. It provides better model adaptation speed and accuracy while using even lower computing cost than the state of the art.

【Keywords】: online learning; model adaptation; multi kernel learning; wearable computing; simplex

300. Multivariate Hawkes Processes for Large-Scale Inference.

Paper Link】 【Pages】:2168-2174

【Authors】: Rémi Lemonnier ; Kevin Scaman ; Argyris Kalogeratos

【Abstract】: In this paper, we present a framework for fitting multivariate Hawkes processes for large-scale problems, both in the number of events in the observed history n and the number of event types d (i.e. dimensions). The proposed Scalable Low-Rank Hawkes Process (SLRHP) framework introduces a low-rank approximation of the kernel matrix that allows nonparametric learning of the d² triggering kernels in at most O(ndr²) operations, where r is the rank of the approximation (r ≪ d, n). This comes as a major improvement to the existing state-of-the-art inference algorithms that require O(nd²) operations. Furthermore, the low-rank approximation allows SLRHP to learn representative patterns of interaction between event types, which is usually valuable for the analysis of complex processes in real-world networks.

【Keywords】:

301. Self-Paced Multi-Task Learning.

Paper Link】 【Pages】:2175-2181

【Authors】: Changsheng Li ; Junchi Yan ; Fan Wei ; Weishan Dong ; Qingshan Liu ; Hongyuan Zha

【Abstract】: Multi-task learning is a paradigm where multiple tasks are jointly learnt. Previous multi-task learning models usually treat all tasks and instances per task equally during learning. Inspired by the fact that humans often learn from easy concepts to hard ones in the cognitive process, in this paper, we propose a novel multi-task learning framework that attempts to learn the tasks by simultaneously taking into consideration the complexities of both tasks and instances per task. We propose a novel formulation by presenting a new task-oriented regularizer that can jointly prioritize tasks and instances. Thus it can be interpreted as a self-paced learner for multi-task learning. An efficient block coordinate descent algorithm is developed to solve the proposed objective function, and the convergence of the algorithm can be guaranteed. Experimental results on the toy and real-world datasets demonstrate the effectiveness of the proposed approach, compared to the state of the art.

【Keywords】: multi-task learning

302. Infinitely Many-Armed Bandits with Budget Constraints.

Paper Link】 【Pages】:2182-2188

【Authors】: Haifang Li ; Yingce Xia

【Abstract】: We study the infinitely many-armed bandit problem with budget constraints, where the number of arms can be infinite and much larger than the number of possible experiments. The player aims at maximizing his/her total expected reward under a budget constraint B for the cost of pulling arms. We introduce a weak stochastic assumption on the ratio of expected-reward to expected-cost of a newly pulled arm which characterizes its probability of being a near-optimal arm. We propose an algorithm named RCB-I for this new problem, in which the player first randomly picks K arms, whose order is sub-linear in terms of B, and then runs the algorithm for the finite-arm setting on the selected arms. Theoretical analysis shows that this simple algorithm enjoys a sub-linear regret in terms of the budget B. We also provide a lower bound for any algorithm under the Bernoulli setting. The regret bound of RCB-I matches the lower bound up to a logarithmic factor. We further extend this algorithm to the any-budget setting (i.e., the budget is unknown in advance) and conduct corresponding theoretical analysis.

【Keywords】: Budgeted Multi-Armed Bandits; Infinitely Many Arms; Regret Analysis

303. Sparse Subspace Clustering by Learning Approximation ℓ0 Codes.

Paper Link】 【Pages】:2189-2195

【Authors】: Jun Li ; Yu Kong ; Yun Fu

【Abstract】: Subspace clustering has been widely applied to detect meaningful clusters in high-dimensional data spaces. A main challenge in subspace clustering is to quickly calculate a "good" affinity matrix. ℓ0, ℓ1, ℓ2 or nuclear norm regularization is used to construct the affinity matrix in many subspace clustering methods because of their theoretical guarantees and empirical success. However, they suffer from the following problems: (1) ℓ2 and nuclear norm regularization require very strong assumptions to guarantee a subspace-preserving affinity; (2) although ℓ1 regularization can be guaranteed to give a subspace-preserving affinity under certain conditions, it needs more time to solve a large-scale convex optimization problem; (3) ℓ0 regularization can yield a tradeoff between computational efficiency and subspace-preserving affinity by using the orthogonal matching pursuit (OMP) algorithm, but OMP still takes considerable time to search for the solution when the number of data points is large. In order to overcome these problems, we first propose a learned OMP (LOMP) algorithm to learn a single hidden neural network (SHNN) to quickly approximate the ℓ0 code. We then exploit a sparse subspace clustering method based on the ℓ0 code, which is quickly computed by SHNN. Two sufficient conditions are presented to guarantee that our method can give a subspace-preserving affinity. Experiments on handwritten digit and face clustering show that our method not only quickly computes the ℓ0 code, but also outperforms the relevant subspace clustering methods in clustering results. In particular, our method achieves the state-of-the-art clustering accuracy (94.32%) on MNIST.

【Keywords】: clustering; sparse coding; neural network

304. Riemannian Submanifold Tracking on Low-Rank Algebraic Variety.

Paper Link】 【Pages】:2196-2202

【Authors】: Qian Li ; Zhichao Wang

【Abstract】: Matrix recovery aims to learn a low-rank structure from high dimensional data, which arises in numerous learning applications. As a popular heuristic for matrix recovery, convex relaxation involves iterative calling of singular value decomposition (SVD). Riemannian optimization based methods can alleviate the expensive cost of SVD for improved scalability, but their performance is usually degraded by the unknown rank. This paper proposes a novel algorithm RIST that exploits the algebraic variety of low-rank manifold for matrix recovery. Particularly, RIST utilizes an efficient scheme that automatically estimates the potential rank on the real algebraic variety and tracks the favorable Riemannian submanifold. Moreover, RIST utilizes the second-order geometric characterization and achieves provable superlinear convergence, which is superior to the linear convergence of most existing methods. Extensive comparison experiments demonstrate the accuracy and efficiency of the RIST algorithm.

【Keywords】: Riemannian Optimization; Low-rank recovery; Matrix Completion; RPCA; Submanifold

305. Large Graph Hashing with Spectral Rotation.

Paper Link】 【Pages】:2203-2209

【Authors】: Xuelong Li ; Di Hu ; Feiping Nie

【Abstract】: Faced with the requirements of huge amounts of data processing nowadays, hashing techniques have attracted much attention due to their efficient storage and searching ability. Among these techniques, the ones based on spectral graph show remarkable performance as they could embed the data on a low-dimensional manifold and maintain the neighborhood structure via a non-linear spectral eigenmap. However, the real-valued spectral solution of such methods may deviate from the discrete solution. The common practice is just performing a simple rounding operation to obtain the final binary codes, which could break constraints and even result in worse condition. In this paper, we propose to impose a so-called spectral rotation technique on the spectral hashing objective, which could transform the candidate solution into a new one that better approximates the discrete one. Moreover, the binary codes are obtained from the modified solution via minimizing the Euclidean distance, which could result in more semantical correlation within the manifold, where the constraints for codes are always held. We provide an efficient alternative algorithm to solve the above problems. A manifold learning perspective for motivating the proposed method is also shown. Extensive experiments are conducted on three large-scale benchmark datasets and the results show our method outperforms state-of-the-art hashing methods, especially the spectral graph ones.

【Keywords】: hashing, retrieval, spectral rotation

306. Low-Rank Tensor Completion with Total Variation for Visual Data Inpainting.

Paper Link】 【Pages】:2210-2216

【Authors】: Xutao Li ; Yunming Ye ; Xiaofei Xu

【Abstract】: With the advance of acquisition techniques, plentiful higher-order tensor data sets are built up in a great variety of fields such as computer vision, neuroscience, remote sensing and recommender systems. The real-world tensors often contain missing values, which makes tensor completion a prerequisite to utilizing them. Previous studies have shown that imposing a low-rank constraint on tensor completion produces impressive performances. In this paper, we argue that the low-rank constraint, albeit useful, is not effective enough to exploit the local smooth and piecewise priors of visual data. We propose integrating total variation into low-rank tensor completion (LRTC) to address the drawback. As LRTC can be formulated by both tensor unfolding and tensor decomposition, we develop correspondingly two methods, namely LRTC-TV-I and LRTC-TV-II, and their iterative solvers. Extensive experimental results on color image and medical image inpainting tasks show the effectiveness and superiority of the two methods against state-of-the-art competitors.

【Keywords】: tensor completion; total variation; inpainting; low rank; ADMM

307. Learning Safe Prediction for Semi-Supervised Regression.

Paper Link】 【Pages】:2217-2223

【Authors】: Yu-Feng Li ; Han-Wen Zha ; Zhi-Hua Zhou

【Abstract】: Semi-supervised learning (SSL) concerns how to improve performance via the usage of unlabeled data. Recent studies indicate that the usage of unlabeled data might even deteriorate performance. Although some proposals have been developed to alleviate such a fundamental challenge for semi-supervised classification, the efforts on semi-supervised regression (SSR) remain limited. In this work we consider the learning of a safe prediction from multiple semi-supervised regressors, which is not worse than a direct supervised learner with only labeled data. We cast it as a geometric projection issue with an efficient algorithm. Furthermore, we show that the proposal is provably safe and has already achieved the maximal performance gain, if the ground-truth label assignment is realized by a convex linear combination of base regressors. This provides insight to help understand safe SSR. Experimental results on a broad range of datasets validate the effectiveness of our proposal.

【Keywords】: Semi-supervised regression; Safe

308. A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis.

Paper Link】 【Pages】:2224-2230

【Authors】: Zhe Li ; Tianbao Yang ; Lijun Zhang ; Rong Jin

【Abstract】: This paper aims to provide a sharp excess risk guarantee for learning a sparse linear model without any assumptions about the strong convexity of the expected loss and the sparsity of the optimal solution in hindsight. Given a target level ε for the excess risk, an interesting question to ask is how many examples and how large the support set of the solution are enough for learning a good model with the target excess risk. To answer these questions, we present a two-stage algorithm that (i) in the first stage an epoch based stochastic optimization algorithm is exploited with an established O(1/ε) bound on the sample complexity; and (ii) in the second stage a distribution dependent randomized sparsification is presented with an O(1/ε) bound on the sparsity (referred to as support complexity) of the resulting model. Compared to previous works, our contributions lie at (i) we reduce the order of the sample complexity from O(1/ε²) to O(1/ε) without the strong convexity assumption; and (ii) we reduce the constant in O(1/ε) for the sparsity by exploring the distribution dependent sampling.

【Keywords】: sparse learning; sample complexity; non-strongly convex

309. Balanced Clustering with Least Square Regression.

Paper Link】 【Pages】:2231-2237

【Authors】: Hanyang Liu ; Junwei Han ; Feiping Nie ; Xuelong Li

【Abstract】: Clustering is a fundamental research topic in data mining. A balanced clustering result is often required in a variety of applications. Many existing clustering algorithms have good clustering performance, yet fail to produce balanced clusters. In this paper, we propose a novel and simple method for clustering, referred to as Balanced Clustering with Least Square regression (BCLS), which minimizes the least square linear regression loss with a balance constraint to regularize the clustering model. In BCLS, linear regression is applied to estimate the class-specific hyperplanes that partition each class of data from the others, thus guiding the clustering of the data points into different clusters. A balance constraint regularizes the clustering; minimizing it helps produce balanced clusters. In addition, we apply the method of augmented Lagrange multipliers (ALM) to help optimize the objective model. The experiments on seven real-world benchmarks demonstrate that our approach not only produces good clustering performance but also guarantees a balanced clustering result.

【Keywords】: balanced clustering; least square regression; augmented Lagrange multipliers

310. Ordinal Constrained Binary Code Learning for Nearest Neighbor Search.

Paper Link】 【Pages】:2238-2244

【Authors】: Hong Liu ; Rongrong Ji ; Yongjian Wu ; Feiyue Huang

【Abstract】: Recent years have witnessed extensive attention to binary code learning, a.k.a. hashing, for nearest neighbor search problems. High-dimensional data points can be quantized into binary codes to give an efficient similarity approximation via Hamming distance. Among the existing schemes, ranking-based hashing is a recent and promising approach that aims to preserve the ordinal relations of ranking in the Hamming space to minimize retrieval loss. However, the number of ranking tuples that encode the ordinal relations is quadratic or cubic in the number of training samples. It is therefore very expensive to embed such ranking tuples in binary code learning, especially given a large-scale training data set. Besides, it remains difficult to build ranking tuples efficiently for most ranking-preserving hashing methods, which are deployed over an ordinal graph-based setting. To handle these problems, we propose a novel ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH), which efficiently learns the optimal hashing functions with a graph-based approximation to embed the ordinal relations. The core idea is to reduce the size of the ordinal graph with ordinal constraint projection, which preserves the ordinal relations through a small data set (such as clusters or random samples). In particular, to learn such hash functions effectively, we further relax the discrete constraints and design a specific stochastic gradient descent algorithm for optimization. Experimental results on three large-scale visual search benchmark datasets, i.e. LabelMe, Tiny100K and GIST1M, show that the proposed OCH method can achieve superior performance over the state-of-the-art approaches.

【Keywords】: binary code learning; Hashing; tensor ordinal graph; ordinal transformation embedding
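
The abstract above relies on comparing binary codes by Hamming distance. The following is a minimal sketch of that comparison only, not of OCH itself; the function names `hamming_distance` and `nearest_by_hamming` and the example codes are hypothetical and chosen for illustration.

```python
from typing import Sequence


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two binary codes (stored as ints) differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_by_hamming(query: int, database: Sequence[int]) -> int:
    """Return the index of the database code closest to the query in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Illustrative 8-bit codes (made up for this sketch).
codes = [0b10110010, 0b10110011, 0b01001101]
query = 0b10110111
best = nearest_by_hamming(query, codes)
```

Because the distance reduces to an XOR followed by a bit count, a linear scan over short codes is cheap, which is the efficiency argument the abstract alludes to.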

311. Sparse Deep Transfer Learning for Convolutional Neural Network.

Paper Link】 【Pages】:2245-2251

【Authors】: Jiaming Liu ; Yali Wang ; Yu Qiao

【Abstract】: Extensive studies have demonstrated that the representations of convolutional neural networks (CNN), which are learned from a large-scale data set in the source domain, can be effectively transferred to a new target domain. However, compared to the source domain, the target domain often has limited data in practice. In this case, overfitting may significantly depress transferability, due to the model redundancy of the intensive CNN structures. To deal with this difficulty, we propose a novel sparse deep transfer learning approach for CNN. There are three main contributions in this work. First, we introduce a Sparse-SourceNet to reduce the redundancy in the source domain. Second, we introduce a Hybrid-TransferNet to improve the generalization ability and the prediction accuracy of transfer learning, by taking advantage of both model sparsity and implicit knowledge. Third, we introduce a Sparse-TargetNet, where we prune our Hybrid-TransferNet to obtain a highly-compact, source-knowledge-integrated CNN in the target domain. To examine the effectiveness of our methods, we apply our sparse deep transfer learning approach to a number of benchmark transfer learning tasks. The results show that, compared to the standard fine-tuning approach, our proposed approach achieves a significant pruning rate on CNN while improving the accuracy of transfer learning.

【Keywords】: Convolutional Neural Network; Deep Model Compression; Transfer Learning

312. Cost-Sensitive Feature Selection via F-Measure Optimization Reduction.

Paper Link】 【Pages】:2252-2258

【Authors】: Meng Liu ; Chang Xu ; Yong Luo ; Chao Xu ; Yonggang Wen ; Dacheng Tao

【Abstract】: Feature selection aims to select a small subset from the high-dimensional features which can lead to better learning performance, lower computational complexity, and better model readability. The class imbalance problem has been neglected by traditional feature selection methods, therefore the selected features will be biased towards the majority classes. Because of the superiority of F-measure to accuracy for imbalanced data, we propose to use F-measure as the performance measure for feature selection algorithms. As a pseudo-linear function, the optimization of F-measure can be achieved by minimizing the total costs. In this paper, we present a novel cost-sensitive feature selection (CSFS) method which optimizes F-measure instead of accuracy to take class imbalance issue into account. The features will be selected according to optimal F-measure classifier after solving a series of cost-sensitive feature selection sub-problems. The features selected by our method will fully represent the characteristics of not only majority classes, but also minority classes. Extensive experimental results conducted on synthetic, multi-class and multi-label datasets validate the efficiency and significance of our feature selection method.

【Keywords】:

313. Multiple Kernel k-Means with Incomplete Kernels.

Paper Link】 【Pages】:2259-2265

【Authors】: Xinwang Liu ; Miaomiao Li ; Lei Wang ; Yong Dou ; Jianping Yin ; En Zhu

【Abstract】: Multiple kernel clustering (MKC) algorithms optimally combine a group of pre-specified base kernels to improve clustering performance. However, existing MKC algorithms cannot efficiently address the situation where some rows and columns of base kernels are absent. This paper proposes a simple yet effective algorithm to address this issue. Different from existing approaches where incomplete kernels are first imputed and a standard MKC algorithm is applied to the imputed kernels, our algorithm integrates imputation and clustering into a unified learning procedure. Specifically, we perform multiple kernel clustering directly in the presence of incomplete kernels, which are treated as auxiliary variables to be jointly optimized. Our algorithm does not require that there be at least one complete base kernel over all the samples. Also, it adaptively imputes incomplete kernels and combines them to best serve clustering. A three-step iterative algorithm with proved convergence is designed to solve the resultant optimization problem. Extensive experiments are conducted on four benchmark data sets to compare the proposed algorithm with existing imputation-based methods. Our algorithm consistently achieves superior performance and the improvement becomes more significant with increasing missing ratio, verifying the effectiveness and advantages of the proposed joint imputation and clustering.

【Keywords】: Multiple kernel learning; Clustering; Absent Learning

314. Optimal Neighborhood Kernel Clustering with Multiple Kernels.

Paper Link】 【Pages】:2266-2272

【Authors】: Xinwang Liu ; Sihang Zhou ; Yueqing Wang ; Miaomiao Li ; Yong Dou ; En Zhu ; Jianping Yin

【Abstract】: Multiple kernel $k$-means (MKKM) aims to improve clustering performance by learning an optimal kernel, which is usually assumed to be a linear combination of a group of pre-specified base kernels. However, we observe that this assumption could: i) cause limited kernel representation capability; and ii) not sufficiently consider the negotiation between the process of learning the optimal kernel and that of clustering, leading to unsatisfying clustering performance. To address these issues, we propose an optimal neighborhood kernel clustering (ONKC) algorithm to enhance the representability of the optimal kernel and strengthen the negotiation between kernel learning and clustering. We theoretically justify this ONKC by revealing its connection with existing MKKM algorithms. Furthermore, this justification shows that existing MKKM algorithms can be viewed as a special case of our approach and indicates the extendability of the proposed ONKC for designing better clustering algorithms. An efficient algorithm with proved convergence is designed to solve the resultant optimization problem. Extensive experiments have been conducted to evaluate the clustering performance of the proposed algorithm. As demonstrated, our algorithm significantly outperforms the state-of-the-art ones in the literature, verifying the effectiveness and advantages of ONKC.

【Keywords】: Multiple Kernel Learning; Optimal Kernel Clustering

315. Generalization Analysis for Ranking Using Integral Operator.

Paper Link】 【Pages】:2273-2279

【Authors】: Yong Liu ; Shizhong Liao ; Hailun Lin ; Yinliang Yue ; Weiping Wang

【Abstract】: The study of the generalization performance of ranking algorithms is one of the fundamental issues in ranking learning theory. Although several generalization bounds have been proposed based on different measures, the convergence rates of the existing bounds are usually at most $O(\sqrt{1/n})$, where $n$ is the size of the data set. In this paper, we derive novel generalization bounds for regularized ranking in a reproducing kernel Hilbert space via the integral operator of the kernel function. We prove that the rates of our bounds are much faster than $O(\sqrt{1/n})$. Specifically, we first introduce a notion of local Rademacher complexity for ranking, called local ranking Rademacher complexity, which is used to measure the complexity of the space of loss functions of the ranking. Then, we use the local ranking Rademacher complexity to obtain a basic generalization bound. Finally, we establish the relationship between the local Rademacher complexity and the eigenvalues of the integral operator, and further derive sharp generalization bounds with a faster convergence rate.

【Keywords】: Generalization Analysis; Ranking; Integral Operator

316. Infinite Kernel Learning: Generalization Bounds and Algorithms.

Paper Link】 【Pages】:2280-2286

【Authors】: Yong Liu ; Shizhong Liao ; Hailun Lin ; Yinliang Yue ; Weiping Wang

【Abstract】: Kernel learning is a fundamental problem both in recent research and application of kernel methods. Existing kernel learning methods commonly use some measures of generalization errors to learn the optimal kernel in a convex (or conic) combination of prescribed basic kernels. However, the generalization bounds derived by these measures usually have slow convergence rates, and the basic kernels are finite and should be specified in advance. In this paper, we propose a new kernel learning method based on a novel measure of generalization error, called principal eigenvalue proportion (PEP), which can learn the optimal kernel with sharp generalization bounds over the convex hull of a possibly infinite set of basic kernels. We first derive sharp generalization bounds based on the PEP measure. Then we design two kernel learning algorithms for finite kernels and infinite kernels respectively, in which the derived sharp generalization bounds are exploited to guarantee faster convergence rates, moreover, basic kernels can be learned automatically for infinite kernel learning instead of being prescribed in advance. Theoretical analysis and empirical results demonstrate that the proposed kernel learning method outperforms the state-of-the-art kernel learning methods.

【Keywords】: Kernel Learning; Generalization Bound; Model Selection

317. Accelerated Variance Reduced Stochastic ADMM.

Paper Link】 【Pages】:2287-2293

【Authors】: Yuanyuan Liu ; Fanhua Shang ; James Cheng

【Abstract】: Recently, many variance reduced stochastic alternating direction method of multipliers (ADMM) methods (e.g. SAG-ADMM, SDCA-ADMM and SVRG-ADMM) have made exciting progress such as linear convergence rates for strongly convex problems. However, the best known convergence rate for general convex problems is $O(1/T)$ as opposed to $O(1/T^2)$ of accelerated batch algorithms, where $T$ is the number of iterations. Thus, there still remains a gap in convergence rates between existing stochastic ADMM and batch algorithms. To bridge this gap, we introduce the momentum acceleration trick for batch optimization into the stochastic variance reduced gradient based ADMM (SVRG-ADMM), which leads to an accelerated (ASVRG-ADMM) method. Then we design two different momentum term update rules for strongly convex and general convex cases. We prove that ASVRG-ADMM converges linearly for strongly convex problems. Besides having a low per-iteration complexity as existing stochastic ADMM methods, ASVRG-ADMM improves the convergence rate on general convex problems from $O(1/T)$ to $O(1/T^2)$. Our experimental results show the effectiveness of ASVRG-ADMM.

【Keywords】:
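
The SVRG-based methods named in the abstract above build on a variance reduced gradient estimate: the stochastic gradient at the current point is corrected by the same sample's gradient at a snapshot point plus the full gradient at that snapshot. The sketch below illustrates only that estimator on a toy scalar least-squares loss; it does not implement ADMM or the momentum acceleration, and the names `grad_i`, `full_grad`, `svrg_estimate` and the data are assumptions made for illustration.

```python
def grad_i(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w * x - y) ** 2 with respect to scalar w."""
    return (w * x - y) * x


def full_grad(w, data):
    """Average of the per-sample gradients over the whole data set."""
    return sum(grad_i(w, x, y) for x, y in data) / len(data)


def svrg_estimate(w, w_snapshot, mu_snapshot, sample):
    """Variance reduced estimate: grad_i(w) - grad_i(snapshot) + full gradient at snapshot."""
    x, y = sample
    return grad_i(w, x, y) - grad_i(w_snapshot, x, y) + mu_snapshot


# Toy data (made up): pairs (x, y).
data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5)]
w_snapshot = 0.0
mu = full_grad(w_snapshot, data)  # full gradient computed once per epoch
w = 1.0

# Averaging the estimate over all samples recovers the full gradient at w (unbiasedness).
estimates = [svrg_estimate(w, w_snapshot, mu, s) for s in data]
mean_estimate = sum(estimates) / len(estimates)
```

At `w == w_snapshot` every per-sample estimate equals `mu`, so the estimator has zero variance there; this shrinking variance is what the cited linear rates for strongly convex problems depend on.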

318. Semi-Supervised Classifications via Elastic and Robust Embedding.

Paper Link】 【Pages】:2294-2300

【Authors】: Yun Liu ; Yiming Guo ; Hua Wang ; Feiping Nie ; Heng Huang

【Abstract】: Transductive semi-supervised learning can only predict labels for unlabeled data appearing in the training data, and cannot predict labels for testing data that never appear in the training set. To handle this out-of-sample problem, many inductive methods impose a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint might be too rigid to capture the manifold structure of data. In this paper, we relax this rigid constraint and propose to use an elastic constraint on the predicted label matrix such that the manifold structure can be better explored. Moreover, since unlabeled data are often very abundant in practice and usually there are some outliers, we use a non-squared loss instead of the traditional squared loss to learn a robust model. The derived problem, although convex, has many nonsmooth terms, which make it very challenging to solve. In the paper, we propose an efficient optimization algorithm to solve a more general problem, based on which we find the optimal solution to the derived problem.

【Keywords】: Semi-Supervised Classifications; Elastic and Robust Embedding

319. Approximate Conditional Gradient Descent on Multi-Class Classification.

Paper Link】 【Pages】:2301-2307

【Authors】: Zhuanghua Liu ; Ivor Tsang

【Abstract】: Conditional gradient descent, a.k.a. the Frank-Wolfe algorithm, has regained popularity in recent years. The key advantage of Frank-Wolfe is that at each step the expensive projection is replaced with a much more efficient linear optimization step. Similar to gradient descent, the loss function of Frank-Wolfe scales with the data size. Training on big data poses a challenge for researchers. Recently, stochastic Frank-Wolfe methods have been proposed to solve the problem, but they do not perform well in practice. In this work, we study the problem of approximating the Frank-Wolfe algorithm on the large-scale multi-class classification problem, which is a typical application of the Frank-Wolfe algorithm. We present a simple but effective method employing the internal structure of data to approximate Frank-Wolfe on the large-scale multi-class classification problem. Empirical results verify that our method outperforms the state-of-the-art stochastic projection-free methods.

【Keywords】:
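
The abstract above notes that Frank-Wolfe replaces projection with a linear optimization step. A minimal sketch of that idea, assuming a toy problem (minimizing 0.5 * ||x - target||^2 over the probability simplex) that is not taken from the paper: over the simplex, the linear step simply picks the vertex with the smallest gradient coordinate. The function name `frank_wolfe_simplex` and the step size 2/(t+2) are the standard textbook choices, used here as assumptions.

```python
def frank_wolfe_simplex(target, num_iters=2000):
    """Minimize 0.5 * ||x - target||^2 over the probability simplex with Frank-Wolfe."""
    n = len(target)
    x = [1.0 / n] * n  # start at the simplex barycenter
    for t in range(num_iters):
        grad = [xi - bi for xi, bi in zip(x, target)]
        # Linear minimization oracle: the best simplex vertex is the coordinate
        # with the smallest gradient entry, so no projection is ever needed.
        i_star = min(range(n), key=lambda i: grad[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_star] += gamma  # convex combination keeps x inside the simplex
    return x


solution = frank_wolfe_simplex([0.2, 0.5, 0.3])
```

Each iterate is a convex combination of simplex vertices, so feasibility holds by construction; the cost per step is one gradient evaluation plus an argmin.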

320. Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling.

Paper Link】 【Pages】:2308-2314

【Authors】: Minnan Luo ; Feiping Nie ; Xiaojun Chang ; Yi Yang ; Alexander G. Hauptmann ; Qinghua Zheng

【Abstract】: Traditional topic model with maximum likelihood estimate inevitably suffers from the conditional independence of words given the document’s topic distribution. In this paper, we follow the generative procedure of topic model and learn the topic-word distribution and topics distribution via directly approximating the word-document co-occurrence matrix with matrix decomposition technique. These methods include: (1) Approximating the normalized document-word conditional distribution with the documents probability matrix and words probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) Since the standard NMF is well known to be non-robust to noises and outliers, we extended the probabilistic NMF of the topic model to its robust versions using l21-norm and capped l21-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are exploited to solve the corresponding non-smooth and non-convex problems. Experimental results over several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.

【Keywords】:

321. Active Search for Sparse Signals with Region Sensing.

Paper Link】 【Pages】:2315-2321

【Authors】: Yifei Ma ; Roman Garnett ; Jeff G. Schneider

【Abstract】: Autonomous systems can be used to search for sparse signals in a large space; e.g., aerial robots can be deployed to localize threats, detect gas leaks, or respond to distress calls. Intuitively, search algorithms may increase efficiency by collecting aggregate measurements summarizing large contiguous regions. However, most existing search methods either ignore the possibility of such region observations (e.g., Bayesian optimization and multi-armed bandits) or make strong assumptions about the sensing mechanism that allow each measurement to arbitrarily encode all signals in the entire environment (e.g., compressive sensing). We propose an algorithm that actively collects data to search for sparse signals using only noisy measurements of the average values on rectangular regions (including single points), based on the greedy maximization of information gain. We analyze our algorithm in 1d and show that it requires $\tilde{O}(\frac{n}{\mu^2}+k^2)$ measurements to recover all of $k$ signal locations with small Bayes error, where $\mu$ and $n$ are the signal strength and the size of the search space, respectively. We also show that active designs can be fundamentally more efficient than passive designs with region sensing, contrasting with the results of Arias-Castro, Candes, and Davenport (2013). We demonstrate the empirical performance of our algorithm on a search problem using satellite image data and in high dimensions.

【Keywords】: active search; Bayesian optimization; design of experiments; statistical complexity; information theory

322. Where to Add Actions in Human-in-the-Loop Reinforcement Learning.

Paper Link】 【Pages】:2322-2328

【Authors】: Travis Mandel ; Yun-En Liu ; Emma Brunskill ; Zoran Popovic

【Abstract】: In order for reinforcement learning systems to learn quickly in vast action spaces such as the space of all possible pieces of text or the space of all images, leveraging human intuition and creativity is key. However, a human-designed action space is likely to be initially imperfect and limited; furthermore, humans may improve at creating useful actions with practice or new information. Therefore, we propose a framework in which a human adds actions to a reinforcement learning system over time to boost performance. In this setting, however, it is key that we use human effort as efficiently as possible, and one significant danger is that humans waste effort adding actions at places (states) that aren't very important. Therefore, we propose Expected Local Improvement (ELI), an automated method which selects states at which to query humans for a new action. We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find ELI demonstrates excellent empirical performance, even in settings where the synthetic "experts" are quite poor.

【Keywords】: Human-in-the-Loop AI, MDPs, Exploration, Human-Aware AI

323. Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction.

Paper Link】 【Pages】:2329-2335

【Authors】: Qi Meng ; Wei Chen ; Jingcheng Yu ; Taifeng Wang ; Zhiming Ma ; Tie-Yan Liu

【Abstract】: Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to solve the R-ERM problem. Recently, variance reduction technique was proposed to improve ProxSGD and ProxSCD, and the corresponding ProxSVRG and ProxSVRCD have better convergence rate. These proximal algorithms with variance reduction technique have also achieved great success in applications at small and moderate scales. However, in order to solve large-scale R-ERM problems and make more practical impacts, the parallel versions of these algorithms are sorely needed. In this paper, we propose asynchronous ProxSVRG (Async-ProxSVRG) and asynchronous ProxSVRCD (Async-ProxSVRCD) algorithms, and prove that Async-ProxSVRG can achieve near linear speedup when the training data is sparse, while Async-ProxSVRCD can achieve near linear speedup regardless of the sparse condition, as long as the number of block partitions are appropriately set. We have conducted experiments on a regularized logistic regression task. The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.

【Keywords】: asynchronous, parallel, stochastic gradient descent, variance reduction, proximal
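
The proximal methods named in the abstract above alternate a gradient step on the smooth loss with a proximal step on the regularizer. As a hedged illustration, assuming an L1 regularizer (the abstract does not fix one), the proximal step is coordinate-wise soft-thresholding. The sketch below shows a single serial step only, with no variance reduction or asynchrony; `soft_threshold` and `prox_gradient_step` are hypothetical names.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each coordinate toward zero by tau."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_gradient_step(w, grad, step_size, l1_weight):
    """One proximal gradient step: gradient step on the smooth part, then the L1 prox."""
    shifted = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return soft_threshold(shifted, step_size * l1_weight)


# Illustrative values (made up): coordinates whose magnitude is at most tau become zero.
shrunk = soft_threshold([3.0, -0.5, -2.0], 1.0)
stepped = prox_gradient_step([1.0, -1.0], [0.5, -0.5], 1.0, 0.2)
```

In a stochastic variant, `grad` would be a (possibly variance reduced) stochastic gradient rather than the full gradient; the proximal part is unchanged.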

324. Generalization Error Bounds for Optimization Algorithms via Stability.

Paper Link】 【Pages】:2336-2342

【Authors】: Qi Meng ; Yue Wang ; Wei Chen ; Taifeng Wang ; Zhiming Ma ; Tie-Yan Liu

【Abstract】: Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during the training process; however, people in the machine learning community may care more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue by using stability as a tool. In particular, we decompose the generalization error for R-ERM, and derive its upper bound for both convex and nonconvex cases. In convex cases, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in the order of $\mathcal{O}(1/n + \mathbb{E}\rho(T))$, where $\rho(T)$ is the convergence error and $T$ is the number of iterations) and in high probability (in the order of $\mathcal{O}(\log(1/\delta)/\sqrt{n} + \rho(T))$ with probability $1 - \delta$). For nonconvex cases, we can also obtain a similar expected generalization error bound. Our theorems indicate that 1) along with the training process, the generalization error will decrease for all the optimization algorithms under our investigation; 2) comparatively speaking, SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.

【Keywords】: generalization; stability; optimization; stochastic gradient descent; variance reduction

325. When and Why Are Deep Networks Better Than Shallow Ones?

Paper Link】 【Pages】:2343-2349

【Authors】: Hrushikesh Mhaskar ; Qianli Liao ; Tomaso A. Poggio

【Abstract】: While the universal approximation property holds both for hierarchical and shallow networks, deep networks can approximate the class of compositional functions as well as shallow networks but with an exponentially lower number of training parameters and sample complexity. Compositional functions are obtained as a hierarchy of local constituent functions, where "local functions" are functions with low dimensionality. This theorem proves an old conjecture by Bengio on the role of depth in networks, characterizing precisely the conditions under which it holds. It also suggests possible answers to the puzzle of why high-dimensional deep networks trained on large training sets often do not seem to overfit.

【Keywords】: deep learning; shallow and deep networks; function approximation

326. Lifted Inference for Convex Quadratic Programs.

Paper Link】 【Pages】:2350-2356

【Authors】: Martin Mladenov ; Leonard Kleinhans ; Kristian Kersting

【Abstract】: Symmetry is the essential element of lifted inference that has recently demonstrated the possibility to perform very efficient inference in highly-connected, but symmetric probabilistic models. This raises the question, whether this holds for optimization problems in general. Here we show that for a large class of optimization methods this is actually the case. Specifically, we introduce the concept of fractional symmetries of convex quadratic programs (QPs), which lie at the heart of many AI and machine learning approaches, and exploit it to lift, i.e., to compress QPs. These lifted QPs can then be tackled with the usual optimization toolbox (off-the-shelf solvers, cutting plane algorithms, stochastic gradients etc.). If the original QP exhibits symmetry, then the lifted one will generally be more compact, and hence more efficient to solve.

【Keywords】: lifted inference; machine learning; quadratic program

327. Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions.

Paper Link】 【Pages】:2357-2363

【Authors】: Alejandro Molina ; Sriraam Natarajan ; Kristian Kersting

【Abstract】: Multivariate count data are pervasive in science in the form of histograms, contingency tables and others. Previous work on modeling this type of distribution does not allow for fast and tractable inference. In this paper we present a novel Poisson graphical model, the first based on sum product networks, called PSPN, allowing for positive as well as negative dependencies. We present algorithms for learning tree PSPNs from data as well as for tractable inference via symbolic evaluation. With these, information-theoretic measures such as entropy, mutual information, and distances among count variables can be computed without resorting to approximations. Additionally, we show a connection between PSPNs and LDA, linking the structure of tree PSPNs to a hierarchy of topics. The experimental results on several synthetic and real world datasets demonstrate that PSPNs often outperform the state of the art while remaining tractable.

【Keywords】: Graphical Models; Deep Models; Sum-Product Networks; Poisson Models

328. Deep Collective Inference.

Paper Link】 【Pages】:2364-2372

【Authors】: John Moore ; Jennifer Neville

【Abstract】: Collective inference is widely used to improve classification in network datasets. However, despite recent advances in deep learning and the successes of recurrent neural networks (RNNs), researchers have only just recently begun to study how to apply RNNs to heterogeneous graph and network datasets. There has been recent work on using RNNs for unsupervised learning in networks (e.g., graph clustering, node embedding) and for prediction (e.g., link prediction, graph classification), but there has been little work on using RNNs for node-based relational classification tasks. In this paper, we provide an end-to-end learning framework using RNNs for collective inference. Our main insight is to transform a node and its set of neighbors into an unordered sequence (of varying length) and use an LSTM-based RNN to predict the class label as the output of that sequence. We develop a collective inference method, which we refer to as Deep Collective Inference (DCI), that uses semi-supervised learning in partially-labeled networks and two label distribution correction mechanisms for imbalanced classes. We compare to several alternative methods on seven network datasets. DCI achieves up to a 12% reduction in error compared to the best alternative and a 25% reduction in error on average — over all methods, for all label proportions.

【Keywords】: Relational learning; collective classification; deep learning; neural networks; LSTM

329. Streaming Classification with Emerging New Class by Class Matrix Sketching.

Paper Link】 【Pages】:2373-2379

【Authors】: Xin Mu ; Feida Zhu ; Juan Du ; Ee-Peng Lim ; Zhi-Hua Zhou

【Abstract】: Streaming classification with emerging new classes is an important problem of great research challenge and practical value. In many real applications, the task often needs to handle large matrices, such as textual data in the bag-of-words model and large-scale image analysis. However, the methodologies and approaches adopted by the existing solutions, most of which involve massive distance calculation, have so far fallen short of successfully addressing tasks with real-time requirements. In this paper, the proposed method dynamically maintains two low-dimensional matrix sketches to 1) detect emerging new classes; 2) classify known classes; and 3) update the model in the data stream. The update efficiency is superior to the existing methods. The empirical evaluation shows the proposed method not only achieves comparable performance but also strengthens modelling on large-scale data sets.

【Keywords】:

330. Deep Hashing: A Joint Approach for Image Signature Learning.

Paper Link】 【Pages】:2380-2386

【Authors】: Yadong Mu ; Zhu Liu

【Abstract】: Similarity-based image hashing represents a crucial technique for visual data storage reduction and expedited image search. Conventional hashing schemes typically feed hand-crafted features into hash functions, which separates the procedures of feature extraction and hash function learning. In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning. Our technical contributions in this paper are twofold: 1) deep network optimization is often achieved by gradient propagation, which critically requires a smooth objective function. The discrete nature of hash codes makes them not amenable to gradient-based optimization. To address this issue, we propose an exponentiated hashing loss function and its bilinear smooth approximation. Effective gradient calculation and propagation are thereby enabled; 2) pre-training is an important trick in supervised deep learning. The impact of pre-training on the hash code quality has never been discussed in the current deep hashing literature. We propose a pre-training scheme inspired by recent advances in deep network based image classification, and experimentally demonstrate its effectiveness. Comprehensive quantitative evaluations are conducted. On all adopted benchmarks, our proposed algorithm sets new performance records by significant improvement margins.

【Keywords】: image hashing; deep learning; image search

331. Tsallis Regularized Optimal Transport and Ecological Inference.

Paper Link】 【Pages】:2387-2393

【Authors】: Boris Muzellec ; Richard Nock ; Giorgio Patrini ; Frank Nielsen

【Abstract】: Optimal transport is a powerful framework for computing distances between probability distributions. We unify the two main approaches to optimal transport, namely Monge-Kantorovitch and Sinkhorn-Cuturi, into what we define as Tsallis regularized optimal transport (TROT). TROT interpolates a rich family of distortions from Wasserstein to Kullback-Leibler, encompassing as well Pearson, Neyman and Hellinger divergences, to name a few. We show that metric properties known for Sinkhorn-Cuturi generalize to TROT, and provide efficient algorithms for finding the optimal transportation plan with formal convergence proofs. We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of great interest in the social sciences. TROT provides a convenient framework for ecological inference by allowing one to compute the joint distribution, that is, the optimal transportation plan itself, when side information is available, which is typically what a census represents in political science. Experiments on data from the 2012 US presidential elections display the potential of TROT in delivering a faithful reconstruction of the joint distribution of ethnic groups and voter preferences.

【Keywords】: Ecological Inference; Regularized Optimal Transport
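
Illustrative sketch (not taken from the paper): the abstract above names the Sinkhorn-Cuturi approach as one of the two methods that TROT unifies. The block below shows only that entropy-regularized special case, using the standard Sinkhorn row/column scaling in plain Python. The function name `sinkhorn` and the parameters `reg` and `n_iters` are assumptions chosen for illustration; the Tsallis-regularized algorithm itself is not reproduced here.

```python
import math


def sinkhorn(cost, r, c, reg=0.5, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn scaling.

    cost: n x m list of lists of pairwise costs.
    r, c: source and target marginals (each sums to 1).
    Returns the n x m transport plan P = diag(u) K diag(v).
    """
    n, m = len(r), len(c)
    # Gibbs kernel: a smaller `reg` concentrates mass on cheap cells.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy 2 x 2 problem: the plan is a joint distribution with the given marginals.
plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], r=[0.5, 0.5], c=[0.3, 0.7])
```

In the ecological-inference reading, `r` and `c` play the role of the observed marginals and the returned plan is the reconstructed joint distribution; any side information would enter through `cost`.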

332. The Multivariate Generalised von Mises Distribution: Inference and Applications.

Paper Link】 【Pages】:2394-2400

【Authors】: Alexandre K. W. Navarro ; Jes Frellsen ; Richard E. Turner

【Abstract】: Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First, we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression inspired by Gaussian Processes, and a method for probabilistic Principal Component Analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. kernel functions and automatic relevance determination). Third, we show that the posterior distribution in these models is an mGvM distribution, which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.

【Keywords】: Circular statistics, Bayesian inference, Approximate inference, Kernels, Gaussian Processes

333. Querying Partially Labelled Data to Improve a K-nn Classifier.

Paper Link】 【Pages】:2401-2407

【Authors】: Vu-Linh Nguyen ; Sébastien Destercke ; Marie-Hélène Masson

【Abstract】: When learning from instances whose output labels may be partial, the problem of knowing which of these output labels should be made precise to improve the accuracy of predictions arises. This problem can be seen as the intersection of two tasks: the one of learning from partial labels and the one of active learning, where the goal is to provide the labels of additional instances to improve the model accuracy. In this paper, we propose querying strategies of partial labels for the well-known K-nn classifier. We propose different criteria of increasing complexity, using among other things the amount of ambiguity that partial labels introduce in the K-nn decision rule. We then show that our strategies usually outperform simple baseline schemes, and that more complex strategies provide a faster improvement of the model accuracies.

【Keywords】: Active Learning; Classification; Case-based Reasoning; Uncertainty in AI

334. Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours.

Paper Link】 【Pages】:2408-2414

【Authors】: Feiping Nie ; Guohao Cai ; Xuelong Li

【Abstract】: Due to their efficiency in learning relationships and complex structures hidden in data, graph-oriented methods have been widely investigated and achieve promising performance in multi-view learning. Generally, these learning algorithms construct an informative graph for each view or fuse different views into one graph, on which the subsequent procedures are based. However, in many real-world datasets, the original data contain noise and outlying entries that result in unreliable and inaccurate graphs, which cannot be ameliorated by the previous methods. In this paper, we propose a novel multi-view learning model which performs clustering/semi-supervised classification and local structure learning simultaneously. The obtained optimal graph can be partitioned into specific clusters directly. Moreover, our model can allocate an ideal weight to each view automatically without additional weight and penalty parameters. An efficient algorithm is proposed to optimize this model. Extensive experimental results on different real-world datasets show that the proposed model outperforms other state-of-the-art multi-view algorithms.

【Keywords】: Multi-view data fusion; Optimal graph; Clustering; Semi-supervised Classification

335. Multiclass Capped ℓp-Norm SVM for Robust Classifications.

Paper Link】 【Pages】:2415-2421

【Authors】: Feiping Nie ; Xiaoqian Wang ; Heng Huang

【Abstract】: The support vector machine (SVM) model is one of the most successful machine learning methods and has been successfully applied to solve numerous real-world applications. Because the SVM methods use the hinge loss or squared hinge loss functions for classification, they usually outperform other classification approaches, e.g. the least square loss function based methods. However, like most supervised learning algorithms, they learn classifiers based on the labeled data in the training set without a specific strategy to deal with noisy data. In many real-world applications, we often have data outliers in the training set, which could misguide the classifier learning, such that the classification performance is suboptimal. To address this problem, we propose a novel capped ℓp-norm SVM classification model by utilizing the capped ℓp-norm based hinge loss in the objective, which can deal with both light and heavy outliers. We utilize the new formulation to naturally build the multiclass capped ℓp-norm SVM. More importantly, we derive novel optimization algorithms to efficiently minimize the capped ℓp-norm based objectives, and also rigorously prove the convergence of the proposed algorithms. We present experimental results showing that employing the new capped ℓp-norm SVM method can consistently improve the classification performance, especially when the data noise level increases.

【Keywords】: Capped ℓp-Norm SVM; Robust Classification

336. Unsupervised Large Graph Embedding.

Paper Link】 【Pages】:2422-2428

【Authors】: Feiping Nie ; Wei Zhu ; Xuelong Li

【Abstract】: There are many successful spectral based unsupervised dimensionality reduction methods, including Laplacian Eigenmap (LE), Locality Preserving Projection (LPP), Spectral Regression (SR), etc. LPP and SR are two different linear spectral based methods. However, we discover that LPP and SR are equivalent if the symmetric similarity matrix is doubly stochastic, Positive Semi-Definite (PSD) and of rank p, where p is the reduced dimension. This discovery prompts us to seek a low-rank and doubly stochastic similarity matrix. We then propose an unsupervised linear dimensionality reduction method, called Unsupervised Large Graph Embedding (ULGE). ULGE starts with a similar idea to LPP: it adopts an efficient approach to construct the similarity matrix and then performs spectral analysis efficiently. The computational complexity can be reduced to O(ndm), which is a significant improvement over conventional spectral based methods, which need at least O(n^2d), where n, d and m are the number of samples, dimensions and anchors, respectively. Extensive experiments on several publicly available data sets demonstrate the efficiency and effectiveness of the proposed method.

【Keywords】: Spectral based dimensionality reduction; Equivalence between LPP and SR; Large graph embedding; Anchor-based graph

337. Matching Node Embeddings for Graph Similarity.

Paper Link】 【Pages】:2429-2435

【Authors】: Giannis Nikolentzos ; Polykarpos Meladianos ; Michalis Vazirgiannis

【Abstract】: Graph kernels have emerged as a powerful tool for graph comparison. Most existing graph kernels focus on local properties of graphs and ignore global structure. In this paper, we compare graphs based on their global properties as these are captured by the eigenvectors of their adjacency matrices. We present two algorithms for both labeled and unlabeled graph comparison. These algorithms represent each graph as a set of vectors corresponding to the embeddings of its vertices. The similarity between two graphs is then determined using the Earth Mover's Distance metric. These similarities do not yield a positive semidefinite matrix. To address this, we employ an algorithm for SVM classification using indefinite kernels. We also present a graph kernel based on the Pyramid Match kernel that finds an approximate correspondence between the sets of vectors of the two graphs. We further improve the proposed kernel using the Weisfeiler-Lehman framework. We evaluate the proposed methods on several benchmark datasets for graph classification and compare their performance to state-of-the-art graph kernels. In most cases, the proposed algorithms outperform the competing methods, while their time complexity remains very attractive.

【Keywords】: graph kernels; graph classification; node matching

338. Inductive Pairwise Ranking: Going Beyond the n log(n) Barrier.

Paper Link】 【Pages】:2436-2442

【Authors】: U. N. Niranjan ; Arun Rajkumar

【Abstract】: We study the problem of ranking a set of items from nonactively chosen pairwise preferences where each item has feature information associated with it. We propose and characterize a very broad class of preference matrices giving rise to the Feature Low Rank (FLR) model, which subsumes several models ranging from the classic Bradley–Terry–Luce (BTL) (Bradley and Terry 1952) and Thurstone (Thurstone 1927) models to the recently proposed blade-chest (Chen and Joachims 2016) and generic low-rank preference (Rajkumar and Agarwal 2016) models. We use the technique of matrix completion in the presence of side information to develop the Inductive Pairwise Ranking (IPR) algorithm that provably learns a good ranking under the FLR model, in a sample-efficient manner. In practice, through systematic synthetic simulations, we confirm our theoretical findings regarding improvements in the sample complexity due to the use of feature information. Moreover, on popular real-world preference learning datasets, with as little as 10% sampling of the pairwise comparisons, our method recovers a good ranking.

【Keywords】: Ranking; preference learning; inductive learning; matrix completion; features; sample complexity

339. Active Search in Intensionally Specified Structured Spaces.

Paper Link】 【Pages】:2443-2449

【Authors】: Dino Oglic ; Roman Garnett ; Thomas Gärtner

【Abstract】: We consider an active search problem in intensionally specified structured spaces. The ultimate goal in this setting is to discover structures from structurally different partitions of a fixed but unknown target class. An example of such a process is that of computer-aided de novo drug design. In the past 20 years several Monte Carlo search heuristics have been developed for this process. Motivated by these hand-crafted search heuristics, we devise a Metropolis-Hastings sampling scheme where the acceptance probability is given by a probabilistic surrogate of the target property, modeled with a max entropy conditional model. The surrogate model is updated in each iteration upon the evaluation of a selected structure. The proposed approach is consistent and the empirical evidence indicates that it achieves a large structural variety of discovered targets.

【Keywords】:

340. Top-k Hierarchical Classification.

Paper Link】 【Pages】:2450-2456

【Authors】: Sechan Oh

【Abstract】: This paper studies a top-k hierarchical classification problem. In top-k classification, one is allowed to make k predictions and no penalty is incurred if at least one of the k predictions is correct. In hierarchical classification, classes form a structured hierarchy, and misclassification costs depend on the relation between the correct class and the incorrect class in the hierarchy. Despite the fact that both top-k classification and hierarchical classification have gained increasing interest, the two problems have always been studied separately. In this paper, we define a top-k hierarchical loss function using a real world application. We provide the Bayes-optimal solution that minimizes the expected top-k hierarchical misclassification cost. Via numerical experiments, we show that our solution outperforms two baseline methods that address only one of the two issues.

【Keywords】: Hierarchical classification; Top-k; Precision@k; Bayes-optimal

341. Unimodal Thompson Sampling for Graph-Structured Arms.

Paper Link】 【Pages】:2457-2463

【Authors】: Stefano Paladino ; Francesco Trovò ; Marcello Restelli ; Nicola Gatti

【Abstract】: We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph and each edge provides a relationship, unknown to the learner, between two nodes in terms of expected reward. Furthermore, for any node of the graph there is a path leading to the unique node providing the maximum expected reward, along which the expected reward is monotonically increasing. Previous results on this setting describe the behavior of frequentist MAB algorithms. In our paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that, as happens in a wide range of scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our algorithm and state-of-the-art algorithms as the properties of the graph vary.

【Keywords】: multi-armed bandit; unimodal MAB
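
Illustrative sketch (not taken from the paper): the abstract above builds on Thompson Sampling. The block below shows only the plain Beta-Bernoulli version of that idea, without the graph structure or the unimodality assumption that the paper exploits. The function name `thompson_sampling`, the uniform Beta(1, 1) priors and the Bernoulli reward model are assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Plain Beta-Bernoulli Thompson Sampling; returns pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k    # observed rewards of 1 per arm
    losses = [0] * k  # observed rewards of 0 per arm
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior (uniform prior).
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
```

A unimodal variant would additionally restrict which arms are sampled in each round using the graph neighbourhood; that restriction is deliberately omitted here.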

342. Accelerated Gradient Temporal Difference Learning.

Paper Link】 【Pages】:2464-2470

【Authors】: Yangchen Pan ; Adam M. White ; Martha White

【Abstract】: The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least squares methods. Least squares methods make the best use of available data by directly computing the TD solution, and thus do not require tuning a typically highly sensitive learning rate parameter, but they require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods, at a fraction of the computation and storage, (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain.

【Keywords】: Reinforcement learning; temporal difference learning; least squares; prediction

343. A General Framework for Sparsity Regularized Feature Selection via Iteratively Reweighted Least Square Minimization.

Paper Link】 【Pages】:2471-2477

【Authors】: Hanyang Peng ; Yong Fan

【Abstract】: A variety of feature selection methods based on sparsity regularization have been developed with different loss functions and sparse regularization functions. Capitalizing on the existing sparsity regularized feature selection methods, we propose a general sparsity feature selection (GSR-FS) algorithm that optimizes an ℓ2,r-norm (0 < r ≤ 2) based loss function with an ℓ2,p-norm (0 < p ≤ 2) sparse regularization. The ℓ2,r-norm (0 < r ≤ 2) based loss function brings flexibility to balance data-fitting and robustness to outliers by tuning its parameter, and the ℓ2,p-norm (0 < p ≤ 1) based regularization function is able to boost the sparsity for feature selection. To solve the optimization problem, which involves multiple non-smooth and non-convex functions for some values of r and p, we develop an efficient solver under the general umbrella of Iterative Reweighted Least Square (IRLS) algorithms. Our algorithm has been proved to converge with a theoretical convergence order of at least min(2 − r, 2 − p). The experimental results have demonstrated that our method could achieve competitive feature selection performance on publicly available datasets compared with state-of-the-art feature selection methods, with reduced computational cost.

【Keywords】: feature selection; general framework; sparsity regularization

344. Cascade Subspace Clustering.

Paper Link】 【Pages】:2478-2484

【Authors】: Xi Peng ; Jiashi Feng ; Jiwen Lu ; Wei-Yun Yau ; Zhang Yi

【Abstract】: In this paper, we recast subspace clustering as a verification problem. Our idea comes from an assumption that the distribution between a given sample x and cluster centers Omega is invariant to different distance metrics on the manifold, where each distribution is defined as a probability map (i.e. soft-assignment) between x and Omega. To verify this so-called invariance of distribution, we propose a deep learning based subspace clustering method which simultaneously learns a compact representation using a neural network and a clustering assignment by minimizing the discrepancy between pair-wise sample-centers distributions. To the best of our knowledge, this is the first work to reformulate clustering as a verification problem. Moreover, the proposed method is also one of the first cascade clustering models which jointly learn representation and clustering in an end-to-end manner. Extensive experimental results show the effectiveness of our algorithm compared with 11 state-of-the-art clustering approaches on four data sets with respect to four evaluation metrics.

【Keywords】: Clustering; Deep Learning; Subspace Clustering; Neural Network

345. Column Networks for Collective Classification.

Paper Link】 【Pages】:2485-2491

【Authors】: Trang Pham ; Truyen Tran ; Dinh Q. Phung ; Svetha Venkatesh

【Abstract】: Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient with linear complexity in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all of these applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.

【Keywords】:

346. A General Clustering Agreement Index: For Comparing Disjoint and Overlapping Clusters.

Paper Link】 【Pages】:2492-2498

【Authors】: Reihaneh Rabbany ; Osmar R. Zaïane

【Abstract】: A clustering agreement index quantifies the similarity between two given clusterings. It is most commonly used to compare the results obtained from different clustering algorithms against the ground-truth clustering in benchmark datasets. In this paper, we present a general Clustering Agreement Index (CAI) for comparing disjoint and overlapping clusterings. CAI is generic and introduces a family of clustering agreement indexes. In particular, the two widely used indexes, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), are special cases of the CAI. Our index, therefore, provides overlapping extensions for both of these commonly used indexes, whereas their original formulations are only defined for disjoint cases. Lastly, unlike previous indexes, CAI is flexible and can be adapted to incorporate the structure of the data, which is important when comparing clusters in networks, a.k.a. communities.

【Keywords】: Clustering Agreement; Overlapping Clusters; Cluster Evaluation; Cluster Validation; Community Detection
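
Illustrative sketch (not taken from the paper): the abstract above names the Adjusted Rand Index as a special case of CAI. The block below computes only the classical ARI for disjoint clusterings from a pair-count contingency table. The function name `adjusted_rand_index` and the convention of returning 1.0 in degenerate cases are assumptions; the overlapping extension proposed in the paper is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Classical ARI between two disjoint clusterings of the same items."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    if n < 2:
        return 1.0  # no pairs to disagree on (assumed convention)
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # chance-level agreement
    max_index = 0.5 * (together_a + together_b)
    if max_index == expected:
        return 1.0  # both clusterings are trivial (assumed convention)
    return (together_both - expected) / (max_index - expected)


# Same partition under different label names: perfect agreement.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

The index depends only on which items are grouped together, so renaming the clusters leaves the score unchanged.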

347. Non-Negative Inductive Matrix Completion for Discrete Dyadic Data.

Paper Link】 【Pages】:2499-2505

【Authors】: Piyush Rai

【Abstract】: We present a non-negative inductive latent factor model for binary- and count-valued matrices containing dyadic data, with side information along the rows and/or the columns of the matrix. The side information is incorporated by conditioning the row and column latent factors on the available side information via a regression model. Our model can not only perform matrix factorization and completion with side-information, but also infers interpretable latent topics that explain/summarize the data. An appealing aspect of our model is in the full local conjugacy of all parts of the model, including the main latent factor model, as well as for the regression model that leverages the side information. This enables us to design scalable and simple to implement Gibbs sampling and Expectation Maximization algorithms for doing inference in the model. Inference cost in our model scales in the number of nonzeros in the data matrix, which makes it particularly attractive for massive, sparse matrices. We demonstrate the effectiveness of our model on several real-world data sets, comparing it with state-of-the-art baselines.

【Keywords】: matrix factorization; non-negative matrix factorization; matrix completion; recommender systems; Bayesian models; multi-label learning; inductive matrix completion

348. Online Active Linear Regression via Thresholding.

Paper Link】 【Pages】:2506-2512

【Authors】: Carlos Riquelme ; Ramesh Johari ; Baosen Zhang

【Abstract】: We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selection of most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling in real-world datasets that exhibit high nonlinearity and high dimensionality — significantly reducing both the mean and variance of the squared error.

【Keywords】: Active Learning, Linear Regression, Reinforcement Learning, Machine Learning, Online Algorithms

349. Adaptive Proximal Average Approximation for Composite Convex Minimization.

Paper Link】 【Pages】:2513-2519

【Authors】: Li Shen ; Wei Liu ; Junzhou Huang ; Yu-Gang Jiang ; Shiqian Ma

【Abstract】: We propose a fast first-order method to solve multi-term nonsmooth composite convex minimization problems by employing a recent proximal average approximation technique and a novel adaptive parameter tuning technique. Thanks to this powerful parameter tuning technique, the proximal gradient step can be performed with a much larger stepsize in the algorithm implementation compared with the prior PA-APG method, which is the key to enabling significant improvements in practical performance. Moreover, by choosing the approximation parameter adaptively, the proposed method is shown to enjoy O(1/k) iteration complexity theoretically without any extra computational cost, while the PA-APG method requires many more iterations for convergence. The preliminary experimental results on overlapping group Lasso and graph-guided fused Lasso problems confirm our theoretical claim well, and indicate that the proposed method is almost five times faster than the state-of-the-art PA-APG method and therefore suitable for optimization problems that require higher precision.

【Keywords】:

350. Random Features for Shift-Invariant Kernels with Moment Matching.

Paper Link】 【Pages】:2520-2526

【Authors】: Weiwei Shen ; Zhihui Yang ; Jun Wang

【Abstract】: In order to grapple with the scalability of kernel-based learning algorithms, the method of approximating nonlinear kernels via random feature maps has attracted wide attention in large-scale learning systems. Specifically, the associated sampling procedure is one critical component that dictates the quality of random feature maps. However, for high-dimensional features, the standard Monte Carlo sampling method has been shown to be less effective in producing low-variance random samples. As a consequence, it demands constructing a large number of features to attain the desired accuracy for downstream use. In this paper, we present a novel sampling algorithm powered by moment matching techniques to reduce the variance of random features. Our extensive empirical studies and comparisons with several highly competitive peer methods verify the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression. Our rigorous theoretical proofs show that the proposed algorithm is guaranteed to achieve lower variance than the standard Monte Carlo method in high dimensional settings.

【Keywords】: Variance Reduction; Moment Matching; Random Feature Maps
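
Illustrative sketch (not taken from the paper): the abstract above contrasts its sampler with the standard Monte Carlo method for random feature maps. The block below shows only that standard baseline for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), in plain Python. The function names, the `sigma` parameter and the fixed seed are assumptions for illustration; the moment-matching sampler itself is not reproduced.

```python
import math
import random


def sample_gaussian_kernel_params(dim, n_features, sigma=1.0, seed=0):
    """Plain Monte Carlo draw of frequencies and phases for the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's spectral density, N(0, I / sigma^2).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return omegas, offsets


def random_fourier_features(x, omegas, offsets):
    """Map x to features z(x) whose inner products approximate the kernel."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, offsets)]


omegas, offsets = sample_gaussian_kernel_params(dim=2, n_features=5000)
x, y = [0.3, -0.1], [0.5, 0.4]
zx = random_fourier_features(x, omegas, offsets)
zy = random_fourier_features(y, omegas, offsets)
approx = sum(a * b for a, b in zip(zx, zy))  # estimate of k(x, y)
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
```

The gap between `approx` and `exact` shrinks as `n_features` grows; lowering the variance of that estimate for a fixed feature budget is the problem the abstract addresses.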

351. Compressed K-Means for Large-Scale Clustering.

Paper Link】 【Pages】:2527-2533

【Authors】: Xiao-Bo Shen ; Weiwei Liu ; Ivor W. Tsang ; Fumin Shen ; Quan-Sen Sun

【Abstract】: Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.

【Keywords】: Large-scale clustering; k-means; binary code
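
Illustrative sketch (not taken from the paper): the abstract above relies on the Hamming metric between binary codes being cheap to compute. The block below shows that one step in plain Python, with each code packed into an integer so that the distance is an XOR followed by a bit count. The function names `hamming` and `nearest_center` are assumptions; how CKM learns the codes and the binary cluster centers is not shown.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes packed as integers."""
    return bin(a ^ b).count("1")


def nearest_center(code: int, centers: list) -> int:
    """Index of the binary center closest to `code` in Hamming distance."""
    return min(range(len(centers)), key=lambda k: hamming(code, centers[k]))


# Assignment step of a k-means-style loop over 4-bit codes.
centers = [0b0000, 0b1111]
codes = [0b0001, 0b1110, 0b0111]
assignments = [nearest_center(c, centers) for c in codes]
```

Packing a code into one machine word is also what gives the storage saving the abstract mentions, since a 64-bit code takes the space of a single double-precision float.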

352. Patch Reordering: A Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks.

Paper Link】 【Pages】:2534-2540

【Authors】: Xu Shen ; Xinmei Tian ; Shaoyan Sun ; Dacheng Tao

【Abstract】: Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on many visual recognition tasks. However, the combination of convolution and pooling operations only shows invariance to small local location changes of meaningful objects in the input. Sometimes, such networks are trained using data augmentation to encode this invariance into the parameters, which restricts the capacity of the model to learn the content of these objects. A more efficient use of the parameter budget is to encode rotation or translation invariance into the model architecture, which relieves the model from the need to learn them. To enable the model to focus on learning the content of objects rather than their locations, we propose to conduct patch ranking of the feature maps before feeding them into the next layer. When patch ranking is combined with convolution and pooling operations, we obtain consistent representations regardless of the location of meaningful objects in the input. We show that the patch ranking module improves the performance of the CNN on many benchmark tasks, including MNIST digit recognition, large-scale image recognition, and image retrieval.

【Keywords】:

353. Asymmetric Discrete Graph Hashing.

Paper Link】 【Pages】:2541-2547

【Authors】: Xiaoshuang Shi ; Fuyong Xing ; Kaidi Xu ; Manish Sapkota ; Lin Yang

【Abstract】: Recently, many graph based hashing methods have emerged to tackle large-scale problems. However, there exist two major bottlenecks: (1) directly learning discrete hashing codes is an NP-hard optimization problem; (2) the complexity of both storage and computational time to build a graph with n data points is O(n^2). To address these two problems, in this paper, we propose a novel yet simple supervised graph based hashing method, asymmetric discrete graph hashing, by preserving the asymmetric discrete constraint and building an asymmetric affinity matrix to learn compact binary codes. Specifically, we utilize two different instead of identical discrete matrices to better preserve the similarity of the graph with short binary codes. We generate the asymmetric affinity matrix using m (m << n) selected anchors to approximate the similarity among all training data so that computational time and storage requirements can be significantly reduced. In addition, the proposed method jointly learns discrete binary codes and a low-dimensional projection matrix to further improve the retrieval accuracy. Extensive experiments on three benchmark large-scale databases demonstrate its superior performance over recent state-of-the-art methods with lower training time costs.

【Keywords】: Discrete Hashing, Graph

354. Spectral Clustering with Brainstorming Process for Multi-View Data.

Paper Link】 【Pages】:2548-2554

【Authors】: Jeong Woo Son ; Junkey Jeon ; Alex Lee ; Sun-Joong Kim

【Abstract】: Clustering tasks often require multiple views rather than a single view to correctly reflect diverse characteristics of the cluster boundaries. The cluster boundaries estimated using a single view are incorrect in general, and those incorrect estimations should be compensated for with the help of other views. If each view is independent of the other views, incorrect estimations will mostly be revised as the number of views grows. However, as the number of views grows, it is almost impossible to avoid dependencies among views, and such dependencies often mislead correct estimations. Thus, dependencies among views should be carefully considered in multi-view clustering. This paper proposes a new spectral clustering method to deal with multi-view data and dependencies among views. The proposed method is motivated by the brainstorming process. In the brainstorming process, an instance is regarded as an agenda to be discussed, while each view is considered as a brainstormer. Through the discussion step in the brainstorming process, a brainstormer iteratively suggests their opinions and accepts others’ different opinions. To compensate for the biases caused by information sharing between brainstormers with dependent opinions, those having independent opinions are more encouraged to discuss together than those with dependent opinions. The conclusion step makes a compromise by merging or concatenating all opinions. The clustering is finally done after the conclusion. Experimental results in three tasks show the effectiveness of the proposed method compared with ordinary single-view and multi-view spectral clustering.

【Keywords】: spectral clustering; multi-view data

355. Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning.

Paper Link】 【Pages】:2555-2561

【Authors】: Kun Song ; Feiping Nie ; Junwei Han ; Xuelong Li

【Abstract】: We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN) which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the cost term which shrinks the distances between inquiry input and its k target neighbors (the k nearest neighbors with same labels as inquiry input) in LMNN, and only focuses on improving the action to push the imposters (the samples with different labels from the inquiry input) apart out of the neighborhood of inquiry. As a result, our method does not have the parameter needed to tune on the validating set, which makes it more convenient to use. Second, by leveraging the geometry information of the imposters, we construct a novel cost function to penalize the small distances between each inquiry and its imposters. Different from LMNN considering every imposter located in the neighborhood of each inquiry, our method only takes care of the nearest imposters. Because when the nearest imposter is pushed out of the neighborhood of its inquiry, other imposters would be all out. In this way, the constraints in our model are much less than that of LMNN, which makes our method much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster than LMNN. Extensive experiments on different data sets with various sizes and difficulties are conducted, and the results have shown that, compared with LMNN, PFLMNN achieves better classification results.

【Keywords】: large margin; kNN; distance metric learning

356. Multilinear Regression for Embedded Feature Selection with Application to fMRI Analysis.

Paper Link】 【Pages】:2562-2568

【Authors】: Xiaonan Song ; Haiping Lu

【Abstract】: Embedded feature selection is effective when both prediction and interpretation are needed. The Lasso and its extensions are standard methods for selecting a subset of features while optimizing a prediction function. In this paper, we are interested in embedded feature selection for multidimensional data, wherein (1) there is no need to reshape the multidimensional data into vectors and (2) structural information from multiple dimensions is taken into account. Our main contribution is a new method called Regularized multilinear regression and selection (Remurs) for automatically selecting a subset of features while optimizing prediction for multidimensional data. Both nuclear norm and the ℓ1-norm are carefully incorporated to derive a multi-block optimization algorithm with proved convergence. In particular, Remurs is motivated by fMRI analysis where the data are multidimensional and it is important to find the connections of raw brain voxels with functional activities. Experiments on synthetic and real data show the advantages of Remurs compared to Lasso, Elastic Net, and their multilinear extensions.

【Keywords】: Feature Selection; Tensor; Multilinear Regression; Regularization; Low Rank

357. Distributed Negative Sampling for Word Embeddings.

Paper Link】 【Pages】:2569-2575

【Authors】: Stergios Stergiou ; Zygimantas Straznickas ; Rolina Wu ; Kostas Tsioutsiouliklis

【Abstract】: Word2Vec recently popularized dense vector word representations as fixed-length features for machine learning algorithms and is in widespread use today. In this paper we investigate one of its core components, Negative Sampling, and propose efficient distributed algorithms that allow us to scale to vocabulary sizes of more than 1 billion unique words and corpus sizes of more than 1 trillion words.

【Keywords】: negative sampling, word embeddings, word2vec

358. Label-Free Supervision of Neural Networks with Physics and Domain Knowledge.

Paper Link】 【Pages】:2576-2582

【Authors】: Russell Stewart ; Stefano Ermon

【Abstract】: In many machine learning applications, labeled data is scarce and obtaining more labels is expensive. We introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than direct examples of input-output pairs. These constraints are derived from prior domain knowledge, e.g., from known laws of physics. We demonstrate the effectiveness of this approach on real world and simulated computer vision tasks. We are able to train a convolutional neural network to detect and track objects without any labeled examples. Our approach can significantly reduce the need for labeled training data, but introduces new challenges for encoding prior knowledge into appropriate loss functions.

【Keywords】: computer vision, unsupervised learning, physics

359. Unsupervised Learning with Truncated Gaussian Graphical Models.

Paper Link】 【Pages】:2583-2589

【Authors】: Qinliang Su ; Xuejun Liao ; Chunyuan Li ; Zhe Gan ; Lawrence Carin

【Abstract】: Gaussian graphical models (GGMs) are widely used for statistical modeling, because of ease of inference and the ubiquitous use of the normal distribution in practical approximations. However, they are also known for their limited modeling abilities, due to the Gaussian assumption. In this paper, we introduce a novel variant of GGMs, which relaxes the Gaussian restriction and yet admits efficient inference. Specifically, we impose a bipartite structure on the GGM and govern the hidden variables by truncated normal distributions. The nonlinearity of the model is revealed by its connection to rectified linear unit (ReLU) neural networks. Meanwhile, thanks to the bipartite structure and appealing properties of truncated normals, we are able to train the models efficiently using contrastive divergence. We consider three output constructs, accounting for real-valued, binary and count data. We further extend the model to deep constructions and show that deep models can be used for unsupervised pre-training of rectifier neural networks. Extensive experimental results are provided to validate the proposed models and demonstrate their superiority over competing models.

【Keywords】: Graphical Models, Latent variable model, Restricted Boltzmann Machine

360. Automatic Curriculum Graph Generation for Reinforcement Learning Agents.

Paper Link】 【Pages】:2590-2596

【Authors】: Maxwell Svetlik ; Matteo Leonetti ; Jivko Sinapov ; Rishi Shah ; Nick Walker ; Peter Stone

【Abstract】: In recent years, research has shown that transfer learning methods can be leveraged to construct curricula that sequence a series of simpler tasks such that performance on a final target task is improved. A major limitation of existing approaches is that such curricula are handcrafted by humans that are typically domain experts. To address this limitation, we introduce a method to generate a curriculum based on task descriptors and a novel metric of transfer potential. Our method automatically generates a curriculum as a directed acyclic graph (as opposed to a linear sequence as done in existing work). Experiments in both discrete and continuous domains show that our method produces curricula that improve the agent's learning performance when compared to the baseline condition of learning on the target task from scratch.

【Keywords】: curriculum learning; reinforcement learning; transfer learning; machine learning

361. Self-Correcting Models for Model-Based Reinforcement Learning.

Paper Link】 【Pages】:2597-2603

【Authors】: Erik Talvitie

【Abstract】: When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.

【Keywords】: Model-based reinforcement learning; Markov decision processes; Monte Carlo planning

362. Distant Domain Transfer Learning.

Paper Link】 【Pages】:2604-2610

【Authors】: Ben Tan ; Yu Zhang ; Sinno Jialin Pan ; Qiang Yang

【Abstract】: In this paper, we study a novel transfer learning problem termed Distant Domain Transfer Learning (DDTL). Different from existing transfer learning problems which assume that there is a close relation between the source domain and the target domain, in the DDTL problem, the target domain can be totally different from the source domain. For example, the source domain classifies face images but the target domain distinguishes plane images. Inspired by the cognitive process of human where two seemingly unrelated concepts can be connected by learning intermediate concepts gradually, we propose a Selective Learning Algorithm (SLA) to solve the DDTL problem with supervised autoencoder or supervised convolutional autoencoder as a base model for handling different types of inputs. Intuitively, the SLA algorithm selects usefully unlabeled data gradually from intermediate domains as a bridge to break the large distribution gap for transferring knowledge between two distant domains. Empirical studies on image classification problems demonstrate the effectiveness of the proposed algorithm, and on some tasks the improvement in terms of the classification accuracy is up to 17% over “non-transfer” methods.

【Keywords】: transfer learning; distant domain; deep learning

363. Confidence-Rated Discriminative Partial Label Learning.

Paper Link】 【Pages】:2611-2617

【Authors】: Cai-Zhi Tang ; Min-Ling Zhang

【Abstract】: Partial label learning aims to induce a multi-class classifier from training examples where each of them is associated with a set of candidate labels, among which only one label is valid. The common discriminative solution to learn from partial label examples assumes one parametric model for each class label, whose predictions are aggregated to optimize specific objectives such as likelihood or margin over the training examples. Nonetheless, existing discriminative approaches treat the predictions from all parametric models in an equal manner, where the confidence of each candidate label being the ground-truth label is not differentiated. In this paper, a boosting-style partial label learning approach is proposed to enable confidence-rated discriminative modeling. Specifically, the ground-truth confidence of each candidate label is maintained in each boosting round and utilized to train the base classifier. Extensive experiments on artificial as well as real-world partial label data sets validate the effectiveness of the confidence-rated discriminative modeling.

【Keywords】: partial label learning; discriminative modeling; candidate label

364. Cross-Domain Ranking via Latent Space Learning.

Paper Link】 【Pages】:2618-2624

【Authors】: Jie Tang ; Wendy Hall

【Abstract】: We study the problem of cross-domain ranking, which addresses learning to rank objects from multiple interrelated domains. In many applications, we may have multiple interrelated domains, some of them with a large amount of training data and others with very little. We often wish to utilize the training data from all these related domains to help improve ranking performance. In this paper, we present a unified model: BayCDR for cross-domain ranking. BayCDR uses a latent space to measure the correlation between different domains, and learns the ranking functions from the interrelated domains via the latent space by a Bayesian model, where each ranking function is based on a weighted average model. An efficient learning algorithm based on variational inference and a generalization bound has been developed. To scale up to handle real large data, we also present a learning algorithm under the Map-Reduce programming model. Finally, we demonstrate the effectiveness and efficiency of BayCDR on large datasets.

【Keywords】: cross-domain ranking; heterogeneous ranking; machine learning

365. How to Train a Compact Binary Neural Network with High Accuracy?

Paper Link】 【Pages】:2625-2631

【Authors】: Wei Tang ; Gang Hua ; Liang Wang

【Abstract】: How to train a binary neural network (BinaryNet) with both high compression rate and high accuracy on large scale dataset? We answer this question through a careful analysis of previous work on BinaryNets, in terms of training strategies, regularization, and activation approximation. Our findings first reveal that a low learning rate is highly preferred to avoid frequent sign changes of the weights, which often makes the learning of BinaryNets unstable. Secondly, we propose to use PReLU instead of ReLU in a BinaryNet to conveniently absorb the scale factor for weights to the activation function, which enjoys high computation efficiency for binarized layers while maintains high approximation accuracy. Thirdly, we reveal that instead of imposing L2 regularization, driving all weights to zero which contradicts with the setting of BinaryNets, we introduce a regularization term that encourages the weights to be bipolar. Fourthly, we discover that the failure of binarizing the last layer, which is essential for high compression rate, is due to the improper output range. We propose to use a scale layer to bring it to normal. Last but not least, we propose multiple binarizations to improve the approximation of the activations. The composition of all these enables us to train BinaryNets with both high compression rate and high accuracy, which is strongly supported by our extensive empirical study.

【Keywords】: binary neural network

Paper Link】 【Pages】:2632-2638

【Authors】: Voot Tangkaratt ; Herke van Hoof ; Simone Parisi ; Gerhard Neumann ; Jan Peters ; Masashi Sugiyama

【Abstract】: Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.

【Keywords】: Contextual policy search, Robot learning

367. Coactive Critiquing: Elicitation of Preferences and Features.

Paper Link】 【Pages】:2639-2645

【Authors】: Stefano Teso ; Paolo Dragone ; Andrea Passerini

【Abstract】: When faced with complex choices, users refine their own preference criteria as they explore the catalogue of options. In this paper we propose an approach to preference elicitation suited for this scenario. We extend Coactive Learning, which iteratively collects manipulative feedback, to optionally query example critiques. User critiques are integrated into the learning model by dynamically extending the feature space. Our formulation natively supports constructive learning tasks, where the option catalogue is generated on-the-fly. We present an upper bound on the average regret suffered by the learner. Our empirical analysis highlights the promise of our approach.

【Keywords】: ML: Preferences/Ranking Learning; ML: Online Learning; ML: Recommender Systems

368. Importance Sampling with Unequal Support.

Paper Link】 【Pages】:2646-2652

【Authors】: Philip S. Thomas ; Emma Brunskill

【Abstract】: Importance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based estimates by orders of magnitude when the supports of the training and testing distributions differ. After motivating and presenting our new importance sampling estimator, we provide a detailed theoretical analysis that characterizes both its bias and variance relative to the ordinary importance sampling estimator (in various settings, which include cases where ordinary importance sampling is biased, while our new estimator is not, and vice versa). We conclude with an example of how our new importance sampling estimator can be used to improve estimates of how well a new treatment policy for diabetes will work for an individual, using only data from when the individual used a previous treatment policy.

【Keywords】: policy evaluation; importance sampling; support

369. Achieving Privacy in the Adversarial Multi-Armed Bandit.

Paper Link】 【Pages】:2653-2659

【Authors】: Aristide Charles Yedia Tossou ; Christos Dimitrakakis

【Abstract】: In this paper, we improve the previously best known regret bound to achieve ε-differential privacy in oblivious adversarial bandits from O(T^{2/3}/ε) to O(√T ln T/ε). This is achieved by combining a Laplace Mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in T. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach O(√(ln T))-DP, with a regret of O(T^{2/3}) that holds against an adaptive adversary, an improvement from the best known of O(T^{3/4}). This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.

【Keywords】: Differential privacy; Multi-armed bandit; EXP3; Adversarial bandits; Online learning

370. Thompson Sampling for Stochastic Bandits with Graph Feedback.

Paper Link】 【Pages】:2660-2666

【Authors】: Aristide C. Y. Tossou ; Christos Dimitrakakis ; Devdatt P. Dubhashi

【Abstract】: We present a simple set of algorithms based on Thompson Sampling for stochastic bandit problems with graph feedback. Thompson Sampling is generally applicable, without the need to construct complicated upper confidence bounds. As we show in this paper, it has excellent performance in problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, as well as extensive experimental results on real and simulated networks. More specifically, we tested our algorithms on power law, planted partitions and Erdős–Rényi graphs, as well as on graphs derived from Facebook and Flixster data and show that they clearly outperform related methods that employ upper confidence bounds.

【Keywords】: Thompson sampling; Stochastic multi-armed bandit; graphical learning; online learning

371. Selecting Sequences of Items via Submodular Maximization.

Paper Link】 【Pages】:2667-2673

【Authors】: Sebastian Tschiatschek ; Adish Singla ; Andreas Krause

【Abstract】: Motivated by many real world applications such as recommendations in online shopping or entertainment, we consider the problem of selecting sequences of items. In this paper we introduce a novel class of utility functions over sequences of items, strictly generalizing the commonly used class of submodular set functions. We encode the sequential dependencies between items by a directed graph underlying the utility function. Classical algorithms fail to achieve any constant factor approximation guarantees on the problem of selecting sequences of bounded length with maximum utility. We propose an efficient algorithm for this problem that comes with strong theoretical guarantees characterized by the structural properties of the underlying graph. We demonstrate the effectiveness of our algorithm in synthetic and real world experiments on a movie recommendation dataset.

【Keywords】:

372. Variable Kernel Density Estimation in High-Dimensional Feature Spaces.

Paper Link】 【Pages】:2674-2680

【Authors】: Christiaan Maarten van der Walt ; Etienne Barnard

【Abstract】: Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the-art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.

【Keywords】: machine learning; probability density estimation; non-parametric density estimation; kernel bandwidth estimation; kernel density estimation; maximum-likelihood; high-dimensional

373. Regularization for Unsupervised Deep Neural Nets.

Paper Link】 【Pages】:2681-2687

【Authors】: Baiyang Wang ; Diego Klabjan

【Abstract】: Unsupervised neural networks, such as restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), are powerful tools for feature selection and pattern recognition tasks. We demonstrate that overfitting occurs in such models just as in deep feedforward neural networks, and discuss possible regularization methods to reduce overfitting. We also propose a "partial" approach to improve the efficiency of Dropout/DropConnect in this scenario, and discuss the theoretical justification of these methods from model convergence and likelihood bounds. Finally, we compare the performance of these methods based on their likelihood and classification error rates for various pattern recognition data sets.

【Keywords】: deep learning; neural networks; unsupervised learning

Paper Link】 【Pages】:2688-2694

【Authors】: Hao Wang ; Xingjian Shi ; Dit-Yan Yeung

【Abstract】: Link prediction is a fundamental task in such areas as social network analysis, information retrieval, and bioinformatics. Usually link prediction methods use the link structures or node attributes as the sources of information. Recently, the relational topic model (RTM) and its variants have been proposed as hybrid methods that jointly model both sources of information and achieve very promising accuracy. However, the representations (features) learned by them are still not effective enough to represent the nodes (items). To address this problem, we generalize recent advances in deep learning from solely modeling i.i.d. sequences of attributes to jointly modeling graphs and non-i.i.d. sequences of attributes. Specifically, we follow the Bayesian deep learning framework and devise a hierarchical Bayesian model, called relational deep learning (RDL), to jointly model high-dimensional node attributes and link structures with layers of latent variables. Due to the multiple nonlinear transformations in RDL, standard variational inference is not applicable. We propose to utilize the product of Gaussians (PoG) structure in RDL to relate the inferences on different variables and derive a generalized variational inference algorithm for learning the variables and predicting the links. Experiments on three real-world datasets show that RDL works surprisingly well and significantly outperforms the state of the art.

【Keywords】: deep learning; link prediction; relational learning

375. Factorization Bandits for Interactive Recommendation.

Paper Link】 【Pages】:2695-2702

【Authors】: Huazheng Wang ; Qingyun Wu ; Hongning Wang

【Abstract】: We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, where an upper confidence bound based item selection strategy is developed to balance the exploit/explore trade-off during online learning. Observable contextual features and dependency among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and help conquer cold-start in recommendation. A high probability sublinear upper regret bound is proved for the developed algorithm, where considerable regret reduction is achieved on both user and item sides. Extensive experimentations on both simulations and large-scale real-world datasets confirmed the advantages of the proposed algorithm compared with several state-of-the-art factorization-based and bandit-based collaborative filtering methods.

【Keywords】: Contextual bandits; latent feature learning; online recommendations; regret analysis

376. Latent Smooth Skeleton Embedding.

Paper Link】 【Pages】:2703-2709

【Authors】: Li Wang ; Qi Mao ; Ivor W. Tsang

【Abstract】: Learning a smooth skeleton in a low-dimensional space from noisy data becomes important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embedding given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to the embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets by comparing them with existing methods.

【Keywords】: structure learning; manifold learning; nonlinear dimensionality reduction; probabilistic model

377. Polynomial Optimization Methods for Matrix Factorization.

Paper Link】 【Pages】:2710-2717

【Authors】: Po-Wei Wang ; Chun-Liang Li ; J. Zico Kolter

【Abstract】: Matrix factorization is a core technique in many machine learning problems, yet also presents a nonconvex and often difficult-to-optimize problem. In this paper we present an approach based upon polynomial optimization techniques that both improves the convergence time of matrix factorization algorithms and helps them escape from local optima. Our method is based on the realization that given a joint search direction in a matrix factorization task, we can solve the "subspace search" problem (the task of jointly finding the steps to take in each direction) by solving a bivariate quartic polynomial optimization problem. We derive two methods for solving this problem based upon sum of squares moment relaxations and the Durand-Kerner method, then apply these techniques on matrix factorization to derive a direct coordinate descent approach and a method for speeding up existing approaches. On three benchmark datasets we show the method substantially improves convergence speed over state-of-the-art approaches, while also attaining lower objective value.

【Keywords】: polynomial optimization; matrix factorization; Durand-Kerner method

378. Two-Dimensional PCA with F-Norm Minimization.

Paper Link】 【Pages】:2718-2724

【Authors】: Qianqian Wang ; Quanxue Gao

【Abstract】: Two-dimensional principal component analysis (2DPCA) has been widely used for face image representation and recognition. But it is sensitive to the presence of outliers. To alleviate this problem, we propose a novel robust 2DPCA, namely 2DPCA with F-norm minimization (F-2DPCA), which is intuitive and directly derived from 2DPCA. In F-2DPCA, distance in spatial dimensions (attribute dimensions) is measured in F-norm, while the summation over different data points uses 1-norm. Thus it is robust to outliers and rotational invariant as well. To solve F-2DPCA, we propose a fast iterative algorithm, which has a closed-form solution in each iteration, and prove its convergence. Experimental results on face image databases illustrate its effectiveness and advantages.

【Keywords】: Dimensionality reduction; Principal component analysis

379. Feature Selection Guided Auto-Encoder.

Paper Link】 【Pages】:2725-2731

【Authors】: Shuyang Wang ; Zhengming Ding ; Yun Fu

【Abstract】: Recently the auto-encoder and its variants have demonstrated their promising results in extracting effective features. Specifically, its basic idea of encouraging the output to be as similar as input, ensures the learned representation could faithfully reconstruct the input data. However, one problem arises that not all hidden units are useful to compress the discriminative information while lots of units mainly contribute to represent the task-irrelevant patterns. In this paper, we propose a novel algorithm, Feature Selection Guided Auto-Encoder, which is a unified generative model that integrates feature selection and auto-encoder together. To this end, our proposed algorithm can distinguish the task-relevant units from the task-irrelevant ones to obtain most effective features for future classification tasks. Our model not only performs feature selection on learned high-level features, but also dynamically endows the auto-encoder to produce more discriminative units. Experiments on several benchmarks demonstrate our method's superiority over state-of-the-art approaches.

【Keywords】:

380. Fredholm Multiple Kernel Learning for Semi-Supervised Domain Adaptation.

Paper Link】 【Pages】:2732-2738

【Authors】: Wei Wang ; Hao Wang ; Chen Zhang ; Yang Gao

【Abstract】: As a fundamental constituent of machine learning, domain adaptation generalizes a learning model from a source domain to a different (but related) target domain. In this paper, we focus on semi-supervised domain adaptation and explicitly extend the applied range of unlabeled target samples into the combination of distribution alignment and adaptive classifier learning. Specifically, our extension formulates the following aspects in a single optimization: 1) learning a cross-domain predictive model by developing the Fredholm integral based kernel prediction framework; 2) reducing the distribution difference between two domains; 3) exploring multiple kernels to induce an optimal learning space. Correspondingly, such an extension is distinguished with allowing for noise resiliency, facilitating knowledge transfer and analyzing diverse data characteristics. It is emphasized that we prove the differentiability of our formulation and present an effective optimization procedure based on the reduced gradient, guaranteeing rapid convergence. Comprehensive empirical studies verify the effectiveness of the proposed method.

【Keywords】: domain adaptation; kernel methods; classification

381. Fast Online Incremental Learning on Mixture Streaming Data.

Paper Link】 【Pages】:2739-2745

【Authors】: Yi Wang ; Xin Fan ; Zhongxuan Luo ; Tianzhu Wang ; Maomao Min ; Jiebo Luo

【Abstract】: The explosion of streaming data poses challenges to feature learning methods including linear discriminant analysis (LDA). Many existing LDA algorithms are not efficient enough to incrementally update with samples that sequentially arrive in various manners. First, we propose a new fast batch LDA (FLDA/QR) learning algorithm that uses the cluster centers to solve a lower triangular system that is optimized by the Cholesky-factorization. To take advantage of the intrinsically incremental mechanism of the matrix, we further develop an exact incremental algorithm (IFLDA/QR). The Gram-Schmidt process with reorthogonalization in IFLDA/QR significantly saves the space and time expenses compared with the rank-one QR-updating of most existing methods. IFLDA/QR is able to handle streaming data containing 1) new labeled samples in the existing classes, 2) samples of an entirely new (novel) class, and more significantly, 3) a chunk of examples mixed with those in 1) and 2). Both theoretical analysis and numerical experiments have demonstrated much lower space and time costs (2~10 times faster) than the state of the art, with comparable classification accuracy.

【Keywords】: Incremental linear discriminant analysis (ILDA); linear discriminant analysis (LDA); Online learning; QR-decomposition

382. Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation.

Paper Link】 【Pages】:2746-2753

【Authors】: Yingfei Wang ; Hua Ouyang ; Chu Wang ; Jianhui Chen ; Tsvetan Asamov ; Yi Chang

【Abstract】: The multi-armed bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the results in S (out of M) different positions. The aim is to maximize the cumulative reward with respect to the best possible subset and positions in hindsight. By the adaptation of a minimum-cost maximum-flow network, a practical algorithm based on Thompson sampling is derived for the (contextual) combinatorial problem, thus resolving the problem of computational intractability. The method can work with whole-page recommendation and any probabilistic model; to illustrate its effectiveness, we focus on Gaussian process optimization and a contextual setting where click-through rate is predicted using logistic regression. We demonstrate the algorithms’ performance on synthetic Gaussian process problems and on large-scale news article recommendation datasets from Yahoo! Front Page Today Module.

【Keywords】: multi-armed bandits; Thompson sampling; Whole-page Recommendation; combinatorial optimization; semi-bandits
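
As a rough illustration of the Thompson sampling idea named in entry 382, the sketch below picks S of K actions per round from sampled posteriors and updates on per-action (semi-bandit) feedback. It is not the authors' algorithm: it ignores positions, contexts, and the minimum-cost maximum-flow step, and it assumes independent Bernoulli click rates with Beta(1, 1) priors. The names `thompson_top_s`, `simulate`, and `true_ctr` are hypothetical.

```python
import random


def thompson_top_s(successes, failures, s, rng):
    """Draw one Beta posterior sample per action and return the s highest-scoring indices."""
    samples = [rng.betavariate(a + 1, b + 1) for a, b in zip(successes, failures)]
    ranked = sorted(range(len(samples)), key=lambda k: samples[k], reverse=True)
    return ranked[:s]


def simulate(true_ctr, s, rounds, seed=0):
    """Run the sampler against assumed (hypothetical) click-through rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_ctr)
    failures = [0] * len(true_ctr)
    for _ in range(rounds):
        chosen = thompson_top_s(successes, failures, s, rng)
        # Semi-bandit feedback: a reward is observed for every chosen action.
        for action in chosen:
            if rng.random() < true_ctr[action]:
                successes[action] += 1
            else:
                failures[action] += 1
    return successes, failures
```

Each round contributes exactly S observations, so the success and failure counts sum to `rounds * s`.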

383. Unbiased Multivariate Correlation Analysis.

Paper Link】 【Pages】:2754-2760

【Authors】: Yisen Wang ; Simone Romano ; Vinh Nguyen ; James Bailey ; Xingjun Ma ; Shu-Tao Xia

【Abstract】: Correlation measures are a key element of statistics and machine learning, and essential for a wide range of data analysis tasks. Most existing correlation measures are for pairwise relationships, but real-world data can also exhibit complex multivariate correlations, involving three or more variables. We argue that multivariate correlation measures should be comparable, interpretable, scalable and unbiased. However, no existing measures satisfy all these requirements. In this paper, we propose an unbiased multivariate correlation measure, called UMC, which satisfies all the above criteria. UMC is a cumulative entropy based non-parametric multivariate correlation measure, which can capture both linear and non-linear correlations for groups of three or more variables. It employs a correction for chance using a statistical model of independence to address the issue of bias. UMC has high interpretability and we empirically show it outperforms state-of-the-art multivariate correlation measures in terms of statistical power, as well as for use in both subspace clustering and outlier detection tasks.

【Keywords】: multivariate correlation measure; bias analysis; statistical model of independence; subspace clustering; outlier detection

384. Beyond RPCA: Flattening Complex Noise in the Frequency Domain.

Paper Link】 【Pages】:2761-2767

【Authors】: Yunhe Wang ; Chang Xu ; Chao Xu ; Dacheng Tao

【Abstract】: Discovering robust low-rank data representations is important in many real-world problems. Traditional robust principal component analysis (RPCA) assumes that the observed data are corrupted by some sparse noise (e.g., Laplacian noise) and utilizes the l1-norm to separate out the noisy component. Nevertheless, as well as simple Gaussian or Laplacian noise, noise in real-world data is often more complex, and thus the l1 and l2-norms are insufficient for noise characterization. This paper presents a more flexible approach to modeling complex noise by investigating their properties in the frequency domain. Although elements of a noise matrix are chaotic in the spatial domain, the absolute values of its alternative coefficients in the frequency domain are constant w.r.t. their variance. Based on this observation, a new robust PCA algorithm is formulated by simultaneously discovering the low-rank and noisy components. Extensive experiments on synthetic data and video background subtraction demonstrate that FRPCA is effective at handling complex noise.

【Keywords】:

385. Improving Efficiency of SVM k-Fold Cross-Validation by Alpha Seeding.

Paper Link】 【Pages】:2768-2774

【Authors】: Zeyi Wen ; Bin Li ; Kotagiri Ramamohanarao ; Jian Chen ; Yawen Chen ; Rui Zhang

【Abstract】: The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that the SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM for training the (h+1)-th SVM for improving the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM for improving the efficiency of training the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors and to accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than the k-fold cross-validation which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence same accuracy) as the k-fold cross-validation which does not make use of the previously trained SVM.

【Keywords】: Machine Learning; SVM; cross-validation

386. Rank Ordering Constraints Elimination with Application for Kernel Learning.

Paper Link】 【Pages】:2775-2781

【Authors】: Ying Xie ; Chris H. Q. Ding ; Yihong Gong ; Zongze Wu

【Abstract】: A number of machine learning domains, such as information retrieval, recommender systems, kernel learning, neural network-biological systems, etc., deal with importance scores. Very often, there exists some prior knowledge that could help improve the performance. In many cases, this prior knowledge manifests itself in rank ordering constraints. These inequality constraints are usually very difficult to deal with in optimization. In this paper, we provide a slack variable transformation method, which effectively eliminates the rank ordering inequality constraints, and thus simplifies the learning task significantly. We apply this transformation to the kernel learning problem, and also provide an efficient algorithm to solve the transformed system. On seven datasets, our approach reduces the computational time by orders of magnitude as compared to the current standard quadratically constrained quadratic programming (QCQP) optimization approach.

【Keywords】: machine learning; rank order constraints; kernel learning

387. Solving Indefinite Kernel Support Vector Machine with Difference of Convex Functions Programming.

Paper Link】 【Pages】:2782-2788

【Authors】: Hai-Ming Xu ; Hui Xue ; Xiaohong Chen ; Yunyun Wang

【Abstract】: Indefinite kernel support vector machine (IKSVM) has recently attracted increasing attention in machine learning. Different from traditional SVMs, IKSVM essentially is a non-convex optimization problem. Some algorithms directly change the spectrum of the indefinite kernel matrix at the cost of losing some valuable information involved in the kernels so as to transform the non-convex problem into a convex one. Other algorithms aim to solve the dual form of IKSVM, but suffer from the dual gap between the primal and dual problems in the case of indefinite kernels. In this paper, we directly focus on the non-convex primal form of IKSVM and propose a novel algorithm termed IKSVM-DC. According to the characteristics of the spectrum for the indefinite kernel matrix, IKSVM-DC decomposes the objective function into the subtraction of two convex functions and thus reformulates the primal problem as a difference of convex functions (DC) program which can be optimized by the DC algorithm (DCA). In order to accelerate the convergence rate, IKSVM-DC further combines the classical DCA with a line search step along the descent direction at each iteration. A theoretical analysis is then presented to validate that IKSVM-DC can converge to a local minimum. Systematic experiments on real-world datasets demonstrate the superiority of IKSVM-DC compared to state-of-the-art IKSVM related algorithms.

【Keywords】: Indefinite Kernel Support Vector Machine; Difference of Convex Functions Programming; Kernel Method

388. Cleaning the Null Space: A Privacy Mechanism for Predictors.

Paper Link】 【Pages】:2789-2795

【Authors】: Ke Xu ; Tongyi Cao ; Swair Shah ; Crystal Maung ; Haim Schweitzer

【Abstract】: In standard machine learning and regression settings, feature values are used to predict some desired information. The privacy challenge considered here is to prevent an adversary from using available feature values to predict confidential information that one wishes to keep secret. We show that this can sometimes be achieved with almost no effect on the quality of predicting desired information. We describe two algorithms aimed at providing such privacy when the predictors have a linear operator in the first stage. The desired effect can be achieved by zeroing out feature components in the approximate null space of the linear operator.

【Keywords】: Privacy; Multi-label Learning; Linear Regression

389. Efficient Non-Oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee.

Paper Link】 【Pages】:2796-2802

【Authors】: Yi Xu ; Haiqin Yang ; Lijun Zhang ; Tianbao Yang

【Abstract】: In this paper, we address learning problems for high dimensional data. Previously, oblivious random projection based approaches that project high dimensional features onto a random subspace have been used in practice for tackling the high-dimensionality challenge in machine learning. Recently, various non-oblivious randomized reduction methods have been developed and deployed for solving many numerical problems such as matrix product approximation, low-rank matrix approximation, etc. However, they are less explored for machine learning tasks, e.g., classification. More seriously, the theoretical analysis of excess risk bounds for risk minimization, an important measure of generalization performance, has not been established for non-oblivious randomized reduction methods. It therefore remains an open problem what the benefit is of using them over previous oblivious random projection based approaches. To tackle these challenges, we propose an algorithmic framework for employing a non-oblivious randomized reduction method for general empirical risk minimization in machine learning tasks, where the original high-dimensional features are projected onto a random subspace that is derived from the data with a small matrix approximation error. We then derive the first excess risk bound for the proposed non-oblivious randomized reduction approach without requiring strong assumptions on the training data. The established excess risk bound shows that the proposed approach provides much better generalization performance, and it also sheds more light on different randomized reduction approaches. Finally, we conduct extensive experiments on both synthetic and real-world benchmark datasets, whose dimension scales to O(10^7), to demonstrate the efficacy of our proposed approach.

【Keywords】: randomized reduction, excess risk, learning
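
For readers unfamiliar with the baseline that entry 389 contrasts against, here is a minimal sketch of an oblivious random projection: the projection matrix is drawn without looking at the data. It does not implement the paper's non-oblivious method, whose subspace is derived from the data. The Gaussian construction, the 1/sqrt(m) scaling, and the names `gaussian_projection_matrix` and `project` are assumptions chosen for illustration.

```python
import math
import random


def gaussian_projection_matrix(d, m, seed=0):
    """Build an m x d matrix with i.i.d. N(0, 1/m) entries, independent of any data."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(matrix, x):
    """Map a d-dimensional feature vector x to m dimensions (one dot product per row)."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]
```

A learner would then be trained on the m-dimensional vectors `project(R, x)` instead of the original d-dimensional features.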

390. A General Efficient Hyperparameter-Free Algorithm for Convolutional Sparse Learning.

Paper Link】 【Pages】:2803-2809

【Authors】: Zheng Xu ; Junzhou Huang

【Abstract】: Structured sparse learning has become a popular and mature research field. Among all structured sparse models, we found an interesting fact that most structured sparse properties could be captured by convolution operators, most famous ones being total variation and wavelet sparsity. This finding has naturally brought us to a generalization termed Convolutional Sparsity. While this generalization bridges the convolution and sparse learning theory, we are able to propose a general, efficient, hyperparameter-free optimization algorithm framework for convolutional sparse models, thanks to the analysis theory of convolution operators. The convergence of the general, hyperparameter-free algorithm has been comprehensively analyzed, with a non-ergodic rate of O(1/ϵ^2) and ergodic rate of O(1/ϵ), where ϵ is the desired accuracy. Extensive experiments confirm the superior performance of our general algorithm in various convolutional sparse models, even better than some application-specific algorithms.

【Keywords】: sparse learning; convolutional sparse learning; primal-dual algorithm; hyperparameter-free algorithm; structured sparse learning

391. Multi-View Correlated Feature Learning by Uncovering Shared Component.

Paper Link】 【Pages】:2810-2816

【Authors】: Xiaowei Xue ; Feiping Nie ; Sen Wang ; Xiaojun Chang ; Bela Stantic ; Min Yao

【Abstract】: Learning multiple heterogeneous features from different data sources is challenging. One research topic is how to exploit and utilize the correlations among various features across multiple views with the aim of improving the performance of learning tasks, such as classification. In this paper, we propose a new multi-view feature learning algorithm that simultaneously analyzes features from different views. Compared to most of the existing subspace learning methods that only focus on exploiting a shared latent subspace, our algorithm not only learns individual information in each view but also captures feature correlations among multiple views by learning a shared component. By assuming that such a component is shared by all views, we simultaneously exploit the shared component and individual information of each view in a batch mode. Since the objective function is non-smooth and difficult to solve, we propose an efficient iterative algorithm for optimization with guaranteed convergence. Extensive experiments are conducted on several benchmark datasets. The results demonstrate that our proposed algorithm performs better than all the compared multi-view learning algorithms.

【Keywords】: Multi-view Learning, Shared Component, Correlated Feature Learning

392. A Framework of Online Learning with Imbalanced Streaming Data.

Paper Link】 【Pages】:2817-2823

【Authors】: Yan Yan ; Tianbao Yang ; Yi Yang ; Jianhui Chen

【Abstract】: A challenge for mining large-scale streaming data overlooked by most existing studies on online learning is the skewed distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad-hoc costs are adapted based on the distribution of data received so far. However, they do not necessarily achieve optimal performance in terms of the measures suited for imbalanced data, such as F-measure, area under ROC curve (AUROC), and area under precision and recall curve (AUPRC). This work proposes a general framework for online learning with imbalanced streaming data, where examples arrive sequentially and models are updated accordingly on-the-fly. By simultaneously learning multiple classifiers with different cost vectors, the proposed method can be adopted for different target measures for imbalanced data, including F-measure, AUROC and AUPRC. Moreover, we present a rigorous theoretical justification of the proposed framework for the F-measure maximization. Our empirical studies demonstrate the competitive if not better performance of the proposed method compared to previous cost-sensitive and resampling based online learning algorithms and those that are designed for optimizing certain measures.

【Keywords】: Online learning; Imbalanced data

393. TaGiTeD: Predictive Task Guided Tensor Decomposition for Representation Learning from Electronic Health Records.

Paper Link】 【Pages】:2824-2830

【Authors】: Kai Yang ; Xiang Li ; Haifeng Liu ; Jing Mei ; Guo Tong Xie ; Junfeng Zhao ; Bing Xie ; Fei Wang

【Abstract】: With the better availability of healthcare data, such as Electronic Health Records (EHR), more and more data analytics methodologies are being developed, aiming at digging insights from them to improve the quality of care delivery. There are many challenges in analyzing EHRs, such as high dimensionality and event sparsity. Moreover, different from other application domains, EHR analysis algorithms need to be highly interpretable to make them clinically useful. This makes representation learning from EHRs of key importance. In this paper, we propose an algorithm called Predictive Task Guided Tensor Decomposition (TaGiTeD) to analyze EHRs. Specifically, TaGiTeD learns event interaction patterns that are highly predictive for certain tasks from EHRs with supervised tensor decomposition. Compared with unsupervised methods, TaGiTeD can learn effective EHR representations in a more focused way. This is crucial because most medical problems have very limited patient samples, which are not enough for unsupervised algorithms to learn meaningful representations from. We apply TaGiTeD on a real-world EHR data warehouse and demonstrate that TaGiTeD can learn representations that are both interpretable and predictive.

【Keywords】: Tensor; Dimensionality reduction; EHR data

394. Deep Learning for Fixed Model Reuse.

Paper Link】 【Pages】:2831-2837

【Authors】: Yang Yang ; De-Chuan Zhan ; Ying Fan ; Yuan Jiang ; Zhi-Hua Zhou

【Abstract】: Model reuse attempts to construct a model by utilizing existing available models, mostly trained for other tasks, rather than building a model from scratch. It helps to reduce the time cost, data amount, and expertise required. Deep learning has achieved great success in various tasks involving images, voices and videos. Several studies embody the idea of model reuse, by trying to reuse pre-trained deep network architectures or deep model features to train a new deep model. They, however, neglect the fact that there are many other fixed models or features available. In this paper, we propose a more thorough model reuse scheme, FMR (Fixed Model Reuse). FMR utilizes the learning power of deep models to implicitly grab the useful discriminative information from fixed model/features that have been widely used in general tasks. We first arrange the convolution layers of a deep network and the provided fixed model/features in parallel, fully connecting to the output layer nodes. Then, the dependencies between the output layer nodes and the fixed model/features are knocked down such that only the raw feature inputs are needed when the model is being used for testing, though the helpful information in the fixed model/features has already been incorporated into the model. On one hand, by the FMR scheme, the required amount of training data can be significantly reduced because of the reuse of fixed model/features. On the other hand, the fixed model/features are not explicitly used in testing, and thus, the scheme can be quite useful in applications where the fixed model/features are protected by patents or commercial secrets. Experiments on five real-world datasets validate the effectiveness of FMR compared with state-of-the-art deep methods.

【Keywords】: Model Reuse; Deep Learning

395. Learning Deep Latent Space for Multi-Label Classification.

Paper Link】 【Pages】:2838-2844

【Authors】: Chih-Kuan Yeh ; Wei-Chieh Wu ; Wei-Jen Ko ; Yu-Chiang Frank Wang

【Abstract】: Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural networks (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification.

【Keywords】:

396. A Unified Algorithm for One-Class Structured Matrix Factorization with Side Information.

Paper Link】 【Pages】:2845-2851

【Authors】: Hsiang-Fu Yu ; Hsin-Yuan Huang ; Inderjit S. Dhillon ; Chih-Jen Lin

【Abstract】: In many applications such as recommender systems and multi-label learning, the task is to complete a partially observed binary matrix. Such PU learning (positive-unlabeled) problems can be solved by one-class matrix factorization (MF). In practice, side information such as user or item features in recommender systems is often available besides the observed positive user-item connections. In this work we consider a generalization of one-class MF so that two types of side information are incorporated and a general convex loss function can be used. The resulting optimization problem is very challenging, but we derive an efficient and effective alternating minimization procedure. Experiments on large-scale multi-label learning and one-class recommender systems demonstrate the effectiveness of our proposed approach.

【Keywords】: Matrix Factorization; PU Learning; Multi-label Learning

397. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient.

Paper Link】 【Pages】:2852-2858

【Authors】: Lantao Yu ; Weinan Zhang ; Jun Wang ; Yong Yu

【Abstract】: As a new way of training generative models, Generative Adversarial Net (GAN) that uses a discriminative model to guide the training of the generative model has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is for generating sequences of discrete tokens. A major reason lies in that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence, it is non-trivial to balance its current score and the future one once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve the problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.

【Keywords】: Generative Adversarial Nets; Deep Learning; Unsupervised Learning; Reinforcement Learning

398. CBRAP: Contextual Bandits with RAndom Projection.

Paper Link】 【Pages】:2859-2866

【Authors】: Xiaotian Yu ; Michael R. Lyu ; Irwin King

【Abstract】: Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual data. Due to the curse of dimensionality, there are two challenges in most of the current bandit algorithms: the first is high time-complexity; and the second is extremely large upper regret bounds with high-dimensional data. In this paper, in order to attack the above two challenges effectively, we develop an algorithm of Contextual Bandits via RAndom Projection (CBRAP) in the setting of linear payoffs, which works especially for high-dimensional contextual data. The proposed CBRAP algorithm is time-efficient and flexible, because it enables players to choose an arm in a low-dimensional space, and relaxes the sparsity assumption of a constant number of non-zero components in previous work. Besides, we prove an upper regret bound for the proposed algorithm, which is associated with reduced dimensions. By comparing with three benchmark algorithms, we demonstrate improved performance on cumulative payoffs of CBRAP during its sequential decisions on both synthetic and real-world datasets, as well as its superior time-efficiency.

【Keywords】: contextual bandits; random projection; high-dimensional data

399. An Exact Penalty Method for Binary Optimization Based on MPEC Formulation.

Paper Link】 【Pages】:2867-2875

【Authors】: Ganzhao Yuan ; Bernard Ghanem

【Abstract】: Binary optimization is a central problem in mathematical optimization and its applications are abundant. To solve this problem, we propose a new class of continuous optimization techniques, which is based on Mathematical Programming with Equilibrium Constraints (MPECs). We first reformulate the binary program as an equivalent augmented biconvex optimization problem with a bilinear equality constraint, then we propose an exact penalty method to solve it. The resulting algorithm seeks a desirable solution to the original problem via solving a sequence of linear programming convex relaxation subproblems. In addition, we prove that the penalty function, induced by adding the complementarity constraint to the objective, is exact, i.e., it has the same local and global minima as those of the original binary program when the penalty parameter is over some threshold. The convergence of the algorithm can be guaranteed, since it essentially reduces to block coordinate descent in the literature. Finally, we demonstrate the effectiveness of our method on the problem of dense subgraph discovery. Extensive experiments show that our method outperforms existing techniques, such as iterative hard thresholding and linear programming relaxation.

【Keywords】:

400. Scalable Feature Selection via Distributed Diversity Maximization.

Paper Link】 【Pages】:2876-2883

【Authors】: Sepehr Abbasi Zadeh ; Mehrdad Ghadiri ; Vahab S. Mirrokni ; Morteza Zadimoghaddam

【Abstract】: Feature selection is a fundamental problem in machine learning and data mining. The majority of feature selection algorithms are designed for running on a single machine (centralized setting) and they are less applicable to very large datasets. Although there are some distributed methods to tackle this problem, most of them distribute the data horizontally, which is not suitable for datasets with a large number of features and a small number of instances. Thus, in this paper, we introduce a novel vertically distributable feature selection method in order to speed up this process and be able to handle very large datasets in a scalable manner. In general, feature selection methods aim at selecting relevant and non-redundant features (Minimum Redundancy and Maximum Relevance). It is much harder to consider redundancy in a vertically distributed setting than in a centralized setting since there is no global access to the whole data. To the best of our knowledge, this is the first attempt toward solving the feature selection problem with a vertically distributed filter method which handles the redundancy with results consistently comparable to centralized methods. In this paper, we formalize the feature selection problem as a diversity maximization problem by introducing a mutual-information-based metric distance on the features. We show the effectiveness of our method by performing an extensive empirical study. In particular, we show that our distributed method outperforms state-of-the-art centralized feature selection algorithms on a variety of datasets. From a theoretical point of view, we have proved that the greedy algorithm used in our method achieves an approximation factor of 1/4 for the diversity maximization problem in a distributed setting with high probability. Furthermore, we improve this to 8/25 expected approximation using multiplicity in our distribution.

【Keywords】: Feature Selection; Distributed Feature Selection; Scalable Feature Selection; Distributed Diversity Maximization; Diversity Maximization; Composable Core-set
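
To make the "diversity maximization" framing in entry 400 concrete, the following is a minimal single-machine sketch of a greedy selector that repeatedly adds the item with the largest total distance to the items already chosen. It is only an illustration under stated assumptions: the paper's method is distributed and uses a mutual-information-based metric, whereas here `dist` is any precomputed symmetric distance matrix, and the names `greedy_diverse_subset` and `start` are hypothetical.

```python
def greedy_diverse_subset(dist, k, start=0):
    """Greedily pick up to k indices that are far apart under the distance matrix dist."""
    n = len(dist)
    if k <= 0 or n == 0:
        return []
    selected = [start]
    # gain[i] = sum of distances from item i to every item selected so far.
    gain = [dist[i][start] for i in range(n)]
    while len(selected) < min(k, n):
        candidates = [i for i in range(n) if i not in selected]
        best = max(candidates, key=lambda i: gain[i])
        selected.append(best)
        for i in range(n):
            gain[i] += dist[i][best]
    return selected


# Four points on a line; the distance is the absolute difference.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
print(greedy_diverse_subset(dist, 2))  # [0, 3]
```

In a feature selection setting, the indices would refer to features rather than points on a line.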

401. Fast Compressive Phase Retrieval under Bounded Noise.

Paper Link】 【Pages】:2884-2890

【Authors】: Hongyang Zhang ; Shan You ; Zhouchen Lin ; Chao Xu

【Abstract】: We study the problem of recovering a t-sparse real vector x from m quadratic equations y_i = (a_i^T x)^2 with noisy measurements y_i. This is known as the problem of compressive phase retrieval, and has been widely applied to X-ray diffraction imaging, microscopy, quantum mechanics, etc. The challenge is to design algorithms that are a) fast and b) noise-tolerant, with c) near-optimal sample complexity. Prior work in this direction typically achieved one or two of these goals, but none of them enjoyed the three performance guarantees simultaneously. In this work, with a particular set of sensing vectors a_i, we give a provable algorithm that is robust to any bounded yet unstructured deterministic noise. Our algorithm requires roughly O(t) measurements and runs in O(tn log(1/epsilon)) time, where epsilon is the error. This result advances the state-of-the-art work, and guarantees the applicability of our method to large datasets. Experiments on synthetic and real data verify our theory.

【Keywords】: Phase Retrieval, Compressed Sensing, Sparsity
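
The measurement model in entry 401 can be sketched as follows. The code only generates quadratic measurements of a t-sparse vector and shows why the sign of x is lost; it is not the recovery algorithm. The Gaussian sensing vectors are an assumption (the paper uses its own particular set), and `sparse_signal`, `quadratic_measurements`, and `noise_bound` are hypothetical names.

```python
import random


def sparse_signal(n, t, rng):
    """Return a length-n vector with exactly t non-zero entries."""
    x = [0.0] * n
    for idx in rng.sample(range(n), t):
        x[idx] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 2.0)
    return x


def quadratic_measurements(sensing_vectors, x, noise_bound=0.0, rng=None):
    """Compute y_i = (a_i . x)^2, optionally plus noise bounded by noise_bound."""
    ys = []
    for a in sensing_vectors:
        inner = sum(a_j * x_j for a_j, x_j in zip(a, x))
        noise = 0.0
        if rng is not None and noise_bound > 0:
            noise = rng.uniform(-noise_bound, noise_bound)
        ys.append(inner * inner + noise)
    return ys


rng = random.Random(0)
x = sparse_signal(20, 3, rng)
A = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(8)]
y = quadratic_measurements(A, x)
y_flipped = quadratic_measurements(A, [-v for v in x])
```

Because each measurement squares the inner product, `y` and `y_flipped` agree, so x can be recovered only up to a global sign.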

402. Query-Efficient Imitation Learning for End-to-End Simulated Driving.

Paper Link】 【Pages】:2891-2897

【Authors】: Jiakai Zhang ; Kyunghyun Cho

【Abstract】: One way to approach end-to-end autonomous driving is to learn a policy that maps from a sensory input, such as an image frame from a front-facing camera, to a driving action, by imitating an expert driver, or a reference policy. This can be done by supervised learning, where a policy is tuned to minimize the difference between the predicted and ground-truth actions. A policy trained in this way, however, is known to suffer from unexpected behaviours due to the mismatch between the states reachable by the reference policy and the trained policy. More advanced algorithms for imitation learning, such as DAgger, address this issue by iteratively collecting training examples from both reference and trained policies. These algorithms often require a large number of queries to a reference policy, which is undesirable as the reference policy is often expensive. In this paper, we propose an extension of DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires fewer queries to a reference policy. We observe a significant speed-up in convergence, which we conjecture to be due to the effect of automated curriculum learning.

【Keywords】: imitation learning; deep learning; autonomous driving; racing game

403. Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning.

Paper Link】 【Pages】:2898-2906

【Authors】: Quanshi Zhang ; Ruiming Cao ; Ying Nian Wu ; Song-Chun Zhu

【Abstract】: This paper proposes a learning strategy that embeds object-part concepts into a pre-trained convolutional neural network (CNN), in an attempt to 1) explore explicit semantics hidden in CNN units and 2) gradually transform the pre-trained CNN into a semantically interpretable graphical model for hierarchical object understanding. Given part annotations on very few (e.g., 3-12) objects, our method mines certain latent patterns from the pre-trained CNN and associates them with different semantic parts. We use a four-layer And-Or graph to organize the CNN units, so as to clarify their internal semantic hierarchy. Our method is guided by a small number of part annotations, and it achieves superior part-localization performance (about 13%-107% improvement in part center prediction on the PASCAL VOC and ImageNet datasets).

【Keywords】: Deep Learning; And-Or Graph; Convolutional Neural Network; Weakly-Supervised Learning

404. Universum Prescription: Regularization Using Unlabeled Data.

Paper Link】 【Pages】:2907-2913

【Authors】: Xiang Zhang ; Yann LeCun

【Abstract】: This paper shows that simply prescribing "none of the above" labels to unlabeled data has a beneficial regularization effect on supervised learning. We call it universum prescription because the prescribed labels cannot be one of the supervised labels. In spite of its simplicity, universum prescription obtained competitive results in training deep convolutional networks for CIFAR-10, CIFAR-100, STL-10 and ImageNet datasets. A qualitative justification of these approaches using Rademacher complexity is presented. The effect of a regularization parameter (the probability of sampling from unlabeled data) is also studied empirically.

【Keywords】: Regularization; Convolutional Networks; Unlabeled Data
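
The sampling step described in the abstract of entry 404 can be illustrated with a minimal sketch. It assumes one possible reading of "none of the above": a single extra class index appended after the supervised classes. The function name `sample_training_example`, the toy data, and the value of `p` are hypothetical and are not taken from the paper.

```python
import random

def sample_training_example(labeled, unlabeled, num_classes, p, rng):
    """Draw one training pair (x, y).

    With probability p, pick an unlabeled input and prescribe the extra
    "none of the above" label (index num_classes, an assumption of this
    sketch). Otherwise pick an ordinary labeled pair.
    """
    if rng.random() < p:
        x = rng.choice(unlabeled)
        return x, num_classes
    return rng.choice(labeled)

# Toy data: three supervised classes (0, 1, 2) and two unlabeled inputs.
labeled = [("img_a", 0), ("img_b", 1), ("img_c", 2)]
unlabeled = ["img_u1", "img_u2"]

rng = random.Random(0)
batch = [sample_training_example(labeled, unlabeled, 3, 0.5, rng) for _ in range(8)]
print(batch)
```

Here `p` plays the role of the regularization parameter mentioned in the abstract: `p = 0` recovers plain supervised sampling, and larger values mix in more prescribed examples.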

405. Learning Sparse Task Relations in Multi-Task Learning.

Paper Link】 【Pages】:2914-2920

【Authors】: Yu Zhang ; Qiang Yang

【Abstract】: In multi-task learning, when the number of tasks is large, pairwise task relations exhibit sparse patterns since usually a task cannot be helpful to all of the other tasks and moreover, sparse task relations can reduce the risk of overfitting compared with the dense ones. In this paper, we focus on learning sparse task relations. Based on a regularization framework which can learn task relations among multiple tasks, we propose a SParse covAriance based mulTi-taSk (SPATS) model to learn a sparse covariance by using the ℓ1 regularization. The resulting objective function of the SPATS method is convex, which allows us to devise an alternating method to solve it. Moreover, some theoretical properties of the proposed model are studied. Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed method.

【Keywords】: Multi-Task Learning

406. Multi-View Clustering via Deep Matrix Factorization.

Paper Link】 【Pages】:2921-2927

【Authors】: Handong Zhao ; Zhengming Ding ; Yun Fu

【Abstract】: Multi-View Clustering (MVC) has garnered more attention recently since many real-world data are comprised of different representations or views. The key is to explore complementary information to benefit the clustering problem. In this paper, we present a deep matrix factorization framework for MVC, where semi-nonnegative matrix factorization is adopted to learn the hierarchical semantics of multi-view data in a layer-wise fashion. To maximize the mutual information from each view, we enforce the non-negative representation of each view in the final layer to be the same. Furthermore, to respect the intrinsic geometric structure in each view data, graph regularizers are introduced to couple the output representation of deep structures. As a non-trivial contribution, we provide the solution based on alternating minimization strategy, followed by a theoretical proof of convergence. The superior experimental results on three face benchmarks show the effectiveness of the proposed deep matrix factorization model.

【Keywords】:

407. SCOPE: Scalable Composite Optimization for Learning on Spark.

Paper Link】 【Pages】:2928-2934

【Authors】: Shen-Yi Zhao ; Ru Xiang ; Ying-Hao Shi ; Peng Gao ; Wu-Jun Li

【Abstract】: Many machine learning models, such as logistic regression (LR) and support vector machine (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods might not be scalable enough. In this paper, we propose a novel DSO method, called scalable composite optimization for learning (SCOPE), and implement it on the fault-tolerant distributed platform Spark. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE is convergent with linear convergence rate when the objective function is strongly convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.

【Keywords】:

408. Lock-Free Optimization for Non-Convex Problems.

Paper Link】 【Pages】:2935-2941

【Authors】: Shen-Yi Zhao ; Gong-Duo Zhang ; Wu-Jun Li

【Abstract】: Stochastic gradient descent (SGD) and its variants have attracted much attention in machine learning due to their efficiency and effectiveness for optimization. To handle large-scale problems, researchers have recently proposed several lock-free strategy based parallel SGD (LF-PSGD) methods for multi-core systems. However, existing works have only proved the convergence of these LF-PSGD methods for convex problems. To the best of our knowledge, no work has proved the convergence of the LF-PSGD methods for non-convex problems. In this paper, we provide the theoretical proof about the convergence of two representative LF-PSGD methods, Hogwild! and AsySVRG, for non-convex problems. Empirical results also show that both Hogwild! and AsySVRG are convergent on non-convex problems, which successfully verifies our theoretical results.

【Keywords】:

409. Scalable Graph Embedding for Asymmetric Proximity.

Paper Link】 【Pages】:2942-2948

【Authors】: Chang Zhou ; Yuqiong Liu ; Xiaofei Liu ; Zhongyi Liu ; Jun Gao

【Abstract】: Graph Embedding methods are aimed at mapping each vertex into a low dimensional vector space, which preserves certain structural relationships among the vertices in the original graph. Recently, several works have been proposed to learn embeddings based on sampled paths from the graph, e.g., DeepWalk, Line, Node2Vec. However, their methods only preserve symmetric proximities, which could be insufficient in many applications, even if the underlying graph is undirected. Besides, they lack a theoretical analysis of exactly which relationships they preserve in their embedding space. In this paper, we propose an asymmetric proximity preserving (APP) graph embedding method via random walk with restart, which captures both asymmetric and high-order similarities between node pairs. We give theoretical analysis that our method implicitly preserves the Rooted PageRank score for any two vertices. We conduct extensive experiments on tasks of link prediction and node recommendation on open source datasets, as well as online recommendation services in Alibaba Group, in which the training graph has over 290 million vertices and 18 billion edges, showing our method to be highly scalable and effective.

【Keywords】: Graph Embedding, Dimensionality Reduction, Recommendation
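
The abstract of entry 409 notes that proximity can be asymmetric even on an undirected graph. A small sketch of Rooted PageRank, computed here by plain power iteration over a random walk with restart, shows the effect. This is not the APP embedding method itself; the function name `rooted_pagerank`, the restart probability `alpha = 0.15`, and the star graph are assumptions made for illustration.

```python
def rooted_pagerank(adj, root, alpha=0.15, iters=200):
    """Stationary distribution of a walk that restarts at `root` with
    probability alpha and otherwise moves to a uniformly random neighbour.

    adj maps each node to a non-empty list of neighbours.
    """
    score = {v: 0.0 for v in adj}
    score[root] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[root] += alpha  # restart mass returns to the root
        for v, neighbours in adj.items():
            share = (1 - alpha) * score[v] / len(neighbours)
            for w in neighbours:
                nxt[w] += share
        score = nxt
    return score

# Undirected star graph: hub "h" joined to leaves "a", "b", "c".
adj = {"h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h"]}

from_hub = rooted_pagerank(adj, "h")
from_leaf = rooted_pagerank(adj, "a")
print("score of a rooted at h:", from_hub["a"])
print("score of h rooted at a:", from_leaf["h"])
```

The two printed scores differ although every edge is undirected: a walk rooted at a leaf must pass through the hub, while a walk rooted at the hub spreads over three leaves.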

410. Bilinear Probabilistic Canonical Correlation Analysis via Hybrid Concatenations.

Paper Link】 【Pages】:2949-2955

【Authors】: Yang Zhou ; Haiping Lu ; Yiu-ming Cheung

【Abstract】: Canonical Correlation Analysis (CCA) is a classical technique for two-view correlation analysis, while Probabilistic CCA (PCCA) provides a generative and more general viewpoint for this task. Recently, PCCA has been extended to bilinear cases for dealing with two-view matrices in order to preserve and exploit the matrix structures in PCCA. However, existing bilinear PCCAs impose restrictive model assumptions for matrix structure preservation, sacrificing generative correctness or model flexibility. To overcome these drawbacks, we propose BPCCA, a new bilinear extension of PCCA, by introducing a hybrid joint model. Our new model preserves matrix structures indirectly via hybrid vector-based and matrix-based concatenations. This enables BPCCA to gain more model flexibility in capturing two-view correlations and obtain closed-form solutions in parameter estimation. Experimental results on two real-world applications demonstrate the superior performance of BPCCA over competing methods.

【Keywords】: Dimensionality Reduction; Probabilistic Model; Bilinear CCA

411. Parametric Dual Maximization for Non-Convex Learning Problems.

Paper Link】 【Pages】:2956-2962

【Authors】: Yuxun Zhou ; Zhaoyi Kang ; Costas J. Spanos

【Abstract】: We consider a class of non-convex learning problems that can be formulated as jointly optimizing regularized hinge loss and a set of auxiliary variables. Such problems encompass but are not limited to various versions of semi-supervised learning, learning with hidden structures, robust learning, etc. Existing methods either suffer from local minima or have to invoke a non-scalable combinatorial search. In this paper, we propose a novel learning procedure, namely Parametric Dual Maximization (PDM), that can approach global optimality efficiently with user specified approximation levels. The building blocks of PDM are two new results: (1) The equivalent convex maximization reformulation derived by parametric analysis. (2) The improvement of local solutions based on a necessary and sufficient condition for global optimality. Experimental results on two representative applications demonstrate the effectiveness of PDM compared to other approaches.

【Keywords】: learning algorithm; non-convex; large margin method

412. One-Step Spectral Clustering via Dynamically Learning Affinity Matrix and Subspace.

Paper Link】 【Pages】:2963-2969

【Authors】: Xiaofeng Zhu ; Wei He ; Yonggang Li ; Yang Yang ; Shichao Zhang ; Rongyao Hu ; Yonghua Zhu

【Abstract】: This paper proposes a one-step spectral clustering method by learning an intrinsic affinity matrix (i.e., the clustering result) from the low-dimensional space (i.e., intrinsic subspace) of original data. Specifically, the intrinsic affinity matrix is learnt by: 1) the alignment of the initial affinity matrix learnt from original data; 2) the adjustment of the transformation matrix, which transfers the original feature space into its intrinsic subspace by simultaneously conducting feature selection and subspace learning; and 3) the clustering result constraint, i.e., the graph constructed by the intrinsic affinity matrix has exactly c connected components where c is the number of clusters. In this way, two affinity matrices and a transformation matrix are iteratively updated until achieving their individual optimum, so that these two affinity matrices are consistent and the intrinsic subspace is learnt via the transformation matrix. Experimental results on both synthetic and benchmark datasets verified that our proposed method produced more effective clustering results than previous clustering methods.

【Keywords】:

413. Multi-Kernel Low-Rank Dictionary Pair Learning for Multiple Features Based Image Classification.

Paper Link】 【Pages】:2970-2976

【Authors】: Xiaoke Zhu ; Xiao-Yuan Jing ; Fei Wu ; Di Wu ; Li Cheng ; Sen Li ; Ruimin Hu

【Abstract】: Dictionary learning (DL) is an effective feature learning technique, and has led to interesting results in many classification tasks. Recently, by combining DL with multiple kernel learning (which is a crucial and effective technique for combining different feature representation information), a few multi-kernel DL methods have been presented to solve the multiple feature representations based classification problem. However, how to improve the representation capability and discriminability of multi-kernel dictionary has not been well studied. In this paper, we propose a novel multi-kernel DL approach, named multi-kernel low-rank dictionary pair learning (MKLDPL). Specifically, MKLDPL jointly learns a kernel synthesis dictionary and a kernel analysis dictionary by exploiting the class label information. The learned synthesis and analysis dictionaries work together to implement the coding and reconstruction of samples in the kernel space. To enhance the discriminability of the learned multi-kernel dictionaries, MKLDPL imposes the low-rank regularization on the analysis dictionary, which can make samples from the same class have similar representations. We apply MKLDPL for multiple features based image classification task. Experimental results demonstrate the effectiveness of the proposed approach.

【Keywords】: Multiple kernel learning; Kernel dictionary pair learning; Low-rank regularization; Multiple features based image classification

414. Discover Multiple Novel Labels in Multi-Instance Multi-Label Learning.

Paper Link】 【Pages】:2977-2984

【Authors】: Yue Zhu ; Kai Ming Ting ; Zhi-Hua Zhou

【Abstract】: Multi-instance multi-label learning (MIML) is a learning paradigm where an object is represented by a bag of instances and each bag is associated with multiple labels. The ordinary MIML setting assumes a fixed target label set. In real applications, multiple novel labels may exist outside this set, but hidden in the training data and unknown to the MIML learner. Existing MIML approaches are unable to discover the hidden novel labels, let alone predict these labels in the previously unseen test data. In this paper, we propose the first approach to discover multiple novel labels in the MIML problem using an efficient augmented Lagrangian optimization, which has a bag-dependent loss term and a bag-independent clustering regularization term, enabling the known labels and multiple novel labels to be modeled simultaneously. The effectiveness of the proposed approach is validated in experiments.

【Keywords】:

Multiagent Systems 9

415. Improving Surveillance Using Cooperative Target Observation.

Paper Link】 【Pages】:2985-2991

【Authors】: Rashi Aswani ; Sai Krishna Munnangi ; Praveen Paruchuri

【Abstract】: The Cooperative Target Observation (CTO) problem has been of great interest in the multi-agents and robotics literature due to the problem being at the core of a number of applications including surveillance. In the CTO problem, the observer agents attempt to maximize the collective time during which each moving target is being observed by at least one observer in the area of interest. However, most of the prior works for the CTO problem consider the targets' movement to be randomized. Given our focus on the surveillance domain, we modify this assumption to make the targets strategic and present two target strategies, namely the Straight-line strategy and the Controlled Randomization strategy. We then modify the observer strategy proposed in the literature based on the K-means algorithm to introduce five variants and provide experimental validation. In the surveillance domain, it is often reasonable to assume that the observers may themselves be a subject of observation for a variety of purposes by unknown adversaries whose model may not be known. Randomizing the observers' actions can help to make their target observation strategy less predictable. As the fifth variant, we therefore introduce Adjustable Randomization into the best performing observer strategy where the observer can adjust the expected loss in reward due to randomization depending on the situation.

【Keywords】: Cooperative Target Observation; Surveillance Applications; K-means; Target Prediction; Memorization; BRLP-CTO; Adjustable Randomization

416. Query Complexity of Tournament Solutions.

Paper Link】 【Pages】:2992-2998

【Authors】: Palash Dey

【Abstract】: A directed graph where there is exactly one edge between every pair of vertices is called a tournament. Finding the “best” set of vertices of a tournament is a well studied problem in social choice theory. A tournament solution takes a tournament as input and outputs a subset of vertices of the input tournament. However, in many applications, for example, choosing the best set of drugs from a given set of drugs, the edges of the tournament are given only implicitly and knowing the orientation of an edge is costly. In such scenarios, we would like to know the best set of vertices (according to some tournament solution) by “querying” as few edges as possible. In this paper, we study precisely this problem for commonly used tournament solutions: given an oracle access to the edges of a tournament T, find f(T) by querying as few edges as possible, for a tournament solution f. We first show that the set of Condorcet non-losers in a tournament can be found by querying 2n−⌊log n⌋−2 edges only and this is tight in the sense that every algorithm for finding the set of Condorcet non-losers needs to query at least 2n−⌊log n⌋−2 edges in the worst case, where n is the number of vertices in the input tournament. We then move on to study other popular tournament solutions and show that any algorithm for finding the Copeland set, the Slater set, the Markov set, the bipartisan set, the uncovered set, the Banks set, and the top cycle must query Ω(n^2) edges in the worst case. On the positive side, we are able to circumvent our strong query complexity lower bound results by proving that, if the size of the top cycle of the input tournament is at most k, then we can find all the tournament solutions mentioned above by querying O(nk + n log n / log(1 − 1/k)) edges only.

【Keywords】: tournaments, query complexity, graph elusiveness, voting, social choice
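
The edge-query model in the abstract of entry 416 can be made concrete with a simple baseline. The sketch below is not the paper's algorithm and does not reach the 2n−⌊log n⌋−2 bound; it is a linear scan followed by verification that uses at most 2n−3 distinct edge queries for n ≥ 2. It assumes that a Condorcet loser is a vertex that loses to every other vertex and that the non-losers are all remaining vertices. The names `condorcet_non_losers` and `beats` are hypothetical.

```python
def condorcet_non_losers(n, beats):
    """Return (non_losers, number_of_distinct_edge_queries).

    beats(u, v) is the oracle: True if the edge between u and v is
    oriented u -> v (u beats v). Vertices are 0..n-1, with n >= 2.
    """
    cache = {}

    def query(u, v):
        # Each undirected pair is sent to the oracle at most once.
        key = (min(u, v), max(u, v))
        if key not in cache:
            cache[key] = beats(key[0], key[1])
        return cache[key] if u == key[0] else not cache[key]

    # Scan: a candidate that wins a match cannot be the loser, so the
    # vertex it just beat becomes the new candidate.
    cand = 0
    for v in range(1, n):
        if query(cand, v):
            cand = v

    # Verify that the surviving candidate loses to every other vertex.
    is_loser = all(not query(cand, v) for v in range(n) if v != cand)
    non_losers = [v for v in range(n) if not (is_loser and v == cand)]
    return non_losers, len(cache)

# Transitive tournament: the higher index always wins, so vertex 0 is the loser.
print(condorcet_non_losers(5, lambda u, v: u > v))

# 3-cycle 0 -> 1 -> 2 -> 0: no Condorcet loser exists.
cycle = {(0, 1), (1, 2), (2, 0)}
print(condorcet_non_losers(3, lambda u, v: (u, v) in cycle))
```

Caching matters for the query count: the verification pass reuses every answer already obtained during the scan, so only the pairs not yet seen cost a new oracle call.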

417. Centralized versus Personalized Commitments and Their Influence on Cooperation in Group Interactions.

Paper Link】 【Pages】:2999-3005

【Authors】: The Anh Han ; Luís Moniz Pereira ; Luis A. Martinez-Vaquero ; Tom Lenaerts

【Abstract】: Before engaging in a group venture agents may seek commitments from other members in the group and, based on the level of participation (i.e. the number of actually committed participants), decide whether it is worth joining the venture. Alternatively, agents can delegate this costly process to a (beneficent or non-costly) third-party, who helps seek commitments from the agents. Using methods from Evolutionary Game Theory, this paper shows that, in the context of Public Goods Game, much higher levels of cooperation can be achieved through such centralized commitment management. It provides a more efficient mechanism for dealing with commitment free-riders, those who are not willing to bear the cost of arranging commitments whilst enjoying the benefits provided by the paying commitment proposers. We show that the participation level plays a crucial role in the decision of whether an agreement should be formed; namely, it needs to be more strict in terms of the level of participation required from players of the centralized system for the agreement to be formed; however, once it is done right, it is much more beneficial in terms of the level of cooperation and social welfare achieved. In short, our analysis provides important insights for the design of multi-agent systems that rely on commitments to monitor agents' cooperative behavior.

【Keywords】: Commitment; Cooperation; Evolutionary Game Theory; Public Goods

418. Kont: Computing Tradeoffs in Normative Multiagent Systems.

Paper Link】 【Pages】:3006-3012

【Authors】: Özgür Kafali ; Nirav Ajmeri ; Munindar P. Singh

【Abstract】: We propose Kont, a formal framework for comparing normative multiagent systems (nMASs) by computing tradeoffs among liveness (something good happens) and safety (nothing bad happens). Safety-focused nMASs restrict agents' actions to avoid undesired enactments. However, such restrictions hinder liveness, particularly in situations such as medical emergencies. We formalize tradeoffs using norms, and develop an approach for understanding to what extent an nMAS promotes liveness or safety. We propose patterns to guide the design of an nMAS with respect to liveness and safety, and prove their correctness. We further quantify liveness and safety using heuristic metrics for an emergency healthcare application. We show that the results of the application corroborate our theoretical development.

【Keywords】: Norms; Liveness and safety

419. Parameterised Verification of Infinite State Multi-Agent Systems via Predicate Abstraction.

Paper Link】 【Pages】:3013-3020

【Authors】: Panagiotis Kouvaros ; Alessio Lomuscio

【Abstract】: We define a class of parameterised infinite state multi-agent systems (MAS) that is unbounded in both the number of agents composing the system and the domain of the variables encoding the agents. We analyse their verification problem by combining and extending existing techniques in parameterised model checking with predicate abstraction procedures. The resulting methodology addresses both forms of unboundedness and provides a technique for verifying unbounded MAS defined on infinite-state variables. We illustrate the effectiveness of the technique on an infinite-domain variant of an unbounded version of the train-gate-controller.

【Keywords】: Parameterised Verification; Predicate Abstraction; Cutoffs

420. Decentralized Planning in Stochastic Environments with Submodular Rewards.

Paper Link】 【Pages】:3021-3028

【Authors】: Rajiv Ranjan Kumar ; Pradeep Varakantham ; Akshat Kumar

【Abstract】: Decentralized Markov Decision Process (Dec-MDP) provides a rich framework to represent cooperative decentralized and stochastic planning problems under transition uncertainty. However, solving a Dec-MDP to generate coordinated yet decentralized policies is NEXP-Hard. Researchers have made significant progress in providing approximate approaches to improve scalability with respect to the number of agents. However, there has been little or no research devoted to finding guarantees on solution quality for approximate approaches considering multiple (more than 2) agents. We have a similar situation with respect to the competitive decentralized planning problem and the Stochastic Game (SG) model. To address this, we identify models in the cooperative and competitive case that rely on submodular rewards, where we show that existing approximate approaches can provide strong quality guarantees (a priori, and for the cooperative case also posteriori guarantees). We then provide solution approaches and demonstrate improved online guarantees on benchmark problems from the literature for the cooperative case.

【Keywords】: Multiagent Systems; Planning under uncertainty

421. Solving Seven Open Problems of Offline and Online Control in Borda Elections.

Paper Link】 【Pages】:3029-3035

【Authors】: Marc Neveling ; Jörg Rothe

【Abstract】: Standard (offline) control scenarios in elections (such as adding, deleting, or partitioning either voters or candidates) have been studied for many voting systems, natural and less natural ones, and the related control problems have been classified in terms of their complexity. However, for one of the most important natural voting systems, the Borda Count, only a few such complexity results are known. We reduce the number of missing cases by pinpointing the complexity of three control scenarios for Borda elections, including some that arguably are among the practically most relevant ones. We also study online candidate control, an interesting dynamical, partial-information model due to Hemaspaandra et al. (2012a), who mainly focused on general complexity bounds by constructing artificial voting systems—only recently they succeeded in classifying four problems of online candidate control for one natural voting system: sequential plurality (Hemaspaandra et al. 2016). We settle the complexity of another four natural cases: constructive and destructive online control by deleting and adding candidates in sequential Borda elections.

【Keywords】: computational social choice; voting; control; Borda election

422. Collective Multiagent Sequential Decision Making Under Uncertainty.

Paper Link】 【Pages】:3036-3043

【Authors】: Duc Thien Nguyen ; Akshat Kumar ; Hoong Chuin Lau

【Abstract】: Multiagent sequential decision making has seen rapid progress with formal models such as decentralized MDPs and POMDPs. However, scalability to large multiagent systems and applicability to real world problems remain limited. To address these challenges, we study multiagent planning problems where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our work exploits recent advances in graphical models for modeling and inference with a population of individuals such as collective graphical models and the notion of finite partial exchangeability in lifted inference. We develop a collective decentralized MDP model where policies can be computed based on counts of agents in different states. As the policy search space over counts is combinatorial, we develop a sampling based framework that can compute open and closed loop policies. Comparisons with previous best approaches on synthetic instances and a real world taxi dataset modeling supply-demand matching show that our approach significantly outperforms them w.r.t. solution quality.

【Keywords】:

423. Nurturing Group-Beneficial Information-Gathering Behaviors Through Above-Threshold Criteria Setting.

Paper Link】 【Pages】:3044-3052

【Authors】: Igor Rochlin ; David Sarne ; Maytal Bremer ; Ben Grynhaus

【Abstract】: This paper studies a criteria-based mechanism for nurturing and enhancing agents' group-benefiting individual efforts whenever the agents are self-interested. The idea is that only those agents that meet the criteria get to benefit from the group effort, giving an incentive to contribute even when it is otherwise individually irrational. Specifically, the paper provides a comprehensive equilibrium analysis of a threshold-based criteria mechanism for the common cooperative information gathering application, where the criteria is set such that only those whose contribution to the group is above some pre-specified threshold can benefit from the contributions of others. The analysis results in a closed form solution for the strategies to be used in equilibrium and facilitates the numerical investigation of different model properties as well as a comparison to the dual mechanism according to which only an agent whose contribution is below the specified threshold gets to benefit from the contributions of others. One important contribution enabled through the analysis provided is in showing that, counter-intuitively, for some settings the use of the above-threshold criteria is outperformed by the use of the below-threshold criteria as far as collective and individual performance is concerned.

【Keywords】: Multi-Agent Exploration; Self-Interested Agents; Cooperation; Teamwork; Economically-Motivated Agents

Natural Language Processing and Knowledge Representation 11

424. Improving Multi-Document Summarization via Text Classification.

Paper Link】 【Pages】:3053-3059

【Authors】: Ziqiang Cao ; Wenjie Li ; Sujian Li ; Furu Wei

【Abstract】: Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that TCSum can achieve state-of-the-art performance without using any hand-crafted features and has the capability to catch the variations of summary styles with respect to different text categories.

【Keywords】: summarization; text classification; deep neural network

425. Distant Supervision for Relation Extraction with Sentence-Level Attention and Entity Descriptions.

Paper Link】 【Pages】:3060-3066

【Authors】: Guoliang Ji ; Kang Liu ; Shizhu He ; Jun Zhao

【Abstract】: Distant supervision for relation extraction is an efficient method to scale relation extraction to very large corpora which contain thousands of relations. However, existing approaches have flaws in selecting valid instances and lack background knowledge about the entities. In this paper, we propose a sentence-level attention model to select the valid instances, which makes full use of the supervision information from knowledge bases. We also extract entity descriptions from Freebase and Wikipedia pages to supplement background knowledge for our task. The background knowledge not only provides more information for predicting relations, but also brings better entity representations for the attention module. We conduct three experiments on a widely used dataset and the experimental results show that our approach outperforms all the baseline systems significantly.

【Keywords】:

426. Neural Bag-of-Ngrams.

Paper Link】 【Pages】:3067-3074

【Authors】: Bofang Li ; Tao Liu ; Zhe Zhao ; Puwei Wang ; Xiaoyong Du

【Abstract】: Bag-of-ngrams (BoN) models are commonly used for representing text. One of the main drawbacks of traditional BoN is the ignorance of n-gram's semantics. In this paper, we introduce the concept of Neural Bag-of-ngrams (Neural-BoN), which replaces sparse one-hot n-gram representation in traditional BoN with dense and rich-semantic n-gram representations. We first propose context guided n-gram representation by adding n-grams to word embeddings model. However, the context guided learning strategy of word embeddings is likely to miss some semantics for text-level tasks. Text guided n-gram representation and label guided n-gram representation are proposed to capture more semantics like topic or sentiment tendencies. Neural-BoN with the latter two n-gram representations achieve state-of-the-art results on 4 document-level classification datasets and 6 semantic relatedness categories. They are also on par with some sophisticated DNNs on 3 sentence-level classification datasets. Similar to traditional BoN, Neural-BoN is efficient, robust and easy to implement. We expect it to be a strong baseline and be used in more real-world applications.

【Keywords】:

427. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents.

Paper Link】 【Pages】:3075-3081

【Authors】: Ramesh Nallapati ; Feifei Zhai ; Bowen Zhou

【Abstract】: We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.

【Keywords】: summarization; extractive; neural networks; deep learning

428. Unit Dependency Graph and Its Application to Arithmetic Word Problem Solving.

Paper Link】 【Pages】:3082-3088

【Authors】: Subhro Roy ; Dan Roth

【Abstract】: Math word problems provide a natural abstraction to a range of natural language understanding problems that involve reasoning about quantities, such as interpreting election results, news about casualties, and the financial section of a newspaper. Units associated with the quantities often provide information that is essential to support this reasoning. This paper proposes a principled way to capture and reason about units and shows how it can benefit an arithmetic word problem solver. This paper presents the concept of Unit Dependency Graphs (UDGs), which provides a compact representation of the dependencies between units of numbers mentioned in a given problem. Inducing the UDG alleviates the brittleness of the unit extraction system and allows for a natural way to leverage domain knowledge about unit compatibility for word problem solving. We introduce a decomposed model for inducing UDGs with minimal additional annotations, and use it to augment the expressions used in the arithmetic word problem solver of (Roy and Roth 2015) via a constrained inference framework. We show that the introduction of UDGs reduces the error of the solver by over 10%, surpassing all existing systems for solving arithmetic word problems. In addition, it also makes the system more robust to adaptation to new vocabulary and equation forms.

【Keywords】: Math word problem solving; Knowledge representation; Question answering; Quantitative reasoning

429. Prerequisite Skills for Reading Comprehension: Multi-Perspective Analysis of MCTest Datasets and Systems.

Paper Link】 【Pages】:3089-3096

【Authors】: Saku Sugawara ; Hikaru Yokono ; Akiko Aizawa

【Abstract】: One of the main goals of natural language processing (NLP) is synthetic understanding of natural language documents, especially reading comprehension (RC). An obstacle to the further development of RC systems is the absence of a synthetic methodology to analyze their performance. It is difficult to examine the performance of systems based solely on their results for tasks because the process of natural language understanding is complex. In order to tackle this problem, we propose in this paper a methodology inspired by unit testing in software engineering that enables the examination of RC systems from multiple aspects. Our methodology consists of three steps. First, we define a set of prerequisite skills for RC based on existing NLP tasks. We assume that RC capability can be divided into these skills. Second, we manually annotate a dataset for an RC task with information regarding the skills needed to answer each question. Finally, we analyze the performance of RC systems for each skill based on the annotation. The last two steps highlight two aspects: the characteristics of the dataset, and the weaknesses in and differences among RC systems. We tested the effectiveness of our methodology by annotating the Machine Comprehension Test (MCTest) dataset and analyzing four existing systems (including a neural system) on it. The results of the annotations showed that answering questions requires a combination of skills, and clarified the kinds of capabilities that systems need to understand natural language. We conclude that the set of prerequisite skills we define are promising for the decomposition and analysis of RC.

【Keywords】: Reading Comprehension

430. Neural Machine Translation with Reconstruction.

Paper Link】 【Pages】:3097-3103

【Authors】: Zhaopeng Tu ; Yang Liu ; Lifeng Shang ; Xiaohua Liu ; Hang Li

【Abstract】: Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring other words. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, manages to reconstruct the input source sentence from the hidden layer of the output target sentence, to ensure that the information in the source side is transformed to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.

【Keywords】: neural machine translation, reconstruction, adequacy

431. SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions.

Paper Link】 【Pages】:3104-3110

【Authors】: Han Xiao ; Minlie Huang ; Lian Meng ; Xiaoyan Zhu

【Abstract】: Knowledge graph embedding represents entities and relations in a knowledge graph as low-dimensional, continuous vectors, and thus makes knowledge graphs compatible with machine learning models. Though there have been a variety of models for knowledge graph embedding, most methods merely concentrate on the fact triples, while supplementary textual descriptions of entities and relations have not been fully employed. To this end, this paper proposes the semantic space projection (SSP) model which jointly learns from the symbolic triples and textual descriptions. Our model builds interaction between the two information sources, and employs textual descriptions to discover semantic relevance and offer precise semantic embedding. Extensive experiments show that our method achieves substantial improvements against baselines on the tasks of knowledge graph completion and entity classification.

【Keywords】: Knowledge Graph; Representation Learning; Semantic Analysis; Textual Information

432. Efficiently Answering Technical Questions - A Knowledge Graph Approach.

Paper Link】 【Pages】:3111-3118

【Authors】: Shuo Yang ; Lei Zou ; Zhongyuan Wang ; Jun Yan ; Ji-Rong Wen

【Abstract】: More and more users prefer to ask their technical questions online. For machines, understanding a question is nontrivial. Current approaches lack explicit background knowledge. In this paper, we introduce a novel technical question understanding approach to recommending probable solutions to users. First, a knowledge graph is constructed which contains abundant technical information, and an augmented knowledge graph is built on the basis of the knowledge graph, to link the knowledge graph and documents. Then we develop a lightweight question-driven mechanism to select candidate documents. To improve the online performance, we propose an index-based random walk to support the online search. We use comprehensive experiments to evaluate the effectiveness of our approach on a large set of real-world query logs. Our system outperforms a mainstream search engine and state-of-the-art information retrieval methods. Meanwhile, extensive experiments confirm the efficiency of our index-based online search mechanism.

【Keywords】:

433. Incorporating Knowledge Graph Embeddings into Topic Modeling.

Paper Link】 【Pages】:3119-3126

【Authors】: Liang Yao ; Yin Zhang ; Baogang Wei ; Zhe Jin ; Rui Zhang ; Yangyang Zhang ; Qinfei Chen

【Abstract】: Probabilistic topic models can be used to extract low-dimensional topics from document collections. However, such models without any human knowledge often produce topics that are not interpretable. In recent years, a number of knowledge-based topic models have been proposed, but they cannot process fact-oriented triple knowledge in knowledge graphs. Knowledge graph embeddings, on the other hand, automatically capture relations between entities in knowledge graphs. In this paper, we propose a novel knowledge-based topic model by incorporating knowledge graph embeddings into topic modeling. By combining latent Dirichlet allocation, a widely used topic model, with knowledge encoded by entity vectors, we improve the semantic coherence significantly and capture a better representation of a document in the topic space. Our evaluation results demonstrate the effectiveness of our method.

【Keywords】:

434. A Context-Enriched Neural Network Method for Recognizing Lexical Entailment.

Paper Link】 【Pages】:3127-3134

【Authors】: Kun Zhang ; Enhong Chen ; Qi Liu ; Chuanren Liu ; Guangyi Lv

【Abstract】: Recognizing lexical entailment (RLE) always plays an important role in inference of natural language, i.e., identifying whether one word entails another, for example, fox entails animal. In the literature, automatically recognizing lexical entailment for word pairs deeply relies on words' contextual representations. However, as a "prototype" vector, a single representation cannot reveal multifaceted aspects of the words due to their homonymy and polysemy. In this paper, we propose a supervised Context-Enriched Neural Network (CENN) method for recognizing lexical entailment. To be specific, we first utilize multiple embedding vectors from different contexts to represent the input word pairs. Then, through different combination methods and attention mechanism, we integrate different embedding vectors and optimize their weights to predict whether there are entailment relations in word pairs. Moreover, our proposed framework is flexible and open to handle different word contexts and entailment perspectives in the text corpus. Extensive experiments on five datasets show that our approach significantly improves the performance of automatic RLE in comparison with several state-of-the-art methods.

【Keywords】: Recognizing lexical entailment; Context-enriched; Neural network

Natural Language Processing and Machine Learning 38

435. Bayesian Neural Word Embedding.

Paper Link】 【Pages】:3135-3143

【Authors】: Oren Barkan

【Abstract】: Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram with negative sampling, known also as word2vec, advanced the state-of-the-art of various linguistics tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm. The algorithm relies on a Variational Bayes solution for the Skip-Gram objective and a detailed step by step description is provided. We present experimental results that demonstrate the performance of the proposed algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method.

【Keywords】: Bayesian Skip-Gram; Bayesian Word Embedding; Neural Word Embedding
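
For orientation, the standard Skip-Gram with negative sampling loss for a single (target, context) pair can be written as the short sketch below. This is the usual point-estimate objective, not the Variational Bayes algorithm the abstract proposes; the function and argument names are illustrative assumptions.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sgns_loss(target_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one observed (target, context) pair.

    The observed context word should score high against the target,
    while each sampled negative context vector should score low.
    """
    loss = -log_sigmoid(dot(target_vec, context_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(target_vec, neg))
    return loss
```

A Bayesian treatment would place distributions over the vectors rather than learning single points; that part is not sketched here.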

436. Improving Word Embeddings with Convolutional Feature Learning and Subword Information.

Paper Link】 【Pages】:3144-3151

【Authors】: Shaosheng Cao ; Wei Lu

【Abstract】: We present a novel approach to learning word embeddings by exploring subword information (character n-gram, root/affix and inflections) and capturing the structural information of their context with convolutional feature learning. Specifically, we introduce a convolutional neural network architecture that allows us to measure structural information of context words and incorporate subword features conveying semantic, syntactic and morphological information related to the words. To assess the effectiveness of our model, we conduct extensive experiments on the standard word similarity and word analogy tasks. We show improvements over existing state-of-the-art methods for learning word embeddings, including skipgram, GloVe, char n-gram and DSSM.

【Keywords】:

437. Joint Copying and Restricted Generation for Paraphrase.

Paper Link】 【Pages】:3152-3158

【Authors】: Ziqiang Cao ; Chuwei Luo ; Wenjie Li ; Sujian Li

【Abstract】: Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we develop a novel Seq2Seq model to fuse a copying decoder and a restricted generative decoder. The copying decoder finds the position to be copied based on a typical attention model. The generative decoder produces words limited in the source-specific vocabulary. To combine the two decoders and determine the final output, we develop a predictor to predict the mode of copying or rewriting. This predictor can be guided by the actual writing mode in the training data. We conduct extensive experiments on two different paraphrase datasets. The result shows that our model outperforms the state-of-the-art approaches in terms of both informativeness and language quality.

【Keywords】: paraphrase; seq2seq; copy; rewrite

438. Unsupervised Learning of Evolving Relationships Between Literary Characters.

Paper Link】 【Pages】:3159-3165

【Authors】: Snigdha Chaturvedi ; Mohit Iyyer ; Hal Daumé III

【Abstract】: Understanding inter-character relationships is fundamental for understanding character intentions and goals in a narrative. This paper addresses unsupervised modeling of relationships between characters. We model relationships as a dynamic phenomenon, represented as evolving sequences of latent states empirically learned from data. Unlike most previous work, our approach is completely unsupervised. This enables data-driven inference of inter-character relationship types beyond simple sentiment polarities, by incorporating lexical and semantic representations, and leveraging large quantities of raw text. We present three models based on rich sets of linguistic features that capture various cues about relationships. We compare these models with existing techniques and also demonstrate that relationship categories learned by our model are semantically coherent.

【Keywords】: inter-personal relationships, Unsupervised methods, Markov models, Computational Narratives

439. Translation Prediction with Source Dependency-Based Context Representation.

Paper Link】 【Pages】:3166-3172

【Authors】: Kehai Chen ; Tiejun Zhao ; Muyun Yang ; Lemao Liu

【Abstract】: Learning context representations is very promising to improve translation results, particularly through neural networks. Previous efforts process the context words sequentially and neglect their internal syntactic structure. In this paper, we propose a novel neural network based on bi-convolutional architecture to represent the source dependency-based context for translation prediction. The proposed model is able to not only encode the long-distance dependencies but also capture the functional similarities for better translation prediction (i.e., ambiguous words translation and word forms translation). Examined by a large-scale Chinese-English translation task, the proposed approach achieves a significant improvement (of up to +1.9 BLEU points) over the baseline system, and meanwhile outperforms a number of context-enhanced comparison systems.

【Keywords】: translation prediction; source dependency; neural Network; context representation; statistical machine translation

440. Maximum Reconstruction Estimation for Generative Latent-Variable Models.

Paper Link】 【Pages】:3173-3179

【Authors】: Yong Cheng ; Yang Liu ; Wei Xu

【Abstract】: Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.

【Keywords】: maximum reconstruction estimation

441. Incorporating Expert Knowledge into Keyphrase Extraction.

Paper Link】 【Pages】:3180-3187

【Authors】: Sujatha Das Gollapalli ; Xiao-Li Li ; Peng Yang

【Abstract】: Keyphrases that efficiently summarize a document’s content are used in various document processing and retrieval tasks. Current state-of-the-art techniques for keyphrase extraction operate at a phrase-level and involve scoring candidate phrases based on features of their component words. In this paper, we learn keyphrase taggers for research papers using token-based features incorporating linguistic, surface-form, and document-structure information through sequence labeling. We experimentally illustrate that using within-document features alone, our tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we are also able to harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems to enhance the extraction performance further. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently-compiled datasets of keyphrases from Computer Science research papers.

【Keywords】: keyphrase extraction; conditional random fields; feature labeling

442. Unsupervised Learning for Lexicon-Based Classification.

Paper Link】 【Pages】:3188-3194

【Authors】: Jacob Eisenstein

【Abstract】: In lexicon-based classification, documents are assigned labels by comparing the number of words that appear from two opposed lexicons, such as positive and negative sentiment. Creating such word lists is often easier than labeling instances, and they can be debugged by non-experts if classification performance is unsatisfactory. However, there is little analysis or justification of this classification heuristic. This paper describes a set of assumptions that can be used to derive a probabilistic justification for lexicon-based classification, as well as an analysis of its expected accuracy. One key assumption behind lexicon-based classification is that all words in each lexicon are equally predictive. This is rarely true in practice, which is why lexicon-based approaches are usually outperformed by supervised classifiers that learn distinct weights on each word from labeled instances. This paper shows that it is possible to learn such weights without labeled data, by leveraging co-occurrence statistics across the lexicons. This offers the best of both worlds: light supervision in the form of lexicons, and data-driven classification with higher accuracy than traditional word-counting heuristics.

【Keywords】: text classification; machine learning
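
The word-counting heuristic this abstract analyzes can be sketched in a few lines. The function name, the example lexicons, and the optional `weights` argument are illustrative assumptions; in particular, the sketch only shows where per-word weights would enter, not how the paper learns them from co-occurrence statistics.

```python
def lexicon_classify(tokens, positive, negative, weights=None):
    """Label a document by comparing hits from two opposed lexicons.

    With `weights=None` every lexicon word counts equally (the classic
    heuristic). A hypothetical dict of per-word weights can be passed
    to make some words more predictive than others.
    """
    weights = weights or {}
    pos_score = sum(weights.get(t, 1.0) for t in tokens if t in positive)
    neg_score = sum(weights.get(t, 1.0) for t in tokens if t in negative)
    if pos_score > neg_score:
        return "positive"
    if neg_score > pos_score:
        return "negative"
    return "tie"


POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "boring"}
```

With equal weights, a document containing one word from each lexicon is a tie; down-weighting a weakly predictive word is enough to break it.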

443. Open-Vocabulary Semantic Parsing with both Distributional Statistics and Formal Knowledge.

Paper Link】 【Pages】:3195-3201

【Authors】: Matt Gardner ; Jayant Krishnamurthy

【Abstract】: Traditional semantic parsers map language onto compositional, executable queries in a fixed schema. This mapping allows them to effectively leverage the information contained in large, formal knowledge bases (KBs, e.g., Freebase) to answer questions, but it is also fundamentally limiting---these semantic parsers can only assign meaning to language that falls within the KB's manually-produced schema. Recently proposed methods for open vocabulary semantic parsing overcome this limitation by learning execution models for arbitrary language, essentially using a text corpus as a kind of knowledge base. However, all prior approaches to open vocabulary semantic parsing replace a formal KB with textual information, making no use of the KB in their models. We show how to combine the disparate representations used by these two approaches, presenting for the first time a semantic parser that (1) produces compositional, executable representations of language, (2) can successfully leverage the information contained in both a formal KB and a large corpus, and (3) is not limited to the schema of the underlying KB. We demonstrate significantly improved performance over state-of-the-art baselines on an open-domain natural language question answering task.

【Keywords】: semantic parsing; question answering; knowledge bases

444. Geometry of Compositionality.

Paper Link】 【Pages】:3202-3208

【Authors】: Hongyu Gong ; Suma Bhat ; Pramod Viswanath

【Abstract】: This paper proposes a simple test for compositionality (i.e., literal usage) of a word or phrase in a context-specific way. The test is computationally simple, relying on no external resources and only uses a set of trained word vectors. Experiments show that the proposed method is competitive with state of the art and displays high accuracy in context-specific compositionality detection of a variety of natural language phenomena (idiomaticity, sarcasm, metaphor) for different datasets in multiple languages. The key insight is to connect compositionality to a curious geometric property of word embeddings, which is of independent interest.

【Keywords】: Computation; Language; Compositionality

445. Disambiguating Spatial Prepositions Using Deep Convolutional Networks.

Paper Link】 【Pages】:3209-3215

【Authors】: Kaveh Hassani ; Won-Sook Lee

【Abstract】: We address the coarse-grained disambiguation of the spatial prepositions as the first step towards spatial role labeling using deep learning models. We propose a hybrid feature of word embeddings and linguistic features, and compare its performance against a set of linguistic features, pre-trained word embeddings, and corpus-trained embeddings using seven classical machine learning classifiers and two deep learning models. We also compile a dataset of 43,129 sample sentences from Pattern Dictionary of English Prepositions (PDEP). The comprehensive experimental results suggest that the combination of the hybrid feature and a convolutional neural network outperforms state-of-the-art methods and reaches the accuracy of 94.21% and F1-score of 0.9398.

【Keywords】: word sense disambiguation; spatial relations; deep learning

446. A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media.

Paper Link】 【Pages】:3216-3222

【Authors】: Hangfeng He ; Xu Sun

【Abstract】: Named entity recognition (NER) in Chinese social media is important but difficult because of its informality and strong noise. Previous methods only focus on in-domain supervised learning which is limited by the rare annotated data. However, there are enough corpora in formal domains and massive in-domain unannotated texts which can be used to improve the task. We propose a unified model which can learn from out-of-domain corpora and in-domain unannotated texts. The unified model contains two major functions. One is for cross-domain learning and another for semi-supervised learning. Cross-domain learning function can learn out-of-domain information based on domain similarity. Semi-Supervised learning function can learn in-domain unannotated information by self-training. Both learning functions outperform existing methods for NER in Chinese social media. Finally, our unified model yields nearly 11% absolute improvement over previously published results.

【Keywords】: Semi-Supervised; Cross-Domain; Named Entity Recognition (NER)

447. Recurrent Attentional Topic Model.

Paper Link】 【Pages】:3223-3229

【Authors】: Shuangyin Li ; Yu Zhang ; Rong Pan ; Mingzhi Mao ; Yang Yang

【Abstract】: In a document, the topic distribution of a sentence depends on both the topics of preceding sentences and its own content, and it is usually affected by the topics of the preceding sentences with different weights. It is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a Recurrent Attentional Topic Model (RATM) for document embedding. The RATM not only takes advantage of the sequential order among sentences but also uses the attention mechanism to model the relations among successive sentences. In RATM, we propose a Recurrent Attentional Bayesian Process (RABP) to handle the sequences. Based on the RABP, RATM fully utilizes the sequential information of the sentences in a document. Experiments on two corpora show that our model outperforms state-of-the-art methods on document modeling and classification.

【Keywords】: Recurrent; Attention; Topic Model

448. Representations of Context in Recognizing the Figurative and Literal Usages of Idioms.

Paper Link】 【Pages】:3230-3236

【Authors】: Changsheng Liu ; Rebecca Hwa

【Abstract】: Many idiomatic expressions can be interpreted literally or figuratively, depending on the context in which they occur. Developing an appropriate computational model of the context is crucial for automatic idiom usage recognition. While many existing methods incorporate some elements of context, they have not sufficiently captured the interactions between the linguistic properties of idiomatic expressions and the representations of the context. In this paper we perform an in-depth exploration of the role of representations of the context for idiom usage recognition; we highlight the advantages and limitations of different representation choices in existing methods in terms of known linguistic properties of idioms; we then propose a supervised ensemble method that selects representations adaptively for different idioms. Experimental result suggests that the proposed method performs better for a wider range of idioms than previous methods.

【Keywords】: figurative language; idiom; semantic representation

449. Deterministic Attention for Sequence-to-Sequence Constituent Parsing.

Paper Link】 【Pages】:3237-3243

【Authors】: Chunpeng Ma ; Lemao Liu ; Akihiro Tamura ; Tiejun Zhao ; Eiichiro Sumita

【Abstract】: The sequence-to-sequence model is proven to be extremely successful in constituent parsing. It relies on one key technique, the probabilistic attention mechanism, to automatically select the context for prediction. Despite its successes, the probabilistic attention model does not always select the most important context. For example, the headword and boundary words of a subtree have been shown to be critical when predicting the constituent label of the subtree, but this contextual information becomes increasingly difficult to learn as the length of the sequence increases. In this study, we proposed a deterministic attention mechanism that deterministically selects the important context and is not affected by the sequence length. We implemented two different instances of this framework. When combined with a novel bottom-up linearization method, our parser demonstrated better performance than that achieved by the sequence-to-sequence parser with probabilistic attention mechanism.

【Keywords】:

450. S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions.

Paper Link】 【Pages】:3244-3251

【Authors】: Xianling Mao ; Bo-Si Feng ; Yi-Jing Hao ; Liqiang Nie ; Heyan Huang ; Guihua Wen

【Abstract】: To compare the similarity of probability distributions, information-theoretically motivated metrics like Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable than metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for S2JSD-distance, and then propose a novel LSH scheme adapted to S2JSD-distance for approximate nearest neighbors search in high-dimensional probability distributions. We define the specific hashing functions, and prove their locality-sensitivity. Furthermore, extensive empirical evaluations illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.

【Keywords】: LSH; Probability distribution; Hashing
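
As a reference point for the distances named in this abstract, the sketch below computes KL, JSD, and an S2JSD-style distance exactly for small discrete distributions. It assumes that S2JSD denotes the square root of twice the Jensen-Shannon divergence (natural logarithm); the abstract itself does not spell out the definition, and neither the paper's approximation formula nor its hash functions are reproduced here.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def s2jsd(p, q):
    """Assumed definition: sqrt(2 * JSD(p, q)); clamps tiny negative rounding."""
    return math.sqrt(2 * max(jsd(p, q), 0.0))
```

Unlike KL, JSD is symmetric and always finite, which is one reason a distance built on it is a natural target for a hashing scheme.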

451. Coherent Dialogue with Attention-Based Language Models.

Paper Link】 【Pages】:3252-3258

【Authors】: Hongyuan Mei ; Mohit Bansal ; Matthew R. Walter

【Abstract】: We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism. Our attention-RNN language model dynamically increases the scope of attention on the history as the conversation continues, as opposed to standard attention (or alignment) models with a fixed input scope in a sequence-to-sequence model. This allows each generated word to be associated with the most relevant words in its corresponding conversation history. We evaluate the model on two popular dialogue datasets, the open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot dataset, and achieve significant improvements over the state-of-the-art and baselines on several metrics, including complementary diversity-based metrics, human evaluation, and qualitative visualizations. We also show that a vanilla RNN with dynamic attention outperforms more complex memory models (e.g., LSTM and GRU) by allowing for flexible, long-distance memory. We promote further coherence via topic modeling-based reranking.

【Keywords】: Coherent dialogue; Dialogue system; Neural attention; Neural language model; Neural dialogue model

452. Definition Modeling: Learning to Define Word Embeddings in Natural Language.

Paper Link】 【Pages】:3259-3266

【Authors】: Thanapon Noraset ; Chen Liang ; Larry Birnbaum ; Doug Downey

【Abstract】: Distributed representations of words have been shown to capture lexical semantics, based on their effectiveness in word similarity and analogical relation tasks. But, these tasks only evaluate lexical semantics indirectly. In this paper, we study whether it is possible to utilize distributed representations to generate dictionary definitions of words, as a more direct and transparent representation of the embeddings' semantics. We introduce definition modeling, the task of generating a definition for a given word and its embedding. We present different definition model architectures based on recurrent neural networks, and experiment with the models over multiple data sets. Our results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer that leverages morphology can complement word-level embeddings. Our analysis reveals which components of our models contribute to accuracy. Finally, the errors made by a definition model may provide insight into the shortcomings of word embeddings.

【Keywords】: word embedding; recurrent neural network; natural language generation; dictionary definition; semantics

453. Incrementally Learning the Hierarchical Softmax Function for Neural Language Models.

Paper Link】 【Pages】:3267-3273

【Authors】: Hao Peng ; Jianxin Li ; Yangqiu Song ; Yaopeng Liu

【Abstract】: Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we present a training method that can incrementally train the hierarchical softmax function for NNMLs. We split the cost function to model old and update corpora separately, and factorize the objective function for the hierarchical softmax. Then we provide a new stochastic gradient based method to update all the word vectors and parameters, by comparing the old tree generated based on the old corpus and the new tree generated based on the combined (old and update) corpus. Theoretical analysis shows that the mean square error of the parameter vectors can be bounded by a function of the number of changed words related to the parameter node. Experimental results show that incremental training can save a lot of time. The smaller the update corpus is, the faster the update training process is, where an up to 30 times speedup has been achieved. We also use both word similarity/relatedness tasks and dependency parsing task as our benchmarks to evaluate the correctness of the updated word vectors.

【Keywords】: Incremental Learning; Word Representation; CBOW; Skip-gram

454. Condensed Memory Networks for Clinical Diagnostic Inferencing.

Paper Link】 【Pages】:3274-3280

【Authors】: Aaditya Prakash ; Siyuan Zhao ; Sadid A. Hasan ; Vivek Datla ; Kathy Lee ; Ashequl Qadir ; Joey Liu ; Oladimeji Farri

【Abstract】: Diagnosis of a clinical condition is a challenging task, which often requires significant medical investigation. Previous work related to diagnostic inferencing problems mostly considers multivariate observational data (e.g. physiological signals, lab tests etc.). In contrast, we explore the problem using free-text medical notes recorded in an electronic health record (EHR). Complex tasks like these can benefit from structured knowledge bases, but those are not scalable. We instead exploit raw text from Wikipedia as a knowledge source. Memory networks have been demonstrated to be effective in tasks which require comprehension of free-form text. They use the final iteration of the learned representation to predict probable classes. We introduce condensed memory neural networks (C-MemNNs), a novel model with iterative condensation of memory representations that preserves the hierarchy of features in the memory. Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks to predict the most probable diagnoses given a complex clinical scenario.

【Keywords】: Memory Networks; Neural Networks for Medicine; Key-Value Memory Networks; Condensed Memory Networks; Diagnostic Inferencing

455. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network.

Paper Link】 【Pages】:3281-3287

【Authors】: Keisuke Sakaguchi ; Kevin Duh ; Matt Post ; Benjamin Van Durme

【Abstract】: Language processing mechanism by humans is generally more robust than computers. The Cmabrigde Uinervtisy (Cambridge University) effect from the psycholinguistics literature has demonstrated such a robust word processing mechanism, where jumbled words (e.g. Cmabrigde / Cambridge) are recognized with little cost. On the other hand, computational models for word recognition (e.g. spelling checkers) perform poorly on data with such noise. Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN). In our experiments, we demonstrate that scRNN has significantly more robust performance in word spelling correction (i.e. word recognition) compared to existing spelling checkers and character-based convolutional neural network. Furthermore, we demonstrate that the model is cognitively plausible by replicating a psycholinguistics experiment about human reading difficulty using our model.

【Keywords】:
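
As a rough illustration of why a semi-character representation can tolerate internal jumbling, the sketch below encodes a word as its first character, an unordered count of its internal characters, and its last character. This three-part encoding and the helper name `semi_character_key` are assumptions made for illustration; the abstract above does not spell out the exact input format of scRNN.

```python
from collections import Counter


def semi_character_key(word):
    """Return (first char, sorted internal char counts, last char) for a word.

    Hypothetical helper: the internal characters are treated as an
    unordered bag, so their order does not affect the result.
    """
    word = word.lower()
    if len(word) <= 2:
        # Very short words have no internal characters to jumble.
        return (word[:1], (), word[-1:])
    internal = Counter(word[1:-1])
    return (word[0], tuple(sorted(internal.items())), word[-1])


# A jumbled word and its original share the same key.
same = semi_character_key("Cmabrigde") == semi_character_key("Cambridge")
# Changing the first letter produces a different key.
different = semi_character_key("Cambridge") != semi_character_key("Bambridge")
```

Under this assumed encoding, any permutation of the internal letters maps to the same key, which is the property a recurrent model over such inputs could exploit.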

456. Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation.

Paper Link】 【Pages】:3288-3294

【Authors】: Iulian Vlad Serban ; Tim Klinger ; Gerald Tesauro ; Kartik Talamadupula ; Bowen Zhou ; Yoshua Bengio ; Aaron C. Courville

【Abstract】: We introduce a new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at multiple levels of abstraction. The models extend the sequence-to-sequence framework to generate two parallel stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language words (e.g. sentences). The coarse sequences follow a latent stochastic process with a factorial representation, which helps the models generalize to new examples. The coarse sequences can also incorporate task-specific knowledge, when available. In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. These procedures enable training the multiresolution recurrent neural networks by maximizing the exact joint log-likelihood over both sequences. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics. Furthermore, experiments show the proposed models generate more fluent, relevant and goal-oriented responses.

【Keywords】: Dialogue System; Conversational System; Chatbot; Neural Network; Deep Learning; Generative Models; Variational Autoencoder; Latent Variable Model; Variational Learning; Technical Support; Ubuntu

457. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues.

Paper Link】 【Pages】:3295-3301

【Authors】: Iulian Vlad Serban ; Alessandro Sordoni ; Ryan Lowe ; Laurent Charlin ; Joelle Pineau ; Aaron C. Courville ; Yoshua Bengio

【Abstract】: Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterances in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.

【Keywords】: Dialogue System; Conversational System; Chatbot; Neural Network; Deep Learning; Generative Models; Variational Autoencoder; Latent Variable Model; Variational Learning; Twitter

458. Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation.

Paper Link】 【Pages】:3302-3308

【Authors】: Jinsong Su ; Zhixing Tan ; Deyi Xiong ; Rongrong Ji ; Xiaodong Shi ; Yang Liu

【Abstract】: Neural machine translation (NMT) heavily relies on word-level modelling to learn semantic representations of input sentences. However, for languages without natural word delimiters (e.g., Chinese) where input sentences have to be tokenized first, conventional NMT is confronted with two issues: 1) it is difficult to find an optimal tokenization granularity for source sentence modelling, and 2) errors in 1-best tokenizations may propagate to the encoder of NMT. To handle these issues, we propose word-lattice based Recurrent Neural Network (RNN) encoders for NMT, which generalize the standard RNN to word lattice topology. The proposed encoders take as input a word lattice that compactly encodes multiple tokenizations, and learn to generate new hidden states from arbitrarily many inputs and hidden states in preceding time steps. As such, the word-lattice based encoders not only alleviate the negative impact of tokenization errors but also are more expressive and flexible to embed input sentences. Experiment results on Chinese-English translation demonstrate the superiorities of the proposed encoders over the conventional encoder.

【Keywords】:

459. Semantic Parsing with Neural Hybrid Trees.

Paper Link】 【Pages】:3309-3315

【Authors】: Raymond Hendy Susanto ; Wei Lu

【Abstract】: We propose a neural graphical model for parsing natural language sentences into their logical representations. The graphical model is based on hybrid tree structures that jointly represent both sentences and semantics. Learning and decoding are done using efficient dynamic programming algorithms. The model is trained under a discriminative setting, which allows us to incorporate a rich set of features. Hybrid tree structures have been shown to achieve state-of-the-art results on standard semantic parsing datasets. In this work, we propose a novel model that incorporates a rich, nonlinear featurization by a feedforward neural network. The error signals are computed with respect to the conditional random fields (CRFs) objective using an inside-outside algorithm, and are then backpropagated to the neural network. We demonstrate that by combining the strengths of the exact global inference in the hybrid tree models and the power of neural networks to extract high level features, our model is able to achieve new state-of-the-art results on standard benchmark datasets across different languages.

【Keywords】: semantic parsing; hybrid trees; graphical models; neural networks

460. Coupled Multi-Layer Attentions for Co-Extraction of Aspect and Opinion Terms.

Paper Link】 【Pages】:3316-3322

【Authors】: Wenya Wang ; Sinno Jialin Pan ; Daniel Dahlmeier ; Xiaokui Xiao

【Abstract】: The task of aspect and opinion terms co-extraction aims to explicitly extract aspect terms describing features of an entity and opinion terms expressing emotions from user-generated texts. To achieve this task, one effective approach is to exploit relations between aspect terms and opinion terms by parsing syntactic structure for each sentence. However, this approach requires expensive effort for parsing and highly depends on the quality of the parsing results. In this paper, we offer a novel deep learning model, named coupled multi-layer attentions. The proposed model provides an end-to-end solution and does not require any parsers or other linguistic resources for preprocessing. Specifically, the proposed model is a multi-layer attention network, where each layer consists of a couple of attentions with tensor operators. One attention is for extracting aspect terms, while the other is for extracting opinion terms. They are learned interactively to dually propagate information between aspect terms and opinion terms. Through multiple layers, the model can further exploit indirect relations between terms for more precise information extraction. Experimental results on three benchmark datasets in SemEval Challenge 2014 and 2015 show that our model achieves state-of-the-art performances compared with several baselines.

【Keywords】: information extraction; deep learning; multi-layer attentions; aspect terms extraction; opinion terms extraction

461. Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding.

Paper Link】 【Pages】:3323-3329

【Authors】: Xiaolong Wang ; Jingjing Wang ; Chengxiang Zhai

【Abstract】: Maximum Entropy (ME), as a general-purpose machine learning model, has been successfully applied to various fields such as text mining and natural language processing. It has been used as a classification technique and recently also applied to learn word embedding. ME establishes a distribution of the exponential form over items (classes/words). When training such a model, learning efficiency is guaranteed by globally updating the entire set of model parameters associated with all items at each training instance. This creates a significant computational challenge when the number of items is large. To achieve learning efficiency with affordable computational cost, we propose an approach named Dual-Clustering Maximum Entropy (DCME). Exploiting the primal-dual form of ME, it conducts clustering in the dual space and approximates each dual distribution by the corresponding cluster center. This naturally enables a hybrid online-offline optimization algorithm whose time complexity per instance only scales as the product of the feature/word vector dimensionality and the cluster number. Experimental studies on text classification and word embedding learning demonstrate that DCME effectively strikes a balance between training speed and model quality, substantially outperforming state-of-the-art methods.

【Keywords】: Maximum Entropy; Dual Clustering

462. Neural Machine Translation Advised by Statistical Machine Translation.

Paper Link】 【Pages】:3330-3336

【Authors】: Xing Wang ; Zhengdong Lu ; Zhaopeng Tu ; Hang Li ; Deyi Xiong ; Min Zhang

【Abstract】: Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate SMT model into NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and attention history). Then we employ an auxiliary classifier to score the SMT recommendations and a gating function to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.

【Keywords】: Neural Machine Translation; Statistical Machine Translation; Statistical machine translation recommendations
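
One simple way to picture a gate that combines SMT recommendations with NMT generations is a convex interpolation of two word distributions. The sketch below is only that: the scalar `gate`, the function name `combine_with_gate`, and the toy probabilities are all assumptions for illustration, not the formulation used in the paper, where the gate is trained jointly with the NMT model.

```python
def combine_with_gate(p_nmt, p_smt, gate):
    """Interpolate two word distributions with a scalar gate in [0, 1].

    gate = 0 keeps only the NMT distribution; gate = 1 keeps only the
    SMT recommendations. Words missing from one side get probability 0.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    vocab = set(p_nmt) | set(p_smt)
    return {
        w: (1.0 - gate) * p_nmt.get(w, 0.0) + gate * p_smt.get(w, 0.0)
        for w in vocab
    }


# Toy distributions (made-up numbers, purely illustrative).
p_nmt = {"cat": 0.6, "dog": 0.4}
p_smt = {"cat": 0.1, "feline": 0.9}
mixed = combine_with_gate(p_nmt, p_smt, gate=0.25)
```

Because both inputs sum to one and the weights sum to one, the mixture is again a valid distribution.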

463. A Dynamic Window Neural Network for CCG Supertagging.

Paper Link】 【Pages】:3337-3343

【Authors】: Huijia Wu ; Jiajun Zhang ; Chengqing Zong

【Abstract】: Combinatory Categorial Grammar (CCG) supertagging is a task to assign lexical categories to each word in a sentence. Almost all previous methods use fixed context window sizes to encode input tokens. However, it is obvious that different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism on the local contexts. We find that applying dropout on the dynamic filters is superior to the regular dropout on word embeddings. We use this approach to demonstrate the state-of-the-art CCG supertagging performance on the standard test set.

【Keywords】: supertagging; dynamic window

464. Distinguish Polarity in Bag-of-Words Visualization.

Paper Link】 【Pages】:3344-3350

【Authors】: Yusheng Xie ; Zhengzhang Chen ; Ankit Agrawal ; Alok N. Choudhary

【Abstract】: Neural network-based BOW models reveal that word-embedding vectors encode strong semantic regularities. However, such models are insensitive to word polarity. We show that, coupled with simple information such as word spellings, word-embedding vectors can preserve both semantic regularity and conceptual polarity without supervision. We then describe a nontrivial modification to the t-distributed stochastic neighbor embedding (t-SNE) algorithm that visualizes these semantic- and polarity-preserving vectors in reduced dimensions. On a real Facebook corpus, our experiments show significant improvement in t-SNE visualization as a result of the proposed modification.

【Keywords】: t-SNE; sentiment; word embedding; auto encoder

465. Topic Aware Neural Response Generation.

Paper Link】 【Pages】:3351-3357

【Authors】: Chen Xing ; Wei Wu ; Yu Wu ; Jie Liu ; Yalou Huang ; Ming Zhou ; Wei-Ying Ma

【Abstract】: We consider incorporating topic information into a sequence-to-sequence framework to generate informative and interesting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The model utilizes topics to simulate prior human knowledge that guides them to form informative and interesting responses in conversation, and leverages topic information in generation by a joint attention mechanism and a biased generation probability. The joint attention mechanism summarizes the hidden vectors of an input message as context vectors by message attention and synthesizes topic vectors by topic attention from the topic words of the message obtained from a pre-trained LDA model, with these vectors jointly affecting the generation of words in decoding. To increase the possibility of topic words appearing in responses, the model modifies the generation probability of topic words by adding an extra probability item to bias the overall distribution. Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, significantly outperforming state-of-the-art response generation models.

【Keywords】: Neural response generation; Sequence to sequence model; Topic aware conversation model; Joint attention; Biased response generation

466. Variational Autoencoder for Semi-Supervised Text Classification.

Paper Link】 【Pages】:3358-3364

【Authors】: Weidi Xu ; Haoze Sun ; Chao Deng ; Ying Tan

【Abstract】: Although semi-supervised variational autoencoder (SemiVAE) works in image classification task, it fails in text classification task if using vanilla LSTM as its decoder. From a perspective of reinforcement learning, it is verified that the decoder's capability to distinguish between different categorical labels is essential. Therefore, Semi-supervised Sequential Variational Autoencoder (SSVAE) is proposed, which increases the capability by feeding label into its decoder RNN at each time-step. Two specific decoder structures are investigated and both of them are verified to be effective. Besides, in order to reduce the computational complexity in training, a novel optimization method is proposed, which estimates the gradient of the unlabeled objective function by sampling, along with two variance reduction techniques. Experimental results on Large Movie Review Dataset (IMDB) and AG's News corpus show that the proposed approach significantly improves the classification accuracy compared with pure-supervised classifiers, and achieves competitive performance against previous advanced methods. State-of-the-art results can be obtained by integrating other pretraining-based methods.

【Keywords】: variational autoencoder; semi-supervised learning; text classification

467. Neural Models for Sequence Chunking.

Paper Link】 【Pages】:3365-3371

【Authors】: Feifei Zhai ; Saloni Potdar ; Bing Xiang ; Bowen Zhou

【Abstract】: Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most of the current deep neural network (DNN) based methods consider these tasks as a sequence labeling problem, in which a word, rather than a chunk, is treated as the basic unit for labeling. These chunks are then inferred by the standard IOB (Inside-Outside-Beginning) labels. In this paper, we propose an alternative approach by investigating the use of DNN for sequence chunking, and propose three neural models so that each chunk can be treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models can achieve state-of-the-art performance on both the text chunking and slot filling tasks.

【Keywords】:
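
For readers unfamiliar with the IOB scheme mentioned in the abstract above, the following sketch shows how word-level IOB tags can be turned back into chunk spans. It illustrates the conventional word-level baseline only, not the chunk-level models proposed in the paper; the function name `iob_to_chunks` and the lenient handling of stray `I-` tags are assumptions of this sketch.

```python
def iob_to_chunks(tags):
    """Convert IOB tags such as 'B-NP', 'I-NP', 'O' into (label, start, end) spans.

    `end` is exclusive. An 'I-' tag that does not continue a chunk of the
    same label is treated as starting a new chunk (a lenient convention
    chosen for this sketch).
    """
    chunks = []
    label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if not continues:
            # Close the chunk that was open, if any.
            if label is not None:
                chunks.append((label, start, i))
            # Open a new chunk on 'B-' (or a stray 'I-'); 'O' opens nothing.
            if prefix in ("B", "I"):
                label, start = tag_label, i
            else:
                label, start = None, None
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks


tags = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
chunks = iob_to_chunks(tags)
```

Here `chunks` holds one noun phrase over positions 0 to 2, one verb phrase over 2 to 3, and a final noun phrase over 4 to 5, with the `O` word left outside any chunk.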

468. BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings.

Paper Link】 【Pages】:3372-3378

【Authors】: Biao Zhang ; Deyi Xiong ; Jinsong Su

【Abstract】: In this paper, we propose a bidimensional attention based recursive autoencoder (BattRAE) to integrate clues and source-target interactions at multiple levels of granularity into bilingual phrase representations. We employ recursive autoencoders to generate tree structures of phrases with embeddings at different levels of granularity (e.g., words, sub-phrases and phrases). Over these embeddings on the source and target side, we introduce a bidimensional attention network to learn their interactions encoded in a bidimensional attention matrix, from which we extract two soft attention weight distributions simultaneously. These weight distributions enable BattRAE to generate compositive phrase representations via convolution. Based on the learned phrase representations, we further use a bilinear neural model, trained via a max-margin method, to measure bilingual semantic similarity. To evaluate the effectiveness of BattRAE, we incorporate this semantic similarity as an additional feature into a state-of-the-art SMT system. Extensive experiments on NIST Chinese-English test sets show that our model achieves a substantial improvement of up to 1.63 BLEU points on average over the baseline.

【Keywords】: Bidimensional Neural Network; Attention-Based Recursive Autoencoder; Bilingual Phrase Embedding; Statistical Machine Translation

469. Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision.

Paper Link】 【Pages】:3379-3385

【Authors】: Meng Zhang ; Haoruo Peng ; Yang Liu ; Huan-Bo Luan ; Maosong Sun

【Abstract】: Building bilingual lexica from non-parallel data is a long-standing natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e.g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is however unreliable when there are only a limited number of seeds, which is a reasonable setting for resource-scarce languages. We tackle the limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find the matching mechanism to substantially improve the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with seeds as few as 10.

【Keywords】: Bilingual word representation learning; Bilingual lexicon induction; Resource-scarce settings

470. Active Discriminative Text Representation Learning.

Paper Link】 【Pages】:3386-3392

【Authors】: Ye Zhang ; Matthew Lease ; Byron C. Wallace

【Abstract】: We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional AL approaches (e.g., entropy-based uncertainty sampling), which specify higher level objectives. We propose a simple approach for sentence classification that selects instances containing words whose embeddings are likely to be updated with the greatest magnitude, thereby rapidly learning discriminative, task-specific embeddings. We extend this approach to document classification by jointly considering: (1) the expected changes to the constituent word representations; and (2) the model’s current overall uncertainty regarding the instance. The relative emphasis placed on these criteria is governed by a stochastic process that favors selecting instances likely to improve representations at the outset of learning, and then shifts toward general uncertainty sampling as AL progresses. Empirical results show that our method outperforms baseline AL approaches on both sentence and document classification tasks. We also show that, as expected, the method quickly learns discriminative word embeddings. To the best of our knowledge, this is the first work on AL addressing neural models for text classification.

【Keywords】: Convolutional Neural Network, Text Classification, Active Learning

471. Learning Context-Specific Word/Character Embeddings.

Paper Link】 【Pages】:3393-3399

【Authors】: Xiaoqing Zheng ; Jiangtao Feng ; Yi Chen ; Haoyuan Peng ; Wenqing Zhang

【Abstract】: Unsupervised word representations have demonstrated improvements in predictive generalization on various NLP tasks. Most of the existing models are in fact good at capturing the relatedness among words rather than their "genuine" similarity because the context representations are often represented by a sum (or an average) of the neighbors' embeddings, which simplifies the computation but ignores an important fact that the meaning of a word is determined by its context, reflecting not only the surrounding words but also the rules used to combine them (i.e. compositionality). On the other hand, much effort has been devoted to learning a single-prototype representation per word, which is problematic because many words are polysemous, and a single-prototype model is incapable of capturing phenomena of homonymy and polysemy. We present a neural network architecture to jointly learn word embeddings and context representations from large data sets. The explicitly produced context representations are further used to learn context-specific and multi-prototype word embeddings. Our embeddings were evaluated on several NLP tasks, and the experimental results demonstrated that the proposed model outperformed other competitors and is applicable to intrinsically "character-based" languages.

【Keywords】: Word embeddings; Neural network; Unsupervised Learning

472. Mechanism-Aware Neural Machine for Dialogue Response Generation.

Paper Link】 【Pages】:3400-3407

【Authors】: Ganbin Zhou ; Ping Luo ; Rongyu Cao ; Fen Lin ; Bo Chen ; Qing He

【Abstract】: To the same utterance, people's responses in everyday dialogue may vary widely in terms of content semantics, speaking styles, communication intentions and so on. Previous generative conversational models ignore these 1-to-n relationships between a post and its diverse responses, and tend to return high-frequency but meaningless responses. In this study we propose a mechanism-aware neural machine for dialogue response generation. It assumes that there exist some latent responding mechanisms, each of which can generate different responses for a single input post. With this assumption we model different responding mechanisms as latent embeddings, and develop an encoder-diverter-decoder framework to train its modules in an end-to-end fashion. With the learned latent mechanisms, for the first time these decomposed modules can be used to encode the input into mechanism-aware context, and decode the responses with the controlled generation styles and topics. Finally, the experiments with human judgements, intuitive examples, and detailed discussions demonstrate the quality and diversity of the generated responses, with a 9.80% increase in acceptable ratio over the best of six baseline methods.

【Keywords】: Mechanism-Aware Responding Machine; Encoder-Diverter-Decoder Framework; Response Diversity

Natural Language Processing and Text Mining 18

473. Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora.

Paper Link】 【Pages】:3408-3414

【Authors】: Lidong Bing ; Bhuwan Dhingra ; Kathryn Mazaitis ; Jong Hyuk Park ; William W. Cohen

【Abstract】: We propose a framework to improve the performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with a very small set of distant labels.

【Keywords】: Relation extraction; distant supervision; document structure; label propagation

474. Using Discourse Signals for Robust Instructor Intervention Prediction.

Paper Link】 【Pages】:3415-3421

【Authors】: Muthu Kumar Chandrasekaran ; Carrie Demmans Epp ; Min-Yen Kan ; Diane J. Litman

【Abstract】: We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.

【Keywords】: MOOC discussion forum, MOOC, Instructor intervention, PDTB discourse relations

475. Automatic Emphatic Information Extraction from Aligned Acoustic Data and Its Application on Sentence Compression.

Paper Link】 【Pages】:3422-3428

【Authors】: Yanju Chen ; Rong Pan

【Abstract】: We introduce a novel method to extract and utilize the semantic information from acoustic data. By automatic Speech-To-Text alignment techniques, we are able to detect word-based acoustic durations that can prosodically emphasize specific words in an utterance. We model and analyze the sentence-based emphatic patterns by predicting the emphatic levels using only the lexical features, and demonstrate the potential ability of emphatic information produced by such an unsupervised method to improve the performance of NLP tasks, such as sentence compression, by providing weak supervision on multi-task learning based on LSTMs.

【Keywords】: weak supervision; prosodic prominence; sentence compression; multi-task learning; acoustic data

476. Unsupervised Sentiment Analysis with Signed Social Networks.

Paper Link】 【Pages】:3429-3435

【Authors】: Kewei Cheng ; Jundong Li ; Jiliang Tang ; Huan Liu

【Abstract】: Huge volumes of opinion-rich data are user-generated in social media at an unprecedented rate, easing the analysis of individual and public sentiments. Sentiment analysis has been shown to be useful in probing and understanding emotions, expressions and attitudes in the text. However, the distinct characteristics of social media data present challenges to traditional sentiment analysis. First, social media data is often noisy, incomplete and fast-evolving, which necessitates the design of a sophisticated learning model. Second, sentiment labels are hard to collect, which further exacerbates the problem by making it difficult to discriminate sentiment polarities. Meanwhile, opportunities are also unequivocally presented. Social media contains rich sources of sentiment signals in textual terms and user interactions, which could be helpful in sentiment analysis. While there are some attempts to leverage implicit sentiment signals in positive user interactions, little attention is paid to signed social networks with both positive and negative links. The availability of signed social networks motivates us to investigate if negative links also contain useful sentiment signals. In this paper, we study a novel problem of unsupervised sentiment analysis with signed social networks. In particular, we incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model SignedSenti for unsupervised sentiment analysis. Empirical experiments on two real-world datasets corroborate its effectiveness.

【Keywords】: Sentiment Analysis; Signed Social Networks; Negative Links

477. Recurrent Neural Networks with Auxiliary Labels for Cross-Domain Opinion Target Extraction.

Paper Link】 【Pages】:3436-3442

【Authors】: Ying Ding ; Jianfei Yu ; Jing Jiang

【Abstract】: Opinion target extraction is a fundamental task in opinion mining. In recent years, neural network based supervised learning methods have achieved competitive performance on this task. However, as with any supervised learning method, neural network based methods for this task cannot work well when the training data comes from a different domain than the test data. On the other hand, some rule-based unsupervised methods have been shown to be robust when applied to different domains. In this work, we use rule-based unsupervised methods to create auxiliary labels and use neural network models to learn a hidden representation that works well for different domains. When this hidden representation is used for opinion target extraction, we find that it can outperform a number of strong baselines by a large margin.

【Keywords】:

478. Distant Supervision via Prototype-Based Global Representation Learning.

Paper Link】 【Pages】:3443-3449

【Authors】: Xianpei Han ; Le Sun

【Abstract】: Distant supervision (DS) is a promising technique for relation extraction. Currently, most DS approaches build relation extraction models in local instance feature space, and often suffer from the multi-instance problem and the missing label problem. In this paper, we propose a new DS method, prototype-based global representation learning, which can effectively resolve the multi-instance problem and the missing label problem by learning informative entity pair representations, and building discriminative extraction models at the entity pair level, rather than at the instance level. Specifically, we propose a prototype-based embedding algorithm, which can embed entity pairs into a prototype-based global feature space; we then propose a neural network model, which can classify entity pairs into target relation types by summarizing relevant information from multiple instances. Experimental results show that our method can achieve significant performance improvement over traditional DS methods.

【Keywords】: relation extraction; distant supervision; representation learning

479. What Happens Next? Future Subevent Prediction Using Contextual Hierarchical LSTM.

Paper Link】 【Pages】:3450-3456

【Authors】: Linmei Hu ; Juanzi Li ; Liqiang Nie ; Xiao-Li Li ; Chao Shao

【Abstract】: Events are typically composed of a sequence of subevents. Predicting a future subevent of an event is of great importance for many real-world applications. Most previous work on event prediction relied on hand-crafted features and can only predict events that already exist in the training data. In this paper, we develop an end-to-end model which directly takes the texts describing previous subevents as input and automatically generates a short text describing a possible future subevent. Our model captures the two-level sequential structure of a subevent sequence, namely, the word sequence for each subevent and the temporal order of subevents. In addition, our model incorporates the topics of the past subevents to make context-aware prediction of future subevents. Extensive experiments on a real-world dataset demonstrate the superiority of our model over several state-of-the-art methods.

【Keywords】: event prediction; LSTM; subevent sequence

480. Efficient Dependency-Guided Named Entity Recognition.

Paper Link】 【Pages】:3457-3465

【Authors】: Zhanming Jie ; Aldrian Obaja Muis ; Wei Lu

【Abstract】: Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several down-stream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER. In this work, we investigate how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.

【Keywords】: Named entity recognition; dependency trees; global structured information; dependency-guided model; conditional random fields

481. Improving Event Causality Recognition with Multiple Background Knowledge Sources Using Multi-Column Convolutional Neural Networks.

Paper Link】 【Pages】:3466-3473

【Authors】: Canasai Kruengkrai ; Kentaro Torisawa ; Chikara Hashimoto ; Julien Kloetzer ; Jong-Hoon Oh ; Masahiro Tanaka

【Abstract】: We propose a method for recognizing such event causalities as "smoke cigarettes" → "die of lung cancer" using background knowledge taken from web texts as well as original sentences from which candidates for the causalities were extracted. We retrieve texts related to our event causality candidates from four billion web pages by three distinct methods, including a why-question answering system, and feed them to our multi-column convolutional neural networks. This allows us to identify the useful background knowledge scattered in web texts and effectively exploit the identified knowledge to recognize event causalities. We empirically show that the combination of our neural network architecture and background knowledge significantly improves average precision, while the previous state-of-the-art method gains just a small benefit from such background knowledge.

【Keywords】: event causality recognition; multi-column convolutional neural networks; background knowledge

482. Efficiently Mining High Quality Phrases from Texts.

Paper Link】 【Pages】:3474-3481

【Authors】: Bing Li ; Xiaochun Yang ; Bin Wang ; Wei Cui

【Abstract】: Phrase mining is a key research problem for semantic analysis and text-based information retrieval. The existing approaches based on NLP, frequency, and statistics cannot extract high quality phrases and their processing is also time consuming, which makes them unsuitable for dynamic on-line applications. In this paper, we propose an efficient high-quality phrase mining approach (EQPM). To the best of our knowledge, our work is the first effort that considers both intra-cohesion and inter-isolation in mining phrases, which is able to guarantee appropriateness. We also propose a strategy to eliminate order sensitiveness, and ensure the completeness of phrases. We further design efficient algorithms to make the proposed model and strategy feasible. The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3X - 29X faster than the state-of-the-art works.

【Keywords】: text mining; phrase mining; phrasal segmentation

483. Learning Latent Sentiment Scopes for Entity-Level Sentiment Analysis.

Paper Link】 【Pages】:3482-3489

【Authors】: Hao Li ; Wei Lu

【Abstract】: In this paper, we focus on the task of extracting named entities together with their associated sentiment information in a joint manner. Our key observation in such an entity-level sentiment analysis (a.k.a. targeted sentiment analysis) task is that there exists a sentiment scope within which each named entity is embedded, which largely decides the sentiment information associated with the entity. However, such sentiment scopes are typically not explicitly annotated in the data, and their lengths can be unbounded. Motivated by this, unlike traditional approaches that cast this problem as a simple sequence labeling task, we propose a novel approach that can explicitly model the latent sentiment scopes. Our experiments on the standard datasets demonstrate that our approach is able to achieve better results compared to existing approaches based on conventional conditional random fields (CRFs) and a more recent work based on neural networks.

【Keywords】: NLP; sentiment analysis; CRF; graphical model; entity-level; latent model

484. Structural Correspondence Learning for Cross-Lingual Sentiment Classification with One-to-Many Mappings.

Paper Link】 【Pages】:3490-3496

【Authors】: Nana Li ; Shuangfei Zhai ; Zhongfei Zhang ; Boying Liu

【Abstract】: Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task specific, cross-lingual correspondences. It transfers knowledge through identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language. This one-to-one mapping between words in different languages is too strict. Also, the context is not considered at all. In this paper, we propose a cross-lingual SCL based on distributed representation of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary. We conduct experiments on the NLP&CC 2013 cross-lingual sentiment analysis dataset, employing English as the source language and Chinese as the target language. Our method does not rely on parallel corpora and the experimental results show that our approach is more competitive than the state-of-the-art methods in cross-lingual sentiment classification.

【Keywords】: sentiment classification; cross-lingual; structural correspondence learning

485. Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization.

Paper Link】 【Pages】:3497-3503

【Authors】: Piji Li ; Zihao Wang ; Wai Lam ; Zhaochun Ren ; Lidong Bing

【Abstract】: We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which can be divided into two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model called Variational Auto-Encoders (VAEs) is employed to describe the observed sentences and the corresponding latent semantic representations. Neural variational inference is used for the posterior inference of the latent variables. For salience estimation, we propose an unsupervised data reconstruction framework, which jointly considers the reconstruction for latent semantic space and observed term vector space. Therefore, we can capture the salience of sentences from these two different and complementary vector spaces. Thereafter, the VAEs-based latent semantic model is integrated into the sentence salience estimation component in a unified fashion, and the whole framework can be trained jointly by back-propagation via multi-task learning. Experimental results on the benchmark datasets DUC and TAC show that our framework achieves better performance than the state-of-the-art models.

【Keywords】: Multi-Document Summarization; Variational Auto-Encoders

486. Collaborative User Clustering for Short Text Streams.

Paper Link】 【Pages】:3504-3510

【Authors】: Shangsong Liang ; Zhaochun Ren ; Emine Yilmaz ; Evangelos Kanoulas

【Abstract】: In this paper, we study the problem of user clustering in the context of their published short text streams. Clustering users by short text streams is more challenging than in the case of long documents associated with them as it is difficult to track users' dynamic interests in streaming sparse data. To obtain better user clustering performance, we propose a user collaborative interest tracking model (UCIT) that aims at tracking changes of each user's dynamic topic distributions in collaboration with their followees', based both on the content of current short texts and the previously estimated distributions. We evaluate our proposed method via a benchmark dataset consisting of Twitter users and their tweets. Experimental results validate the effectiveness of our proposed UCIT model that integrates both users' and their collaborative interests for user clustering by short text streams.

【Keywords】:

487. Word Embedding Based Correlation Model for Question/Answer Matching.

Paper Link】 【Pages】:3511-3517

【Authors】: Yikang Shen ; Wenge Rong ; Nan Jiang ; Baolin Peng ; Jie Tang ; Zhang Xiong

【Abstract】: The large-scale Q&A archives accumulated in community-based question answering (CQA) services are an important information and knowledge resource on the web. The question and answer matching task has attracted much attention for its ability to reuse knowledge stored in these systems: it can be useful in enhancing user experience with recurrent questions. In this paper, a Word Embedding based Correlation (WEC) model is proposed by integrating advantages of both the translation model and word embedding. Given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs, while it can also leverage the continuity and smoothness of continuous space word representation to deal with new pairs of words that are rare in the training parallel text. An experimental study on the Yahoo! Answers dataset and the Baidu Zhidao dataset shows this new method's promising potential.

【Keywords】: QA; word embedding

488. Greedy Flipping for Constrained Word Deletion.

Paper Link】 【Pages】:3518-3524

【Authors】: Jin-ge Yao ; Xiaojun Wan

【Abstract】: In this paper we propose a simple yet efficient method for constrained word deletion to compress sentences, based on top-down greedy local flipping from multiple random initializations. The algorithm naturally integrates various grammatical constraints in the compression process, without using time-consuming integer linear programming solvers. Our formulation suits any objective function involving arbitrary local score definitions. Experimental results show that the proposed method achieves nearly identical performance to an explicit ILP formulation while being much more efficient.

【Keywords】:

489. Attentive Interactive Neural Networks for Answer Selection in Community Question Answering.

Paper Link】 【Pages】:3525-3531

【Authors】: Xiaodong Zhang ; Sujian Li ; Lei Sha ; Houfeng Wang

【Abstract】: Answer selection plays a key role in community question answering (CQA). Previous research on answer selection usually ignores the problems of redundancy and noise prevalent in CQA. In this paper, we propose to treat different text segments differently and design a novel attentive interactive neural network (AI-NN) to focus on those text segments useful to answer selection. The representations of question and answer are first learned by convolutional neural networks (CNNs) or other neural network architectures. Then AI-NN learns interactions of each pair of segments of the two texts. Row-wise and column-wise pooling are used afterwards to collect the interactions. We adopt an attention mechanism to measure the importance of each segment and combine the interactions to obtain fixed-length representations for question and answer. Experimental results on the CQA dataset in SemEval-2016 demonstrate that AI-NN outperforms the state-of-the-art method.

【Keywords】: Answer selection; Attention; Community question answering

490. Community-Based Question Answering via Asymmetric Multi-Faceted Ranking Network Learning.

Paper Link】 【Pages】:3532-3539

【Authors】: Zhou Zhao ; Hanqing Lu ; Vincent W. Zheng ; Deng Cai ; Xiaofei He ; Yueting Zhuang

【Abstract】: Nowadays, community-based question answering (CQA) sites have become popular Internet-based web services, which have accumulated millions of questions and their posted answers over time. Thus, question answering, which ranks the high-quality answers to a given question, becomes an essential problem in CQA sites. Currently, most of the existing works study the problem of question answering based on the deep semantic matching model to rank the answers based on their semantic relevance, while ignoring the authority of answerers to the given question. In this paper, we consider the problem of community-based question answering from the viewpoint of asymmetric multi-faceted ranking network embedding. We propose a novel asymmetric multi-faceted ranking network learning framework for community-based question answering by jointly exploiting the deep semantic relevance between question-answer pairs and the answerers' authority to the given question. We then develop an asymmetric ranking network learning method with deep recurrent neural networks by integrating both answers' relative quality rank to the given question and the answerers' following relations in CQA sites. The extensive experiments on a large-scale dataset from a real world CQA site show that our method achieves better performance than other state-of-the-art solutions to the problem.

【Keywords】: Question-answering; Network Learning; LSTM

Planning and Scheduling 21

491. Plan Reordering and Parallel Execution - A Parameterized Complexity View.

Paper Link】 【Pages】:3540-3546

【Authors】: Meysam Aghighi ; Christer Bäckström

【Abstract】: Bäckström has previously studied a number of optimization problems for partial-order plans, like finding a minimum deordering (MCD) or reordering (MCR), and finding the minimum parallel execution length (PPL), which are all NP-complete. We revisit these problems, but applying parameterized complexity analysis rather than standard complexity analysis. We consider various parameters, including both the original and desired size of the plan order, as well as its width and height. Our findings include that MCD and MCR are W[2]-hard and in W[P] when parameterized with the desired order size, and MCD is fixed-parameter tractable (fpt) when parameterized with the original order size. Problem PPL is fpt if parameterized with the size of the non-concurrency relation, but para-NP-hard in most other cases. We also consider this problem when the number (k) of agents, or processors, is restricted, finding that this number is a crucial parameter; this problem is fixed-parameter tractable with the order size, the parallel execution length and k as parameter, but para-NP-hard without k as parameter.

【Keywords】: Partially ordered plan; Parameterized complexity; Complexity of planning; Plan reordering; Parallel plan execution

492. Validating Domains and Plans for Temporal Planning via Encoding into Infinite-State Linear Temporal Logic.

Paper Link】 【Pages】:3547-3554

【Authors】: Alessandro Cimatti ; Andrea Micheli ; Marco Roveri

【Abstract】: Temporal planning is an active research area of Artificial Intelligence because of its many applications ranging from robotics to logistics and beyond. Traditionally, authors focused on the automatic synthesis of plans given a formal representation of the domain and of the problem. However, the effectiveness of such techniques is limited by the complexity of the modeling phase: it is hard to produce a correct model for the planning problem at hand. In this paper, we present a technique to simplify the creation of correct models by leveraging formal-verification tools for automatic validation. We start by using the ANML language, a very expressive language for temporal planning problems that has been recently presented. We chose ANML because of its usability and readability. Then, we present a sound-and-complete, formal encoding of the language into Linear Temporal Logic over predicates with infinite-state variables. Thanks to this reduction, we enable the formal verification of several relevant properties over the planning problem, providing useful feedback to the modeler.

【Keywords】: temporal planning; domain validation; plan validation; ANML

493. On the Disruptive Effectiveness of Automated Planning for LTLf-Based Trace Alignment.

Paper Link】 【Pages】:3555-3561

【Authors】: Giuseppe De Giacomo ; Fabrizio Maria Maggi ; Andrea Marrella ; Fabio Patrizi

【Abstract】: One major task in business process management is that of aligning real process execution traces to a process model by (minimally) introducing and eliminating steps. Here, we look at declarative process specifications expressed in Linear Temporal Logic on finite traces (LTLf). We provide a sound and complete technique to synthesize the alignment instructions relying on finite automata theoretic manipulations. Such a technique can be effectively implemented by using planning technology. Notably, the resulting planning-based alignment system significantly outperforms all current state-of-the-art ad-hoc alignment systems. We report an in-depth experimental study that supports this claim.

【Keywords】: Business Processes; Trace Alignment; Linear Time Temporal Logic on Finite Traces; Automated Planning; Declare

494. Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs.

Paper Link】 【Pages】:3562-3568

【Authors】: Frits de Nijs ; Erwin Walraven ; Mathijs Michiel de Weerdt ; Matthijs T. J. Spaan

【Abstract】: Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding's inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners.

【Keywords】: Markov Decision Process; Resource constraints; Planning under uncertainty
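
The abstract of entry 494 mentions bounding constraint violation probabilities with Hoeffding's inequality. The following is only a minimal sketch of that general idea, not the paper's algorithm: it assumes each agent's resource consumption is an independent random variable confined to a known interval, and the function names `hoeffding_violation_bound` and `reduced_limit` are hypothetical.

```python
import math


def hoeffding_violation_bound(expected_total, limit, ranges):
    """Upper bound on P(total consumption > limit) via Hoeffding's inequality.

    Assumption for this sketch: agents consume independently, and agent i's
    consumption lies in the interval ranges[i] = (low, high).
    """
    slack = limit - expected_total
    if slack <= 0:
        return 1.0  # the bound is vacuous when the limit is not above the mean
    spread = sum((high - low) ** 2 for low, high in ranges)
    if spread == 0:
        return 0.0  # consumption is deterministic and below the limit
    return math.exp(-2.0 * slack ** 2 / spread)


def reduced_limit(limit, ranges, alpha):
    """Largest expected total whose Hoeffding bound stays at or below alpha."""
    spread = sum((high - low) ** 2 for low, high in ranges)
    return limit - math.sqrt(spread * math.log(1.0 / alpha) / 2.0)


# 100 agents, each consuming between 0 and 1 unit of the resource.
ranges = [(0.0, 1.0)] * 100
bound = hoeffding_violation_bound(expected_total=50.0, limit=60.0, ranges=ranges)
tightened = reduced_limit(limit=60.0, ranges=ranges, alpha=bound)
```

Planning against the tightened limit in expectation keeps the violation probability under the chosen tolerance, at the cost of leaving some of the resource unused on average.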

495. Optimizing Quantiles in Preference-Based Markov Decision Processes.

Paper Link】 【Pages】:3569-3575

【Authors】: Hugo Gilbert ; Paul Weng ; Yan Xu

【Abstract】: In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.

【Keywords】: Markov decision process; Quantile
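
To illustrate how the quantile criterion named in entry 495 can rank policies differently from expected reward, here is a small sketch. It does not implement the paper's algorithm; it assumes each policy's outcome distribution is already known as a mapping from reward to probability, and `lower_quantile` and `expected_value` are hypothetical helper names.

```python
def lower_quantile(distribution, tau):
    """Smallest outcome x with P(X <= x) >= tau, for a finite distribution.

    `distribution` maps each outcome to its probability (summing to 1).
    """
    cumulative = 0.0
    for outcome in sorted(distribution):
        cumulative += distribution[outcome]
        if cumulative >= tau - 1e-12:  # tolerance for floating-point sums
            return outcome
    raise ValueError("probabilities must sum to at least tau")


def expected_value(distribution):
    """Mean outcome of a finite distribution."""
    return sum(outcome * prob for outcome, prob in distribution.items())


# Two hypothetical policies with known cumulative-reward distributions.
risky = {0: 0.6, 100: 0.4}
steady = {10: 1.0}

risky_mean, steady_mean = expected_value(risky), expected_value(steady)
risky_median = lower_quantile(risky, 0.5)
steady_median = lower_quantile(steady, 0.5)
```

Here the expected-reward criterion prefers the risky policy, while the median (the 0.5-quantile) prefers the steady one.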

496. An Analysis of Monte Carlo Tree Search.

Paper Link】 【Pages】:3576-3582

【Authors】: Steven James ; George Konidaris ; Benjamin Rosman

【Abstract】: Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. Despite the vast amount of research into MCTS, the effect of modifications on the algorithm, as well as the manner in which it performs in various domains, is still not yet fully known. In particular, the effect of using knowledge-heavy rollouts in MCTS still remains poorly understood, with surprising results demonstrating that better-informed rollouts often result in worse-performing agents. We present experimental evidence suggesting that, under certain smoothness conditions, uniformly random simulation policies preserve the ordering over action preferences. This explains the success of MCTS despite its common use of these rollouts to evaluate states. We further analyse non-uniformly random rollout policies and describe conditions under which they offer improved performance.

【Keywords】: Monte Carlo; MCTS; UCT; planning; variance; bias; MDP; simulation

497. An Efficient Approach to Model-Based Hierarchical Reinforcement Learning.

Paper Link】 【Pages】:3583-3589

【Authors】: Zhuoru Li ; Akshay Narayan ; Tze-Yun Leong

【Abstract】: We propose a model-based approach to hierarchical reinforcement learning that exploits shared knowledge and selective execution at different levels of abstraction, to efficiently solve large, complex problems. Our framework adopts a new transition dynamics learning algorithm that identifies the common action-feature combinations of the subtasks, and evaluates the subtask execution choices through simulation. The framework is sample efficient, and tolerates uncertain and incomplete problem characterization of the subtasks. We test the framework on common benchmark problems and complex simulated robotic environments. It compares favorably against the state-of-the-art algorithms, and scales well in very large problems.

【Keywords】: reinforcement learning; hierarchical reinforcement learning; MAXQ; R-MAX; model-based reinforcement learning

498. Best-First Width Search: Exploration and Exploitation in Classical Planning.

Paper Link】 【Pages】:3590-3596

【Authors】: Nir Lipovetzky ; Hector Geffner

【Abstract】: It has been shown recently that the performance of greedy best-first search (GBFS) for computing plans that are not necessarily optimal can be improved by adding forms of exploration when reaching heuristic plateaus: from random walks to local GBFS searches. In this work, we address this problem but using structural exploration methods resulting from the ideas of width-based search. Width-based methods seek novel states, are not goal oriented, and their power has been shown recently in the Atari and GVG-AI video-games. We show first that width-based exploration in GBFS is more effective than GBFS with local GBFS search (GBFS-LS), and then proceed to formulate a simple and general computational framework where standard goal-oriented search (exploitation) and width-based search (structural exploration) are combined to yield a search scheme, best-first width search, that is better than both and which results in classical planning algorithms that outperform the state-of-the-art planners.

【Keywords】: Width-Based Planning, Classical Planning

499. Robust Execution of Probabilistic Temporal Plans.

Paper Link】 【Pages】:3597-3604

【Authors】: Kyle Lund ; Sam Dietrich ; Scott Chow ; James C. Boerkoel

【Abstract】: A critical challenge in temporal planning is robustly dealing with non-determinism, e.g., the durational uncertainty of a robot's activity due to slippage or other unexpected influences. Recent advances show that robustness is a better measure of solution quality than traditional metrics such as flexibility. This paper introduces the Robust Execution Problem for finding maximally robust dispatch strategies for general probabilistic temporal planning problems. While generally intractable, we introduce approximate solution techniques — one that can be computed statically prior to the start of execution with robustness guarantees and one that dynamically adjusts to opportunities and setbacks during execution. We show empirically that our dynamic approach outperforms all known approaches in terms of execution success rate.

【Keywords】: Probabilistic Temporal Planning; Simple Temporal Problem; Robustness; Scheduling Under Uncertainty

500. Multi-Agent Path Finding with Delay Probabilities.

Paper Link】 【Pages】:3605-3612

【Authors】: Hang Ma ; T. K. Satish Kumar ; Sven Koenig

【Abstract】: Several recently developed Multi-Agent Path Finding (MAPF) solvers scale to large MAPF instances by searching for MAPF plans on 2 levels: The high-level search resolves collisions between agents, and the low-level search plans paths for single agents under the constraints imposed by the high-level search. We make the following contributions to solve the MAPF problem with imperfect plan execution with small average makespans: First, we formalize the MAPF Problem with Delay Probabilities (MAPF-DP), define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss 2 classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a 2-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.

【Keywords】: multi-agent path finding; plan execution; MDPs; multi-robot systems

501. Logical Filtering and Smoothing: State Estimation in Partially Observable Domains.

Paper Link】 【Pages】:3613-3621

【Authors】: Brent Mombourquette ; Christian J. Muise ; Sheila A. McIlraith

【Abstract】: State estimation is the task of estimating the state of a partially observable dynamical system given a sequence of executed actions and observations. In logical settings, state estimation can be realized via logical filtering, which is exact but can be intractable. We propose logical smoothing, a form of backwards reasoning that works in concert with approximated logical filtering to refine past beliefs in light of new observations. We characterize the notion of logical smoothing together with an algorithm for backwards-forwards state estimation. We also present an approximation of our smoothing algorithm that is space efficient. We prove properties of our algorithms, and experimentally demonstrate their behaviour, contrasting them with state estimation methods for planning. Smoothing and backwards-forwards reasoning are important techniques for reasoning about partially observable dynamical systems, introducing the logical analogue of effective techniques from control theory and dynamic programming.

【Keywords】: state estimation; belief tracking; logical filtering; approximation

502. Landmark-Based Heuristics for Goal Recognition.

Paper Link】 【Pages】:3622-3628

【Authors】: Ramon Fraga Pereira ; Nir Oren ; Felipe Meneguzzi

【Abstract】: Automated planning can be used to efficiently recognize goals and plans from partial or full observed action sequences. In this paper, we propose goal recognition heuristics that rely on information from planning landmarks - facts or actions that must occur if a plan is to achieve a goal when starting from some initial state. We develop two such heuristics: the first estimates goal completion by considering the ratio between achieved and extracted landmarks of a candidate goal, while the second takes into account how unique each landmark is among landmarks for all candidate goals. We empirically evaluate these heuristics over both standard goal/plan recognition problems, and a set of very large problems. We show that our heuristics can recognize goals more accurately, and run orders of magnitude faster, than the current state-of-the-art.

【Keywords】: Goal Recognition; Intention Detection; Classical Planning; Landmarks
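
The two heuristics described in the abstract of entry 502 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' implementation: landmarks are assumed to be precomputed sets of facts per candidate goal, and the uniqueness weight used here (one divided by the number of candidate goals sharing the landmark) is an assumed formula chosen for the sketch. All function names are hypothetical.

```python
from collections import Counter


def goal_completion(achieved, landmarks):
    """Ratio of achieved landmarks to all extracted landmarks of one goal."""
    if not landmarks:
        return 0.0
    return len(achieved & landmarks) / len(landmarks)


def uniqueness_score(achieved, goal, landmarks_by_goal):
    """Weighted completion, where rarer landmarks count for more.

    Assumed weighting: 1 / (number of candidate goals sharing the landmark).
    """
    counts = Counter(lm for lms in landmarks_by_goal.values() for lm in lms)
    weight = {lm: 1.0 / n for lm, n in counts.items()}
    total = sum(weight[lm] for lm in landmarks_by_goal[goal])
    if total == 0:
        return 0.0
    hit = sum(weight[lm] for lm in landmarks_by_goal[goal] & achieved)
    return hit / total


# Hypothetical landmark sets for two candidate goals, plus observed facts.
landmarks_by_goal = {"g1": {"a", "b", "c", "d"}, "g2": {"a", "e"}}
observed = {"a", "b"}

completion = {g: goal_completion(observed, lms)
              for g, lms in landmarks_by_goal.items()}
uniqueness = {g: uniqueness_score(observed, g, landmarks_by_goal)
              for g in landmarks_by_goal}
```

In this toy instance the plain ratio ties the two goals, while the uniqueness-weighted score favours `g1` because the observed landmark `b` belongs to `g1` alone.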

503. Fast SSP Solvers Using Short-Sighted Labeling.

Paper Link】 【Pages】:3629-3635

【Authors】: Luis Enrique Pineda ; Kyle Hollins Wray ; Shlomo Zilberstein

【Abstract】: State-of-the-art methods for solving SSPs often work by limiting planning to restricted regions of the state space. The resulting problems can then be solved quickly, and the process is repeated during execution when states outside the restricted region are encountered. Typically, these approaches focus on states that are within some distance measure of the start state (e.g., number of actions or probability of being reached). However, these short-sighted approaches make it difficult to propagate information from states that are closer to a goal than to the start state, thus missing opportunities to improve planning. We present an alternative approach in which short-sightedness is used only to determine whether a state should be labeled as solved or not, but otherwise the set of states that can be accounted for during planning is unrestricted. Based on this idea, we propose the FLARES algorithm and show that it performs consistently well on a wide range of benchmark problems.

【Keywords】: Markov Decision Process; Stochastic Shortest Path Problem; Planning under Uncertainty; Heuristic Search; Short-Sighted Search

504. Higher-Dimensional Potential Heuristics for Optimal Classical Planning.

Paper Link】 【Pages】:3636-3643

【Authors】: Florian Pommerening ; Malte Helmert ; Blai Bonet

【Abstract】: Potential heuristics for state-space search are defined as weighted sums over simple state features. Atomic features consider the value of a single state variable in a factored state representation, while binary features consider joint assignments to two state variables. Previous work showed that the set of all admissible and consistent potential heuristics using atomic features can be characterized by a compact set of linear constraints. We generalize this result to binary features and prove a hardness result for features of higher dimension. Furthermore, we prove a tractability result based on the treewidth of a new graphical structure we call the context-dependency graph. Finally, we study the relationship of potential heuristics to transition cost partitioning. Experimental results show that binary potential heuristics are significantly more informative than the previously considered atomic ones.

【Keywords】: classical planning; cost-optimal planning; declarative heuristics; potential heuristics
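
To make the "weighted sum over state features" definition in the abstract above concrete, here is a minimal sketch that evaluates a potential heuristic on a factored state. A feature is modelled as a set of (variable, value) facts: one fact for an atomic feature, two for a binary feature. The representation, the names, and the hand-picked weights are assumptions for illustration; the sketch does not compute weights from the linear constraints the abstract mentions.

```python
from typing import FrozenSet, Mapping, Tuple

Fact = Tuple[str, str]      # (state variable, value)
Feature = FrozenSet[Fact]   # one fact = atomic feature, two facts = binary feature


def potential_heuristic(state: Mapping[str, str],
                        weights: Mapping[Feature, float]) -> float:
    """Sum the weights of all features whose facts hold in the given state."""
    total = 0.0
    for feature, weight in weights.items():
        if all(state.get(var) == val for var, val in feature):
            total += weight
    return total


# Toy weights (invented, not derived from any admissibility constraints).
weights = {
    frozenset({("x", "1")}): 2.0,               # atomic feature, holds below
    frozenset({("y", "1")}): 5.0,               # atomic feature, does not hold
    frozenset({("x", "1"), ("y", "0")}): 1.5,   # binary feature, holds below
}
state = {"x": "1", "y": "0"}
h_value = potential_heuristic(state, weights)
```

Evaluation is linear in the number of features, which is one reason the dimension of the features matters: the number of possible binary features grows quadratically in the number of facts.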

505. Schematic Invariants by Reduction to Ground Invariants.

Paper Link】 【Pages】:3644-3650

【Authors】: Jussi Rintanen

【Abstract】: Computation of invariants, which are approximate reachability information for state-space search problems such as AI planning, has been considered to be more scalable when using a schematic representation of actions/events rather than an instantiated/ground representation. A disadvantage of schematic algorithms, however, is their complexity, which also leads to high runtimes when the number of schematic events/actions is high. We propose algorithms that reduce the problem of finding schematic invariants to solving a smaller ground problem.

【Keywords】:

506. Narrowing the Gap Between Saturated and Optimal Cost Partitioning for Classical Planning.

Paper Link】 【Pages】:3651-3657

【Authors】: Jendrik Seipp ; Thomas Keller ; Malte Helmert

【Abstract】: In classical planning, cost partitioning is a method for admissibly combining a set of heuristic estimators by distributing operator costs among the heuristics. An optimal cost partitioning is often prohibitively expensive to compute. Saturated cost partitioning is an alternative that is much faster to compute and has been shown to offer high-quality heuristic guidance on Cartesian abstractions. However, its greedy nature makes it highly susceptible to the order in which the heuristics are considered. We show that searching in the space of orders leads to significantly better heuristic estimates than with previously considered orders. Moreover, using multiple orders leads to a heuristic that is significantly better informed than any single-order heuristic. In experiments with Cartesian abstractions, the resulting heuristic approximates the optimal cost partitioning very closely.

【Keywords】: planning; optimal planning; classical planning; heuristics; abstraction heuristics; cost partitioning; Cartesian abstractions

507. Incorporating Domain-Independent Planning Heuristics in Hierarchical Planning.

Paper Link】 【Pages】:3658-3664

【Authors】: Vikas Shivashankar ; Ron Alford ; David W. Aha

【Abstract】: Heuristics serve as a powerful tool in modern domain-independent planning (DIP) systems by providing critical guidance during the search for high-quality solutions. However, they have not been broadly used with hierarchical planning techniques, which are more expressive and tend to scale better in complex domains by exploiting additional domain-specific knowledge. Complicating matters, we show that for Hierarchical Goal Network (HGN) planning, a goal-based hierarchical planning formalism that we focus on in this paper, any poly-time heuristic that is derived from a delete-relaxation DIP heuristic has to make some relaxation of the hierarchical semantics. To address this, we present a principled framework for incorporating DIP heuristics into HGN planning using a simple relaxation of the HGN semantics we call Hierarchy-Relaxation. This framework allows for computing heuristic estimates of HGN problems using any DIP heuristic in an admissibility-preserving manner. We demonstrate the feasibility of this approach by using the LMCut heuristic to guide an optimal HGN planner. Our empirical results with three benchmark domains demonstrate that simultaneously leveraging hierarchical knowledge and heuristic guidance substantially improves planning performance.

【Keywords】: Hierarchical Planning; HGN Planning; Heuristic Search; Optimal Planning

508. Computational Issues in Time-Inconsistent Planning.

Paper Link】 【Pages】:3665-3671

【Authors】: Pingzhong Tang ; Yifeng Teng ; Zihe Wang ; Shenke Xiao ; Yichong Xu

【Abstract】: Time-inconsistency refers to a paradox in decision making where agents exhibit inconsistent behaviors over time. Examples are procrastination where agents tend to postpone easy tasks, and abandonments where agents start a plan and quit in the middle. To capture such behaviors and to quantify inefficiency caused by such behaviors, Kleinberg and Oren (2014) propose a graph model with a certain cost structure and initiate the study of several interesting computation problems: 1) cost ratio: the worst ratio between the actual cost of the agent and the optimal cost, over all the graph instances; 2) motivating subgraph: how to motivate the agent to reach the goal by deleting nodes and edges; 3) intermediate rewards: how to incentivize agents to reach the goal by placing intermediate rewards. Kleinberg and Oren give partial answers to these questions, but the main problems are open. In this paper, we give answers to all three open problems. First, we show a tight upper bound of cost ratio for graphs, and confirm the conjecture by Kleinberg and Oren that Akerlof’s structure is indeed the worst case for cost ratio. Second, we prove that finding a motivating subgraph is NP-hard, showing that it is generally inefficient to motivate agents by deleting nodes and edges in the graph. Last but not least, we show that computing a strategy to place a minimum amount of total reward is also NP-hard and we provide a 2n-approximation algorithm.

【Keywords】:

509. Accelerated Vector Pruning for Optimal POMDP Solvers.

Paper Link】 【Pages】:3672-3678

【Authors】: Erwin Walraven ; Matthijs T. J. Spaan

【Abstract】: Partially Observable Markov Decision Processes (POMDPs) are powerful models for planning under uncertainty in partially observable domains. However, computing optimal solutions for POMDPs is challenging because of the high computational requirements of POMDP solution algorithms. Several algorithms use a subroutine to prune dominated vectors in value functions, which requires a large number of linear programs (LPs) to be solved and it represents a large part of the total running time. In this paper we show how the LPs in POMDP pruning subroutines can be decomposed using a Benders decomposition. The resulting algorithm incrementally adds LP constraints and uses only a small fraction of the constraints. Our algorithm significantly improves the performance of existing pruning methods and the commonly used incremental pruning algorithm. Our new variant of incremental pruning is the fastest optimal pruning-based POMDP algorithm.

【Keywords】: POMDP; planning under uncertainty

510. When to Reset Your Keys: Optimal Timing of Security Updates via Learning.

Paper Link】 【Pages】:3679-3685

【Authors】: Zizhan Zheng ; Ness B. Shroff ; Prasant Mohapatra

【Abstract】: Cybersecurity is increasingly threatened by advanced and persistent attacks. As these attacks are often designed to disable a system (or a critical resource, e.g., a user account) repeatedly, it is crucial for the defender to keep updating its security measures to strike a balance between the risk of being compromised and the cost of security updates. Moreover, these decisions often need to be made with limited and delayed feedback due to the stealthy nature of advanced attacks. In addition to targeted attacks, such an optimal timing policy under incomplete information has broad applications in cybersecurity. Examples include key rotation, password change, application of patches, and virtual machine refreshing. However, rigorous studies of optimal timing are rare. Further, existing solutions typically rely on a pre-defined attack model that is known to the defender, which is often not the case in practice. In this work, we make an initial effort towards achieving optimal timing of security updates in the face of unknown stealthy attacks. We consider a variant of the influential FlipIt game model with asymmetric feedback and unknown attack time distribution, which provides a general model to consecutive security updates. The defender's problem is then modeled as a time associative bandit problem with dependent arms. We derive upper confidence bound based learning policies that achieve low regret compared with optimal periodic defense strategies that can only be derived when attack time distributions are known.

【Keywords】:

511. Human-Aware Plan Recognition.

Paper Link】 【Pages】:3686-3693

【Authors】: Hankz Hankui Zhuo

【Abstract】: Plan recognition aims to recognize target plans given observed actions with history plan libraries or domain models in hand. Despite the success of previous plan recognition approaches, they all neglect the impact of human preferences on plans. For example, a kid in a shopping mall might prefer "executing" a plan of playing in a water park, while an adult might prefer "executing" a plan of having a cup of coffee. Considering human preferences on plans could help improve plan recognition accuracy. We assume there are historical rating scores on a subset of plans given by humans, and action sequences observed on humans. We estimate unknown rating scores based on rating scores in hand using an off-the-shelf collaborative filtering approach. We then discover plans to best explain the estimated rating scores and observed actions using a skip-gram based approach. In the experiment, we evaluate our approach in three planning domains to demonstrate its effectiveness.

【Keywords】:

Reasoning under Uncertainty 16

512. Minimal Undefinedness for Fuzzy Answer Sets.

Paper Link】 【Pages】:3694-3700

【Authors】: Mario Alviano ; Giovanni Amendola ; Rafael Peñaloza

【Abstract】: Fuzzy Answer Set Programming (FASP) combines the non-monotonic reasoning typical of Answer Set Programming with the capability of Fuzzy Logic to deal with imprecise information and paraconsistent reasoning. In the context of paraconsistent reasoning, the fundamental principle of minimal undefinedness states that truth degrees close to 0 and 1 should be preferred to those close to 0.5, to minimize the ambiguity of the scenario. The aim of this paper is to enforce such a principle in FASP through the minimization of a measure of undefinedness. Algorithms that minimize undefinedness of fuzzy answer sets are presented, and implemented.

【Keywords】: Fuzzy Answer Set Programming; Measures for Undefinedness; Fuzzy Logic

513. Open-Universe Weighted Model Counting.

Paper Link】 【Pages】:3701-3708

【Authors】: Vaishak Belle

【Abstract】: Weighted model counting (WMC) has recently emerged as an effective and general approach to probabilistic inference, offering a computational framework for encoding a variety of formalisms, such as factor graphs and Bayesian networks. The advent of large-scale probabilistic knowledge bases has generated further interest in relational probabilistic representations, obtained by according weights to first-order formulas, whose semantics is given in terms of the ground theory, and solved by WMC. A fundamental limitation is that the domain of quantification, by construction and design, is assumed to be finite, which is at odds with areas such as vision and language understanding, where the existence of objects must be inferred from raw data. Dropping the finite-domain assumption has been known to improve the expressiveness of a first-order language for open-universe purposes, but these languages, so far, have eluded WMC approaches. In this paper, we revisit relational probabilistic models over an infinite domain, and establish a number of results that permit effective algorithms. We demonstrate this language on a number of examples, including a parameterized version of Pearl's Burglary-Earthquake-Alarm Bayesian network.

【Keywords】: probabilistic inference; model counting; open universes; relational graphical models

514. Deterministic versus Probabilistic Methods for Searching for an Evasive Target.

Paper Link】 【Pages】:3709-3715

【Authors】: Sara Bernardini ; Maria Fox ; Derek Long ; Chiara Piacentini

【Abstract】: Several advanced applications of autonomous aerial vehicles in civilian and military contexts involve a searching agent with imperfect sensors that seeks to locate a mobile target in a given region. Effectively managing uncertainty is key to solving the related search problem, which is why all methods devised so far hinge on a probabilistic formulation of the problem and solve it through branch-and-bound algorithms, Bayesian filtering or POMDP solvers. In this paper, we consider a class of hard search tasks involving a target that exhibits an intentional evasive behaviour and moves over a large geographical area, i.e., a target that is particularly difficult to track down and uncertain to locate. We show that, even for such a complex problem, it is advantageous to compile its probabilistic structure into a deterministic model and use standard deterministic solvers to find solutions. In particular, we formulate the search problem for our uncooperative target both as a deterministic automated planning task and as a constraint programming task and show that in both cases our solution outperforms POMDP methods.

【Keywords】: UAVs; planning; CP; probabilistic reasoning; POMDPs

515. Non-Deterministic Planning with Temporally Extended Goals: LTL over Finite and Infinite Traces.

Paper Link】 【Pages】:3716-3724

【Authors】: Alberto Camacho ; Eleni Triantafillou ; Christian J. Muise ; Jorge A. Baier ; Sheila A. McIlraith

【Abstract】: Temporally extended goals are critical to the specification of a diversity of real-world planning problems. Here we examine the problem of non-deterministic planning with temporally extended goals specified in linear temporal logic (LTL), interpreted over either finite or infinite traces. Unlike existing LTL planners, we place no restrictions on our LTL formulae beyond those necessary to distinguish finite from infinite interpretations. We generate plans by compiling LTL temporally extended goals into problem instances described in the Planning Domain Definition Language that are solved by a state-of-the-art fully observable non-deterministic planner. We propose several different compilations based on translations of LTL to (Büchi) alternating or (Büchi) non-deterministic finite state automata, and evaluate various properties of the competing approaches. We address a diverse spectrum of LTL planning problems that, to this point, had not been solvable using AI planning techniques, and do so in a manner that demonstrates highly competitive performance.

【Keywords】: LTL, FOND, infinite, planning, synthesis, temporally extended goals

516. Optimizing Expectation with Guarantees in POMDPs.

Paper Link】 【Pages】:3725-3732

【Authors】: Krishnendu Chatterjee ; Petr Novotný ; Guillermo A. Pérez ; Jean-François Raskin ; Dorde Zikelic

【Abstract】: A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the “expectation” and “threshold” approaches and consider a “guaranteed payoff optimization (GPO)” problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.

【Keywords】: Partially-observable Markov decision processes; Discounted payoff; Probabilistic planning; Verification

517. Latent Dependency Forest Models.

Paper Link】 【Pages】:3733-3739

【Authors】: Shanbo Chu ; Yong Jiang ; Kewei Tu

【Abstract】: Probabilistic modeling is one of the foundations of modern machine learning and artificial intelligence. In this paper, we propose a novel type of probabilistic model named latent dependency forest models (LDFMs). An LDFM models the dependencies between random variables with a forest structure that can change dynamically based on the variable values. It is therefore capable of modeling context-specific independence. We parameterize an LDFM using a first-order non-projective dependency grammar. Learning LDFMs from data can be formulated purely as a parameter learning problem, and hence the difficult problem of model structure learning is circumvented. Our experimental results show that LDFMs are competitive with existing probabilistic models.

【Keywords】: probabilistic modeling; latent dependency forest model; non-projective dependency grammar

518. Causal Effect Identification by Adjustment under Confounding and Selection Biases.

Paper Link】 【Pages】:3740-3746

【Authors】: Juan D. Correa ; Elias Bareinboim

【Abstract】: Controlling for selection and confounding biases are two of the most challenging problems in the empirical sciences as well as in artificial intelligence tasks. Covariate adjustment (or, Backdoor Adjustment) is the most pervasive technique used for controlling confounding bias, but the same is oblivious to issues of sampling selection. In this paper, we introduce a generalized version of covariate adjustment that simultaneously controls for both confounding and selection biases. We first derive a sufficient and necessary condition for recovering causal effects using covariate adjustment from an observational distribution collected under preferential selection. We then relax this setting to consider cases when additional, unbiased measurements over a set of covariates are available for use (e.g., the age and gender distribution obtained from census data). Finally, we present a complete algorithm with polynomial delay to find all sets of admissible covariates for adjustment when confounding and selection biases are simultaneously present and unbiased data is available.

【Keywords】: effects of interventions; identifiability; control of confounding; sampling selection bias; back-door criterion

519. The Linearization of Belief Propagation on Pairwise Markov Random Fields.

Paper Link】 【Pages】:3747-3753

【Authors】: Wolfgang Gatterbauer

【Abstract】: Belief Propagation (BP) is a widely used approximation for exact probabilistic inference in graphical models, such as Markov Random Fields (MRFs). In graphs with cycles, however, no exact convergence guarantees for BP are known, in general. For the case when all edges in the MRF carry the same symmetric, doubly stochastic potential, recent works have proposed to approximate BP by linearizing the update equations around default values, which was shown to work well for the problem of node classification. The present paper generalizes all prior work and derives an approach that approximates loopy BP on any pairwise MRF with the problem of solving a linear equation system. This approach combines exact convergence guarantees and a fast matrix implementation with the ability to model heterogeneous networks. Experiments on synthetic graphs with planted edge potentials show that the linearization has labeling accuracy comparable to BP for graphs with weak potentials, while speeding up inference by orders of magnitude.

【Keywords】: Belief propagation; semi-supervised learning; node classification; Markov Random Fields

520. The Kernel Kalman Rule - Efficient Nonparametric Inference with Recursive Least Squares.

Paper Link】 【Pages】:3754-3760

【Authors】: Gregor H. W. Gebhardt ; Andras Gabor Kupcsik ; Gerhard Neumann

【Abstract】: Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes' rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.

【Keywords】: kernel methods; nonparametric inference; RKHS; filtering; model learning; probabilistic reasoning

521. Misspecified Linear Bandits.

Paper Link】 【Pages】:3761-3767

【Authors】: Avishek Ghosh ; Sayak Ray Chowdhury ; Aditya Gopalan

【Abstract】: We consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential misspecification in linear bandit models, where the expected rewards are perturbed away from the linear subspace determined by the arms' features. Although OFUL has recently been shown to be robust to relatively small deviations from linearity, we show that any linear bandit algorithm that enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL) must suffer linear regret under a sparse additive perturbation of the linear model. In an attempt to overcome this negative result, we define a natural class of bandit models characterized by a non-sparse deviation from linearity. We argue that the OFUL algorithm can fail to achieve sublinear regret even under models that have non-sparse deviation. We finally develop a novel bandit algorithm, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly linear bandit models, the algorithm provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on recommendation data from the public Yahoo! Learning to Rank Challenge dataset, empirically support our findings.

【Keywords】: Multi-armed Bandit; Linear Bandit; Model Misspecification
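
For readers unfamiliar with the UCB fallback mentioned in the abstract above, the following minimal sketch shows the arm-selection rule of the textbook UCB1 algorithm (empirical mean plus an exploration bonus of sqrt(2 ln t / n)). It is the standard formulation, not necessarily the exact variant used in the paper, and the function name and argument layout are assumptions made for illustration.

```python
import math
from typing import Sequence


def ucb1_select(counts: Sequence[int], means: Sequence[float], t: int) -> int:
    """Return the index of the arm to pull at round t (t >= 1) under UCB1.

    counts[i] is the number of times arm i has been pulled so far and
    means[i] is its empirical mean reward.
    """
    # Pull every arm once before using confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Optimistic index: empirical mean plus exploration bonus.
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely pulled arm gets a large bonus and is explored first.
choice_explore = ucb1_select(counts=[10, 1], means=[0.5, 0.4], t=11)
# With equal pull counts the bonus is identical, so the better mean wins.
choice_exploit = ucb1_select(counts=[100, 100], means=[0.9, 0.1], t=200)
```

Unlike a linear bandit algorithm, this rule treats the arms independently and ignores their features, which is why it remains usable when the linear model is misspecified.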

522. Reasoning about Cognitive Trust in Stochastic Multiagent Systems.

Paper Link】 【Pages】:3768-3774

【Authors】: Xiaowei Huang ; Marta Zofia Kwiatkowska

【Abstract】: We consider the setting of stochastic multiagent systems and formulate an automated verification framework for quantifying and reasoning about agents' trust. To capture human trust, we work with a cognitive notion of trust defined as a subjective evaluation that agent A makes about agent B's ability to complete a task, which in turn may lead to a decision by A to rely on B. We propose a probabilistic rational temporal logic PRTL, which extends the logic PCTL with reasoning about mental attitudes (beliefs, goals and intentions), and includes novel operators that can express concepts of social trust such as competence, disposition and dependence. The logic can express, for example, that "agent A will eventually trust agent B with probability at least p that B will behave in a way that ensures the successful completion of a given task". We study the complexity of the automated verification problem and, while the general problem is undecidable, we identify restrictions on the logic and the system that result in decidable, or even tractable, subproblems.

【Keywords】: Probabilistic logic; temporal logic; multiagent systems; autonomous systems; cognitive modelling; trust; model checking

Paper Link】 【Pages】:3775-3782

【Authors】: Radu Marinescu ; Junkyu Lee ; Alexander T. Ihler ; Rina Dechter

【Abstract】: We introduce new anytime search algorithms that combine best-first with depth-first search into hybrid schemes for Marginal MAP inference in graphical models. The main goal is to facilitate the generation of upper bounds (via the best-first part) alongside the lower bounds of solutions (via the depth-first part) in an anytime fashion. We compare against two of the best current state-of-the-art schemes and show that our best+depth search scheme produces higher quality solutions faster while also producing a bound on their accuracy, which can be used to measure solution quality during search. An extensive empirical evaluation demonstrates the effectiveness of our new methods which enjoy the strength of best-first (optimality of search) and of depth-first (memory robustness), leading to solutions for difficult instances where previous solvers were unable to find even a single solution.

【Keywords】: Graphical models; Bayesian networks; Markov networks; Marginal MAP Inference; Search

524. Multi-Objective Influence Diagrams with Possibly Optimal Policies.

Paper Link】 【Pages】:3783-3789

【Authors】: Radu Marinescu ; Abdul Razak ; Nic Wilson

【Abstract】: The formalism of multi-objective influence diagrams has recently been developed for modeling and solving sequential decision problems under uncertainty and multiple objectives. Since utility values representing the decision maker's preferences are only partially ordered (e.g., by the Pareto order) we no longer have a unique maximal value of expected utility, but a set of them. Computing the set of maximal values of expected utility and the corresponding policies can be computationally very challenging. In this paper, we consider alternative notions of optimality, one of the most important being the notion of possibly optimal, namely optimal in at least one scenario compatible with the inter-objective tradeoffs. We develop a variable elimination algorithm for computing the set of possibly optimal expected utility values, prove formally its correctness, and compare variants of the algorithm experimentally.

【Keywords】: Multi-objective optimisation; Influence Diagrams

525. Hindsight Optimization for Hybrid State and Action MDPs.

Paper Link】 【Pages】:3790-3796

【Authors】: Aswin Raghavan ; Scott Sanner ; Roni Khardon ; Prasad Tadepalli ; Alan Fern

【Abstract】: Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower-bound generated by straight line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs.

【Keywords】: Markov Decision Processes; Factored MDP; Hybrid MDP; Hindsight Optimization; Planning under Uncertainty

526. I See What You See: Inferring Sensor and Policy Models of Human Real-World Motor Behavior.

Paper Link】 【Pages】:3797-3803

【Authors】: Felix Schmitt ; Hans-Joachim Bieg ; Michael Herman ; Constantin A. Rothkopf

【Abstract】: Human motor behavior is naturally guided by sensing the environment. To predict such sensori-motor behavior, it is necessary to model what is sensed and how actions are chosen based on the obtained sensory measurements. Although several models of human sensing have been proposed, data of the assumed sensory measurements is rarely available. This makes statistical estimation of sensor models problematic. To overcome this issue, we propose an abstract structural estimation approach building on the ideas of Herman et al.'s Simultaneous Estimation of Rewards and Dynamics (SERD). Assuming optimal fusion of sensory information and rational choice of actions, the proposed method makes it possible to infer sensor models even in the absence of data of the sensory measurements. To the best of our knowledge, this work presents the first general approach for joint inference of sensor and policy models. Furthermore, we consider its concrete implementation in the important class of sensor scheduling linear quadratic Gaussian problems. Finally, the effectiveness of the approach is demonstrated for prediction of the behavior of automobile drivers. Specifically, we model the glance and steering behavior of driving in the presence of visually demanding secondary tasks. The results show that prediction benefits from the inference of sensor models. This is especially the case if the information contained in gaze switching behavior is also considered.

【Keywords】: Inverse Optimal Control; Linear Quadratic Gaussian Problem; Sensor Scheduling; Partial Observable Markov Decision Process; Perception by Bayesian Inference; Automobile Drivers

527. Solving Constrained Combinatorial Optimisation Problems via MAP Inference without High-Order Penalties.

Paper Link】 【Pages】:3804-3811

【Authors】: Zhen Zhang ; Qinfeng Shi ; Julian J. McAuley ; Wei Wei ; Yanning Zhang ; Rui Yao ; Anton van den Hengel

【Abstract】: Solving constrained combinatorial optimisation problems via MAP inference is often achieved by introducing extra potential functions for each constraint. This can result in very high order potentials, e.g. a 2nd-order objective with pairwise potentials and a quadratic constraint over all N variables would correspond to an unconstrained objective with an order-N potential. This limits the practicality of such an approach, since inference with high order potentials is tractable only for a few special classes of functions. We propose an approach which is able to solve constrained combinatorial problems using belief propagation without increasing the order. For example, in our scheme the 2nd-order problem above remains order 2 instead of order N. Experiments on applications ranging from foreground detection, image reconstruction, quadratic knapsack, and the M-best solutions problem demonstrate the effectiveness and efficiency of our method. Moreover, we show several situations in which our approach outperforms commercial solvers like CPLEX and others designed for specific constrained MAP inference problems.

【Keywords】: MAP inference; Graphical Model; Hard Constraints

Robotics 7

528. Deep Learning Quadcopter Control via Risk-Aware Active Learning.

Paper Link】 【Pages】:3812-3818

【Authors】: Olov Andersson ; Mariusz Wzorek ; Patrick Doherty

【Abstract】: Modern optimization-based approaches to control increasingly allow automatic generation of complex behavior from only a model and an objective. Recent years have seen growing interest in fast solvers to also allow real-time operation on robots, but the computational cost of such trajectory optimization remains prohibitive for many applications. In this paper we examine a novel deep neural network approximation and validate it on a safe navigation problem with a real nano-quadcopter. As the risk of costly failures is a major concern with real robots, we propose a risk-aware resampling technique. Contrary to prior work, this active learning approach is easy to use with existing solvers for trajectory optimization, as well as deep learning. We demonstrate the efficacy of the approach on a difficult collision avoidance problem with non-cooperative moving obstacles. Our findings indicate that the resulting neural network approximations are at least 50 times faster than the trajectory optimizer while still satisfying the safety requirements. We demonstrate the potential of the approach by implementing a synthesized deep neural network policy on the nano-quadcopter microcontroller.

【Keywords】: Deep Learning;Trajectory Optimization;Behavior and Control;Robotics

529. Latent Dirichlet Allocation for Unsupervised Activity Analysis on an Autonomous Mobile Robot.

Paper Link】 【Pages】:3819-3826

【Authors】: Paul Duckworth ; Muhannad Al-Omari ; James Charles ; David C. Hogg ; Anthony G. Cohn

【Abstract】: For autonomous robots to collaborate on joint tasks with humans they require a shared understanding of an observed scene. We present a method for unsupervised learning of common human movements and activities on an autonomous mobile robot, which generalises and improves on recent results. Our framework encodes multiple qualitative abstractions of RGBD video from human observations and does not require external temporal segmentation. Analogously to information retrieval in text corpora, each human detection is modelled as a random mixture of latent topics. A generative probabilistic technique is used to recover topic distributions over an auto-generated vocabulary of discrete, qualitative spatio-temporal code words. We show that the emergent categories align well with human activities as interpreted by a human. This is a particularly challenging task on a mobile robot due to the varying camera viewpoints which lead to incomplete, partial and occluded human detections.

【Keywords】: Unsupervised Learning; Qualitative Spatio-Temporal Representations; Mobile Robotics; Plan and Activity Recognition; Latent Dirichlet Allocation;

530. Unsupervised Feature Learning for 3D Scene Reconstruction with Occupancy Maps.

Paper Link】 【Pages】:3827-3833

【Authors】: Vitor Campanholo Guizilini ; Fabio Tozeto Ramos

【Abstract】: This paper addresses the task of unsupervised feature learning for three-dimensional occupancy mapping, as a way to segment higher-level structures based on raw unorganized point cloud data. In particular, we focus on detecting planar surfaces, which are common in most structured or semi-structured environments. This segmentation is then used to minimize the number of parameters necessary to properly create a 3D occupancy model of the surveyed space, thus increasing computational speed and decreasing memory requirements. As the 3D modeling tool, an extension to Hilbert Maps was selected, since it naturally uses a feature-based representation of the environment to achieve real-time performance. Experiments conducted on simulated and real large-scale datasets show a substantial gain in performance, while decreasing the amount of stored information by orders of magnitude without sacrificing accuracy.

【Keywords】: Mapping; Kernel Methods; Scene Reconstruction; 3D models; Feature Learning

531. Grounded Action Transformation for Robot Learning in Simulation.

Paper Link】 【Pages】:3834-3840

【Authors】: Josiah P. Hanna ; Peter Stone

【Abstract】: Robot learning in simulation is a promising alternative to the prohibitive sample cost of learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the physical robot. Grounded simulation learning (GSL) promises to address this issue by altering the simulator to better match the real world. This paper proposes a new algorithm for GSL -- Grounded Action Transformation -- and applies it to learning of humanoid bipedal locomotion. Our approach results in a 43.27% improvement in forward walk velocity compared to a state-of-the-art hand-coded walk. We further evaluate our methodology in controlled experiments using a second, higher-fidelity simulator in place of the real world. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for learning robot control policies.

【Keywords】: Grounded simulation learning; Robotic bipedal walking; Transfer from simulation

532. A Diversified Generative Latent Variable Model for WiFi-SLAM.

Paper Link】 【Pages】:3841-3847

【Authors】: Hao Xiong ; Dacheng Tao

【Abstract】: WiFi-SLAM aims to map WiFi signals within an unknown environment while simultaneously determining the location of a mobile device. This localization method has been extensively used in indoor, space, undersea, and underground environments. For the sake of accuracy, most methods label the signal readings against ground truth locations. However, this is impractical in large environments, where it is hard to collect and maintain the data. Some methods use latent variable models to generate latent-space locations of signal strength data, an advantage being that no prior labeling of signal strength readings and their physical locations is required. However, the generated latent variables cannot cover all wireless signal locations and WiFi-SLAM performance is significantly degraded. Here we propose the diversified generative latent variable model (DGLVM) to overcome these limitations. By building a positive-definite kernel function, a diversity-encouraging prior is introduced to render the generated latent variables non-overlapping, thus capturing more characteristics of the wireless signal measurements. The defined objective function is then solved by variational inference. Our experiments illustrate that the method performs WiFi localization more accurately than other label-free methods.

【Keywords】: SLAM; Diversity Encouraging Prior; Latent Variable Model

533. Associate Latent Encodings in Learning from Demonstrations.

Paper Link】 【Pages】:3848-3854

【Authors】: Hang Yin ; Francisco S. Melo ; Aude Billard ; Ana Paiva

【Abstract】: We contribute a learning from demonstration approach for robots to acquire skills from multi-modal high-dimensional data. Both latent representations and associations of different modalities are proposed to be jointly learned through an adapted variational auto-encoder. The implementation and results are demonstrated in a robotic handwriting scenario, where the visual sensory input and the arm joint writing motion are learned and coupled. We show the latent representations successfully construct a task manifold for the observed sensor modalities. Moreover, the learned associations can be exploited to directly synthesize arm joint handwriting motion from an image input in an end-to-end manner. The advantages of learning associative latent encodings are further highlighted with the examples of inferring upon incomplete input images. A comparison with alternative methods demonstrates the superiority of the present approach in these challenging tasks.

【Keywords】: MLA: Machine Learning Applications (General/other); ML: Deep Learning/Neural Networks; ROB: Behavior and Control

534. Dynamically Constructed (PO)MDPs for Adaptive Robot Planning.

Paper Link】 【Pages】:3855-3863

【Authors】: Shiqi Zhang ; Piyush Khandelwal ; Peter Stone

【Abstract】: To operate in human-robot coexisting environments, intelligent robots need to simultaneously reason with commonsense knowledge and plan under uncertainty. Markov decision processes (MDPs) and partially observable MDPs (POMDPs) are good at planning under uncertainty toward maximizing long-term rewards; P-LOG, a declarative programming language under Answer Set semantics, is strong in commonsense reasoning. In this paper, we present a novel algorithm called iCORPP to dynamically reason about, and construct, (PO)MDPs using P-LOG. iCORPP successfully shields exogenous domain attributes from (PO)MDPs, which limits computational complexity and enables (PO)MDPs to adapt to the value changes these attributes produce. We conduct a number of experimental trials using two example problems in simulation and demonstrate iCORPP on a real robot. Results show significant improvements compared to competitive baselines.

【Keywords】:

Paper Link】 【Pages】:3864-3870

【Authors】: Thomas Caridroit ; Jean-Marie Lagniez ; Daniel Le Berre ; Tiago de Lima ; Valentin Montmirail

【Abstract】: We present a SAT-based approach for solving the modal logic S5-satisfiability problem. That problem being NP-complete, the translation into SAT is not a surprise. Our contribution is to greatly reduce the number of propositional variables and clauses required to encode the problem. We first present a syntactic property called diamond degree. We show that the size of an S5-model satisfying a formula phi can be bounded by its diamond degree. Such a measure can thus be used as an upper bound for generating a SAT encoding for the S5-satisfiability of that formula. We also propose a lightweight caching system which allows us to further reduce the size of the propositional formula. We implemented a generic SAT-based approach within the modal logic S5 solver S52SAT. It allowed us to compare experimentally our new upper bound against the previously known one, i.e. the number of modalities of phi, and to evaluate the effect of our caching technique. We also compared our solver against existing modal logic S5 solvers. The proposed approach outperforms previous ones on the benchmarks used. These promising results open interesting research directions for the practical resolution of other modal logics (e.g. K, KT, S4).

【Keywords】: Modal Logic S5; Translation into SAT; Propositional Logic; NP-complete; Benchmarks

536. A BTP-Based Family of Variable Elimination Rules for Binary CSPs.

Paper Link】 【Pages】:3871-3877

【Authors】: Achref El Mouelhi

【Abstract】: The study of broken-triangles is becoming increasingly ambitious, by both solving constraint satisfaction problems (CSPs) in polynomial time and reducing search space size through value merging or variable elimination. Considerable progress has been made in extending this important concept, such as dual broken-triangle and weakly broken-triangle, in order to maximize the number of captured tractable CSP instances and/or the number of merged values. Specifically, m-wBTP allows more values to be merged than BTP. k-BTP, WBTP and m-BTP capture more tractable instances than BTP. Here, we introduce a new weaker form of BTP, which will be called m-fBTP for flexible broken-triangle property. m-fBTP makes it possible, on the one hand, to eliminate more variables than BTP while preserving satisfiability and, on the other, to define a new, bigger tractable class for which arc consistency is a decision procedure. Likewise, m-fBTP allows more values to be merged than BTP but fewer than m-wBTP.

【Keywords】: Constraint satisfaction; arc consistency; variable elimination; broken triangle

537. Algorithms for Deciding Counting Quantifiers over Unary Predicates.

Paper Link】 【Pages】:3878-3884

【Authors】: Marcelo Finger ; Glauber De Bona

【Abstract】: We study algorithms for fragments of first order logic extended with counting quantifiers, which are known to be highly complex in general. We propose a fragment over unary predicates that is NP-complete and for which there is a normal form where Counting Quantification sentences have a single Unary predicate; we thus call it the CQU fragment. We provide an algebraic formulation of the CQU satisfiability problem in terms of Integer Linear Programming, based on which two algorithms are proposed: a direct reduction to SAT instances and an Integer Linear Programming version extended with a column generation mechanism. The latter is shown to lead to a viable implementation, and experiments show that this algorithm presents a phase transition behavior.

【Keywords】: Counting Quantifier; Integral Constraints; Satisfiability

538. Maximum Model Counting.

Paper Link】 【Pages】:3885-3892

【Authors】: Daniel J. Fremont ; Markus N. Rabe ; Sanjit A. Seshia

【Abstract】: We introduce the problem Max#SAT, an extension of model counting (#SAT). Given a formula over sets of variables X, Y, and Z, the Max#SAT problem is to maximize over the variables X the number of assignments to Y that can be extended to a solution with some assignment to Z. We demonstrate that Max#SAT has applications in many areas, showing how it can be used to solve problems in probabilistic inference (marginal MAP), planning, program synthesis, and quantitative information flow analysis. We also give an algorithm which by making only polynomially many calls to an NP oracle can approximate the maximum count to within any desired multiplicative error. The NP queries needed are relatively simple, arising from recent practical approximate model counting and sampling algorithms, which allows our technique to be effectively implemented with a SAT solver. Through several experiments we show that our approach can be successfully applied to interesting problems.

【Keywords】: optimization; quantitative information flow; probabilistic inference; program synthesis
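The Max#SAT definition in the abstract above can be illustrated with a brute-force sketch. This is not the paper's algorithm (which relies on an NP oracle and approximate counting); it simply enumerates all assignments, so it is only usable for tiny formulas. The function names, the representation of a formula as a Python predicate over tuples of booleans, and the example formula are all assumptions made for illustration.

```python
from itertools import product

def max_count_sat(formula, n_x, n_y, n_z):
    """Brute-force Max#SAT: maximise over X the number of Y-assignments
    that can be extended to a solution by some Z-assignment.

    `formula(x, y, z)` is a hypothetical predicate taking three tuples of
    booleans; it is not an API from the paper.
    """
    best_x, best_count = None, -1
    for x in product([False, True], repeat=n_x):
        # Count the Y-assignments for which at least one Z-assignment works.
        count = sum(
            1
            for y in product([False, True], repeat=n_y)
            if any(formula(x, y, z) for z in product([False, True], repeat=n_z))
        )
        if count > best_count:
            best_x, best_count = x, count
    return best_x, best_count

def example_formula(x, y, z):
    # Illustrative formula: (x0 OR y0) AND (y1 <-> z0)
    return (x[0] or y[0]) and (y[1] == z[0])

print(max_count_sat(example_formula, n_x=1, n_y=2, n_z=1))
```

In this toy formula, setting x0 to true satisfies the first clause for every Y-assignment, and z0 can always be chosen to match y1, so all four Y-assignments count.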

539. Phase Transitions for Scale-Free SAT Formulas.

Paper Link】 【Pages】:3893-3899

【Authors】: Tobias Friedrich ; Anton Krohmer ; Ralf Rothenberger ; Andrew M. Sutton

【Abstract】: Recently, a number of non-uniform random satisfiability models have been proposed that are closer to practical satisfiability problems in some characteristics. In contrast to uniform random Boolean formulas, scale-free formulas have a variable occurrence distribution that follows a power law. It has been conjectured that such a distribution is a more accurate model for some industrial instances than the uniform random model. Though it seems that there is already an awareness of a threshold phenomenon in such models, a complete picture is still lacking. In contrast to the uniform model, the critical density threshold does not lie at a single point, but instead exhibits a functional dependency on the power-law exponent. For scale-free formulas with clauses of length k=2, we give a lower bound on the phase transition threshold as a function of the scaling parameter. We also perform computational studies that suggest our bound is tight and investigate the critical density for formulas with higher clause lengths. Similar to the uniform model, on formulas with k>=3, we find that the phase transition regime corresponds to a set of formulas that are difficult to solve by backtracking search.

【Keywords】: propositional satisfiability; random sat models; phase transitions; scale-free networks

540. The Opacity of Backbones.

Paper Link】 【Pages】:3900-3906

【Authors】: Lane A. Hemaspaandra ; David E. Narváez

【Abstract】: A backbone of a boolean formula F is a collection S of its variables for which there is a unique partial assignment a_S such that F[a_S] is satisfiable (Monasson et al. 1999; Williams, Gomes, and Selman 2003). This paper studies the nontransparency of backbones. We show that, under the widely believed assumption that integer factoring is hard, there exist sets of boolean formulas that have obvious, nontrivial backbones yet finding the values, a_S, of those backbones is intractable. We also show that, under the same assumption, there exist sets of boolean formulas that obviously have large backbones yet producing such a backbone S is intractable. Further, we show that if integer factoring is not merely worst-case hard but is frequently hard, as is widely believed, then the frequency of hardness in our two results is not too much less than that frequency.

【Keywords】: backbones; SAT; complexity

541. Between Subgraph Isomorphism and Maximum Common Subgraph.

Paper Link】 【Pages】:3907-3914

【Authors】: Ruth Hoffmann ; Ciaran McCreesh ; Craig Reilly

【Abstract】: When a small pattern graph does not occur inside a larger target graph, we can ask how to find "as much of the pattern as possible" inside the target graph. In general, this is known as the maximum common subgraph problem, which is much more computationally challenging in practice than subgraph isomorphism. We introduce a restricted alternative, where we ask if all but k vertices from the pattern can be found in the target graph. This allows for the development of slightly weakened forms of certain invariants from subgraph isomorphism which are based upon degree and number of paths.  We show that when k is small, weakening the invariants still retains much of their effectiveness. We are then able to solve this problem on the standard problem instances used to benchmark subgraph isomorphism algorithms, despite these instances being too large for current maximum common subgraph algorithms to handle. Finally, by iteratively increasing k, we obtain an algorithm which is also competitive for the maximum common subgraph

【Keywords】:

542. Should Algorithms for Random SAT and Max-SAT Be Different?

Paper Link】 【Pages】:3915-3921

【Authors】: Sixue Liu ; Gerard de Melo

【Abstract】: We analyze to what extent the random SAT and Max-SAT problems differ in their properties. Our findings suggest that for random k-CNF with ratio in a certain range, Max-SAT can be solved by any SAT algorithm with subexponential slowdown, while for formulae with ratios greater than some constant, algorithms under the random walk framework require substantially different heuristics. In light of these results, we propose a novel probabilistic approach for random Max-SAT called ProMS. Experimental results illustrate that ProMS outperforms many state-of-the-art local search solvers on random Max-SAT benchmarks.

【Keywords】: SAT and Max-SAT; Random Formula; Local Search Algorithm

543. Soft and Cost MDD Propagators.

Paper Link】 【Pages】:3922-3928

【Authors】: Guillaume Perez ; Jean-Charles Régin

【Abstract】: Recent developments of efficient propagators, operations and creation methods for MDDs allow us to directly build efficient MDD-based models, without the need for intermediate data structures. In this paper, we take another step in this direction by improving the propagators of cost MDDs. In addition, we introduce a soft MDD propagator in order to deal with unsatisfiable problems. This directly offers cost and soft versions for table constraints and any constraints which can be represented by an MDD (regular, slide, knapsack...).

【Keywords】: MDD; propagator; global constraints; arc consistency

544. Rigging Nearly Acyclic Tournaments Is Fixed-Parameter Tractable.

Paper Link】 【Pages】:3929-3935

【Authors】: M. S. Ramanujan ; Stefan Szeider

【Abstract】: Single-elimination tournaments (or knockout tournaments) are a popular format in sports competitions that is also widely used for decision making and elections. In this paper we study the algorithmic problem of manipulating the outcome of a tournament. More specifically, we study the problem of finding a seeding of the players such that a certain player wins the resulting tournament. The problem is known to be NP-hard in general. In this paper we present an algorithm for this problem that exploits structural restrictions on the tournament. More specifically, we establish that the problem is fixed-parameter tractable when parameterized by the size of a smallest feedback arc set of the tournament (interpreting the tournament as an oriented complete graph). This is a natural parameter because most problems on tournaments (including this one) are either trivial or easily solvable on acyclic tournaments, leading to the question — what about nearly acyclic tournaments or tournaments with a small feedback arc set? Our result significantly improves upon a recent algorithm by Aziz et al. (2014) whose running time is bounded by an exponential function where the size of a smallest feedback arc set appears in the exponent and the base is the number of players.

【Keywords】: Tournament Fixing; Agenda Control; Fixed Parameter Tractability
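The seeding problem described in the abstract above can be made concrete with a small brute-force sketch. This is not the fixed-parameter algorithm from the paper; it tries every seeding, which is only feasible for a handful of players. The function names, the `beats` nested-dictionary representation, and the four-player example are assumptions made for illustration, and the number of players is assumed to be a power of two.

```python
from itertools import permutations

def winner(seeding, beats):
    """Play a single-elimination bracket: adjacent players in `seeding`
    meet in each round, and beats[a][b] is True when a defeats b."""
    players = list(seeding)
    while len(players) > 1:
        players = [
            a if beats[a][b] else b
            for a, b in zip(players[0::2], players[1::2])
        ]
    return players[0]

def find_winning_seeding(players, beats, favourite):
    """Return some seeding under which `favourite` wins, or None.
    Exhaustive search over all orderings (exponential, illustration only)."""
    for seeding in permutations(players):
        if winner(seeding, beats) == favourite:
            return seeding
    return None

# Hypothetical 4-player tournament: 0 beats 1; 1 beats 2 and 3;
# 2 beats 0 and 3; 3 beats 0.
beats = {
    0: {1: True, 2: False, 3: False},
    1: {0: False, 2: True, 3: True},
    2: {0: True, 1: False, 3: True},
    3: {0: True, 1: False, 2: False},
}

print(find_winning_seeding([0, 1, 2, 3], beats, favourite=1))
print(find_winning_seeding([0, 1, 2, 3], beats, favourite=0))
```

In this example player 1 can be made to win by keeping player 0 in the other half of the bracket, whereas player 0 cannot win under any seeding because it must meet player 2 in the final.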

545. RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem.

Paper Link】 【Pages】:3936-3943

【Authors】: Saeedeh Shekarpour ; Edgard Marx ; Sören Auer ; Amit P. Sheth

【Abstract】: For non-expert users, a textual query is the most popular and simple means for communicating with a retrieval or question answering system. However, there is a risk of receiving queries which do not match the background knowledge. Query expansion and query rewriting are solutions for this problem, but they are in danger of yielding a large number of irrelevant words, which in turn negatively influences runtime as well as accuracy. In this paper, we propose a new method for automatically rewriting input queries on graph-structured RDF knowledge bases. We employ a Hidden Markov Model to determine the most suitable derived words from linguistic resources. We introduce the concept of triple-based co-occurrence for recognizing co-occurring words in RDF data. This model was bootstrapped with three statistical distributions. Our experimental study demonstrates the superiority of the proposed approach to the traditional n-gram model.

【Keywords】: Query rewriting; Hidden Markov model; n-gram language model; triple-based co-occurrence

Paper Link】 【Pages】:3944-3950

【Authors】: Cornelis Jan van Leeuwen ; Przemyslaw Pawelczak

【Abstract】: We propose a novel incomplete cooperative algorithm for distributed constraint optimization problems (DCOPs) denoted as Cooperative Constraint Approximation (CoCoA). The key strategy of the algorithm is to use a semi-greedy approach in which knowledge is distributed amongst neighboring agents, and in which each value is assigned only once rather than iteratively. Furthermore, CoCoA uses a unique-first approach to improve the solution quality. It is designed such that it can solve DCOPs as well as Asymmetric DCOPs, with only a few messages being communicated between neighboring agents. Experimentally, through evaluating graph coloring problems, randomized (A)DCOPs, and a sensor network communication problem, we show that CoCoA is able to very quickly find solutions of high quality with a smaller communication overhead than state-of-the-art DCOP solvers such as DSA, MGM-2, ACLS, MCS-MGM and Max-Sum. In our asymmetric use case problem of a sensor network, we show that CoCoA not only finds the best solution, but also finds this solution faster than any other algorithm.

【Keywords】: DCOP; Constraint Satisfaction; Incomplete; Non-iterative

547. Extending Compact-Table to Negative and Short Tables.

Paper Link】 【Pages】:3951-3957

【Authors】: Hélène Verhaeghe ; Christophe Lecoutre ; Pierre Schaus

【Abstract】: Table constraints are very useful for modeling combinatorial constrained problems, and thus play an important role in Constraint Programming (CP). During the last decade, many algorithms have been proposed for enforcing the property known as Generalized Arc Consistency (GAC) on such constraints. A state-of-the-art GAC algorithm called Compact-Table (CT), which has been recently proposed, significantly outperforms all previously proposed algorithms. In this paper, we extend this algorithm in order to deal with both short supports and negative tables, i.e., tables that contain universal values and conflicts. Our experimental results show the interest of using this fast general algorithm.

【Keywords】: Table constraints;Extensional constraints;Negative table;Short tuple;Filtering algorithm;Reversible-sparse-bitset

548. General Bounds on Satisfiability Thresholds for Random CSPs via Fourier Analysis.

Paper Link】 【Pages】:3958-3966

【Authors】: Colin Wei ; Stefano Ermon

【Abstract】: Random constraint satisfaction problems (CSPs) have been widely studied both in AI and complexity theory. Empirically and theoretically, many random CSPs have been shown to exhibit a phase transition. As the ratio of constraints to variables passes certain thresholds, they transition from being almost certainly satisfiable to unsatisfiable. The exact location of this threshold has been thoroughly investigated, but only for certain common classes of constraints. In this paper, we present new bounds for the location of these thresholds in boolean CSPs. Our main contribution is that our bounds are fully general, and apply to any fixed constraint function that could be used to generate an ensemble of random CSPs. These bounds rely on a novel Fourier analysis and can be easily computed from the Fourier spectrum of a constraint function. Our bounds are within a constant factor of the exact threshold location for many well-studied random CSPs. We demonstrate that our bounds can be easily instantiated to obtain thresholds for many constraint functions that had not been previously studied, and evaluate them experimentally.

【Keywords】: random constraint satisfaction problems; general bounds on satisfiability thresholds; Fourier analysis of random constraints

Vision 53

549. Regularized Diffusion Process for Visual Retrieval.

Paper Link】 【Pages】:3967-3973

【Authors】: Song Bai ; Xiang Bai ; Qi Tian ; Longin Jan Latecki

【Abstract】: Diffusion process has advanced visual retrieval greatly owing to its capacity to capture the geometric structure of the underlying manifold. Recent studies (Donoser and Bischof 2013) have experimentally demonstrated that diffusion process on the tensor product graph yields better retrieval performance than that on the original affinity graph. However, the principle behind this kind of diffusion process remains unclear, i.e., what kind of manifold structure is captured and how it is reflected. In this paper, we propose a new variant of diffusion process, which also operates on a tensor product graph. It is defined in three equivalent formulations (regularization framework, iterative framework and limit framework, respectively). Based on our study, three insightful conclusions are drawn which theoretically explain how this kind of diffusion process can better reveal the intrinsic relationship between objects. Besides, extensive experimental results on various retrieval tasks testify to the validity of the proposed method.

【Keywords】: Image Retrieval;Re-ranking;Diffusion Process

550. Collective Deep Quantization for Efficient Cross-Modal Retrieval.

Paper Link】 【Pages】:3974-3980

【Authors】: Yue Cao ; Mingsheng Long ; Jianmin Wang ; Shichen Liu

【Abstract】: Cross-modal similarity retrieval is the problem of designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization in an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully-crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns the common quantizer codebook for both modalities, through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner product distance computed based on the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.

【Keywords】: Deep quantization; Collective quantization

551. Reference Based LSTM for Image Captioning.

Paper Link】 【Pages】:3981-3987

【Authors】: Minghai Chen ; Guiguang Ding ; Sicheng Zhao ; Hui Chen ; Qiang Liu ; Jungong Han

【Abstract】: Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to find which parts of the captions are more essential to the image; in the caption generation phase, the objects or the scenes are sometimes misrecognized. In this paper, we consider the training images as the references and propose a Reference based Long Short Term Memory (R-LSTM) model, aiming to solve these two problems in one goal. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, the consensus score is utilized to exploit the reference information of neighbor images, which might fix the misrecognition and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and obtains top 2 position on 11 of the 14 metrics on the online test server.

【Keywords】:

552. A Multi-Task Deep Network for Person Re-Identification.

Paper Link】 【Pages】:3988-3994

【Authors】: Weihua Chen ; Xiaotang Chen ; Jianguo Zhang ; Kaiqi Huang

【Abstract】: Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches. In this paper, we take both tasks into account and propose a multi-task deep network (MTDnet) that makes use of their respective advantages and jointly optimizes the two tasks for person ReID. To the best of our knowledge, we are the first to integrate both tasks in one network to solve person ReID. We show that our proposed architecture significantly boosts the performance. Furthermore, a deep architecture in general requires a sufficiently large dataset for training, a requirement that is usually not met in person ReID. To cope with this situation, we further extend the MTDnet and propose a cross-domain architecture that is capable of using an auxiliary set to assist training on small target sets. In the experiments, our approach outperforms most existing person ReID algorithms on representative datasets including CUHK03, CUHK01, VIPeR, iLIDS and PRID2011, which clearly demonstrates the effectiveness of the proposed approach.

【Keywords】: Person Re-identification;Multi-task;Deep learning

553. VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem.

Paper Link】 【Pages】:3995-4001

【Authors】: Ronald Clark ; Sen Wang ; Hongkai Wen ; Andrew Markham ; Niki Trigoni

【Abstract】: In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is to the best of our knowledge the first end-to-end trainable method for visual-inertial odometry which performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU as well as eliminating the need for manual calibration between the IMU and camera. A further advantage is that our model naturally and elegantly incorporates domain specific information which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available and can be trained to outperform them in the presence of calibration and synchronization errors.

【Keywords】: computer vision; sensor fusion; visual-inertial odometry

554. Deep Correlated Metric Learning for Sketch-based 3D Shape Retrieval.

Paper Link】 【Pages】:4002-4008

【Authors】: Guoxian Dai ; Jin Xie ; Fan Zhu ; Yi Fang

【Abstract】: The explosive growth of 3D models has led to a pressing demand for an efficient search system. Traditional model-based search is usually not convenient, since people do not always have a 3D model at hand. Sketch-based 3D shape retrieval is a promising candidate due to its simplicity and efficiency. The main challenge for sketch-based 3D shape retrieval is the discrepancy across different domains. In this paper, we propose a novel deep correlated metric learning (DCML) method to mitigate the discrepancy between the sketch and 3D shape domains. The proposed DCML trains two distinct deep neural networks (one for each domain) jointly with one loss, which learns two deep nonlinear transformations to map features from both domains into a nonlinear feature space. The proposed loss, including a discriminative loss and a correlation loss, aims to increase the discrimination of features within each domain as well as the correlation between different domains. In the transferred space, the discriminative loss minimizes the intra-class distance of the deep transformed features and maximizes the inter-class distance of the deep transformed features by at least a predefined margin within each domain, while the correlation loss focuses on minimizing the distribution discrepancy across different domains. Our proposed method is evaluated on the SHREC 2013 and 2014 benchmarks, and the experimental results demonstrate the superiority of our proposed method over the state-of-the-art methods.

【Keywords】: metric learning; shape retrieval

555. Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition.

Paper Link】 【Pages】:4009-4015

【Authors】: Zhen Dong ; Su Jia ; Chi Zhang ; Mingtao Pei ; Yuwei Wu

【Abstract】: In this paper, we aim to construct a deep neural network which embeds high dimensional symmetric positive definite (SPD) matrices into a more discriminative low dimensional SPD manifold. To this end, we develop two types of basic layers: a 2D fully connected layer which reduces the dimensionality of the SPD matrices, and a symmetrically clean layer which achieves non-linear mapping. Specifically, we extend the classical fully connected layer such that it is suitable for SPD matrices, and we further show that SPD matrices with symmetric pair elements setting zero operations are still symmetric positive definite. Finally, we complete the construction of the deep neural network for SPD manifold learning by stacking the two layers. Experiments on several face datasets demonstrate the effectiveness of the proposed method.

【Keywords】:

556. Sherlock: Scalable Fact Learning in Images.

Paper Link】 【Pages】:4016-4024

【Authors】: Mohamed Elhoseiny ; Scott Cohen ; Walter Chang ; Brian L. Price ; Ahmed M. Elgammal

【Abstract】: We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand an unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects (e.g., <boy>), (2) attributes (e.g., <boy, tall>), (3) actions (e.g., <boy, playing>), and (4) interactions (e.g., <boy, riding, a horse>). Each fact has a semantic language view (e.g., <boy, playing>) and a visual view (an image with this fact). We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding. We propose and investigate recent and strong approaches from the multiview learning literature and also introduce two learning representation models as potential baselines. We applied the investigated methods on several datasets that we augmented with structured facts and a large scale dataset of more than 202,000 facts and 814,000 images. Our experiments show the advantage of the proposed models, which relate facts through their structure, over the designed baselines on bidirectional fact retrieval.

【Keywords】: Language and Vision; Scalable Learning; Scalable Recognition; zero-shot Generalization; few-shot Generalization; Structured Embedding; Interaction recognition; structured recognition

557. Robust Visual Tracking via Local-Global Correlation Filter.

Paper Link】 【Pages】:4025-4031

【Authors】: Heng Fan ; Jinhai Xiang

【Abstract】: The correlation filter has drawn increasing interest in visual tracking due to its high efficiency. However, it is sensitive to partial occlusion, which may result in tracking failure. To address this problem, we propose a novel local-global correlation filter (LGCF) for object tracking. Our LGCF model utilizes both local-based and global-based strategies, and effectively combines these two strategies by exploiting the relationship of circular shifts among local object parts and the global target for their motion models to preserve the structure of the object. Specifically, our proposed model has two advantages: (1) Owing to the benefits of the local-based mechanism, our method is robust to partial occlusion by leveraging visible parts. (2) Taking into account the relationship of motion models among local parts and the global target, our LGCF model is able to capture the inner structure of the object, which further improves its robustness to occlusion. In addition, to alleviate the issue of drifting away from the object, we incorporate temporal consistencies of both local parts and the global target in our LGCF model. Besides, we adopt an adaptive method to accurately estimate the scale of the object. Extensive experiments on OTB15 with 100 videos demonstrate that our tracking algorithm performs favorably against state-of-the-art methods.

【Keywords】: visual tracking; local-global correlation filter (LGCF)

558. DECK: Discovering Event Composition Knowledge from Web Images for Zero-Shot Event Detection and Recounting in Videos.

Paper Link】 【Pages】:4032-4038

【Authors】: Chuang Gan ; Chen Sun ; Ram Nevatia

【Abstract】: We address the problem of zero-shot event recognition in consumer videos. An event usually consists of multiple human-human and human-object interactions over a relatively long period of time. A common approach proceeds by representing videos with banks of object and action concepts, but requires additional user inputs to specify the desired concepts per event. In this paper, we provide a fully automatic algorithm to select representative and reliable concepts for event queries. This is achieved by discovering event composition knowledge (DECK) from web images. To evaluate our proposed method, we use the standard zero-shot event detection protocol (ZeroMED), but also introduce a novel zero-shot event recounting (ZeroMER) problem to select supporting evidence of the events. Our ZeroMER formulation aims to select video snippets that are relevant and diverse. Evaluation on the challenging TRECVID MED dataset shows that our proposed method achieves promising results on both tasks.

【Keywords】: zero-shot; video recognition; event detection

559. Differentiating Between Posed and Spontaneous Expressions with Latent Regression Bayesian Network.

Paper Link】 【Pages】:4039-4045

【Authors】: Quan Gan ; Siqi Nie ; Shangfei Wang ; Qiang Ji

【Abstract】: Spatial patterns embedded in human faces are crucial for differentiating posed expressions from spontaneous ones, yet they have not been thoroughly exploited in the literature. To tackle this problem, we present a generative model, i.e., Latent Regression Bayesian Network (LRBN), to effectively capture the spatial patterns embedded in facial landmark points to differentiate between posed and spontaneous facial expressions. The LRBN is a directed graphical model consisting of one latent layer and one visible layer. Due to the “explaining away” effect in Bayesian networks, LRBN is able to capture both the dependencies among the latent variables given the observation and the dependencies among visible variables. We believe that such dependencies are crucial for faithful data representation. Specifically, during training, we construct two LRBNs to capture spatial patterns inherent in displacements of landmark points from spontaneous facial expressions and posed facial expressions respectively. During testing, the samples are classified into posed or spontaneous expressions according to their likelihoods on two models. Efficient learning and inference algorithms are proposed. Experimental results on two benchmark databases demonstrate the advantages of the proposed approach in modeling spatial patterns as well as its superior performance to the existing methods in differentiating between posed and spontaneous expressions.

【Keywords】: expression; Latent regression Bayesian network; posed; spontaneous
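The test-time rule in the abstract above (fit one generative model per class, then label a sample by whichever model assigns it the higher likelihood) can be illustrated with a minimal sketch. This is not the LRBN itself: as a loudly labeled assumption, each class model is replaced here by a diagonal Gaussian over landmark displacements, and the names `fit_diag_gaussian`, `log_likelihood`, and `classify` are hypothetical.

```python
import math

def fit_diag_gaussian(samples):
    """Fit per-dimension mean and variance (stand-in for training one class model)."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    # A small floor keeps the variance strictly positive.
    variances = [max(sum((s[j] - means[j]) ** 2 for s in samples) / n, 1e-6)
                 for j in range(d)]
    return means, variances

def log_likelihood(x, model):
    """Log-density of x under a diagonal Gaussian model."""
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))

def classify(x, posed_model, spontaneous_model):
    """Assign the label of the model under which x is more likely."""
    if log_likelihood(x, posed_model) > log_likelihood(x, spontaneous_model):
        return "posed"
    return "spontaneous"

# Toy displacement vectors (made-up numbers, for illustration only).
posed = fit_diag_gaussian([[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]])
spontaneous = fit_diag_gaussian([[3.0, 3.0], [3.2, 2.8], [2.8, 3.2]])
print(classify([0.1, 0.0], posed, spontaneous))
```

Only the decision rule carries over to the paper's setting; the Gaussian models would be replaced by the two trained LRBNs and their likelihood computations.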

560. Active Video Summarization: Customized Summaries via On-line Interaction with the User.

Paper Link】 【Pages】:4046-4052

【Authors】: Ana Garcia del Molino ; Xavier Boix ; Joo-Hwee Lim ; Ah-Hwee Tan

【Abstract】: To facilitate the browsing of long videos, automatic video summarization provides an excerpt that represents its content. In the case of egocentric and consumer videos, due to their personal nature, adapting the summary to a specific user's preferences is desirable. Current approaches to customizable video summarization obtain the user's preferences prior to the summarization process. As a result, the user needs to manually modify the summary to further meet the preferences. In this paper, we introduce Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary. AVS asks questions about the summary to update it on-line until the user is satisfied. To minimize the interaction, the best segment to inquire about next is inferred from the previous feedback. We evaluate AVS on the commonly used UTEgo dataset. We also introduce a new dataset for customized video summarization (CSumm) recorded with a Google Glass. The results show that AVS achieves an excellent compromise between usability and quality. In 41% of the videos, AVS is considered the best over all tested baselines, including manually generated summaries. Also, when looking for specific events in the video, AVS provides an average level of satisfaction higher than those of all other baselines after only six questions to the user.

【Keywords】: Video Summarization; Wearable and Consumer Videos; User's Preferences; Active Inference

561. Building an End-to-End Spatial-Temporal Convolutional Network for Video Super-Resolution.

Paper Link】 【Pages】:4053-4060

【Authors】: Jun Guo ; Hongyang Chao

【Abstract】: We propose an end-to-end deep network for video super-resolution. Our network is composed of a spatial component that encodes intra-frame visual patterns, a temporal component that discovers inter-frame relations, and a reconstruction component that aggregates information to predict details. We make the spatial component deep, so that it can better leverage spatial redundancies for rebuilding high-frequency structures. We organize the temporal component in a bidirectional and multi-scale fashion, to better capture how frames change across time. The effectiveness of the proposed approach is highlighted on two datasets, where we observe substantial improvements relative to the state of the art.

【Keywords】: video super-resolution; spatial-temporal network; deep learning

562. Zero-Shot Recognition via Direct Classifier Learning with Transferred Samples and Pseudo Labels.

Paper Link】 【Pages】:4061-4067

【Authors】: Yuchen Guo ; Guiguang Ding ; Jungong Han ; Yue Gao

【Abstract】: As an interesting and emerging topic, zero-shot recognition (ZSR) makes it possible to train a recognition model by specifying the category's attributes when there are no labeled exemplars available. The fundamental idea for ZSR is to transfer knowledge from the abundant labeled data in different but related source classes via the class attributes. Conventional ZSR approaches adopt a two-step strategy in the test stage, where the samples are projected into the attribute space in the first step, and then the recognition is carried out based on the relationship between samples and classes in the attribute space. Due to this intermediate transformation, information loss is unavoidable, thus degrading the performance of the overall system. Rather than following this two-step strategy, in this paper, we propose a novel one-step approach that is able to perform ZSR in the original feature space by using directly trained classifiers. To tackle the problem that no labeled samples of target classes are available, we propose to assign pseudo labels to samples based on reliability and diversity, which in turn will be used to train the classifiers. Moreover, we adopt a robust SVM that accounts for the unreliability of pseudo labels. Extensive experiments on four datasets demonstrate consistent performance gains of our approach over the state-of-the-art two-step ZSR approaches.

【Keywords】: zero-shot learning, optimization, sample transfer
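The conventional two-step test procedure that the abstract above contrasts itself with (project a sample into attribute space, then match it to the closest class) can be sketched as follows. This is a minimal illustration under assumptions, not the paper's one-step method: the linear projection `weights`, the class `signatures`, and all function names are hypothetical, and nearest-neighbor matching by squared Euclidean distance is just one possible matching rule.

```python
def project(features, weights):
    """Step 1: map a feature vector to attribute space with a linear map.

    `weights` has one row per attribute (assumed to be learned on source classes).
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def nearest_class(attributes, signatures):
    """Step 2: pick the unseen class whose attribute signature is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(signatures, key=lambda name: squared_distance(attributes, signatures[name]))

def two_step_zsr(features, weights, signatures):
    """Two-step zero-shot recognition: project, then match in attribute space."""
    return nearest_class(project(features, weights), signatures)

# Toy setup: 3-dimensional features, 2 attributes, 2 unseen classes.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
signatures = {"zebra": [1.0, 0.0], "whale": [0.0, 1.0]}
print(two_step_zsr([0.9, 0.2, 0.5], weights, signatures))  # prints "zebra"
```

In this toy projection the third feature dimension never reaches the attribute space, which is a small instance of the information loss the abstract attributes to the intermediate transformation.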

563. Attributes for Improved Attributes: A Multi-Task Network Utilizing Implicit and Explicit Relationships for Facial Attribute Classification.

Paper Link】 【Pages】:4068-4074

【Authors】: Emily M. Hand ; Rama Chellappa

【Abstract】: Attributes, or mid-level semantic features, have gained popularity in the past few years in domains ranging from activity recognition to face verification. Improving the accuracy of attribute classifiers is an important first step in any application which uses these attributes. In most works to date, attributes have been considered independent of each other. However, attributes can be strongly related, such as heavy makeup and wearing lipstick as well as male and goatee and many others. We propose a multi-task deep convolutional neural network (MCNN) with an auxiliary network at the top (AUX) which takes advantage of attribute relationships for improved classification. We call our final network MCNN-AUX. MCNN-AUX uses attribute relationships in three ways: by sharing the lowest layers for all attributes, by sharing the higher layers for spatially-related attributes, and by feeding the attribute scores from MCNN into the AUX network to find score-level relationships. Using MCNN-AUX rather than individual attribute classifiers, we are able to reduce the number of parameters in the network from 64 million to fewer than 16 million and reduce the training time by a factor of 16. We demonstrate the effectiveness of our method by producing results on two challenging publicly available datasets achieving state-of-the-art performance on many attributes.

【Keywords】: Attributes; Face Recognition; Multi-Task Learning; CNN

564. Weakly Supervised Learning of Part Selection Model with Spatial Constraints for Fine-Grained Image Classification.

Paper Link】 【Pages】:4075-4081

【Authors】: Xiangteng He ; Yuxin Peng

【Abstract】: Fine-grained image classification is challenging due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Since two different sub-categories are distinguished only by the subtle differences in some specific parts, semantic part localization is crucial for fine-grained image classification. Most previous works improve the accuracy by looking for the semantic parts, but rely heavily upon the use of the object or part annotations of images, whose labeling is costly. Recently, some researchers have begun to focus on recognizing sub-categories via weakly supervised part detection instead of using the expensive annotations. However, these works ignore the spatial relationship between the object and its parts as well as the interaction of the parts, both of which are helpful to promote part selection. Therefore, this paper proposes a weakly supervised part selection method with spatial constraints for fine-grained image classification, which is free of using any bounding box or part annotations. We first learn a whole-object detector automatically to localize the object through jointly using saliency extraction and co-segmentation. Then two spatial constraints are proposed to select the distinguished parts. The first spatial constraint, called box constraint, defines the relationship between the object and its parts, and aims to ensure that the selected parts are definitely located in the object region, and have the largest overlap with the object region. The second spatial constraint, called parts constraint, defines the relationship of the object's parts, and aims to reduce the parts' overlap with each other to avoid information redundancy and ensure that the selected parts are the most distinguishing parts from other categories. Combining the two spatial constraints promotes part selection significantly and achieves a notable improvement on fine-grained image classification. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of our method even compared with those methods using expensive annotations.

【Keywords】: Fine-grained Image Classification; Weakly Supervised Part Selection Model; Spatial Constraints

565. Video Recovery via Learning Variation and Consistency of Images.

Paper Link】 【Pages】:4082-4088

【Authors】: Zhouyuan Huo ; Shangqian Gao ; Weidong Cai ; Heng Huang

【Abstract】: Matrix completion algorithms have been popularly used to recover images with missing entries, and they have proved to be very effective. Recent works utilized tensor completion models in video recovery assuming that all video frames are homogeneous and correlated. However, real videos are made up of different episodes or scenes, i.e., they are heterogeneous. Therefore, a video recovery model which utilizes both video spatiotemporal consistency and variation is necessary. To solve this problem, we propose a new video recovery method, Sectional Trace Norm with Variation and Consistency Constraints (STN-VCC). In our model, capped L1-norm regularization is utilized to learn the spatial-temporal consistency and variation between consecutive frames in video clips. Meanwhile, we introduce a new low-rank model to capture the low-rank structure in video frames with a better approximation of rank minimization than the traditional trace norm. An efficient optimization algorithm is proposed, and we also provide a proof of convergence in the paper. We evaluate the proposed method via several video recovery tasks, and experimental results show that our new method consistently outperforms other related approaches.

【Keywords】: Video recovery; matrix completion; low-rank model
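The capped L1-norm mentioned in the abstract above can be illustrated with a minimal sketch. The capped L1 penalty of a vector is commonly defined as the sum of min(|x_i|, cap); how STN-VCC applies it is not stated in the abstract, so the element-wise application to differences of consecutive frames below is an assumption, and the names `capped_l1` and `frame_difference_penalty` are hypothetical.

```python
def capped_l1(values, cap):
    """Capped L1 penalty: each entry contributes at most `cap`."""
    return sum(min(abs(v), cap) for v in values)

def frame_difference_penalty(frames, cap):
    """Sum the capped L1 penalty over differences of consecutive frames.

    Each frame is a flat list of pixel values. Small differences (consistency
    within a scene) are penalized in full, while large differences (a scene
    change) saturate at `cap` per pixel.
    """
    total = 0.0
    for previous, current in zip(frames, frames[1:]):
        difference = [c - p for p, c in zip(previous, current)]
        total += capped_l1(difference, cap)
    return total

# Two similar frames followed by an abrupt scene change (toy values).
frames = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
print(frame_difference_penalty(frames, cap=2.0))  # prints 5.0
```

Because the penalty saturates, an abrupt cut between heterogeneous scenes costs no more than `cap` per pixel, which is one way such a regularizer can encourage consistency inside a scene without forcing smoothness across scene boundaries.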

566. Nonnegative Orthogonal Graph Matching.

Paper Link】 【Pages】:4089-4095

【Authors】: Bo Jiang ; Jin Tang ; Chris H. Q. Ding ; Bin Luo

【Abstract】: The graph matching problem that incorporates pair-wise constraints can be formulated as a Quadratic Assignment Problem (QAP). The optimal solution of QAP is discrete and combinatorial, which makes the QAP problem NP-hard. Thus, many algorithms have been proposed to find approximate solutions. In this paper, we propose a new algorithm, called Nonnegative Orthogonal Graph Matching (NOGM), for the QAP matching problem. NOGM is motivated by our new observation that the discrete mapping constraint of QAP can be equivalently encoded by a nonnegative orthogonal constraint which is much easier to implement computationally. Based on this observation, we develop an effective multiplicative update algorithm to solve NOGM and thus can find an effective approximate solution for the QAP problem. Compared with many traditional continuous methods, which usually obtain continuous solutions that must be further discretized, NOGM can obtain a sparse solution and thus incorporates the desirable discrete constraint naturally in its optimization. Promising experimental results demonstrate the benefits of the NOGM algorithm.

【Keywords】:
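The observation in the abstract above, that a discrete one-to-one mapping can be encoded by a nonnegative orthogonal constraint, rests on a standard fact for square matrices: a matrix X with X ≥ 0 and XᵀX = I is exactly a permutation matrix. The sketch below only checks that equivalence on small examples. It does not implement the paper's multiplicative update, and the function names are hypothetical.

```python
def is_nonnegative_orthogonal(X, tol=1e-9):
    """Check X >= 0 entrywise and X^T X = I for a square matrix (list of rows)."""
    n = len(X)
    if any(len(row) != n for row in X):
        return False
    if any(v < -tol for row in X for v in row):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(X[k][i] * X[k][j] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

def is_permutation_matrix(X, tol=1e-9):
    """Check that X is square, 0/1 valued, with every row and column summing to 1."""
    n = len(X)
    if any(len(row) != n for row in X):
        return False
    entries_ok = all(abs(v) <= tol or abs(v - 1.0) <= tol for row in X for v in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in X)
    cols_ok = all(abs(sum(X[k][j] for k in range(n)) - 1.0) <= tol for j in range(n))
    return entries_ok and rows_ok and cols_ok

permutation = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # a valid one-to-one mapping
doubly_stochastic = [[0.5, 0.5], [0.5, 0.5]]      # nonnegative, not orthogonal
rotation = [[0, -1], [1, 0]]                      # orthogonal, not nonnegative

for X in (permutation, doubly_stochastic, rotation):
    print(is_nonnegative_orthogonal(X), is_permutation_matrix(X))
```

On all three examples the two checks agree, which is the sense in which the continuous constraint pair (nonnegativity plus orthogonality) stands in for the discrete mapping constraint.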

567. Multi-Path Feedback Recurrent Neural Networks for Scene Parsing.

Paper Link】 【Pages】:4096-4102

【Authors】: Xiaojie Jin ; Yunpeng Chen ; Zequn Jie ; Jiashi Feng ; Shuicheng Yan

【Abstract】: In this paper, we consider the scene parsing problem and propose a novel Multi-Path Feedback recurrent neural network (MPF-RNN) for parsing scene images. MPF-RNN can enhance the capability of RNNs in modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse. Different from feedforward CNNs and RNNs with only a single feedback connection, MPF-RNN propagates the contextual features learned at the top layer through multiple weighted recurrent connections to learn bottom features. To better train MPF-RNN, we propose a new strategy that considers accumulative loss at multiple recurrent steps to improve the performance of MPF-RNN on parsing small objects. With these two novel components, MPF-RNN has achieved significant improvement over strong baselines (VGG16 and Res101) on five challenging scene parsing benchmarks, including traditional SiftFlow, Barcelona, CamVid, Stanford Background as well as the recently released large-scale ADE20K.

【Keywords】: scene parsing; semantic segmentation; RNN

568. Detection and Recognition of Text Embedded in Online Images via Neural Context Models.

Paper Link】 【Pages】:4103-4110

【Authors】: Chulmoo Kang ; Gunhee Kim ; Suk I. Yoo

【Abstract】: We address the problem of detecting and recognizing the text embedded in online images that are circulated over the Web. Our idea is to leverage context information for both text detection and recognition. For detection, we use local image context around the text region, based on the observation that text often appears sequentially in online images. For recognition, we exploit the metadata associated with the input online image, including tags, comments, and title, which are used as a topic prior for the word candidates in the image. To infuse these two sets of context information, we propose a contextual text spotting network (CTSN). We perform a comparative evaluation with five state-of-the-art text spotting methods on newly collected Instagram and Flickr datasets. We show that our approach, which benefits from context information, is more successful for text spotting in online images.

【Keywords】: Text Detection; Text Recognition; Neural Networks; Context Model

569. Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network.

Paper Link】 【Pages】:4111-4117

【Authors】: Suha Kwak ; Seunghoon Hong ; Bohyung Han

【Abstract】: We propose a weakly supervised semantic segmentation algorithm based on deep neural networks, which relies on image-level class labels only. The proposed algorithm alternates between generating segmentation annotations and learning a semantic segmentation network using the generated annotations. A key determinant of success in this framework is the capability to construct reliable initial annotations given image-level labels only. To this end, we propose Superpixel Pooling Network (SPN), which utilizes superpixel segmentation of input image as a pooling layout to reflect low-level image structure for learning and inferring semantic segmentation. The initial annotations generated by SPN are then used to learn another neural network that estimates pixel-wise semantic labels. The architecture of the segmentation network decouples semantic segmentation task into classification and segmentation so that the network learns class-agnostic shape prior from the noisy annotations. It turns out that both networks are critical to improve semantic segmentation accuracy. The proposed algorithm achieves outstanding performance in weakly supervised semantic segmentation task compared to existing techniques on the challenging PASCAL VOC 2012 segmentation benchmark.

【Keywords】: semantic segmentation; deep neural network; weakly supervised learning; superpixel

570. Robust MIL-Based Feature Template Learning for Object Tracking.

Paper Link】 【Pages】:4118-4125

【Authors】: Xiangyuan Lan ; Pong C. Yuen ; Rama Chellappa

【Abstract】: Because of appearance variations, training samples of the tracked targets collected by the online tracker are required for updating the tracking model. However, this often leads to tracking drift problem because of potentially corrupted samples: 1) contaminated/outlier samples resulting from large variations (e.g. occlusion, illumination), and 2) misaligned samples caused by tracking inaccuracy. Therefore, in order to reduce the tracking drift while maintaining the adaptability of a visual tracker, how to alleviate these two issues via an effective model learning (updating) strategy is a key problem to be solved. To address these issues, this paper proposes a novel and optimal model learning (updating) scheme which aims to simultaneously eliminate the negative effects from these two issues mentioned above in a unified robust feature template learning framework. Particularly, the proposed feature template learning framework is capable of: 1) adaptively learning uncontaminated feature templates by separating out contaminated samples, and 2) resolving label ambiguities caused by misaligned samples via a probabilistic multiple instance learning (MIL) model. Experiments on challenging video sequences show that the proposed tracker performs favourably against several state-of-the-art trackers.

【Keywords】: visual tracking

571. Learning Patch-Based Dynamic Graph for Visual Tracking.

Paper Link】 【Pages】:4126-4132

【Authors】: Chenglong Li ; Liang Lin ; Wangmeng Zuo ; Jin Tang

【Abstract】: Existing visual tracking methods usually localize the object with a bounding box, in which the foreground object trackers/detectors are often disturbed by the introduced background information. To handle this problem, we aim to learn a more robust object representation for visual tracking. In particular, the tracked object is represented with a graph structure (i.e., a set of non-overlapping image patches), in which the weight of each node (patch) indicates how likely it belongs to the foreground and edges are also weighted to indicate the appearance compatibility of two neighboring nodes. This graph is dynamically learnt (i.e., the nodes and edges receive weights) and applied in object tracking and model updating. We constrain the graph learning from two aspects: i) the global low-rank structure over all nodes and ii) the local sparseness of node neighbors. During the tracking process, our method performs the following steps at each frame. First, the graph is initialized by assigning either 1 or 0 to the weights of some image patches according to the predicted bounding box. Second, the graph is optimized through designing a new ALM (Augmented Lagrange Multiplier) based algorithm. Third, the object feature representation is updated by imposing the weights of patches on the extracted image features. The object location is finally predicted by adopting the Struck tracker. Extensive experiments show that our approach outperforms the state-of-the-art tracking methods on two standard benchmarks, i.e., OTB100 and NUS-PRO.

【Keywords】: video tracking; sparse and low-rank representation

572. Image Caption with Global-Local Attention.

Paper Link】 【Pages】:4133-4139

【Authors】: Linghui Li ; Sheng Tang ; Lixi Deng ; Yongdong Zhang ; Qi Tian

【Abstract】: Image captioning is becoming important in the field of artificial intelligence. Most existing methods based on the CNN-RNN framework suffer from the problems of object missing and misprediction due to the mere use of global representation at image-level. To address these problems, in this paper, we propose a global-local attention (GLA) method by integrating local representation at object-level with global representation at image-level through an attention mechanism. Thus, our proposed method can pay more attention to how to predict the salient objects more precisely with high recall while keeping context information at image-level concurrently. Therefore, our proposed GLA method can generate more relevant sentences, and achieve the state-of-the-art performance on the well-known Microsoft COCO caption dataset with several popular metrics.

【Keywords】: CNN, RNN, image description

573. Visual Object Tracking for Unmanned Aerial Vehicles: A Benchmark and New Motion Models.

Paper Link】 【Pages】:4140-4146

【Authors】: Siyi Li ; Dit-Yan Yeung

【Abstract】: Despite recent advances in the visual tracking community, most studies so far have focused on the observation model. As another important component in the tracking system, the motion model is much less well-explored especially for some extreme scenarios. In this paper, we consider one such scenario in which the camera is mounted on an unmanned aerial vehicle (UAV) or drone. We build a benchmark dataset of high diversity, consisting of 70 videos captured by drone cameras. To address the challenging issue of severe camera motion, we devise simple baselines to model the camera motion by geometric transformation based on background feature points. An extensive comparison of recent state-of-the-art trackers and their motion model variants on our drone tracking dataset validates both the necessity of the dataset and the effectiveness of the proposed methods. Our aim for this work is to lay the foundation for further research in the UAV tracking area.

【Keywords】: object tracking; camera motion; dataset

574. A Multiview-Based Parameter Free Framework for Group Detection.

Paper Link】 【Pages】:4147-4153

【Authors】: Xuelong Li ; Mulin Chen ; Feiping Nie ; Qi Wang

【Abstract】: Group detection is fundamentally important for analyzing crowd behaviors, and has attracted plenty of attention in artificial intelligence. However, existing works mostly have limitations due to the insufficient utilization of crowd properties and the arbitrary processing of individuals. In this paper, we propose the Multiview-based Parameter Free (MPF) approach to detect groups in crowd scenes. The main contributions made in this study are threefold: (1) a new structural context descriptor is designed to characterize the structural property of individuals in crowd motions; (2) a self-weighted multiview clustering method is proposed to cluster feature points by incorporating their motion and context similarities; (3) a novel framework is introduced for group detection, which is able to determine the group number automatically without any parameter or threshold to be tuned. Extensive experiments on various real world datasets demonstrate the effectiveness of the proposed approach, and show its superiority against state-of-the-art group detection techniques.

【Keywords】: Crowd Analysis; Multi-View Clustering; Context; Group Detection

575. Weakly-Supervised Deep Nonnegative Low-Rank Model for Social Image Tag Refinement and Assignment.

Paper Link】 【Pages】:4154-4160

【Authors】: Zechao Li ; Jinhui Tang

【Abstract】: It is well known that the user-provided tags of social images are imperfect, i.e., there exist noisy, irrelevant or incomplete tags. This heavily degrades the performance of many multimedia tasks. To alleviate this problem, we propose a Weakly-supervised Deep Nonnegative Low-rank model (WDNL) to improve the quality of tags by integrating the low-rank model with deep feature learning. A nonnegative low-rank model is introduced to uncover the intrinsic relationships between images and tags by simultaneously removing noisy or irrelevant tags and complementing missing tags. The deep architecture is leveraged to seamlessly connect the visual content and the semantic tag. That is, the proposed model can handle scalability well by assigning tags to new images. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed method compared with some state-of-the-art methods.

【Keywords】: tag refinement; image annotation; deep low-rank model

576. TextBoxes: A Fast Text Detector with a Single Deep Neural Network.

Paper Link】 【Pages】:4161-4167

【Authors】: Minghui Liao ; Baoguang Shi ; Xiang Bai ; Xinggang Wang ; Wenyu Liu

【Abstract】: This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.

【Keywords】: scene text; convolutional neural network; text localization; word spotting; end to end recognition

577. An Artificial Agent for Robust Image Registration.

Paper Link】 【Pages】:4168-4175

【Authors】: Rui Liao ; Shun Miao ; Pierre de Tournemire ; Sasa Grbic ; Ali Kamen ; Tommaso Mansi ; Dorin Comaniciu

【Abstract】: 3-D image registration, which involves aligning two or more images, is a critical step in a variety of medical applications from diagnosis to therapy. Image registration is commonly performed by optimizing an image matching metric as a cost function. However, this task is challenging due to the non-convex nature of the matching metric over the plausible registration parameter space and insufficient approaches for a robust optimization. As a result, current approaches are often customized to a specific problem and sensitive to image quality and artifacts. In this paper, we propose a completely different approach to image registration, inspired by how experts perform the task. We first cast the image registration problem as a "strategic learning" process, where the goal is to find the best sequence of motion actions (e.g. up, down, etc.) that yields image alignment. Within this approach, an artificial agent is learned, modeled using deep convolutional neural networks, with 3D raw image data as the input, and the next optimal action as the output. To cope with the dimensionality of the problem, we propose a greedy supervised approach for an end-to-end training, coupled with attention-driven hierarchical strategy. The resulting registration approach inherently encodes both a data-driven matching metric and an optimal registration strategy (policy). We demonstrate on two 3-D/3-D medical image registration examples with drastically different nature of challenges, that the artificial agent outperforms several state-of-the-art registration methods by a large margin in terms of both accuracy and robustness.

【Keywords】: Image Registration; Supervised Learning; Reinforcement Learning; Deep Networks

578. Attention Correctness in Neural Image Captioning.

Paper Link】 【Pages】:4176-4182

【Authors】: Chenxi Liu ; Junhua Mao ; Fei Sha ; Alan L. Yuille

【Abstract】: Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly-learned attention maps has only been assessed qualitatively by visualization of several examples. In this paper we focus on evaluating and improving the correctness of attention in neural image captioning models. Specifically, we propose a quantitative evaluation metric for the consistency between the generated attention maps and human annotations, using recently released datasets with alignment between regions in images and entities in captions. We then propose novel models with different levels of explicit supervision for learning attention maps during training. The supervision can be strong when alignment between regions and caption entities is available, or weak when only object segments and categories are provided. We show on the popular Flickr30k and COCO datasets that introducing supervision of attention maps during training solidly improves both attention correctness and caption quality, showing the promise of making machine perception more human-like.

【Keywords】:

579. Boosting Complementary Hash Tables for Fast Nearest Neighbor Search.

Paper Link】 【Pages】:4183-4189

【Authors】: Xianglong Liu ; Cheng Deng ; Yadong Mu ; Zhujin Li

【Abstract】: Hashing has been proven a promising technique for fast nearest neighbor search over massive databases. In many practical tasks it usually builds multiple hash tables for a desired level of recall performance. However, existing multi-table hashing methods suffer from the heavy table redundancy, without strong table complementarity and effective hash code learning. To address the problem, this paper proposes a multi-table learning method which pursues a specified number of complementary and informative hash tables from a perspective of ensemble learning. By regarding each hash table as a neighbor prediction model, the multi-table search procedure boils down to a linear assembly of predictions stemming from multiple tables. Therefore, a sequential updating and learning framework is naturally established in a boosting mechanism, theoretically guaranteeing the table complementarity and algorithmic convergence. Furthermore, each boosting round pursues the discriminative hash functions for each table by a discrete optimization in the binary code space. Extensive experiments carried out on two popular tasks including Euclidean and semantic nearest neighbor search demonstrate that the proposed boosted complementary hash-tables method enjoys the strong table complementarity and significantly outperforms the state-of-the-arts.

【Keywords】: Nearest Neighbor Search; Boosting; Multiple Indexing; Complementary Hash Tables; Locality Sensitive Hashing

580. Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition.

Paper Link】 【Pages】:4190-4196

【Authors】: Xiao Liu ; Jiang Wang ; Shilei Wen ; Errui Ding ; Yuanqing Lin

【Abstract】: A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, not utilizing any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with reinforcement learning algorithm. The attribute labeling requirement of the scheme is more amenable than the accurate part location annotation required by traditional part-based fine-grained recognition methods. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme on both fine-grained recognition and attribute recognition.

【Keywords】: attention, reinforcement learning, fine-grained

581. Video Captioning with Listwise Supervision.

Paper Link】 【Pages】:4197-4203

【Authors】: Yuan Liu ; Xue Li ; Zhongchao Shi

【Abstract】: Automatically describing video content with natural language is a fundamental challenge that has received increasing attention. However, existing techniques restrict the model learning on the pairs of each video and its own sentences, and thus fail to capture more holistically semantic relationships among all sentences. In this paper, we propose to model relative relationships of different video-sentence pairs and present a novel framework, named Long Short-Term Memory with Listwise Supervision (LSTM-LS), for video captioning. Given each video in training data, we obtain a ranking list of sentences w.r.t. a given sentence associated with the video using nearest-neighbor search. The ranking information is represented by a set of rank triplets that can be used to assess the quality of ranking list. The video captioning problem is then solved by learning LSTM model for sentence generation, through maximizing the ranking quality over all the sentences in the list. The experiments on MSVD dataset show that our proposed LSTM-LS produces better performance than the state of the art in generating natural sentences: 51.1% and 32.6% in terms of BLEU@4 and METEOR, respectively. Superior performances are also reported on the movie description M-VAD dataset.

【Keywords】: Video Captioning; Recurrent Neural Networks; Deep Convolutional Neural Networks

582. Closing the Loop for Edge Detection and Object Proposals.

Paper Link】 【Pages】:4204-4210

【Authors】: Yao Lu ; Linda G. Shapiro

【Abstract】: Edge grouping and object perception are unified procedures in perceptual organization. However the computer vision literature classifies them as independent tasks. In this paper, we argue that edge detection and object proposals should benefit one another. To achieve this, we go beyond bounding boxes and extract closed contours that represent potential objects within. A novel objectness metric is proposed to score and rank the proposal boxes by considering the sizes and edge intensities of the closed contours. To improve the edge detector given the top-down object proposals, we group local closed contours and construct global object hierarchies and segmentations. The edge detector is retrained and enhanced using these hierarchical segmentations as additional feature channels. In the experiments we show that by closing the loop for edge detection and object proposals, we observe improvements for both tasks. Unifying edges and object proposals is valid and useful.

【Keywords】:

583. Learning Discriminative Activated Simplices for Action Recognition.

Paper Link】 【Pages】:4211-4217

【Authors】: Chenxu Luo ; Chang Ma ; Chun-yu Wang ; Yizhou Wang

【Abstract】: We address the task of action recognition from a sequence of 3D human poses. This is a challenging task firstly because the poses of the same class could have large intra-class variations either caused by inaccurate 3D pose estimation or various performing styles. Also different actions, e.g., walking vs. jogging, may share similar poses which makes the representation not discriminative to differentiate the actions. To solve the problems, we propose a novel representation for 3D poses by a mixture of Discriminative Activated Simplices (DAS). Each DAS consists of a few bases and represents pose data by their convex combinations. The discriminative power of DAS is firstly realized by learning discriminative bases across classes with a block diagonal constraint enforced on the basis coefficient matrix. Secondly, the DAS provides tight characterization of the pose manifolds thus reducing the chance of generating overlapped DAS between similar classes. We justify the power of the model on benchmark datasets and witness consistent performance improvements.

【Keywords】: activated simplices; discriminative dictionary learning

584. Non-Rigid Point Set Registration with Robust Transformation Estimation under Manifold Regularization.

Paper Link】 【Pages】:4218-4224

【Authors】: Jiayi Ma ; Ji Zhao ; Junjun Jiang ; Huabing Zhou

【Abstract】: In this paper, we propose a robust transformation estimation method based on manifold regularization for non-rigid point set registration. The method iteratively recovers the point correspondence and estimates the spatial transformation between two point sets. The correspondence is established based on existing local feature descriptors which typically results in a number of outliers. To achieve an accurate estimate of the transformation from such putative point correspondence, we formulate the registration problem by a mixture model with a set of latent variables introduced to identify outliers, and a prior involving manifold regularization is imposed on the transformation to capture the underlying intrinsic geometry of the input data. The non-rigid transformation is specified in a reproducing kernel Hilbert space and a sparse approximation is adopted to achieve a fast implementation. Extensive experiments on both 2D and 3D data demonstrate that our method can yield superior results compared to other state-of-the-arts, especially in case of badly degraded data.

【Keywords】: point set registration; manifold regularization; non-rigid

585. Online Multi-Target Tracking Using Recurrent Neural Networks.

Paper Link】 【Pages】:4225-4232

【Authors】: Anton Milan ; Seyed Hamid Rezatofighi ; Anthony R. Dick ; Ian D. Reid ; Konrad Schindler

【Abstract】: We present a novel approach to online multi-target tracking based on recurrent neural networks (RNNs). Tracking multiple objects in real-world scenes involves many challenges, including a) an a-priori unknown and time-varying number of targets, b) a continuous state estimation of all present targets, and c) a discrete combinatorial problem of data association. Most previous methods involve complex models that require tedious tuning of parameters. Here, we propose for the first time, an end-to-end learning approach for online multi-target tracking. Existing deep learning methods are not designed for the above challenges and cannot be trivially applied to the task. Our solution addresses all of the above points in a principled way. Experiments on both synthetic and real data show promising results obtained at ~300 Hz on a standard CPU, and pave the way towards future research in this direction.

【Keywords】: Multi-target tracking; recurrent neural networks; long short-term memory; data association

586. Text-Guided Attention Model for Image Captioning.

Paper Link】 【Pages】:4233-4239

【Authors】: Jonghwan Mun ; Minsu Cho ; Bohyung Han

【Abstract】: Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves from training data associated captions with each image, and use them to learn attention on visual features. Our attention model enables to describe a detailed state of scenes by distinguishing small or confusable objects effectively. We validate our model on MS-COCO Captioning benchmark and achieve the state-of-the-art performance in standard metrics.

【Keywords】: Image Captioning; Attention Model

587. Fully Convolutional Neural Networks with Full-Scale-Features for Semantic Segmentation.

Paper Link】 【Pages】:4240-4246

【Authors】: Tianxiang Pan ; Bin Wang ; Guiguang Ding ; Jun-Hai Yong

【Abstract】: In this work, we propose a novel method to involve full-scale-features into the fully convolutional neural networks (FCNs) for Semantic Segmentation. Current works on FCN have brought great advances in the task of semantic segmentation, but the receptive field, which represents region areas of input volume connected to any output neuron, limits the available information of output neuron's prediction accuracy. We investigate how to involve the full-scale or full-image features into FCNs to enrich the receptive field. Specifically, the full-scale feature network (FFN) extends the full-connected network and makes an end-to-end unified training structure. It has two appealing properties. First, the introduction of full-scale-features is beneficial for prediction. We build a unified extracting network and explore several fusion functions for concatenating features. Amounts of experiments have been carried out to prove that full-scale-features makes fair accuracy raising. Second, FFN is applicable to many variants of FCN which could be regarded as a general strategy to improve the segmentation accuracy. Our proposed method is evaluated on PASCAL VOC 2012, and achieves a state-of-art result.

【Keywords】: Fully convolution network; receptive field; full-scale-features

588. Learning Latent Subevents in Activity Videos Using Temporal Attention Filters.

Paper Link】 【Pages】:4247-4254

【Authors】: A. J. Piergiovanni ; Chenyou Fan ; Michael S. Ryoo

【Abstract】: In this paper, we newly introduce the concept of temporal attention filters, and describe how they can be used for human activity recognition from videos. Many high-level activities are often composed of multiple temporal parts (e.g., sub-events) with different duration/speed, and our objective is to make the model explicitly learn such temporal structure using multiple attention filters and benefit from them. Our temporal filters are designed to be fully differentiable, allowing end-to-end training of the temporal filters together with the underlying frame-based or segment-based convolutional neural network architectures. This paper presents an approach of learning a set of optimal static temporal attention filters to be shared across different videos, and extends this approach to dynamically adjust attention filters per testing video using recurrent long short-term memory networks (LSTMs). This allows our temporal attention filters to learn latent sub-events specific to each activity. We experimentally confirm that the proposed concept of temporal attention filters benefits the activity recognition, and we visualize the learned latent sub-events.

【Keywords】:

589. Privacy-Preserving Human Activity Recognition from Extreme Low Resolution.

Paper Link】 【Pages】:4255-4262

【Authors】: Michael S. Ryoo ; Brandon Rothrock ; Charles Fleming ; Hyun Jong Yang

【Abstract】: Privacy protection from surreptitious video recordings is an important societal challenge. We desire a computer vision system (e.g., a robot) that can recognize human activities and assist our daily life, yet ensure that it is not recording video that may invade our privacy. This paper presents a fundamental approach to address such contradicting objectives: human activity recognition while only using extreme low-resolution (e.g., 16x12) anonymized videos. We introduce the paradigm of inverse super resolution (ISR), the concept of learning the optimal set of image transformations to generate multiple low-resolution (LR) training videos from a single video. Our ISR learns different types of sub-pixel transformations optimized for the activity classification, allowing the classifier to best take advantage of existing high-resolution videos (e.g., YouTube videos) by creating multiple LR training videos tailored for the problem. We experimentally confirm that the paradigm of inverse super resolution is able to benefit activity recognition from extreme low-resolution videos.

【Keywords】:

590. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data.

Paper Link】 【Pages】:4263-4270

【Authors】: Sijie Song ; Cuiling Lan ; Junliang Xing ; Wenjun Zeng ; Jiaying Liu

【Abstract】: Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In this work, we propose an end-to-end spatial and temporal attention model for human action recognition from skeleton data. We build our model on top of the Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), which learns to selectively focus on discriminative joints of skeleton within each frame of the inputs and pays different levels of attention to the outputs of different frames. Furthermore, to ensure effective training of the network, we propose a regularized cross-entropy loss to drive the model learning process and develop a joint training strategy accordingly. Experimental results demonstrate the effectiveness of the proposed model, both on the small human action recognition dataset of SBU and the currently largest NTU dataset.

【Keywords】: action recognition; LSTM; attention model

591. Depth CNNs for RGB-D Scene Recognition: Learning from Scratch Better than Transferring from RGB-CNNs.

Paper Link】 【Pages】:4271-4277

【Authors】: Xinhang Song ; Luis Herranz ; Shuqiang Jiang

【Abstract】: Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so often leverages RGB large datasets, by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learn modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further learning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth only and combined RGB-D data.

【Keywords】: RGB-D scene recognition; weakly supervised; fine tune; CNN

592. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.

Paper Link】 【Pages】:4278-4284

【Authors】: Christian Szegedy ; Sergey Ioffe ; Vincent Vanhoucke ; Alexander A. Alemi

【Abstract】: Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question: Are there any benefits to combining Inception architectures with residual connections? Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4 networks, we achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge.

【Keywords】: deep learning; convolutional neural network; vision; ILSVRC; image classification; Inception

593. Image Cosegmentation via Saliency-Guided Constrained Clustering with Cosine Similarity.

Paper Link】 【Pages】:4285-4291

【Authors】: Zhiqiang Tao ; Hongfu Liu ; Huazhu Fu ; Yun Fu

【Abstract】: Cosegmentation jointly segments the common objects from multiple images. In this paper, a novel clustering algorithm, called Saliency-Guided Constrained Clustering approach with Cosine similarity (SGC3), is proposed for the image cosegmentation task, where the common foregrounds are extracted via a one-step clustering process. In our method, the unsupervised saliency prior is utilized as a partition-level side information to guide the clustering process. To guarantee the robustness to noise and outlier in the given prior, the similarities of instance-level and partition-level are jointly computed for cosegmentation. Specifically, we employ cosine distance to calculate the feature similarity between data point and its cluster centroid, and introduce a cosine utility function to measure the similarity between clustering result and the side information. These two parts are both based on the cosine similarity, which is able to capture the intrinsic structure of data, especially for the non-spherical cluster structure. Finally, a K-means-like optimization is designed to solve our objective function in an efficient way. Experimental results on two widely-used datasets demonstrate our approach achieves competitive performance over the state-of-the-art cosegmentation methods.

【Keywords】: Cosegmentation; Constrained Clustering; Saliency

594. Quantifying and Detecting Collective Motion by Manifold Learning.

Paper Link】 【Pages】:4292-4298

【Authors】: Qi Wang ; Mulin Chen ; Xuelong Li

【Abstract】: The analysis of collective motion has attracted many researchers in artificial intelligence. Though plenty of works have been done on this topic, the achieved performance is still unsatisfying due to the complex nature of collective motions. By investigating the similarity of individuals, this paper proposes a novel framework for both quantifying and detecting collective motions. Our main contributions are threefold: (1) the time-varying dynamics of individuals are deeply investigated to better characterize the individual motion; (2) a structure-based collectiveness measurement is designed to precisely quantify both individual-level and scene-level properties of collective motions; (3) a multi-stage clustering strategy is presented to discover a more comprehensive understanding of the crowd scenes, containing both local and global collective motions. Extensive experimental results on real-world data sets show that our method is capable of handling crowd scenes with complicated structures and various dynamics, and demonstrate its superior performance against state-of-the-art competitors.

【Keywords】: Crowd Analysis; Manifold Learning; Group Detection

595. Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing.

Paper Link】 【Pages】:4299-4305

【Authors】: Yuanlu Xu ; Xiaobai Liu ; Lei Qin ; Song-Chun Zhu

【Abstract】: In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) to integrate semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping field of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. We leverage rich semantic attributes of human, e.g., facing directions, postures and actions, to enhance cross-view tracklet associations, besides frequently used appearance and geometry features in the literature. In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from the others. The inference is solved by iteratively grouping tracklets with cluster sampling and estimating people semantic attributes by dynamic programming. In experiments, we validate our method on one public dataset and create another new dataset that records people's daily life in public, e.g., food court, office reception and plaza, each of which includes 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.

【Keywords】: multi-view tracking; human behavior analysis; joint inference; cluster sampling

596. Unsupervised Learning of Multi-Level Descriptors for Person Re-Identification.

Paper Link】 【Pages】:4306-4312

【Authors】: Yang Yang ; Longyin Wen ; Siwei Lyu ; Stan Z. Li

【Abstract】: In this paper, we propose a novel coding method named weighted linear coding (WLC) to learn multi-level (e.g., pixel-level, patch-level and image-level) descriptors from raw pixel data in an unsupervised manner. It guarantees the property of saliency with a similarity constraint. The resulting multi-level descriptors have a good balance between the robustness and distinctiveness. Based on WLC, all data from the same region can be jointly encoded. Consequently, when we extract the holistic image features, it is able to preserve the spatial consistency. Furthermore, we apply PCA to these features and compact person representations are then achieved. During the stage of matching persons, we exploit the complementary information resided in multi-level descriptors via a score-level fusion strategy. Experiments on the challenging person re-identification datasets - VIPeR and CUHK 01, demonstrate the effectiveness of our method.

【Keywords】: unsupervised learning; multi-level descriptors; person re-identification

597. Leveraging Saccades to Learn Smooth Pursuit: A Self-Organizing Motion Tracking Model Using Restricted Boltzmann Machines.

Paper Link】 【Pages】:4313-4319

【Authors】: Arjun Yogeswaran ; Pierre Payeur

【Abstract】: In this paper, we propose a biologically-plausible model to explain the emergence of motion tracking behaviour in early development using unsupervised learning. The model's training is biased by a concept called retinal constancy, which measures how similar visual contents are between successive frames. This biasing is similar to a reward in reinforcement learning, but is less explicit, as it modulates the model's learning rate instead of being a learning signal itself. The model is a two-layer deep network. The first layer learns to encode visual motion, and the second layer learns to relate that motion to gaze movements, which it perceives and creates through bi-directional nodes. By randomly generating gaze movements to traverse the local visual space, desirable correlations are developed between visual motion and the appropriate gaze to nullify that motion such that maximal retinal constancy is achieved. Biologically, this is similar to using saccades to look around and learning from moments where a target and the saccade move together such that the image stays the same on the retina, and developing smooth pursuit behaviour to perform this action in the future. Restricted Boltzmann machines are used to implement this model because they can form a deep belief network, perform online learning, and act generatively. These properties all have biological equivalents and coincide with the biological plausibility of using saccades as leverage to learn smooth pursuit. This method is unique because it uses general machine learning algorithms, and their inherent generative properties, to learn from real-world data. It also implements a biological theory, uses motion instead of recognition via local searches, without temporal filtering, and learns in a fully unsupervised manner. Its tracking performance after being trained on real-world images with simulated motion is compared to its tracking performance after being trained on natural video. 
Results show that this model is able to successfully follow targets in natural video, despite partial occlusions, scale changes, and nonlinear motion.

【Keywords】: computer vision; active learning; smooth pursuit; motion tracking; self-organization; unsupervised learning; machine learning; biologically-inspired; reinforcement learning

598. Efficient Object Instance Search Using Fuzzy Objects Matching.

Paper Link】 【Pages】:4320-4326

【Authors】: Tan Yu ; Yuwei Wu ; Sreyasee Das Bhattacharjee ; Junsong Yuan

【Abstract】: Recently, global features aggregated from local convolutional features of the convolutional neural network have been shown to be much more effective in comparison with hand-crafted features for image retrieval. However, the global feature might not effectively capture the relevance between the query object and reference images in the object instance search task, especially when the query object is relatively small and there exist multiple types of objects in reference images. Moreover, the object instance search requires localizing the object in the reference image, which may not be achieved through global representations. In this paper, we propose a Fuzzy Objects Matching (FOM) framework to effectively and efficiently capture the relevance between the query object and reference images in the dataset. In the proposed FOM scheme, object proposals are utilized to detect the potential regions of the query object in reference images. To achieve high search efficiency, we factorize the feature matrix of all the object proposals from one reference image into the product of a set of fuzzy objects and sparse codes. In addition, we refine the feature of the generated fuzzy objects according to its neighborhood in the feature space to generate more robust representation. The experimental results demonstrate that the proposed FOM framework significantly outperforms the state-of-the-art methods in precision with less memory and computational cost on three public datasets.

【Keywords】:

599. Face Hallucination with Tiny Unaligned Images by Transformative Discriminative Neural Networks.

Paper Link】 【Pages】:4327-4333

【Authors】: Xin Yu ; Fatih Porikli

【Abstract】: Conventional face hallucination methods rely heavily on accurate alignment of low-resolution (LR) faces before upsampling them. Misalignment often leads to deficient results and unnatural artifacts for large upscaling factors. However, due to the diverse range of poses and different facial expressions, aligning an LR input image, in particular when it is tiny, is extremely difficult. To overcome this challenge, here we present an end-to-end transformative discriminative neural network (TDN) devised for super-resolving unaligned and very small face images with an extreme upscaling factor of 8. Our method employs an upsampling network where we embed spatial transformation layers to allow local receptive fields to line up with similar spatial supports. Furthermore, we incorporate a class-specific loss in our objective through a successive discriminative network to improve the alignment and upsampling performance with semantic information. Extensive experiments on large face datasets show that the proposed method significantly outperforms the state-of-the-art.

【Keywords】: face hallucination; transformative discriminative network; super-resolution

600. Leveraging Video Descriptions to Learn Video Question Answering.

Paper Link】 【Pages】:4334-4340

【Authors】: Kuo-Hao Zeng ; Tseng-Hung Chen ; Ching-Yao Chuang ; Yuan-Hong Liao ; Juan Carlos Niebles ; Min Sun

【Abstract】: We propose a scalable approach to learn video-based question answering (QA): to answer a free-form natural language question about the contents of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use these candidate QA pairs to train a number of video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). In order to handle non-perfect candidate QA pairs, we propose a self-paced learning procedure to iteratively identify them and mitigate their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines.

【Keywords】: Question Answering, Language and Vision, Deep Learning/Neural Networks

601. Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image.

Paper Link】 【Pages】:4341-4348

【Authors】: Xiaoke Zhu ; Xiao-Yuan Jing ; Fei Wu ; Yunhong Wang ; Wangmeng Zuo ; Wei-Shi Zheng

【Abstract】: Person re-identification (re-id) plays an important role in video surveillance and forensics applications. In many cases, person re-id needs to be conducted between an image and a video clip, e.g., re-identifying a suspect from large quantities of pedestrian videos given a single image of him. We call re-id in this scenario image to video person re-id (IVPR). In practice, image and video are usually represented with different features, and there usually exist large variations between frames within each video. These factors make matching between image and video a very challenging task. In this paper, we propose a joint feature projection matrix and heterogeneous dictionary pair learning (PHDL) approach for IVPR. Specifically, PHDL jointly learns an intra-video projection matrix and a pair of heterogeneous image and video dictionaries. With the learned projection matrix, the influence of variations within each video on the matching can be reduced. With the learned dictionary pair, the heterogeneous image and video features can be transformed into coding coefficients with the same dimension, such that the matching can be conducted using coding coefficients. Furthermore, to ensure that the obtained coding coefficients have favorable discriminability, PHDL designs a point-to-set coefficient discriminant term. Experiments on the public iLIDS-VID and PRID 2011 datasets demonstrate the effectiveness of the proposed approach.

【Keywords】: Image to video person re-identification; Feature projection matrix and heterogeneous dictionary pair learning (PHDL); Point-to-set coefficient discriminant term

Special Track on Cognitive Systems 16

602. Natural Language Acquisition and Grounding for Embodied Robotic Systems.

Paper Link】 【Pages】:4349-4356

【Authors】: Muhannad Al-Omari ; Paul Duckworth ; David C. Hogg ; Anthony G. Cohn

【Abstract】: We present a cognitively plausible novel framework capable of learning the grounding in visual semantics and the grammar of natural language commands given to a robot in a table top environment. The input to the system consists of video clips of a manually controlled robot arm, paired with natural language commands describing the action. No prior knowledge is assumed about the meaning of words, or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). The learning process automatically clusters the continuous perceptual spaces into concepts corresponding to linguistic input. A novel relational graph representation is used to build connections between language and vision. As well as the grounding of language to perception, the system also induces a set of probabilistic grammar rules. The knowledge learned is used to parse new commands involving previously unseen objects.

【Keywords】: cognitive robotics; language and vision; bootstrap problem

603. Analogical Chaining with Natural Language Instruction for Commonsense Reasoning.

Paper Link】 【Pages】:4357-4363

【Authors】: Joseph A. Blass ; Kenneth D. Forbus

【Abstract】: Understanding commonsense reasoning is one of the core challenges of AI. We are exploring an approach inspired by cognitive science, called analogical chaining, to create cognitive systems that can perform commonsense reasoning. Just as rules are chained in deductive systems, multiple analogies build upon each other’s inferences in analogical chaining. The cases used in analogical chaining – called common sense units – are small, to provide inferential focus and broader transfer. Importantly, such common sense units can be learned via natural language instruction, thereby increasing the ease of extending such systems. This paper describes analogical chaining, natural language instruction via microstories, and some subtleties that arise in controlling reasoning. The utility of this technique is demonstrated by performance of an implemented system on problems from the Choice of Plausible Alternatives test of commonsense causal reasoning.

【Keywords】: Analogical Reasoning; Commonsense Reasoning; Natural Language Understanding

604. Inductive Reasoning about Ontologies Using Conceptual Spaces.

Paper Link】 【Pages】:4364-4370

【Authors】: Zied Bouraoui ; Shoaib Jameel ; Steven Schockaert

【Abstract】: Structured knowledge about concepts plays an increasingly important role in areas such as information retrieval. The available ontologies and knowledge graphs that encode such conceptual knowledge, however, are inevitably incomplete. This observation has led to a number of methods that aim to automatically complete existing knowledge bases. Unfortunately, most existing approaches rely on black box models, e.g. formulated as global optimization problems, which makes it difficult to support the underlying reasoning process with intuitive explanations. In this paper, we propose a new method for knowledge base completion, which uses interpretable conceptual space representations and an explicit model for inductive inference that is closer to human forms of commonsense reasoning. Moreover, by separating the task of representation learning from inductive reasoning, our method is easier to apply in a wider variety of contexts. Finally, unlike optimization based approaches, our method can naturally be applied in settings where various logical constraints between the extensions of concepts need to be taken into account.

【Keywords】: Inductive reasoning; Conceptual spaces; Ontologies; Description logics; Entity Embeddings

605. Integrating the Cognitive with the Physical: Musical Path Planning for an Improvising Robot.

Paper Link】 【Pages】:4371-4377

【Authors】: Mason Bretan ; Gil Weinberg

【Abstract】: Embodied cognition is a theory stating that the processes and functions comprising the human mind are influenced by a person's physical body. Embodied musical cognition is a theory of the musical mind stating that the person's body largely influences his or her musical experiences and actions (such as performing, learning, or listening to music). In this work, a proof of concept demonstrating the utility of an embodied musical cognition for robotic musicianship is described. Though alternative theories attempting to explain human musical cognition exist (such as cognitivism and connectionism), this work contends that the integration of physical constraints and musical knowledge is vital for a robot in order to optimize note generating decisions based on limitations of sound generating motion and enable more engaging performance through increased coherence between the generated music and sound accompanying motion. Moreover, such a system allows for efficient and autonomous exploration of the relationship between music and physicality and the resulting music that is contingent on such a connection.

【Keywords】: embodied cognition; music; robotics; path planning

606. Imagined Visual Representations as Multimodal Embeddings.

Paper Link】 【Pages】:4378-4384

【Authors】: Guillem Collell ; Ted Zhang ; Marie-Francine Moens

【Abstract】: Language and vision provide complementary information. Integrating both modalities in a single multimodal representation is an unsolved problem with wide-reaching applications to both natural language processing and computer vision. In this paper, we present a simple and effective method that learns a language-to-vision mapping and uses its output visual predictions to build multimodal representations. In this sense, our method provides a cognitively plausible way of building representations, consistent with the inherently re-constructive and associative nature of human memory. Using seven benchmark concept similarity tests we show that the mapped (or imagined) vectors not only help to fuse multimodal information, but also outperform strong unimodal baselines and state-of-the-art multimodal methods, thus exhibiting more human-like judgments. Ultimately, the present work sheds light on fundamental questions of natural language understanding concerning the fusion of vision and language such as the plausibility of more associative and re-constructive approaches.

【Keywords】: multimodal representations, representation learning, semantic similarity, semantic relatedness, visual similarity

607. Goal Operations for Cognitive Systems.

Paper Link】 【Pages】:4385-4391

【Authors】: Michael T. Cox ; Dustin Dannenhauer ; Sravya Kondrakunta

【Abstract】: Cognitive agents operating in complex and dynamic domains benefit from significant goal management. Operations on goals include formulation, selection, change, monitoring and delegation in addition to goal achievement. Here we model these operations as transformations on goals. An agent may observe events that affect the agent’s ability to achieve its goals. Hence goal transformations allow unachievable goals to be converted into similar achievable goals. This paper examines an implementation of goal change within a cognitive architecture. We introduce goal transformation at the metacognitive level as well as goal transformation in an automated planner and discuss the costs and benefits of each approach. We evaluate goal change in the MIDCA architecture using a resource-restricted planning domain, demonstrating a performance benefit due to goal operations.

【Keywords】: goal reasoning; goal change; hierarchical task network planning; cognitive architecture

608. Combining Logical Abduction and Statistical Induction: Discovering Written Primitives with Human Knowledge.

Paper Link】 【Pages】:4392-4398

【Authors】: Wang-Zhou Dai ; Zhi-Hua Zhou

【Abstract】: In many real tasks there is human knowledge expressed in logic formulae as well as data samples described by raw features (e.g., pixels, strings). It is popular to apply SRL or PILP techniques to exploit human knowledge through learning of symbolic data, or statistical learning techniques to learn from the raw data samples; however, it is often desired to directly exploit these logic formulae on raw data processing, like human beings utilizing knowledge to guide perception. In this paper, we propose an approach, LASIN, which combines Logical Abduction and Statistical Induction. The LASIN approach generates candidate hypotheses based on the abduction of first-order formulae, and then, the hypotheses are exploited as constraints for statistical induction. We apply the LASIN approach to the learning of representation of written primitives, where a primitive is a basic component in human writing. Our results show that the discovered primitives are reasonable for human perception, and these primitives, if used in learning tasks such as classification and domain adaptation, lead to better performances than simply applying feature learning based on raw data only.

【Keywords】: logic abduction; statistical induction; knowledge; human learning

609. Reactive Versus Anticipative Decision Making in a Novel Gift-Giving Game.

Paper Link】 【Pages】:4399-4405

【Authors】: Elias Fernández Domingos ; Juan-Carlos Burguillo ; Tom Lenaerts

【Abstract】: Evolutionary game theory focuses on the fitness differences between simple discrete or probabilistic strategies to explain the evolution of particular decision-making behavior within strategic situations. Although this approach has provided substantial insights into the presence of fairness or generosity in gift-giving games, it does not fully resolve the question of which cognitive mechanisms are required to produce the choices observed in experiments. One such mechanism that humans have acquired is the capacity to anticipate. Prior work showed that forward-looking behavior, using a recurrent neural network to model the cognitive mechanism, is essential to produce the actions of human participants in behavioral experiments. In this paper, we evaluate whether this conclusion extends also to gift-giving games, more concretely, to a game that combines the dictator game with a partner selection process. The recurrent neural network model used here for dictators allows them to reason about a best response to past actions of the receivers (reactive model) or to decide which action will lead to a more successful outcome in the future (anticipatory model). We show for both models the decision dynamics while training, as well as the average behavior. We find that the anticipatory model is the only one capable of accounting for changes in the context of the game, a behavior also observed in experiments, expanding previous conclusions to this more sophisticated game.

【Keywords】: Anticipation; Dictator game; partner selection; reputation; recurrent neural networks

610. Towards Continuous Scientific Data Analysis and Hypothesis Evolution.

Paper Link】 【Pages】:4406-4414

【Authors】: Yolanda Gil ; Daniel Garijo ; Varun Ratnakar ; Rajiv Mayani ; Ravali Adusumilli ; Hunter Boyce ; Arunima Srivastava ; Parag Mallick

【Abstract】: Scientific data is continuously generated throughout the world. However, analyses of these data are typically performed exactly once and on a small fragment of recently generated data. Ideally, data analysis would be a continuous process that uses all the data available at the time, and would be automatically re-run and updated when new data appears. We present a framework for automated discovery from data repositories that tests user-provided hypotheses using expert-grade data analysis strategies, and reassesses hypotheses when more data becomes available. Novel contributions of this approach include a framework to trigger new analyses appropriate for the available data through lines of inquiry that support progressive hypothesis evolution, and a representation of hypothesis revisions with provenance records that can be used to inspect the results. We implemented our approach in the DISK framework, and evaluated it using two scenarios from cancer multi-omics: 1) data for new patients becomes available over time, 2) new types of data for the same patients are released. We show that in all scenarios DISK updates the confidence on the original hypotheses as it automatically analyzes new data.

【Keywords】: automated discovery; hypothesis testing; scientific workflows; hypothesis evolution; provenance

611. Flexible Model Induction through Heuristic Process Discovery.

Paper Link】 【Pages】:4415-4421

【Authors】: Pat Langley ; Adam Arvay

【Abstract】: Inductive process modeling involves the construction of explanatory accounts for multivariate time series. As typically specified, background knowledge is available in the form of generic processes that serve as the building blocks for candidate model structures. In this paper, we present a more flexible approach that, when available processes are insufficient to construct an acceptable model, automatically produces new generic processes that let it complete the task. We describe FPM, a system that implements this idea by composing knowledge about algebraic rate expressions and about conceptual processes like predation and remineralization in ecology. We demonstrate empirically FPM's ability to construct new generic processes when necessary and to transfer them later to new modeling tasks. We also compare its failure-driven approach with a naive scheme that generates all possible processes at the outset. We conclude by discussing prior work on equation discovery and model construction, along with plans for additional research.

【Keywords】: Scientific discovery; Process models; Explanation; Induction

612. When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?

Paper Link】 【Pages】:4422-4428

【Authors】: Smitha Milli ; Falk Lieder ; Thomas L. Griffiths

【Abstract】: While optimal metareasoning is notoriously intractable, humans are nonetheless able to adaptively allocate their computational resources. A possible approximation that humans may use to do this is to only metareason over a finite set of cognitive systems that perform variable amounts of computation. The highly influential "dual-process" accounts of human cognition, which postulate the coexistence of a slow accurate system with a fast error-prone system, can be seen as a special case of this approximation. This raises two questions: how many cognitive systems should a bounded optimal agent be equipped with and what characteristics should those systems have? We investigate these questions in two settings: a one-shot decision between two alternatives, and planning under uncertainty in a Markov decision process. We find that the optimal number of systems depends on the variability of the environment and the costliness of metareasoning. Consistent with dual-process theories, we also find that when having two systems is optimal, then the first system is fast but error-prone and the second system is slow but accurate.

【Keywords】: bounded-optimality; metareasoning; cognitive systems

613. Scanpath Complexity: Modeling Reading Effort Using Gaze Information.

Paper Link】 【Pages】:4429-4436

【Authors】: Abhijit Mishra ; Diptesh Kanojia ; Seema Nagar ; Kuntal Dey ; Pushpak Bhattacharyya

【Abstract】: Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text comprehension environments. We propose a quantification of reading effort by measuring the complexity of eye-movement patterns of readers. We call the measure Scanpath Complexity. Scanpath complexity is modeled as a function of various properties of gaze fixations and saccades, the basic parameters of eye movement behavior. We demonstrate the effectiveness of our scanpath complexity measure by showing that its correlation with different measures of lexical and syntactic complexity as well as standard readability metrics is better than popular baseline measures based on fixation alone.

【Keywords】: Cognitive Load and Eye-tracking; Gaze and Cognitive Load; Reading Effort; Scanpath Complexity

614. Identifying Useful Inference Paths in Large Commonsense Knowledge Bases by Retrograde Analysis.

Paper Link】 【Pages】:4437-4443

【Authors】: Abhishek Sharma ; Keith M. Goolsbey

【Abstract】: Commonsense reasoning at scale is a critical problem for modern cognitive systems. Large theories have millions of axioms, but only a handful are relevant for answering a given goal query. Irrelevant axioms increase the search space, overwhelming unoptimized inference engines in large theories. Therefore, methods that help in identifying useful inference paths are an essential part of large cognitive systems. In this paper, we use retrograde analysis to build a database of proof paths that lead to at least one successful proof. This database helps the inference engine identify more productive parts of the search space. A heuristic based on this approach is used to order nodes during a search. We study the efficacy of this approach on hundreds of queries from the Cyc KB. Empirical results show that this approach leads to significant reduction in inference time.

【Keywords】: commonsense reasoning; search control heuristics; efficient deductive reasoning

615. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge.

Paper Link】 【Pages】:4444-4451

【Authors】: Robert Speer ; Joshua Chin ; Catherine Havasi

【Abstract】: Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected from many sources that include expert-created resources, crowd-sourcing, and games with a purpose. It is designed to represent the general knowledge involved in understanding language, improving natural language applications by allowing the application to better understand the meanings behind the words people use. When ConceptNet is combined with word embeddings acquired from distributional semantics (such as word2vec), it provides applications with understanding that they would not acquire from distributional semantics alone, nor from narrower resources such as WordNet or DBPedia. We demonstrate this with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.

【Keywords】: ConceptNet; knowledge graph; word embeddings

616. Towards a Brain Inspired Model of Self-Awareness for Sociable Agents.

Paper Link】 【Pages】:4452-4458

【Authors】: Budhitama Subagdja ; Ah-Hwee Tan

【Abstract】: Self-awareness is a crucial feature for a sociable agent or robot to better interact with humans. In a futuristic scenario, a conversational agent may occasionally be asked for its own opinion or suggestion based on its own thought, feelings, or experiences as if it is an individual with identity, personality, and social life. In moving towards that direction, in this paper, a brain inspired model of self-awareness is presented that allows an agent to learn to attend to different aspects of self as an individual with identity, physical embodiment, mental states, experiences, and reflections on how others may think about oneself. The model is built and realized on a NAO humanoid robotic platform to investigate the role of this capacity of self-awareness on the robot's learning and interactivity.

【Keywords】: Self-Awareness; Fusion ART; Neural Networks; Conversational Agent

617. Semantic Proto-Role Labeling.

Paper Link】 【Pages】:4459-4466

【Authors】: Adam R. Teichert ; Adam Poliak ; Benjamin Van Durme ; Matthew R. Gormley

【Abstract】: The semantic function tags of Bonial, Stowe, and Palmer (2013) and the ordinal, multi-property annotations of Reisinger et al. (2015) draw inspiration from Dowty's semantic proto-role theory. We approach proto-role labeling as a multi-label classification problem and establish strong results for the task by adapting a successful model of traditional semantic role labeling. We achieve a proto-role micro-averaged F1 of 81.7 using gold syntax and explore joint and conditional models of proto-roles and categorical roles. In comparing the effect of Bonial, Stowe, and Palmer's tags to PropBank ArgN-style role labels, we are surprised that neither annotation greatly improves proto-role prediction; however, we observe that ArgN models benefit much from observed syntax and from observed or modeled proto-roles while our models of the semantic function tags do not.

【Keywords】: Semantic Role Labeling; Proto-Roles; Graphical Models

Special Track on Computational Sustainability 14

618. Matrix Factorisation for Scalable Energy Breakdown.

Paper Link】 【Pages】:4467-4473

【Authors】: Nipun Batra ; Hongning Wang ; Amarjeet Singh ; Kamin Whitehouse

【Abstract】: Homes constitute more than one-third of the total energy consumption. Producing an energy breakdown for a home has been shown to reduce household energy consumption by up to 15%, among other benefits. However, existing approaches to produce an energy breakdown require hardware to be installed in each home and are thus prohibitively expensive. In this paper, we propose a novel application of feature-based matrix factorisation that does not require any additional hardware installation. The basic premise of our approach is that common design and construction patterns for homes create a repeating structure in their energy data. Thus, a sparse basis can be used to represent energy data from a broad range of homes. We evaluate our approach on 516 homes from a publicly available data set and find it to be more effective than five baseline approaches that either require sensing in each home, or a very rigorous survey across a large number of homes coupled with complex modelling. We also present a deployment of our system as a live web application that can potentially provide energy breakdown to millions of homes.

【Keywords】:

619. Regularization in Hierarchical Time Series Forecasting with Application to Electricity Smart Meter Data.

Paper Link】 【Pages】:4474-4480

【Authors】: Souhaib Ben Taieb ; Jiafan Yu ; Mateus Neves Barreto ; Ram Rajagopal

【Abstract】: Accurate electricity demand forecasting plays a key role in sustainable power systems. It enables better decision making in the planning of electricity generation and distribution for many use cases. The electricity demand data can often be represented in a hierarchical structure. For example, the electricity consumption of a whole country could be disaggregated by states, cities, and households. Hierarchical forecasts require not only good prediction accuracy at each level of the hierarchy, but also consistency between different levels. State-of-the-art hierarchical forecasting methods usually apply adjustments on the individual level forecasts to satisfy the aggregation constraints. However, the high-dimensionality of the unpenalized regression problem and the estimation errors in the high-dimensional error covariance matrix can lead to increased variability in the revised forecasts with poor prediction performance. In order to provide more robustness to estimation errors in the adjustments, we present a new hierarchical forecasting algorithm that computes sparse adjustments while still preserving the aggregation constraints. We formulate the problem as a high-dimensional penalized regression, which can be efficiently solved using cyclical coordinate descent methods. We also conduct experiments using a large-scale hierarchical electricity demand dataset. The results confirm the effectiveness of our approach compared to state-of-the-art hierarchical forecasting methods, in both the sparsity of the adjustments and the prediction accuracy. The proposed approach to hierarchical forecasting could be useful for energy generation including solar and wind energy, as well as numerous other applications.

【Keywords】: sustainability; machine learning; time series; regularization

620. Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method.

Paper Link】 【Pages】:4481-4487

【Authors】: Zhiguang Cao ; Hongliang Guo ; Jie Zhang ; Frans A. Oliehoek ; Ulrich Fastenrath

【Abstract】: The stochastic shortest path problem is of crucial importance for the development of sustainable transportation systems. Existing methods based on the probability tail model seek the path that maximizes the probability of arriving at the destination before a deadline. However, they suffer from low accuracy and/or high computational cost. We design a novel Q-learning method where the converged Q-values have a practical meaning as the actual probabilities of arriving on time, so as to improve accuracy. By further adopting dynamic neural networks to learn the value function, our method can scale well to large road networks with arbitrary deadlines. Experimental results on real road networks demonstrate the significant advantages of our method over other counterparts.

【Keywords】:

621. Counting-Based Reliability Estimation for Power-Transmission Grids.

Paper Link】 【Pages】:4488-4494

【Authors】: Leonardo Dueñas-Osorio ; Kuldeep S. Meel ; Roger Paredes ; Moshe Y. Vardi

【Abstract】: Modern society is increasingly reliant on the functionality of infrastructure facilities and utility services. Consequently, there has been a surge of interest in the problem of quantification of system reliability, which is known to be #P-complete. Reliability also contributes to the resilience of systems, so as to effectively make them bounce back after contingencies. Despite diverse progress, most techniques to estimate system reliability and resilience remain computationally expensive. In this paper, we investigate how recent advances in hashing-based approaches to counting can be exploited to improve computational techniques for system reliability. The primary contribution of this paper is a novel framework, RelNet, that reduces the problem of computing reliability for a given network to counting the number of satisfying assignments of a $\Sigma_1^1$ formula, which is amenable to recent hashing-based techniques developed for counting satisfying assignments of SAT formulas. We then apply RelNet to ten real world power-transmission grids across different cities in the U.S. and are able to obtain, to the best of our knowledge, the first theoretically sound a priori estimates of reliability between several pairs of nodes of interest. Such estimates will help manage uncertainty and support rational decision making for community resilience.

【Keywords】: network reliability; hashing-based counting; approxmc

622. Three New Algorithms to Solve N-POMDPs.

Paper Link】 【Pages】:4495-4501

【Authors】: Yann Dujardin ; Tom Dietterich ; Iadine Chades

【Abstract】: In many fields in computational sustainability, applications of POMDPs are inhibited by the complexity of the optimal solution. One way of delivering simple solutions is to represent the policy with a small number of alpha-vectors. We would like to find the best possible policy that can be expressed using a fixed number N of alpha-vectors. We call this the N-POMDP problem. The existing solver alpha-min approximately solves finite-horizon POMDPs with a controllable number of alpha-vectors. However, alpha-min is a greedy algorithm without performance guarantees, and it is rather slow. This paper proposes three new algorithms, based on a general approach that we call alpha-min-2. These three algorithms are able to approximately solve N-POMDPs. Alpha-min-2-fast (heuristic) and alpha-min-2-p (with performance guarantees) are designed to complement an existing POMDP solver, while alpha-min-2-solve (heuristic) is a solver itself. Complexity results are provided for each of the algorithms, and they are tested on well-known benchmarks. These new algorithms will help users to interpret solutions to POMDP problems in computational sustainability.

【Keywords】: POMDP; Compact solutions; Computational sustainability; Integer linear programming

623. Fine-Grained Car Detection for Visual Census Estimation.

Paper Link】 【Pages】:4502-4508

【Authors】: Timnit Gebru ; Jonathan Krause ; Yilun Wang ; Duyun Chen ; Jia Deng ; Li Fei-Fei

【Abstract】: Targeted socio-economic policies require an accurate understanding of a country’s demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years, which is costly and labor intensive, data-driven machine learning approaches are cheaper and faster, with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date, consisting of over 2600 classes of cars and composed of images from Google Street View and other web sources, classified by car experts to account for even the most subtle of visual differences. We use this data to construct the largest scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r=0.82), Massachusetts department of vehicle registration, and sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Finally, we learn interesting relationships between cars and neighborhoods, allowing us to perform the first large scale sociological analysis of cities using computer vision techniques.

【Keywords】: deep learning; social analysis; object detection; object classification

624. Spatial Projection of Multiple Climate Variables Using Hierarchical Multitask Learning.

Paper Link】 【Pages】:4509-4515

【Authors】: André Ricardo Gonçalves ; Arindam Banerjee ; Fernando J. Von Zuben

【Abstract】: Future projection of climate is typically obtained by combining outputs from multiple Earth System Models (ESMs) for several climate variables such as temperature and precipitation. While IPCC has traditionally used a simple model output average, recent work has illustrated potential advantages of using a multitask learning (MTL) framework for projections of individual climate variables. In this paper we introduce a framework for hierarchical multitask learning (HMTL) with two levels of tasks such that each super-task, i.e., task at the top level, is itself a multitask learning problem over sub-tasks. For climate projections, each super-task focuses on projections of specific climate variables spatially using an MTL formulation. For the proposed HMTL approach, a group lasso regularization is added to couple parameters across the super-tasks, which in the climate context helps exploit relationships among the behavior of different climate variables at a given spatial location. We show that some recent works on MTL based on learning task dependency structures can be viewed as special cases of HMTL. Experiments on synthetic and real climate data show that HMTL produces better results than decoupled MTL methods applied separately on the super-tasks and HMTL significantly outperforms baselines for climate projection.

【Keywords】: Multitask Learning; Earth System Models Ensemble; Structured Regression; Structure Learning

625. Species Distribution Modeling of Citizen Science Data as a Classification Problem with Class-Conditional Noise.

Paper Link】 【Pages】:4516-4523

【Authors】: Rebecca A. Hutchinson ; Liqiang He ; Sarah C. Emerson

【Abstract】: Species distribution models relate the geographic occurrence pattern of a species to environmental features and are used for a variety of scientific and management purposes. One source of data for building species distribution models is citizen science, in which volunteers report locations where they observed (or did not observe) sets of species. Since volunteers have variable levels of expertise, citizen science data may contain both false positives and false negatives in the location labels (present vs. absent) they provide, but many common modeling approaches for this task do not address these sources of noise explicitly. In this paper, we propose to formulate the species distribution modeling task as a classification problem with class-conditional noise. Our approach builds on other applications of class-conditional noise models to crowdsourced data, but we focus on leveraging features of the noise processes that are distinct from the class features. We describe the conditions under which the parameters of our proposed model are identifiable and apply it to simulated data and data from the eBird citizen science project.

【Keywords】: classification; species distribution modeling; citizen science; computational sustainability; class-conditional label noise

626. Combining Satellite Imagery and Open Data to Map Road Safety.

Paper Link】 【Pages】:4524-4530

【Authors】: Alameen Najjar ; Shun'ichi Kaneko ; Yoshikazu Miyanaga

【Abstract】: Improving road safety is critical for the sustainable development of cities. A road safety map is a powerful tool that can help prevent future traffic accidents. However, accurate mapping requires accurate data collection, which is both expensive and labor intensive. Satellite imagery is increasingly becoming abundant, higher in resolution and affordable. Given the recent successes deep learning has achieved in the visual recognition field, we are interested in investigating whether it is possible to use deep learning to accurately predict road safety directly from raw satellite imagery. To this end, we propose a deep learning-based mapping approach that leverages open data to learn from raw satellite imagery robust deep models able to predict accurate city-scale road safety maps at an affordable cost. To empirically validate the proposed approach, we trained a deep model on satellite images obtained from over 647 thousand traffic-accident reports collected over a period of four years by the New York City Police Department. The best model predicted road safety from raw satellite imagery with an accuracy of 78%. We also used the New York City model to predict for the city of Denver a city-scale map indicating road safety in three levels. Compared to a map made from three years' worth of data collected by the Denver city Police Department, the map predicted from raw satellite imagery has an accuracy of 73%.

【Keywords】: Deep learning; Satellite imagery; Open data; Road safety; Computational sustainability; Convolutional Neural Networks

627. Fast-Tracking Stationary MOMDPs for Adaptive Management Problems.

Paper Link】 【Pages】:4531-4537

【Authors】: Martin Péron ; Kai Helge Becker ; Peter Bartlett ; Iadine Chades

【Abstract】: Adaptive management is applied in conservation and natural resource management, and consists of making sequential decisions when the transition matrix is uncertain. Informally described as ‘learning by doing’, this approach aims to trade off between decisions that help achieve the objective and decisions that will yield a better knowledge of the true transition matrix. When the true transition matrix is assumed to be an element of a finite set of possible matrices, solving a mixed observability Markov decision process (MOMDP) leads to an optimal trade-off but is very computationally demanding. Under the assumption (common in adaptive management) that the true transition matrix is stationary, we propose a polynomial-time algorithm to find a lower bound of the value function. In the corners of the domain of the value function (belief space), this lower bound is provably equal to the optimal value function. We also show that under further assumptions, it is a linear approximation of the optimal value function in a neighborhood around the corners. We evaluate the benefits of our approach by using it to initialize the solvers MO-SARSOP and Perseus on a novel computational sustainability problem and a recent adaptive management data challenge. Our approach leads to an improved initial value function and translates into significant computational gains for both solvers.

【Keywords】: Partially observable Markov decision process; mixed observable Markov decision process; adaptive management; adaptive learning policy; exploration/exploitation trade-off

628. Extracting Urban Microclimates from Electricity Bills.

Paper Link】 【Pages】:4538-4544

【Authors】: Thuy Vu ; Douglas Stott Parker

【Abstract】: Sustainable energy policies are of growing importance in all urban centers. Climate, and climate change, will play increasingly important roles in these policies. Climate zones defined by the California Energy Commission have long been influential in energy management. For example, recently a two-zone division of Los Angeles (defined by historical temperature averages) was introduced for electricity rate restructuring. The importance of climate zones has been enormous, and climate change could make them still more important. AI can provide improvements on the ways climate zones are derived and managed. This paper reports on analysis of aggregate household electricity consumption (EC) data from local utilities in Los Angeles, seeking possible improvements in energy management. In this analysis we noticed that EC data permits identification of interesting geographical zones: regions having EC patterns that are characteristically different from surrounding regions. We believe these zones could be useful in a variety of urban models.

【Keywords】: computational sustainability; energy management; Gaussian Process Random Field

629. Robust Optimization for Tree-Structured Stochastic Network Design.

Paper Link】 【Pages】:4545-4551

【Authors】: XiaoJian Wu ; Akshat Kumar ; Daniel Sheldon ; Shlomo Zilberstein

【Abstract】: Stochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability including spatial conservation planning, pre-disaster network preparation, and river network optimization. A common assumption in previous work is that network parameters (e.g., probability of species colonization) are precisely known, which is unrealistic in real-world settings. We therefore address the robust river network design problem where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but are within some interval bounds. We then develop a planning approach that computes the policies with either high robust ratio or low regret. Empirically, our approach scales well to large river networks. We also provide insights into the solutions generated by our robust approach, which has significantly higher robust ratio than the baseline solution with mean parameter estimates.

【Keywords】: Network Design; Robust Optimization; Dynamic Programming; Approximation Algorithm

630. Dynamic Optimization of Landscape Connectivity Embedding Spatial-Capture-Recapture Information.

Paper Link】 【Pages】:4552-4558

【Authors】: Yexiang Xue ; XiaoJian Wu ; Dana Morin ; Bistra Dilkina ; Angela Fuller ; J. Andrew Royle ; Carla P. Gomes

【Abstract】: Maintaining landscape connectivity is increasingly important in wildlife conservation, especially for species experiencing the effects of habitat loss and fragmentation. We propose a novel approach to dynamically optimize landscape connectivity. Our approach is based on a mixed integer program formulation, embedding a spatial capture-recapture model that estimates the density, space usage, and landscape connectivity for a given species. Our method takes into account the fact that local animal density and connectivity change dynamically and non-linearly with different habitat protection plans. In order to scale up our encoding, we propose a sampling scheme via random partitioning of the search space using parity functions. We show that our method scales to real-world size problems and dramatically outperforms the solution quality of an expectation maximization approach and a sample average approximation approach.

【Keywords】:

631. Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data.

Paper Link】 【Pages】:4559-4566

【Authors】: Jiaxuan You ; Xiaocheng Li ; Melvin Low ; David Lobell ; Stefano Ermon

【Abstract】: Agricultural monitoring, especially in developing countries, can help prevent famine and support humanitarian efforts. A central challenge is yield estimation, i.e., predicting crop yields before harvest. We introduce a scalable, accurate, and inexpensive method to predict crop yields using publicly available remote sensing data. Our approach improves existing techniques in three ways. First, we forgo hand-crafted features traditionally used in the remote sensing community and propose an approach based on modern representation learning ideas. Second, we introduce a novel dimensionality reduction technique that allows us to train a Convolutional Neural Network or Long Short-Term Memory network and automatically learn useful features even when labeled training data are scarce. Finally, we incorporate a Gaussian Process component to explicitly model the spatio-temporal structure of the data and further improve accuracy. We evaluate our approach on county-level soybean yield prediction in the U.S. and show that it outperforms competing techniques.

【Keywords】: Deep learning, Crop yield prediction, Remote sensing

Special Track on Integrated Systems 8

Paper Link】 【Pages】:4567-4573

【Authors】: Peng Dai ; Femida Gwadry-Sridhar ; Michael Bauer ; Michael Borrie ; Xue Teng

【Abstract】: Alzheimer's disease (AD) is a genetically complex neurodegenerative disease, which leads to irreversible brain damage, severe cognitive problems and ultimately death. A number of clinical trials and study initiatives have been set up to investigate AD pathology, leading to large amounts of high dimensional heterogeneous data (biomarkers) for analysis. This paper focuses on combining clinical features from different modalities, including medical imaging, cerebrospinal fluid (CSF), etc., to diagnose AD and predict potential progression. Due to privacy and legal issues involved with clinical research, the study cohort (number of patients) is relatively small, compared to thousands of available biomarkers (predictors). We propose a hybrid pathological analysis model, which integrates manifold learning and a Random Vector Functional-Link network (RVFL) so as to better extract discriminant information from limited training data. Furthermore, we model (current and future) cognitive healthiness as a regression problem about age. By comparing the difference between predicted age and actual age, we manage to show statistical differences between different pathological stages. Verification tests are conducted based on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Extensive comparison is made against different machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree and Multilayer Perceptron (MLP). Experimental results show that our proposed algorithm achieves better results than the comparison targets, which indicates promising robustness for practical clinical implementation.

【Keywords】: Alzheimer's Disease; Aging; Automatic Diagnosis; Prognosis

633. Mixed Discrete-Continuous Planning with Convex Optimization.

Paper Link】 【Pages】:4574-4580

【Authors】: Enrique Fernández-González ; Erez Karpas ; Brian Charles Williams

【Abstract】: Robots operating in the real world must be able to handle both discrete and continuous change. Many robot behaviors can be controlled through numeric parameters (called control variables), which affect the rate of the continuous change. Previous approaches capable of reasoning efficiently with control variables impose severe restrictions that limit the expressivity of the problems that can be solved. A broad class of robotic applications require, for example, convex quadratic constraints on state variables and control variables that are jointly constrained and that affect multiple state variables simultaneously. However, extensions to prior approaches are not straightforward, since these characteristics are non-linear and hard to scale. We introduce cqScotty, a heuristic forward search planner that solves these problems efficiently. While naive formulations of consistency checks are not convex and do not scale, cqScotty uses an efficient convex formulation, in the form of a Second Order Cone Program (SOCP), that is very fast to solve. We demonstrate the scalability of our approach on three new realistic domains.

【Keywords】: Hybrid planning, Automated planning, Mixed discrete-continuous planning; Planning and robotics; Temporal planning

634. Integration of Planning with Recognition for Responsive Interaction Using Classical Planners.

Paper Link】 【Pages】:4581-4588

【Authors】: Richard G. Freedman ; Shlomo Zilberstein

【Abstract】: Interaction between multiple agents requires some form of coordination and a level of mutual awareness. When computers and robots interact with people, they need to recognize human plans and react appropriately. Plan and goal recognition techniques have focused on identifying an agent's task given a sufficiently long action sequence. However, by the time the plan and/or goal are recognized, it may be too late for computing an interactive response. We propose an integration of planning with probabilistic recognition where each method uses intermediate results from the other as a guiding heuristic for recognition of the plan/goal in-progress as well as the interactive response. We show that, like the recognition method used, these interaction problems can be compiled into classical planning problems and solved using off-the-shelf methods. In addition to the methodology, this paper introduces problem categories for different forms of interaction, an evaluation metric for the benefits from the interaction, and extensions to the recognition algorithm that make its intermediate results more practical while the plan is in progress.

【Keywords】: Planning; Plan Recognition; Goal Recognition; Interaction

635. Configuration Planning with Temporal Constraints.

Paper Link】 【Pages】:4589-4595

【Authors】: Uwe Köckemann ; Lars Karlsson

【Abstract】: Configuration planning is a form of task planning that takes into consideration both causal and information dependencies in goal achievement. This type of planning is interesting, for instance, in smart home environments which contain various sensors and robots to provide services to the inhabitants. Requests for information, for instance from an activity recognition system, should cause the smart home to configure itself in such a way that all requested information will be provided when it is needed. This paper addresses temporal configuration planning in which information availability and goals are linked to temporal intervals which are subject to constraints. Our solutions are based on constraint-based planning which uses different types of constraints to model different types of knowledge. We propose and compare two approaches to configuration planning. The first one models information via conditions and effects of planning operators and essentially reduces configuration planning to constraint-based temporal planning. The second approach solves information dependencies separately from task planning and optimizes the cost of reaching individual information goals. We compare these approaches in terms of the time it takes to solve problems and the quality of the solutions they provide.

【Keywords】: configuration planning; constraint-based planning; temporal planning

636. Learning to Predict Intent from Gaze During Robotic Hand-Eye Coordination.

Paper Link】 【Pages】:4596-4602

【Authors】: Yosef Razin ; Karen M. Feigh

【Abstract】: Effective human-aware robots should anticipate their user’s intentions. During hand-eye coordination tasks, gaze often precedes hand motion and can serve as a powerful predictor for intent. However, cooperative tasks where a semi-autonomous robot serves as an extension of the human hand have rarely been studied in the context of hand-eye coordination. We hypothesize that accounting for anticipatory eye movements in addition to the movements of the robot will improve intent estimation. This research compares the application of various machine learning methods to intent prediction from gaze tracking data during robotic hand-eye coordination tasks. We found that with proper feature selection, accuracies exceeding 94% and AUC greater than 91% are achievable with several classification algorithms but that anticipatory gaze data did not improve intent prediction.

【Keywords】: Human-Aware AI; Application of Supervised Learning; Human-Robot Interaction; Gaze Tracking

637. Vision-Language Fusion for Object Recognition.

Paper Link】 【Pages】:4603-4610

【Authors】: Sz-Rung Shiang ; Stephanie Rosenthal ; Anatole Gershman ; Jaime G. Carbonell ; Jean Oh

【Abstract】: While recent advances in computer vision have caused object recognition rates to spike, there is still much room for improvement. In this paper, we develop an algorithm to improve object recognition by integrating human-generated contextual information with vision algorithms. Specifically, we examine how interactive systems such as robots can utilize two types of context information--verbal descriptions of an environment and human-labeled datasets. We propose a re-ranking schema, MultiRank, for object recognition that can efficiently combine such information with the computer vision results. In our experiments, we achieve up to 9.4% and 16.6% accuracy improvements using the oracle and the detected bounding boxes, respectively, over the vision-only recognizers. We conclude that our algorithm has the ability to make a significant impact on object recognition in robotics and beyond.

【Keywords】: Language and Vision; Object Recognition; Relational/Graph-Based Learning

638. State Projection via AI Planning.

Paper Link】 【Pages】:4611-4617

【Authors】: Shirin Sohrabi ; Anton V. Riabov ; Octavian Udrea

【Abstract】: Imagining the future helps anticipate and prepare for what is coming. This has great importance to many, if not all, human endeavors. In this paper, we develop the Planning Projector system prototype, which applies a plan-recognition-as-planning technique both to explain the observations derived from analyzing relevant news and social media, and to project a range of possible future state trajectories for human review. Unlike the plan recognition problem, where a set of goals, and often a plan library, must be given as part of the input, the Planning Projector system takes as input the domain knowledge, a sequence of observations derived from the news, a time horizon, and the number of trajectories to produce. It then computes the set of trajectories by applying a planner capable of finding a set of high-quality plans on a transformed planning problem. The Planning Projector prototype integrates several components including: (1) knowledge engineering: the process of encoding the domain knowledge from domain experts; (2) data transformation: the problem of analyzing and transforming the raw data into a sequence of observations; (3) trajectory computation: characterizing the future state projection problem and computing a set of trajectories; (4) user interface: clustering and visualizing the trajectories. We evaluate our approach qualitatively and conclude that the Planning Projector helps users understand future possibilities so that they can make more informed decisions.

【Keywords】: Plan recognition, Planning, Knowledge Engineering

639. Building Task-Oriented Dialogue Systems for Online Shopping.

Paper Link】 【Pages】:4618-4626

【Authors】: Zhao Yan ; Nan Duan ; Peng Chen ; Ming Zhou ; Jianshe Zhou ; Zhoujun Li

【Abstract】: We present a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching products and answering questions, in a natural language conversation manner. As a pioneering work, we show what existing NLP techniques, data resources, and crowdsourcing can be leveraged, and how, to build such task-oriented dialogue systems for E-commerce usage. To demonstrate its effectiveness, we integrate our system into a mobile online shopping app. To the best of our knowledge, this is the first time that an AI bot in Chinese is practically used in an online shopping scenario with millions of real consumers. Interesting and insightful observations are shown in the experimental part, based on the analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions.

【Keywords】: Dialogue System; Natural Language Processing; Knowledge Acquisition

Innovative Applications of Artificial Intelligence Conference: Deployed Application Case Studies 2

640. Large-Scale Occupational Skills Normalization for Online Recruitment.

Paper Link】 【Pages】:4627-4634

【Authors】: Faizan Javed ; Phuong Hoang ; Thomas Mahoney ; Matt McNair

【Abstract】: Job openings often go unfulfilled despite a surfeit of unemployed or underemployed workers. One of the main reasons for this is a mismatch between the skills required by employers and the skills that workers possess. This mismatch, also known as the skills gap, can pose socio-economic challenges for an economy. A first step in alleviating the skills gap is to accurately detect skills in human capital data such as resumes and job ads. Comprehensive and accurate detection of skills facilitates analysis of labor market dynamics. It also helps bridge the divide between supply and demand of labor by facilitating reskilling and workforce training programs. In this paper, we describe SKILL, a Named Entity Normalization (NEN) system for occupational skills. SKILL is composed of 1) A skills tagger which uses properties of semantic word vectors to recognize and normalize relevant skills, and 2) A skill entity sense disambiguation component which infers the correct meaning of an identified skill by leveraging Markov Chain Monte Carlo (MCMC) algorithms. Data-driven evaluation using end-user surveys demonstrates that SKILL achieves 90% precision and 73% recall for skills tagging. SKILL is currently used by various internal teams at CareerBuilder for big data workforce analytics, semantic search, job matching, and recommendations.

【Keywords】: entity normalization; skills gap; taxonomies

641. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.

Paper Link】 【Pages】:4635-4643

【Authors】: Yexiang Xue ; Junwen Bai ; Ronan Le Bras ; Brendan Rappazzo ; Richard Bernstein ; Johan Bjorck ; Liane Longpre ; Santosh K. Suram ; Robert Bruce van Dover ; John M. Gregoire ; Carla P. Gomes

【Abstract】: High-throughput materials discovery involves the rapid synthesis, measurement, and characterization of many different but structurally related materials. A central problem in materials discovery, the phase map identification problem, involves the determination of the crystal structure of materials from materials composition and structural characterization data. We present Phase-Mapper, a novel solution platform that allows humans to interact with both the data and products of AI algorithms, including the incorporation of human feedback to constrain or initialize solutions. Phase-Mapper is compatible with any spectral demixing algorithm, including our novel solver, AgileFD, which is based on convolutive non-negative matrix factorization. AgileFD allows materials scientists to rapidly interpret XRD patterns, and can incorporate constraints to capture the physics of the materials as well as human feedback. We compare three solver variants with previously proposed methods in a large-scale experiment involving 20 synthetic systems, demonstrating the efficacy of imposing physical constraints using AgileFD. Since the deployment of Phase-Mapper at the Department of Energy's Joint Center for Artificial Photosynthesis (JCAP), thousands of X-ray diffraction patterns have been processed and the results are yielding discovery of new materials for energy applications, as exemplified by the discovery of a new family of metal oxide solar light absorbers, among the previously unsolved Nb-Mn-V oxide system, which is provided here as an illustrative example. Phase-Mapper is also being deployed at the Stanford Synchrotron Radiation Lightsource (SSRL) to enable phase mapping on datasets in real time.

【Keywords】:

Innovative Applications of Artificial Intelligence Conference: Emerging Application Case Studies 17

642. A Machine Learning Approach for Semantic Structuring of Scientific Charts in Scholarly Documents.

Paper Link】 【Pages】:4644-4649

【Authors】: Rabah A. Al-Zaidy ; C. Lee Giles

【Abstract】: Large scholarly repositories are designed to provide scientists and researchers with a wealth of information that is retrieved from data present in a variety of formats. A typical scholarly document contains information in a combined layout of texts and graphic images. Common types of graphics found in these documents are scientific charts that are used to represent data values in a visual format. Experimental results are rarely described without the aid of one form of a chart or another, whether it is a 2D plot, bar chart, pie chart, etc. Metadata of these graphics is usually the only content that is made available for search by user queries. By processing the image content and extracting the data represented in the graphics, search engines will be able to handle more specific queries related to the data itself. In this paper we describe a machine learning based system that extracts and recognizes the various data fields present in a bar chart for semantic labeling. Our approach comprises a graphics and text separation and extraction phase, followed by a component role classification for both text and graphic components that are in turn used for semantic analysis and representation of the chart. The proposed system is tested on a set of over 200 bar charts extracted from over 1,000 scientific articles in PDF format.

【Keywords】: Bar Charts; Role Labeling; Semantic Structuring; Scholarly Document Figures

643. ParkUs: A Novel Vehicle Parking Detection System.

Paper Link】 【Pages】:4650-4656

【Authors】: Pietro Edoardo Carnelli ; Joy Yeh ; Mahesh Sooriyabandara ; Aftab Khan

【Abstract】: Finding on-street parking in congested urban areas is a challenging chore that most drivers worldwide dislike. Previous vehicle traffic studies have estimated that around thirty percent of vehicles travelling in inner city areas are made up of drivers searching for a vacant parking space. While there are hardware sensor based solutions to monitor on-street parking occupancy in real-time, instrumenting and maintaining such a city wide system is a substantial investment. In this paper, a novel vehicle parking activity detection method, called ParkUs, is introduced and tested with the aim to eventually reduce vacant car parking space search times. The system utilises accelerometer and magnetometer sensors found in all smartphones in order to detect parking activity within a city environment. Moreover, it uses a novel sensor fusion feature called the Orthogonality Error Estimate (OEE). We show that the OEE is an excellent indicator as it is capable of detecting parking activities with high accuracy and low energy consumption. One of the envisioned applications of the ParkUs system will be to provide all drivers with guidelines on where they are most likely to find vacant parking spaces within a city, thereby reducing the time required to find a vacant parking space and, subsequently, vehicle congestion and emissions within the city.

【Keywords】: vehicle parking; smartphone sensing

644. UbuntuWorld 1.0 LTS - A Platform for Automated Problem Solving & Troubleshooting in the Ubuntu OS.

Paper Link】 【Pages】:4657-4663

【Authors】: Tathagata Chakraborti ; Kartik Talamadupula ; Kshitij P. Fadnis ; Murray Campbell ; Subbarao Kambhampati

【Abstract】: In this paper, we present UbuntuWorld 1.0 LTS - a platform for developing automated technical support agents in the Ubuntu operating system. Specifically, we propose to use the Bash terminal as a simulator of the Ubuntu environment for a learning-based agent and demonstrate the usefulness of adopting reinforcement learning (RL) techniques for basic problem solving and troubleshooting in this environment. We provide a plug-and-play interface to the simulator as a python package where different types of agents can be plugged in and evaluated, and provide pathways for integrating data from online support forums like Ask Ubuntu into an automated agent’s learning process. Finally, we show that the use of this data significantly improves the agent’s learning efficiency. We believe that this platform can be adopted as a real-world test bed for research on automated technical support.

【Keywords】: Ubuntu, Technical support, Reinforcement Learning, Planning

645. Calories Prediction from Food Images.

Paper Link】 【Pages】:4664-4669

【Authors】: Manal Chokr ; Shady Elbassuoni

【Abstract】: Calculating the amount of calories in a given food item is now a common task. We propose a machine-learning-based approach to predict the amount of calories from food images. Our system does not require any input from the user, except for an image of the food item. We take a pragmatic approach to accurately predict the amount of calories in a food item and solve the problem in three phases. First, we identify the type of the food item in the image. Second, we estimate the size of the food item in grams. Finally, by taking into consideration the output of the first two phases, we predict the amount of calories in the photographed food item. All three phases are based purely on supervised machine learning. We show that this pipelined approach is very effective in predicting the amount of calories in a food item as compared to baseline approaches which directly predict the amount of calories from the image.

【Keywords】: machine learning; image processing; artificial intelligence; digital public health; diet; calorie estimation

646. Real-Time Indoor Localization in Smart Homes Using Semi-Supervised Learning.

Paper Link】 【Pages】:4670-4677

【Authors】: Negar Ghourchian ; Michel Allegue-Martinez ; Doina Precup

【Abstract】: Long-term automated monitoring of residential or small industrial properties is an important task within the broader scope of human activity recognition. We present a device-free wifi-based localization system for smart indoor spaces, developed in a collaboration between McGill University and Aerîal Technologies. The system relies on existing wifi network signals and semi-supervised learning, in order to automatically detect entrance into a residential unit, and track the location of a moving subject within the sensing area. The implemented real-time monitoring platform works by detecting changes in the characteristics of the wifi signals collected via existing off-the-shelf wifi-enabled devices in the environment. This platform has been deployed in several apartments in the Montreal area, and the results obtained show the potential of this technology to turn any regular home with an existing wifi network into a smart home equipped with intruder alarm and room-level location detector. The machine learning component has been devised so as to minimize the need for user annotation and overcome temporal instabilities in the input signals. We use a semi-supervised learning framework which works in two phases. First, we build a base learner for mapping wifi signals to different physical locations in the environment from a small amount of labeled data; during its lifetime, the learner automatically re-trains when the uncertainty level rises significantly, without the need for further supervision. This paper describes the technical and practical issues arising in the design and implementation of such a system for real residential units, and illustrates its performance during on-going deployment.

【Keywords】: Smart home; wifi signals; indoor activity recognition

647. Constraint-Based Verification of a Mobile App Game Designed for Nudging People to Attend Cancer Screening.

Paper Link】 【Pages】:4678-4685

【Authors】: Arnaud Gotlieb ; Marine Louarn ; Mari Nygård ; Tomás Ruiz-López ; Sagar Sen ; Roberta Gori

【Abstract】: In Norway, cervical cancer prevention involves the participation of as many eligible women aged 25-69 years as possible. However, reaching and inviting every eligible woman to attend cervical cancer screening and HPV vaccination is difficult. Using social nudging and gamification in modern means of communication can encourage the participation of unscreened people. Simula Research Laboratory together with the Cancer Registry of Norway have developed FightHPV, a mobile app game intended to inform adolescent and eligible women about cervical cancer screening and HPV vaccination while they play and to facilitate their further participation in prevention campaigns. However, game design and health information transfer can be hard to reconcile, as the design of each game episode is more guided by the release of information than gameplay and playing difficulty. In this paper, we propose a constraint-based model of FightHPV to evaluate the difficulty of each episode and to help the game designer in improving the player experience. This approach is relevant to facilitate social nudging of eligible women to participate in cervical cancer screening and HPV vaccination, as shown by the initial deployment of FightHPV and tests performed in focus groups. The design of this mobile app can thus be regarded as a new application case of Artificial Intelligence techniques such as gamification and constraint programming.

【Keywords】: gamification; cancer screening; constraint-based verification

648. Predicting Fuel Consumption and Flight Delays for Low-Cost Airlines.

Paper Link】 【Pages】:4686-4693

【Authors】: Yuji Horiguchi ; Yukino Baba ; Hisashi Kashima ; Masahito Suzuki ; Hiroki Kayahara ; Jun Maeno

【Abstract】: Low-cost airlines (LCAs) represent a new category of airlines that provides low-fare flights. The rise and growth of LCAs has intensified the price competition among airlines, and LCAs require continuous efforts to reduce their operating costs to lower flight prices; however, LCA passengers still demand high-quality services. A common measure of airline service quality is on-time departure performance. Because LCAs apply efficient aircraft utilization and the time between flights is likely to be small, additional effort is required to avoid flight delays and improve their service quality. In this paper, we apply state-of-the-art predictive modeling approaches to real airline datasets and investigate the feasibility of machine learning methods for cost reduction and service quality improvement in LCAs. We address two prediction problems: fuel consumption prediction and flight delay prediction. We train predictive models using flight and passenger information, and our experiment results show that our regression model predicts the amount of fuel consumption more accurately than flight dispatchers, and our binary classifier achieves an area under the ROC curve (AUC) of 0.75 for predicting a delay of a specific flight route.

【Keywords】:

649. Cracks Under Pressure? Burst Prediction in Water Networks Using Dynamic Metrics.

Paper Link】 【Pages】:4694-4700

【Authors】: Gollakota Kaushik ; Abinaya Manimaran ; Arunchandar Vasan ; Venkatesh Sarangan ; Anand Sivasubramaniam

【Abstract】: Ranking pipes according to their burst likelihood can help a water utility triage its proactive maintenance budget effectively. In the research literature, data-driven approaches have been used recently to predict pipe bursts. Such approaches make use of static features of the individual pipes such as diameter, length, and material to estimate burst likelihood for the next year by learning over past historical data. The burst likelihood of a pipe also depends on dynamic features such as its pressure and flow. Existing works ignore dynamic features because the features need to be measured or are difficult to obtain accurately using a well-calibrated hydraulic model. We complement prior data-driven approaches by proposing a methodology to approximately estimate the dynamic features of individual pipes from readily available network structure and other data. We study the error introduced by our approximation on an academic benchmark water network with ground truth. Using a real-world pipe burst dataset obtained from a European water utility for multiple years, we show that our approximate dynamic features improve the ability of machine learning classifiers to predict pipe bursts. The performance (as measured by the percentage of future bursts predicted) of the best-performing classifier improves by nearly 50% through these dynamic features.

【Keywords】: prediction; burst; dynamic features; approximation

650. Determining Relative Airport Threats from News and Social Media.

Paper Link】 【Pages】:4701-4707

【Authors】: Rupinder Paul Khandpur ; Taoran Ji ; Yue Ning ; Liang Zhao ; Chang-Tien Lu ; Erik R. Smith ; Christopher Adams ; Naren Ramakrishnan

【Abstract】: Airports are a prime target for terrorist organizations, drug traffickers, smugglers, and other nefarious groups. Traditional forms of security assessment are not real-time and often do not exist for each airport and port of entry. Thus, homeland security professionals must rely on measures of attractiveness of an airport as a target for attacks. We present an open source indicators approach, using news and social media, to conduct relative threat assessment, i.e., estimating if one airport is under greater threat than another. The three ingredients of our approach are a dynamic query expansion algorithm for tracking emerging threat-related chatter, news-Twitter reciprocity modeling for capturing interactions between social and traditional media, and a ranking scheme to provide an ordered assessment of airport threats. Case studies based on actual aviation incidents are presented.

【Keywords】: Open Source Indicators; Social Media Analytics; Text Mining; Ranking

651. Risk-Aware Planning: Methods and Case Study for Safer Driving Routes.

Paper Link】 【Pages】:4708-4714

【Authors】: John Krumm ; Eric Horvitz

【Abstract】: Vehicle crashes account for over one million fatalities and many more million injuries annually worldwide. Some roads are safer than others, so a driving route optimized for safety may reduce the number of crashes. We have developed a method to estimate the probability of a crash on any road as a function of the traffic volume, road characteristics, and environmental conditions. We trained a regression model to estimate traffic volume and a binary classifier to estimate crash probability on road segments. Modeling a route’s crash probability as a series of Bernoulli probability trials, we show how to use a simple Dijkstra algorithm to compute the safest route between two locations. Compared to the fastest route, the safest route averages about 1.7 times as long in duration and about half as dangerous. We also show how to smoothly trade off safety for time, giving several different route options with different crash probabilities and durations.

【Keywords】: vehicle accidents; safe driving routes; risk-aware planning
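
The safest-route computation described in the abstract of paper 651 can be sketched as follows. If each road segment is treated as an independent Bernoulli trial with crash probability p, the probability of finishing a route without a crash is the product of (1 - p) over its segments. Maximizing that product is equivalent to minimizing the sum of -log(1 - p), and those terms are non-negative edge weights that Dijkstra's algorithm can handle. This is a minimal sketch under that independence assumption; the function name `safest_route`, the toy graph, and the probabilities are hypothetical illustrations, not the authors' data or implementation.

```python
import heapq
import math


def safest_route(graph, source, target):
    """Return (crash_probability, path) for the least risky route, or None.

    graph maps each node to a list of (neighbor, p_crash) pairs, where
    p_crash is the assumed crash probability of that segment (0 <= p < 1).
    """
    dist = {source: 0.0}  # accumulated -log(1 - p) along the best known path
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, []):
            nd = d - math.log1p(-p)  # add -log(1 - p) for this segment
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # target is unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # Convert the summed weights back into a route-level crash probability.
    return 1.0 - math.exp(-dist[target]), path


# Hypothetical toy road network: node -> [(neighbor, segment crash probability)]
roads = {
    "A": [("B", 0.010), ("C", 0.002)],
    "B": [("D", 0.001)],
    "C": [("D", 0.003)],
    "D": [],
}
risk, path = safest_route(roads, "A", "D")
print(path, risk)
```

Trading safety against travel time, as the abstract mentions, could be approximated in the same framework by minimizing a weighted sum of the risk term and segment duration; the weighting scheme would be a further assumption.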

652. Designing Better Playlists with Monte Carlo Tree Search.

Paper Link】 【Pages】:4715-4720

【Authors】: Elad Liebman ; Piyush Khandelwal ; Maytal Saar-Tsechansky ; Peter Stone

【Abstract】: In recent years, there has been growing interest in the study of automated playlist generation — music recommender systems that focus on modeling preferences over song sequences rather than on individual songs in isolation. This paper addresses this problem by learning personalized models on the fly of both song and transition preferences, uniquely tailored to each user’s musical tastes. Playlist recommender systems typically include two main components: i) a preference-learning component, and ii) a planning component for selecting the next song in the playlist sequence. While there has been much work on the former, very little work has been devoted to the latter. This paper bridges this gap by focusing on the planning aspect of playlist generation within the context of DJ-MC, our playlist recommendation application. This paper also introduces a new variant of playlist recommendation, which incorporates the notion of diversity and novelty directly into the reward model. We empirically demonstrate that the proposed planning approach significantly improves performance compared to the DJ-MC baseline in two playlist recommendation settings, increasing the usability of the framework in real world settings.

【Keywords】:

653. On Designing a Social Coach to Promote Regular Aerobic Exercise.

Paper Link】 【Pages】:4721-4727

【Authors】: Shiwali Mohan ; Anusha Venkatakrishnan ; Michael Silva ; Peter Pirolli

【Abstract】: Our research aims at developing interactive, social agents that can coach people to learn new tasks, skills, and habits. In this paper, we focus on coaching sedentary, overweight individuals to exercise regularly. We employ adaptive goal setting in which the coach generates, tracks, and revises personalized exercise goals for a trainee. The goals become incrementally more difficult as the trainee progresses through the training program. Our approach is model-based - the coach maintains a parameterized model of the trainee's aerobic capability that drives its expectation of the trainee's performance. The model is continually revised based on interactions with the trainee. The coach is embodied in a smartphone application which serves as a medium for coach-trainee interaction. We show that our approach can adapt the trainee program not only to several trainees with different capabilities but also to how a trainee's capability improves as they begin to exercise more. Experts rate the goals selected by the coach better than other plausible goals, demonstrating that our approach is effective.

【Keywords】: coaching agents; social agents; human-agent interaction; cognitive systems; health behavior change; user modeling; model-based reasoning; model-based adaptation

654. Crowdsensing Air Quality with Camera-Enabled Mobile Devices.

Paper Link】 【Pages】:4728-4733

【Authors】: Zhengxiang Pan ; Han Yu ; Chunyan Miao ; Cyril Leung

【Abstract】: Crowdsensing of air quality is a useful way to improve public awareness and supplement local air quality monitoring data. However, current air quality monitoring approaches are either too sophisticated, costly or bulky to be used effectively by the masses. In this paper, we describe AirTick, a mobile app that can turn any camera enabled smart mobile device into an air quality sensor, thereby enabling crowdsensing of air quality. AirTick leverages image analytics and deep learning techniques to produce accurate estimates of air quality following the Pollutant Standards Index (PSI). We report the results of initial experimental and empirical evaluations of AirTick. The AirTick tool has been shown to achieve, on average, 87% accuracy in day time operation and 75% accuracy in night time operation. Feedback from 100 test users indicates that they perceive AirTick to be highly useful and easy to use. Our results provide a strong positive case for the benefits of applying artificial intelligence techniques for convenient and scalable crowdsensing of air quality.

【Keywords】: Crowdsensing; Air pollution; Deep learning; Image processing

655. Optimal Sequential Drilling for Hydrocarbon Field Development Planning.

Paper Link】 【Pages】:4734-4739

【Authors】: Ruben Rodriguez Torrado ; Jesus Rios ; Gerald Tesauro

【Abstract】: We present a novel approach for planning the development of hydrocarbon fields, taking into account the sequential nature of well drilling decisions and the possibility to react to future information. In a dynamic fashion, we want to optimally decide where to drill each well conditional on every possible piece of information that could be obtained from previous wells. We formulate this sequential drilling optimization problem as a POMDP, and propose an algorithm to search for an optimal drilling policy. We show that our new approach leads to better results compared to the current standard in the oil and gas (O&G) industry.

【Keywords】: Reinforcement Learning; Monte Carlo Tree Search; Oil and Gas application; Field development planning

656. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing.

Paper Link】 【Pages】:4740-4745

【Authors】: Philip S. Thomas ; Georgios Theocharous ; Mohammad Ghavamzadeh ; Ishan Durugkar ; Emma Brunskill

【Abstract】: In this paper we consider the problem of evaluating one digital marketing policy (or more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly for digital marketing applications. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.

【Keywords】: nonstationary; off-policy policy evaluation
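
The idea in the abstract of paper 656, combining an off-policy estimator with a time series predictor, can be illustrated with a small sketch. It assumes ordinary importance sampling as the off-policy estimator and a least-squares linear trend as the forecaster; both are stand-ins chosen for brevity, and the abstract does not say which estimators or predictors the authors use. The function names and the episode format (a list of state, action, reward triples) are hypothetical.

```python
def importance_weighted_return(episode, pi_eval, pi_behavior):
    """Ordinary importance sampling estimate from one logged episode.

    episode is a list of (state, action, reward) triples collected under
    pi_behavior; pi_eval(s, a) and pi_behavior(s, a) return action probabilities.
    """
    weight = 1.0
    total_reward = 0.0
    for state, action, reward in episode:
        weight *= pi_eval(state, action) / pi_behavior(state, action)
        total_reward += reward
    return weight * total_reward


def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line to values indexed 0..n-1 and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


def predictive_estimate(episodes, pi_eval, pi_behavior):
    """Forecast the evaluation policy's next-episode performance.

    Episodes must be in chronological order, so the per-episode estimates
    form a time series rather than an unordered sample.
    """
    series = [importance_weighted_return(ep, pi_eval, pi_behavior) for ep in episodes]
    return linear_forecast(series)
```

Averaging the per-episode estimates instead of forecasting them would recover the usual stationary estimator, which lags behind any drift in the environment.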

657. Using Deep and Convolutional Neural Networks for Accurate Emotion Classification on DEAP Dataset.

Paper Link】 【Pages】:4746-4752

【Authors】: Samarth Tripathi ; Shrinivas Acharya ; Ranti Dev Sharma ; Sudhanshu Mittal ; Samit Bhattacharya

【Abstract】: Emotion recognition is an important field of research in Brain Computer Interactions. As technology and the understanding of emotions are advancing, there are growing opportunities for automatic emotion recognition systems. Neural networks are a family of statistical learning models inspired by biological neural networks and are used to estimate functions that can depend on a large number of inputs that are generally unknown. In this paper we seek to use this effectiveness of Neural Networks to classify user emotions using EEG signals from the DEAP dataset (Koelstra et al. 2012), which represents the benchmark for Emotion classification research. We explore 2 different Neural Models, a simple Deep Neural Network and a Convolutional Neural Network for classification. Our model provides the state-of-the-art classification accuracy, obtaining 4.51 and 4.96 percentage point improvements over the Rozgic et al. (2013) classification of Valence and Arousal into 2 classes (High and Low) and 13.39 and 6.58 percentage point improvements over the Chung and Yoon (2012) classification of Valence and Arousal into 3 classes (High, Normal and Low). Moreover, our research is a testament that Neural Networks could be robust classifiers for brain signals, even outperforming traditional learning techniques.

【Keywords】: Deep Learning, Emotion Classification, Neural Networks, EEG

658. A Logic Based Approach to Answering Questions about Alternatives in DIY Domains.

Paper Link】 【Pages】:4753-4759

【Authors】: Yi Wang ; Joohyung Lee ; Doo Soon Kim

【Abstract】: Many question answering systems have primarily focused on factoid questions. These systems require the answers to be explicitly stored in a knowledge base (KB) but due to this requirement, they fail to answer many questions for which the answers cannot be pre-formulated. This paper presents a question answering system which aims at answering non-factoid questions in the DIY domain using logic-based reasoning. Specifically, the system uses Answer Set Programming to derive an answer by combining various types of knowledge such as domain and commonsense knowledge. We showcase the system by answering one specific type of question: questions about alternatives. The evaluation result shows that our logic-based reasoning together with the KB (constructed from texts using Information Extraction) significantly improves the user experience.

【Keywords】:

Innovative Applications of Artificial Intelligence Conference: Challenge Problem Papers 2

659. Automated Data Cleansing through Meta-Learning.

Paper Link】 【Pages】:4760-4761

【Authors】: Ian Gemp ; Georgios Theocharous ; Mohammad Ghavamzadeh

【Abstract】: Data preprocessing or cleansing is one of the biggest hurdles in industry for developing successful machine learning applications. The process of data cleansing includes data imputation, feature normalization & selection, dimensionality reduction, and data balancing applications. Currently such preprocessing is manual. One approach for automating this process is meta-learning. In this paper, we experiment with state-of-the-art meta-learning methodologies and identify the inadequacies and research challenges for solving such a problem.

【Keywords】: data cleansing; meta-learning; autoML; metric learning

660. Explainable Agency for Intelligent Autonomous Systems.

Paper Link】 【Pages】:4762-4764

【Authors】: Pat Langley ; Ben Meadows ; Mohan Sridharan ; Dongkyu Choi

【Abstract】: As intelligent agents become more autonomous, sophisticated, and prevalent, it becomes increasingly important that humans interact with them effectively. Machine learning is now used regularly to acquire expertise, but common techniques produce opaque content whose behavior is difficult to interpret. Before they will be trusted by humans, autonomous agents must be able to explain their decisions and the reasoning that produced their choices. We will refer to this general ability as explainable agency. This capacity for explaining decisions is not an academic exercise. When a self-driving vehicle takes an unfamiliar turn, its passenger may desire to know its reasons. When a synthetic ally in a computer game blocks a player's path, he may want to understand its purpose. When an autonomous military robot has abandoned a high-priority goal to pursue another one, its commander may request justification. As robots, vehicles, and synthetic characters become more self-reliant, people will require that they explain their behaviors on demand. The more impressive these agents' abilities, the more essential that we be able to understand them.

【Keywords】: Autonomous agents; Cognitive systems; Explanation

Educational Advances in Artificial Intelligence Symposium Full Papers 9

661. ARTY: Fueling Creativity through Art, Robotics and Technology for Youth.

Paper Link】 【Pages】:4765-4770

【Authors】: Debra T. Burhans ; Karthik Dantu

【Abstract】: ARTY is a week-long program for middle school students to teach them programming of robots and allow them to express themselves artistically. It was started in 2013 and ran its fourth edition in 2016. We describe the ideas behind the inception of this program, its curriculum, our experiences during the 2016 workshop and challenges/future directions for the program. Our primary intent in this paper is to convey the program curriculum and its design, including the way in which robots can be viewed as vehicles for artistic expression. Some results from a brief attitudinal survey that was administered before and after the workshop are also included along with a discussion of outcomes assessment and issues.

【Keywords】: art education; robotics education; stem education

662. Cornhole: A Widely-Accessible AI Robotics Task.

Paper Link】 【Pages】:4771-4774

【Authors】: Nate Derbinsky ; Tyler M. Frasca

【Abstract】: In this paper we present the game of cornhole as a compelling, accessible, and adaptable AI robotics task. Cornhole is a fun and social game with simple rules, but involves strategy and physical training for humans to play competitively; thus, developing a robot that can play at the level of even the average human player presents a multitude of opportunities for curricular integration at a variety of levels. We characterize the AI tasks involved with the game, and present results and resources gained from preliminary offerings.

【Keywords】: Education; Robotics; Artificial Intelligence; Cornhole

663. A Summer Research Experience in Robotics.

Paper Link】 【Pages】:4775-4780

【Authors】: Cindy M. Grimm ; Alicia Lyman-Holt ; William D. Smart

【Abstract】: The Robotics Program at Oregon State University has been running an NSF-funded summer Research Experiences for Undergraduates (REU) site since 2014. Over twenty students per year (on average) have participated in the site, spending ten weeks embedded in the OSU Robotics Program. Our main focus with this REU Site is to give the participants a complete research experience, from problem definition to the final presentation of results, "in miniature". Our secondary educational objectives are: 1) Teach basic non-technical skills needed for graduate work, such as time management and literature review, 2) Provide details on how to apply to graduate school and for funding, 3) Clarify what we look for in a graduate student, and 4) Detail what to expect from the graduate student experience. In this paper, we describe the overall structure of the participants’ summer experience, outline some of the training materials that we use, describe the motivations for our approach, and discuss the lessons that we have learned after running the program for a number of years.

【Keywords】: REU; education

664. Creating Serious Robots That Improve Society.

Paper Link】 【Pages】:4781-4785

【Authors】: Susan P. Imberman ; Jean McManus ; Gina Otts

【Abstract】: The Grace Hopper conference has many lectures/activities for participants. Tech Node presentations at this conference are two hours and focus on encouraging open discussion around a topic. This "not so grand" challenge, originally created for this conference, requires participants to brainstorm a robot creation that could somehow improve society in one of four societal areas: Elder Care (non-medical), Search and Rescue, Environment, and Affordable Home Health Care. This project format also can be used as an unplugged activity for a CS0/CS1 class or as a more advanced project that employs image processing and AI techniques such as machine learning.

【Keywords】: Robotics activity; Unplugged

665. Recovering Concept Prerequisite Relations from University Course Dependencies.

Paper Link】 【Pages】:4786-4791

【Authors】: Chen Liang ; Jianbo Ye ; Zhaohui Wu ; Bart Pursel ; C. Lee Giles

【Abstract】: Prerequisite relations among concepts play an important role in many educational applications such as intelligent tutoring systems and curriculum planning. With the increasing amount of educational data available, automatic discovery of concept prerequisite relations has become both an emerging research opportunity and an open challenge. Here, we investigate how to recover concept prerequisite relations from course dependencies and propose an optimization based framework to address the problem. We create the first real dataset for empirically studying this problem, which consists of the listings of computer science courses from 11 U.S. universities and their concept pairs with prerequisite labels. Experiment results on a synthetic dataset and the real course dataset both show that our method outperforms existing baselines.

【Keywords】: Concept prerequisites; Educational data mining

666. Open-Ended Robotics Exploration Projects for Budding Researchers.

Paper Link】 【Pages】:4792-4797

【Authors】: David R. Musicant ; Abha Laddha ; Tom Choi

【Abstract】: There are many benefits to introducing students to the idea of doing projects where the outcome is unknown or unsure. Some have proposed that engaging students in research can help with retention of underrepresented groups. In this paper, we report on a particular approach we have used to introduce high school students to open-ended robotics projects in a three-week summer program. We describe the structure of our summer program, how we ramp the students up to speed, and we summarize the five open-ended "research" projects that the students work on. These projects can be adopted for open-ended work elsewhere by high school students or undergraduates.

【Keywords】: robotics; education; undergraduate research

667. Dude, Where's My Robot?: A Localization Challenge for Undergraduate Robotics.

Paper Link】 【Pages】:4798-4802

【Authors】: Paul Ruvolo

【Abstract】: I present a robotics localization challenge based on the inexpensive Neato XV robotic vacuum cleaner platform. The challenge teaches skills such as computational modeling, probabilistic inference, efficiency vs. accuracy tradeoffs, debugging, parameter tuning, and benchmarking of algorithmic performance. Rather than allowing students to pursue any localization algorithm of their choosing, here, I propose a challenge structured around the particle filter family of algorithms. This additional scaffolding allows students at all levels to successfully implement one approach to the challenge, while providing enough flexibility and richness to enable students to pursue their own creative ideas. Additionally, I provide infrastructure for automatic evaluation of systems through the collection of ground truth robot location data via ceiling-mounted location tags that are automatically scanned using an upward facing camera attached to the robot. The robot and supporting hardware can be purchased for under $400, and the challenge can even be run without any robots at all using a set of recorded sensor traces.

【Keywords】: Bayesian inference; approximation algorithms; robotics assignments

668. A Monte Carlo Localization Assignment Using a Neato Vacuum with ROS.

Paper Link】 【Pages】:4803-4805

【Authors】: Zuozhi Yang ; Todd W. Neller

【Abstract】: Monte Carlo Localization (MCL) is a sampling-based algorithm for mobile robot localization. In this paper we describe an MCL assignment and its required hardware and software. The Neato vacuum robot and a Raspberry Pi serve as the core of the robot model. The Robot Operating System (ROS) is used as the robot programming environment. Students are expected to learn the localization problem, implement the MCL algorithm, and better understand the kidnapped robot problem and the limitations of MCL by observing the performance of the algorithm in real-time application.

【Keywords】: robotics; localization; Monte Carlo localization; assignments
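
For readers unfamiliar with the algorithm behind papers 667 and 668, one Monte Carlo Localization update can be sketched in a few lines. The sketch assumes a deliberately simplified setting: a robot moving along a one-dimensional corridor, a noisy sensor that reports the robot's distance from the wall at position 0, and Gaussian noise models. The function name `mcl_step` and the noise parameters are hypothetical, and none of this is taken from either assignment's materials, which target a Neato robot with ROS.

```python
import math
import random


def mcl_step(particles, control, measurement, rng, motion_std=0.1, sensor_std=0.5):
    """One predict-weight-resample cycle over 1D position hypotheses."""
    # 1. Motion update: shift every particle by the commanded displacement plus noise.
    moved = [p + control + rng.gauss(0.0, motion_std) for p in particles]
    # 2. Measurement update: weight each particle by the Gaussian likelihood
    #    of the observed distance given that particle's position.
    weights = [math.exp(-0.5 * ((measurement - p) / sensor_std) ** 2) for p in moved]
    if sum(weights) == 0.0:
        # Every particle is implausible (a "kidnapped robot" symptom);
        # fall back to uniform weights instead of failing.
        weights = [1.0] * len(moved)
    # 3. Resampling: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


# Simulated run with a fixed seed: the robot starts at 2.0 and moves 0.5 per step.
rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
true_position = 2.0
for _ in range(10):
    true_position += 0.5
    reading = true_position + rng.gauss(0.0, 0.5)
    particles = mcl_step(particles, 0.5, reading, rng)
estimate = sum(particles) / len(particles)
print(f"true={true_position:.2f} estimate={estimate:.2f}")
```

Because the resampling step concentrates particles around the current belief, a robot that is suddenly moved elsewhere may leave no particle near its true position, which is one way to observe the kidnapped robot problem the abstract mentions.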

669. An Image Wherever You Look! Making Vision Just Another Sensor for AI/Robotics Projects.

Paper Link】 【Pages】:4806-4812

【Authors】: Andy Zhang ; John Lee ; Ciante Jones ; Zachary Dodds

【Abstract】: Visual sensing can be difficult to incorporate into undergraduate robotics and AI assignments. Images, after all, do not provide a direct estimate of the geometric conditions within the field of view. Yet vision is increasingly compelling as a part of undergraduate AI and robotics, given the centrality of pixels in our students' interactions with technology and each other. This paper shares a small-footprint framework designed to make visual sensing as easy to incorporate into AI projects and assignments, e.g., as a source of evidence for localization algorithms, as range sensors. The framework leverages (hand-built) circular panoramas and the image-matching capabilities provided by OpenCV's Python library. An example localization project highlights its pedagogical accessibility and ease of deployment atop low-cost hardware and alongside other sensors.

【Keywords】: Robotics; Computer Vision; Image-based Localization

Educational Advances in Artificial Intelligence Symposium Poster Papers 4

670. Application for AI-OCR Module: Auto Detection of Emails/Letter Images.

Paper Link】 【Pages】:4813-4814

【Authors】: Kelsey Fargas ; Bingjie Zhou ; Elizabeth Staruk ; Sheila Tejada

【Abstract】: The purpose of this project is to provide instructions for teaching the Artificial Intelligence topic of supervised machine learning for the task of Optical Character Recognition (OCR) at various levels of a student’s undergraduate curriculum, such as basic knowledge, novice, and intermediate. The levels vary from beginner with a slight background in computing and computer science to intermediate with a better understanding of computer science fundamentals and algorithms.

【Keywords】: Education; OCR; machine learning

671. Exploring Artificial Intelligence Through Image Recognition.

Paper Link】 【Pages】:4815-4816

【Authors】: Kelsey Fargas ; Bingjie Zhou ; Elizabeth Staruk ; Sheila Tejada

【Abstract】: This demonstration showcases the different use cases of Artificial Intelligence (AI) in education by introducing students to applications of the Scribbler robot with the Fluke board in order to cultivate an interest in programming, robotics, and AI. The targeted audience for this is students aged eight through twelve. This demonstration uses three Scribbler robots to introduce students to common tools in AI (OpenCV and Tesseract), and teach them the basics of coding in an interactive, unintimidating way, by physically describing the goals of simple shape-building algorithms and implementing them using cards with both visual and written representations of the instructions.

【Keywords】: Robots; OCR; computer vision; machine learning; education

672. Online SPARC for Drawing and Animation.

Paper Link】 【Pages】:4817-4818

【Authors】: Elias Marcopoulos ; Maede Rayatidamavandi ; Crisel Suarez ; Yuanlin Zhang

【Abstract】: We developed a method to draw and animate using SPARC, a logic programming system, and an online environment to support this method. In particular, we introduce two predicates: one for drawing and one for animation. By our method, programmers will write a SPARC program, using our introduced predicates, to specify their drawing or animation. The drawing or animation will then be rendered upon executing the program with our system. In fact, our online system provides an environment where the programmers can easily edit and execute their programs.

【Keywords】: Logic Programming; Answer Set Programming; Education

673. AI Projects for Computer Science Capstone Classes (Extended Abstract).

Paper Link】 【Pages】:4819-4821

【Authors】: Matthew E. Taylor ; Sakire Arslan Ay

【Abstract】: Capstone senior design projects provide students with a collaborative software design and development experience to reinforce learned material while allowing students latitude in developing real-world applications. Our two-semester capstone classes are required for all computer science majors. Students must have completed a software engineering course — capstone classes are typically taken during their last two semesters. Project proposals come from a variety of sources, including industry, WSU faculty (from our own and other departments), local agencies, and entrepreneurs. We have recently targeted projects in AI — although students typically have little background, they find the ideas and methods compelling. This paper outlines our instructional approach and reports our experiences with three projects.

【Keywords】: Undergraduate AI Projects; Capstone Course Projects

Educational Advances in Artificial Intelligence Symposium Model AI Assignments 1

674. Model AI Assignments 2017.

Paper Link】 【Pages】:4822-4825

【Authors】: Todd W. Neller ; Joshua Eckroth ; Sravana Reddy ; Joshua Ziegler ; Jason M. Bindewald ; Gilbert L. Peterson ; Thomas Way ; Paula Matuszek ; Lillian N. Cassel ; Mary-Angela Papalaskari ; Carol Weiss ; Ariel Anders ; Sertac Karaman

【Abstract】: The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of six AI assignments from the 2017 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs.

【Keywords】: artificial intelligence; education; assignments

Senior Member Blue Sky 5

675. The AI Rebellion: Changing the Narrative.

Paper Link】 【Pages】:4826-4830

【Authors】: David W. Aha ; Alexandra Coman

【Abstract】: Sci-fi narratives permeating the collective consciousness endow AI Rebellion with ample negative connotations. However, for AI agents, as for humans, attitudes of protest, objection, and rejection have many potential benefits in support of ethics, safety, self-actualization, solidarity, and social justice, and are necessary in a wide variety of contexts. We launch a conversation on constructive AI rebellion and describe a framework meant to support discussion, implementation, and deployment of AI Rebel Agents as protagonists of positive narratives.

【Keywords】: Intelligent Agents; Goal Reasoning; Rebel Agents

676. Moral Decision Making Frameworks for Artificial Intelligence.

Paper Link】 【Pages】:4831-4835

【Authors】: Vincent Conitzer ; Walter Sinnott-Armstrong ; Jana Schaich Borg ; Yuan Deng ; Max Kramer

【Abstract】: The generality of decision and game theory has enabled domain-independent progress in AI research. For example, a better algorithm for finding good policies in (PO)MDPs can be instantly used in a variety of applications. But such a general theory is lacking when it comes to moral decision making. For AI applications with a moral component, are we then forced to build systems based on many ad-hoc rules? In this paper we discuss possible ways to avoid this conclusion.

【Keywords】: moral AI; game theory; machine learning

677. Why Teaching Ethics to AI Practitioners Is Important.

Paper Link】 【Pages】:4836-4840

【Authors】: Judy Goldsmith ; Emanuelle Burton

【Abstract】: We argue that it is crucial to the future of AI that our students be trained in multiple complementary modes of ethical reasoning, so that they may make ethical design and implementation choices, ethical career decisions, and that their software will be programmed to take into account the complexities of acting ethically in the world.

【Keywords】: ethics, teaching, pedagogy, utilitarianism, deontology, virtue ethics

678. Strategic Social Network Analysis.

Paper Link】 【Pages】:4841-4845

【Authors】: Tomasz P. Michalak ; Talal Rahwan ; Michael Wooldridge

【Abstract】: How can individuals and communities protect their privacy against social network analysis tools? How do criminal or terrorist organizations evade detection by such tools? Under which conditions can these tools be made strategy proof? These fundamental questions have attracted little attention in the literature to date, as most social network analysis tools are built around the assumption that individuals or groups in a network do not act strategically to evade such tools. With this in mind, we outline in this paper a new paradigm for social network analysis, whereby the strategic behaviour of network actors is explicitly modeled. Addressing this research challenge has various implications. For instance, it may allow two individuals to keep their relationship secret or private. It may also allow members of an activist group to conceal their membership, or even conceal the existence of their group from authoritarian regimes. Furthermore, it may assist security agencies and counter-terrorism units in understanding the strategies that covert organizations use to escape detection, and give rise to new strategy-proof countermeasures.

【Keywords】: social network analysis; game theory; counter-terrorism

679. Getting More Out of the Exposed Structure in Constraint Programming Models of Combinatorial Problems.

Paper Link】 【Pages】:4846-4851

【Authors】: Gilles Pesant

【Abstract】: To solve combinatorial problems, Constraint Programming builds high-level models that expose much of the structure of the problem. The distinctive driving force of Constraint Programming has been this direct access to problem structure. This has been key to the design of powerful filtering algorithms, but we could do much more. Considering the set of solutions to each constraint as a multivariate discrete distribution opens the door to more structure-revealing computations that may significantly change this solving paradigm. As a result we could improve our ability to solve combinatorial problems and our understanding of the structure of practical problems.

【Keywords】:

Senior Member Summary Talks 7

680. A Selected Summary of AI for Computational Sustainability.

Paper Link】 【Pages】:4852-4857

【Authors】: Douglas H. Fisher

【Abstract】: This paper and summary talk broadly survey computational sustainability research. Rather than a detailed treatment of the research projects in the area, which is beyond the scope of the paper and talk, the paper includes a meta-survey, pointing to edited collections and overviews in the literature for the interested reader. Computational sustainability research has been broadly characterized by AI methods employed, sustainability areas addressed, and contributions made to (typically, human) decision-making. The paper addresses these characterizations as well, which will facilitate a deeper synthesis later, to include the potential for developing sophisticated and holistic AI decision-making and advisory agents.

【Keywords】:

681. Explaining Ourselves: Human-Aware Constraint Reasoning.

Paper Link】 【Pages】:4858-4862

【Authors】: Eugene C. Freuder

【Abstract】: Human-aware AI is increasingly important as AI becomes more powerful and ubiquitous. A good foundation for human-awareness should enable ourselves and our "AIs" to "explain ourselves" naturally to each other. Constraint reasoning offers particular opportunities and challenges in this regard. This paper takes note of the history of work in this area and encourages increased attention, laying out a rough research agenda.

【Keywords】: constraint programming; constraint satisfaction; explanation; human-aware artificial intelligence

682. Multi-Robot Allocation of Tasks with Temporal and Ordering Constraints.

Paper Link】 【Pages】:4863-4869

【Authors】: Maria L. Gini

【Abstract】: Task allocation is ubiquitous in computer science and robotics, yet some problems have received limited attention in the computer science and AI community. Specifically, we will focus on multi-robot task allocation problems when tasks have time windows or ordering constraints. We will outline the main lines of research and open problems.

【Keywords】: multi-robot coordination; task allocation; temporal constraints

683. Progress and Challenges in Research on Cognitive Architectures.

Paper Link】 【Pages】:4870-4876

【Authors】: Pat Langley

【Abstract】: Research on cognitive architectures attempts to develop unified theories of the mind. This paradigm incorporates many ideas from other parts of AI, but it differs enough in its aims and methods that it merits separate treatment. In this paper, we review the notion of cognitive architectures and some recurring themes in their study. Next we examine the substantial progress made by the subfield over the past 40 years, after which we turn to some topics that have received little attention and that pose challenges for the research community.

【Keywords】: Cognitive architectures; Unified theories

684. Machine Learning for Entity Coreference Resolution: A Retrospective Look at Two Decades of Research.

Paper Link】 【Pages】:4877-4884

【Authors】: Vincent Ng

【Abstract】: Though extensively investigated since the 1960s, entity coreference resolution, a core task in natural language understanding, is far from being solved. Nevertheless, significant progress has been made on learning-based coreference research since its inception two decades ago. This paper provides an overview of the major milestones made in learning-based coreference research and discusses a hard entity coreference task, the Winograd Schema Challenge, which has recently received a lot of attention in the AI community.

【Keywords】: natural language processing; text mining; coreference resolution

685. Incidental Supervision: Moving beyond Supervised Learning.

Paper Link】 【Pages】:4885-4890

【Authors】: Dan Roth

【Abstract】: Machine Learning and Inference methods have become ubiquitous in our attempt to induce more abstract representations of natural language text, visual scenes, and other messy, naturally occurring data, and support decisions that depend on it. However, learning models for these tasks is difficult partly because generating the necessary supervision signals for it is costly and does not scale. This paper describes several learning paradigms that are designed to alleviate the supervision bottleneck. It will illustrate their benefit in the context of multiple problems, all pertaining to inducing various levels of semantic representations from text. In particular, we discuss (i) Response Driven Learning of models, a learning protocol that supports inducing meaning representations simply by observing the model's behavior in its environment, (ii) the exploitation of Incidental Supervision signals that exist in the data, independently of the task at hand, to learn models that identify and classify semantic predicates, and (iii) the use of weak supervision to combine simple models to support global decisions where joint supervision is not available.

【Keywords】:

686. Latent Tree Analysis.

Paper Link】 【Pages】:4891-4898

【Authors】: Nevin L. Zhang ; Leonard K. M. Poon

【Abstract】: Latent tree analysis seeks to model the correlations among a set of random variables using a tree of latent variables. It was proposed as an improvement to latent class analysis, a method widely used in social sciences and medicine to identify homogeneous subgroups in a population. It provides new and fruitful perspectives on a number of machine learning areas, including cluster analysis, topic detection, and deep probabilistic modeling. This paper gives an overview of the research on latent tree analysis and various ways it is used in practice.

【Keywords】: Latent tree models; clustering; topic detection

Student Abstracts 64

687. Improving Performance of Analogue Readout Layers for Photonic Reservoir Computers with Online Learning.

Paper Link】 【Pages】:4899-4900

【Authors】: Piotr Antonik ; Marc Haelterman ; Serge Massar

【Abstract】: Reservoir Computing is a bio-inspired computing paradigm for processing time-dependent signals (Jaeger and Haas 2004; Maass, Natschläger, and Markram 2002). The performance of its hardware implementation (see e.g. (Soriano et al. 2015) for a review) is comparable to state-of-the-art digital algorithms on a series of benchmark tasks. The major bottleneck of these implementations is the readout layer, based on slow offline post-processing. Several analogue solutions have been proposed (Smerieri et al. 2012; Duport et al. 2016; Vinckier et al. 2016), but all suffered from a noticeable decrease in performance due to the added complexity of the setup. Here we propose the online learning approach to solve these issues. We present an experimental reservoir computer with a simple analogue readout layer, based on previous works, and show numerically that online learning makes it possible to disregard the added complexity of an analogue layer and obtain the same level of performance as with a digital layer. This work thus demonstrates that online training allows building high-performance fully-analogue reservoir computers, and represents an important step towards experimental validation of the proposed solution.

【Keywords】: Reservoir computing; neuromorphic hardware; opto-electronics; analogue readout; online learning

688. Chaotic Time Series Prediction Using a Photonic Reservoir Computer with Output Feedback.

Paper Link】 【Pages】:4901-4902

【Authors】: Piotr Antonik ; Michiel Hermans ; Marc Haelterman ; Serge Massar

【Abstract】: Reservoir Computing is a bio-inspired computing paradigm for processing time dependent signals (Jaeger and Haas 2004; Maass, Natschläger, and Markram 2002). It can be easily implemented in hardware. The performance of these analogue devices matches digital algorithms on a series of benchmark tasks (see e.g. (Soriano et al. 2015) for a review). Their capacities could be extended by feeding the output signal back into the reservoir, which would allow them to be applied to various signal generation tasks (Antonik et al. 2016b). In practice, this requires a high-speed readout layer for real-time output computation. Here we achieve this by means of a field-programmable gate array (FPGA), and demonstrate the first photonic reservoir computer with output feedback. We test our setup on the Mackey-Glass chaotic time series generation task and obtain interesting prediction horizons, comparable to numerical simulations, with ample room for further improvement. Our work thus demonstrates the potential offered by the output feedback and opens a new area of novel applications for photonic reservoir computing.

【Keywords】: Reservoir computing; neuromorphic hardware; opto-electronics; chaos emulation; time series prediction

Paper Link】 【Pages】:4903-4904

【Authors】: Masataro Asai ; Alex Fukunaga

【Abstract】: Recent enhancements to greedy best-first search (GBFS) improve performance by occasionally adopting a non-greedy node expansion policy, resulting in more exploratory behavior. However, previous exploratory mechanisms do not address exploration within the space sharing the same heuristic estimate (plateau) and the search bias in a breadth direction. In this abstract, we briefly describe two modes of exploration (diversification), which work inter-(across) and intra-(within) plateau, and also introduce IP-diversification, a method combining Minimum Spanning Tree and randomization, which addresses “breadth”-bias instead of the “depth”-bias addressed by the existing methods.

【Keywords】: Planning; Heuristic Search; Diversified Search; Exploration

690. Frame-Based Ontology Alignment.

Paper Link】 【Pages】:4905-4906

【Authors】: Luigi Asprino ; Valentina Presutti ; Aldo Gangemi ; Paolo Ciancarini

【Abstract】: The need of handling semantic heterogeneity of resources is a key problem of the Semantic Web. State of the art techniques for ontology matching are the key technology for addressing this issue. However, they only partially exploit the natural language descriptions of ontology entities and they are mostly unable to find correspondences between entities having different logical types (e.g. mapping properties to classes). We introduce a novel approach aimed at finding correspondences between ontology entities according to the intensional meaning of their models, hence abstracting from their logical types. Lexical linked open data and frame semantics play a crucial role in this proposal. We argue that this approach may lead to a step ahead in the state of the art of ontology matching, and positively affect related applications such as question answering and knowledge reconciliation.

【Keywords】:

691. Learning Options in Multiobjective Reinforcement Learning.

Paper Link】 【Pages】:4907-4908

【Authors】: Rodrigo Cesar Bonini ; Felipe Leno da Silva ; Anna Helena Reali Costa

【Abstract】: Reinforcement Learning (RL) is a successful technique to train autonomous agents. However, the classical RL methods take a long time to learn how to solve tasks. Option-based solutions can be used to accelerate learning and transfer learned behaviors across tasks by encapsulating a partial policy into an action. However, the literature reports only single-agent and single-objective option-based methods, but many RL tasks, especially real-world problems, are better described through multiple objectives. We here propose a method to learn options in Multiobjective Reinforcement Learning domains in order to accelerate learning and reuse knowledge across tasks. Our initial experiments in the Goldmine Domain show that our proposal learns useful options that accelerate learning in multiobjective domains. Our next steps are to use the learned options to transfer knowledge across tasks and evaluate this method with stochastic policies.

【Keywords】: Reinforcement Learning; Options; Multiobjective Reinforcement Learning

692. Towards User Personality Profiling from Multiple Social Networks.

Paper Link】 【Pages】:4909-4910

【Authors】: Kseniya Buraya ; Aleksandr Farseev ; Andrey Filchenkov ; Tat-Seng Chua

【Abstract】: The exponential growth of online social networks has inspired us to tackle the problem of individual user attributes inference from the Big Data perspective. It is well known that various social media networks exhibit different aspects of user interactions, and thus represent users from diverse points of view. In this preliminary study, we make the first step towards solving the significant problem of personality profiling from multiple social networks. Specifically, we tackle the task of relationship prediction, which is closely related to our desired problem. Experimental results show that the incorporation of multi-source data helps to achieve better prediction performance as compared to single-source baselines.

【Keywords】: social networks; user profiling; user personality

693. Semantic Inference of Bird Songs Using Dynamic Bayesian Networks.

Paper Link】 【Pages】:4911-4912

【Authors】: Keisuke Daimon ; Richard W. Hedley ; Charles E. Taylor

【Abstract】: Knowledge representation and natural language processing are core interests to the field of artificial intelligence (AI). While most research has been directed toward machines and humans, the principles and methods developed for AI might be extended to other species as well. Birds frequently behave in a manner that is intelligent and convey information in their vocalizations that is meaningful to others. In this paper we report on a method combining clustering and dynamic Bayesian networks to describe the semantics of songs among Cassin’s Vireos (Vireo cassinii), and show how behavioral contexts possibly affect bird song output.

【Keywords】: Dynamic Bayesian network; Semantic inference; Cassin's Vireo; Bird song

694. An Advising Framework for Multiagent Reinforcement Learning Systems.

Paper Link】 【Pages】:4913-4914

【Authors】: Felipe Leno da Silva ; Ruben Glatt ; Anna Helena Reali Costa

【Abstract】: Reinforcement Learning has long been employed to solve sequential decision-making problems with minimal input data. However, the classical approach requires a long time to learn a suitable policy, especially in Multiagent Systems. The teacher-student framework proposes to mitigate this problem by integrating an advising procedure in the learning process, in which an experienced agent (human or not) can advise a student to guide her exploration. However, the teacher is assumed to be an expert in the learning task. We here propose an advising framework where multiple agents advise each other while learning in a shared environment, and the advisor is not expected to necessarily act optimally. Our experiments in a simulated Robot Soccer environment show that the learning process is improved by incorporating this kind of advice.

【Keywords】: Markov Decision Processes; Multiagent Learning; Multiagent Systems; Reinforcement Learning

695. Android Malware Detection with Weak Ground Truth Data.

Paper Link】 【Pages】:4915-4916

【Authors】: Jordan DeLoach ; Doina Caragea ; Xinming Ou

【Abstract】: For Android malware detection, precise ground truth is a rare commodity. As security knowledge evolves, what may be considered ground truth at one moment in time may change, and apps once considered benign may turn out to be malicious. The inevitable noise in data labels poses a challenge to inferring effective machine learning classifiers. Our work is focused on approaches for learning classifiers for Android malware detection in a manner that is methodologically sound with regard to the uncertain and ever-changing ground truth in the problem space. We leverage the fact that although data labels are unavoidably noisy, a malware label is much more precise than a benign label. While you can be confident that an app is malicious, you can never be certain that a benign app is really benign, or just undetected malware. Based on this insight, we leverage a modified Logistic Regression classifier that allows us to learn from only positive and unlabeled data, without making any assumptions about benign labels. We find Label Regularized Logistic Regression to perform well for noisy app datasets, as well as datasets where there is a limited amount of positive labeled data, both of which are representative of real-world situations.

【Keywords】: Android Malware; Malware Detection; Semi-supervised Learning

696. Discovering Conversational Dependencies between Messages in Dialogs.

Paper Link】 【Pages】:4917-4918

【Authors】: Wenchao Du ; Pascal Poupart ; Wei Xu

【Abstract】: We investigate the task of inferring conversational dependencies between messages in one-on-one online chat, which has become one of the most popular forms of customer service. We propose a novel probabilistic classifier that leverages conversational, lexical and semantic information. The approach is evaluated empirically on a set of customer service chat logs from a Chinese e-commerce website. It outperforms heuristic baselines.

【Keywords】: NLP; Discourse; Dialog

697. Coordinating Human and Agent Behavior in Collective-Risk Scenarios.

Paper Link】 【Pages】:4919-4920

【Authors】: Elias Fernández Domingos ; Juan-Carlos Burguillo ; Ann Nowé ; Tom Lenaerts

【Abstract】: Various social situations entail a collective risk. A well-known example is climate change, wherein the risk of a future environmental disaster clashes with the immediate economic interest of developed and developing countries. The collective-risk game operationalizes this kind of situation. The decision process of the participants is determined by how good they are at evaluating the probability of future risk as well as their ability to anticipate the actions of the opponents. Anticipatory behavior contrasts with the reactive theories often used to analyze social dilemmas. Our initial work can already show that anticipative agents are a better model of human behavior than reactive ones. All the agents we studied used a recurrent neural network; however, only the ones that used it to predict future outcomes (anticipative agents) were able to account for changes in the context of games, a behavior also observed in experiments with humans. This extended abstract aims to explain how we wish to investigate anticipation within the context of the collective-risk game and the relevance these results may have for the field of hybrid socio-technical systems.

【Keywords】: human behavior; collective-risk game; recurrent neural network; anticipative agent; reactive agent; hybrid systems

698. The Complexity of Succinct Elections.

Paper Link】 【Pages】:4921-4922

【Authors】: Zack Fitzsimmons ; Edith Hemaspaandra

【Abstract】: The computational study of elections generally assumes that the preferences of the electorate come in as a list of votes. Depending on the context, it may be much more natural to represent the preferences of the electorate succinctly, as the distinct votes and their counts. Though the succinct representation may be exponentially smaller than the nonsuccinct, we find only one natural case where the complexity increases, in sharp contrast to the case where each voter has a weight, where the complexity usually increases.

【Keywords】: elections; complexity; manipulative actions; succinct representation
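
To make the two input representations concrete, here is a small sketch that converts a list of votes into a succinct form (distinct votes with counts) and computes plurality scores directly from it. The function names and the choice of plurality as the voting rule are assumptions for illustration only; the abstract does not tie its results to this rule.

```python
from collections import Counter


def to_succinct(votes):
    """Collapse a list of rankings into {ranking: number of voters}."""
    return Counter(tuple(vote) for vote in votes)


def plurality_scores(succinct):
    """Score each candidate by how many voters rank it first."""
    scores = Counter()
    for ranking, count in succinct.items():
        scores[ranking[0]] += count
    return scores


# Five voters, but only two distinct votes.
votes = [["a", "b", "c"]] * 3 + [["b", "a", "c"]] * 2
succinct = to_succinct(votes)
scores = plurality_scores(succinct)
```

The succinct form stores each count as a number, so its size grows with the logarithm of the number of voters rather than linearly, which is why it can be exponentially smaller than the vote list.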

699. A Position-Biased PageRank Algorithm for Keyphrase Extraction.

Paper Link】 【Pages】:4923-4924

【Authors】: Corina Florescu ; Cornelia Caragea

【Abstract】: Given the large amounts of online textual documents available these days, e.g., news articles and scientific papers, effective methods for extracting keyphrases, which provide a high-level topic description of a document, are greatly needed. We propose PositionRank, an unsupervised graph-based approach to keyphrase extraction that incorporates information from all positions of a word's occurrences into a biased PageRank to extract keyphrases. Our model obtains remarkable improvements in performance over strong baselines.

【Keywords】: keyphrase extraction; biased-PageRank; position information
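
The idea of biasing PageRank by word position can be sketched as follows. This is a simplified illustration and not the authors' implementation: the inverse-position weighting, the co-occurrence window, the damping factor, and the function name `position_biased_pagerank` are assumptions, since the abstract does not specify them, and the candidate filtering and phrase-forming steps of a full keyphrase extractor are omitted.

```python
from collections import defaultdict


def position_biased_pagerank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with a PageRank whose restart vector favors early words."""
    # Bias: each occurrence at position pos (1-based) contributes 1/pos.
    bias = defaultdict(float)
    for pos, word in enumerate(tokens, start=1):
        bias[word] += 1.0 / pos
    total = sum(bias.values())
    bias = {word: value / total for word, value in bias.items()}

    # Undirected co-occurrence graph over a sliding window.
    edges = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                edges[word][other] += 1.0
                edges[other][word] += 1.0
    out_weight = {word: sum(nbrs.values()) for word, nbrs in edges.items()}

    # Power iteration with the position bias as the restart distribution.
    scores = dict(bias)
    for _ in range(iterations):
        updated = {}
        for word in bias:
            incoming = sum(
                scores[nbr] * weight / out_weight[nbr]
                for nbr, weight in edges[word].items()
            )
            updated[word] = (1.0 - damping) * bias[word] + damping * incoming
        scores = updated
    return scores


tokens = ["graph", "ranking", "graph", "keyphrase", "extraction", "ranking"]
scores = position_biased_pagerank(tokens)
```

In this toy input, "graph" appears first and twice, so it receives a larger restart weight than "extraction", which appears once near the end.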

700. Robust Stable Marriage.

Paper Link】 【Pages】:4925-4926

【Authors】: Begum Genc ; Mohamed Siala ; Barry O'Sullivan ; Gilles Simonin

【Abstract】: Stable Marriage (SM) is a well-known matching problem, where the aim is to match a set of men and women. The resulting matching must satisfy two properties: there is no unassigned person and there are no other assignments where two people of opposite gender prefer each other to their current assignments. We propose a new version of SM called Robust Stable Marriage (RSM) by combining stability and robustness. We define robustness by introducing (a,b)-supermatches, which are inspired by (a,b)-supermodels. An (a,b)-supermatch is a stable matching, where if at most a pairs want to break up, it is possible to find another stable matching by breaking at most b other pairs.

【Keywords】: Stable Marriage, Matching Under Preferences, Optimisation
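
The stability condition described above can be checked directly by searching for a blocking pair. The sketch below covers only that check, on a hypothetical two-by-two instance. It does not compute (a,b)-supermatches, and the function name and data layout are assumptions for illustration.

```python
def is_stable(matching, men_prefs, women_prefs):
    """Return True if no man and woman prefer each other to their partners.

    matching maps each man to his partner; preference lists are ordered
    from most to least preferred and are assumed to be complete.
    """
    husband = {woman: man for man, woman in matching.items()}
    for man, prefs in men_prefs.items():
        current = matching[man]
        # Women this man strictly prefers to his current partner.
        for woman in prefs[: prefs.index(current)]:
            rival = husband[woman]
            ranking = women_prefs[woman]
            if ranking.index(man) < ranking.index(rival):
                return False  # (man, woman) is a blocking pair
    return True


men_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women_prefs = {"w1": ["m1", "m2"], "w2": ["m1", "m2"]}
stable = is_stable({"m1": "w1", "m2": "w2"}, men_prefs, women_prefs)
unstable = is_stable({"m1": "w2", "m2": "w1"}, men_prefs, women_prefs)
```

In the second matching, m1 and w1 each prefer the other to their assigned partner, so the check reports it as unstable.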

701. Handwriting Profiling Using Generative Adversarial Networks.

Paper Link】 【Pages】:4927-4928

【Authors】: Arna Ghosh ; Biswarup Bhattacharya ; Somnath Basu Roy Chowdhury

【Abstract】: Handwriting is a skill learned by humans from a very early age. The ability to develop one’s own unique handwriting as well as mimic another person’s handwriting is a task learned by the brain with practice. This paper deals with this very problem where an intelligent system tries to learn the handwriting of an entity using Generative Adversarial Networks (GANs). We propose a modified architecture of DCGAN (Radford, Metz, and Chintala 2015) to achieve this. We also discuss applying reinforcement learning techniques to achieve faster learning. Our algorithm hopes to give new insights in this area and its uses include identification of forged documents, signature verification, computer generated art, digitization of documents among others. Our early implementation of the algorithm illustrates a good performance with MNIST datasets.

【Keywords】: generative adversarial networks; handwriting generation; reinforcement learning; artificial vision

702. Policy Reuse in Deep Reinforcement Learning.

Paper Link】 【Pages】:4929-4930

【Authors】: Ruben Glatt ; Anna Helena Reali Costa

【Abstract】: Driven by recent developments in Artificial Intelligence research, a promising new technology for building intelligent agents has evolved. The approach is termed Deep Reinforcement Learning and combines the classic field of Reinforcement Learning (RL) with the representational power of modern Deep Learning approaches. It is very well suited for single task learning but needs a long time to learn any new task. To speed up this process, we propose to extend the concept to multi-task learning by adapting Policy Reuse, a Transfer Learning approach from classic RL, to use with Deep Q-Networks.

【Keywords】: Reinforcement Learning; Deep Learning; Transfer Learning; Artificial Intelligence

703. Grounded Action Transformation for Robot Learning in Simulation.

Paper Link】 【Pages】:4931-4932

【Authors】: Josiah P. Hanna ; Peter Stone

【Abstract】: Robot learning in simulation is a promising alternative to the prohibitive sample cost of learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the physical robot. This paper proposes a new algorithm for learning in simulation, Grounded Action Transformation, and applies it to learning of humanoid bipedal locomotion. Our approach results in a 43.27% improvement in forward walk velocity compared to a state-of-the-art hand-coded walk.

【Keywords】: Grounded simulation learning; robot learning; simulation

704. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation.

Paper Link】 【Pages】:4933-4934

【Authors】: Josiah P. Hanna ; Peter Stone ; Scott Niekum

【Abstract】: In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data. We empirically evaluate the proposed methods in standard policy evaluation tasks.

【Keywords】: high confidence off-policy evaluation; model-based reinforcement learning; bootstrapping
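
The idea of a bootstrapped lower confidence bound can be illustrated with a minimal sketch. This is a plain percentile bootstrap over per-trajectory return estimates, not the model-based methods proposed in the paper; the function name, parameters, and example returns are hypothetical.

```python
import random
from statistics import mean


def bootstrap_lower_bound(returns, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile-bootstrap lower confidence bound on the mean of `returns`."""
    rng = random.Random(seed)
    n = len(returns)
    # Resample the returns with replacement and record each resample's mean.
    resampled_means = sorted(
        mean(rng.choices(returns, k=n)) for _ in range(n_resamples)
    )
    # The (1 - confidence) quantile of the resampled means is the lower bound.
    index = int((1.0 - confidence) * n_resamples)
    return resampled_means[index]


# Hypothetical return estimates for the policy being evaluated.
example_returns = [1.0, 0.8, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0]
lower = bootstrap_lower_bound(example_returns)
```

With few trajectories the bound is loose, which is the limited-data setting the abstract targets.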

705. Fast Electrical Demand Optimization Under Real-Time Pricing.

Paper Link】 【Pages】:4935-4936

【Authors】: Shan He ; Mark Wallace ; Campbell Wilson ; Ariel Liebman

【Abstract】: The introduction of smart meters has motivated the electricity industry to manage electrical demand, using dynamic pricing schemes such as real-time pricing. The overall aim of demand management is to minimize electricity generation and distribution costs while meeting the demands and preferences of consumers. However, rapidly scheduling consumption of large groups of households is a challenge. In this paper, we present a highly scalable approach to find the optimal consumption levels for households in an iterative and distributed manner. The complexity of this approach is independent of the number of households, which allows it to be applied to problems with large groups of households. Moreover, the intermediate results of this approach can be used by smart meters to schedule tasks with a simple randomized method.

【Keywords】:

706. SReN: Shape Regression Network for Comic Storyboard Extraction.

Paper Link】 【Pages】:4937-4938

【Authors】: Zheqi He ; Yafeng Zhou ; Yongtao Wang ; Zhi Tang

【Abstract】: The goal of storyboard extraction is to decompose a comic image into several storyboards (or frames), which is the fundamental step in comic image understanding and in producing digital comic documents suitable for mobile reading. Most existing approaches are based on hand-crafted low-level visual patterns such as edge segments and line segments, which do not capture high-level vision. To overcome the shortcomings of the existing approaches, we propose a novel architecture based on a deep convolutional neural network, namely the Shape Regression Network (SReN), to detect storyboards within comic images. First, we use Fast R-CNN to generate rectangular bounding boxes as storyboard proposals. Then we train a deep neural network to predict quadrangles for these proposals. Unlike existing object detection methods, which only output rectangular bounding boxes, SReN can produce more precise quadrangular bounding boxes. Experimental results, evaluated on 7382 comic pages, demonstrate that SReN outperforms the state-of-the-art methods by more than 10% in terms of F1-score and page correction rate.

【Keywords】: Regression CNN; Comic; Deep learning

Paper Link】 【Pages】:4939-4940

【Authors】: Xingchang Huang ; Yanghui Rao ; Haoran Xie ; Tak-Lam Wong ; Fu Lee Wang

【Abstract】: Cross-domain sentiment classification aims to tag sentiments for a target domain using labeled data from a source domain. Due to the difference between domains, the accuracy of a trained classifier may be very low. In this paper, we propose a boosting-based learning framework named TR-TrAdaBoost for cross-domain sentiment classification. We first explore the topic distribution of documents, and then combine it with the unigram TrAdaBoost. The topic distribution captures the domain information of documents, which is valuable for cross-domain sentiment classification. Experimental results indicate that TR-TrAdaBoost represents documents well and boosts the performance and robustness of TrAdaBoost.

【Keywords】: Sentiment Classification; Transfer Learning; Topic Modeling

708. A Deep Learning Approach for Arabic Caption Generation Using Roots-Words.

Paper Link】 【Pages】:4941-4942

【Authors】: Vasu Jindal

【Abstract】: Automatic caption generation is a key research field in the machine learning community. However, most of the current research is performed on English caption generation, ignoring other languages like Arabic and Persian. In this paper, we propose a novel technique leveraging the heavy influence of root words in Arabic to automatically generate captions in Arabic. Fragments of the images are associated with root words, and a deep belief network pre-trained using Restricted Boltzmann Machines is used to extract words associated with the image. Finally, dependency tree relations are used to generate sentence captions by using the dependency on root words. Our approach is robust and attains a BLEU-1 score of 34.8.

【Keywords】: computer vision, machine learning, deep learning, image caption

709. Learning to Avoid Dominated Action Sequences in Planning for Black-Box Domains.

Paper Link】 【Pages】:4943-4944

【Authors】: Yuu Jinnai ; Alex Fukunaga

【Abstract】: Black-box domains where the successor states generated by applying an action are generated by a completely opaque simulator pose a challenge for domain-independent planning. The main computational bottleneck in search-based planning for such domains is the number of calls to the black-box simulation. We propose a method for significantly reducing the number of calls to the simulator by the search algorithm by detecting and pruning sequences of actions which are dominated by others. We apply our pruning method to Iterated Width and breadth-first search in domain-independent black-box planning for Atari 2600 games; adding our pruning method significantly improves upon the baseline algorithms.

【Keywords】: Black-box Planning; Online Search; Arcade Learning Environment

710. Kernelized Evolutionary Distance Metric Learning for Semi-Supervised Clustering.

Paper Link】 【Pages】:4945-4946

【Authors】: Wasin Kalintha ; Satoshi Ono ; Masayuki Numao ; Ken-ichi Fukui

【Abstract】: Many research studies on distance metric learning (DML) reiterate that the definition of distance between two data points substantially affects clustering tasks. Recently, a variety of DML methods have been proposed to improve the accuracy of clustering by learning a distance metric; however, most of them only perform a linear transformation, which yields insignificant improvement on non-linearly separable data. This study proposes a DML method that integrates a kernelization technique with Mahalanobis-based DML, so that a non-linear transformation of the distance metric can be performed. Moreover, a cluster validity index is optimized by an evolutionary algorithm. The empirical results on semi-supervised clustering show promising results on both synthetic and real-world data sets.

【Keywords】: Kernelized; Distance Metric Learning; Semi-supervised clustering; Differential evolution

711. Redesigning Stochastic Environments for Maximized Utility.

Paper Link】 【Pages】:4947-4948

【Authors】: Sarah Keren ; Avigdor Gal ; Erez Karpas ; Luis Enrique Pineda ; Shlomo Zilberstein

【Abstract】: We present the Utility Maximizing Design (UMD) model for optimally redesigning stochastic environments to achieve maximized performance. This model is well suited to contemporary applications that involve the design of environments where robots and humans co-exist and co-operate, e.g., a vacuum cleaning robot. We discuss two special cases of the UMD model. The first is the equi-reward UMD (ER-UMD), in which the agents and the system share a utility function, such as for the vacuum cleaning robot. The second is the goal recognition design (GRD) setting, discussed in the literature, in which system and agent utilities are independent. To find the set of optimal modifications to apply to a UMD model, we propose the use of heuristic search, extending previous methods used for GRD settings. After specifying the conditions for optimality in the general case, we present an admissible heuristic for the ER-UMD case. We also present a novel compilation that embeds the redesign process into a planning problem, allowing use of any off-the-shelf solver to find the best way to modify an environment when a design budget is specified. Our evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition.

【Keywords】: Probabilistic planning; Markov Decision Process; Goal recognition; Compilation to planning

712. Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes.

Paper Link】 【Pages】:4949-4950

【Authors】: Taylor W. Killian ; George Konidaris ; Finale Doshi-Velez

【Abstract】: An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Gaussian Process latent variable model to jointly model the dynamics and the embedding, leading to a more elegant formulation, one that allows for better uncertainty quantification and thus more robust transfer.

【Keywords】: Reinforcement Learning; Transfer Learning; Latent Variable Models; Gaussian Process Dynamical Model

713. Wikitop: Using Wikipedia Category Network to Generate Topic Trees.

Paper Link】 【Pages】:4951-4952

【Authors】: Saravana Kumar ; Prasath Rengarajan ; Arockia Xavier Annie

【Abstract】: Automated topic identification is an essential component in various information retrieval and knowledge representation tasks such as automated summary generation, categorization search and document indexing. In this paper, we present the Wikitop system to automatically generate topic trees from the input text by performing hierarchical classification using the Wikipedia Category Network (WCN). Our preliminary results over a collection of 125 articles are encouraging and show potential of a robust methodology for automated topic tree generation.

【Keywords】: text categorization; topic identification; SVM classifiers

714. Predicting Mortality of Intensive Care Patients via Learning about Hazard.

Paper Link】 【Pages】:4953-4954

【Authors】: Dae Hyun Lee ; Eric Horvitz

【Abstract】: Patients in intensive care units (ICU) are acutely ill and have the highest mortality rates for hospitalized patients. Predictive models and planning systems could forecast and guide interventions to prevent the hazardous deterioration of patients’ physiologies, thereby giving the opportunity of employing machine learning and inference to assist with the care of ICU patients. We report on the construction of a prediction pipeline that estimates the probability of death by inferring rates of hazard over time, based on patients’ physiological measurements. The inferred model provided the contribution of each variable and information about the influence of sets of observations on the overall risks and expected trajectories of patients.

【Keywords】: Critical Care; Predictive modeling; Risk Trajectory; healthcare and medicine; Patient Mortality

Paper Link】 【Pages】:4955-4956

【Authors】: Xiaoming Li ; Hui Fang ; Jie Zhang

【Abstract】: We rethink the link prediction problem in signed social networks by also considering "no-relation" as a future status of a node pair, rather than simply distinguishing positive and negative links, as proposed in the literature. To understand the underlying mechanism of link formation in signed networks, we propose a feature framework on the basis of a thorough exploration of potential features for the newly identified problem. Grounded on the framework, we also design a trinary classification model, and experimental results show that our method outperforms the state-of-the-art approaches.

【Keywords】:

716. ATSUM: Extracting Attractive Summaries for News Propagation on Microblogs.

Paper Link】 【Pages】:4957-4958

【Authors】: Fang Liu ; Xiaojun Wan

【Abstract】: In this paper, we investigate how to automatically extract attractive summaries for news propagation on microblogs and propose a novel system called ATSUM to achieve this goal via text attractiveness analysis. It first analyzes the sentences in a news article and automatically predicts the attractiveness score of each sentence by using the support vector regression method. The predicted attractiveness scores are then incorporated into a summarization system. Experimental results on a manually labeled dataset verify the effectiveness of the proposed methods.

【Keywords】: document summarization; ATSUM

717. A Systematic Practice of Judging the Success of a Robotic Grasp Using Convolutional Neural Network.

Paper Link】 【Pages】:4959-4960

【Authors】: Hengshuang Liu ; Pengcheng Ai ; Junling Chen

【Abstract】: In this abstract, we present a novel method using the deep convolutional neural network combined with traditional mechanical control techniques to solve the problem of determining whether a robotic grasp is successful or not. To finish the task, we construct a data acquisition platform capable of robot arm grasping and photo capturing, and collect a diversity of pictures by adjusting the shape and posture of the objects and controlling the robot arm to move randomly. For the purpose of validating the generalization capability, we adopt a stochastic sampling method based on cross validation to test our model. The experiment shows that, with an increasing number of shapes of objects involved in training, the network can identify new samples in a more accurate and steadier way. The accuracy rises from 89.2% when we use only one category of shape for training to above 99.7% when we use 17 categories for training.

【Keywords】: Robotic grasping; Convolutional neural network; Generalization capability

718. Neuron Learning Machine for Representation Learning.

Paper Link】 【Pages】:4961-4962

【Authors】: Jia Liu ; Maoguo Gong ; Qiguang Miao

【Abstract】: This paper presents a novel neuron learning machine (NLM) which can extract hierarchical features from data. We focus on the single-layer neural network architecture and propose to model the network based on the Hebbian learning rule. The Hebbian learning rule describes how a synaptic weight changes with the activations of presynaptic and postsynaptic neurons. We model the learning rule as the objective function by considering the simplicity of the network and stability of solutions. We make a hypothesis and introduce a correlation-based constraint according to the hypothesis. We find that this biologically inspired model has the ability to learn useful features from the perspective of retaining abstract information. NLM can also be stacked to learn hierarchical features and reformulated into a convolutional version to extract features from 2-dimensional data.

【Keywords】: computational intelligence; representation learning

719. Community-Based Question Answering via Contextual Ranking Metric Network Learning.

Paper Link】 【Pages】:4963-4964

【Authors】: Hanqing Lu ; Ming Kong

【Abstract】: The exponential growth of information on Community-based Question Answering (CQA) sites has raised challenges for the accurate matching of high-quality answers to the given questions. Many existing approaches learn the matching model mainly based on the semantic similarity between questions and answers, which cannot effectively handle the ambiguity problem of questions and the sparsity problem of CQA data. In this paper, we propose to solve these two problems by exploiting users' social contexts. Specifically, we propose a novel framework for the CQA task by exploiting both the question-answer content in CQA sites and users' social contexts. The experiment on a real-world dataset shows the effectiveness of our method.

【Keywords】: Question-answering; Network Learning; LSTM

720. Auto-Annotation of 3D Objects via ImageNet.

Paper Link】 【Pages】:4965-4966

【Authors】: Huan Luo ; Cheng Wang ; Jonathan Li

【Abstract】: Automatic annotation of 3D objects in cluttered scenes is of great importance to a variety of applications. Nowadays, 3D point clouds, a new 3D representation of real-world objects, can be easily and rapidly collected by mobile LiDAR systems, e.g., the RIEGL VMX-450 system. Moreover, the mobile LiDAR system can also provide a series of consecutive multi-view images which are calibrated with 3D point clouds. This paper proposes to automatically annotate 3D objects of interest in point clouds of road scenes by exploiting a multitude of annotated images in image databases, such as LabelMe and ImageNet. In the proposed method, an object detector trained on the annotated images is used to locate the object regions in acquired multi-view images. Then, based on the correspondences between multi-view images and 3D point clouds, a probabilistic graphical model is used to model the temporal, spatial and geometric constraints to extract the 3D objects automatically. A new dataset was built for evaluation and the experimental results demonstrate satisfactory performance on 3D object extraction.

【Keywords】:

721. Semantic Interpretation of Social Network Communities.

Paper Link】 【Pages】:4967-4968

【Authors】: Tushar Maheshwari ; Aishwarya N. Reganti ; Upendra Kumar ; Tanmoy Chakraborty ; Amitava Das

【Abstract】: A community in a social network is considered to be a group of nodes densely connected internally and sparsely connected externally. Although previous work intensely studied network topology within a community, its semantic interpretation is hardly understood. In this paper, we attempt to understand whether individuals in a community possess similar Personalities, Values and Ethical background. Finally, we show that Personality and Values models could be used as features to discover more accurate community structure compared to the one obtained from only network information.

【Keywords】: Values and Ethics; Community Detection; Social Media

722. Extreme Gradient Boosting and Behavioral Biometrics.

Paper Link】 【Pages】:4969-4970

【Authors】: Benjamin Manning

【Abstract】: As insider hacks become more prevalent it is becoming more useful to identify valid users from the inside of a system rather than from the usual external entry points where exploits are used to gain entry. One of the main goals of this study was to ascertain how well Gradient Boosting could be used for prediction or, in this case, classification or identification of a specific user through the learning of HCI-based behavioral biometrics. If applicable, this procedure could be used to verify users after they have gained entry into a protected system using data that is as human-centric as other biometrics, but less invasive. For this study an Extreme Gradient Boosting algorithm was used for training and testing on a dataset containing keystroke dynamics information. This specific algorithm was chosen because the majority of current research utilizes mainstream methods such as KNN and SVM and the hypothesis of this study was centered on the potential applicability of ensemble related decision or model trees. The final predictive model produced an accuracy of 0.941 with a Kappa value of 0.942 demonstrating that HCI-based behavioral biometrics in the form of keystroke dynamics can be used to identify the users of a system.

【Keywords】: Algorithm, ensemble, security, authentication

723. Plan Recognition Design.

Paper Link】 【Pages】:4971-4972

【Authors】: Reuth Mirsky ; Roni Stern ; Ya'akov (Kobi) Gal ; Meir Kalech

【Abstract】: Goal Recognition Design (GRD) is the problem of designing a domain in a way that will allow easy identification of agents' goals. This work extends the original GRD problem to the Plan Recognition Design (PRD) problem which is the task of designing a domain using plan libraries in order to facilitate fast identification of an agent's plan. While GRD can help to explain faster which goal the agent is trying to achieve, PRD can help in faster understanding of how the agent is going to achieve its goal. We define a new measure that quantifies the worst-case distinctiveness of a given planning domain, propose a method to reduce it in a given domain and show the reduction of this new measure in three domains from the literature.

【Keywords】: Plan Recognition; Recognition Design; Domain Compilation

724. Automatically Extracting Axioms in Classical Planning.

Paper Link】 【Pages】:4973-4974

【Authors】: Shuwa Miura ; Alex Fukunaga

【Abstract】: Axioms can be used to model derived predicates in domain-independent planning models. Formulating models which use axioms can sometimes result in problems with much smaller search spaces than the original model. We propose a method for automatically extracting a particular class of axioms from standard STRIPS PDDL models. More specifically, we identify operators whose effects become irrelevant given some other operator, and generate axioms that capture this relationship. We show that this algorithm can be used to successfully extract axioms from standard IPC benchmark instances, and show that the extracted axioms can be used to significantly improve the performance of an IP-based planner.

【Keywords】:

725. SEAPoT-RL: Selective Exploration Algorithm for Policy Transfer in RL.

Paper Link】 【Pages】:4975-4976

【Authors】: Akshay Narayan ; Zhuoru Li ; Tze-Yun Leong

【Abstract】: We propose a new method for transferring a policy from a source task to a target task in model-based reinforcement learning. Our work is motivated by scenarios where a robotic agent operates in similar but challenging environments, such as hospital wards, differentiated by structural arrangements or obstacles, such as furniture. We address problems that require fast responses adapted from incomplete, prior knowledge of the agent in new scenarios. We present an efficient selective exploration strategy that maximally reuses the source task policy. Reuse efficiency is effected through identifying sub-spaces that are different in the target environment, thus limiting the exploration needed in the target task. We empirically show that SEAPoT performs better in terms of jump starts and cumulative average rewards, as compared to existing state-of-the-art policy reuse methods.

【Keywords】: transfer learning; policy transfer

726. Coalition Structure Generation Utilizing Graphical Representation of Partition Function Games.

Paper Link】 【Pages】:4977-4978

【Authors】: Kazuki Nomoto ; Yuko Sakurai ; Makoto Yokoo

【Abstract】: Forming effective coalitions is a central research challenge in AI and multi-agent systems. The Coalition Structure Generation (CSG) problem is well known as one of the major research topics in coalitional games. The CSG problem is to partition a set of agents into coalitions so that the sum of utilities is maximized. This paper studies a CSG problem for partition function games (PFGs), where the value of a coalition differs depending on the formation of other coalitions generated by non-member agents. Traditionally, in PFGs, the input of a coalitional game is a black-box function called a partition function that maps an embedded coalition (a coalition and the coalition structure) to its value. Recently, a novel concise representation scheme called Partition Decision Trees (PDTs) has been proposed. PDTs are a graphical representation based on multiple rules. In this paper, we propose new algorithms that can solve a CSG problem by utilizing the PDT representation. More specifically, we modify the PDT representation to effectively handle negative value rules and apply the depth-first branch and bound algorithm. We experimentally show that our algorithm can solve a CSG problem well.

【Keywords】: Cooperative game theory; Coalition structure generation problem; Externality
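
To make the CSG problem statement concrete, here is a minimal brute-force sketch that enumerates every coalition structure and evaluates a black-box partition function. It is not the PDT-based branch-and-bound algorithm of the paper, and it only scales to a handful of agents because the number of partitions grows as the Bell numbers. The `toy_value` function is a hypothetical partition function chosen for illustration.

```python
def partitions(agents):
    """Yield every partition of `agents` (a list) as a list of coalitions."""
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for partial in partitions(rest):
        # Add `first` to each existing coalition in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or let `first` form a new singleton coalition.
        yield [[first]] + partial


def best_structure(agents, value):
    """Return the coalition structure maximizing the sum of coalition values."""
    best, best_total = None, float("-inf")
    for structure in partitions(list(agents)):
        total = sum(value(coalition, structure) for coalition in structure)
        if total > best_total:
            best, best_total = structure, total
    return best, best_total


def toy_value(coalition, structure):
    # Hypothetical partition function: a coalition earns its size squared,
    # minus 1 for every *other* coalition that is a singleton (an externality).
    singletons = sum(
        1 for other in structure if other is not coalition and len(other) == 1
    )
    return len(coalition) ** 2 - singletons


structure, total = best_structure(["a", "b", "c"], toy_value)
```

Because the value of a coalition depends on the whole structure, the value function takes both arguments, mirroring the embedded-coalition input described in the abstract.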

727. Audio Feature Learning with Triplet-Based Embedding Network.

Paper Link】 【Pages】:4979-4980

【Authors】: Xiaoyu Qi ; Deshun Yang ; Xiaoou Chen

【Abstract】: We propose a triplet-based network for audio feature learning for version identification. Existing methods use hand-crafted features for a piece of music as a whole, while we learn features with a triplet-based neural network at the segment level, focusing on the most similar parts between music versions. We conduct extensive experiments and demonstrate the merits of our approach.

【Keywords】: audio feature; metric learning; triplet network
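
The abstract does not state which loss the network uses; as an illustration of triplet-based metric learning in general, the following sketch implements the standard margin-based triplet loss on plain feature vectors. The Euclidean distance and the margin value are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(
        0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    )


# The positive is already much closer than the negative, so the loss is zero.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Swapping positive and negative violates the margin and is penalized.
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a version-identification setting, the anchor and positive would be segments from two versions of the same piece and the negative a segment from a different piece.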

728. A Finite Memory Automaton for Two-Armed Bernoulli Bandit Problems.

Paper Link】 【Pages】:4981-4982

【Authors】: Ariel Rao

【Abstract】: Existing approaches to the multi-armed bandit (MAB) primarily rely on perfect recall of past actions to generate estimates for arm payoff probabilities; it is further assumed that the decision maker knows whether arm payoff probabilities can change. To capture the computational limitations many decision making systems face, we explore performance under bounded resources in the form of imperfect recall of past information. We present a finite memory automaton (FMA) designed to solve static and dynamic MAB problems. The FMA demonstrates that an agent can learn a low regret strategy without knowing whether arm payoff probabilities are static or dynamic and without having perfect recall of past actions. Roughly speaking, the automaton works by maintaining a relative ranking of arms rather than estimating precise payoff probabilities.

【Keywords】: machine learning, reinforcement learning, bandit problems, finite automaton
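
For readers unfamiliar with finite-memory bandit strategies, here is a minimal sketch of the classic Tsetlin automaton for a two-armed Bernoulli bandit. It is not the FMA proposed in the paper; it only illustrates how an agent with a fixed number of states, and no payoff estimates, can settle on the better arm. The class and function names are hypothetical.

```python
import random


class TsetlinAutomaton:
    """Two-action Tsetlin automaton with `depth` memory states per arm."""

    def __init__(self, depth=3):
        self.depth = depth
        self.arm = 0         # currently preferred arm (0 or 1)
        self.confidence = 1  # 1 = about to switch, depth = most committed

    def choose(self):
        return self.arm

    def update(self, rewarded):
        if rewarded:
            # A reward moves the automaton deeper into the current arm.
            self.confidence = min(self.depth, self.confidence + 1)
        elif self.confidence > 1:
            # A penalty moves it back toward the boundary.
            self.confidence -= 1
        else:
            # A penalty at the boundary switches arms.
            self.arm = 1 - self.arm


def simulate(probs, steps, depth=3, seed=0):
    """Run the automaton on Bernoulli arms and return pull counts per arm."""
    rng = random.Random(seed)
    automaton = TsetlinAutomaton(depth)
    pulls = [0, 0]
    for _ in range(steps):
        arm = automaton.choose()
        pulls[arm] += 1
        automaton.update(rng.random() < probs[arm])
    return pulls


pulls = simulate((0.0, 1.0), steps=100)
```

The automaton has only 2 × `depth` states in total, so it never stores counts or past actions.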

729. Semantic Representation Using Explicit Concept Space Models.

Paper Link】 【Pages】:4983-4984

【Authors】: Walid Ahmed Fouad Shalaby ; Wlodek Zadrozny

【Abstract】: Explicit concept space models have proven efficacy for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts which captures the main topics of these structures. Despite their wide applicability, existing models have many shortcomings such as sparsity and being restricted to Wikipedia as the main knowledge source from which concepts are extracted. In this paper we highlight some of these limitations. We also describe Mined Semantic Analysis (MSA), a novel concept space model which employs unsupervised learning in order to uncover implicit relations between concepts. MSA leverages the discovered concept-concept associations to enrich the semantic representations. We evaluate MSA’s performance on benchmark data sets for measuring lexical semantic relatedness. Empirical results show superior performance of MSA compared to prior state-of-the-art methods.

【Keywords】: Semantic relatedness; concept space models; bag-of-concepts; association rule mining

730. A Sampling Based Approach for Proactive Project Scheduling with Time-Dependent Duration Uncertainty.

Paper Link】 【Pages】:4985-4986

【Authors】: Wen Song ; Donghun Kang ; Jie Zhang ; Hui Xi

【Abstract】: Most of the existing proactive scheduling approaches assume the durations of activities can be described by independent random variables that have no relation with time. We deal with the more challenging problem where the duration uncertainty is related to the scheduled time period. We propose a sampling based approach by extending the Consensus method from stochastic optimization. Experimental results show the effectiveness of our approach in solution quality and stability.

【Keywords】: Proactive project scheduling; Time-dependent duration uncertainty; Consensus approach

731. PAG2ADMG: A Novel Methodology to Enumerate Causal Graph Structures.

Paper Link】 【Pages】:4987-4988

【Authors】: Nishant Subramani ; Doug Downey

【Abstract】: Causal graphs, such as directed acyclic graphs (DAGs) and partial ancestral graphs (PAGs), represent causal relationships among variables in a model. Methods exist for learning DAGs and PAGs from data and for converting DAGs to PAGs. However, these methods only output a single causal graph consistent with the independencies/dependencies (the Markov equivalence class M) estimated from the data. Yet many distinct graphs may be consistent with M, and a data modeler may wish to select among these using domain knowledge. In this paper, we present a method that makes this possible. We introduce PAG2ADMG, the first method for enumerating all causal graphs consistent with M, under certain assumptions. PAG2ADMG converts a given PAG into a set of acyclic directed mixed graphs (ADMGs). We prove the correctness of the approach and demonstrate its efficiency relative to brute-force enumeration.

【Keywords】: Causality; Causal Graphs; Ancestral Graphs; Mixed Graphs; Markov Equivalence Classes

732. Preference Elicitation in DCOPs for Scheduling Devices in Smart Buildings.

Paper Link】 【Pages】:4989-4990

【Authors】: Atena M. Tabakhi

【Abstract】: Researchers have used Distributed Constraint Optimization Problems (DCOPs) as a powerful approach to model various multi-agent coordination problems, taking into account their preferences and constraints. A core limitation of this model is the assumption that all agents’ preferences are specified a priori. However, in a number of application domains such knowledge becomes available only after being elicited from users in these domains. In this abstract, we explore the effects of preference elicitation in our motivating application of scheduling smart appliances with the aim of reducing users’ electricity bills as well as increasing their comfort.

【Keywords】: DCOP; SBDS; Smart Homes; Distributed problems

733. Multimodal Fusion of EEG and Musical Features in Music-Emotion Recognition.

Paper Link】 【Pages】:4991-4992

【Authors】: Nattapong Thammasan ; Ken-ichi Fukui ; Masayuki Numao

【Abstract】: Multimodality has recently been exploited to overcome the challenges of emotion recognition. In this paper, we present a study of decision-level fusion of electroencephalogram (EEG) features and musical features extracted from musical stimuli in recognizing the time-varying binary classes of arousal and valence. Our empirical results demonstrate that the EEG modality suffered from the instability of EEG signals, yet fusing it with the music modality could alleviate the issue and enhance the performance of emotion recognition.

【Keywords】: emotion recognition; affective computing; brain-computer interface

734. Predicting User Roles from Computer Logs Using Recurrent Neural Networks.

Paper Link】 【Pages】:4993-4994

【Authors】: Aaron Tuor ; Samuel Kaplan ; Brian Hutchinson ; Nicole Nichols ; Sean Robinson

【Abstract】: Network and other computer administrators typically have access to a rich set of logs tracking actions by users. However, they often lack metadata such as user role, age, and gender that can provide valuable context for users' actions. Inferring user attributes automatically has wide ranging implications; among others, for customization (anticipating user needs and priorities), for managing resources (anticipating demand) and for security (interpreting anomalous behavior).

【Keywords】: insider threat, neural network, streaming, anomaly detection

735. Extracting Highly Effective Features for Supervised Learning via Simultaneous Tensor Factorization.

Paper Link】 【Pages】:4995-4996

【Authors】: Sunny Verma ; Wei Liu ; Chen Wang ; Liming Zhu

【Abstract】: Real-world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extracting either common or unique features, but not both simultaneously. However, both types of features are important in many machine learning systems as they inherently affect the systems' performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on the CIFAR-10 database achieve significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.

【Keywords】: Data Mining, Machine Learning

736. Hybridizing Interval Temporal Logics: The First Step.

Paper Link】 【Pages】:4997-4998

【Authors】: Przemyslaw Andrzej Walega

【Abstract】: Temporal reasoning is one of the main topics investigated within the field of Artificial Intelligence. Formal methods for temporal reasoning are of interest to researchers from both theoretical and practical points of view. Such methods enable modelling and studying human-like reasoning mechanisms, thus constituting a valuable tool in cognitive science, philosophy, and linguistics. On the other hand, temporal reasoning formalisms have a number of potential practical applications, e.g., in task scheduling, action planning, and temporal databases. Temporal reasoning methods may be divided into point-based and interval-based depending on the type of the considered primitive ontological objects. My work revolves around the latter type of methods, which seem to be more human-like and more suitable for such applications as continuous process modelling. My main result is that the satisfiability problem in a hybridized fragment of Halpern-Shoham logic in which formulas are in the form of a conjunction of Horn clauses and only box modal operators are allowed (diamond operators are disallowed) is NP-complete over reflexive, as well as over irreflexive and dense time frames. Before hybridization this fragment was P-complete over such time structures.

【Keywords】: temporal logics; hybrid logics; interval logics; computational complexity

737. Boosting for Real-Time Multivariate Time Series Classification.

Paper Link】 【Pages】:4999-5000

【Authors】: Haishuai Wang ; Jun Wu

【Abstract】: Multivariate time series (MTS) are useful for detecting abnormal cases in healthcare. In this paper, we propose an ensemble boosting algorithm to classify abnormal surgery time series based on learned shapelet features. Specifically, we first learn shapelets by logistic regression from multivariate time series. Based on the learned shapelets, we propose an MTS ensemble boosting approach for the case when the time series arrive as a stream. Experimental results on a real-world medical dataset demonstrate the effectiveness of the proposed methods.

【Keywords】: time series classification

738. Semantic Connection Based Topic Evolution.

Paper Link】 【Pages】:5001-5002

【Authors】: Jiamiao Wang

【Abstract】: Contrary to previous studies on topic evolution that directly extract topics by topic modeling and preset the number of topics, we propose a method of topic evolution based on semantic connection for an adaptive number of topics and rapid responses to the changes of contents. Semantic connection not only indicates the content similarity between documents but also shows the time decay, so semantic connection features can be used to visualize topic evolution, which makes the analyses of changes much easier. Preliminary experimental results demonstrate that our method performs well compared to a state-of-the-art baseline on both qualities of topics and the sensitivity of changes.

【Keywords】: LDA, Topic Model, Semantics, Topic Evolution

739. Keyphrase Extraction with Sequential Pattern Mining.

Paper Link】 【Pages】:5003-5004

【Authors】: Qingren Wang ; Victor S. Sheng ; Xindong Wu

【Abstract】: Existing studies show that extracting a complete keyphrase candidate set is the first and crucial step in extracting high-quality keyphrases from documents. Based on the common-sense observation that words do not repeatedly appear in an effective keyphrase, we propose a novel algorithm named KCSP for document-specific keyphrase candidate search using sequential pattern mining with gap constraints, which only needs to scan a document once and automatically specifies appropriate gap constraints for words without users’ participation. The experimental results confirm that it helps improve the quality of keyphrase extraction.

【Keywords】: keyphrase extraction; sequential pattern mining; entropy; interval calculation

740. Cycle-Based Singleton Local Consistencies.

Paper Link】 【Pages】:5005-5006

【Authors】: Robert J. Woodward ; Berthe Y. Choueiry ; Christian Bessiere

【Abstract】: We propose to exploit cycles in the constraint network of a Constraint Satisfaction Problem (CSP) to vehicle constraint propagation and improve the effectiveness of local consistency algorithms. We focus our attention on the consistency property Partition-One Arc-Consistency (POAC), which is a stronger variant of Singleton Arc-Consistency (SAC). We modify the algorithm for enforcing POAC to operate on a minimum cycle basis (MCB) of the incidence graph of the CSP. We empirically show that our approach improves the performance of problem solving and constitutes a novel and effective localization of consistency algorithms. Although this paper focuses on POAC, we believe that exploiting cycles, such as MCBs, is applicable to other consistency algorithms and that our study opens a new direction in the design of consistency algorithms. This research is documented in a technical report (Woodward, Choueiry, and Bessiere 2016). http://consystlab.unl.edu/our_work/StudentReports/TR-UNL-CSE-2016-0004.pdf

【Keywords】:

741. Evolutionary Machine Learning for RTS Game StarCraft.

Paper Link】 【Pages】:5007-5008

【Authors】: Lianlong Wu ; Andrew Markham

【Abstract】: Real-Time Strategy (RTS) games involve multiple agents acting simultaneously, and result in enormous state dimensionality. In this paper, we propose an abstracted and simplified model for the famous game StarCraft, and design a dynamic programming algorithm to solve the building order problem, which takes minimal time to achieve a specific target. In addition, Genetic Algorithms (GA) are used to find an optimal target for the opening stage.

【Keywords】: Genetic Algorithm; Machine Learning; RTS Game; Artificial Intelligence; StarCraft

742. Enhancing the Privacy of Predictors.

Paper Link】 【Pages】:5009-5010

【Authors】: Ke Xu ; Swair Shah ; Tongyi Cao ; Crystal Maung ; Haim Schweitzer

【Abstract】: The privacy challenge considered here is to prevent an adversary from using available feature values to predict confidential information. We propose an algorithm providing such privacy for predictors that have a linear operator in the first stage. Privacy is achieved by zeroing out feature components in the approximate null space of the linear operator. We show that this has little effect on predicting desired information.

【Keywords】: Privacy; Multi-label Learning; Linear Regression
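
As a rough illustration of the null-space idea in this abstract, the sketch below covers only the simplest case: a first-stage linear operator with a single row `w`. That restriction is an assumption made here for brevity; the paper's operator and its approximate null space are more general. Any part of a feature vector that is orthogonal to `w` lies in the null space, so removing it leaves the output `w · x` unchanged. The function names and the example values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def remove_null_space_component(w, x):
    """Project x onto the direction of w, zeroing its null-space part.

    Assumes w is not the zero vector (hypothetical single-row operator).
    """
    scale = dot(w, x) / dot(w, w)
    return [scale * wi for wi in w]


w = [1.0, 2.0, 0.0]   # hypothetical first-stage weights
x = [3.0, 1.0, 7.0]   # the third feature is ignored by w
x_private = remove_null_space_component(w, x)
```

In this toy case the third feature, which the predictor never uses, is erased from `x_private`, while the prediction `dot(w, x_private)` matches `dot(w, x)`.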

743. Detecting Review Spammer Groups.

Paper Link】 【Pages】:5011-5012

【Authors】: Min Yang ; Ziyu Lu ; Xiaojun Chen ; Fei Xu

【Abstract】: With an increasing number of paid writers posting fake reviews to promote or demote some target entities through the Internet, review spammer detection has become a crucial and challenging task. In this paper, we propose a three-phase method to address the problem of identifying review spammer groups and individual spammers, who get paid for posting fake comments. We evaluate the effectiveness and performance of the approach on a real-life online shopping review dataset from amazon.com. The experimental results show that our model achieved comparable or better performance than previous work on spammer detection.

【Keywords】: Spammer detection;Topic model

744. Attention Based LSTM for Target Dependent Sentiment Classification.

Paper Link】 【Pages】:5013-5014

【Authors】: Min Yang ; Wenting Tu ; Jingxuan Wang ; Fei Xu ; Xiaojun Chen

【Abstract】: We present an attention-based bidirectional LSTM approach to improve the target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.

【Keywords】: sentiment classification;LSTM

745. Authorship Attribution with Topic Drift Model.

Paper Link】 【Pages】:5015-5016

【Authors】: Min Yang ; Dingju Zhu ; Yong Tang ; Jingxuan Wang

【Abstract】: Authorship attribution is an active research direction due to its legal and financial importance. The goal is to identify the authorship of anonymous texts. In this paper, we propose a Topic Drift Model (TDM), monitoring the dynamicity of authors’ writing style and latent topics of interest. Our model is sensitive to the temporal information and the ordering of words, thus it extracts more information from texts.

【Keywords】: Authorship attribution; Topic model

746. Participatory Art Museum: Collecting and Modeling Crowd Opinions.

Paper Link】 【Pages】:5017-5018

【Authors】: Xiaoyu Zeng ; Ruohan Zhang

【Abstract】: We collect public opinions on museum artworks using online crowdsourcing techniques. We ask two research questions. First, do crowd opinions on artworks differ from expert interpretations? Second, how can museums manage large amounts of crowd opinions, such that users can efficiently retrieve useful information? We address these questions through opinion modeling via semantic embedding and dimension reduction.

【Keywords】: Participatory Museum; Crowdsourcing; Semantic Embedding

747. High-Resolution Mobile Fingerprint Matching via Deep Joint KNN-Triplet Embedding.

Paper Link】 【Pages】:5019-5020

【Authors】: Fandong Zhang ; Jufu Feng

【Abstract】: In mobile devices, the limited area of fingerprint sensors creates a demand for partial fingerprint matching. Existing fingerprint authentication algorithms are mainly based on minutiae matching. However, their accuracy degrades significantly for partial-to-partial matching due to the lack of minutiae. Optical fingerprint sensors can capture very high-resolution fingerprints (2000dpi) with rich details such as pores, scars, etc. These details can compensate for the shortage of minutiae. In this paper, we propose a novel matching algorithm for such fingerprints, namely Deep Joint KNN-Triplet Embedding, by making good use of these subtle features. Our model employs a deep convolutional neural network (CNN) with a well-designed joint loss to project raw fingerprint images into a Euclidean space. We can then use the L2 distance to measure the similarity of two fingerprints. Experiments indicate that our model outperforms several state-of-the-art approaches.

【Keywords】: Mobile Fingerprint; Convolutional Neural Network; Deep Metric Learning; Triplet Loss
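
The abstract compares embedded fingerprints by L2 distance and trains with a triplet-based loss. The sketch below shows the generic triplet hinge loss on plain Python lists. It is not the authors' joint KNN-triplet loss, whose exact form the abstract does not give. The margin value, the use of unsquared distances, and the function names are assumptions chosen for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet hinge loss (illustrative, not the paper's joint loss).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D embeddings: same finger close, other finger far away.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

A well-separated triplet contributes no loss, while a triplet whose negative sits closer than its positive is penalized in proportion to the violation.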

748. A Computational Assessment Model for the Adaptive Level of Rehabilitation Exergames for the Elderly.

Paper Link】 【Pages】:5021-5022

【Authors】: Hao Zhang ; Chunyan Miao ; Han Yu ; Cyril Leung

【Abstract】: Rehabilitation exergames can engage the elderly in physical activities and help them recover part of their deteriorating capabilities. However, most existing exergames lack measures of how suitable they are to specific individuals. In this paper, we propose the Computational Person-Environment Fit model to evaluate the adaptability of the exergames to each individual elderly user.

【Keywords】:

749. Natural Language Person Retrieval.

Paper Link】 【Pages】:5023-5024

【Authors】: Tao Zhou ; Jie Yu

【Abstract】: Following the recent progress in image classification and image captioning using deep learning, we developed a novel person retrieval system using natural language, which to our knowledge is the first of its kind. Our system employs a state-of-the-art deep learning based natural language object retrieval framework to detect and retrieve people in images. Quantitative experimental results show significant improvement over state-of-the-art methods for generic object retrieval. This line of research provides great advantages for searching large amounts of video surveillance footage and it can also be utilized in other domains, such as human-robot interaction.

【Keywords】: person retrieval; LSTM; CNN; batch normalization; max pooling

750. User Modeling Using LSTM Networks.

Paper Link】 【Pages】:5025-5027

【Authors】: Konrad Zolna ; Bartlomiej Romanski

【Abstract】: The LSTM model presented is capable of describing a user of a particular website without human expert supervision. In other words, the model is able to automatically craft features which depict attitude, intention and the overall state of a user. This effect is achieved by projecting the complex history of the user (sequence data corresponding to his actions on the website) into fixed-size vectors of real numbers. The representation obtained may be used to enrich typical models used in e-commerce: click-through rate, conversion rate, recommender systems etc. The goal of this paper is to demonstrate a way of creating the mentioned projection, which we called user2vec, and present possible benefits of incorporating this solution to enhance conversion rate model. Thus enriched model’s superiority is due not only to its increased internal complexity but also to its capability of learning from wider data – it indirectly analyzes actions of all website users, rather than being limited to the users who clicked on an ad.

【Keywords】: user modeling; recurrent neural networks; conversion rate; transfer learning; user2vec

Doctoral Consortium 16

751. Explainable Image Understanding Using Vision and Reasoning.

Paper Link】 【Pages】:5028-5029

【Authors】: Somak Aditya

【Abstract】: Image Understanding is fundamental to intelligent agents. Researchers have explored Caption Generation and Visual Question Answering as independent aspects of Image Understanding (Johnson et al. 2015; Xiong, Merity, and Socher 2016). Common to most of the successful approaches is the learning of end-to-end signal mappings (image-to-caption, image and question to answer). The accuracy is impressive. It is also important to explain a decision to the end-user (justify the results, and rectify based on feedback). Very recently, there has been some focus (Hendricks et al. 2016; Liu et al.) on explaining some aspects of the learning systems. In my research, I look towards building explainable Image Understanding systems that can be used to generate captions and answer questions. Humans learn both from examples (learning) and by reading (knowledge). Inspired by such an intuition, researchers have constructed Knowledge-Bases that encode (probabilistic) commonsense and background knowledge. In this work, we look towards efficiently using this probabilistic knowledge on top of machine learning capabilities, to rectify noise in visual detections and generate captions or answers to posed questions.

【Keywords】: vision and language, probabilistic logic

752. Problem Formulation for Accommodation Support in Plan-Based Interactive Narratives.

Paper Link】 【Pages】:5030-5031

【Authors】: Adam Amos-Binks

【Abstract】: Branching story games have gained popularity for adapting to user actions within a story world. An active area of Interactive Narrative (IN) research uses automated planning to generate story plans as it can lighten the authorial burden of writing a branching story. Branches can be generated from a declarative representation rather than hand-crafted. A goal of an Experience Manager (EM) is to guide a user through a space of desirable narrative trajectories, or story branches, in an IN. However, in the cases when an EM must accommodate user actions and mediate them from a desired narrative trajectory to a new narrative trajectory, automated planning’s authorial advantage becomes a liability as the available narrative trajectories are not known a priori. This limitation can lead to the EM choosing a new narrative trajectory that is not coherent with the previous one and may result in a negative user experience. The goal of my research is to develop a problem formulation methodology for story planning problems that elicits the available narrative trajectories, enabling an EM to execute more coherent accommodations.

【Keywords】: Automated planning; Problem formulation; AI Techniques for Games; AI Storytelling; Experience management

753. An Evolutionary Algorithm Based Framework for Task Allocation in Multi-Robot Teams.

Paper Link】 【Pages】:5032-5033

【Authors】: Muhammad Usman Arif

【Abstract】: Multi-Robot Task Allocation (MRTA) has no formal framework which could provide solutions covering different domains within the MRTA taxonomy without changing the optimization scheme. This research aims to develop a novel framework using evolutionary computing. The study proposes a modular approach towards developing this framework in which individual problem types of the MRTA taxonomy are solved one at a time. The performance of the framework will be evaluated against the popular approaches suggested for each problem type.

【Keywords】: Multi-robot task allocation; Framework; task allocation; Multi-agent

754. Accelerating Multiagent Reinforcement Learning through Transfer Learning.

Paper Link】 【Pages】:5034-5035

【Authors】: Felipe Leno da Silva ; Anna Helena Reali Costa

【Abstract】: Reinforcement Learning (RL) is a widely used solution for sequential decision-making problems and has been used in many complex domains. However, RL algorithms suffer from scalability issues, especially when multiple agents are acting in a shared environment. This research intends to accelerate learning in multiagent sequential decision-making tasks by reusing previous knowledge, both from past solutions and advising between agents. We intend to contribute a Transfer Learning framework focused on Multiagent RL, requiring as few domain-specific hand-coded parameters as possible.

【Keywords】: Transfer Learning; Reinforcement Learning; Multiagent Systems

755. Improving Deep Reinforcement Learning with Knowledge Transfer.

Paper Link】 【Pages】:5036-5037

【Authors】: Ruben Glatt ; Anna Helena Reali Costa

【Abstract】: Recent successes in applying Deep Learning techniques on Reinforcement Learning algorithms have led to a wave of breakthrough developments in agent theory and established the field of Deep Reinforcement Learning (DRL). While DRL has shown great results for single task learning, the multi-task case is still underrepresented in the available literature. This D.Sc. research proposal aims at extending DRL to the multi-task case by leveraging the power of Transfer Learning algorithms to improve the training time and results for multi-task learning. Our focus lies on defining a novel framework for scalable DRL agents that detects similarities between tasks and balances various TL techniques, like parameter initialization, policy or skill transfer.

【Keywords】: Deep Reinforcement Learning; Transfer Learning; Artificial Intelligence

756. Problems in Large-Scale Image Classification.

Paper Link】 【Pages】:5038-5039

【Authors】: Yuchen Guo

【Abstract】: The number of images has been growing rapidly in recent years because of the development of the Internet, especially social networks like Facebook, and the popularization of portable image capture devices like smartphones. Annotating them with semantically meaningful words to describe them, i.e., classification, is a useful way to manage these images. However, the huge number of images and classes brings several challenges to classification, of which two are 1) how to measure the similarity efficiently between large-scale images, given that measuring similarity between samples is the building block for SVM and kNN classifiers, and 2) how to train supervised classification models for newly emerging classes with only a few or even no labeled samples, because new concepts appear every day on the Web, like Tesla's Model S. The research of my Ph.D. thesis focuses on the two problems in large-scale image classification mentioned above. Formally, these two problems are termed large-scale similarity search, which focuses on the large scale of samples/images, and zero-shot/few-shot learning, which focuses on the large scale of classes. Specifically, my research considers the following three aspects: 1) hashing based large-scale similarity search, which adopts hashing to improve the efficiency; 2) cross-class transfer active learning, which simultaneously transfers knowledge from the abundant labeled samples on the Web and selects the most informative samples for expert labeling such that we can construct effective classifiers for novel classes with only a few labeled samples; and 3) zero-shot learning, which utilizes no labeled samples for novel classes at all to build supervised classifiers for them by transferring knowledge from the related classes.

【Keywords】: Large-scale Similarity Search, Cross-class Transfer Active Learning, Zero-shot Learning
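
The abstract does not say which hashing scheme the thesis uses, so the sketch below swaps in a standard one, random-hyperplane hashing, purely to show why hashing speeds up similarity search: each vector becomes a short binary code, and codes are compared by Hamming distance instead of a full floating-point comparison. All names, the code length, and the seed are assumptions.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vector, hyperplanes):
    """Binary code: one bit per hyperplane, set when the vector is on its positive side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming_distance(code_a, code_b):
    """Number of bit positions where two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


planes = make_hyperplanes(num_bits=16, dim=3, seed=1)
code = hash_vector([1.0, 2.0, 3.0], planes)
code_scaled = hash_vector([2.0, 4.0, 6.0], planes)      # same direction
code_opposite = hash_vector([-1.0, -2.0, -3.0], planes)  # opposite direction
```

Vectors pointing the same way share a code, and vectors pointing in opposite directions disagree on every bit, so the Hamming distance between codes acts as a cheap proxy for angular similarity.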

757. Representations for Continuous Learning.

Paper Link】 【Pages】:5040-5041

【Authors】: David Isele

【Abstract】: Systems deployed in unstructured environments must be able to adapt to novel situations. This requires the ability to perform in domains that may be vastly different from training domains. My dissertation focuses on the representations used in lifelong learning and how these representations enable predictions and knowledge sharing over time, allowing an agent to continuously learn and adapt in changing environments. Specifically, my contributions will enable lifelong learning systems to efficiently accumulate data, use prior knowledge to predict models for novel tasks, and alter existing models to account for changes in the environment.

【Keywords】:

758. Structured Prediction in Time Series Data.

Paper Link】 【Pages】:5042-5043

【Authors】: Jia Li

【Abstract】: Time series data is common in a wide range of disciplines including finance, biology, sociology, and computer science. Analyzing and modeling time series data is fundamental for studying various problems in those fields. For instance, time series physiological data can be used to discriminate patients’ abnormal recovery trajectories from normal ones (Hripcsak, Albers, and Perotte 2015). GPS data are useful for studying collective decision making of group-living animals (Strandburg-Peshkin et al. 2015). There are different methods for studying time series data such as clustering, regression, and anomaly detection. In this proposal, we are interested in structured prediction problems in time series data. Structured prediction focuses on prediction tasks where the outputs are structured and interdependent, contrary to non-structured prediction, which assumes that the outputs are independent of other predicted outputs. Structured prediction is an important problem as there are structures inherently existing in time series data. One difficulty for structured prediction is that the number of possible outputs can be exponential, which makes modeling all the potential outputs intractable.

【Keywords】: Time Series; Sequence Tagging; Active Learning

759. A Supervised Sparse Learning Framework to Solve EEG Inverse Problem for Discriminative Activations Pattern.

Paper Link】 【Pages】:5044-5045

【Authors】: Feng Liu

【Abstract】: Electroencephalography (EEG) is one of the most important noninvasive neuroimaging tools that provides excellent temporal accuracy. As the EEG electrode sensors measure electrical potentials on the scalp instead of directly measuring the activity of brain voxels deep inside the head, many approaches have been proposed to infer the activated brain regions due to its significance in neuroscience research and clinical application. However, since most of the brain activity is composed of spontaneous neural activities or non-task related activations, task related activation patterns will be corrupted by strong background signals/noise. In our research, we proposed a sparse learning framework for solving the EEG inverse problem which aims to explicitly extract the discriminative sources for different cognitive tasks by fusing the label information into the inverse model. The proposed framework is capable of estimating the discriminative brain sources under different given brain states where traditional inverse methods failed. We introduced two models: one is formulated as supervised sparse dictionary learning and the other is the graph regularized discriminative source estimation model, which promotes consistency within the same class. Preliminary experimental results also validated that the proposed sparse learning framework is effective in discovering the discriminative task-related brain activation sources, which shows the potential to advance high resolution EEG source analysis for real-time non-invasive brain imaging research.

【Keywords】: EEG; inverse problem; dictionary learning; discriminative source; K-SVD

760. V for Verification: Intelligent Algorithm of Checking Reliability of Smart Systems.

Paper Link】 【Pages】:5046-5047

【Authors】: Anna Lukina

【Abstract】: Cyber-physical systems (CPS) are intended to receive information from the environment through sensors and perform appropriate actions using actuators of the controller. In recent years, the world of intelligent technologies has grown exponentially: from cruise control to smart ecosystems. We now face a future in which CPS are involved in almost every aspect of our lives, bringing greater comfort and efficiency. Our goal is to help smart inventions adjust to this highly uncertain environment and guarantee safety for its inhabitants. The physical environment renders the problem of CPS verification extremely cumbersome. Due to a wealth of uncertainties introduced by physical processes, the system is best described by stochastic models. Approximate prediction techniques, such as Statistical Model Checking (SMC), have therefore recently become increasingly popular. As a result, verification of a CPS boils down to quantitative analysis of how close the system is to reaching bad states (safety property) or a desired goal (liveness property). Controlling such systems, that is, computing appropriate response actions depending on the environment, involves probabilistic state estimation, as well as optimal action prediction, i.e., choosing the best next step by simulating the future. In my thesis, I develop a novel intelligent algorithm addressing existing deficiencies of SMC such as poor prediction of rare events (RE) and sampling divergence.

【Keywords】:
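
To make the SMC idea in this abstract concrete, the sketch below estimates the probability that a stochastic system reaches a bad state within a bounded horizon by plain Monte Carlo sampling. It is a minimal baseline, not the thesis algorithm. The random-walk model, the thresholds, and all function names are hypothetical.

```python
import random


def reaches_bad_state(step, is_bad, initial_state, horizon, rng):
    """Simulate one trajectory; report whether it hits a bad state within the horizon."""
    state = initial_state
    for _ in range(horizon):
        if is_bad(state):
            return True
        state = step(state, rng)
    return is_bad(state)


def estimate_bad_probability(step, is_bad, initial_state, horizon, num_samples, seed=0):
    """Fraction of sampled trajectories that reach a bad state."""
    rng = random.Random(seed)
    hits = sum(
        reaches_bad_state(step, is_bad, initial_state, horizon, rng)
        for _ in range(num_samples)
    )
    return hits / num_samples


def random_walk_step(state, rng):
    """Hypothetical stochastic model: move one unit left or right."""
    return state + rng.choice((-1, 1))


# Bad state: drifting 3 or more units from the origin within 20 steps.
p_common = estimate_bad_probability(random_walk_step, lambda s: abs(s) >= 3, 0, 20, 1000)
# Bad state that cannot be reached within the horizon: the estimate is exactly zero.
p_unreachable = estimate_bad_probability(random_walk_step, lambda s: abs(s) >= 100, 0, 10, 1000)
```

Plain sampling also returns zero for events that are possible but very unlikely unless the sample count is huge, which is the rare-event weakness the abstract refers to.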

761. Modelling Familiarity for Intelligent Personalized Social Mobilization.

Paper Link】 【Pages】:5048-5049

【Authors】: Zhengxiang Pan

【Abstract】: With the rise of the Internet and social media, social mobilization (the large-scale mobilization of manpower for scientific, social, and political activities through crowdsourcing) has become a widespread practice. Despite this success, social mobilization is not without its limitations. Local trapping of diffusion and the dependence on highly connected individuals to mobilize people in distant locations affect the effectiveness of social mobilization. Furthermore, as empirical studies on people's responses to various social mobilization approaches are lacking, it is a significant challenge for artificial intelligence (AI) researchers to design effective and efficient decision support mechanisms to help manage this emerging phenomenon. In my thesis, I conduct large-scale empirical studies to help the AI research community establish baseline personal variabilities in different people's response patterns to social mobilization approaches. Based on the collected dataset, I will further propose computational algorithmic crowdsourcing mechanisms which leverage the empirical evidence to improve the effectiveness and efficiency of social mobilization, towards achieving superlinear productivity. Throughout this process, I will also incorporate human factors into the computational models to benefit social mobilization efforts.

【Keywords】: Social mobilization; collaborative crowdsourcing

762. Transfer of Knowledge through Collective Learning.

Paper Link】 【Pages】:5050-5051

【Authors】: Mohammad Rostami

【Abstract】: Learning fast and efficiently using minimal data has been consistently a challenge in machine learning. In my thesis, I explore this problem for knowledge transfer for multi-agent multi-task learning in a life-long learning paradigm. My goal is to demonstrate that by sharing knowledge between agents and similar tasks, efficient algorithms can be designed that can increase the speed of learning as well as improve performance. Moreover, this would allow for handling hard tasks through collective learning of multiple agents that share knowledge. As an initial step, I study the problem of incorporating task descriptors into lifelong learning of related tasks to perform zero-shot knowledge transfer. Zero-shot learning is highly desirable because it leads to considerable speedup in handling similar sequential tasks. Then I focus on a multi-agent learning setting, where related tasks are learned collectively and/or address privacy concerns.

【Keywords】:

763. Project Scheduling in Complex Business Environments.

Paper Link】 【Pages】:5052-5053

【Authors】: Wen Song

【Abstract】: Project scheduling is a common business management task. However, the current business management environment has become more open and dynamic, which jeopardizes the effectiveness of traditional approaches. In this abstract, I summarize my work on two variations of the project scheduling problem: a combinatorial auction based approach for solving the decentralized multi-project scheduling problem, and a sampling based approach for solving the problem of project scheduling under time-dependent duration uncertainties.

【Keywords】: Planning and scheduling; Project scheduling; Resource allocation

764. Human-Like Spatial Reasoning Formalisms.

Paper Link】 【Pages】:5054-5055

【Authors】: Przemyslaw Andrzej Walega

【Abstract】: My work on the PhD thesis concerns human-like reasoning about relations between spatial objects and the way they change in time. In particular, my research is focused on logic-based reasoning systems that model human spatial reasoning methods and may enable a better understanding of human reasoning mechanisms in the future. Importantly, such formalisms are also interesting from the practical point of view: they have a number of potential applications, e.g., in robotics, architecture design, and databases, among others.

【Keywords】: temporal logics; computational complexity; answer set programming; spatial reasoning

765. Joint Learning of Structural and Textual Features for Web Scale Event Extraction.

Paper Link】 【Pages】:5056-5057

【Authors】: Julia Wiedmann

【Abstract】: The web has become the central platform and marketplace for the organization, propagation of events and sale of tickets of any kind. Such events range from concerts, workshops, sport events, professional events to small local events. Single event pages are typically split into a textual event description and a set of core event attributes that are specifically highlighted and presented in the same template for all events of a particular source. In this work, we aim to learn a joint model for the extraction of event attributes from both event descriptions and templates. We also investigate the automatic discovery of event sources and the identification of single event pages within event sources. By considering all three problems as part of an integral system, we can exploit mutual reinforcement between the models derived for each sub problem.

【Keywords】:

766. Scalable Nonparametric Tensor Analysis.

Paper Link】 【Pages】:5058-5060

【Authors】: Shandian Zhe

【Abstract】: Multiway data, described by tensors, are common in real-world applications. For example, online advertising click logs can be represented by a three-mode tensor (user, advertisement, context). The analysis of tensors is closely related to many important applications, such as click-through-rate (CTR) prediction, anomaly detection and product recommendation. Despite the success of existing tensor analysis approaches, such as Tucker, CANDECOMP/PARAFAC and infinite Tucker decompositions, they are either not powerful enough to capture complex hidden relationships in data, or not scalable enough to handle real-world large data. In addition, they may suffer from the extreme sparsity in real data, i.e., when the portion of nonzero entries is extremely low; they lack principled ways to discover other patterns (such as an unknown number of latent clusters) which are critical for data mining tasks such as anomaly detection and market targeting. To address these challenges, I used nonparametric Bayesian techniques, such as Gaussian processes (GP) and Dirichlet processes (DP), to model highly nonlinear interactions and to extract hidden patterns in tensors; I derived tractable variational evidence lower bounds, based on which I developed scalable, distributed or online approximate inference algorithms. Experiments on both simulated and real-world large data have demonstrated the effectiveness of my proposed approaches.

【Keywords】:

What's Hot 7

767. SAT Competition 2016: Recent Developments.

Paper Link】 【Pages】:5061-5063

【Authors】: Tomas Balyo ; Marijn J. H. Heule ; Matti Järvisalo

【Abstract】: We give an overview of SAT Competition 2016, the 2016 edition of the famous competition for Boolean satisfiability (SAT) solvers with over 20 years of history. A key aim is to point out "what's hot" in SAT competitions in 2016, i.e., new developments in the competition series, including new competition tracks and new solver techniques implemented in some of the award-winning solvers.

【Keywords】: satisfiability; competition; benchmarks

768. What's Hot in Evolutionary Computation.

Paper Link】 【Pages】:5064-5066

【Authors】: Tobias Friedrich ; Frank Neumann

【Abstract】: We provide a brief overview on some hot topics in the area of evolutionary computation. Our main focus is on recent developments in the areas of combinatorial optimization and real-world applications. Furthermore, we highlight recent progress on the theoretical understanding of evolutionary computing methods.

【Keywords】: evolutionary computation; real-world applications, combinatorial optimization, theory

769. What's Hot in Case-Based Reasoning.

Paper Link】 【Pages】:5067-5069

【Authors】: Ashok K. Goel ; Belén Díaz-Agudo

【Abstract】: Case-based reasoning addresses new problems by remembering and adapting solutions previously used to solve similar problems. Pulled by the increasing number of applications and pushed by a growing interest in memory intensive techniques, research on case-based reasoning appears to be gaining momentum. In this article, we briefly summarize recent developments in research on case-based reasoning based partly on the recent Twenty Fourth International Conference on Case-Based Reasoning.

【Keywords】: Case-Based Reasoning, Cognitive Systems

770. Automated Negotiating Agents Competition (ANAC).

Paper Link】 【Pages】:5070-5072

【Authors】: Catholijn M. Jonker ; Reyhan Aydogan ; Tim Baarslag ; Katsuhide Fujita ; Takayuki Ito ; Koen V. Hindriks

【Abstract】: The annual International Automated Negotiating Agents Competition (ANAC) is used by the automated negotiation research community to benchmark and evaluate its work and to challenge itself. The benchmark problems and evaluation results and the protocols and strategies developed are available to the wider research community.

【Keywords】: benchmark, competition, automated negotiation, protocol, profile

771. What's Hot in Constraint Programming.

Paper Link】 【Pages】:5073-5075

【Authors】: Laurent D. Michel ; Michel Rueher

【Abstract】: The CP conference is the annual international conference on constraint programming. It is concerned with all aspects of computing with constraints, including theory, algorithms, environments, languages, models, systems, and applications such as decision-making, resource allocation, scheduling, configuration, and planning. The CP community is very keen to ensure it remains open to interdisciplinary research at the intersection between constraint programming and related fields. Hence, in addition to the usual technical and application tracks, the CP 2016 conference featured thematic tracks: Computational Sustainability, CP and Biology, Preferences, Social Choice and Optimization, and Testing and Verification. In this overview, we highlight several remarkable papers that have been selected by the senior program committee and papers with the most innovative methods and techniques, and a very high potential for applications (in our opinion).

【Keywords】: cp; application; optimization

772. What's Hot at CPAIOR (Extended Abstract).

Paper Link】 【Pages】:5076-5078

【Authors】: Claude-Guy Quimper

【Abstract】: The 13th International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR 2016) was held in Banff, Canada, May 29 - June 1, 2016. In order to trigger exchanges between the constraint programming and the operations research communities, CPAIOR was co-located with CORS 2016, the Canadian Operational Research Society's conference.

【Keywords】: Constraint Programming, Operations Research, Conference

773. The State of the AIIDE Conference in 2017.

Paper Link】 【Pages】:5079-5080

【Authors】: Nathan R. Sturtevant ; Brian Magerko

【Abstract】: This abstract looks at the state of the Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE), describing some of the changes in the field and areas of focus for current work.

【Keywords】:

Demonstrations 13

774. Deep Music: Towards Musical Dialogue.

Paper Link】 【Pages】:5081-5082

【Authors】: Mason Bretan ; Sageev Oore ; Jesse Engel ; Douglas Eck ; Larry Heck

【Abstract】: Computer dialogue systems are designed with the intention of supporting meaningful interactions with humans. Common modes of communication include speech, text, and physical gestures. In this work we explore a communication paradigm in which the input and output channels consist of music. Specifically, we examine the musical interaction scenario of call and response. We present a system that utilizes a deep autoencoder to learn semantic embeddings of musical input. The system learns to transform these embeddings in a manner such that reconstructing from these transformation vectors produces appropriate musical responses. In order to generate a response the system employs a combination of generation and unit selection. Selection is based on a nearest neighbor search within the embedding space and for real-time application the search space is pruned using vector quantization. The live demo consists of a person playing a midi keyboard and the computer generating a response that is played through a loudspeaker.

【Keywords】: music; machine learning; deep neural networks; interaction

775. AniDraw: When Music and Dance Meet Harmoniously.

Paper Link】 【Pages】:5083-5084

【Authors】: Yaohua Bu ; Taoran Tang ; Jia Jia ; Zhiyuan Ma ; Songyao Wu ; Yuming You

【Abstract】: In this paper, we present a demo, AniDraw, which can help users practice the coordination between their hands, mouth and eyes by combining the elements of music, painting and dance. Users can sketch a cartoon character through multitouch screens and then hum songs, which will drive the cartoon character to dance to create a lively animation. In technical realization, we apply the mechanism of acoustic driving in which AniDraw extracts time-domain acoustic features to map to the intensity of dances, frequency-domain ones to map to the style of dances, and high-level ones including onsets and tempos to map to start, duration and speed of dances. AniDraw can not only stimulate users’ enthusiasm in artistic creation, but also enhance their aesthetic ability on harmony.

【Keywords】: Audio-visual multimodal integration; acoustic feature extraction; art creation

776. Arnold: An Autonomous Agent to Play FPS Games.

Paper Link】 【Pages】:5085-5086

【Authors】: Devendra Singh Chaplot ; Guillaume Lample

【Abstract】: Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present Arnold, a completely autonomous agent to play First-Person Shooter Games using only screen pixel data and demonstrate its effectiveness on Doom, a classical first-person shooter game. Arnold is trained with deep reinforcement learning using a recent Action-Navigation architecture, which uses separate deep neural networks for exploring the map and fighting enemies. Furthermore, it utilizes a lot of techniques such as augmenting high-level game features, reward shaping and sequential updates for efficient training and effective performance. Arnold outperforms average humans as well as in-built game bots on different variations of the deathmatch. It also obtained the highest kill-to-death ratio in both the tracks of the Visual Doom AI Competition and placed second in terms of the number of frags.

【Keywords】: Deep Learning; Reinforcement Learning; Machine Learning; Artificial Intelligence; FPS Games; Doom

777. A Virtual Personal Fashion Consultant: Learning from the Personal Preference of Fashion.

Paper Link】 【Pages】:5087-5088

【Authors】: Jingtian Fu ; Yejun Liu ; Jia Jia ; Yihui Ma ; Fanhang Meng ; Huan Huang

【Abstract】: Besides fashion, personalization is another important factor of wearing. How to balance fashion trends and personal preference to better appreciate wearing is a non-trivial task. In previous work we developed a demo, Magic Mirror, to recommend clothing collocation based on the fashion trend. However, the diversity of people’s aesthetics is huge. In order to meet different demands, Magic Mirror is upgraded in this paper, so that it can give recommendations by considering both the fashion trend and personal preference, and work as a private clothing consultant. For more suitable recommendations, the virtual consultant learns users’ tastes and preferences from their behaviors by using a genetic algorithm. Users can get collocations or matched top/bottom recommendations after choosing occasion and style. They can also get a report about their fashion state and aesthetic standpoint on recent wearing.

【Keywords】: clothing collocation, aesthetics learning, personally recommend

778. Efficient Clinical Concept Extraction in Electronic Medical Records.

Paper Link】 【Pages】:5089-5090

【Authors】: Yufan Guo ; Deepika Kakrania ; Tyler Baldwin ; Tanveer F. Syeda-Mahmood

【Abstract】: Automatic identification of clinical concepts in electronic medical records (EMR) is useful not only in forming a complete longitudinal health record of patients, but also in recovering missing codes for billing, reducing costs, finding more accurate clinical cohorts for clinical trials, and enabling better clinical decision support. Existing systems for clinical concept extraction are mostly knowledge-driven, relying on exact match retrieval from original or lemmatized reports, and very few of them are scaled up to handle large volumes of complex, diverse data. In this demonstration we will showcase a new system for real-time detection of clinical concepts in EMR. The system features a large vocabulary of over 5.6 million concepts. It achieves high precision and recall, with good tolerance to typos through the use of a novel prefix indexing and subsequence matching algorithm, along with a recursive negation detector based on efficient, deep parsing. Our system has been tested on over 12.9 million reports of more than 200 different types, collected from 800,000+ patients. A comparison with the state of the art shows that it outperforms previous systems in addition to being the first system to scale to such large collections.

【Keywords】: clinical concept extraction; clinical NLP; information extraction

779. Integrating Verbal and Nonverbal Input into a Dynamic Response Spoken Dialogue System.

Paper Link】 【Pages】:5091-5092

【Authors】: Ting-Yao Hu ; Chirag Raman ; Salvador Medina Maza ; Liangke Gui ; Tadas Baltrusaitis ; Robert E. Frederking ; Louis-Philippe Morency ; Alan W. Black ; Maxine Eskénazi

【Abstract】: In this work, we present a dynamic response spoken dialogue system (DRSDS). It is capable of understanding the verbal and nonverbal language of users and making instant, situation-aware responses. Incorporating two external systems, MultiSense and email summarization, we built an email reading agent on a mobile device to show the functionality of DRSDS.

【Keywords】: nonverbal language understanding; dynamic natural language generation; email summarization; emotion detection

Paper Link】 【Pages】:5093-5094

【Authors】: Lu Jiang ; Liangliang Cao ; Yannis Kalantidis ; Sachin Farfade ; Alexander G. Hauptmann

【Abstract】: The boom of mobile devices and cloud services has led to an explosion of personal photo and video data. However, due to the missing user-generated metadata such as titles or descriptions, it usually takes a user a lot of swipes to find some video on the cell phone. To solve the problem, we present an innovative idea called Visual Memory QA which allows a user not only to search but also to ask questions about her daily life captured in the personal videos. The proposed system automatically analyzes the content of personal videos without user-generated metadata, and offers a conversational interface to accept and answer questions. To the best of our knowledge, it is the first to answer personal questions discovered in personal photos or videos. The example questions are "what was the last time we went hiking in the forest near San Francisco?"; "did we have pizza last week?"; "with whom did I have dinner in AAAI 2015?".

【Keywords】: Personal Photo; Personal Video; Video Content Understanding; Question Answering; Neural Networks

781. Sarcasm Suite: A Browser-Based Engine for Sarcasm Detection and Generation.

Paper Link】 【Pages】:5095-5096

【Authors】: Aditya Joshi ; Diptesh Kanojia ; Pushpak Bhattacharyya ; Mark James Carman

【Abstract】: Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection modules use four kinds of incongruity: sentiment incongruity, semantic incongruity, historical context incongruity and conversational context incongruity. The sarcasm generation module is a chatbot that responds sarcastically to user input. With a visually appealing interface that indicates predictions using 'faces' of our co-authors from our past papers, Sarcasm Suite is our first demonstration of our work in computational sarcasm.

【Keywords】: sentiment analysis; sarcasm; computational sarcasm; sarcasm generation; natural language generation

782. An Event Reconstruction Tool for Conflict Monitoring Using Social Media.

Paper Link】 【Pages】:5097-5098

【Authors】: Junwei Liang ; Desai Fan ; Han Lu ; Poyao Huang ; Jia Chen ; Lu Jiang ; Alexander G. Hauptmann

【Abstract】: What happened during the Boston Marathon in 2013? Nowadays, at any major event, lots of people take videos and share them on social media. To fully understand exactly what happened in these major events, researchers and analysts often have to examine thousands of these videos manually. To reduce this manual effort, we present an investigative system that automatically synchronizes these videos to a global timeline and localizes them on a map. In addition to alignment in time and space, our system combines various functions for analysis, including gunshot detection, crowd size estimation, 3D reconstruction and person tracking. To our best knowledge, this is the first time a unified framework has been built for comprehensive event reconstruction for social media videos.

【Keywords】: Event Reconstruction;Video Analysis;Video Synchronization;Video Localization;3D Reconstruction;Person Tracking

783. Webly-Supervised Learning of Multimodal Video Detectors.

Paper Link】 【Pages】:5099-5100

【Authors】: Junwei Liang ; Lu Jiang ; Alexander G. Hauptmann

【Abstract】: Given any complicated or specialized video content search query, e.g. "Batkid (a kid in batman costume)" or "destroyed buildings", existing methods require manually labeled data to build detectors for searching. We present a demonstration of an artificial intelligence application, Webly-labeled Learning (WELL) that enables learning of ad-hoc concept detectors over unlimited Internet videos without any manual annotations. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title, which provides a type of weak annotations or labels of the video content. To leverage this information, our system employs state-of-the-art webly-supervised learning (WELL) (Liang et al.). WELL considers multi-modal information including deep learning visual, audio and speech features, to automatically learn accurate video detectors based on the user query. The learned detectors from a large number of web videos allow users to search relevant videos over their personal video archives, not requiring any textual metadata, but as convenient as searching on YouTube.

【Keywords】: Video Analysis;Webly-supervised Learning;Video Classification

784. SenseRun: Real-Time Running Routes Recommendation towards Providing Pleasant Running Experiences.

Paper Link】 【Pages】:5101-5102

【Authors】: Jiayu Long ; Jia Jia ; Han Xu

【Abstract】: In this demo, we develop a mobile running application, SenseRun, to involve landscape experiences for routes recommendation. We first define landscape experiences, perceived enjoyment from landscape as motivators for running, by public natural area and traffic density. Based on landscape experiences, we categorize locations into 3 types (natural, leisure, traffic space) and assign them different basic weights. Real-time context factors (weather, season and hour of the day) are involved to adjust the weights. We propose a multi-attribute method to recommend routes with weights based on an MVT (Marginal Value Theorem) k-shortest-paths algorithm. We also use a landscape-awareness sounds algorithm as a supplement to landscape experiences. Experimental results show that SenseRun can enhance running experiences and is helpful to promote regular physical activities.

【Keywords】: routes recommendation; soundscape construction; landscape; running experiences

785. Natural Language Dialogue for Building and Learning Models and Structures.

Paper Link】 【Pages】:5103-5104

【Authors】: Ian E. Perera ; James F. Allen ; Lucian Galescu ; Choh Man Teng ; Mark H. Burstein ; Scott E. Friedman ; David D. McDonald ; Jeffrey M. Rye

【Abstract】: We demonstrate an integrated system for building and learning models and structures in both a real and virtual environment. The system combines natural language understanding, planning, and methods for composition of basic concepts into more complicated concepts. The user and the system interact via natural language to jointly plan and execute tasks involving building structures, with clarifications and demonstrations to teach the system along the way. We use the same architecture for building and simulating models of biology, demonstrating the general-purpose nature of the system where domain-specific knowledge is concentrated in sub-modules with the basic interaction remaining domain-independent. These capabilities are supported by our work on semantic parsing, which generates knowledge structures to be grounded in a physical representation, and composed with existing knowledge to create a dynamic plan for completing goals. Prior work on learning from natural language demonstrations enables learning of models from very few demonstrations, and features are extracted from definitions in natural language. We believe this architecture for interaction opens up a wide possibility of human-computer interaction and knowledge transfer through natural language.

【Keywords】: natural language processing; dialogue systems; situated agents; planning; biology; learning; symbol grounding; natural language understanding; human-computer interaction

786. From Semantic Models to Cognitive Buildings.

Paper Link】 【Pages】:5105-

【Authors】: Joern Ploennigs ; Anika Schumann

【Abstract】: Today's operation of buildings is either based on simple dashboards that are not scalable to thousands of sensor data or on rules that provide very limited fault information only. In either case considerable manual effort is required for diagnosing building operation problems related to energy usage or occupant comfort. We present a Cognitive Building demo that uses (i) semantic reasoning to model physical relationships of sensors and systems, (ii) machine learning to predict and detect anomalies in energy flow, occupancy and user comfort, and (iii) speech-enabled Augmented Reality interfaces for immersive interaction with thousands of devices. Our demo analyzes data from more than 3,300 sensors and shows how we can automatically diagnose building operation problems.

【Keywords】: Cognitive IoT; Semantic Models; Reasoning; Machine-Learning; Augmented Reality