Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020, Cincinnati, Ohio, USA, May 7-9, 2020. SIAM 【DBLP Link】
【Paper Link】 【Pages】:1-9
【Authors】: Akihiro Yamaguchi ; Shigeru Maya ; Kohei Maruchi ; Ken Ueno
【Abstract】: Shapelets are time-series segments effective for classifying time-series instances. Joint learning of both classifiers and shapelets has been studied in recent years, because such methods provide both interpretable results and superior accuracy. Partial Area Under the ROC curve (pAUC) for a low range of False Positive Rates (FPR) is an important performance measure for practical cases in industries such as medicine, manufacturing, and maintenance. In this study, we propose a method that jointly learns both shapelets and a classifier for pAUC optimization in any FPR range, including the full AUC. We demonstrate the superiority of the proposed method on UCR time-series datasets and its effectiveness in industrial case studies.
【Keywords】:
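The pAUC measure central to the paper above can be computed directly from the ROC curve. A minimal NumPy sketch of evaluating pAUC over a restricted FPR range follows; the function name and the normalisation by `max_fpr` are illustrative choices, and the joint shapelet/classifier learning itself is not reproduced here.

```python
import numpy as np

def partial_auc(y_true, y_score, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR in [0, max_fpr],
    normalised by max_fpr so that a perfect ranker scores 1.0."""
    order = np.argsort(-np.asarray(y_score))
    y = np.asarray(y_true)[order]
    P, N = y.sum(), len(y) - y.sum()
    # ROC curve points, from the highest-score threshold downward
    tpr = np.concatenate(([0.0], np.cumsum(y) / P))
    fpr = np.concatenate(([0.0], np.cumsum(1 - y) / N))
    # truncate the curve at max_fpr and integrate with the trapezoid rule
    stop = np.searchsorted(fpr, max_fpr, side="right")
    fpr_c = np.concatenate((fpr[:stop], [max_fpr]))
    tpr_c = np.concatenate((tpr[:stop], [np.interp(max_fpr, fpr, tpr)]))
    area = np.sum((fpr_c[1:] - fpr_c[:-1]) * (tpr_c[1:] + tpr_c[:-1]) / 2)
    return area / max_fpr
```

Setting `max_fpr=1.0` recovers the full AUC, which is how one objective covers "any FPR range, including the full AUC".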
【Paper Link】 【Pages】:10-18
【Authors】: Erik Scharwächter ; Emmanuel Müller
【Abstract】: In many application domains, time series are monitored to detect extreme events like technical faults, natural disasters, or disease outbreaks. Unfortunately, it is often non-trivial to select both a time series that is informative about events and a powerful detection algorithm: detection may fail because the detection algorithm is not suitable, or because there is no shared information between the time series and the events of interest. In this work, we thus propose a non-parametric statistical test for shared information between a time series and a series of observed events. Our test allows us to identify time series that carry information on event occurrences without committing to a specific event detection methodology. In a nutshell, we test for divergences of the value distributions of the time series at increasing lags after event occurrences with a multiple two-sample testing approach. In contrast to related tests, our approach is applicable to time series over arbitrary domains, including multivariate numeric data, strings, or graphs. We perform a large-scale simulation study to show that it outperforms or is on par with related tests on our task for univariate time series. We also demonstrate the real-world applicability of our approach on datasets from social media and smart home environments.
【Keywords】:
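The lagged multiple-two-sample idea above can be sketched for the univariate case with a KS test per lag and a Bonferroni correction over lags. This is a simplification under stated assumptions (the paper's test handles arbitrary domains and uses its own correction); function name and defaults are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

def event_information_test(series, event_times, max_lag=3, alpha=0.05):
    """For each lag d = 1..max_lag, compare the distribution of values
    observed d steps after an event against all remaining values with a
    two-sample KS test; reject if any lag is significant at the
    Bonferroni-corrected level alpha / max_lag."""
    series = np.asarray(series)
    n = len(series)
    pvals = []
    for d in range(1, max_lag + 1):
        idx = [t + d for t in event_times if t + d < n]
        after = series[idx]                 # values d steps after events
        mask = np.ones(n, dtype=bool)
        mask[idx] = False
        pvals.append(ks_2samp(after, series[mask]).pvalue)
    return min(pvals) < alpha / max_lag, pvals
```

A rejection says the series carries information about the events, without committing to any particular detection algorithm.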
【Paper Link】 【Pages】:19-27
【Authors】: Yang Li ; Buyue Qian ; Xianli Zhang ; Hui Liu
【Abstract】: Predicting the future health conditions of patients based on Electronic Health Records (EHR) is an important research topic. Due to the temporal nature of EHR data, the major challenge is how to properly model the sequences of patient visits. Recurrent Neural Networks (RNNs) with attention mechanisms are widely employed to address this challenge, but they are often vulnerable to data insufficiency. Lately, predictive models with the guidance of medical knowledge have been proposed to solve this problem and achieve superior performance. Although these models learn reasonable embeddings (infused with knowledge) for clinical variables, they are not able to fully make use of the underlying information in the knowledge graph. To address this, we propose an end-to-end robust solution, namely Graph Neural networks based Diagnosis Prediction (GNDP), to predict future conditions for patients. Compared with existing methods, GNDP learns the spatial and temporal patterns from patients' sequential graphs, into which domain knowledge is naturally infused. We evaluate our GNDP model against a set of state-of-the-art methods on two real-world EHR datasets and the results demonstrate that our approach significantly outperforms the baseline methods.
【Keywords】:
【Paper Link】 【Pages】:28-36
【Authors】: Tyler Wilson ; Pang-Ning Tan ; Lifeng Luo
【Abstract】: Convolutional methods are useful for modeling geospatial data as they enable the extraction of broad-scale spatial patterns from the local attributes observed at each location. The weighted aggregation performed by the convolutional operator also helps to smoothen the noisy data collected at the given locations. However, current convolutional methods are primarily designed to learn the spatial dependencies of the input (predictor) variables only. The recent success in applying multi-task learning to various geospatial prediction problems shows that the model parameters themselves may also be spatially related. This suggests the possibility of employing convolutional methods to learn the spatial dependencies among the model parameters at different locations, especially in situations where there are limited training data available to fit accurate local models. In this paper, we investigate three different ways to incorporate convolutions into geospatial prediction models—convolutions on the predictors, model parameters, or a hybrid of both. We provide guidance on when convolution of each type can be fruitfully applied and verify their effectiveness using both synthetic and real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:37-45
【Authors】: Suhas Thejaswi ; Aristides Gionis
【Abstract】: In this paper, we study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multi-set of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several interesting applications, for example, recommending tours for tourists, or searching for abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we define, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution can scale to massive graphs with up to a hundred million edges, despite the problems being NP-hard. Our implementation, which is publicly available, is highly optimized and exhibits edge-linear scalability in practice. For example, in a real-world graph dataset with more than six million edges and a multi-set query with ten colors, we can extract an optimal solution in less than eight minutes on a Haswell desktop with four cores.
【Keywords】:
【Paper Link】 【Pages】:46-54
【Authors】: Suwen Lin ; Xian Wu ; Gonzalo J. Martínez ; Nitesh V. Chawla
【Abstract】: Missing data points are a common problem associated with data collected from wearables. This problem is particularly compounded if different subjects have different aspects of missingness associated with them; that is, varying degrees of compliance behavior of individuals (participants) with respect to wearables, as well as personal changes in lifestyle and health impacting heart rate. Moreover, despite the varying degree of compliance behavior, the wearable itself might have glitches that lead to observations being dropped. Thus, any missing value imputation in such data has to generalize not only to the wearable behavior but also to the participant behavior. In this paper, we present a deep learning based approach for imputing missing values in heart rate time series data collected from a participant's wearable. In particular, for each participant, we first leverage his/her historical heart rate records as a reference set to extract the underlying personalized characteristics, and then impute the missing heart rate values by considering both contextual information of the current observations and the user's features learned from previous records. Adversarial training is applied to guide the learning process, which helps impute more plausible heart rate series by taking human health conditions, e.g., heart rate fluctuations, into account. Extensive experiments are conducted on two real-world datasets to show the superiority of our proposed method over state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:55-63
【Authors】: Julia Lasserre ; Abdul-Saboor Sheikh ; Evgenii Koriagin ; Urs Bergmann ; Roland Vollgraf ; Reza Shirvany
【Abstract】: Fashion e-commerce has enjoyed an exponential growth in the last few years. A key challenge for market players is to offer customers a personalized experience and to suggest relevant articles. In that respect, although product recommendation is a well-studied field, size and fit recommendation is still in its infancy. The size and fit topic is a very challenging problem as data is extremely sparse and noisy. Most approaches so far have exploited traditional machine learning techniques. In this work, we bring forward a meta-learning approach using an underlying deep neural network. The advantage of such an approach lies in its ability to exploit large scale data, learn across fashion categories, and absorb new data efficiently without re-training. We benchmark our method against 3 recent methods proven successful in the domain, and demonstrate various strengths of the proposed approach. To that end, we use a large-scale anonymized dataset of about 9.4 million customer-size interactions, collected over 5 years from around 384k customers.
【Keywords】:
【Paper Link】 【Pages】:64-72
【Authors】: Zhiwei Liu ; Mengting Wan ; Stephen Guo ; Kannan Achan ; Philip S. Yu
【Abstract】: Within-basket recommendation reduces the exploration time of users, where the user's intention of the basket matters. The intent of a shopping basket can be retrieved from both user-item collaborative filtering signals and multi-item correlations. By defining a basket entity to represent the basket intent, we can model this problem as a basket-item link prediction task in the User-Basket-Item (UBI) graph. Previous work solves the problem by leveraging user-item interactions and item-item interactions simultaneously. However, the collectivity and heterogeneity characteristics of this graph have hardly been investigated before. Collectivity defines the semantics of each node, which should be aggregated from both directly and indirectly connected neighbors. Heterogeneity comes from multi-type interactions as well as multi-type nodes in the UBI graph. To this end, we propose a new framework named BasConv, which is based on the graph convolutional neural network. Our BasConv model has three types of aggregators specifically designed for three types of nodes. They collectively learn node embeddings from both neighborhood and high-order context. Additionally, the interactive layers in the aggregators can distinguish different types of interactions. Extensive experiments on two real-world datasets prove the effectiveness of BasConv.
【Keywords】:
【Paper Link】 【Pages】:73-81
【Authors】: Hengrui Zhang ; Julian McAuley
【Abstract】: Graph-based recommendation algorithms treat user-item interactions as bipartite graphs, based on which low-dimensional vector representations of users and items seek to preserve the relationships among them. Previous methods usually capture users' preferences by directly learning first-order neighborhood patterns for each node, which limits their ability to exploit the similarity between two distant users/items as well as a user's preferences toward distant items. To address this potential weakness, in this paper, we propose SMOG-CF (Stacked Mixed-Order Graph Convolutional Networks for Collaborative Filtering), a GCN-based framework that can directly capture high-order connectivity among nodes. Instead of implicitly capturing high-order connectivity through embedding propagation, SMOG-CF facilitates ‘path-level’ information propagation between neighboring nodes at any order. The matrix form of our embedding propagation formulas yields a model that is easy to deploy and can be extended to a general framework by adopting various information construction and aggregation equations. Experiments on several datasets of varying scale demonstrate the efficacy of our model.
【Keywords】:
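The high-order connectivity that SMOG-CF captures rests on a standard matrix form of neighbourhood propagation: symmetrically normalise the interaction graph's adjacency matrix and apply it k times, so each embedding directly mixes in its k-hop neighbourhood. The NumPy sketch below shows that generic building block only, not SMOG-CF's own aggregation equations; names are illustrative.

```python
import numpy as np

def propagate(adj, emb, order):
    """Order-k embedding propagation: emb_k = (D^-1/2 A D^-1/2)^k emb,
    where A is the adjacency matrix and D its degree matrix."""
    deg = adj.sum(axis=1)
    d = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d[nz] = deg[nz] ** -0.5          # D^-1/2, skipping isolated nodes
    a_hat = d[:, None] * adj * d[None, :]
    out = emb
    for _ in range(order):
        out = a_hat @ out
    return out
```

On a three-node path graph, one propagation step leaves the two end nodes unaware of each other, while two steps connect them, which is the distant-user/distant-item signal the abstract argues first-order methods miss.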
【Paper Link】 【Pages】:82-90
【Authors】: Rui Ding ; Guibing Guo ; Xiaochun Yang ; Bowei Chen ; Zhirong Liu ; Xiuqiang He
【Abstract】: Recently, GAN-based collaborative filtering methods, which can learn remarkable user and item representations, have gained increasing attention in recommendation tasks. However, these existing GAN-based methods mainly suffer from two limitations: (1) Their training is not comprehensive: because the generator may accidentally sample real items as fake ones, contradictory labels emerge for the same items, so the discriminator may be trained misleadingly and converge prematurely. (2) They fail to consider implicit friends (users with the same interests), which severely limits recommendation performance. In this paper, we propose BiGAN, an innovative bidirectional adversarial recommendation model that alleviates the limitations mentioned above. It consists of two GANs, namely ForwardGAN and BackwardGAN. Specifically, ForwardGAN learns to generate a group of items a given user might interact with, aiming to ensure that the discriminator Df can be trained effectively. BackwardGAN, in turn, fully exploits implicit friends with similar behaviors and propagates them back to ForwardGAN, where a similarity exploration strategy is implemented to obtain better user representations. The two GANs are thus trained jointly in a circle, where the improvement of one GAN enhances the other, leading to promising user and item representations. In our experiments, we demonstrate that our model is superior to other state-of-the-art recommenders.
【Keywords】:
【Paper Link】 【Pages】:91-99
【Authors】: Xunqiang Jiang ; Binbin Hu ; Yuan Fang ; Chuan Shi
【Abstract】: Recommender systems play an important role in helping users discover items of interest from a large resource collection in various online services. Although current deep neural network-based collaborative filtering methods have achieved state-of-the-art performance in recommender systems, they still face a few major weaknesses. Most importantly, such deep methods usually focus on the direct interaction between users and items only, without explicitly modeling high-order co-occurrence contexts. Furthermore, they treat the observed data uniformly, without fine-grained differentiation of importance or relevance in the user-item interactions and high-order co-occurrence contexts. Inspired by recent progress in memory networks, we propose a novel multiplex memory network for collaborative filtering (MMCF). More specifically, MMCF leverages a multiplex memory layer consisting of an interaction memory and two co-occurrence context memories simultaneously, in order to jointly capture and locate important and relevant information in both user-item interactions and co-occurrence contexts. Lastly, we conduct extensive experiments on four datasets, and the results show the superior performance of our model in comparison with a suite of state-of-the-art methods.
【Keywords】:
【Paper Link】 【Pages】:100-108
【Authors】: Buru Chang ; Yookyung Koh ; Donghyeon Park ; Jaewoo Kang
【Abstract】: Successive point-of-interest (POI) recommendation based on user check-in histories plays an important role in mobile-based social media platforms. Although a large amount of check-in data including textual content is generated from such platforms, most successive POI recommendation models do not leverage textual contents that provide useful information for understanding user interests. To address this problem, we propose a new content-aware successive POI recommendation (CAPRE) model in this paper. Based on a multi-head attention mechanism and a character-level convolutional neural network, CAPRE encodes user-generated textual contents into content embeddings to capture user interests. Based on long short-term memory (LSTM) networks, CAPRE captures content-aware user behavior patterns from the encoded content embeddings. Evaluation results on real-world datasets show that CAPRE achieves state-of-the-art recommendation performance.
【Keywords】:
【Paper Link】 【Pages】:109-117
【Authors】: Zahra Ghafoori ; Christopher Leckie
【Abstract】: Deep learning is increasingly used for unsupervised feature extraction and anomaly detection in big datasets. Most deep learning based anomaly detection techniques separately train a neural network for feature extraction, then apply a traditional anomaly detection method on the extracted features. These hybrid techniques have achieved higher accuracy than traditional anomaly detection methods and reconstruction-error-based deep autoencoders. However, recent research demonstrates that jointly optimising the objectives of the deep network and the anomaly detection technique in a hybrid architecture substantially improves detection performance. Existing methods that use this objective assume that the normal (i.e., non-anomalous) data comes from a single distribution. In this paper, we show that violation of this assumption negatively affects performance of these methods and creates model bias in favour of anomalies. We propose Deep Multi-sphere Support Vector Data Description, which jointly optimises the objectives of the deep network and anomaly detection. It generates useful and discriminative features by embedding normal data with a multi-modal distribution into multiple data-enclosing hyperspheres with minimum volume. We empirically show that our proposed method outperforms state-of-the-art shallow and deep anomaly detection methods.
【Keywords】:
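The geometric intuition behind the multi-sphere idea above can be illustrated without the deep network: cluster the multi-modal normal data and enclose each cluster in a small hypersphere, so a point is anomalous when it falls outside every sphere. The sketch below uses raw features and plain k-means in place of the paper's jointly learned embedding; it is an illustration, not the proposed method.

```python
import numpy as np

def fit_multispheres(X, k=2, iters=20, q=0.95, seed=0):
    """Cluster normal data into k groups (plain k-means standing in for
    the learned deep embedding) and enclose each group in a hypersphere
    whose radius is the q-quantile of member distances to the centre."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):  # Lloyd iterations
        assign = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if (assign == j).any()
                            else centers[j] for j in range(k)])
    radii = np.array([np.quantile(np.linalg.norm(X[assign == j] - centers[j],
                                                 axis=1), q)
                      for j in range(k)])
    return centers, radii

def anomaly_score(x, centers, radii):
    """Positive iff x lies outside its nearest hypersphere."""
    d = np.linalg.norm(centers - x, axis=1)
    j = d.argmin()
    return d[j] - radii[j]
```

A single-sphere model fitted to two well-separated modes would have to grow one huge sphere covering the gap between them, which is exactly the bias in favour of anomalies the abstract describes.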
【Paper Link】 【Pages】:118-126
【Authors】: Adrian Englhardt ; Holger Trittenbach ; Dennis Vetter ; Klemens Böhm
【Abstract】: Active learning methods collect annotations in the form of class labels, often from human experts, to improve upon some classification task. In many cases, one can collect annotations for a batch of observations at a time, e.g., when several annotators are available. This can make the annotation process more efficient, both regarding human and computational resources. However, selecting a good batch is difficult. It requires understanding several trade-offs among the costs of classifier training, batch selection, annotation effort, and classification accuracy. For one-class classification, a very important application of active learning, batch selection has not received any attention in the literature so far. In this article, we strive to find a sweet spot between the costs of batch-mode learning and classification accuracy. To this end, we first frame batch selection as an optimization problem. We then propose several strategies to identify good batches, discuss their properties, and evaluate them on real-world data. A core result is that a sweet spot indeed exists, with active learning costs reduced by up to an order of magnitude compared to a sequential procedure, without sacrificing accuracy.
【Keywords】:
【Paper Link】 【Pages】:127-135
【Authors】: Vincent Vercruyssen ; Wannes Meert ; Jesse Davis
【Abstract】: Given its broad application potential, time series anomaly detection has become a crucial data mining task. Its goal is to identify periods of a time series where there is a deviation from the expected behavior. Existing approaches focus on analyzing whether the currently observed behavior differs from previously seen, normal behavior. In contrast, this paper tackles the task where the absence of a previously observed behavior is indicative of an anomaly. In other words, a pattern that is expected to recur in the time series is absent. In real-world use cases, absent patterns can be linked to serious problems. For instance, if a scheduled, regular maintenance operation of a machine does not take place, this can be harmful to the machine at a later time. In this paper, we introduce the task of detecting when a specific pattern is absent in a real-valued time series. We propose a novel technique called FZapPa that can address this task. Empirically, FZapPa outperforms existing anomaly detection techniques on a benchmark of real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:136-144
【Authors】: Li Zhang ; Yifeng Gao ; Jessica Lin
【Abstract】: Finding an anomalous subsequence in a long time series is a very important but difficult problem. Existing state-of-the-art methods have been focusing on searching for the subsequence that is the most dissimilar to the rest of the subsequences; however, they do not take into account the background patterns that contain the anomalous candidates. As a result, such approaches are likely to miss local anomalies. We introduce a new definition named semantic discord, which incorporates the context information from larger subsequences containing the anomaly candidates. We propose an efficient algorithm with a derived lower bound that is up to 3 orders of magnitude faster than the brute force algorithm on real-world data. Through extensive experiments, we demonstrate that our method significantly outperforms the state-of-the-art methods in locating anomalies. We further explain the interpretability of semantic discord.
【Keywords】:
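The brute-force discord search that the paper's lower bound accelerates can be sketched in a few lines: the discord is the subsequence whose nearest non-overlapping (non-self-matching) neighbour is farthest away. This NumPy sketch shows the classic baseline, not semantic discord itself, which additionally conditions on the surrounding context.

```python
import numpy as np

def znorm(x):
    """Z-normalise a subsequence; constant subsequences map to zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def brute_force_discord(ts, m):
    """Return the start index and score of the length-m subsequence whose
    nearest non-overlapping neighbour is farthest away (the discord)."""
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    best_i, best_d = -1, -1.0
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        dists[max(0, i - m + 1):i + m] = np.inf  # exclude trivial self-matches
        nn = dists.min()
        if nn > best_d:
            best_i, best_d = i, nn
    return best_i, best_d
```

The double loop costs O(n^2 m); the paper's contribution is precisely avoiding most of this work via a lower bound.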
【Paper Link】 【Pages】:145-153
【Authors】: Adrian Englhardt ; Klemens Böhm
【Abstract】: The quality of a classifier hinges on the availability of training data. In scenarios where data collection is restricted or expensive, e.g., compute-intensive simulations, training data may be small and/or biased. In principle, data synthesis then allows to extend the data set. Yet it is difficult for a user to extend the data without any guidance when the data space is unbounded or of high dimensionality. In this article we target the domain expansion problem, i.e., expanding the classifier knowledge beyond an initial sample that completely falls into one class. We first propose a general framework for query synthesis in the one-class setting. Then we present a new query synthesis strategy to quickly explore the data space beyond the initial sample. For the evaluation we derive three options to simulate an oracle in the one-class setting that can answer arbitrary queries. Experiments on both synthetic and real world data demonstrate that our new query strategy indeed expands the knowledge of a one-class classifier beyond a small and biased initial sample. Our strategy outperforms realistic baselines on most domain expansion problems.
【Keywords】:
【Paper Link】 【Pages】:154-162
【Authors】: Mehadi Hassen ; Philip K. Chan
【Abstract】: In this paper, we present a neural network based representation for the Open Set Recognition problem. In this representation instances from the same class are close to each other while instances from different classes are further apart. When used for Open Set Recognition tasks, evaluated on three datasets from two different domains, the proposed approach results in a statistically significant improvement compared to other approaches.
【Keywords】:
【Paper Link】 【Pages】:163-171
【Authors】: Chi-Cheng Chiu ; Pin-Yen Lin ; Chih-Jen Lin
【Abstract】: Coordinate descent (CD) methods have been a state-of-the-art technique for training large-scale linear SVM. The most common setting is to solve the dual problem of an SVM formulation without the bias term (or an SVM formulation by embedding the bias term in the weight vector). The reason for omitting the bias term is that the dual SVM then no longer has a linear constraint and the CD procedure of updating one variable at a time is very simple. However, some have criticized the decision not to consider the bias term. To understand the role of the bias term in the design of CD methods for linear SVM, we give a thorough study on two-variable CD. First, if the bias term is not considered, we develop a two-variable CD that is competitive with the commonly used one-variable CD and is superior for difficult problems. The procedure is simple and has theoretical linear-rate convergence. Second, we investigate two-variable CD for linear SVM with the bias term. Analysis shows that CD is much less efficient for such a setting. Therefore, we conclude that in using CD for linear SVM, in general the bias term should not be considered.
【Keywords】:
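The one-variable dual CD baseline referenced above (the standard LIBLINEAR-style update for L1-loss linear SVM without a bias term) can be sketched as follows. Without the bias term, the dual has only box constraints 0 <= alpha_i <= C and each coordinate update is a closed-form clip. This is the baseline setting, not the paper's two-variable method.

```python
import numpy as np

def dual_cd_linear_svm(X, y, C=1.0, epochs=100):
    """One-variable dual coordinate descent for L1-loss linear SVM
    without a bias term. Labels y must be +1/-1."""
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                   # w = sum_i alpha_i * y_i * x_i
    qii = (X ** 2).sum(axis=1)        # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in range(n):
            if qii[i] == 0:
                continue
            grad = y[i] * w.dot(X[i]) - 1.0       # dual partial derivative
            new = min(max(alpha[i] - grad / qii[i], 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]   # keep w in sync with alpha
            alpha[i] = new
    return w
```

Maintaining `w` incrementally is what makes each update O(d); adding a bias term reintroduces the equality constraint sum_i alpha_i y_i = 0, which is exactly why the paper finds CD much less efficient in that setting.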
【Paper Link】 【Pages】:172-180
【Authors】: Naoto Ohsaka ; Tomoya Sakai ; Akihiro Yabe
【Abstract】: Predictive optimization is a framework for designing an entire data-analysis pipeline that comprises both prediction and optimization, to be able to maximize overall throughput performance. In practical demand analysis, knowledge of hierarchies, which might be geographical or categorical, is recognized as useful, though such additional knowledge has not been taken into account in existing predictive optimization. In this paper, we propose a novel hierarchical predictive optimization pipeline that is able to deal with a wide range of applications including inventory management. Based on an existing hierarchical demand prediction model, we present a stochastic matching framework that can manage prediction-uncertainty in decision making. We further provide a greedy approximation algorithm for solving demand matching on hierarchical structures. In experimental evaluations on both artificial and real-world data, we demonstrate the effectiveness of our proposed hierarchical-predictive-optimization pipeline.
【Keywords】:
【Paper Link】 【Pages】:181-189
【Authors】: Hung-Yi Chou ; Pin-Yen Lin ; Chih-Jen Lin
【Abstract】: One-class support vector machines (SVM) and support vector data description (SVDD) are two effective outlier detection techniques. They have been successfully applied to many applications under the kernel settings, but for some high dimensional data, linear rather than kernel one-class SVM and SVDD may be more suitable. Past developments on kernel and linear classification have indicated that specially designed optimization algorithms can make the training for linear scenarios much faster. However, we point out that because of some differences from standard linear SVM, existing algorithms may not be efficient for one-class scenarios. We then develop some novel coordinate descent methods for linear one-class SVM and SVDD. Experiments demonstrate their superiority in convergence speed.
【Keywords】:
【Paper Link】 【Pages】:190-198
【Authors】: Xinyan Li ; Qilong Gu ; Yingxue Zhou ; Tiancong Chen ; Arindam Banerjee
【Abstract】: While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization dynamics and generalization are still not well understood. In this paper, we present new empirical observations and theoretical results on both the optimization dynamics and generalization behavior of SGD for deep nets based on the Hessian of the training loss and associated quantities. We consider three specific research questions: (1) what is the relationship between the Hessian of the loss and the second moment of stochastic gradients (SGs)? (2) how can we characterize the stochastic optimization dynamics of SGD with fixed step sizes based on the first and second moments of SGs? and (3) how can we characterize a scale-invariant generalization bound of deep nets based on the Hessian of the loss? Throughout the paper, we support theoretical results with empirical observations, with experiments on synthetic data, MNIST, and CIFAR-10, with different batch sizes, and with different difficulty levels by synthetically adding random labels.
【Keywords】:
【Paper Link】 【Pages】:199-207
【Authors】: Peng Xu ; Fred Roosta ; Michael W. Mahoney
【Abstract】: While first-order optimization methods, such as SGD, are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
【Keywords】:
【Paper Link】 【Pages】:208-216
【Authors】: Hongyuan You ; Furkan Kocayusufoglu ; Ambuj K. Singh
【Abstract】: Network regularization is an effective tool for incorporating structural prior knowledge to learn coherent models over networks, and has yielded provably accurate estimates in applications ranging from spatial economics to neuroimaging studies. Recently, there has been an increasing interest in extending network regularization to the spatio-temporal case to accommodate the evolution of networks. However, in both static and spatio-temporal cases, missing or corrupted edge weights can compromise the ability of network regularization to discover desired solutions. To address these gaps, we propose a novel approach—discrepancy-aware network regularization (DANR)—that is robust to inadequate regularizations and effectively captures model evolution and structural changes over spatio-temporal networks. We develop a distributed and scalable algorithm based on alternating direction method of multipliers (ADMM) to solve the proposed problem with guaranteed convergence to global optimum solutions. Experimental results on both synthetic and real-world networks demonstrate that our approach achieves improved performance on various tasks, and enables interpretation of model changes in evolving networks.
【Keywords】:
【Paper Link】 【Pages】:217-225
【Authors】: Guiliang Liu ; Xu Li ; Mingming Sun ; Ping Li
【Abstract】: Open Information Extraction (OIE) is a task of generating structured representations of information from natural language sentences. In recent years, many works have trained an end-to-end OIE extractor based on a Sequence-to-Sequence (Seq2Seq) model and applied the REINFORCE algorithm to update the model. However, the model performance often suffers from a large training variance and limited exploration. This paper introduces a reinforcement learning framework that enables an Advantage Actor-Critic (AAC) algorithm to update the Seq2Seq model with samples from a novel Confidence Exploration (CE). The AAC algorithm reduces the training variance with a fine-grained evaluation of each individual word. The confidence exploration provides effective training samples by exploring the words at key positions. Empirical evaluations demonstrate the leading performance of our Advantage Actor-Critic algorithm and Confidence Exploration over other comparison methods.
【Keywords】:
【Paper Link】 【Pages】:226-234
【Authors】: Rong Zhang ; Qifei Zhou ; Bo Wu ; Weiping Li ; Tong Mo
【Abstract】: Duplicate Question Identification (DQI) improves the processing efficiency and accuracy of large-scale community question answering and automatic QA systems. The purpose of the DQI task is to identify whether the paired questions are semantically equivalent. However, how to distinguish the synonyms or homonyms in paired questions is still challenging. Most previous works focus on word-level or phrase-level semantic differences. We are the first to propose exploring the asking emphasis of a question as a key factor in DQI. Asking emphasis bridges semantic equivalence between two questions. In this paper, we propose an attention model with multi-fusion asking emphasis (MFAE) for DQI. First, BERT is used to obtain dynamic pre-trained word embeddings. Then we get inter- and intra-asking emphasis by summing inter-attention and self-attention, respectively; the idea is that the more a word interacts with others, the more important the word is. Finally, we use eight-way combinations to generate multi-fusion asking emphasis and multi-fusion word representations. Experimental results demonstrate that our model achieves state-of-the-art performance on both the Quora Question Pairs and CQADupStack datasets. In addition, our model can also improve the results for the natural language inference task on the SNLI and MultiNLI datasets. The code is available at https://github.com/rzhangpku/MFAE.
【Keywords】:
【Paper Link】 【Pages】:235-243
【Authors】: Bo Yang ; Kejun Huang ; Nicholas D. Sidiropoulos
【Abstract】: Seeking outside funding is a critical task for many companies, but it is also challenging to identify the potential investors given the heterogeneity in size, interests, and expertise. In this work, we propose to tackle this problem via data-driven approaches. Towards this end, we first harvest relevant publicly available data about institutional investors and their holdings in publicly traded companies, as well as key financial metrics on the same set of public companies. Using these data, we approach the problem of interest as a recommender system with side information. We formulate two principal goals: predicting “missing” (potential) investor holding positions; and providing top-K investor recommendations. For each goal, custom recommendation algorithms are proposed to achieve the corresponding objective. Numerical experiments are carefully designed to validate the effectiveness of the proposed algorithms, which exhibit good performance in practice.
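The holdings-prediction goal is, at its core, matrix completion; a generic alternating-least-squares baseline conveys the setup (an illustration under simplifying assumptions — the paper's custom algorithms additionally exploit side information such as financial metrics):

```python
import numpy as np

def als_complete(R, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Plain alternating least squares for low-rank completion of an
    investor-by-company holdings matrix (generic baseline sketch; the
    paper's algorithms additionally use side information)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, rank))   # investor factors
    V = rng.normal(scale=0.1, size=(n, rank))   # company factors
    I = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(m):                      # fit U rows to observed entries
            obs = mask[i]
            U[i] = np.linalg.solve(V[obs].T @ V[obs] + I, V[obs].T @ R[i, obs])
        for j in range(n):                      # fit V rows symmetrically
            obs = mask[:, j]
            V[j] = np.linalg.solve(U[obs].T @ U[obs] + I, U[obs].T @ R[obs, j])
    return U @ V.T                              # predicted (potential) holdings

R = np.array([[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [0.0, 2.0, 5.0]])
mask = R > 0          # zeros stand for unobserved positions in this toy matrix
R_hat = als_complete(R, mask)
```

The filled-in entries of `R_hat` can then be ranked per investor to produce top-K recommendations, the paper's second goal.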
【Keywords】:
【Paper Link】 【Pages】:244-252
【Authors】: Gianni Costa ; Riccardo Ortale
【Abstract】: We propose a new statistical-learning approach to marrying topic modeling and document clustering. In particular, a Bayesian generative model of text collections is developed, in which the two aforesaid tasks are incorporated as coupled latent factors that govern document wording. The latter consists of word embeddings, so as to capture the semantic and syntactic regularities among words. Collapsed Gibbs sampling is derived mathematically and implemented algorithmically, along with parameter estimation, with the aim of jointly performing topic modeling and document clustering through Bayesian reasoning. Comparative tests on benchmark real-world corpora reveal the effectiveness of the devised approach in clustering collections of text documents and coherently recovering their semantics.
【Keywords】:
【Paper Link】 【Pages】:253-261
【Authors】: Guruprasad Nayak ; Rahul Ghosh ; Xiaowei Jia ; Varun Mithal ; Vipin Kumar
【Abstract】: Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are desired at fine resolutions but available training data is scarce. In this paper, we propose classification algorithms that leverage supervision from coarser resolutions to help train models on finer resolutions. The different resolutions are modeled as different views of the data in a multi-view framework that exploits the complementarity of features across different views to improve models on both views. Unlike traditional multi-view learning problems, the key challenge in our case is that there is no one-to-one correspondence between instances across different views, which requires explicit modeling of the correspondence of instances across resolutions. We propose to use the features of instances at different resolutions to learn this correspondence using an attention mechanism. Experiments on the real-world application of mapping urban areas using satellite observations and on sentiment classification of text data show the effectiveness of the proposed methods.
【Keywords】:
【Paper Link】 【Pages】:262-270
【Authors】: Tomoki Ito ; Kota Tsubouchi ; Hiroki Sakaji ; Tatsuo Yamashita ; Kiyoshi Izumi
【Abstract】: Deep neural networks are powerful for text sentiment analysis; however, in the real world, they cannot be used in situations where explanations are required owing to their black-box property. In response, we propose a novel neural network model called sentiment shift neural network (SSNN) that can explain the process of its sentiment analysis prediction in a way that humans find natural and agreeable. The SSNN has the following three interpretable layers: the word-level original sentiment layer, sentiment shift layer, and word-level contextual sentiment layer. Using these layers, the SSNN can explain the process of its document-level sentiment analysis results in a human-like way. Realizing the interpretability of these layers is a crucial problem. To realize this interpretability, we propose a novel learning strategy called joint sentiment propagation (JSP) learning. Using real textual datasets, we experimentally demonstrate that the proposed JSP learning is effective for improving the interpretability of layers in SSNN and that both the predictability and explanation ability of the SSNN are high.
【Keywords】:
【Paper Link】 【Pages】:271-279
【Authors】: Ruocheng Guo ; Jundong Li ; Huan Liu
【Abstract】: Counterfactual evaluation of novel treatment assignment functions (e.g., advertising algorithms and recommender systems) is one of the most crucial causal inference problems for practitioners. Traditionally, randomized controlled trials (e.g., A/B tests) are performed to evaluate treatment assignment functions. However, they can be time-consuming, expensive, and even unethical in some cases. Therefore, counterfactual evaluation of treatment assignment functions becomes a pressing issue because massive amounts of observational data have become available in the big data era. Counterfactual evaluation requires controlling the influence of hidden confounders – the unmeasured features that causally influence both treatment assignments and outcomes. However, most of the existing methods rely on the assumption of no hidden confounders. This assumption can be untenable in the context of massive observational data. When such data comes with network information, the latter can potentially be useful for correcting hidden confounding bias. As such, we first formulate a novel problem, counterfactual evaluation of treatment assignment functions with networked observational data. Then, we investigate the following research questions: How can we utilize network information in counterfactual evaluation? Can network information improve the estimates in counterfactual evaluation? Toward answering these questions, we first propose a novel framework, Counterfactual Network Evaluator (CONE), which (1) learns partial representations of latent confounders under the supervision of observed treatments and outcomes; and (2) combines them for counterfactual evaluation. Then, through extensive experiments, we corroborate the effectiveness of CONE. The results imply that incorporating network information mitigates hidden confounding bias in counterfactual evaluation.
【Keywords】:
【Paper Link】 【Pages】:280-288
【Authors】: Hongjing Zhang ; S. S. Ravi ; Ian Davidson
【Abstract】: Active learning aims to reduce labeling efforts by selectively asking humans to annotate the most important data points from an unlabeled pool and is an example of human-machine interaction. Though active learning has been extensively researched for classification and ranking problems, it is relatively understudied for regression problems. Most existing active learning methods for regression use the regression function learned at each active learning iteration to select the next informative point to query. This introduces several challenges, such as handling noisy labels, accounting for parameter uncertainty, and overcoming initially biased training data. Instead, we propose a feature-focused approach that formulates both sequential and batch-mode active regression as a novel bipartite graph optimization problem. We conduct experiments in both noise-free and noisy settings. Our experimental results on benchmark data sets demonstrate the effectiveness of our proposed approach.
【Keywords】:
【Paper Link】 【Pages】:289-297
【Authors】: Charu C. Aggarwal ; Yao Li ; Philip S. Yu
【Abstract】: Many forms of social, information, and communication network activity create large volumes of graph stream data. In such cases, it is often desirable to track interesting properties of the underlying nodes as they change over time. These dynamic properties can often be represented in the form of time-dependent labels associated with the nodes. Dynamic changes in such node labels may be indicative of important events or patterns of activity. This paper studies the problem of differential classification in graph streams, in which we predict significant classification events, i.e., changes in the classification labels of nodes. Different from the static collective classification problem, this approach focuses on dynamic and real-time detection of changes in node classification, as opposed to the actual classification of nodes. The differential stream classification problem can also be considered a general form of the node-centric event detection problem, in which node labels are used to supervise the detection process. We present experimental results illustrating the effectiveness of our method.
【Keywords】:
【Paper Link】 【Pages】:298-306
【Authors】: Aritra Ghosh ; Saayan Mitra ; Somdeb Sarkhel ; Viswanathan Swaminathan
【Abstract】: Maximizing utility under a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. The policy maximizing the utility is referred to as the optimal bidding strategy. Earlier works on optimal bidding strategies apply model-based batch reinforcement learning methods which cannot generalize to unknown budget and time constraints. Further, the advertiser observes a censored market price, which makes direct evaluation infeasible on batch test datasets. Previous works ignore the losing auctions to alleviate the difficulty with censored states, thus significantly modifying the test distribution. We address the challenge of lacking a clear evaluation procedure, as well as the error propagated through batch reinforcement learning methods in RTB systems. We exploit two conditional independence structures in the sequential bidding process that allow us to propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic. Moreover, the framework allows us to train a model that can generalize to unseen budget conditions rather than being limited to those observed in history. We compare our methods on two real-world RTB datasets with several baselines and demonstrate significantly improved performance under various budget settings.
【Keywords】:
【Paper Link】 【Pages】:307-315
【Authors】: Wentao Wang ; Suhang Wang ; Wenqi Fan ; Zitao Liu ; Jiliang Tang
【Abstract】: In many real-world classification applications such as fake news detection, the training data can be extremely imbalanced, which brings challenges to existing classifiers as the majority classes dominate the loss functions of classifiers. Oversampling techniques such as SMOTE are effective approaches to tackle the class imbalance problem by producing more synthetic minority samples. Despite their success, the majority of existing oversampling methods only consider local data distributions when generating minority samples, which can result in noisy minority samples that do not fit global data distributions or interleave with majority classes. Hence, in this paper, we study the class imbalance problem by simultaneously exploring local and global data information since: (i) the local data distribution could give detailed information for generating minority samples; and (ii) the global data distribution could provide guidance to avoid generating outliers or samples that interleave with majority classes. Specifically, we propose a novel framework GL-GAN, which leverages the SMOTE method to explore local distribution in a learned latent space and employs GAN to capture the global information, so that synthetic minority samples can be generated under even extremely imbalanced scenarios. Experimental results on diverse real data sets demonstrate the effectiveness of our GL-GAN framework in producing realistic and discriminative minority samples for improving the classification performance of various classifiers on imbalanced training data. Our code is available at https://github.com/wentao-repo/GL-GAN.
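The local, SMOTE-style interpolation step that GL-GAN builds on can be sketched as follows (a generic input-space SMOTE sketch for illustration; GL-GAN applies the interpolation in a learned latent space and pairs it with a GAN for global structure):

```python
import numpy as np

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point lies on
    the segment between a minority sample and one of its k nearest
    minority neighbours (input-space illustration only)."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        d[i] = np.inf                       # exclude the point itself
        neighbour = minority[np.argsort(d)[rng.integers(k)]]
        lam = rng.random()                  # interpolation factor in [0, 1)
        new_points.append(x + lam * (neighbour - x))
    return np.array(new_points)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy minority class
synth = smote_sample(X_min)
```

Because each synthetic sample is a convex combination of two nearby minority points, it respects the local distribution — the failure mode the paper targets is that such points may still violate the global distribution or interleave with the majority class, which is what the GAN component guards against.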
【Keywords】:
【Paper Link】 【Pages】:316-324
【Authors】: Jay S. Stanley III ; Scott Gigante ; Guy Wolf ; Smita Krishnaswamy
【Abstract】: We propose a novel framework for combining datasets via alignment of their intrinsic geometry. This alignment can be used to fuse data originating from disparate modalities, or to correct batch effects while preserving intrinsic data structure. Importantly, we do not assume any pointwise correspondence between datasets, but instead rely on correspondence between a (possibly unknown) subset of data features. We leverage this assumption to construct an isometric alignment between the data. This alignment is obtained by relating the expansion of data features in harmonics derived from diffusion operators defined over each dataset. These expansions encode each feature as a function of the data geometry. We use this to relate the diffusion coordinates of each dataset through our assumption of partial feature correspondence. Then, a unified diffusion geometry is constructed over the aligned data, which can also be used to correct the original data measurements. We demonstrate our method on several datasets, showing in particular its effectiveness in biological applications including fusion of single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data measured on the same population of cells, and removal of batch effect between biological samples.
【Keywords】:
【Paper Link】 【Pages】:325-333
【Authors】: Ricky Laishram ; Ahmet Erdem Sariyüce ; Tina Eliassi-Rad ; Ali Pinar ; Sucheta Soundarajan
【Abstract】: In many online social networking platforms, the participation of an individual is motivated by the participation of others. If an individual chooses to leave a platform, this may produce a cascade in which that person's friends then choose to leave, causing their friends to leave, and so on. In some cases, it may be possible to incentivize key individuals to stay active within the network, thus preventing such a cascade. This problem is modeled using the anchored k-core of a network, which, for a network G and set of anchor nodes A, is the maximal subgraph of G in which every node has a total of at least k neighbors between the subgraph and anchors. In this work, we propose Residual Core Maximization (RCM), a novel algorithm for finding b anchor nodes so that the size of the anchored k-core is maximized. We perform a comprehensive experimental evaluation on numerous real-world networks and compare RCM to various baselines. We observe that RCM is more effective and efficient than the state-of-the-art methods: on average, RCM produces anchored k-cores that are 1.65 times larger than those produced by the baseline algorithms and is approximately 500 times faster.
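The anchored k-core itself is straightforward to compute once the anchors are fixed, by iterative peeling that exempts anchors (this sketches the problem definition only; RCM's contribution is the choice of the b anchors):

```python
from collections import defaultdict

def anchored_k_core(edges, k, anchors):
    """Anchored k-core by iterative peeling: repeatedly delete any
    non-anchor node with fewer than k remaining neighbours (sketch of
    the definition; RCM's contribution is choosing the anchors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(nodes):
            if u not in anchors and len(adj[u] & nodes) < k:
                nodes.discard(u)            # u drops out of the platform
                changed = True
    return nodes

edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
core = anchored_k_core(edges, k=2, anchors={5})  # anchoring 5 also keeps 4 in
```

On this toy graph the plain 2-core is {1, 2, 3}: node 5 has degree 1 and drops out, which cascades to node 4. Incentivizing (anchoring) node 5 stops the cascade, so the whole graph survives.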
【Keywords】:
【Paper Link】 【Pages】:334-342
【Authors】: Yi He ; Sheng Chen ; Thu Nguyen ; Bruce A. Wade ; Xindong Wu
【Abstract】: Mining vertex-wise interactions in graphs helps reveal useful information in real-world applications, such as the bioinformatics networks BioGRID and DrugBank and the academic networks DBLP and arXiv. A main challenge in developing a general learning method for this setting is that each vertex may be associated with features from heterogeneous feature spaces, representing very disparate information. Moreover, features could be raw and low-level, leading to sparse representations. Some solutions in this area treat all feature spaces as equally important and concatenate features from heterogeneous feature spaces into a single feature vector. Others harmonize different feature spaces by respecting their relative significance in mining vertex-wise interactions, but require constructing a specialized harmonizing function and/or handcrafting expressive features, both of which entail expert knowledge. Motivated by this observation, we propose a new learning paradigm named Deep Matrix Tri-Factorization (DM3F), which draws insights from deep models: (i) DM3F replaces the linear combination with a neural architecture that can learn an arbitrary harmonizing function from data; and (ii) DM3F allows raw feature inputs and automatically extracts high-level feature representations via a layer-by-layer learning mechanism. These two characteristics make DM3F accessible to users without expert knowledge. DM3F includes two orthogonal and complementary models, allowing an ensemble mechanism to optimize its performance during both training and prediction. A theoretical analysis of DM3F reveals that it possesses several desirable properties, including that it strictly generalizes matrix factorization models. We demonstrate the performance of DM3F on two real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:343-351
【Authors】: Christian Bauckhage ; Rafet Sifa ; Stefan Wrobel
【Abstract】: The combinatorial problem of max-sum diversification asks for a maximally diverse subset of a given set of data. Here, we show that it can be expressed as an Ising energy minimization problem. Given this result, max-sum diversification can be solved on adiabatic quantum computers, and we present proof-of-concept simulations which support this claim. This, in turn, suggests that quantum computing might play a role in data mining. We therefore discuss quantum computing in a tutorial-like manner and elaborate on its current strengths and weaknesses for data analysis.
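The flavour of the reduction can be conveyed with a small brute-force sketch: encode subset selection with binary variables and minimise an Ising/QUBO-style energy whose quadratic term rewards pairwise diversity and whose penalty term enforces the subset size (exhaustive search here stands in for the adiabatic quantum computer, and the penalty weight is an illustrative choice, not the paper's construction):

```python
import itertools
import numpy as np

def max_sum_diversification_qubo(D, k, penalty=100.0):
    """Max-sum diversification as a QUBO/Ising-style energy: binary z_i
    marks selection; minimise -(pairwise diversity) plus a quadratic
    penalty enforcing exactly k selections. Brute force stands in for
    the adiabatic quantum computer; the penalty weight is illustrative."""
    n = len(D)
    best, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits)
        diversity = z @ D @ z / 2.0      # sum of selected pairwise distances
        energy = -diversity + penalty * (z.sum() - k) ** 2
        if energy < best_energy:
            best, best_energy = z, energy
    return best

points = np.array([0.0, 1.0, 2.0, 10.0])        # toy 1-D data
D = np.abs(points[:, None] - points[None, :])   # pairwise distances
z_opt = max_sum_diversification_qubo(D, k=2)    # picks the two farthest points
```

A quantum annealer would minimise the same energy over spin variables s_i = 2*z_i - 1 instead of enumerating all 2^n assignments.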
【Keywords】:
【Paper Link】 【Pages】:352-360
【Authors】: Wenqi Fan ; Yao Ma ; Han Xu ; Xiaorui Liu ; Jianping Wang ; Qing Li ; Jiliang Tang
【Abstract】: Canonical Correlation Analysis (CCA) aims to learn linear projections of two sets of variables such that they are maximally correlated, which is not optimal for variables with non-linear relations. Recent years have witnessed great efforts in developing deep neural network based CCA models, which are able to learn flexible non-linear and highly correlated representations between two variables. In addition to learning representations, generating realistic multi-view samples is also becoming highly desired in many real-world applications. However, the majority of existing CCA models do not provide mechanisms for realistic sample generation. Meanwhile, adversarial learning techniques such as generative adversarial networks have been proven effective in generating realistic samples similar to the real data distribution. Thus, incorporating adversarial learning techniques has great potential to advance Canonical Correlation Analysis. In this paper, we harness the power of adversarial learning techniques to equip Canonical Correlation Analysis with the ability of realistic data generation. In particular, we propose a Deep Adversarial Canonical Correlation Analysis model (DACCA), which can not only learn representations of multi-view data but also generate realistic multi-view samples. Comprehensive experiments have been conducted on three real-world datasets and the results demonstrate the effectiveness of the proposed model. Our code is available at https://github.com/wenqifan03/DACCA.
【Keywords】:
【Paper Link】 【Pages】:361-369
【Authors】: Yan Zhang ; Zhao Zhang ; Zheng Zhang ; Mingbo Zhao ; Li Zhang ; Zhengjun Zha ; Meng Wang
【Abstract】: In this paper, we propose a novel framework called Deep Self-representative Concept Factorization Network (DSCF-Net) for clustering deep features. To improve the representation and clustering abilities, DSCF-Net explicitly considers discovering hidden deep semantic features, enhancing the robustness of the deep factorization to noise, and preserving the local manifold structures of deep features. Specifically, DSCF-Net integrates robust deep concept factorization, deep self-expressive representation, and adaptive locality-preserving feature learning into a unified framework. To discover hidden deep representations, DSCF-Net designs a hierarchical factorization architecture using multiple layers of linear transformations, where the hierarchical representation is performed by optimizing the basis concepts in each layer to improve the representation indirectly. DSCF-Net also improves robustness by first performing subspace recovery for sparse error correction and then performing deep factorization in the recovered visual subspace. To obtain locality-preserving representations, we also present an adaptive deep self-representative weighting strategy that uses the coefficient matrix as adaptive weights to keep the locality of representations. Extensive results show that DSCF-Net delivers state-of-the-art performance on several public databases.
【Keywords】:
【Paper Link】 【Pages】:370-378
【Authors】: Fan Yang ; Ninghao Liu ; Mengnan Du ; Kaixiong Zhou ; Shuiwang Ji ; Xia Hu
【Abstract】: Deep neural networks (DNNs) have become effective computational tools because of their superior performance in practice. However, the generalization of a DNN still largely depends on the training data, in both quantity and quality. In this paper, we propose a knowledge instillation framework, named NeuKI, for feed-forward DNNs, aiming to enhance learning performance with the aid of knowledge. This task is particularly challenging due to the complicated nature of knowledge and the numerous variants of DNN architectures. To bridge the gap, we construct a separate knowledge-DNN faithfully encoding the instilled knowledge for joint training. The core idea is to regularize the training of the target-DNN with the constructed knowledge-DNN, so that the instilled knowledge can guide the model training. The proposed NeuKI is demonstrated to be applicable to both knowledge rules and constraints, where rules are encoded by structure and constraints are handled by loss. Experiments are conducted on several real-world datasets from different domains, and the results demonstrate the effectiveness of NeuKI in improving learning performance, as well as data efficiency and model interpretability.
【Keywords】:
【Paper Link】 【Pages】:379-387
【Authors】: Xin Dai ; Xiangnan Kong ; Xinyue Liu ; John Boaz Lee ; Constance M. Moore
【Abstract】: Neuroimaging data typically undergoes several preprocessing steps before further analysis and mining can be done. Affine image registration is one of the important tasks during preprocessing. Recently, several image registration methods which are based on Convolutional Neural Networks have been proposed. However, due to the high computational and memory requirements of CNNs, these methods cannot be used in real-time for large neuroimaging data like fMRI. In this paper, we propose a Dual-Attention Recurrent Network (DRN) which uses a hard attention mechanism to allow the model to focus on small, but task-relevant, parts of the input image – thus reducing computational and memory costs. Furthermore, DRN naturally supports inhomogeneity between the raw input image (e.g., functional MRI) and the image we want to align it to (e.g., anatomical MRI) so it can be applied to harder registration tasks such as fMRI coregistration and normalization. Extensive experiments on two different datasets demonstrate that DRN significantly reduces the computational and memory costs compared with other neural network-based methods without sacrificing the quality of image registration.
【Keywords】:
【Paper Link】 【Pages】:388-396
【Authors】: Siwu Liu ; Ji Hwan Park ; Shinjae Yoo
【Abstract】: Graph convolution is a generalization of the convolution operation from structured grid data to unstructured graph data. Because any type of data can be represented on a feature graph, graph convolution has been a powerful tool for modeling various types of data. However, such flexibility comes with a price: expensive time and space complexities. Even with state-of-the-art scalable graph convolution algorithms, it remains challenging to scale graph convolution for practical applications. Hence, we propose using Diverse Power Iteration Embeddings (DPIE) to construct scalable graph convolution neural networks. DPIE is an approximated spectral embedding that is orders of magnitude faster and incurs no additional space complexity, resulting in efficient and effective graph convolution approximation. DPIE-based graph convolution avoids the expensive convolution operation, in the form of matrix-vector multiplication, by using an embedding of lower dimension. At the same time, DPIE generates graphs implicitly, which dramatically reduces the space cost of building graphs from unstructured data. The method is tested on various types of data. We also extend graph convolution to extreme-scale data never before studied in the graph convolution field. Experimental results show the scalability and effectiveness of DPIE-based graph convolution.
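A rough sketch of a power-iteration-based embedding conveys why this is cheap: only repeated matrix-vector products against the (sparse, normalised) adjacency are needed. This is an illustrative approximation inspired by the idea, not the DPIE algorithm itself, which specifically selects diverse intermediate iterates:

```python
import numpy as np

def power_iteration_embedding(A, dim=2, iters=5, seed=0):
    """Rough power-iteration spectral embedding: run a few normalised
    power-iteration steps from random start vectors on the transition
    matrix and use the early-stopped iterates as coordinates.
    Illustrative only -- DPIE itself selects diverse iterates."""
    rng = np.random.default_rng(seed)
    W = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    coords = []
    for _ in range(dim):
        v = rng.normal(size=A.shape[0])
        for _ in range(iters):
            v = W @ v                      # one (sparse) matrix-vector product
            v /= np.linalg.norm(v)
        coords.append(v)
    return np.stack(coords, axis=1)        # one coordinate row per node

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # tiny undirected graph
E = power_iteration_embedding(A)
```

Each embedding dimension costs only `iters` matrix-vector products, versus the eigendecomposition an exact spectral embedding would require — which is the source of the claimed speedup.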
【Keywords】:
【Paper Link】 【Pages】:397-405
【Authors】: Rafael Rêgo Drumond ; Lukas Brinkmeyer ; Josif Grabocka ; Lars Schmidt-Thieme
【Abstract】: The performance of gradient-based optimization strategies depends heavily on the initial weights of the parametric model. Recent works show that there exist weight initializations from which optimization procedures can find the task-specific parameters faster than from uniformly random initializations, and that such a weight initialization can be learned by optimizing a specific model architecture across similar tasks via MAML (Model-Agnostic Meta-Learning). Current methods are limited to populations of classification tasks that share the same number of classes due to the static model architectures used during meta-learning. In this paper, we present HIDRA, a meta-learning approach that enables training and evaluating across tasks with any number of target variables. We show that Model-Agnostic Meta-Learning trains a distribution for all the neurons in the output layer and a specific weight initialization for the ones in the hidden layers. HIDRA exploits this by learning one master neuron, which is used to initialize any number of output neurons for a new task. Extensive experiments on the miniImageNet and Omniglot data sets demonstrate that HIDRA improves over standard approaches while generalizing to tasks with any number of target variables. Moreover, our approach is shown to robustify low-capacity models in learning across complex tasks with a high number of classes, for which regular MAML fails to learn any feasible initialization.
【Keywords】:
【Paper Link】 【Pages】:406-414
【Authors】: Yuta Saito ; Hayato Sakata ; Kazuhide Nakata
【Abstract】: Uplift modeling aims to optimize treatment policies and is a promising method for causal-based personalization in various domains such as medicine and marketing. However, applying this method to real-world problems faces challenges such as the impossibility of validation and the binary treatment limitation. The Contextual Treatment Selection (CTS) algorithm was proposed to overcome the binary treatment limitation and demonstrated state-of-the-art results. However, previous experiments have implied that CTS is cost-ineffective because it requires a large amount of training data. In this paper, we demonstrate that the estimator maximized in CTS is biased with respect to the true metric. We then propose a variance-reduced estimator based on the doubly robust estimation technique that provides unbiasedness and desirable variance. We further propose a treatment policy optimization algorithm called VAriance Reduced Treatment Selection (VARTS), which maximizes our estimator. Empirical experiments on synthetic and real-world datasets demonstrate that our method outperforms other existing methods, particularly under realistic conditions such as small sample sizes and high noise levels. These theoretical and empirical results imply that our method can overcome the critical challenges of uplift modeling and should be the first choice for optimizing personalization in various fields.
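The doubly robust building block can be written down generically for binary treatments (the variable names and this simplified form are illustrative; the paper's VARTS estimator extends the idea to multiple treatments with additional variance reduction):

```python
import numpy as np

def doubly_robust_value(y, t, policy_t, propensity, mu_hat):
    """Generic doubly robust estimate of a treatment policy's value:
    an outcome-model prediction plus an importance-weighted residual
    correction on the logged data. Variable names are illustrative;
    VARTS builds a variance-reduced multi-treatment version of this."""
    n = len(y)
    # model term: predicted outcome under the policy's chosen treatment
    model = mu_hat[np.arange(n), policy_t]
    # correction term: only logged actions matching the policy contribute
    match = (t == policy_t).astype(float)
    correction = match * (y - mu_hat[np.arange(n), t]) / propensity
    return float(np.mean(model + correction))

y = np.array([1.0, 0.0, 1.0, 1.0])            # observed outcomes
t = np.array([1, 0, 1, 0])                    # logged treatments
policy = np.array([1, 1, 1, 0])               # policy to evaluate
prop = np.array([0.5, 0.5, 0.5, 0.5])         # P(T = t | x)
mu = np.array([[0.2, 0.8], [0.3, 0.6], [0.1, 0.9], [0.7, 0.4]])
value = doubly_robust_value(y, t, policy, prop, mu)
```

The estimate stays unbiased if either the outcome model `mu_hat` or the propensities are correct, and an accurate outcome model shrinks the residuals, which is what lowers the variance relative to pure importance weighting.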
【Keywords】:
【Paper Link】 【Pages】:415-423
【Authors】: Ulf Johansson ; Tuwe Löfström
【Abstract】: In many predictive modeling scenarios, the production set inputs that will later be used for the actual prediction are available and could be utilized in the modeling process. In fact, many predictive models are generated with an existing production set in mind. Despite this, few approaches utilize this information in order to produce models optimized for the production set at hand. If these models need to be comprehensible, the oracle coaching framework can be applied, often resulting in interpretable models, e.g., decision trees and rule sets, with accuracies on par with opaque models like neural networks and ensembles, on the specific production set. In oracle coaching, a strong but opaque predictive model is used to label instances, including the production set, which are later learned by a weaker but interpretable model. In this paper, oracle coaching is, for the first time, used for improving the calibration of probabilistic predictors. More specifically, setups where oracle coaching is combined with the techniques Platt scaling, isotonic regression, and Venn-Abers are suggested and evaluated for calibrating probability estimation trees (PETs). A key contribution is the setup designs ensuring that the oracle-coached PETs, which by definition utilize knowledge about production data, remain well-calibrated. In the experimentation, using 23 publicly available data sets, it is shown that oracle-coached models are not only more accurate but also significantly better calibrated, compared to standard induction. Interestingly enough, this holds both for the uncalibrated PETs and for all calibration techniques evaluated, i.e., Platt scaling, isotonic regression, and Venn-Abers. As expected, all three external techniques significantly improved the calibration of the original PETs. Finally, an outright comparison between the three external calibration techniques showed that Venn-Abers significantly outperformed the alternatives in most setups.
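Of the three calibration techniques, Platt scaling is the simplest to sketch: fit a one-dimensional logistic map from raw scores to probabilities (the gradient descent below stands in for the Newton-style fit used in practice, and the data is a toy example, not from the paper):

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, steps=2000):
    """Minimal Platt scaling: fit p = sigmoid(a*s + b) by gradient
    descent on the logistic loss and return the calibration map
    (a stand-in for the usual Newton-style fit)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                  # d(log-loss)/d(logit)
        a -= lr * float(np.mean(grad * scores))
        b -= lr * float(np.mean(grad))
    return lambda s: 1.0 / (1.0 + np.exp(-(a * s + b)))

# over-confident toy scores: only half of the 0.9-scored items are positive
scores = np.array([0.9, 0.9, 0.9, 0.9, 0.1, 0.1])
labels = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
calibrate = platt_scale(scores, labels)
```

On this toy data the fitted map pulls the over-confident 0.9 scores toward the empirical positive rate of 0.5, which is exactly the behaviour a calibration technique is evaluated on.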
【Keywords】:
【Paper Link】 【Pages】:424-432
【Authors】: James R. Foulds ; Rashidul Islam ; Kamrun Naher Keya ; Shimei Pan
【Abstract】: Intersectionality is a framework that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including race, gender, sexual orientation, class, and disability. Intersectionality theory therefore implies it is important that fairness in artificial intelligence systems be protected with regard to multi-dimensional protected attributes. However, the measurement of fairness becomes statistically challenging in the multi-dimensional setting due to data sparsity, which increases rapidly in the number of dimensions, and in the values per dimension. We present a Bayesian probabilistic modeling approach for the reliable, data-efficient estimation of fairness with multidimensional protected attributes, which we apply to two existing intersectional fairness metrics. Experimental results on census data and the COMPAS criminal justice recidivism dataset demonstrate the utility of our methodology, and show that Bayesian methods are valuable for the modeling and measurement of fairness in intersectional contexts.
【Keywords】:
【Paper Link】 【Pages】:433-441
【Authors】: Fattaneh Jabbari ; Gregory F. Cooper
【Abstract】: Almost all of the existing algorithms for learning a causal Bayesian network structure (CBN) from observational data recover a structure that models the causal relationships that are shared by the instances in a population. Although learning such population-wide CBNs accurately is useful, it is important to learn CBNs that are specific to each instance in domains in which different instances may have varying causal structures, such as in human biology. For example, a breast cancer tumor in a patient (instance) is often a composite of causal mechanisms, where each of these individual causal mechanisms may appear relatively frequently in breast-cancer tumors of other patients, but the particular combination of mechanisms is unique to the current tumor. Therefore, it is critical to discover the specific set of causal mechanisms that are operating in each patient to understand and treat that particular patient effectively. We previously introduced an instance-specific CBN structure learning method that builds a causal model for a given instance T from the features we know about T and from a training set of data on many other instances [12]. However, that method assumes that there are no latent (hidden) confounders, that is, there are no latent variables that cause two or more of the measured variables. Unfortunately, this assumption rarely holds in practice. In the current paper, we introduce a novel instance-specific causal structure learning algorithm that uses partial ancestral graphs (PAGs) to model latent confounders. Simulations support that the proposed instance-specific method improves structure-discovery performance compared to an existing PAG-learning method called GFCI, which is not instance-specific. We also report results that provide support for instance-specific causal relationships existing in real-world datasets.
【Keywords】:
【Paper Link】 【Pages】:442-450
【Authors】: Sara Alaee ; Alireza Abdoli ; Christian R. Shelton ; Amy C. Murillo ; Alec C. Gerry ; Eamonn J. Keogh
【Abstract】: Time series classification is an important task in its own right, and it is often a precursor to further downstream analytics. To date, virtually all works in the literature have used either shape-based classification using a distance measure or feature-based classification after finding some suitable features for the domain. It seems to be underappreciated that in many datasets it is the case that some classes are best discriminated with features, while others are best discriminated with shape. Thus, making the shape vs. feature choice will condemn us to poor results, at least for some classes. In this work, we propose a new model for classifying time series that allows the use of both shape and feature-based measures, when warranted. Our algorithm automatically decides which approach is best for which class, and at query time chooses which classifier to trust the most. We evaluate our idea on real world datasets and demonstrate that our ideas produce statistically significant improvement in classification accuracy.
【Keywords】:
【Paper Link】 【Pages】:451-459
【Authors】: Jingzheng Tu ; Guoxian Yu ; Jun Wang ; Carlotta Domeniconi ; Xiangliang Zhang
【Abstract】: Crowdsourcing is a relatively economic and efficient solution for collecting annotations from the crowd through online platforms. Answers collected from workers with different expertise may be noisy and unreliable, so the quality of annotated data needs to be further maintained. Various solutions have been attempted to obtain high-quality annotations. However, they all assume that workers' label quality is stable over time (always at the same level whenever they conduct the tasks). In practice, workers' attention level changes over time, and ignoring this change can affect the reliability of the annotations. In this paper, we focus on a novel and realistic crowdsourcing scenario involving attention-aware annotations. We propose a new probabilistic model that takes workers' attention into account to estimate label quality. Expectation propagation is adopted for efficient Bayesian inference of our model, and a generalized Expectation Maximization algorithm is derived to estimate both the ground truth of all tasks and the label quality of each individual crowd worker with attention. In addition, the number of tasks best suited for a worker is estimated according to changes in attention. Experiments against related methods on three real-world datasets and one semi-simulated dataset demonstrate that our method quantifies the relationship between workers' attention and label quality on the given tasks, and improves the aggregated labels.
【Keywords】:
【Paper Link】 【Pages】:460-468
【Authors】: Timothy LaRock ; Vahan Nanumyan ; Ingo Scholtes ; Giona Casiraghi ; Tina Eliassi-Rad ; Frank Schweitzer
【Abstract】: The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.
【Keywords】:
【Paper Link】 【Pages】:469-477
【Authors】: Yinghua Zhang ; Yu Zhang ; Ying Wei ; Kun Bai ; Yangqiu Song ; Qiang Yang
【Abstract】: Deep domain adaptation models learn a neural network in an unlabeled target domain by leveraging the knowledge from a labeled source domain. This can be achieved by learning a domain-invariant feature space. Though the learned representations are separable in the source domain, they usually have a large variance and samples with different class labels tend to overlap in the target domain, which yields suboptimal adaptation performance. To fill the gap, a Fisher loss is proposed to learn discriminative representations which are within-class compact and between-class separable. Experimental results on two benchmark datasets show that the Fisher loss is a general and effective loss for deep domain adaptation. Noticeable improvements are brought when it is used together with widely adopted transfer criteria, including MMD, CORAL and domain adversarial loss. For example, an absolute improvement of 6.67% in terms of the mean accuracy is attained when the Fisher loss is used together with the domain adversarial loss on the Office-Home dataset.
【Keywords】:
【Paper Link】 【Pages】:478-486
【Authors】: Lu Cheng ; Ruocheng Guo ; K. Selçuk Candan ; Huan Liu
【Abstract】: Deep architectures are trained on massive amounts of labeled data to guarantee classification performance. In the absence of labeled data, domain adaptation often provides an attractive option, given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain-invariant representations but overlooked the issues of label imbalance within a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of the number of positive over negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain-invariant representations are learned. To address the challenge, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain-invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations on two important real-world applications corroborate the effectiveness of the proposed framework.
【Keywords】:
【Paper Link】 【Pages】:487-495
【Authors】: Anasua Mitra ; Priyesh Vijayan ; Srinivasan Parthasarathy ; Balaraman Ravindran
【Abstract】: We propose a Semi-Supervised Learning (SSL) methodology that explicitly encodes different necessary priors to learn efficient representations for nodes in a network. The key to our framework is a semi-supervised cluster invariance constraint that explicitly groups nodes of similar labels together. We show that explicitly encoding this constraint allows one to learn meaningful node representations from both qualitative (visual) and quantitative standpoints. Specifically, our methodology realizes improved node classification and visually-enhanced clusterability of nodes on a wide range of datasets over competitive baselines.
【Keywords】:
【Paper Link】 【Pages】:496-504
【Authors】: Lutz Oettershagen ; Nils M. Kriege ; Christopher Morris ; Petra Mutzel
【Abstract】: Many real-world graphs are temporal, e.g., in a social network persons only interact at specific points in time. This temporality directs possible dissemination processes on the graph, such as the spread of rumors, fake news, or diseases. However, the current state-of-the-art methods for supervised graph classification are designed mainly for static graphs and may not be able to capture temporal information. Hence, they are not powerful enough to distinguish between graphs modeling different dissemination processes. To address this, we introduce a framework to lift standard graph kernels to the temporal domain. We explore three different approaches and investigate the trade-offs between loss of temporal information and efficiency. Moreover, to handle large-scale graphs, we propose stochastic variants of our kernels with provable approximation guarantees. We evaluate our methods on various real-world social networks. Our methods beat static kernels by a large margin in terms of accuracy while still being scalable to large graphs and data sets. This confirms that taking temporal information into account is crucial for the successful classification of temporal graphs under consideration of dissemination processes.
【Keywords】:
【Paper Link】 【Pages】:505-513
【Authors】: Charles H. Martin ; Michael W. Mahoney
【Abstract】: Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, trained on the same dataset but with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mechanistic Universality (HT-MU), meaning that the correlations in the layer weight matrices can be fit to a power law (PL) with exponents that lie in common Universality classes from Heavy-Tailed Random Matrix Theory (HT-RMT). From this, we develop a Universal capacity control metric that is a weighted average of PL exponents. Rather than considering small toy NNs, we examine over 50 different, large-scale pre-trained DNNs, ranging over 15 different architectures, trained on ImageNet, each of which has been reported to have different test accuracies. We show that this new capacity metric correlates very well with the reported test accuracies of these DNNs, looking across each architecture (VGG16/…/VGG19, ResNet10/…/ResNet152, etc.). We also show how to approximate the metric by the more familiar Product Norm capacity measure, computed as the average of the log Frobenius norm of the layer weight matrices. Our approach requires no changes to the underlying DNN or its loss function, does not require us to train a model (although it could be used to monitor training), and does not even require access to the ImageNet data.
【Keywords】:
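The two metrics sketched in the abstract above — a power-law exponent fit to each layer's eigenvalue spectrum, and the cheaper average-log-Frobenius-norm proxy — can be illustrated roughly as follows. This is only a sketch: the Hill-style estimator, the median tail cutoff, and the function names (`hill_exponent`, `avg_log_frobenius`) are illustrative assumptions, not the authors' exact procedure.

```python
import math

def hill_exponent(eigenvalues, tail_fraction=0.5):
    """Hill-style maximum-likelihood estimate of a power-law exponent,
    fit to the upper tail of a layer's eigenvalue spectrum (the ESD
    of W^T W, computed elsewhere)."""
    ev = sorted(eigenvalues)
    cut = ev[int(len(ev) * (1.0 - tail_fraction))]  # crude tail cutoff (illustrative)
    tail = [x for x in ev if x >= cut]
    return 1.0 + len(tail) / sum(math.log(x / cut) for x in tail)

def avg_log_frobenius(layer_sq_norms):
    """Average log Frobenius norm over layers, the cheaper proxy metric;
    each entry is ||W||_F^2 for one layer's weight matrix."""
    return sum(0.5 * math.log(s) for s in layer_sq_norms) / len(layer_sq_norms)
```

A per-network score would then average `hill_exponent` over all layer spectra, mirroring the weighted average of PL exponents described in the abstract.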
【Paper Link】 【Pages】:514-522
【Authors】: Fenglong Ma ; Yaqing Wang ; Jing Gao ; Houping Xiao ; Jing Zhou
【Abstract】: Predicting diseases for patients is an important and practical task in healthcare informatics. Existing disease prediction models focus on common diseases, i.e., diseases for which there are enough available EHR data and prior medical knowledge. However, those models may not work for rare disease prediction, as it is extremely hard to collect enough EHR data for such diseases. To tackle these issues, in this paper, we design a novel rare disease prediction system, which not only generates EHR data but also automatically selects high-quality generated data to further improve predictive performance. Three components are designed in the system: data generation, data selection, and prediction. In particular, we propose MaskEHR to generate diverse EHR data based on the data from patients suffering from the given diseases. To remove noisy information in the generated EHR data, we further design a reinforcement learning-based data selector, called RL-Selector, which can automatically choose the high-quality generated EHR data. Finally, the prediction component is used to identify patients who will potentially suffer from the given diseases. These three components work together and enhance each other. Experiments on three real healthcare datasets show that the proposed system outperforms existing approaches on the rare disease prediction task.
【Keywords】:
【Paper Link】 【Pages】:523-531
【Authors】: Yitao Li ; Umar Islambekov ; Cuneyt Gurcan Akcora ; Ekaterina Smirnova ; Yulia R. Gel ; Murat Kantarcioglu
【Abstract】: Blockchain technology and, in particular, blockchain-based cryptocurrencies offer us information that has never been seen before in the financial world. In contrast to fiat currencies, all transactions of cryptocurrencies and crypto-tokens are permanently recorded on distributed ledgers and are publicly available. This allows us to construct a transaction graph and not only to assess its organization but also to glean relationships between transaction graph properties and crypto price dynamics. The goal of this paper is to facilitate our understanding of the horizons and limitations of what can be learned about crypto-tokens from the local topology and geometry of the Ethereum transaction network, whose global network properties themselves remain scarcely explored. By introducing novel tools based on Topological Data Analysis and Functional Data Depth into Blockchain Data Analytics, we show that the Ethereum network (one of the most popular blockchains for creating new crypto-tokens) can provide critical insights on price changes of crypto-tokens that are otherwise largely inaccessible with conventional data sources and traditional analytic methods.
【Keywords】:
【Paper Link】 【Pages】:532-540
【Authors】: Arka Daw ; R. Quinn Thomas ; Cayelan C. Carey ; Jordan S. Read ; Alison P. Appling ; Anuj Karpatne
【Abstract】: To simultaneously address the rising need for expressing uncertainties in deep learning models and producing model outputs that are consistent with known scientific knowledge, we propose a novel physics-guided architecture (PGA) of neural networks in the context of lake temperature modeling, where the physical constraints are hard-coded in the neural network architecture. This allows us to integrate such models with state-of-the-art uncertainty estimation approaches such as Monte Carlo (MC) Dropout without sacrificing the physical consistency of our results. We demonstrate the effectiveness of our approach in ensuring better generalizability as well as physical consistency in MC estimates over data collected from Lake Mendota in Wisconsin and Falling Creek Reservoir in Virginia, even with limited training data. We further show that our MC estimates correctly match the distribution of ground-truth observations, thus making the PGA paradigm amenable to physically grounded uncertainty quantification.
【Keywords】:
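The MC Dropout mechanism referenced in the abstract above can be sketched on a tiny one-hidden-layer network: dropout stays active at prediction time, and many stochastic forward passes are aggregated into a predictive mean and a standard deviation serving as the uncertainty estimate. Everything here (the toy network, weights, and the name `mc_dropout_predict`) is illustrative, not the paper's PGA model.

```python
import random
import statistics

def mc_dropout_predict(x, w_hidden, w_out, p=0.5, n_samples=200, seed=0):
    """Monte Carlo Dropout sketch: run many stochastic forward passes
    (dropout kept active) through a one-hidden-layer network and report
    the predictive mean and standard deviation (the uncertainty)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        hidden = [max(0.0, w * x) for w in w_hidden]          # ReLU units
        kept = [h / (1.0 - p) if rng.random() >= p else 0.0   # inverted dropout
                for h in hidden]
        outputs.append(sum(k * w for k, w in zip(kept, w_out)))
    return statistics.mean(outputs), statistics.pstdev(outputs)
```

In the PGA setting, the same sampling loop would run over a network whose architecture already enforces the physical constraints, so every sampled output remains physically consistent.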
【Paper Link】 【Pages】:541-549
【Authors】: Scott Freitas ; Andrew Wicker ; Duen Horng (Polo) Chau ; Joshua Neil
【Abstract】: Given a large enterprise network of devices and their authentication history (e.g., device logons), how can we quantify network vulnerability to lateral attack and identify at-risk devices? We systematically address these problems through D2M, the first framework that models lateral attacks on enterprise networks using multiple attack strategies developed with researchers, engineers, and threat hunters in the Microsoft Defender Advanced Threat Protection group. These strategies integrate real-world adversarial actions (e.g., privilege escalation) to generate attack paths: a series of compromised machines. Leveraging these attack paths and a novel Monte-Carlo method, we formulate network vulnerability as a probabilistic function of the network topology, distribution of access credentials and initial penetration point. To identify machines at risk to lateral attack, we propose a suite of five fast graph mining techniques, including a novel technique called AnomalyShield inspired by node immunization research. Using three real-world authentication graphs from Microsoft and Los Alamos National Laboratory (up to 223,399 authentications), we report the first experimental results on network vulnerability to lateral attack, demonstrating D2M's unique potential to empower IT admins to develop robust user access credential policies.
【Keywords】:
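The Monte-Carlo flavor of the vulnerability estimate in the abstract above can be illustrated with a deliberately simplified simulation: attack paths are random walks over authentication edges, and vulnerability is the average fraction of devices compromised per simulated path. This is not D2M — it omits credentials, privilege escalation, and attack strategies entirely — and the name `estimate_vulnerability` is hypothetical.

```python
import random

def estimate_vulnerability(adj, n_trials=1000, path_len=5, seed=0):
    """Toy Monte Carlo estimate of vulnerability to lateral movement:
    repeatedly start at a random entry device, walk along authentication
    edges, and average the fraction of devices compromised per path."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = 0
    for _ in range(n_trials):
        cur = rng.choice(nodes)
        compromised = {cur}
        for _ in range(path_len):
            nbrs = adj.get(cur, [])
            if not nbrs:
                break                      # dead end: attack path stops
            cur = rng.choice(nbrs)
            compromised.add(cur)
        total += len(compromised)
    return total / (n_trials * len(nodes))
```

A disconnected network yields the minimum (only the entry point falls), while densely connected authentication graphs push the estimate toward 1, matching the intuition that topology and credential distribution drive lateral-attack risk.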
【Paper Link】 【Pages】:550-558
【Authors】: Rongrong Tao ; Baojian Zhou ; Feng Chen ; David Mares ; Patrick Butler ; Naren Ramakrishnan ; Ryan Kennedy
【Abstract】: The motives and means of explicit state censorship have been well studied, both quantitatively and qualitatively. Self-censorship by media outlets, however, has not received nearly as much attention, mostly because it is difficult to systematically detect. We develop a novel approach to identify news media self-censorship by using social media as a sensor. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords, and a near-linear-time algorithm (called GraphDPD) to identify the highest-scoring clusters as indicators of censorship. We evaluate the accuracy of our framework, versus other state-of-the-art algorithms, using both semi-synthetic and real-world data from Mexico and Venezuela during 2014. These tests demonstrate the capacity of our framework to identify self-censorship and to provide an indicator of broader media freedom. The results of this study lay the foundation for detection, study, and policy response to self-censorship.
【Keywords】:
【Paper Link】 【Pages】:559-567
【Authors】: Nikhil Muralidhar ; Jie Bu ; Ze Cao ; Long He ; Naren Ramakrishnan ; Danesh K. Tafti ; Anuj Karpatne
【Abstract】: Physics-based simulations are often used to model and understand complex physical systems in domains like fluid dynamics. Although used frequently, such simulations often suffer from inaccurate or incomplete representations, either due to their high computational costs or due to a lack of complete physical knowledge of the system. In such situations, it is useful to employ machine learning to fill the gap by learning a model of the complex physical process directly from simulation data. However, as data generation through simulations is costly, we need to develop models that are cognizant of data-paucity issues. In such scenarios, it is helpful if the rich physical knowledge of the application domain is incorporated in the architectural design of machine learning models. We can also use information from physics-based simulations to guide the learning process, using aggregate supervision to favorably constrain it. In this paper, we propose PhyNet, a deep learning model using physics-guided structural priors and physics-guided aggregate supervision for modeling the drag forces acting on each particle in a Computational Fluid Dynamics-Discrete Element Method (CFD-DEM) simulation. We conduct extensive experiments in the context of drag force prediction and showcase the usefulness of including physics knowledge in our deep learning formulation. PhyNet has been compared to several state-of-the-art models and achieves a significant performance improvement of 8.46% on average. The source code has been made available, and the dataset used is detailed in [1, 2].
【Keywords】:
【Paper Link】 【Pages】:568-576
【Authors】: Ryuta Matsuno ; Aristides Gionis
【Abstract】: Understanding the local structure of a graph provides valuable insights about the underlying phenomena from which the graph has originated. Sampling and examining k-subgraphs is a widely used approach to understand the local structure of a graph. In this paper, we study the problem of sampling uniformly k-subgraphs from a given graph. We analyse a few different Markov chain Monte Carlo (MCMC) approaches, and obtain analytical results on their mixing times, which improve significantly the state of the art. In particular, we improve the bound on the mixing times of the standard MCMC approach, and the state-of-the-art MCMC sampling method PSRW, using the canonical-paths argument. In addition, we propose a novel sampling method, which we call recursive subgraph sampling RSS, and its optimized variant RSS'. The proposed methods, RSS and RSS', are provably faster than the existing approaches. We conduct experiments and verify the uniformity of samples and the efficiency of RSS and RSS'.
【Keywords】:
【Paper Link】 【Pages】:577-585
【Authors】: Ekta Gujral ; Georgios Theocharous ; Evangelos E. Papalexakis
【Abstract】: In tensor mining, PARAFAC2 is a powerful multi-modal factor analysis method that is ideally suited for batch processing of data that form "irregular" tensors, e.g., user movie-viewing profiles, where each user's timeline does not necessarily align with those of other users. However, data nowadays is dynamically changing, which hinders the use of this model for large data. Tracking the PARAFAC2 decomposition of dynamic tensors is a pivotal and challenging task, due to the variability of incoming data and the lack of an efficient online algorithm in terms of time and memory. In this paper, we fill this gap by proposing SPADE, an efficient method to compute the PARAFAC2 decomposition of streaming large tensor datasets containing millions of entries. In terms of effectiveness, our proposed method shows results comparable to the prior work, PARAFAC2, while being computationally much more efficient. We evaluate SPADE on both synthetic and real datasets; indicatively, our proposed method achieves 10–23× speedup and saves 17–150× memory usage over the baseline methods, and it is also capable of handling larger tensor streams (≈7 million users) on which the batch baseline was not able to operate. To the best of our knowledge, SPADE is the first approach to online PARAFAC2 decomposition that not only provides on-par accuracy but also better performance in terms of scalability and efficiency.
【Keywords】:
【Paper Link】 【Pages】:586-594
【Authors】: Junning Deng ; Bo Kang ; Jefrey Lijffijt ; Tijl De Bie
【Abstract】: The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks for example, the probability of a friendship between two people depends on their attributes, such as their age, address, and hobbies. The connectivity of a graph can thus possibly be understood in terms of patterns of the form ‘the subgroup of individuals with properties X are often (or rarely) friends with individuals in another subgroup with properties Y'. Such rules present potentially actionable and generalizable insights into the graph. We present a method that finds pairs of node subgroups between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. This interestingness is quantified subjectively, to contrast with prior information an analyst may have about the graph. This view immediately enables iterative mining of such patterns. Our work generalizes prior work on dense subgraph mining (i.e. subgraphs induced by a single subgroup). Moreover, not only is the proposed method more general, we also demonstrate considerable practical advantages for the single subgroup special case.
【Keywords】:
【Paper Link】 【Pages】:595-603
【Authors】: Wenyi Xiao ; Huan Zhao ; Vincent W. Zheng ; Yangqiu Song
【Abstract】: In this paper, we study the fundamental problem of random walks for network embedding. We propose to use non-Markovian random walks, variants of the vertex-reinforced random walk (VRRW), to fully use the history of a random walk path. To solve the problem of VRRW getting stuck, we introduce an exploitation-exploration mechanism that helps the random walk jump out of the stuck set. The new random walk algorithms share the same convergence property as VRRW and thus can be used to learn stable network embeddings. Experimental results on two link prediction benchmark datasets and three node classification benchmark datasets show that our proposed approach, reinforce2vec, can outperform state-of-the-art random walk based embedding methods by a large margin.
【Keywords】:
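A basic vertex-reinforced random walk, the building block the abstract above starts from, can be sketched as follows: the probability of stepping to a neighbor grows with how often that neighbor has already been visited. This shows only the plain VRRW step (the function name `vrrw_walk` is hypothetical); the paper's contribution, the exploitation-exploration escape mechanism, is not reproduced here.

```python
import random

def vrrw_walk(adj, start, length, seed=0):
    """One vertex-reinforced random walk: the transition probability to
    a neighbor is proportional to 1 + that neighbor's visit count, so
    the walk is reinforced by (non-Markovian) path history."""
    rng = random.Random(seed)
    counts = {v: 0 for v in adj}
    cur = start
    counts[cur] = 1
    walk = [cur]
    for _ in range(length):
        nbrs = adj[cur]
        weights = [1 + counts[v] for v in nbrs]   # reinforcement
        cur = rng.choices(nbrs, weights=weights)[0]
        counts[cur] += 1
        walk.append(cur)
    return walk
```

Because visits reinforce themselves, such walks tend to concentrate on a small "stuck set" of nodes — exactly the behavior the paper's exploitation-exploration mechanism is designed to break out of.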
【Paper Link】 【Pages】:604-612
【Authors】: Changgee Chang ; Jihwan Oh ; Qi Long
【Abstract】: Integrative analysis jointly analyzes multiple data sets to overcome the curse of dimensionality. It can detect important but weak signals by jointly selecting features for all data sets, but unfortunately the sets of important features are not always the same across data sets. Variants that allow a heterogeneous sparsity structure—where a subset of data sets can have a zero coefficient for a selected feature—have been proposed, but they compromise the effect of integrative analysis, reintroducing the problem of losing weak important signals. We propose a new integrative analysis approach that not only aggregates weak important signals well in the homogeneous setting but also substantially alleviates the problem of losing weak important signals in the heterogeneous setting. Our approach exploits an a priori known graphical structure of features by forcing joint selection of adjacent features, and integrating such information over multiple data sets increases power while taking into account the heterogeneity across data sets. We confirm the problem of existing approaches and demonstrate the superiority of our method through a simulation study and an application to gene expression data from ADNI.
【Keywords】:
【Paper Link】 【Pages】:613-621
【Authors】: Zeinab S. Jalali ; Weixiang Wang ; Myunghwan Kim ; Hema Raghavan ; Sucheta Soundarajan
【Abstract】: Social networks play a vital role in the spread of information through a population, and individuals in networks make important life decisions on the basis of the information to which they have access. In many cases, it is important to evaluate whether information is spreading fairly to all groups in a network. For instance, are male and female students equally likely to hear about a new scholarship? In this paper, we present the information unfairness criterion, which measures whether information spreads fairly to all groups in a network. We perform a thorough case study on the DBLP computer science co-authorship network with respect to gender. We then propose MaxFair, an algorithm to add edges to a network to decrease information unfairness, and evaluate on several real-world network datasets.
【Keywords】:
【Paper Link】 【Pages】:622-630
【Authors】: Shigeru Maya ; Akihiro Yamaguchi ; Kaneharu Nishino ; Ken Ueno
【Abstract】: Large amounts of time-series data have become accessible due to the rapid development of Internet-of-Things technologies, and the demand for extracting useful knowledge from these data is increasing. Toward this goal, time-series segmentation — dividing data into similar segments — is a promising method for understanding the mechanisms in time-series data. In this paper, we focus on time lag, a typical phenomenon in real time-series data that occurs when the speed of information diffusion differs between variables. Conventional methods, however, cannot distinguish differences in segmentation positions across variables. In response, we propose Lag-Aware Multivariate Time-Series Segmentation (LAMTSS), an algorithm that captures time lag across variables to determine segmentation positions for each variable. LAMTSS utilizes dynamic time warping without hyperparameter tuning. We confirm the accuracy of LAMTSS using artificial datasets and demonstrate the discovery of useful knowledge in real datasets.
【Keywords】:
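The dynamic time warping primitive that LAMTSS builds on, per the abstract above, is the textbook DP recurrence; the sketch below shows that primitive only, not the lag-aware segmentation algorithm itself.

```python
def dtw_distance(a, b):
    """Standard dynamic time warping distance between two 1-D series,
    via the classic O(len(a) * len(b)) dynamic-programming recurrence:
    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW aligns lagged copies of the same pattern at zero (or near-zero) cost, it is a natural tool for relating segmentation positions across variables that receive information at different speeds.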
【Paper Link】 【Pages】:631-639
【Authors】: John Boaz Lee ; Xiangnan Kong ; Constance M. Moore ; Nesreen K. Ahmed
【Abstract】: One of the primary tasks in neuroimaging is to simplify spatio-temporal scans of the brain (i.e., fMRI scans) by partitioning the voxels into a set of functional brain regions. An emerging line of research utilizes multiple fMRI scans, from a group of subjects, to calculate a single group consensus functional partition. This consensus-based approach is promising as it allows the model to improve the signal-to-noise ratio in the data. However, existing approaches are primarily non-parametric which poses problems when new samples are introduced. Furthermore, most existing approaches calculate a single partition for multiple subjects which fails to account for the functional and anatomical variability between different subjects. In this work, we study the problem of group-cohesive functional brain region discovery where the goal is to use information from a group of subjects to learn “group-cohesive” but individualized brain partitions for multiple fMRI scans. This problem is challenging since neuroimaging datasets are usually quite small and noisy. We introduce a novel deep parametric model based upon graph convolution, called the Brain Region Extraction Network (BREN). By treating the fMRI data as a graph, we are able to integrate information from neighboring voxels during brain region discovery which helps reduce noise for each subject. Our model is trained with a Siamese architecture to encourage partitions that are group-cohesive. Experiments on both synthetic and real-world data show the effectiveness of our proposed approach.
【Keywords】:
【Paper Link】 【Pages】:640-648
【Authors】: Chieh Wu ; Zulqarnain Khan ; Stratis Ioannidis ; Jennifer G. Dy
【Abstract】: We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by and are at least as expressive as spectral clustering. Our training objective, based on the Hilbert Schmidt Independence Criterion, can be optimized via gradient adaptations on the Stiefel manifold, leading to significant acceleration over spectral methods relying on eigen-decompositions. Finally, our trained embedding can be directly applied to out-of-sample data. We show experimentally that our approach outperforms several state-of-the-art deep clustering methods, as well as traditional approaches such as k-means and spectral clustering over a broad array of real and synthetic datasets.
【Keywords】:
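The Hilbert Schmidt Independence Criterion that serves as the training objective in the abstract above has a simple biased empirical estimator, trace(K H L H) / (n−1)², which can be sketched as below. This shows only the HSIC quantity on precomputed symmetric kernel matrices; the paper's contribution (learning the kernel with a network and optimizing on the Stiefel manifold) is not reproduced.

```python
def center(K):
    """Double-center a symmetric kernel matrix: K_c = H K H,
    with H = I - (1/n) * ones, computed entrywise."""
    n = len(K)
    row = [sum(r) / n for r in K]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, i.e. the
    Frobenius inner product of the two centered kernel matrices."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    return sum(Kc[i][j] * Lc[i][j]
               for i in range(n) for j in range(n)) / (n - 1) ** 2
```

Larger HSIC means stronger statistical dependence between the two views, which is why maximizing it between embeddings and inputs yields cluster-revealing representations.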
【Paper Link】 【Pages】:649-657
【Authors】: Guangyi Zhang ; Aristides Gionis
【Abstract】: Maximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets.
【Keywords】:
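The core "add a pair instead of a singleton" idea from the abstract above can be sketched for the plain max-sum dispersion objective: repeatedly add the remaining pair with the largest mutual distance. This is the classical farthest-pair greedy heuristic, not the authors' algorithm for clustered data or matroid constraints, and the name `greedy_pair_selection` is hypothetical.

```python
import itertools
import math

def greedy_pair_selection(points, k):
    """Greedy pair heuristic for max-sum diversity: until k objects
    (k even) are selected, add the remaining pair of points with the
    largest Euclidean distance, then remove both from the pool."""
    remaining = set(range(len(points)))
    chosen = []
    while len(chosen) < k:
        i, j = max(itertools.combinations(remaining, 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
        chosen += [i, j]
        remaining -= {i, j}
    return chosen
```

Adding pairs lets each step account for the diversity contributed *within* the added pair as well as to the existing selection, which is what singleton greedy and local search miss on clustered data.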
【Paper Link】 【Pages】:658-666
【Authors】: Xiaoqiang Yan ; Yiqiao Mao ; Shizhe Hu ; Yangdong Ye
【Abstract】: Existing visual-textual cross-modal clustering techniques focus on finding a clustering partition of different modalities by dealing with each modality independently or by integrating multiple modalities into a shared space, which may result in unsatisfactory performance due to the heterogeneous gap between modalities. Aiming at this problem, we propose a novel heterogeneous dual-task clustering (HDC) method, which is capable of exploring high-level relatedness between visual and textual data to improve the performance of each individual task. Our intuition is that although visual and textual data are heterogeneous to each other, they may share related high-level semantics and rich latent correlations, which can lead to improved performance if we treat the clustering of visual and textual data as different but related learning tasks. Specifically, the problem of heterogeneous dual-task clustering is formulated as an information-theoretic function, in which the low-level information in each modality and the high-level relatedness between multiple modalities are maximally preserved. Then, a progressive optimization method is proposed to ensure a locally optimal solution. Extensive experiments show the noticeable performance of the HDC approach in comparison with several state-of-the-art baselines.
【Keywords】:
【Paper Link】 【Pages】:667-675
【Authors】: Yorgos Tsitsikas ; Evangelos E. Papalexakis
【Abstract】: Tensor decomposition has been shown, time and time again, to be an effective tool in multi-aspect data mining, especially in exploratory applications where the interest is in discovering hidden interpretable structure from the data. In such exploratory applications, the number of such hidden structures is of utmost importance, since incorrect selection may imply the discovery of noisy artifacts that do not really represent a meaningful pattern. Albeit extremely important, selection of this number of latent factors, also known as low-rank, is very hard, and in most cases, practitioners and researchers resort to ad-hoc trial-and-error, or assume that somehow this number is known or is given via domain expertise. There has been a considerable amount of prior work that proposes heuristics for selecting this low rank. However, as we argue in this paper, the state-of-the-art in those heuristic methods is rather unstable and does not always reveal the correct answer. In this paper, we propose the Normalized Singular Value Deviation (NSVD), a novel method for selecting the number of latent factors in Tensor Decomposition, that is based on principled theoretical foundations. We extensively evaluate the effectiveness of NSVD in synthetic and real data and demonstrate that it yields a more robust, stable, and reliable estimation than state-of-the-art.
【Keywords】: