Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018. ACM 【DBLP Link】
【Paper Link】 【Pages】:1
【Authors】: David J. Hand
【Abstract】: Financial applications of data science provide a perfect illustration of the power of the shift from subjective decision-making to data- and evidence-driven decision-making. In the space of some fifty years, an entire sector of industry has been totally revolutionised. Such applications come in three broad areas: actuarial and insurance, consumer banking, and investment banking. Actuarial and insurance work was one of the earliest adopters of data science ideas, dating from long before the term had been coined, and even before the computer had been invented. But these areas have fallen behind the latest advances in data science technology - which means there is considerable potential for applying modern data analytic ideas. Consumer banking has been described as one of the first and major success stories of the data revolution. Dating from the 1960s, when the first credit cards were launched, techniques for analysing the massive data sets of consumer financial transactions have driven much of the development of data mining and data science ideas. But new model types, and new sources of data, are leading to a rich opportunity for significant developments. In investment banking the "efficient market hypothesis" of classic economics says that it is impossible to predict the financial markets. But this is false - though very nearly true. That means that there is an opportunity to use advanced data analytic methods to exploit the tiny gap between conventional theory and what actually happens. Other data science issues, such as data quality, ethics, and security, along with the need to understand the limitations of models, become particularly pointed in the context of financial applications.
【Keywords】: banking; credit cards; data science; fraud detection; retail banking
【Paper Link】 【Pages】:2
【Authors】: Alvin E. Roth
【Abstract】: Markets and marketplaces are ancient human artifacts, but in recent years they have become ever more important. In part this is because marketplaces are becoming computerized. Together with the introduction of smart phones, this also makes them ubiquitous. We can order car rides to the airport, plane rides to London, and hotel rooms for when we arrive, all on our smartphones. And as we do so we leave a data trail that is easily combined with other streams of data. This is changing not only how we interact with markets, but also how we manage and regard privacy. I'll discuss some recent developments in computerized markets and speculate about some still to come.
【Keywords】: computerized marketplaces; economics; market design; marketplaces
【Paper Link】 【Pages】:3
【Authors】: Yee Whye Teh
【Abstract】: Much recent progress in machine learning has been fueled by the explosive growth in the amount and diversity of data available, and the computational resources needed to crunch through the data. This raises the question of whether machine learning systems necessarily need large amounts of data to solve a task well. An exciting recent development, under the banners of meta-learning, lifelong learning, learning to learn, multitask learning etc., has been the observation that often there is heterogeneity within the data sets at hand, and in fact a large data set can be viewed more productively as many smaller data sets, each pertaining to a different task. For example, in recommender systems each user can be said to be a different task with a small associated data set, and in AI one holy grail is how to develop systems that can learn to solve new tasks quickly from small amounts of data. In such settings, the problem is then how to "learn to learn quickly", by making use of similarities among tasks. One perspective for how this is achievable is that exposure to lots of previous tasks allows the system to learn rich prior knowledge about the world from which tasks are sampled, and it is with this rich world knowledge that the system is able to solve new tasks quickly. This is a very active, vibrant and diverse area of research, with many different approaches proposed recently. In this talk I will describe a view of this problem from probabilistic and deep learning perspectives, and describe a number of efforts in this direction that I have recently been involved in.
【Keywords】: big data; learn to learn; learning to learn; lifelong learning; machine learning; meta-learning; multitask learning; small data
【Paper Link】 【Pages】:4
【Authors】: Jeannette M. Wing
【Abstract】: I use the tagline "Data for Good" to state paronomastically how we as a community should be promoting data science, especially in training future generations of data scientists. First, we should use data science for the good of humanity and society. Data science should be used to better people's lives. Data science should be used to improve relationships among people, organizations, and institutions. Data science, in collaboration with other disciplines, should be used to help tackle societal grand challenges such as climate change, education, energy, environment, healthcare, inequality, and social justice. Second, we should use data in a good manner. The acronym FATES suggests what "good" means. Fairness means that the models we build are used to make unbiased decisions or predictions. Accountability means to determine and assign responsibility-to someone or to something-for a judgment made by a machine. Transparency means being open and clear to the end user about how an outcome, e.g., a classification, a decision, or a prediction, is made. Ethics for data science means paying attention to both the ethical and privacy-preserving collection and use of data as well as the ethical decisions that the automated systems we build will make. Safety and security (yes, two words for one "S") means ensuring that the systems we build are safe (do no harm) and secure (guard against malicious behavior).
【Keywords】: accountability; data science; ethics; fairness; safety; security; transparency
【Paper Link】 【Pages】:5-14
【Authors】: Jacob D. Abernethy ; Alex Chojnacki ; Arya Farahi ; Eric Schwartz ; Jared Webb
【Abstract】: We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.
【Keywords】: active learning; flint water crisis; machine learning; public policy; risk assessment; water infrastructure
【Paper Link】 【Pages】:15-22
【Authors】: Klaus Ackermann ; Joe Walsh ; Adolfo De Unánue ; Hareem Naveed ; Andrea Navarrete Rivera ; Sun-Joo Lee ; Jason Bennett ; Michael Defoe ; Crystal Cody ; Lauren Haynes ; Rayid Ghani
【Abstract】: Machine learning research typically focuses on optimization and testing on a few criteria, but deployment in a public policy setting requires more. Technical and non-technical deployment issues get relatively little attention. However, for machine learning models to have real-world benefit and impact, effective deployment is crucial. In this case study, we describe our implementation of a machine learning early intervention system (EIS) for police officers in the Charlotte-Mecklenburg (North Carolina) and Metropolitan Nashville (Tennessee) Police Departments. The EIS identifies officers at high risk of having an adverse incident, such as an unjustified use of force or sustained complaint. We deployed the same code base at both departments, which have different underlying data sources and data structures. Deployment required us to solve several new problems, covering technical implementation, governance of the system, the cost to use the system, and trust in the system. In this paper we describe how we addressed and solved several of these challenges and provide guidance and a framework of important issues to consider for future deployments.
【Keywords】: deployment; early intervention system; machine learning; public policy
【Paper Link】 【Pages】:23-32
【Authors】: Deepak Agarwal ; Kinjal Basu ; Souvik Ghosh ; Ying Xuan ; Yang Yang ; Liang Zhang
【Abstract】: Web-based ranking problems involve ordering different kinds of items in a list or grid to be displayed in mediums like a website or a mobile app. In most cases, there are multiple objectives or metrics like clicks, viral actions, job applications, advertising revenue and others that we want to balance. Constructing a serving algorithm that achieves the desired tradeoff among multiple objectives is challenging, especially for more than two objectives. In addition, it is often not possible to estimate such a serving scheme using offline data alone for non-stationary systems with frequent online interventions. We consider a large-scale online application where metrics for multiple objectives are continuously available and can be controlled in a desired fashion by changing certain control parameters in the ranking model. We assume that the desired balance of metrics is known from business considerations. Our approach models the balance criteria as a composite utility function via a Gaussian process over the space of control parameters. We show that obtaining a solution can be equated to finding the maximum of the Gaussian process, practically obtainable via Bayesian optimization. However, implementing such a scheme for large-scale applications is challenging. We provide a novel framework to do so and illustrate its efficacy in the context of LinkedIn Feed. In particular, we show the effectiveness of our method by using both offline simulations as well as promising online A/B testing results. At the time of writing this paper, the method described was fully deployed on the LinkedIn Feed.
【Keywords】: bayesian optimization; gaussian processes; online feed ranking; thompson sampling
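A minimal sketch of the loop described above, assuming a scikit-learn Gaussian process and a simple upper-confidence-bound acquisition rule: the composite utility of the ranking control parameters is observed with noise, modelled as a GP, and the next parameter setting to try is the candidate that maximises the acquisition function. The utility function, parameter ranges, and acquisition rule here are hypothetical placeholders, not the deployed LinkedIn system.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def observed_utility(x):
    """Hypothetical composite utility of two ranking control parameters,
    observed with noise (stands in for online metric measurements)."""
    clicks = -(x[0] - 0.3) ** 2
    viral = -(x[1] - 0.7) ** 2
    return clicks + 0.5 * viral + rng.normal(scale=0.01)

# Initial random probes of the control-parameter space [0, 1]^2.
X = rng.uniform(0, 1, size=(5, 2))
y = np.array([observed_utility(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):                      # sequential optimization rounds
    gp.fit(X, y)
    candidates = rng.uniform(0, 1, size=(256, 2))
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 1.0 * sigma               # upper-confidence-bound acquisition
    x_next = candidates[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, observed_utility(x_next))

print("best control parameters found:", X[np.argmax(y)])
```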
【Paper Link】 【Pages】:33-42
【Authors】: Samet Ayhan ; Pablo Costas ; Hanan Samet
【Abstract】: Unprecedented growth is expected globally in commercial air traffic over the next ten years. To accommodate this increase in volume, a new concept of operations has been implemented in the context of the Next Generation Air Transportation System (NextGen) in the USA and the Single European Sky ATM Research (SESAR) in Europe. However, both of the systems approach airspace capacity and efficiency deterministically, failing to account for external operational circumstances which can directly affect the aircraft's actual flight profile. A major factor in increased airspace efficiency and capacity is accurate prediction of Estimated Time of Arrival (ETA) for commercial flights, which can be a challenging task due to the non-deterministic nature of environmental factors and air traffic. Inaccurate prediction of ETA can cause potential safety risks and loss of resources for Air Navigation Service Providers (ANSP), airlines and passengers. In this paper, we present a novel ETA Prediction System for commercial flights. The system learns from historical trajectories and uses their pertinent 3D grid points to collect key features such as weather parameters, air traffic, and airport data along the potential flight path. The features are fed into various regression models and a Recurrent Neural Network (RNN), and the best performing models with the most accurate ETA predictions are compared with the ETAs currently in operational use by the European ANSP, EUROCONTROL. Evaluations on an extensive set of real trajectory, weather, and airport data in Europe verify that our prediction system generates more accurate ETAs with a far smaller standard deviation than those of EUROCONTROL. This translates to smaller prediction windows of flight arrival times, thereby enabling airlines to make more cost-effective ground resource allocation and ANSPs to make more efficient flight schedules.
【Keywords】: air traffic management; estimated time of arrival; predictive analytics; recurrent neural network; regression
【Paper Link】 【Pages】:43-51
【Authors】: Tian Bai ; Shanshan Zhang ; Brian L. Egleston ; Slobodan Vucetic
【Abstract】: Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnosis and treatment provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline. The main novelty of Timeline is that it has a mechanism that learns time decay factors for every medical code. This allows Timeline to learn that chronic conditions have a longer lasting impact on future visits than acute conditions. Timeline also has an attention mechanism that improves vector embeddings of visits. By analyzing the attention weights and disease progression functions of Timeline, it is possible to interpret the predictions and understand how risks of future visits change over time. We evaluated Timeline on two large-scale real world data sets. The specific task was to predict the primary diagnosis category of the next hospital visit given previous visits. Our results show that Timeline has higher accuracy than state-of-the-art deep learning models based on RNNs. In addition, we demonstrate that the time decay factors and attentions learned by Timeline are in accord with medical knowledge and that Timeline can provide useful insight into its predictions.
【Keywords】: attention model; deep learning; electronic health records; healthcare
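A rough sketch of the per-code time-decay mechanism highlighted above: each medical code carries a learnable decay rate, so its influence on the visit representation fades with elapsed time, allowing chronic codes to learn slow decay and acute codes fast decay. The PyTorch module below uses made-up dimensions and omits Timeline's attention mechanism and prediction head; it is an illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TimeDecayVisit(nn.Module):
    """Embeds the medical codes of one visit and weights each code by
    exp(-decay_c * elapsed_time), with one learnable decay rate per code."""

    def __init__(self, num_codes, dim):
        super().__init__()
        self.code_emb = nn.Embedding(num_codes, dim)
        self.log_decay = nn.Parameter(torch.zeros(num_codes))   # one rate per code

    def forward(self, codes, elapsed_days):
        # codes: (n_codes_in_visit,); elapsed_days: days until prediction time
        decay = torch.exp(self.log_decay[codes])                 # positive decay rates
        weights = torch.exp(-decay * elapsed_days)               # (n_codes_in_visit,)
        return (weights.unsqueeze(-1) * self.code_emb(codes)).sum(dim=0)

visit = TimeDecayVisit(num_codes=1000, dim=32)
codes = torch.tensor([12, 57, 431])            # hypothetical diagnosis codes in one visit
print(visit(codes, elapsed_days=torch.tensor(90.0)).shape)      # torch.Size([32])
```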
【Paper Link】 【Pages】:52-61
【Authors】: Xiao Bai ; Erik Ordentlich ; Yuanyuan Zhang ; Andy Feng ; Adwait Ratnaparkhi ; Reena Somvanshi ; Aldi Tjahjadi
【Abstract】: Sponsored search has been the major source of revenue for commercial web search engines. It is crucial for a sponsored search engine to retrieve ads that are relevant to user queries to attract clicks as advertisers only pay when their ads get clicked. Retrieving relevant ads for a query typically involves first matching related ads to the query and then filtering out irrelevant ones. Both require understanding the semantic relationship between a query and an ad. In this work, we propose a novel embedding of queries and ads in sponsored search. The query embeddings are generated from constituent word n-gram embeddings that are trained to optimize an event-level word2vec objective over a large volume of search data. We show through a query rewriting task that the proposed query n-gram embedding model outperforms the state-of-the-art word embedding models for capturing query semantics. This allows us to apply the proposed query n-gram embedding model to improve query-ad matching and relevance in sponsored search. First, we use the similarity between a query and an ad derived from the query n-gram embeddings as an additional feature in the query-ad relevance model used in Yahoo Search. We show through online A/B test that using the new relevance model to filter irrelevant ads offline leads to 0.47% CTR and 0.32% revenue increase. Second, we propose a novel online query to ads matching system, built on an open-source big-data serving engine [30], using the learned query n-gram embeddings. Online A/B test shows that the new matching technique increases the search revenue by 2.32% as it significantly increases the ad coverage for tail queries.
【Keywords】: Apache spark; Vespa; distributed training; n-gram embedding; query-ad matching; query-ad relevance; sponsored search
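The serving-side use of query n-gram embeddings can be sketched as follows: a query (or ad) vector is composed from its constituent word n-gram vectors, and the cosine similarity between the two serves as a relevance feature. The embedding table below is random, standing in for vectors trained with the event-level word2vec objective described above; training itself is not shown, and all strings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 64
# Placeholder n-gram embedding table; in the paper these are trained on search events.
ngram_emb = {ng: rng.normal(size=DIM) for ng in
             ["cheap", "flights", "cheap flights", "to", "london", "to london"]}

def ngrams(tokens, max_n=2):
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def embed(text):
    """Sum the embeddings of known n-grams and L2-normalize the result."""
    vecs = [ngram_emb[ng] for ng in ngrams(text.lower().split()) if ng in ngram_emb]
    v = np.sum(vecs, axis=0) if vecs else np.zeros(DIM)
    return v / (np.linalg.norm(v) + 1e-12)

def query_ad_similarity(query, ad_text):
    return float(embed(query) @ embed(ad_text))   # cosine-similarity relevance feature

print(query_ad_similarity("cheap flights to london", "cheap flights london deals"))
```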
【Paper Link】 【Pages】:62-70
【Authors】: Rahul Bhagat ; Srevatsan Muralidharan ; Alex Lobzhanidze ; Shankar Vishwanath
【Abstract】: Repeat purchasing, i.e., a customer purchasing the same product multiple times, is a common phenomenon in retail. As more customers start purchasing consumable products (e.g., toothpastes, diapers, etc.) online, this phenomenon has also become prevalent in e-commerce. However, in January 2014, when we looked at popular e-commerce websites, we did not find any customer-facing features that recommended products to customers from their purchase history to promote repeat purchasing. Also, we found limited research about repeat purchase recommendations and none that deals with the large scale purchase data that e-commerce websites collect. In this paper, we present the approach we developed for modeling repeat purchase recommendations. This work has demonstrated over 7% increase in the product click through rate on the personalized recommendations page of the Amazon.com website and has resulted in the launch of several customer-facing features on the Amazon.com website, the Amazon mobile app, and other Amazon websites.
【Keywords】: e-commerce; personalization; recommender systems; repeat purchases
【Paper Link】 【Pages】:71-79
【Authors】: Fedor Borisyuk ; Albert Gordo ; Viswanath Sivakumar
【Abstract】: In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. Sharing of image content has become one of the primary ways to communicate information among internet users within social networks such as Facebook, and the understanding of such media, including its textual information, is of paramount importance to facilitate search and recommendation applications. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta's system architecture. We perform extensive evaluation of presented technologies, explain useful practical approaches to build an OCR system at scale, and provide insightful intuitions as to why and how certain components work based on the lessons learnt during the development and deployment of the system.
【Keywords】: optical character recognition; text detection; text recognition
【Paper Link】 【Pages】:80-89
【Authors】: Ângelo Cardoso ; Fabio Daolio ; Saúl Vargas
【Abstract】: We describe a solution to tackle a common set of challenges in e-commerce, which arise from the fact that new products are continually being added to the catalogue. The challenges involve properly personalising the customer experience, forecasting demand and planning the product range. We argue that the foundational piece to solve all of these problems is having consistent and detailed information about each product, which is rarely available or consistent given the multitude of suppliers and types of products. We describe in detail the architecture and methodology implemented at ASOS, one of the world's largest fashion e-commerce retailers, to tackle this problem. We then show how this quantitative understanding of the products can be leveraged to improve recommendations in a hybrid recommender system approach.
【Keywords】: asymmetric factorisation; deep neural networks; fashion e-commerce; hybrid recommender system; missing labels; multi-label classification; multi-modal; multi-task; weight-sharing
【Paper Link】 【Pages】:90-99
【Authors】: Boyo Chen ; Buo-Fu Chen ; Hsuan-Tien Lin
【Abstract】: A tropical cyclone (TC) is a type of severe weather system that occurs in tropical regions. Accurate estimation of TC intensity is crucial for disaster management. Moreover, the intensity estimation task is key to better understanding and forecasting the behavior of TCs. Recently, the task has begun to attract attention from not only meteorologists but also data scientists. Nevertheless, it is hard to stimulate joint research between both types of scholars without a benchmark dataset to work on together. In this work, we release such a benchmark dataset, which is a new open dataset collected from satellite remote sensing, for the TC-image-to-intensity estimation task. We also propose a novel model to solve this task based on the convolutional neural network (CNN). We discover that the usual CNN, which is mature for object recognition, requires several modifications when used for the intensity estimation task. Furthermore, we incorporate the domain knowledge of meteorologists, such as the rotation-invariance of TCs, into our model design to reach better performance. Experimental results on the released benchmark dataset verify that the proposed model is among the most accurate models that can be used for TC intensity estimation, while being relatively more stable across all situations. The results demonstrate the potential of applying data science to meteorology studies.
【Keywords】: atmospheric science; blending; convolutional neural network; dropout; pooling; regression; tropical cyclone; tropical cyclone intensity
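One simple way to encode the rotation-invariance domain knowledge mentioned above is to train a CNN regressor on randomly rotated copies of each satellite image. The sketch below (PyTorch, 90-degree rotations only, made-up input sizes and intensities) illustrates that idea only; it is not the authors' architecture, which the paper describes with further modifications.

```python
import torch
import torch.nn as nn

class IntensityCNN(nn.Module):
    """Toy CNN that regresses TC intensity from a single-channel satellite image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 1)              # regression output (e.g., knots)

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(-1)

def rotation_augment(batch):
    """Random 90-degree rotation: a cheap way to encourage rotation invariance."""
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(batch, k, dims=(2, 3))

model = IntensityCNN()
images = torch.randn(8, 1, 64, 64)                # placeholder satellite crops
targets = torch.rand(8) * 150                     # placeholder intensities
loss = nn.functional.mse_loss(model(rotation_augment(images)), targets)
loss.backward()
print(float(loss))
```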
【Paper Link】 【Pages】:100-109
【Authors】: Chaochao Chen ; Ziqi Liu ; Peilin Zhao ; Longfei Li ; Jun Zhou ; Xiaolong Li
【Abstract】: Collaborative filtering, especially the latent factor model, has been popularly used in personalized recommendation. Latent factor models aim to learn user and item latent factors from user-item historic behaviors. To apply them to real big-data scenarios, efficiency becomes the first concern, including offline model training efficiency and online recommendation efficiency. In this paper, we propose a Distributed Collaborative Hashing (DCH) model which can significantly improve both efficiencies. Specifically, we first propose a distributed learning framework, following the state-of-the-art parameter server paradigm, to learn the offline collaborative model. Our model can be learnt efficiently by computing subgradients in minibatches distributedly on workers and updating model parameters on servers asynchronously. We then adopt a hashing technique to speed up the online recommendation procedure. Recommendations can be made quickly by exploiting lookup hash tables. We conduct thorough experiments on two real large-scale datasets. The experimental results demonstrate that, compared with classic and state-of-the-art (distributed) latent factor models, DCH has comparable performance in terms of recommendation accuracy but achieves both fast convergence in the offline model training procedure and real-time efficiency in the online recommendation procedure. Furthermore, the encouraging performance of DCH is also shown for several real-world applications in Ant Financial.
【Keywords】: collaborative; hashing; latent factor model; matrix factorization; parameter server
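The online half of such an approach, hashing learned latent factors into binary codes and answering recommendations via hash-table lookups, might be sketched as follows. The factor matrices are random placeholders for factors learned by the distributed parameter-server training described above, and the sign-based binarization is purely illustrative rather than the paper's hashing scheme.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
DIM = 8
user_factors = rng.normal(size=(1000, DIM))    # placeholders for learned factors
item_factors = rng.normal(size=(5000, DIM))

def binarize(factors):
    """Sign-based hashing of real-valued latent factors into binary codes."""
    return (factors > 0).astype(np.uint8)

item_codes = binarize(item_factors)
buckets = defaultdict(list)                    # hash table: code bytes -> item ids
for item_id, code in enumerate(item_codes):
    buckets[code.tobytes()].append(item_id)

def recommend(user_id):
    """Constant-time candidate lookup: items whose code matches the user's code."""
    code = binarize(user_factors[user_id:user_id + 1])[0]
    return buckets.get(code.tobytes(), [])

print(len(recommend(42)), "candidate items in the user's bucket")
```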
【Paper Link】 【Pages】:110-119
【Authors】: Haolan Chen ; Fred X. Han ; Di Niu ; Dong Liu ; Kunfeng Lai ; Chenglin Wu ; Yu Xu
【Abstract】: Short Text Matching plays an important role in many natural language processing tasks such as information retrieval, question answering, and conversational systems. Conventional text matching methods rely on predefined templates and rules, which are not applicable to short text with a limited number of words, limiting their ability to generalize to unobserved data. Many recent efforts have been made to apply deep neural network models to natural language processing tasks, which reduces the cost of feature engineering. In this paper, we present the design of Multi-Channel Information Crossing (MIX), a multi-channel convolutional neural network model for text matching, with additional attention mechanisms from sentence and text semantics. MIX compares text snippets at varied granularities to form a series of multi-channel similarity matrices, which are crossed with another set of carefully designed attention matrices to expose the rich structures of sentences to deep neural networks. We implemented MIX and deployed the system on Tencent's Venus distributed computation platform. Thanks to carefully engineered multi-channel information crossing, evaluation results suggest that MIX outperforms a wide range of state-of-the-art deep neural network models by at least 11.1% in terms of the normalized discounted cumulative gain (NDCG@3) on the English WikiQA dataset. Moreover, we also performed online A/B tests with real users on the search service of Tencent QQ Browser. Results suggest that MIX raised the number of clicks on the returned results by 5.7%, due to an increased accuracy in query-document matching, which demonstrates the superior performance of MIX in production environments.
【Keywords】: ad-hoc retrieval; attention mechanism; convolutional neural networks; question answering; text matching
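The multi-channel similarity matrices at the heart of MIX can be illustrated roughly as below: two text snippets are compared at unigram and bigram granularity, giving one similarity matrix per channel that a CNN would then consume. The word embeddings are random placeholders and the attention matrices described above are omitted, so this is a sketch of the representation, not the MIX model.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 32
emb = {w: rng.normal(size=DIM) for w in
       "how do i reset my password account recover".split()}

def unit(v):
    return v / (np.linalg.norm(v) + 1e-12)

def grams(tokens, n):
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def gram_vec(gram):
    return unit(np.sum([emb[w] for w in gram], axis=0))

def similarity_matrix(a, b, n):
    """One channel: cosine similarities between all n-grams of a and all n-grams of b."""
    A = np.stack([gram_vec(g) for g in grams(a, n)])
    B = np.stack([gram_vec(g) for g in grams(b, n)])
    return A @ B.T

q = "how do i reset my password".split()
d = "recover my account password".split()
uni = similarity_matrix(q, d, 1)       # word-level channel
bi = similarity_matrix(q, d, 2)        # bigram-level channel
print(uni.shape, bi.shape)             # (6, 4) and (5, 3): one matrix per granularity
```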
【Paper Link】 【Pages】:120-129
【Authors】: Xi Chen ; Yiqun Liu ; Liang Zhang ; Krishnaram Kenthapadi
【Abstract】: The LinkedIn Salary product was launched in late 2016 with the goal of providing insights on compensation distribution to job seekers, so that they can make more informed decisions when discovering and assessing career opportunities. The compensation insights are provided based on data collected from LinkedIn members and aggregated in a privacy-preserving manner. Given the simultaneous desire for computing robust, reliable insights and for having insights to satisfy as many job seekers as possible, a key challenge is to reliably infer the insights at the company level when there is limited or no data at all. We propose a two-step framework that utilizes a novel, semantic representation of companies (Company2vec) and a Bayesian statistical model to address this problem. Our approach makes use of the rich information present in the LinkedIn Economic Graph, and in particular, uses the intuition that two companies are likely to be similar if employees are very likely to transition from one company to the other and vice versa. We compute embeddings for companies by analyzing the LinkedIn members' company transition data using machine learning algorithms, then compute pairwise similarities between companies based on these embeddings, and finally incorporate company similarities in the form of peer company groups as part of the proposed Bayesian statistical model to predict insights at the company level. We perform extensive validation using several different evaluation techniques, and show that we can significantly increase the coverage of insights while, in fact, even slightly improving the quality of the obtained insights. For example, we were able to compute salary insights for 35 times as many title-region-company combinations in the U.S. as compared to previous work, corresponding to 4.9 times as many monthly active users. Finally, we highlight the lessons learned from practical deployment of our system.
【Keywords】: bayesian smoothing; company embeddings; company2vec; job transition analysis; linkedin economic graph; peer company group; salary prediction
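A toy sketch of the two-step idea above: company representations are derived from member transition data (here raw transition counts stand in for the learned Company2vec embeddings), similarities between them define peer companies, and a company's sparse salary estimate is shrunk toward its peer group in a simple Bayesian-smoothing fashion. All companies, counts, and salaries below are hypothetical, and the shrinkage rule is illustrative rather than the paper's statistical model.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical member transition counts between 4 companies (row -> column).
transitions = np.array([[0, 40, 5, 1],
                        [35, 0, 8, 2],
                        [4, 6, 0, 30],
                        [1, 3, 28, 0]], dtype=float)

def unit_rows(m):
    return m / (np.linalg.norm(m, axis=1, keepdims=True) + 1e-12)

emb = unit_rows(transitions + transitions.T)   # crude stand-in for Company2vec
similarity = emb @ emb.T                       # companies with mutual transitions are similar

def smoothed_salary(company, samples, peer_k=1, prior_strength=20):
    """Shrink a sparse company-level salary mean toward its nearest peer companies."""
    peers = [p for p in np.argsort(-similarity[company]) if p != company][:peer_k]
    peer_mean = np.mean([np.mean(samples[p]) for p in peers])
    n = len(samples[company])
    company_mean = np.mean(samples[company]) if n else peer_mean
    return (n * company_mean + prior_strength * peer_mean) / (n + prior_strength)

salaries = {0: [100, 110, 95, 105], 1: [120, 118], 2: [80, 85, 82], 3: [90]}
print(round(smoothed_salary(3, salaries), 1))  # company 3 is pulled toward its peer, company 2
```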
【Paper Link】 【Pages】:130-138
【Authors】: Xumin Chen ; Peng Cui ; Lingling Yi ; Shiqiang Yang
【Abstract】: A highly dynamic and recency-sensitive dataset is one in which new data arrive in high volume at high speed and are of higher priority for subsequent applications. Embedding techniques, which aim to represent data in a low-dimensional vector space, have become a popular research topic in recent years and are widely used across data types and applications. Generating embeddings on such data at high speed is challenging: one must account for the high dynamics and recency sensitivity while remaining both effective and efficient. Popular embedding methods are usually time-consuming, and common optimization methods are limited because they may not have enough time to converge or cannot handle recency-sensitive sample weights. This problem is still open. In this paper, we propose a novel optimization method named Diffused Stochastic Gradient Descent (D-SGD) for such highly dynamic and recency-sensitive data. The idea is to assign recency-sensitive weights to samples and to select samples according to these weights when computing gradients. After the embedding of a selected sample is updated, related samples are also updated following a diffusion strategy. We propose a Nested Segment Tree that implements the recency-sensitive weighting and the diffusion strategy with a complexity no higher than that of the iteration step in practice. We also theoretically prove the convergence rate of D-SGD for independent data samples and empirically demonstrate the efficacy of D-SGD on large-scale real datasets.
【Keywords】: diffusion; dynamic; embedding; recency-sensitive data
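A toy illustration of the optimization mechanics described above: samples are drawn with recency-dependent weights, the selected embeddings are updated by SGD, and a fraction of the update is diffused to related samples. The attraction-only objective, edge data, and neighbour lists are synthetic stand-ins, and the Nested Segment Tree that makes weighted sampling efficient is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)
DIM, N = 16, 100
emb = rng.normal(scale=0.1, size=(N, DIM))

# Hypothetical edge samples (i, j, timestamp) and a neighbour list for diffusion.
edges = [(rng.integers(N), rng.integers(N), t) for t in range(2000)]
neighbours = {i: rng.integers(0, N, size=3) for i in range(N)}

def recency_weights(edges, half_life=500.0):
    """More recent samples get exponentially larger selection weights."""
    t_now = max(t for _, _, t in edges)
    return np.array([0.5 ** ((t_now - t) / half_life) for _, _, t in edges])

w = recency_weights(edges)
p = w / w.sum()

lr, diffuse = 0.05, 0.2
for _ in range(5000):
    i, j, _ = edges[rng.choice(len(edges), p=p)]   # recency-weighted sampling
    grad = emb[i] - emb[j]                          # toy objective: pull co-occurring nodes together
    emb[i] -= lr * grad
    emb[j] += lr * grad
    for k in neighbours[i]:                         # diffusion step: related samples
        emb[k] += diffuse * lr * (emb[i] - emb[k])  # are nudged toward the update

print("embedding value range:", emb.min().round(2), emb.max().round(2))
```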
【Paper Link】 【Pages】:139-148
【Authors】: Konstantina Christakopoulou ; Alex Beutel ; Rui Li ; Sagar Jain ; Ed H. Chi
【Abstract】: Recommendation systems, prevalent in many applications, aim to surface to users the right content at the right time. Recently, researchers have aspired to develop conversational systems that offer seamless interactions with users, more effectively eliciting user preferences and offering better recommendations. Taking a step towards this goal, this paper explores the two stages of a single round of conversation with a user: which question to ask the user, and how to use their feedback to respond with a more accurate recommendation. Following these two stages, first, we detail an RNN-based model for generating topics a user might be interested in, and then extend a state-of-the-art RNN-based video recommender to incorporate the user's selected topic. We describe our proposed system Q&R, i.e., Question & Recommendation, and the surrogate tasks we utilize to bootstrap data for training our models. We evaluate different components of Q&R on live traffic in various applications within YouTube: User Onboarding, Homepage Recommendation, and Notifications. Our results demonstrate that our approach improves upon state-of-the-art recommendation models, including RNNs, and makes these applications more useful, such as a >1% increase in video notifications opened. Further, our design choices can be useful to practitioners wanting to transition to more conversational recommendation systems.
【Keywords】: bootstrapping conversations; engaging casual users; interactive recommendation; question and recommendation
【Paper Link】 【Pages】:149-157
【Authors】: Jonathan Chung ; Sarah A. Chau ; Nathan Herrmann ; Krista L. Lanctôt ; Moshe Eizenman
【Abstract】: Assessment of apathy in patients with Alzheimer's disease (AD) relies heavily on interviews with caregivers and patients, which can be ambiguous and time consuming. More precise and objective methods of evaluation can better inform treatment decisions. In this study, visual scanning behaviours (VSBs) on emotional and non-emotional stimuli were used to detect apathy in patients with AD. Forty-eight AD patients participated in the study. Sixteen of the patients were apathetic. Patients looked at 48 slides with non-emotional images and 32 slides with emotional images. We described two methods that use recurrent neural networks (RNNs) to learn differences between the VSBs of apathetic and non-apathetic AD patients. Method 1 uses two separate RNNs to learn group differences between visual scanning sequences on emotional and non-emotional stimuli. The outputs of the RNNs are then combined and used by a logistic regression classifier to characterise patients as either apathetic or non-apathetic. Method 1 achieved an AUC gain of 0.074 compared to a previously presented handcrafted feature method of detecting emotional blunting (AUC handcrafted = 0.646). Method 2 assumes that each individual's "style of scanning" (stereotypical eye movements) is independent of the content of the visual stimuli and uses the "style of scanning" to normalise the individual's VSBs on emotional and non-emotional stimuli. Method 2 uses RNNs in a sequence-to-sequence configuration to learn the individual's "style of scanning". The trained model is then used to create vector representations that contain information on the individual's "style of scanning" (content independent) and her/his VSBs (content dependent) on emotional and non-emotional stimuli. The distance between these vector representations is used by a logistic regression classifier to characterise patients as either apathetic or non-apathetic. Using Method 2 the AUC of the classifier improved to 0.814. The results presented suggest that using RNNs to analyse differences between VSBs on emotional and non-emotional stimuli (a measure of emotional blunting) can improve objective detection of apathy in individual patients with AD.
【Keywords】: Alzheimer's disease; apathy; fixation sequences; learning visual scanning styles; recurrent neural networks; representation of visual scanning behaviour
【Paper Link】 【Pages】:158-167
【Authors】: Giovanni Comarela ; Ramakrishnan Durairajan ; Paul Barford ; Dino D. Christenson ; Mark Crovella
【Abstract】: Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has potential to overcome a number of the drawbacks of election polls. However, there are a number of challenges that must be overcome to effectively use Web browsing for assessing candidate preference - including the lack of suitable ground truth data and the heterogeneity of user populations in time and space. We address these challenges, and show that the resulting methods can shed considerable light on the dynamics of voters' candidate preferences in ways that are difficult to achieve using polls.
【Keywords】: browsing behavior; candidate preference; machine learning
【Paper Link】 【Pages】:168-176
【Authors】: Michael Conover ; Matthew Hayes ; Scott Blackburn ; Pete Skomoroch ; Sam Shah
【Abstract】: Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in machine learning relevance applications like semantic search, knowledge graph construction, and question answering. Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured text adds a new dimension to this problem. This paper presents Pangloss, a production system for entity disambiguation on noisy text. Pangloss combines a probabilistic linear-time key phrase identification algorithm with a semantic similarity engine based on context-dependent document embeddings to achieve better than state-of-the-art results (>5% in F1) compared to other research or commercially available systems. In addition, Pangloss leverages a local embedded database with a tiered architecture to house its statistics and metadata, which allows rapid disambiguation in streaming contexts and on-device disambiguation in low-memory environments such as mobile phones.
【Keywords】: entity linking; knowledge bases; natural language understanding
【Paper Link】 【Pages】:177-185
【Authors】: Joel Janek Dabrowski ; Ashfaqur Rahman ; Andrew George ; Stuart Arnold ; John McCulloch
【Abstract】: A novel approach to deterministic modelling of diurnal water quality parameters in aquaculture prawn ponds is presented. The purpose is to provide assistance to prawn pond farmers in monitoring pond water quality with limited data. Obtaining sufficient water quality data is generally a challenge in commercial prawn farming applications. Farmers can sustain large losses in their crop if water quality is not well managed. The model presented provides a means for modelling and forecasting various water quality parameters. It is inspired by data dynamics and does not rely on physical ecosystem modelling. The model is constructed within the Bayesian filtering framework. The Kalman filter and the unscented Kalman filter are applied for inference. The results demonstrate generalisability across both variables and environments. The ability for short-term forecasting with mean absolute percentage errors between 0.5% and 11% is demonstrated.
【Keywords】: bayesian filtering; dissolved oxygen; kalman filter; ph; time series
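A minimal example of the Bayesian filtering framework mentioned above, applied to a single simulated water-quality series with a local-level (random-walk) Kalman filter. The process and measurement noise values are assumed for illustration only; they are not the authors' model, which also employs the unscented Kalman filter.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated noisy dissolved-oxygen readings around a slowly drifting level (mg/L).
truth = 7.0 + np.cumsum(rng.normal(scale=0.05, size=200))
obs = truth + rng.normal(scale=0.5, size=200)

# Local-level model: x_t = x_{t-1} + w (process), y_t = x_t + v (measurement).
q_var, r_var = 0.05 ** 2, 0.5 ** 2          # assumed process / measurement variances
x, p = obs[0], 1.0
filtered = []
for y in obs:
    p = p + q_var                            # predict step
    k = p / (p + r_var)                      # Kalman gain
    x = x + k * (y - x)                      # update with the new observation
    p = (1 - k) * p
    filtered.append(x)

mape = np.mean(np.abs((np.array(filtered) - truth) / truth)) * 100
print(f"filtered MAPE vs. ground truth: {mape:.2f}%")
```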
【Paper Link】 【Pages】:186-195
【Authors】: Maria Daltayanni ; Ali Dasdan ; Luca de Alfaro
【Abstract】: Selecting the right audience for an advertising campaign is one of the most challenging, time-consuming and costly steps in the advertising process. To target the right audience, advertisers usually have two options: a) market research to identify user segments of interest and b) sophisticated machine learning models trained on data from past campaigns. In this paper we study how demand-side platforms (DSPs) can leverage the data they collect (demographic and behavioral) in order to learn reputation signals about end user convertibility and advertisement (ad) quality. In particular, we propose a reputation system which learns interest scores about end users, as an additional signal of ad conversion, and quality scores about ads, as a signal of campaign success. Then our model builds user segments based on a combination of demographic, behavioral and the new reputation signals and recommends transparent targeting rules that are easy for the advertiser to interpret and refine. We perform an experimental evaluation on industry data that showcases the benefits of our approach for both new and existing advertiser campaigns.
【Keywords】: audience segmentation; expert crowds; reputation
【Paper Link】 【Pages】:196-204
【Authors】: Nilaksh Das ; Madhuri Shanbhogue ; Shang-Tse Chen ; Fred Hohman ; Siwei Li ; Li Chen ; Michael E. Kounavis ; Duen Horng Chau
【Abstract】: The rapidly growing body of research in adversarial machine learning has demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarially generated images. This underscores the urgent need for practical defense techniques that can be readily deployed to combat attacks in real-time. Observing that many attack strategies aim to perturb image pixels in ways that are visually imperceptible, we place JPEG compression at the core of our proposed SHIELD defense framework, utilizing its capability to effectively "compress away" such pixel manipulation. To immunize a DNN model from artifacts introduced by compression, SHIELD "vaccinates" the model by retraining it with compressed images, where different compression levels are applied to generate multiple vaccinated models that are ultimately used together in an ensemble defense. On top of that, SHIELD adds an additional layer of protection by employing randomization at test time that compresses different regions of an image using random compression levels, making it harder for an adversary to estimate the transformation performed. This novel combination of vaccination, ensembling, and randomization makes SHIELD a fortified multi-pronged defense. We conducted extensive, large-scale experiments using the ImageNet dataset, and show that our approaches eliminate up to 98% of gray-box attacks delivered by strong adversarial techniques such as Carlini-Wagner's L2 attack and DeepFool. Our approaches are fast and work without requiring knowledge about the model.
【Keywords】: JPEG compression; adversarial machine learning; deep learning; ensemble defense; machine learning security
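The randomized, region-wise JPEG preprocessing described above can be sketched roughly as follows: each block of the input image is re-compressed at a randomly chosen quality level before classification, so the exact transform is hard for an adversary to anticipate. The block size and quality levels below are hypothetical; vaccination (retraining on compressed images) and the model ensemble are not shown.

```python
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(7)

def jpeg(arr, quality):
    """Round-trip an RGB uint8 array through in-memory JPEG at the given quality."""
    buf = io.BytesIO()
    Image.fromarray(np.ascontiguousarray(arr)).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(io.BytesIO(buf.getvalue())))

def randomized_jpeg_defense(image, block=32, qualities=(20, 40, 60, 80)):
    """Compress each block with a randomly chosen quality before classification."""
    out = image.copy()
    h, w, _ = image.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            q = int(rng.choice(qualities))
            out[y:y + block, x:x + block] = jpeg(image[y:y + block, x:x + block], q)
    return out

img = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)  # placeholder image
print(randomized_jpeg_defense(img).shape)
```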
【Paper Link】 【Pages】:205-214
【Authors】: Heidar Davoudi ; Aijun An ; Morteza Zihayat ; Gordon Edall
【Abstract】: Many online news agencies utilize the paywall mechanism to increase reader subscriptions. This method offers a non-subscribed reader a fixed number of free articles in a period of time (e.g., a month), and then directs the user to the subscription page for further reading. We argue that there is no direct relationship between the number of paywalls presented to readers and the number of subscriptions, and that this artificial barrier, if not used well, may disengage potential subscribers and thus may not well serve its purpose of increasing revenue. Moreover, the current paywall mechanism neither considers the user browsing history nor the potential articles which the user may visit in the future. Thus, it treats all readers equally and does not consider the potential of a reader in becoming a subscriber. In this paper, we propose an adaptive paywall mechanism to balance the benefit of showing an article against that of displaying the paywall (i.e., terminating the session). We first define the notion of cost and utility that are used to define an objective function for optimal paywall decision making. Then, we model the problem as a stochastic sequential decision process. Finally, we propose an efficient policy function for paywall decision making. The experimental results on a real dataset from a major newspaper in Canada show that the proposed model outperforms the traditional paywall mechanism as well as the other baselines.
【Keywords】: digital news media; paywall; sequential decision making
【Paper Link】 【Pages】:215-222
【Authors】: Daniel de Roux ; Boris Perez ; Andrés Moreno ; María-Del-Pilar Villamil ; César Figueroa
【Abstract】: Tax fraud is the intentional act of lying on a tax return form with intent to lower one's tax liability. Under-reporting is one of the most common types of tax fraud; it consists of filing a tax return with a lesser tax base. As a result of this act, fiscal revenues are reduced, undermining public investment. Detecting tax fraud is one of the main priorities of local tax authorities, which are required to develop cost-efficient strategies to tackle this problem. Most of the recent works in tax fraud detection are based on supervised machine learning techniques that make use of labeled or audit-assisted data. Regrettably, auditing tax declarations is a slow and costly process, therefore access to labeled historical information is extremely limited. For this reason, the applicability of supervised machine learning techniques for tax fraud detection is severely hindered. Such limitations motivate the contribution of this work. We present a novel approach for the detection of potentially fraudulent taxpayers using only unsupervised learning techniques, while allowing the future use of supervised learning techniques. We demonstrate the ability of our model to identify under-reporting taxpayers on real tax payment declarations, reducing the number of potentially fraudulent taxpayers to audit. The obtained results demonstrate that our model does not miss declarations already marked as suspicious and additionally labels previously undetected tax declarations as suspicious, increasing the operational efficiency of the tax supervision process without needing historic labeled data.
【Keywords】: anomaly detection; kernel density estimation; spectral clustering; tax fraud detection; unsupervised machine learning
【Paper Link】 【Pages】:223-232
【Authors】: Tom Decroos ; Jan Van Haaren ; Jesse Davis
【Abstract】: Sports teams are nowadays collecting huge amounts of data from training sessions and matches. The teams are becoming increasingly interested in exploiting these data to gain a competitive advantage over their competitors. One of the most prevalent types of new data is event stream data from matches. These data enable more advanced descriptive analysis as well as the potential to investigate an opponent's tactics in greater depth. Due to the complexity of both the data and game strategy, most tactical analyses are currently performed by humans reviewing video and scouting matches in person. As a result, this is a time-consuming and tedious process. This paper explores the problem of automatic tactics detection from event-stream data collected from professional soccer matches. We highlight several important challenges that these data and this problem setting pose. We describe a data-driven approach for identifying patterns of movement that account for both spatial and temporal information which represent potential offensive tactics. We evaluate our approach on the 2015/2016 season of the English Premier League and are able to identify interesting strategies per team related to goal kicks, corners and set pieces.
【Keywords】: eventstream data; pattern mining; soccer match data; sports analytics; tactics discovery
【Paper Link】 【Pages】:233-242
【Authors】: Alex Deng ; Ulf Knoblich ; Jiannan Lu
【Abstract】: During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. Under the setting of big data, the majority of such metrics approximately follow normal distributions, opening up potential opportunities to model them directly without extra model assumptions and solve big data problems via closed-form formulas using distributed algorithms at a fraction of the cost of simulation-based procedures like bootstrap. However, certain attributes of the metrics, such as their corresponding data generating processes and aggregation levels, pose numerous challenges for constructing trustworthy estimation and inference procedures. Motivated by four real-life examples in metric development and analytics for large-scale A/B testing, we provide a practical guide to applying the Delta method, one of the most important tools from the classic statistics literature, to address the aforementioned challenges. We emphasize the central role of the Delta method in metric analytics by highlighting both its classic and novel applications.
【Keywords】: a/b testing; big data; distributed algorithm; large sample theory; longitudinal study; online metrics; quantile inference; randomization
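A small worked example of the kind of Delta-method calculation the paper above advocates: the standard error of a ratio-of-sums metric (clicks per page view, with user-level randomization) computed in closed form from per-user moments and checked against a bootstrap. The simulated data and the choice of metric are illustrative only; the closed form used is the standard first-order Delta-method approximation for a ratio of means.

```python
import numpy as np

rng = np.random.default_rng(8)

# Per-user page views X and clicks Y; the metric is a ratio of sums: sum(Y) / sum(X).
n = 50_000
x = rng.poisson(5, size=n) + 1
y = rng.binomial(x, 0.1)

r = y.sum() / x.sum()
mx = x.mean()
cov = np.cov(x, y, ddof=1)

# Delta method: Var(Ybar/Xbar) ~ (Var(Y) - 2*r*Cov(X,Y) + r^2*Var(X)) / (n * E[X]^2)
var_delta = (cov[1, 1] - 2 * r * cov[0, 1] + r ** 2 * cov[0, 0]) / (n * mx ** 2)

# Bootstrap check (the simulation-based procedure the closed form lets us avoid at scale).
boot = [y[idx].sum() / x[idx].sum()
        for idx in (rng.integers(0, n, size=n) for _ in range(200))]
print(f"delta-method s.e. {np.sqrt(var_delta):.5f}  bootstrap s.e. {np.std(boot):.5f}")
```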
【Paper Link】 【Pages】:243-252
【Authors】: Tom Diethe ; Mike Holmes ; Meelis Kull ; Miquel Perelló-Nieto ; Kacper Sokol ; Hao Song ; Emma Tonkin ; Niall Twomey ; Peter A. Flach
【Abstract】: The SPHERE project is devoted to advancing eHealth in a smart-home context, and supports full-scale sensing and data analysis to enable a generic healthcare service. We describe, from a data-science perspective, our experience of taking the system out of the laboratory into more than thirty homes in Bristol, UK. We describe the infrastructure and processes that had to be developed along the way, describe how we train and deploy Machine Learning systems in this context, and give a realistic appraisal of the state of the deployed systems.
【Keywords】: ambient assisted living; ehealth; internet of things; machine learning; remote medicine; sensor networks; smart homes
【Paper Link】 【Pages】:253-262
【Authors】: Yujie Fan ; Shifu Hou ; Yiming Zhang ; Yanfang Ye ; Melih Abdulhayoglu
【Abstract】: Due to its severe damages and threats to the security of the Internet and computing devices, malware detection has caught the attention of both the anti-malware industry and researchers for decades. To combat evolving malware attacks, in this paper, we first study how to utilize both content- and relation-based features to characterize sly malware; to model different types of entities (i.e., file, archive, machine, API, DLL) and the rich semantic relationships among them (i.e., file-archive, file-machine, file-file, API-DLL, file-API relations), we then construct a structural heterogeneous information network (HIN) and present a meta-graph based approach to depict the relatedness over files. Since malware detection is a cost-sensitive task, measuring the relatedness over files on the constructed HIN calls for efficient methods to learn latent representations for the HIN. To address this challenge, based on the built meta-graph schemes, we propose metagraph2vec, a new HIN embedding model, in a first attempt to learn low-dimensional representations for the nodes in a HIN where both the HIN structures and semantics are maximally preserved for malware detection. A comprehensive experimental study on real sample collections from the Comodo Cloud Security Center is performed to compare various malware detection approaches. The promising experimental results demonstrate that our developed system, Scorpion, which integrates the proposed method, outperforms other alternative malware detection techniques. The developed system has already been incorporated into the scanning tool of the Comodo Antivirus product.
【Keywords】: heterogeneous information network; malware detection; metagraph2vec; network embedding
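The general flavour of meta-path-guided HIN embedding can be sketched on a toy file/machine/API graph: random walks that alternate node types along file-X-file meta-paths are fed to a skip-gram model (gensim here, assumed installed), yielding file embeddings in which files sharing machines or API calls land nearby. This simplified walker does not implement the paper's meta-graph schemes or its full metagraph2vec model; the graph and node names are hypothetical.

```python
import random
from gensim.models import Word2Vec

random.seed(0)

# Toy heterogeneous graph: file -> machines it ran on, file -> APIs it calls.
file_machine = {"f1": ["m1", "m2"], "f2": ["m2"], "f3": ["m3"], "f4": ["m1", "m3"]}
file_api = {"f1": ["CreateFile", "WriteFile"], "f2": ["WriteFile"],
            "f3": ["RegSetValue"], "f4": ["RegSetValue", "CreateFile"]}

def invert(d):
    inv = {}
    for k, vs in d.items():
        for v in vs:
            inv.setdefault(v, []).append(k)
    return inv

machine_file, api_file = invert(file_machine), invert(file_api)

def walk(start, scheme, length=8):
    """Random walk that alternates node types along a file-X-file meta-path."""
    fwd, back = scheme
    node, out = start, [start]
    for _ in range(length):
        mid = random.choice(fwd[node])
        node = random.choice(back[mid])
        out += [mid, node]
    return out

walks = [walk(f, (file_machine, machine_file)) for f in file_machine for _ in range(10)]
walks += [walk(f, (file_api, api_file)) for f in file_api for _ in range(10)]

model = Word2Vec(sentences=walks, vector_size=16, window=2, min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("f1", topn=2))     # files related via shared machines/APIs
```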
【Paper Link】 【Pages】:263-272
【Authors】: Donatella Firmani ; Marco Maiorino ; Paolo Merialdo ; Elena Nieddu
【Abstract】: In Codice Ratio is a research project to study tools and techniques for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, we present our efforts to develop a system to support the transcription of medieval manuscripts. The goal is to provide paleographers with a tool to reduce their efforts in transcribing large volumes, as those stored in the VSA, producing good transcriptions for significant portions of the manuscripts. We propose an original approach based on character segmentation. Our solution is able to deal with the dirty segmentation that inevitably occurs in handwritten documents. We use a convolutional neural network to recognize characters, and statistical language models to compose word transcriptions. Our approach requires minimal training effort, making the transcription process more scalable, as the production of training sets requires a few pages and can be easily crowdsourced. We have conducted experiments on manuscripts from the Vatican Registers, an unreleased corpus containing the correspondence of the popes. With training data produced by 120 high school students, our system has been able to produce good transcriptions that can be used by paleographers as a solid basis to speedup the transcription process at a large scale.
【Keywords】: crowdsourcing; digital humanities; handwritten text recognition; neural networks
【Paper Link】 【Pages】:273-282
【Authors】: Keith Funkhouser ; Matthew Malloy ; Enis Ceyhun Alp ; Phillip Poon ; Paul Barford
【Abstract】: Datasets that organize and associate the many identifiers produced by PCs, smartphones, and tablets accessing the internet are referred to as internet device graphs. In this paper, we demonstrate how measurement, tracking, and other internet entities can associate multiple identifiers with a single device or user after coarse associations, e.g., based on IP-colocation, are made. We employ a Bayesian similarity algorithm that relies on examples of pairs of identifiers and their associated telemetry, including user agent, screen size, and domains visited, to establish pair-wise scores. Community detection algorithms are applied to group identifiers that belong to the same device or user. We train and validate our methodology using a unique dataset collected from a client panel with full visibility, apply it to a dataset of 700 million device identifiers collected over the course of six weeks in the United States, and show that it outperforms several unsupervised learning approaches. Results show mean precision and recall exceeding 90% for association of identifiers at both the device and user levels.
【Keywords】: device graph; internet measurement; naive bayes
【Paper Link】 【Pages】:283-292
【Authors】: Yan Gao ; Viral Gupta ; Jinyun Yan ; Changji Shi ; Zhongen Tao ; P. J. Xiao ; Curtis Wang ; Shipeng Yu ; Rómer Rosales ; Ajith Muralidharan ; Shaunak Chatterjee
【Abstract】: In recent years, social media applications (e.g., Facebook, LinkedIn) have created mobile applications (apps) to give their members instant and real-time access from anywhere. To keep members informed and drive timely engagement, these mobile apps send event notifications. However, sending notifications for every possible event would result in too many notifications which would in turn annoy members and create a poor member experience. In this paper, we present our strategy of optimizing notifications to balance various utilities (e.g., engagement, send volume) by formulating the problem using constrained optimization. To guarantee freshness of notifications, we implement the solution in a stream computing system in which we make multi-channel send decisions in near real-time. Through online A/B test results, we show the effectiveness of our proposed approach on tens of millions of members.
【Keywords】: machine learning; notifications; optimization; stream computing
【Paper Link】 【Pages】:293-301
【Authors】: Alex Gittens ; Kai Rothauge ; Shusen Wang ; Michael W. Mahoney ; Lisa Gerhardt ; Prabhat ; Jey Kottalam ; Michael F. Ringenburg ; Kristyn J. Maschhoff
【Abstract】: Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations---in particular, many linear algebra computations that are the basis for solving common machine learning problems---are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI). To remedy this, we introduce Alchemist, a system designed to call MPI-based libraries from Apache Spark. Using Alchemist with Spark helps accelerate linear algebra, machine learning, and related computations, while still retaining the benefits of working within the Spark environment. We discuss the motivation behind the development of Alchemist, and we provide a brief overview of its design and implementation. We also compare the performances of pure Spark implementations with those of Spark implementations that leverage MPI-based codes via Alchemist. To do so, we use data science case studies: a large-scale application of the conjugate gradient method to solve very large linear systems arising in a speech classification problem, where we see an improvement of an order of magnitude; and the truncated singular value decomposition (SVD) of a 400GB three-dimensional ocean temperature data set, where we see a speedup of up to 7.9x. We also illustrate that the truncated SVD computation is easily scalable to terabyte-sized data by applying it to data sets of sizes up to 17.6TB.
【Keywords】: alchemist; apache spark; climate modeling; conjugate gradient; csfr; distributed computing; elemental; kernel methods; low-rank approximation; mpi; speech classification; timit; truncated svd
【Paper Link】 【Pages】:302-310
【Authors】: Garrett B. Goh ; Charles Siegel ; Abhinav Vishnu ; Nathan Oken Hodas
【Abstract】: With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate a pre-trained ChemNet that incorporates chemistry domain knowledge and enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.
【Keywords】: bioinformatics; cheminformatics; computer vision; natural language processing; transfer learning; weak supervised learning
【Paper Link】 【Pages】:311-320
【Authors】: Mihajlo Grbovic ; Haibin Cheng
【Abstract】: Search Ranking and Recommendations are fundamental problems of crucial interest to major Internet companies, including web search engines, content publishing websites and marketplaces. However, despite sharing some common characteristics a one-size-fits-all solution does not exist in this space. Given a large difference in content that needs to be ranked, personalized and recommended, each marketplace has a somewhat unique challenge. Correspondingly, at Airbnb, a short-term rental marketplace, search and recommendation problems are quite unique, being a two-sided marketplace in which one needs to optimize for host and guest preferences, in a world where a user rarely consumes the same item twice and one listing can accept only one guest for a certain set of dates. In this paper we describe Listing and User Embedding techniques we developed and deployed for purposes of Real-time Personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were specifically tailored for Airbnb marketplace, and are able to capture guest's short-term and long-term interests, delivering effective home listing recommendations. We conducted rigorous offline testing of the embedding models, followed by successful online tests before fully deploying them into production.
【Keywords】: personalization; search ranking; user modeling
【Paper Link】 【Pages】:321-330
【Authors】: Mengyue Hang ; Ian Pytlarz ; Jennifer Neville
【Abstract】: With the availability of vast amounts of user visitation history on location-based social networks (LBSN), the problem of Point-of-Interest (POI) prediction has been extensively studied. However, much of the research has been conducted solely on voluntary check-in datasets collected from social apps such as Foursquare or Yelp. While these data contain rich information about recreational activities (e.g., restaurants, nightlife, and entertainment), information about more prosaic aspects of people's lives is sparse. This not only limits our understanding of users' daily routines, but more importantly, the modeling assumptions developed based on characteristics of recreation-based data may not be suitable for richer check-in data. In this work, we present an analysis of educational "check-in" data using WiFi access logs collected at Purdue University. We propose a heterogeneous graph-based method to encode the correlations between users, POIs, and activities, and then jointly learn embeddings for the vertices. We evaluate our method against previous state-of-the-art POI prediction methods, and show that the assumptions made by previous methods significantly degrade performance on our data with dense(r) activity signals. We also show how our learned embeddings could be used to identify similar students (e.g., for friend suggestions).
【Keywords】: heterogeneous graphs; location-based social networks; network embedding; representation learning
【Paper Link】 【Pages】:331-339
【Authors】: Shahar Harel ; Kira Radinsky
【Abstract】: Designing a new drug is a lengthy and expensive process. As the space of potential molecules is very large (10^23 - 10^60), a common technique during drug discovery is to start from a molecule which already has some of the desired properties. An interdisciplinary team of scientists generates hypotheses about the required changes to the prototype. In this work, we develop an algorithmic, unsupervised approach that automatically generates potential drug molecules given a prototype drug. We show that the molecules generated by the system are valid molecules and significantly different from the prototype drug. Out of the compounds generated by the system, we identified 35 FDA-approved drugs. As an example, our system generated Isoniazid - one of the main drugs for Tuberculosis. The system is currently being deployed for use in collaboration with pharmaceutical companies to further analyze the additional generated molecules.
【Keywords】: deep learning for medicine; drug design; prototype-based drug discovery
【Paper Link】 【Pages】:340-349
【Authors】: Tianfu He ; Jie Bao ; Ruiyuan Li ; Sijie Ruan ; Yanhua Li ; Chao Tian ; Yu Zheng
【Abstract】: Illegal vehicle parking is a common urban problem faced by major cities in the world, as it incurs traffic jams, which lead to air pollution and traffic accidents. Traditional approaches to detect illegal vehicle parking events rely highly on active human efforts, e.g., police patrols or surveillance cameras. However, these approaches are extremely ineffective at covering a large city. The massive and high quality sharing bike trajectories from Mobike offer a unique opportunity to design a ubiquitous illegal parking detection system, as most of the illegal parking events happen at curbsides and have significant impact on the bike users. Two main components are employed to mine the trajectories in our system: 1) trajectory pre-processing, which filters outlier GPS points, performs map-matching and builds indexes for bike trajectories; and 2) illegal parking detection, which models the normal trajectories, extracts features from the evaluation trajectories and utilizes a distribution test-based method to discover the illegal parking events. The system is deployed on the cloud internally used by Mobike. Finally, extensive experiments and many insightful case studies based on the massive trajectories in Beijing are presented.
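To make the distribution-test idea concrete, the following is a minimal, hedged sketch: a single trajectory feature (a hypothetical lateral offset of bike GPS traces from the curb, in meters) is compared between recent trajectories and a "normal" reference distribution with a two-sample Kolmogorov-Smirnov test. The feature, the synthetic data, and the significance threshold are illustrative assumptions, not the paper's exact statistic.

```python
# Minimal sketch: flag a road segment when the distribution of a trajectory
# feature (here, a hypothetical lateral offset from the curb in meters) on
# recent trajectories differs significantly from the "normal" reference
# distribution, via a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
normal_offsets = rng.normal(loc=1.5, scale=0.3, size=2000)   # historical, unobstructed curb
recent_offsets = rng.normal(loc=2.4, scale=0.5, size=150)    # bikes swerving around parked cars

stat, p_value = ks_2samp(normal_offsets, recent_offsets)
if p_value < 0.01:
    print(f"Possible illegal parking event (KS={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant deviation from normal riding behaviour")
```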
【Keywords】: trajectory data mining; urban computing; urban planning
【Paper Link】 【Pages】:350-358
【Authors】: Anthony Hu ; Seth R. Flaxman
【Abstract】: We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different than the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.
【Keywords】: deep learning; natural language processing; sentiment analysis; transfer learning; visual analysis
【Paper Link】 【Pages】:359-367
【Authors】: Houdong Hu ; Yan Wang ; Linjun Yang ; Pavel Komlev ; Li Huang ; Xi (Stephen) Chen ; Jiapei Huang ; Ye Wu ; Meenaz Merchant ; Arun Sacheti
【Abstract】: In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing. The system accommodates tens of billions of images in the index, with thousands of features for each image, and can respond in less than 200 ms. In order to overcome the challenges in relevance, latency, and scalability at such a scale of data, we employ a cascaded learning-to-rank framework based on a variety of recent deep learning visual features, and deploy it on a distributed heterogeneous computing platform. Quantitative and qualitative experiments show that our system is able to support various applications on the Bing website and apps.
【Keywords】: content-based image retrieval; deep learning; image understanding; object detection
【Paper Link】 【Pages】:368-377
【Authors】: Yujing Hu ; Qing Da ; Anxiang Zeng ; Yang Yu ; Yinghui Xu
【Abstract】: In E-commerce platforms such as Amazon and TaoBao, ranking items in a search session is a typical multi-step decision-making problem. Learning to rank (LTR) methods have been widely applied to ranking problems. However, such methods often consider different ranking steps in a session to be independent, even though these steps may be highly correlated with each other. To better utilize the correlation between different ranking steps, in this paper, we propose to use reinforcement learning (RL) to learn an optimal ranking policy which maximizes the expected accumulative rewards in a search session. Firstly, we formally define the concept of search session Markov decision process (SSMDP) to formulate the multi-step ranking problem. Secondly, we analyze the property of SSMDP and theoretically prove the necessity of maximizing accumulative rewards. Lastly, we propose a novel policy gradient algorithm for learning an optimal ranking policy, which is able to deal with the problem of high reward variance and unbalanced reward distribution of an SSMDP. Experiments are conducted both in simulation and in the TaoBao search engine. The results demonstrate that our algorithm performs much better than the state-of-the-art LTR methods, with more than 40% and 30% growth of total transaction amount in the simulation and the real application, respectively.
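For orientation only, here is a minimal REINFORCE-style policy gradient sketch with a softmax policy over a handful of candidate ranking actions and simulated rewards. It illustrates the basic policy-gradient update the abstract refers to; the paper's SSMDP formulation and its specific variance-reduction and reward-balancing techniques are not reproduced, and the reward values are invented.

```python
# Minimal sketch: REINFORCE-style policy gradient with a softmax policy over
# K candidate ranking actions; rewards are simulated. The paper's algorithm
# additionally handles reward variance and unbalanced rewards in an SSMDP.
import numpy as np

rng = np.random.default_rng(0)
K = 4
true_reward = np.array([0.1, 0.3, 0.2, 0.6])   # hypothetical expected reward per action
theta = np.zeros(K)                             # policy parameters (action preferences)
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

baseline = 0.0
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(K, p=probs)
    r = rng.normal(true_reward[a], 0.1)          # stochastic reward
    baseline += 0.01 * (r - baseline)            # running baseline to reduce variance
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                        # d log pi(a) / d theta for a softmax policy
    theta += lr * (r - baseline) * grad_log_pi

print("learned action probabilities:", softmax(theta).round(3))
```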
【Keywords】: online learning to rank; policy gradient; reinforcement learning
【Paper Link】 【Pages】:378-386
【Authors】: Pierre Hulot ; Daniel Aloise ; Sanjay Dominik Jena
【Abstract】: Bike sharing systems continue gaining worldwide popularity as they offer benefits on various levels, from society to environment. Given that those systems tend to become unbalanced over time, bikes are typically redistributed throughout the day to better meet the demand. Reasonably accurate demand prediction is key to effective redistribution; however, it has received little attention in the literature. In this paper, we focus on predicting the hourly demand for rentals and returns at each station of the system. The proposed model uses temporal and weather features to predict demand mean and variance. It first extracts the main traffic behaviors from the stations. These simplified behaviors are then predicted and used to perform station-level predictions based on machine learning and statistical inference techniques. We then focus on determining decision intervals, which are often used by bike sharing companies for their online rebalancing operations. Our models are validated on a two-year period of real data from BIXI Montréal. A worst-case analysis suggests that the intervals generated by our models may decrease unsatisfied demands by 30% when compared to the current methodology employed in practice.
【Keywords】: bike sharing systems; decision support; dimensionality reduction; station-level demand prediction; traffic modeling
【Paper Link】 【Pages】:387-395
【Authors】: Kyle Hundman ; Valentino Constantinou ; Christopher Laporte ; Ian Colwell ; Tom Söderström
【Abstract】: As spacecraft send back increasing amounts of telemetry data, improved anomaly detection systems are needed to lessen the monitoring burden placed on operations engineers and reduce operational risk. Current spacecraft monitoring systems only target a subset of anomaly types and often require costly expert knowledge to develop and maintain due to challenges involving scale and complexity. We demonstrate the effectiveness of Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), in overcoming these issues using expert-labeled telemetry anomaly data from the Soil Moisture Active Passive (SMAP) satellite and the Mars Science Laboratory (MSL) rover, Curiosity. We also propose a complementary unsupervised and nonparametric anomaly thresholding approach developed during a pilot implementation of an anomaly detection system for SMAP, and offer false positive mitigation strategies along with other key improvements and lessons learned during development.
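The following is a simplified, hedged sketch of the unsupervised thresholding idea: given a stream of LSTM prediction errors, candidate thresholds of the form mean + z*std are scored by how much removing the flagged points "cleans up" the error stream relative to how many points get flagged. The scoring rule below is a simplified variant chosen for illustration; it is not the paper's exact criterion, and the error stream is synthetic.

```python
# Minimal sketch of unsupervised thresholding over LSTM prediction errors:
# pick the threshold (mean + z*std) that maximally reduces the mean and std of
# the remaining errors per flagged point. Simplified variant for illustration.
import numpy as np

rng = np.random.default_rng(1)
errors = np.abs(rng.normal(0, 1, 1000))
errors[700:705] += 8.0                      # injected anomalous prediction errors

mu, sigma = errors.mean(), errors.std()
best_score, best_eps = -np.inf, None
for z in np.arange(2.0, 6.0, 0.5):          # candidate thresholds
    eps = mu + z * sigma
    below = errors[errors < eps]
    n_flagged = int((errors >= eps).sum())
    if n_flagged == 0:
        continue
    delta_mu = (mu - below.mean()) / mu
    delta_sigma = (sigma - below.std()) / sigma
    score = (delta_mu + delta_sigma) / n_flagged   # reward cleanup, penalize flag count
    if score > best_score:
        best_score, best_eps = score, eps

print(f"selected threshold: {best_eps:.2f}, anomalies flagged: {(errors >= best_eps).sum()}")
```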
【Keywords】: aerospace; anomaly detection; forecasting; lstms; neural networks; rnns; time-series
【Paper Link】 【Pages】:396-405
【Authors】: Srinivasan Iyengar ; Stephen Lee ; David E. Irwin ; Prashant J. Shenoy ; Benjamin Weil
【Abstract】: Buildings consume over 40% of the total energy in modern societies and improving their energy efficiency can significantly reduce our energy footprint. In this paper, we present WattHome, a data-driven approach to identify the least energy efficient buildings from a large population of buildings in a city or a region. Unlike previous approaches such as least squares that use point estimates, WattHome uses Bayesian inference to capture the stochasticity in the daily energy usage by estimating the parameter distribution of a building. Further, it compares them with similar homes in a given population using widely available datasets. WattHome also incorporates a fault detection algorithm to identify the underlying causes of energy inefficiency. We validate our approach using ground truth data from different geographical locations, which showcases its applicability in different settings. Moreover, we present results from a case study of a city containing >10,000 buildings and show that more than half of the buildings are inefficient in one way or another, indicating significant potential for energy improvement measures. Additionally, we provide probable causes of inefficiency and find that 41%, 23.73%, and 0.51% of homes have poor building envelopes, heating faults, and cooling system faults, respectively.
【Keywords】: automated fault detection; bayesian inference; energy efficiency
【Paper Link】 【Pages】:406-415
【Authors】: Vijay Manikandan Janakiraman
【Abstract】: Although aviation accidents are rare, safety incidents occur more frequently and require a careful analysis to detect and mitigate risks in a timely manner. Analyzing safety incidents using operational data and producing event-based explanations is invaluable to airline companies as well as to governing organizations such as the Federal Aviation Administration (FAA) in the United States. However, this task is challenging because of the complexity involved in mining multi-dimensional heterogeneous time series data, the lack of time-step-wise annotation of events in a flight, and the lack of scalable tools to perform analysis over a large number of events. In this work, we propose a precursor mining algorithm that identifies events in the multidimensional time series that are correlated with the safety incident. Precursors are valuable to systems health and safety monitoring and in explaining and forecasting safety incidents. Current methods suffer from poor scalability to high dimensional time series data and are inefficient in capturing temporal behavior. We propose an approach by combining multiple-instance learning (MIL) and deep recurrent neural networks (DRNN) to take advantage of MIL's ability to learn using weakly supervised data and DRNN's ability to model temporal behavior. We describe the algorithm, the data, the intuition behind taking a MIL approach, and a comparative analysis of the proposed algorithm with baseline models. We also discuss the application to a real-world aviation safety problem using data from a commercial airline company and discuss the model's abilities and shortcomings, with some final remarks about possible deployment directions.
【Keywords】: aviation safety; deep learning; diagnostics; explainability; multiple instance learning; precursor; telemetry; time series
【Paper Link】 【Pages】:416-424
【Authors】: Grégoire Jauvion ; Nicolas Grislain
【Abstract】: In this paper, we consider the problem of optimizing the revenue a web publisher gets through real-time bidding (i.e. from ads sold in real-time auctions) and direct (i.e. from ads sold through contracts agreed in advance). We consider a setting where the publisher is able to bid in the real-time bidding auction for each impression. If it wins the auction, it chooses a direct campaign to deliver and displays the corresponding ad. This paper presents an algorithm to build an optimal strategy for the publisher to deliver its direct campaigns while maximizing its real-time bidding revenue. The optimal strategy gives a formula to determine the publisher bid as well as a way to choose the direct campaign being delivered if the publisher bidder wins the auction, depending on the impression characteristics. The optimal strategy can be estimated on past auctions data. The algorithm scales with the number of campaigns and the size of the dataset. This is a very important feature, as in practice a publisher may have thousands of active direct campaigns at the same time and would like to estimate an optimal strategy on billions of auctions. The algorithm is a key component of a system which is being developed, and which will be deployed on thousands of web publishers worldwide, helping them to serve efficiently billions of ads a day to hundreds of millions of visitors.
【Keywords】: ad-tech; auctions; big-data; high-dimensional optimization; real-time
【Paper Link】 【Pages】:425-432
【Authors】: Grégoire Jauvion ; Nicolas Grislain ; Pascal Dkengne Sielenou ; Aurélien Garivier ; Sébastien Gerchinovitz
【Abstract】: Over the last decade, digital media (web or app publishers) generalized the use of real time ad auctions to sell their ad spaces. Multiple auction platforms, also called Supply-Side Platforms (SSP), were created. Because of this multiplicity, publishers started to create competition between SSPs. In this setting, there are two successive auctions: a second price auction in each SSP and a secondary, first price auction, called header bidding auction, between SSPs. In this paper, we consider an SSP competing with other SSPs for ad spaces. The SSP acts as an intermediary between an advertiser wanting to buy ad spaces and a web publisher wanting to sell its ad spaces, and needs to define a bidding strategy to be able to deliver to the advertisers as many ads as possible while spending as little as possible. The revenue optimization of this SSP can be written as a contextual bandit problem, where the context consists of the information available about the ad opportunity, such as properties of the internet user or of the ad placement. Using classical multi-armed bandit strategies (such as the original versions of UCB and EXP3) is inefficient in this setting and yields a low convergence speed, as the arms are very correlated. In this paper we design and evaluate a version of the Thompson Sampling algorithm that easily takes this correlation into account. We combine this Bayesian algorithm with a particle filter, which makes it possible to handle non-stationarity by sequentially estimating the distribution of the highest bid to beat in order to win an auction. We apply this methodology on two real auction datasets, and show that it significantly outperforms more classical approaches. The strategy defined in this paper is being developed to be deployed on thousands of publishers worldwide.
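To illustrate the basic Thompson Sampling mechanics only, here is a minimal Beta-Bernoulli sketch over a discrete set of bid levels with unknown win probabilities in a stationary toy environment. The bid levels, win probabilities, and ad value are invented, and the paper's particle-filter handling of non-stationarity and of correlated arms is deliberately omitted.

```python
# Minimal sketch: Beta-Bernoulli Thompson Sampling over a discrete set of bid
# levels (win probability unknown). The paper couples Thompson Sampling with a
# particle filter over the highest competing bid to handle non-stationarity;
# here a stationary toy environment is used instead.
import numpy as np

rng = np.random.default_rng(0)
bids = np.array([0.5, 1.0, 1.5, 2.0])        # candidate bid levels (currency units)
true_win_prob = np.array([0.1, 0.35, 0.6, 0.8])
value_if_won = 2.2                           # hypothetical value of delivering the ad

alpha = np.ones(len(bids))                   # Beta posterior parameters per arm
beta = np.ones(len(bids))

for t in range(5000):
    sampled_p = rng.beta(alpha, beta)        # one posterior draw per arm
    expected_profit = sampled_p * (value_if_won - bids)
    a = int(np.argmax(expected_profit))
    won = rng.random() < true_win_prob[a]
    alpha[a] += won
    beta[a] += 1 - won

posterior_mean = alpha / (alpha + beta)
print("estimated win probabilities:", posterior_mean.round(2))
```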
【Keywords】: Bayesian inference; Thompson sampling; ad-tech; auctions; big-data; multi-armed bandit; online learning; particle filter; real-time; sequential Monte Carlo
【Paper Link】 【Pages】:433-442
【Authors】: Prerna Khurana ; Puneet Agarwal ; Gautam Shroff ; Lovekesh Vig
【Abstract】: Recent proliferation of conversational systems has resulted in an increased demand for more natural dialogue systems, capable of more sophisticated interactions than merely providing factual answers. This is evident from the usage patterns of a conversational system deployed within our organization. Users not only expect it to perform co-reference resolution of anaphora, but also of the antecedent or posterior facts presented by users with respect to their query. Presence of such facts in a conversation sometimes modifies the answer to the main query, e.g., the answer to 'how many sick leave do I get?' would be different when a fact 'I am on contract' is also present. Sometimes there is a need to collectively resolve three or four such facts. In this paper, we propose a novel solution which uses a hierarchical neural network, comprising a BiLSTM layer and a maxpool layer that is hierarchically stacked to first obtain a representation of each user utterance and then to obtain a representation for the sequence of utterances. This representation is used to identify users' intention. We also improve this model by using skip connections in the second network to allow better gradient flow. Our model not only a) resolves the antecedent and posterior facts, but also b) performs better even on self-contained queries. It is also c) faster to train, making it the most promising approach for use in our environment where frequent training and tuning is needed. It d) slightly outperforms the benchmark on a publicly available dataset, and e) performs better than obvious baseline approaches on our datasets.
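A minimal PyTorch sketch of the hierarchical encoding idea follows: a BiLSTM with max-pooling encodes the tokens of each utterance, a second BiLSTM with max-pooling encodes the resulting sequence of utterance vectors, and a linear layer produces intent logits. All dimensions, the vocabulary size, and the omission of the paper's skip connections are illustrative assumptions.

```python
# Minimal PyTorch sketch of a hierarchical BiLSTM + max-pool intent classifier:
# token-level encoder per utterance, then an utterance-level encoder per session.
import torch
import torch.nn as nn

class HierarchicalIntentNet(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid=64, n_intents=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.utt_lstm = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
        self.ses_lstm = nn.LSTM(2 * hid, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid, n_intents)

    def forward(self, tokens):
        # tokens: (batch, n_utterances, n_tokens) of word indices
        b, u, t = tokens.shape
        x = self.emb(tokens.view(b * u, t))                 # (b*u, t, emb)
        x, _ = self.utt_lstm(x)                             # (b*u, t, 2*hid)
        utt_vec = x.max(dim=1).values.view(b, u, -1)        # max-pool over tokens
        s, _ = self.ses_lstm(utt_vec)                       # (b, u, 2*hid)
        ses_vec = s.max(dim=1).values                       # max-pool over utterances
        return self.out(ses_vec)                            # intent logits

model = HierarchicalIntentNet(vocab_size=5000)
fake_batch = torch.randint(0, 5000, (2, 3, 12))             # 2 sessions, 3 utterances, 12 tokens
print(model(fake_batch).shape)                              # torch.Size([2, 10])
```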
【Keywords】: abstract anaphora resolution; conversational assistants; hierarchical RNNs
【Paper Link】 【Pages】:443-452
【Authors】: Patrick Koch ; Oleg Golovidov ; Steven Gardner ; Brett Wujek ; Joshua Griffin ; Yan Xu
【Abstract】: Machine learning applications often require hyperparameter tuning. The hyperparameters usually drive both the efficiency of the model training process and the resulting model quality. For hyperparameter tuning, machine learning algorithms are complex black-boxes. This creates a class of challenging optimization problems, whose objective functions tend to be nonsmooth, discontinuous, unpredictably varying in computational expense, and include continuous, categorical, and/or integer variables. Further, function evaluations can fail for a variety of reasons including numerical difficulties or hardware failures. Additionally, not all hyperparameter value combinations are compatible, which creates so-called hidden constraints. Robust and efficient optimization algorithms are needed for hyperparameter tuning. In this paper we present an automated parallel derivative-free optimization framework called Autotune, which combines a number of specialized sampling and search methods that are very effective in tuning machine learning models despite these challenges. Autotune provides significantly improved models over using default hyperparameter settings with minimal user interaction on real-world applications. Given the inherent expense of training numerous candidate models, we demonstrate the effectiveness of Autotune's search methods and the efficient distributed and parallel paradigms for training and tuning models, and also discuss the resource trade-offs associated with the ability to both distribute the training process and parallelize the tuning process.
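For a sense of the kind of problem being solved, here is a minimal derivative-free search loop in the spirit of the description above (not Autotune itself): mixed continuous/integer/categorical configurations are sampled at random, failed evaluations are tolerated, and the best cross-validated model is kept. The model class, search ranges, and dataset are illustrative.

```python
# Minimal sketch of derivative-free hyperparameter search: random sampling over
# a mixed search space, tolerant of failed evaluations ("hidden constraints").
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

best_score, best_cfg = -np.inf, None
for _ in range(20):
    cfg = {
        "learning_rate": 10 ** rng.uniform(-3, 0),          # continuous, log scale
        "n_estimators": int(rng.integers(50, 300)),         # integer
        "max_depth": int(rng.integers(2, 6)),                # integer
        "max_features": str(rng.choice(["sqrt", "log2"])),   # categorical
    }
    try:
        score = cross_val_score(GradientBoostingClassifier(**cfg), X, y, cv=3).mean()
    except Exception:                                        # failed evaluation: skip and continue
        continue
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_score, best_cfg)
```

Autotune, as the abstract notes, layers several specialized samplers and search methods on top of this basic loop and runs the candidate evaluations in a distributed, parallel fashion.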
【Keywords】: bayesian optimization; derivative-free optimization; distributed computing system; hyperparameters; stochastic optimization
【Paper Link】 【Pages】:453-461
【Authors】: Marios Kokkodis
【Abstract】: Online labor markets facilitate transactions between employers and a diverse set of independent contractors around the globe. When making hiring decisions in these markets, employers have to assess a large and heterogeneous population of contractors. Because many of the contractors' characteristics are latent, employers often make risky decisions that end up in negative outcomes. In this work, we address this issue by proposing a framework for recommending contractors who are likely to get hired and successfully complete the task at hand. We start our analysis by acknowledging that employers' hiring behavior dynamically evolves with time; Employers learn to choose contractors according to the outcomes of their previously completed tasks. To capture this dynamic evolution, we propose a structured Hidden Markov Model that explicitly models task outcomes through the employers' evolution. We build and evaluate the proposed framework on a dataset of real online hiring decisions. We then compare our approach with a set of previously proposed static algorithms and we show that our proposed framework provides up to 24% improved recommendations. We conclude by discussing the positive impact that such better recommendations of candidates can have on employers, contractors, and the market itself.
【Keywords】: dynamic hiring decisions; hmm; online labor markets
【Paper Link】 【Pages】:462-471
【Authors】: Maksim Koptelov ; Albrecht Zimmermann ; Pascal Bonnet ; Ronan Bureau ; Bruno Crémilleux
【Abstract】: Pan-Assay Interference Compounds (PAINS) are a significant problem in modern drug discovery: compounds showing non-target specific activity in high-throughput screening can mislead medicinal chemists during hit identification, wasting time and resources. Recent work has shown that existing structural alerts are not up to the task of identifying PAINS. To address this shortcoming, we are in the process of developing a tool, PrePeP, that predicts PAINS, and allows experts to visually explore the reasons for the prediction. In the paper, we discuss the different aspects that are involved in developing a functional tool: systematically deriving structural descriptors, addressing the extreme imbalance of the data, offering visual information that pharmacological chemists are familiar with. We evaluate the quality of the approach using benchmark data sets from the literature and show that we correct several shortcomings of existing PAINS alerts that have recently been pointed out.
【Keywords】: chemoinformatics; discriminative graph mining; structure activity relationships
【Paper Link】 【Pages】:472-480
【Authors】: Avishek Kumar ; Syed Ali Asad Rizvi ; Benjamin Brooks ; R. Ali Vanderveld ; Kevin H. Wilson ; Chad Kenney ; Sam Edelstein ; Adria Finch ; Andrew Maxwell ; Joe Zuckerbraun ; Rayid Ghani
【Abstract】: Water infrastructure in the United States is beginning to show its age, particularly through water main breaks. Main breaks cause major disruptions in everyday life for residents and businesses. Water main failures in Syracuse, N.Y. (as in most cities) are handled reactively rather than proactively. A barrier to proactive maintenance with limited resources is the city's inability to properly prioritize the allocation of its resources. We built a Machine Learning system to assess the risk of a water mains breaking. Using historical data on which mains have failed, descriptors of pipes, and other data sources, we evaluated several models' abilities to predict breaks three years into the future. Our results show that our system using gradient boosted decision trees performed best out of several algorithms and expert heuristics, achieving precision at 1% (P@1) of 0.62. Our model outperforms a random baseline (P@1 of 0.08) and expert heuristics such as water main age (P@1 of 0.10) and history of past main breaks (P@1 of 0.48). The model is currently deployed in the City of Syracuse. We are conducting a pilot by calculating the risk of failure for each city block over the period 2016-2018 using data up to the end of 2015 and, as of the end of 2017, there have been 42 breaks on our riskiest 52 mains. This has been a successful initiative for the city of Syracuse in improving its infrastructure and we believe this approach can be applied to other cities.
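Since the headline numbers above are all precision-at-top-1% values, a minimal sketch of that metric may help: rank pipe segments by predicted break risk and measure the fraction of true breaks among the top 1%. Scores and labels below are simulated; in the deployed system they would come from the gradient boosted model and the observed break records.

```python
# Minimal sketch of precision at the top 1% (P@1) of risk-ranked pipe segments.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y_true = rng.random(n) < 0.02                            # ~2% of mains actually break
scores = y_true * rng.random(n) + rng.random(n) * 0.5    # noisy risk scores

def precision_at_fraction(y, s, frac=0.01):
    k = max(1, int(len(s) * frac))
    top = np.argsort(-s)[:k]                             # indices of the riskiest k segments
    return y[top].mean()

print(f"P@1%: {precision_at_fraction(y_true, scores):.2f}")
```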
【Keywords】: city planning; civic tech; social good
【Paper Link】 【Pages】:481-490
【Authors】: Joonseok Lee ; Sami Abu-El-Haija ; Balakrishnan Varadarajan ; Apostol Natsev
【Abstract】: The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. Researchers have tackled various domains including video classification, search, personalized recommendation, and more. However, there is a research gap in combining these domains in one unified learning framework. Towards that end, we propose a deep network that embeds videos using their audio-visual content, onto a metric space which preserves video-to-video relationships. Then, we use the trained embedding network to tackle various domains including video classification and recommendation, showing significant improvements over state-of-the-art baselines. The proposed approach is highly scalable to deploy on large-scale video sharing platforms like YouTube.
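A minimal, hedged sketch of metric learning with a triplet objective follows: an anchor and a positive stand for embeddings of related videos (e.g., co-watched), a negative for an unrelated video, and the network is trained so related pairs end up closer in the metric space. Random tensors stand in for the audio-visual content features the paper uses, and the architecture is a placeholder, not the paper's network.

```python
# Minimal PyTorch sketch: learn an embedding with a triplet margin loss so that
# related videos (anchor, positive) are closer than unrelated ones (negative).
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

for step in range(100):
    anchor = torch.randn(16, 128)                      # content features of anchor videos
    positive = anchor + 0.1 * torch.randn(16, 128)     # related videos (similar content)
    negative = torch.randn(16, 128)                    # unrelated videos
    loss = loss_fn(embed(anchor), embed(positive), embed(negative))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))
```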
【Keywords】: classification; collaborative filtering; metric learning; recommendation; triplet learning; video understanding
【Paper Link】 【Pages】:491-499
【Authors】: Minyong R. Lee ; Milan Shen
【Abstract】: Online controlled experiments, or A/B testing, has been a standard framework adopted by most online product companies to measure the effect of any new change. Companies use various statistical methods including hypothesis testing and statistical inference to quantify the business impact of the changes and make product decisions. Nowadays, experimentation platforms can run hundreds of experiments or more concurrently. When a group of experiments is conducted, usually the ones with significant successful results are chosen to be launched into the product. We are interested in learning the aggregated impact of the launched features. In this paper, we investigate a statistical selection bias in this process and propose a correction method for obtaining an unbiased estimator. Moreover, we give an implementation example on Airbnb's ERF platform (Experiment Reporting Framework) and discuss the best practices to account for this bias.
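The selection bias in question can be demonstrated with a small simulation: many experiments with the same modest true effect are run, only the statistically significant winners "launch", and the naive average of their measured lifts overstates the true aggregate impact. The effect sizes and sample sizes below are invented purely for illustration; the correction method itself is not shown.

```python
# Minimal simulation of winner's-curse selection bias in aggregated A/B results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_users, true_lift = 200, 2000, 0.2

launched_estimates, launched_true = [], []
for _ in range(n_experiments):
    control = rng.normal(10.0, 5.0, n_users)
    treatment = rng.normal(10.0 + true_lift, 5.0, n_users)
    estimate = treatment.mean() - control.mean()
    _, p = stats.ttest_ind(treatment, control)
    if p < 0.05 and estimate > 0:                 # only significant winners launch
        launched_estimates.append(estimate)
        launched_true.append(true_lift)

print(f"naive aggregated lift: {np.mean(launched_estimates):.3f}")
print(f"true aggregated lift:  {np.mean(launched_true):.3f}")   # the naive estimate is inflated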
【Keywords】: A/B testing; bias correction; multiple hypothesis testing; online experiments
【Paper Link】 【Pages】:500-508
【Authors】: Mu-Chu Lee ; Bin Gao ; Ruofei Zhang
【Abstract】: Generative Adversarial Networks (GAN) have achieved great success in generating realistic synthetic data like images, tags, and sentences. We explore using GAN to generate bid keywords directly from query in sponsored search ads selection, especially for rare queries. Specifically, in the query expansion (query-keyword matching) scenario in search advertising, we train a sequence to sequence model as the generator to generate keywords, conditioned on the user query, and use a recurrent neural network model as the discriminator to play an adversarial game with the generator. By applying the trained generator, we can generate keywords directly from a given query, so that we can highly improve the effectiveness and efficiency of query-keyword matching based ads selection in search advertising. We trained the proposed model in the clicked query-keyword pair dataset from a commercial search advertising system. Evaluation results show that the generated keywords are more relevant to the given query compared with the baseline model and they have big potential to bring extra revenue improvement.
【Keywords】: generative adversarial networks; online advertising; query keyword matching
【Paper Link】 【Pages】:509-518
【Authors】: Jia Li ; Yu Rong ; Helen Meng ; Zhihui Lu ; Timothy Kwok ; Hong Cheng
【Abstract】: With the increase of elderly population, Alzheimer's Disease (AD), as the most common cause of dementia among the elderly, is affecting more and more senior people. It is crucial for a patient to receive accurate and timely diagnosis of AD. Current diagnosis relies on doctors' experience and clinical test, which, unfortunately, may not be performed until noticeable AD symptoms are developed. In this work, we present our novel solution named time-aware TICC and CNN (TATC), for predicting AD from actigraphy data. TATC is a multivariate time series classification method using a neural attention-based deep learning approach. It not only performs accurate prediction of AD risk, but also generates meaningful interpretation of daily behavior pattern of subjects. TATC provides an automatic, low-cost solution for continuously monitoring the change of physical activity of subjects in daily living environment. We believe the future deployment of TATC can benefit both doctors and patients in early detection of potential AD risk.
【Keywords】: actigraphy data; alzheimer's disease; attention; circadian rhythm
【Paper Link】 【Pages】:519-527
【Authors】: Jianbo Li ; Jingrui He ; Yada Zhu
【Abstract】: Recent decades have witnessed the rapid growth of E-commerce. In particular, E-tail has provided customers with great convenience by allowing them to purchase retail products anywhere without visiting the actual stores. A recent trend in E-tail is to allow free shipping and hassle-free returns to further attract online customers. However, a downside of such a customer-friendly policy is the rapidly increasing return rate as well as the associated costs of handling returned online orders. Therefore, it has become imperative to take proactive measures for reducing the return rate and the associated cost. Despite the large amount of data available from historical purchase and return records, up until now, the problem of E-tail product return prediction has not attracted much attention from the data mining community. To address this problem, in this paper, we propose a generic framework for E-tail product return prediction named HyperGo . It aims to predict the customer's intention to return after s/he has put together the shopping basket. For the baskets with a high return intention, the E-tailers can then take appropriate measures to incentivize the customer not to issue a return and/or prepare for reverse logistics. The proposed HyperGo is based on a novel hypergraph representation of historical purchase and return records, effectively leveraging the rich information of basket composition. For a given basket, we propose a local graph cut algorithm using truncated random walk on the hypergraph to identify similar historical baskets. Based on these baskets, HyperGo is able to estimate the return intention on two levels: basket-level vs. product-level, which provides the E-tailers with detailed information regarding the reason for a potential return (e.g., duplicate products with different colors). One major benefit of the proposed local algorithm lies in its time complexity, which is linearly dependent on the size of the output cluster and polylogarithmically dependent on the volume of the hypergraph. This makes HyperGo particularly suitable for processing large-scale data sets. The experimental results on multiple real-world E-tail data sets demonstrate the effectiveness and efficiency of HyperGo .
【Keywords】: clustering; e-commerce; graph partitioning; hypergraph; product return
【Paper Link】 【Pages】:528-536
【Authors】: Xijun Li ; Mingxuan Yuan ; Di Chen ; Jianguo Yao ; Jia Zeng
【Abstract】: Split Delivery Vehicle Routing Problem with 3D Loading Constraints (3L-SDVRP) can be seen as the most important problem in large-scale manufacturing logistics. The goal is to devise a strategy consisting of three NP-hard planning components: vehicle routing, cargo splitting and container loading, which shall be jointly optimized for cost savings. The problem is an enhanced variant of the classical logistics problem 3L-CVRP, and its complexity leaps beyond current studies of solvability. Our solution employs a novel data-driven three-layer search algorithm (DTSA), which we designed to improve both the efficiency and effectiveness of traditional meta-heuristic approaches, through learning from data and from simulation. A detailed experimental evaluation on real data shows our algorithm is versatile in solving this practical complex constrained multi-objective optimization problem, and our framework may be of general interest. DTSA performs much better than the state-of-the-art algorithms both in efficiency and optimization performance. Our algorithm has been deployed in the UAT (User Acceptance Test) environment; conservative estimates suggest that the full usage of our algorithm would save millions of dollars in logistics costs per year, besides savings due to automation and more efficient routing.
【Keywords】: container loading; logistics; transportation planning
【Paper Link】 【Pages】:537-546
【Authors】: Binbing Liao ; Jingqing Zhang ; Chao Wu ; Douglas McIlwraith ; Tong Chen ; Shengwen Yang ; Yike Guo ; Fei Wu
【Abstract】: Predicting traffic conditions from online route queries is a challenging task as there are many complicated interactions among the roads and crowds involved. In this paper, we intend to improve traffic prediction by appropriate integration of three kinds of implicit but essential factors encoded in auxiliary information. We do this within an encoder-decoder sequence learning framework that integrates the following data: 1) offline geographical and social attributes. For example, the geographical structure of roads or public social events such as national celebrations; 2) road intersection information. In general, traffic congestion occurs at major junctions; 3) online crowd queries. For example, when many online queries are issued for the same destination due to a public performance, the traffic around that destination will potentially become heavier after a while. Qualitative and quantitative experiments on a real-world dataset from Baidu have demonstrated the effectiveness of our framework.
【Keywords】: LSTM; encoder-decoder; sequence learning; traffic prediction
【Paper Link】 【Pages】:547-555
【Authors】: Qingwei Lin ; Weichen Ke ; Jian-Guang Lou ; Hongyu Zhang ; Kaixin Sui ; Yong Xu ; Ziyi Zhou ; Bo Qiao ; Dongmei Zhang
【Abstract】: The ability to identify insights from multi-dimensional big data is important for business intelligence. To enable interactive identification of insights, a large number of dimension combinations need to be searched and a series of aggregation queries need to be quickly answered. The existing approaches answer interactive queries on big data through data cubes or approximate query processing. However, these approaches can hardly satisfy the performance or accuracy requirements for ad-hoc queries demanded by interactive exploration. In this paper, we present BigIN4, a system for instant, interactive identification of insights from multi-dimensional big data. BigIN4 gives insight suggestions by enumerating subspaces and answers queries by combining data cube and approximate query processing techniques. If a query cannot be answered by the cubes, BigIN4 decomposes it into several low dimensional queries that can be directly answered by the cubes through an online constructed Bayesian Network and gives an approximate answer within a statistical interval. Unlike the related works, BigIN4 does not require any prior knowledge of queries and does not assume a certain data distribution. Our experiments on ten real-world large-scale datasets show that BigIN4 can successfully identify insights from big data. Furthermore, BigIN4 can provide approximate answers to aggregation queries effectively (with less than 10% error on average) and efficiently (50x faster than sampling-based methods).
【Keywords】: approximate query processing; data cube; insight identification; interactive data analytics
【Paper Link】 【Pages】:556-565
【Authors】: Qiaoling Liu ; Josh Chao ; Thomas Mahoney ; Alan Chern ; Chris Min ; Faizan Javed ; Valentin Jijkoun
【Abstract】: Employer name normalization, or linking employer names in job postings or resumes to entities in an employer knowledge base (KB), is important for many downstream applications in the online recruitment domain. Key challenges for employer name normalization include handling employer names from both job postings and resumes, leveraging the corresponding location and URL context, and handling name variations and duplicates in the KB. In this paper, we describe the CompanyDepot system developed at CareerBuilder, which uses machine learning techniques to address these challenges. We discuss the main challenges and share our lessons learned in deployment, maintenance, and utilization of the system over the past two years. We also share several examples of how the system has been used in applications at CareerBuilder to deliver value to end customers.
【Keywords】: deployment; employer name normalization; entity linking
【Paper Link】 【Pages】:566-575
【Authors】: Zhaoyang Liu ; Yanyan Shen ; Yanmin Zhu
【Abstract】: Dockless shared bikes, which aim at providing a more flexible and convenient solution to the first-and-last mile connection, came into China and have expanded to other countries at a very impressive speed. The expansion of the shared bike business in new cities brings many challenges, among which the most critical one is the parking chaos caused by too many bikes yet insufficient demand. To allow possible actions to be taken in advance, this paper studies the problem of detecting parking hotspots in a new city where no dockless shared bike has been deployed. We propose to measure road hotness by bike density with the help of Kernel Density Estimation. We extract useful features from multi-source urban data and introduce a novel domain adaptation network for transferring hotspot knowledge learned from one city with shared bikes to a new city. The extensive experimental results demonstrate the effectiveness of our proposed approach compared with various baselines.
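As a small illustration of the density-based hotness measure, the sketch below fits a 2-D kernel density estimate on bike drop-off coordinates and scores candidate road points by the estimated density. The coordinates are synthetic and the bandwidth is arbitrary; the paper's cross-city domain adaptation step is not shown.

```python
# Minimal sketch: road "hotness" as kernel density of bike drop-off locations.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# synthetic drop-off locations (projected x/y in meters), clustered near a metro exit
drop_offs = np.vstack([
    rng.normal([0, 0], 30, size=(500, 2)),
    rng.normal([400, 250], 60, size=(200, 2)),
])

kde = KernelDensity(kernel="gaussian", bandwidth=25.0).fit(drop_offs)

road_points = np.array([[0.0, 10.0], [400.0, 240.0], [900.0, 900.0]])
hotness = np.exp(kde.score_samples(road_points))     # density at each candidate road point
print(hotness)
```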
【Keywords】: dockless shared bikes; hotspots detection; transfer learning; urban computing
【Paper Link】 【Pages】:576-585
【Authors】: Tova Milo ; Amit Somech
【Abstract】: Modern Interactive Data Analysis (IDA) platforms, such as Kibana, Splunk, and Tableau, are gradually replacing traditional OLAP/SQL tools, as they allow for easy-to-use data exploration, visualization, and mining, even for users lacking SQL and programming skills. Nevertheless, data analysis is still a difficult task, especially for non-expert users. To that end, we present REACT, a recommender system designed for modern IDA platforms. In these platforms, analysis sessions interweave high-level actions of multiple types and operate over diverse datasets. REACT identifies and generalizes relevant (previous) sessions to generate personalized next-action suggestions to the user. We model the user's analysis context using a generic tree based model, where the edges represent the user's recent actions, and the nodes represent their result "screens". A dedicated context-similarity metric is employed for efficient indexing and retrieval of relevant candidate next-actions. These are then generalized to abstract actions that convey common fragments, and then adapted to the specific user context. To prove the utility of REACT, we performed an extensive online and offline experimental evaluation over real-world analysis logs from the cyber security domain, which we also publish to serve as a benchmark dataset for future work.
【Keywords】: analysis action recommendation; interactive data analysis
【Paper Link】 【Pages】:586-595
【Authors】: Piero Molino ; Huaixiu Zheng ; Yi-Chia Wang
【Abstract】: For a company looking to provide delightful user experiences, it is of paramount importance to take care of any customer issues. This paper proposes COTA, a system to improve speed and reliability of customer support for end users through automated ticket classification and answers selection for support representatives. Two machine learning and natural language processing techniques are demonstrated: one relying on feature engineering (COTA v1) and the other exploiting raw signals through deep learning architectures (COTA v2). COTA v1 employs a new approach that converts the multi-classification task into a ranking problem, demonstrating significantly better performance in the case of thousands of classes. For COTA v2, we propose an Encoder-Combiner-Decoder, a novel deep learning architecture that allows for heterogeneous input and output feature types and injection of prior knowledge through network architecture choices. This paper compares these models and their variants on the task of ticket classification and answer selection, showing model COTA v2 outperforms COTA v1, and analyzes their inner workings and shortcomings. Finally, an A/B test is conducted in a production setting validating the real-world impact of COTA in reducing issue resolution time by 10 percent without reducing customer satisfaction.
【Keywords】: customer satisfaction; customer support; deep learning; intent detection; machine learning; natural language processing
【Paper Link】 【Pages】:596-605
【Authors】: Yabo Ni ; Dan Ou ; Shichen Liu ; Xiang Li ; Wenwu Ou ; Anxiang Zeng ; Luo Si
【Abstract】: Tasks such as search and recommendation have become increasingly important for E-commerce to deal with the information overload problem. To meet the diverse needs of different users, personalization plays an important role. In many large portals such as Taobao and Amazon, there are a number of different types of search and recommendation tasks operating simultaneously for personalization. However, most current techniques address each task separately. This is suboptimal, as no information about users is shared across different tasks. In this work, we propose to learn universal user representations across multiple tasks for more effective personalization. In particular, user behavior sequences (e.g., click, bookmark or purchase of products) are modeled by an LSTM with an attention mechanism, integrating all the corresponding content, behavior and temporal information. User representations are shared and learned in an end-to-end setting across multiple tasks. Benefiting from better information utilization across multiple tasks, the user representations are more effective at reflecting users' interests and are more general and transferable to new tasks. We refer to this work as the Deep User Perception Network (DUPN) and conduct an extensive set of offline and online experiments. Across all five tested tasks, our DUPN consistently achieves better results by providing more effective user representations. Moreover, we deploy DUPN in large scale operational tasks in Taobao. Detailed implementations, e.g., incremental model updating, are also provided to address the practical issues for the real world applications.
【Keywords】: attention; e-commerce search; multi-task learning; recurrent neural network; representation learning
【Paper Link】 【Pages】:606-615
【Authors】: Tim Op De Beéck ; Wannes Meert ; Kurt Schütte ; Benedicte Vanwanseele ; Jesse Davis
【Abstract】: Running is extremely popular and around 10.6 million people run regularly in the United States alone. Unfortunately, estimates indicated that between 29% to 79% of runners sustain an overuse injury every year. One contributing factor to such injuries is excessive fatigue, which can result in alterations in how someone runs that increase the risk for an overuse injury. Thus being able to detect during a running session when excessive fatigue sets in, and hence when these alterations are prone to arise, could be of great practical importance. In this paper, we explore whether we can use machine learning to predict the rating of perceived exertion (RPE), a validated subjective measure of fatigue, from inertial sensor data of individuals running outdoors. We describe how both the subjective target label and the realistic outdoor running environment introduce several interesting data science challenges. We collected a longitudinal dataset of runners, and demonstrate that machine learning can be used to learn accurate models for predicting RPE.
【Keywords】: machine learning; sensor fusion; sports analytics
【Paper Link】 【Pages】:616-625
【Authors】: Barak Oshri ; Annie Hu ; Peter Adelson ; Xiao Chen ; Pascaline Dupas ; Jeremy Weinstein ; Marshall Burke ; David B. Lobell ; Stefano Ermon
【Abstract】: The UN Sustainable Development Goals allude to the importance of infrastructure quality in three of its seventeen goals. However, monitoring infrastructure quality in developing regions remains prohibitively expensive and impedes efforts to measure progress toward these goals. To this end, we investigate the use of widely available remote sensing data for the prediction of infrastructure quality in Africa. We train a convolutional neural network to predict ground truth labels from the Afrobarometer Round 6 survey using Landsat 8 and Sentinel 1 satellite imagery. Our best models predict infrastructure quality with AUROC scores of 0.881 on Electricity, 0.862 on Sewerage, 0.739 on Piped Water, and 0.786 on Roads using Landsat 8. These performances are significantly better than models that leverage OpenStreetMap or nighttime light intensity on the same tasks. We also demonstrate that our trained model can accurately make predictions in an unseen country after fine-tuning on a small sample of images. Furthermore, the model can be deployed in regions with limited samples to predict infrastructure outcomes with higher performance than nearest neighbor spatial interpolation.
【Keywords】: applied computing; computer vision; computing methodologies; economics; neural networks
【Paper Link】 【Pages】:626-635
【Authors】: Fabio Petroni ; Natraj Raman ; Timothy Nugent ; Armineh Nourbakhsh ; Zarko Panic ; Sameena Shah ; Jochen L. Leidner
【Abstract】: The automatic extraction of breaking news events from natural language text is a valuable capability for decision support systems. Traditional systems tend to focus on extracting events from a single media source and often ignore cross-media references. Here, we describe a large-scale automated system for extracting natural disasters and critical events from both newswire text and social media. We outline a comprehensive architecture that can identify, categorize and summarize seven different event types - namely floods, storms, fires, armed conflict, terrorism, infrastructure breakdown, and labour unavailability. The system comprises fourteen modules and is equipped with a novel coreference mechanism, capable of linking events extracted from the two complementary data sources. Additionally, the system is easily extensible to accommodate new event types. Our experimental evaluation demonstrates the effectiveness of the system.
【Keywords】: event coreference; event extraction; first story detection; information extraction; news analytics
【Paper Link】 【Pages】:636-645
【Authors】: Jinfeng Rao ; Ferhan Türe ; Jimmy Lin
【Abstract】: We tackle the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform, where consumers direct speech input at their "voice remotes". Such queries range from specific program navigation (i.e., watch a movie) to requests with vague intents and even queries that have nothing to do with watching TV. We present successively richer neural network architectures to tackle this challenge based on two key insights: The first is that session context can be exploited to disambiguate queries and recover from ASR errors, which we operationalize with hierarchical recurrent neural networks. The second insight is that query understanding requires evidence integration across multiple related tasks, which we identify as program prediction, intent classification, and query tagging. We present a novel multi-task neural architecture that jointly learns to accomplish all three tasks. Our initial model, already deployed in production, serves millions of queries daily with an improved customer experience. The novel multi-task learning model, first described here, is evaluated through carefully-controlled laboratory experiments, which demonstrates further gains in effectiveness and increased system capabilities.
【Keywords】: intelligent agent; intent classification; program prediction; query tagging; speech interface
【Paper Link】 【Pages】:646-654
【Authors】: Yuecheng Rong ; Zhimian Xu ; Ruibo Yan ; Xu Ma
【Abstract】: Realtime parking availability information is of great importance to help drivers find a parking space faster and thus reduce parking search traffic. However, realtime parking availability systems remain limited in a city due to the high cost of sensor devices and of maintaining realtime parking information. In this paper, we estimate the realtime parking availability throughout a city using historical parking availability data reported by a limited number of existing sensors of parking lots and a variety of datasets we observed in the city, such as meteorology, events, map mobility trace data and navigation data from Baidu map, and POIs. We propose a deep-learning-based approach, called Du-Parking, which consists of three major components modeling temporal closeness, period and current general influence, respectively. More specifically, we employ long short-term memory (LSTM) to model the temporal closeness and period, while using two fully-connected layers to model the current general factors. Our approach learns to dynamically aggregate the output of the three components to estimate the final parking availability of a given parking lot. Using the proposed approach, we have provided realtime parking availability information in the Baidu map app in nine cities in China. We evaluated our approach in Beijing and Shenzhen. The results show the advantages of our method over two categories of baselines, including linear interpolation and well-known classification models such as GBDT.
【Keywords】: dnn; lstm; parking availability; spatial-temporal big data
【Paper Link】 【Pages】:655-664
【Authors】: Rui Paulo Ruhrländer ; Martin Boissier ; Matthias Uflacker
【Abstract】: Recent progress in machine learning and related fields like recommender systems opens up new possibilities for data-driven approaches. One example is the prediction of a movie's box office revenue, which is highly relevant for optimizing production and marketing. We use individual recommendations and user-based forecast models in a system that forecasts revenue and additionally provides actionable insights for industry professionals. In contrast to most existing models that completely neglect user preferences, our approach allows us to model the most important source of movie success: moviegoer taste and behavior. We divide the problem into three distinct stages: (i) we use matrix factorization recommenders to model each user's taste, (ii) we then predict the individual consumption behavior, and (iii) eventually aggregate users to predict the box office result. We compare our approach to the current industry standard and show that the inclusion of user rating data reduces the error by a factor of 2x and outperforms recently published research.
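For stage (i) above, the following is a minimal, hedged sketch of matrix factorization by stochastic gradient descent: each user gets a latent taste vector and each movie a latent factor vector, trained to reconstruct observed ratings. The ratings are synthetic and the dimensions arbitrary; the paper's subsequent consumption and box-office stages are not shown.

```python
# Minimal sketch: SGD matrix factorization of a user-movie rating matrix,
# producing per-user taste vectors.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 50, 30, 8
ratings = [(rng.integers(n_users), rng.integers(n_movies), rng.integers(1, 6))
           for _ in range(1000)]                      # (user, movie, rating) triples

P = 0.1 * rng.standard_normal((n_users, k))           # user taste vectors
Q = 0.1 * rng.standard_normal((n_movies, k))          # movie factors
lr, reg = 0.01, 0.05

for epoch in range(30):
    for u, m, r in ratings:
        err = r - P[u] @ Q[m]
        P[u] += lr * (err * Q[m] - reg * P[u])
        Q[m] += lr * (err * P[u] - reg * Q[m])

rmse = np.sqrt(np.mean([(r - P[u] @ Q[m]) ** 2 for u, m, r in ratings]))
print(f"training RMSE: {rmse:.3f}")
```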
【Keywords】: box office predictions; gradient-boosted trees; logistic regression; motion picture industry; recommender systems; user ratings
【Paper Link】 【Pages】:665-674
【Authors】: Elaheh Sadredini ; Deyuan Guo ; Chunkun Bo ; Reza Rahimi ; Kevin Skadron ; Hongning Wang
【Abstract】: Part-of-speech (POS) tagging is the foundation of many natural language processing applications. Rule-based POS tagging is a well-known solution, which assigns tags to the words using a set of predefined rules. Many researchers favor statistical-based approaches over rule-based methods for better empirical accuracy. However, until now, the computational cost of rule-based POS tagging has made it difficult to study whether more complex rules or larger rulesets could lead to accuracy competitive with statistical approaches. In this paper, we leverage two hardware accelerators, the Automata Processor (AP) and Field Programmable Gate Arrays (FPGA), to accelerate rule-based POS tagging by converting rules to regular expressions and exploiting the highly-parallel regular-expression-matching ability of these accelerators. We study the relationship between rule set size and accuracy, and observe that adding more rules only poses minimal overhead on the AP and FPGA. This allows a substantial increase in the number and complexity of rules, leading to accuracy improvement. Our experiments on the Treebank and Brown corpora achieve up to 2,600x and 1,914x speedups on the AP and on the FPGA respectively over rule-based methods on the CPU in the rule-matching stage, up to 58x speedup over the Perceptron POS tagger on the CPU in total testing time, and up to 253x speedup over the LSTM tagger on the GPU in total testing time, while showing a competitive accuracy compared to neural-network and statistical solutions.
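To make the rule-to-regex conversion concrete, here is a tiny, hedged illustration of how one contextual (Brill-style) tagging rule can be expressed as a regular expression over a flattened word/TAG stream, which is the kind of pattern such accelerators can match in parallel. The rule and sentence are invented and far simpler than the rulesets evaluated in the paper.

```python
# Minimal sketch: a contextual POS rule ("retag NN as VB when it follows to/TO")
# expressed as a regular expression over a word/TAG token stream.
import re

tagged = "I/PRP want/VBP to/TO book/NN a/DT flight/NN"

rule = re.compile(r"(to/TO )(\w+)/NN")     # left context: previous token is to/TO
retagged = rule.sub(lambda m: m.group(1) + m.group(2) + "/VB", tagged)
print(retagged)   # I/PRP want/VBP to/TO book/VB a/DT flight/NN
```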
【Keywords】: POS tagging; automata; hardware acceleration
【Paper Link】 【Pages】:675-684
【Authors】: Tara Safavi ; Maryam Davoodi ; Danai Koutra
【Abstract】: From artificial intelligence to network security to hardware design, it is well-known that computing research drives many important technological and societal advancements. However, less is known about the long-term career paths of the people behind these innovations. What do their careers reveal about the evolution of computing research? Which institutions were and are the most important in this field, and for what reasons? Can insights into computing career trajectories help predict employer retention? In this paper we analyze several decades of post-PhD computing careers using a large new dataset rich with professional information, and propose a versatile career network model, R^3, that captures temporal career dynamics. With R^3 we track important organizations in computing research history, analyze career movement between industry, academia, and government, and build a powerful predictive model for individual career transitions. Our study, the first of its kind, is a starting point for understanding computing research careers, and may inform employer recruitment and retention mechanisms at a time when the demand for specialized computational expertise far exceeds supply.
【Keywords】: career mining; computing research; graph mining; hits; professional trajectories
【Paper Link】 【Pages】:685-694
【Authors】: Karan Samel ; Xu Miao
【Abstract】: The great success of supervised learning has initiated a paradigm shift from building a deterministic software system to a probabilistic artificial intelligent system throughout the industry. The historical records in enterprise domains can potentially bootstrap traditional businesses into the modern data-driven approach almost everywhere. The introduction of Deep Neural Networks (DNNs) significantly reduces the effort of feature engineering so that supervised learning becomes even more automated. The last bottleneck is to ensure the data quality, particularly the label quality, because the performance of supervised learning is bounded by the errors present in labels. In this paper, we present a new Active Deep Denoising (ADD) approach that first builds a DNN noise model, and then adopts an active learning algorithm to identify the optimal denoising function. We prove that under the low noise condition, we only need to query the oracle with log n examples, where n is the total number of examples in the data. We apply ADD to one enterprise application and show that it can effectively reduce the prediction error by 1/3 with only 0.1% of examples verified by the oracle.
【Keywords】: active learning; classification; deep neural networks; denoising
【Paper Link】 【Pages】:695-704
【Authors】: Issei Sato ; Yukihiro Nomura ; Shouhei Hanaoka ; Soichiro Miki ; Naoto Hayashi ; Osamu Abe ; Yoshitaka Masutani
【Abstract】: The reading workload for radiologists is increasing because the numbers of examinations and images per examination are growing, owing to technical progress in imaging modalities such as computed tomography and magnetic resonance imaging. A computer-assisted detection (CAD) system based on machine learning is expected to assist radiologists. The preliminary results of a multi-institutional study indicate that the performance of the CAD system for each institution improved when using training data from other institutions. This suggests that transfer learning may be useful for developing CAD systems across multiple institutions. In this paper, we focus on transfer learning without sharing training data, due to the need to protect personal information at each institution. Moreover, we raise the problem of negative transfer in CAD systems and propose an algorithm for inhibiting negative transfer. Our algorithm provides a theoretical guarantee for managing CAD software in terms of transfer learning and experimentally exhibits better performance than the current algorithm in cerebral aneurysm detection.
【Keywords】: computer-assisted detection; machine learning; medical image analysis; negative transfer; transfer learning
【Paper Link】 【Pages】:705-714
【Authors】: Rainer Schlosser ; Martin Boissier
【Abstract】: Most online markets are characterized by competitive settings and limited demand information. Due to the complexity of such markets, efficient pricing strategies are hard to derive. We analyze stochastic dynamic pricing models in competitive markets with multiple offer dimensions, such as price, quality, and rating. In a first step, we use a simulated test market to study how sales probabilities are affected by specific customer behaviors and the strategic interaction of price reaction strategies. Further, we show how different state-of-the-art learning techniques can be used to estimate sales probabilities from partially observable market data. In a second step, we use a dynamic programming model to compute an effective pricing strategy which circumvents the curse of dimensionality. We demonstrate that the strategy is applicable even if the number of competitors is large and their strategies are unknown. We show that our heuristic can be tuned to smoothly balance profitability and speed of sales. Further, our approach is currently applied by a large seller on Amazon for the sale of used books. Sales results show that our data-driven strategy outperforms the rule-based strategy of an experienced seller by a profit increase of more than 20%.
【Keywords】: decision making; demand learning; dynamic pricing; e-commerce
【Paper Link】 【Pages】:715-723
【Authors】: Supreeth Prajwal Shashikumar ; Amit J. Shah ; Gari D. Clifford ; Shamim Nemati
【Abstract】: Detection of atrial fibrillation (AF), a type of cardiac arrhythmia, is difficult since many cases of AF are clinically silent and undiagnosed. In particular, paroxysmal AF is a form of AF that occurs only occasionally and has a higher probability of going undetected. In this work, we present an attention-based deep learning framework for detecting paroxysmal AF episodes from a sequence of windows. Time-frequency representations of 30-second recording windows, over a 10-minute data segment, are fed sequentially into a deep convolutional neural network for image-based feature extraction; the extracted features are then presented to a bidirectional recurrent neural network with an attention layer for AF detection. To demonstrate the effectiveness of the proposed framework for transient AF detection, we use a database of 24-hour Holter electrocardiogram (ECG) recordings acquired from 2,850 patients at the University of Virginia heart station. The algorithm achieves an AUC of 0.94 on the testing set, which exceeds the performance of baseline models. We also demonstrate the cross-domain generalizability of the approach by adapting the learned model parameters from one recording modality (ECG) to another (photoplethysmogram) with improved AF detection performance. The proposed high-accuracy, low-false-alarm algorithm for detecting paroxysmal AF has potential applications in long-term monitoring using wearable sensors.
【Keywords】: atrial fibrillation; convolutional neural network; deep learning; recurrent neural network; transfer learning
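A minimal sketch of the time-frequency windowing step described in the abstract above, assuming a hypothetical 250 Hz single-lead recording; the CNN, bidirectional RNN, and attention layers are not reproduced here:

```python
import numpy as np
from scipy.signal import spectrogram

FS = 250                 # assumed sampling rate (Hz); the real value may differ
WINDOW_SEC = 30          # length of each analysis window in seconds
segment = np.random.randn(10 * 60 * FS)   # stand-in for a real 10-minute recording

# Split the 10-minute segment into consecutive 30-second windows and compute a
# time-frequency "image" (spectrogram) for each window.
samples_per_window = WINDOW_SEC * FS
windows = segment.reshape(-1, samples_per_window)            # shape (20, 7500)
tf_images = np.stack([
    spectrogram(w, fs=FS, nperseg=256, noverlap=128)[2]       # Sxx: (freq, time)
    for w in windows
])
print(tf_images.shape)   # a sequence of spectrogram images, e.g. (20, 129, 57)
```

The resulting sequence of images is what a CNN-plus-recurrent model would consume, one window at a time.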
【Paper Link】 【Pages】:724-733
【Authors】: Bilong Shen ; Xiaodan Liang ; Yufeng Ouyang ; Miaofeng Liu ; Weimin Zheng ; Kathleen M. Carley
【Abstract】: A mobility event occurs when a passenger departs from or is picked up at a particular location. Mobility event prediction is of utmost importance in the field of intelligent transportation systems. It has huge potential for solving important problems such as minimizing passenger waiting time and maximizing the utilization of transportation resources through vehicle routing and dispatching. Recently, numerous mobility pattern mining methods have been proposed to predict transportation supply and demand at different locations. Those methods first reveal the event patterns of each Place of Interest (POI) independently and then employ a separate region function as a post-processing step. This separate process, which disregards the intrinsic spatial and temporal pattern correlations between POIs, is sub-optimal and complex, resulting in poor generalization across scenarios. In this work, we propose a Spatial-Temporal mobility Event Prediction framework based on Deep neural networks (StepDeep) that simultaneously takes into account all correlated spatial and temporal mobility patterns. StepDeep not only simplifies the prediction process but also enhances prediction accuracy. StepDeep introduces a novel problem formulation towards an end-to-end mobility prediction framework: mobility events over time in an area are converted into an event video, and the mobility prediction problem is posed as a video prediction task. Such a formulation naturally encodes spatial and temporal dependencies for each POI. StepDeep then predicts spatial-temporal events by incorporating new time-sensitive convolution filters, space-sensitive convolution filters, and spatial-temporal-sensitive convolution filters into a single network. We conduct experimental evaluations on a real-world 547-day New York City taxi trajectory dataset, which show that StepDeep provides higher prediction accuracy than five existing baselines. Moreover, StepDeep is generalizable and can be applied to numerous spatial-temporal event prediction scenarios.
【Keywords】: convolutional neural network; deep learning; intelligent transportation systems; spatial temporal data mining; time series prediction
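A toy sketch of the "event video" formulation described in the abstract above; the grid size, time resolution, and event list are hypothetical, and the StepDeep network itself is not shown:

```python
import numpy as np

# Hypothetical setup: a city split into a 10 x 10 grid, events binned hourly.
GRID_H, GRID_W, N_FRAMES = 10, 10, 24

# Each event is (hour, row, col) -- a stand-in for a timestamped pickup/drop-off.
events = [(8, 2, 3), (8, 2, 3), (9, 7, 1), (17, 4, 4)]

# Build the "event video": one frame per hour, one count per grid cell.
video = np.zeros((N_FRAMES, GRID_H, GRID_W), dtype=np.int32)
for hour, row, col in events:
    video[hour, row, col] += 1

# A video-prediction model would take the first k frames and predict frame k+1.
history, target = video[:18], video[18]
print(video.shape, history.shape, target.shape)
```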
【Paper Link】 【Pages】:734-743
【Authors】: Ying Sheng ; Sandeep Tata ; James Bradley Wendt ; Jing Xie ; Qi Zhao ; Marc Najork
【Abstract】: Extracting structured data from emails can enable several assistive experiences, such as reminding the user when a bill payment is due, answering queries about the departure time of a booked flight, or proactively surfacing an emailed discount coupon while the user is at that store. This paper presents Juicer, a system for extracting information from email that is serving over a billion Gmail users daily. We describe how the design of the system was informed by three key principles: scaling to a planet-wide email service, isolating the complexity to provide a simple experience for the developer, and safeguarding the privacy of users (our team and the developers we support are not allowed to view any single email). We describe the design tradeoffs made in building this system, the challenges faced and the approaches used to tackle them. We present case studies of three extraction tasks implemented on this platform---bill reminders, commercial offers, and hotel reservations---to illustrate the effectiveness of the platform despite challenges unique to each task. Finally, we outline several areas of ongoing research in large-scale machine-learned information extraction from email.
【Keywords】: document classification; email; information extraction; wrapper induction
【Paper Link】 【Pages】:744-753
【Authors】: Yeming Shi ; Claudia Perlich ; Rod Hook ; Wickus Martin ; Melinda Han Williams ; Justin Moynihan ; Patrick McCarthy ; Peter Lenz ; Reka Daniel-Weiner ; Roger Cost
【Abstract】: A growing proportion of digital advertising slots is purchased through real-time bidding auctions, which enable advertisers to impose highly specific criteria on which devices and opportunities to target. Employing sophisticated targeting criteria reliably increases the performance of an ad campaign; however, overly strict criteria will limit its scale. This raises the need to estimate the number of anticipated ad impressions at a given campaign performance level, thus enabling advertisers to tune the campaign's budget to an optimal performance-scale trade-off. In this paper, we provide a way to estimate campaign impressions given the campaign criteria. There are several challenges to this problem. First, the criteria contain logic to include and exclude combinations of audience segments, making the space of possible criteria exponentially large. Furthermore, it is difficult to validate predictions, because we wish to predict the number of impressions available without budget constraints, a situation we can rarely observe in practice. In our approach, we first treat the audience segment inclusion/exclusion criteria separately as a data compression problem, where we use MinHash "sketches" to estimate audience size. We then model the number of available impressions with a regularized linear regression in log space, using multiplier features motivated by the assumption that some components of the additional campaign criteria are conditionally independent. We construct a validation set by projecting observed RTB data (under real budget constraints) to obtain impression availability without budget constraints. Using this approach, our average prediction is within a factor of 2.2 of the true impression availability, and the deployed product responds to user requests in well under a second, meeting both accuracy and latency requirements for decision making in the execution of advertising campaigns.
【Keywords】: bloom filter; cardinality estimation; online advertising; real-time bidding
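A small self-contained sketch of the MinHash idea for estimating audience overlap; the hash family, signature length, and segments below are hypothetical, and the paper's production sketches and log-space regression are not reproduced:

```python
import numpy as np

P = (1 << 31) - 1   # prime modulus for the hash family

def minhash_signature(ids, num_hashes=128, seed=7):
    """MinHash signature of a set of device IDs under random affine hashes."""
    rng = np.random.default_rng(seed)
    a = rng.integers(1, P, size=num_hashes, dtype=np.int64)
    b = rng.integers(0, P, size=num_hashes, dtype=np.int64)
    x = np.array([hash(i) % P for i in ids], dtype=np.int64)
    # h_k(x) = (a_k * x + b_k) mod P; keep the minimum per hash function.
    return ((a[:, None] * x[None, :] + b[:, None]) % P).min(axis=1)

def estimate_overlap(sig_a, sig_b, size_a, size_b):
    """Estimate intersection and union sizes from signatures and known set sizes."""
    jaccard = float(np.mean(sig_a == sig_b))
    inter = jaccard / (1.0 + jaccard) * (size_a + size_b)
    return inter, size_a + size_b - inter

# Hypothetical audience segments identified by integer device IDs.
seg_sports = set(range(0, 30000))
seg_autos = set(range(20000, 50000))
sig_s = minhash_signature(seg_sports)
sig_a = minhash_signature(seg_autos)
print(estimate_overlap(sig_s, sig_a, len(seg_sports), len(seg_autos)))
# True intersection is 10,000 and union is 50,000; the estimates should be close.
```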
【Paper Link】 【Pages】:754-763
【Authors】: Mark Silvis ; Anthony Sicilia ; Alexandros Labrinidis
【Abstract】: The amount of food waste generated in the U.S. is staggering, costly both in economic terms and in environmental side effects. Surplus food, which could be used to feed people facing food insecurity, is instead discarded and placed in landfills. Institutions, universities, and non-profits have noticed this issue and are beginning to take action to reduce surplus food waste, typically by redirecting it to food banks and other organizations or having students transport or eat the food. These approaches present challenges such as transportation, volunteer availability, and lack of prioritization of those in need. In this paper, we introduce PittGrub, a notification system that intelligently selects users to invite to events that have leftover food. PittGrub was created to help reduce food waste at the University of Pittsburgh. We use reinforcement learning to determine how many notifications to send out and a valuation model to determine whom to prioritize in the notifications. Our goal is to produce a system that prioritizes feeding students in need while simultaneously eliminating food waste and maintaining a fair distribution of notifications. As far as we are aware, PittGrub is unique in its approach to eliminating surplus food waste while striving for social good. We compare our proposed techniques to multiple baselines on simulated datasets to demonstrate their effectiveness. Experimental results across various algorithms show promise in eliminating food waste while helping those facing food insecurity and treating users fairly. Our prototype is currently in beta and coming soon to the Apple App Store.
【Keywords】: data mining; food waste; q-learning; reinforcement learning
【Paper Link】 【Pages】:764-773
【Authors】: Bhavkaran Singh Walia ; Qianyi Hu ; Jeffrey Chen ; Fangyan Chen ; Jessica Lee ; Nathan Kuo ; Palak Narang ; Jason Batts ; Geoffrey Arnold ; Michael Madaio
【Abstract】: Recent high-profile fire incidents in cities around the world have highlighted gaps in fire risk reduction efforts, as cities grapple with fewer resources and more properties to safeguard. To address this resource gap, prior work has developed machine learning frameworks to predict fire risk and prioritize fire inspections. However, existing approaches were limited by not including time-varying data, never deploying in real-time, and only predicting risk for a small subset of commercial properties in their city. Here, we have developed a predictive risk framework for all 20,636 commercial properties in Pittsburgh, based on time-varying data from a variety of municipal agencies. We have deployed our fire risk model with the Pittsburgh Bureau of Fire (PBF), and we have developed preliminary risk models for residential property fire risk prediction. Our commercial risk model outperforms the prior state of the art with a kappa of 0.33 compared to their 0.17, and can be applied to nearly 4 times as many properties as the prior model. In the 5 weeks since our model was first deployed, 58% of our predicted high-risk properties had a fire incident of any kind, while 23% of the building fire incidents that occurred took place in our predicted high- or medium-risk properties. The risk scores from our commercial model are visualized on an interactive dashboard and map to assist the PBF with planning their fire risk reduction initiatives. This work is already helping to improve fire risk reduction in Pittsburgh and is beginning to be adopted by other cities.
【Keywords】: civic risk modeling; fire risk; predictive modeling; spatio-temporal risk prediction
【Paper Link】 【Pages】:774-782
【Authors】: Peter W. J. Staar ; Michele Dolfi ; Christoph Auer ; Costas Bekas
【Abstract】: Over the past few decades, the number of scientific articles and the amount of technical literature have increased exponentially. Consequently, there is a great need for systems that can ingest these documents at scale and make the contained knowledge discoverable. Unfortunately, both the format of these documents (e.g. the PDF format or bitmap images) and the presentation of the data (e.g. complex tables) make the extraction of qualitative and quantitative data extremely challenging. In this paper, we present a modular, cloud-based platform to ingest documents at scale. This platform, called the Corpus Conversion Service (CCS), implements a pipeline which allows users to parse and annotate documents (i.e. collect ground truth), train machine-learning classification algorithms, and ultimately convert any type of PDF or bitmap document into a structured content representation format. We show that each of the modules is scalable due to an asynchronous microservice architecture and can therefore handle massive amounts of documents. Furthermore, we show that our capability to gather ground truth is accelerated by machine-learning algorithms by at least one order of magnitude. This allows us to both gather large amounts of ground truth in very little time and obtain very good precision/recall metrics in the range of 99% with regard to content conversion to structured output. The CCS platform is currently deployed on IBM internal infrastructure and serves more than 250 active users for knowledge-engineering project engagements.
【Keywords】: ai; artificial intelligence; asynchronous architecture; cloud architecture; cloud computing; deep learning; document conversion; ibm; ibm research; knowledge ingestion; machine learning; pdf; table processing
【Paper Link】 【Pages】:783-792
【Authors】: Hiroki Sugiura ; Taichi Kiwaki ; Siamak Yousefi ; Hiroshi Murata ; Ryo Asaoka ; Kenji Yamanishi
【Abstract】: Conventionally, glaucoma is diagnosed on the basis of visual field sensitivity (VF). However, the VF test is time-consuming, costly, and noisy. Using retinal thickness (RT) for glaucoma diagnosis is currently desirable. Thus, we propose a new methodology for estimating VF from RT in glaucomatous eyes. The key ideas are to use our new methods of pattern-based regularization (PBR) and pattern-based visualization (PBV) with convolutional neural networks (CNNs). PBR effectively conducts supervised learning of RT-VF relations in combination with unsupervised learning from non-paired VF data. We can thereby avoid overfitting of a CNN to small sized data. PBV visualizes functional correspondence between RT and VF with its nonlinearity preserved. We empirically demonstrate with real datasets that a CNN with PBR achieves the highest estimation accuracy to date and that a CNN with PBV is effective for knowledge discovery in an ophthalmological context.
【Keywords】: convolutional neural networks; glaucoma; non-negative matrix factorization; regularization; visualization
【Paper Link】 【Pages】:793-801
【Authors】: Mengying Sun ; Fengyi Tang ; Jinfeng Yi ; Fei Wang ; Jiayu Zhou
【Abstract】: The surging availability of electronic health records (EHR) has led to increased research interest in medical predictive modeling. Recently, many deep learning based predictive models have also been developed for EHR data and have demonstrated impressive performance. However, a series of recent studies showed that these deep models are not safe: they suffer from certain vulnerabilities. In short, a well-trained deep network can be extremely sensitive to inputs with negligible changes. These inputs are referred to as adversarial examples. In the context of medical informatics, such attacks could alter the result of a high-performance deep predictive model by slightly perturbing a patient's medical records. Such instability not only reflects the weakness of deep architectures; more importantly, it offers a guide for detecting susceptible parts of the inputs. In this paper, we propose an efficient and effective framework that learns a time-preferential minimum attack targeting the LSTM model with EHR inputs, and we leverage this attack strategy to screen the medical records of patients and identify susceptible events and measurements. The efficient screening procedure can assist decision makers in paying extra attention to the locations that can cause severe consequences if not measured correctly. We conduct extensive empirical studies on a real-world urgent care cohort and demonstrate the effectiveness of the proposed screening.
【Keywords】: adversarial attack; medical records; predictive modeling
【Paper Link】 【Pages】:802-810
【Authors】: Harini Suresh ; Jen J. Gong ; John V. Guttag
【Abstract】: Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.
【Keywords】: clinical risk models; multi-task learning; patient subpopulation discovery
【Paper Link】 【Pages】:811-820
【Authors】: Jianrong Tao ; Jiarong Xu ; Linxia Gong ; Yifu Li ; Changjie Fan ; Zhou Zhao
【Abstract】: Game bots are automated programs that assist cheating users and enable them to obtain a huge advantage, leading to an imbalance in the game ecosystem and the collapse of user interest. Therefore, game bot detection becomes particularly important and urgent. Among many kinds of online games, massively multiplayer online role playing games (MMORPGs), such as World of Warcraft and AION, provide an immersive gaming experience and attract many loyal fans. At the same time, however, game bots in MMORPGs have proliferated in volume and method, evolving with real-world detection methods and showing strong diversity, making MMORPG bot detection extremely difficult. To deal with the fast-changing nature of game bots, we propose a generalized game bot detection framework for MMORPGs termed NGUARD, denoting NetEase Games' Guard. NGUARD takes charge of automatically differentiating game bots from humans in MMORPGs. In detail, NGUARD exploits a combination of supervised and unsupervised methods. Supervised models are utilized to detect game bots in observed patterns according to the training data. Meanwhile, unsupervised solutions are employed to detect clustered game bots and help discover new bots. The game bot detection framework NGUARD has been implemented and deployed in multiple MMORPG productions in the NetEase Games portfolio, achieving remarkable performance improvement and acceleration compared to traditional methods. Moreover, the framework shows outstanding robustness to game bots with mutated patterns and even completely new patterns, owing to the design of the auto-iteration mechanism.
【Keywords】: auto-iteration mechanism; bidirectional lstm; game bot detection; sequence autoencoder; time-interval event2vec
【Paper Link】 【Pages】:821-829
【Authors】: Martin Valdez-Vivas ; Caner Gocmen ; Andrii Korotkov ; Ethan Fang ; Kapil Goenka ; Sherry Chen
【Abstract】: Multiple teams at Facebook are tasked with monitoring compute and memory utilization metrics that are important for managing the efficiency of the codebase. An efficiency regression is characterized by instances where the CPU utilization or query per second (QPS) patterns of a function or endpoint experience an unexpected increase over its prior baseline. If the code changes responsible for these regressions get propagated to Facebook's fleet of web servers, the impact of the inefficient code will get compounded over billions of executions per day, carrying potential ramifications to Facebook's scaling efforts and the quality of the user experience. With a codebase ingesting in excess of 1,000 diffs across multiple pushes per day, it is important to have a real-time solution for detecting regressions that is not only scalable and high in recall, but also highly precise in order to avoid overrunning the remediation queue with thousands of false positives. This paper describes the end-to-end regression detection system designed and used at Facebook. The main detection algorithm is based on sequential statistics supplemented by signal processing transformations, and the performance of the algorithm was assessed with a mixture of online and offline tests across different use cases. We compare the performance of our algorithm against a simple benchmark as well as a commercial anomaly detection software solution.
【Keywords】: anomaly detection; change point detection; code efficiency monitoring; codebase efficiency; continuous push; cusum methods; regression detection; resource utilization monitoring; systems data science
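A minimal one-sided CUSUM detector in the spirit of the sequential statistics mentioned above; the baseline window, drift, threshold, and the synthetic utilization series are illustrative choices, not Facebook's actual configuration:

```python
import numpy as np

def cusum_detect(series, drift=0.5, threshold=8.0, baseline_len=50):
    """Return the index where an upward shift is first flagged, or None.

    The series is standardized against an initial baseline window, and positive
    deviations (minus an allowed drift) are accumulated; a persistent increase
    pushes the statistic over the threshold.
    """
    baseline = np.mean(series[:baseline_len])
    scale = np.std(series[:baseline_len]) + 1e-9
    s = 0.0
    for t, x in enumerate(series):
        z = (x - baseline) / scale
        s = max(0.0, s + z - drift)
        if s > threshold:
            return t
    return None

# Synthetic CPU-utilization metric: stable, then a small persistent regression.
rng = np.random.default_rng(0)
metric = np.concatenate([rng.normal(100, 2, 200), rng.normal(104, 2, 100)])
print(cusum_detect(metric))   # flags an index shortly after the shift at t=200
```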
【Paper Link】 【Pages】:830-838
【Authors】: Jingyuan Wang ; Xiaojian Wang ; Junjie Wu
【Abstract】: Since the beginning of the 21st century, global outbreaks of infectious diseases such as SARS in 2003, H1N1 in 2009, and H7N9 in 2013 have become a critical threat to public health and a haunting nightmare for governments. Understanding propagation in large-scale metapopulations and predicting future outbreaks thus becomes crucially important for epidemic control and prevention. In the literature, there has been a large body of studies on modeling intra-city epidemic propagation, but under a single-population (homogeneity) assumption. Some recent works on metapopulation propagation, however, focus on finding specific physical human mobility networks to approximate disease transmission networks, whose generality across different diseases cannot be guaranteed. In this paper, we argue that intra-city epidemic propagation should be modeled on a metapopulation basis, and propose a two-step method for this purpose. The first step is to understand the propagation system by inferring the underlying disease infection network. To this end, we propose a novel network inference model called D²PRI, which reduces the individual network to a sub-population network without information loss, and incorporates a power-law distribution prior and a data prior for better performance. The second step is to predict the disease propagation by extending the classic SIR model to a metapopulation SIR model that allows visitor transmission between any two sub-populations. The validity of our model is verified on a real-life clinical report data set of airborne disease in the city of Shenzhen, China. The D²PRI model, together with the extended SIR model, exhibits superior performance in various tasks, including network inference, infection prediction, and outbreak simulation.
【Keywords】: epidemic propagation; intracity epidemic control and prevention; metapopulation; network inference
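A toy metapopulation SIR simulation illustrating the kind of model the abstract extends; the sub-population sizes, visiting matrix, and rates are invented, and the D²PRI network inference step is not shown:

```python
import numpy as np

# Three sub-populations; M[i, j] is the fraction of time residents of i spend in j.
N = np.array([1e6, 5e5, 2e5])
M = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.85, 0.05],
              [0.05, 0.10, 0.85]])
beta, gamma = 0.3, 0.1            # transmission and recovery rates (per day)

S, I, R = N.copy(), np.array([10.0, 0.0, 0.0]), np.zeros(3)
S -= I                            # seed a small outbreak in sub-population 0

for day in range(120):            # simple daily Euler steps
    present = M.T @ N             # people currently present in each region
    infectious_present = M.T @ I  # infectious people present in each region
    force = beta * infectious_present / present   # force of infection per region
    new_infections = S * (M @ force)              # mapped back to residents
    new_recoveries = gamma * I
    S = S - new_infections
    I = I + new_infections - new_recoveries
    R = R + new_recoveries

print(np.round(I))   # infected residents per sub-population after 120 days
```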
【Paper Link】 【Pages】:839-848
【Authors】: Jizhe Wang ; Pipei Huang ; Huan Zhao ; Zhibo Zhang ; Binqiang Zhao ; Dik Lun Lee
【Abstract】: Recommender systems (RSs) have been the most important technology for increasing business at Taobao, the largest online consumer-to-consumer (C2C) platform in China. There are three major challenges facing RSs at Taobao: scalability, sparsity and cold start. In this paper, we present our technical solutions to address these three challenges. The methods are based on a well-known graph embedding framework. We first construct an item graph from users' behavior history, and learn the embeddings of all items in the graph. The item embeddings are employed to compute pairwise similarities between all items, which are then used in the recommendation process. To alleviate the sparsity and cold start problems, side information is incorporated into the graph embedding framework. We propose two aggregation methods to integrate the embeddings of items and the corresponding side information. Results from offline experiments show that methods incorporating side information are superior to those that do not. Further, we describe the platform upon which the embedding methods are deployed and the workflow to process the billion-scale data in Taobao. Using A/B tests, we show that the online Click-Through Rates (CTRs) are improved compared to the previous collaborative filtering based methods widely used in Taobao, further demonstrating the effectiveness and feasibility of our proposed methods in Taobao's live production environment.
【Keywords】: collaborative filtering; e-commerce recommendation; graph embedding; recommendation system
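A small sketch of generating weighted random walks over an item graph built from behavior sessions; these are the kinds of item sequences a word2vec-style trainer would consume to produce embeddings. The graph, weights, and walk parameters below are hypothetical:

```python
import random
from collections import defaultdict

# Hypothetical item graph: edge (a, b, w) means item b followed item a in w sessions.
edges = [("shoes", "socks", 5), ("shoes", "shirt", 2),
         ("socks", "shoes", 3), ("shirt", "jeans", 4), ("jeans", "shirt", 1)]
graph = defaultdict(list)
for src, dst, w in edges:
    graph[src].append((dst, w))

rng = random.Random(42)

def random_walk(start, length):
    """Weighted random walk over the item graph starting from one item."""
    walk = [start]
    while len(walk) < length and graph.get(walk[-1]):
        nbrs, weights = zip(*graph[walk[-1]])
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

walks = [random_walk(item, length=6) for item in list(graph) for _ in range(3)]
print(walks[:3])
# Feeding these walks to a skip-gram trainer (e.g. gensim's Word2Vec) yields item
# embeddings; the paper additionally aggregates side-information embeddings.
```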
【Paper Link】 【Pages】:849-857
【Authors】: Yaqing Wang ; Fenglong Ma ; Zhiwei Jin ; Ye Yuan ; Guangxu Xun ; Kishlay Jha ; Lu Su ; Jing Gao
【Abstract】: As news reading on social media becomes more and more popular, fake news has become a major issue of concern to the public and governments. Fake news can take advantage of multimedia content to mislead readers and gain dissemination, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove the event-specific features and keep the shared features among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show our proposed EANN model can outperform state-of-the-art methods and learn transferable feature representations.
【Keywords】: adversarial neural networks; deep learning; fake news detection
【Paper Link】 【Pages】:858-866
【Authors】: Zheng Wang ; Kun Fu ; Jieping Ye
【Abstract】: Vehicle travel time estimation or estimated time of arrival (ETA) is one of the most important location-based services (LBS). It is becoming increasingly important and has been widely used as a basic service in navigation systems and intelligent transportation systems. This paper presents a novel machine learning solution to predict vehicle travel time based on floating-car data. First, we formulate ETA as a pure spatial-temporal regression problem based on a large set of effective features. Second, we adapt different existing machine learning models to solve the regression problem. Furthermore, we propose a Wide-Deep-Recurrent (WDR) learning model to accurately predict the travel time along a given route at a given departure time. We jointly train wide linear models, deep neural networks, and recurrent neural networks to take full advantage of all three models. We evaluate our solution offline with millions of historical vehicle travel records. We also deploy the proposed solution on Didi Chuxing's platform, which serves billions of ETA requests and benefits millions of customers per day. Our extensive evaluations show that our proposed deep learning algorithm significantly outperforms state-of-the-art learning algorithms, as well as the solutions provided by leading industry LBS providers.
【Keywords】: estimated time of arrival; location-based services; wide-deep-recurrent learning
【Paper Link】 【Pages】:867-875
【Authors】: Serene W. H. Wong ; Chiara Pastrello ; Max Kotlyar ; Christos Faloutsos ; Igor Jurisica
【Abstract】: Given a large, dynamic graph, how can we trace the activities of groups of vertices over time? Given a dynamic biological graph modeling the progression of a disease, which genes interact closely at the early stage of the disease, with their interactions being disrupted at later stages? Which genes interact sparsely at the early stage of the disease, with their interactions increasing as the disease progresses? Knowing the answers to these questions is important, as they give insight into the molecular mechanisms underlying disease progression, and potential treatments that target these mechanisms can be developed. This paper makes three main contributions. First, we design a novel algorithm, SDREGION, that identifies subgraphs that decrease or increase in density monotonically over time, referred to as d-regions or i-regions, respectively. We introduce a density-based objective function for identifying d-(i-)regions. Second, SDREGION is a generic algorithm, applicable across several real datasets. In this manuscript, we show its effectiveness, and make observations, in modeling the progression of lung cancer. In particular, we observe that SDREGION identifies d-(i-)regions that capture mechanisms aligning with the literature. Importantly, findings that were identified but not retrospectively validated by the literature may provide novel mechanisms in tumor progression that will guide future biological experiments. Third, SDREGION is scalable, with a time complexity of O(m log n + n log n), where m is the number of edges and n is the number of vertices in a given dynamic graph.
【Keywords】: decreasing density subgraph detection; dynamic graphs; increasing density subgraph detection; temporal data
【Paper Link】 【Pages】:876-885
【Authors】: Yuxiang Xie ; Nanyu Chen ; Xiaolin Shi
【Abstract】: Online controlled experiments (a.k.a. A/B testing) have been used as the mantra for data-driven decision making on feature changes and product shipping in many Internet companies. However, it is still a great challenge to systematically measure how every code or feature change impacts millions of users with great heterogeneity (e.g. countries, ages, devices). The most commonly used A/B testing framework in many companies is based on the Average Treatment Effect (ATE), which cannot detect the heterogeneity of the treatment effect on users with different characteristics. In this paper, we propose statistical methods that can systematically and accurately identify the Heterogeneous Treatment Effect (HTE) of any user cohort of interest (e.g. mobile device type, country), and determine which factors (e.g. age, gender) of users contribute to the heterogeneity of the treatment effect in an A/B test. By applying these methods to both simulation data and real-world experimentation data, we show how they work robustly with a controlled low False Discovery Rate (FDR) and, at the same time, provide useful insights about the heterogeneity of the identified user groups. We have deployed a toolkit based on these methods, and have used it to measure the Heterogeneous Treatment Effect of many A/B tests at Snap.
【Keywords】: a/b testing; false discovery rate; heterogeneous treatment effect; multiple testing
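A minimal Benjamini-Hochberg procedure illustrating FDR control over many cohort-level tests; the p-values are made up, and the paper's full HTE methodology is considerably more involved than this single step:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= (k/m) * alpha; reject all hypotheses up to k.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

# Hypothetical per-cohort p-values from one A/B test (one test per user segment).
p_vals = [0.001, 0.008, 0.020, 0.041, 0.30, 0.74]
print(benjamini_hochberg(p_vals))   # [ True  True  True False False False]
```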
【Paper Link】 【Pages】:886-894
【Authors】: Xin Shen ; Hongxia Yang ; Weizhao Xian ; Martin Ester ; Jiajun Bu ; Zhongyao Wang ; Can Wang
【Abstract】: The e-commerce era is witnessing a rapid increase of mobile Internet users. Major e-commerce companies nowadays see billions of mobile accesses every day. Hidden in these records are valuable user behavioral characteristics such as their shopping preferences and browsing patterns. To extract this knowledge from the huge dataset, we first need to link records to the corresponding mobile devices. This Mobile Access Records Resolution (MARR) problem is confronted with two major challenges: (1) device identifiers and other attributes in access records might be missing or unreliable; (2) the dataset contains billions of access records from millions of devices. To the best of our knowledge, this is a novel and challenging industrial problem of the mobile Internet, and no existing method has been developed to resolve entities using mobile device identifiers at such a massive scale. To address these issues, we propose a SParse Identifier-linkage Graph (SPI-Graph), accompanied by abundant mobile device profiling data, to accurately match mobile access records to devices. Furthermore, two versions (unsupervised and semi-supervised) of the Parallel Graph-based Record Resolution (PGRR) algorithm are developed to effectively exploit the advantages of large-scale server clusters comprising more than 1,000 computing nodes. We empirically show the superior performance of the PGRR algorithms, compared to other state-of-the-art methodologies, on a very challenging and sparse real data set containing 5.28 million nodes and 31.06 million edges from 2.15 billion access records.
【Keywords】: big data; graph algorithms; mobile access record resolution; scalable algorithms
【Paper Link】 【Pages】:895-904
【Authors】: Ya Xu ; Weitao Duan ; Shaochen Huang
【Abstract】: Controlled experimentation, also called A/B testing, is widely adopted to accelerate product innovations in the online world. However, how fast we innovate can be limited by how we run experiments. Most experiments go through a "ramp up" process where we gradually increase the traffic to the new treatment to 100%. We have seen huge inefficiency and risk in how experiments are ramped, and it is getting in the way of innovation. This can go both ways: we ramp too slowly, and much time and many resources are wasted; or we ramp too fast, and suboptimal decisions are made. In this paper, we build a ramping framework that can effectively balance among Speed, Quality and Risk (SQR). We start out by identifying the top common mistakes experimenters make, and then introduce the four SQR principles corresponding to the four ramp phases of an experiment. To truly scale SQR to all experiments, we develop a statistical algorithm that is embedded into the process of running every experiment to automatically recommend ramp decisions. Finally, to complete the whole picture, we briefly cover the auto-ramp engineering infrastructure that can collect inputs and execute on the recommendations in a timely and reliable manner.
【Keywords】: a/b testing; causal inference; controlled experiment; experimentation; quality; ramp; risk; speed
【Paper Link】 【Pages】:905-913
【Authors】: Zhe Xu ; Zhixin Li ; Qingwen Guan ; Dingshui Zhang ; Qiang Li ; Junxiao Nan ; Chunyang Liu ; Wei Bian ; Jieping Ye
【Abstract】: We present a novel order dispatch algorithm for large-scale on-demand ride-hailing platforms. While traditional order dispatch approaches usually focus on immediate customer satisfaction, the proposed algorithm is designed to provide a more efficient way to optimize resource utilization and user experience from a global and more farsighted view. In particular, we model order dispatch as a large-scale sequential decision-making problem, where the decision of assigning an order to a driver is determined by a centralized algorithm in a coordinated way. The problem is solved in a learning and planning manner: 1) based on historical data, we first summarize demand and supply patterns into a spatiotemporal quantization, each cell of which indicates the expected value of a driver being in a particular state; 2) a planning step is conducted in real-time, where each driver-order pair is valued in consideration of both immediate rewards and future gains, and then dispatch is solved using a combinatorial optimization algorithm. Through extensive offline experiments and online A/B tests, the proposed approach delivers remarkable improvements in the platform's efficiency and has been successfully deployed in the production system of Didi Chuxing.
【Keywords】: intelligent transportation system; multi-agent system; order dispatch; reinforcement learning; planning
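A toy sketch of the planning step viewed as an assignment problem: each driver-order pair is scored by an immediate reward plus a learned future-value gain, and the best one-to-one assignment is found with the Hungarian method. All numbers are hypothetical, and the learning step that produces the value estimates is not shown:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical scores for 3 drivers x 3 orders.
immediate_reward = np.array([[8.0, 5.0, 1.0],
                             [6.0, 7.0, 2.0],
                             [3.0, 4.0, 9.0]])
future_value_gain = np.array([[0.5, -0.2, 1.0],
                              [0.1, 0.8, -0.3],
                              [0.4, 0.2, 0.6]])
score = immediate_reward + future_value_gain

# Maximize the total score by minimizing its negation (classic assignment problem).
rows, cols = linear_sum_assignment(-score)
for driver, order in zip(rows, cols):
    print(f"driver {driver} -> order {order} (score {score[driver, order]:.1f})")
```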
【Paper Link】 【Pages】:914-922
【Authors】: Carl Yang ; Xiaolin Shi ; Luo Jie ; Jiawei Han
【Abstract】: As online platforms are striving to get more users, a critical challenge is user churn, which is especially concerning for new users. In this paper, taking the anonymized large-scale real-world data from Snapchat as an example, we develop ClusChurn, a systematic two-step framework for interpretable new user clustering and churn prediction, based on the intuition that proper user clustering can help understand and predict user churn. Therefore, ClusChurn first groups new users into interpretable typical clusters, based on their activities on the platform and ego-network structures. Then we design a novel deep learning pipeline based on LSTM and attention to accurately predict user churn with very limited initial behavior data, by leveraging the correlations among users' multi-dimensional activities and the underlying user types. ClusChurn is also able to predict user types, which enables rapid reactions to different types of user churn. Extensive data analysis and experiments show that ClusChurn provides valuable insight into user behaviors and achieves state-of-the-art churn prediction performance. The whole framework is deployed as a data analysis pipeline, delivering real-time data analysis and prediction results to multiple relevant teams for business intelligence uses. It is also general enough to be readily adopted by any online systems with user behavior data.
【Keywords】: churn prediction; interpretable model; user clustering
【Paper Link】 【Pages】:923-931
【Authors】: Xulei Yang ; Zeng Zeng ; Sin G. Teo ; Li Wang ; Vijay Chandrasekhar ; Steven C. H. Hoi
【Abstract】: In recent years, deep convolutional neural networks (DCNNs) have achieved great success in image classification and object detection, as demonstrated on ImageNet in the academic field. However, some unique practical challenges remain for real-world image recognition applications, e.g., small object sizes, imbalanced data distributions, and limited labeled data samples. In this work, we make an effort to deal with these challenges through a computational framework that incorporates the latest developments in deep learning. By means of a two-stage detection scheme, pseudo labeling, data augmentation, cross-validation, and ensemble learning, the proposed framework aims to achieve better performance for practical image recognition applications compared to standard deep learning methods. The proposed framework has recently been deployed as the key kernel for several image recognition competitions organized by Kaggle. The performance is promising, as our final private scores ranked 4th out of 2,293 teams for fish recognition on the challenge "The Nature Conservancy Fisheries Monitoring" and 3rd out of 834 teams for cervix recognition on the challenge "Intel & MobileODT Cervical Cancer Screening", among several others. We believe that by sharing the solutions, we can further promote the applications of deep learning techniques.
【Keywords】: deep learning; image classification; image recognition; object detection
【Paper Link】 【Pages】:932-940
【Authors】: Peng Ye ; Julian Qian ; Jieying Chen ; Chen-Hung Wu ; Yitong Zhou ; Spencer De Mars ; Frank Yang ; Li Zhang
【Abstract】: This paper describes the pricing strategy model deployed at Airbnb, an online marketplace for sharing homes and experiences. The goal of price optimization is to help hosts who share their homes on Airbnb set the optimal price for their listings. In contrast to conventional pricing problems, where pricing strategies are applied to a large quantity of identical products, there are no "identical" products on Airbnb, because each listing on our platform offers unique values and experiences to our guests. The unique nature of Airbnb listings makes it very difficult to estimate an accurate demand curve that is required to apply conventional revenue-maximization pricing strategies. Our pricing system consists of three components. First, a binary classification model predicts the booking probability of each listing-night. Second, a regression model predicts the optimal price for each listing-night, in which a customized loss function is used to guide the learning. Finally, we apply additional personalization logic on top of the output from the second model to generate the final price suggestions. In this paper, we focus on describing the regression model in the second stage of our pricing system. We also describe a novel set of metrics for offline evaluation. The proposed pricing strategy has been deployed in production to power the Price Tips and Smart Pricing tool on Airbnb. Online A/B testing results demonstrate the effectiveness of the proposed strategy model.
【Keywords】: customized regression model; dynamic pricing; price optimization
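A generic asymmetric regression loss, shown only to illustrate what it means to guide a price regression with a customized objective; it is not the loss function used in the paper:

```python
import numpy as np

def asymmetric_loss(y_true, y_pred, under_weight=2.0, over_weight=1.0):
    """Squared error that penalizes under-prediction more than over-prediction.

    An illustrative stand-in for a customized pricing loss, not the paper's loss.
    """
    err = y_pred - y_true
    weights = np.where(err < 0, under_weight, over_weight)
    return float(np.mean(weights * err ** 2))

# Hypothetical suggested prices vs. reference prices for a few listing-nights.
y_true = np.array([120.0, 95.0, 210.0])
y_pred = np.array([110.0, 100.0, 190.0])
print(asymmetric_loss(y_true, y_pred))
```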
【Paper Link】 【Pages】:941-950
【Authors】: Ting Ye ; Hucheng Zhou ; Will Y. Zou ; Bin Gao ; Ruofei Zhang
【Abstract】: Relevance ranking models based on additive ensembles of regression trees have shown quite good effectiveness in web search engines. In the era of big data, tree ensemble models grow large in both tree depth and ensemble size to provide even better search relevance and user experience. However, the computational cost of their scoring process is high, such that it becomes a challenging issue to apply big tree ensemble models in a search engine which needs to answer thousands of queries per second. Although several works have been proposed to improve the scoring process, the challenge is still great, especially when the model size grows large. In this paper, we present RapidScorer, a novel framework for speeding up the scoring process of industry-scale tree ensemble models, without hurting the quality of scoring results. RapidScorer introduces a modified run length encoding called epitome to the bitvector representation of the tree nodes. Epitome can greatly reduce the computation cost to traverse the tree ensemble, and works with several other proposed strategies to maximize the compactness of data units in memory. The achieved compactness makes it possible to fully utilize data parallelization to improve model scalability. Experiments on two web search benchmarks show that RapidScorer achieves significant speed-ups over the state-of-the-art methods: V-QuickScorer, ranging from 1.3x to 3.5x; QuickScorer, ranging from 2.1x to 25.0x; VPred, ranging from 2.3x to 18.3x; and XGBoost, ranging from 2.6x to 42.5x.
【Keywords】: data parallelization; efficiency; industry-scale models; quickscorer; run length encoding; simd; tree ensemble traversal; vpred; xgboost
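A small run-length encoding sketch for node bitvectors, illustrating the compression idea behind the epitome representation mentioned above; the actual epitome layout and SIMD traversal in the paper are more elaborate:

```python
import numpy as np

def run_length_encode(bits):
    """Encode a 0/1 bitvector as (value, run_length) pairs."""
    bits = np.asarray(bits)
    change = np.flatnonzero(np.diff(bits)) + 1      # positions where the value flips
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [len(bits)])))
    return list(zip(bits[starts].tolist(), lengths.tolist()))

# Bitvectors marking which leaves remain reachable tend to have long runs of 1s.
bitvector = [1] * 12 + [0] * 3 + [1] * 17
print(run_length_encode(bitvector))   # [(1, 12), (0, 3), (1, 17)]
```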
【Paper Link】 【Pages】:965-973
【Authors】: Xiuwen Yi ; Junbo Zhang ; Zhaoyuan Wang ; Tianrui Li ; Yu Zheng
【Abstract】: Accompanying rapid urbanization, many developing countries are suffering from serious air pollution problems. The demand for predicting future air quality is becoming increasingly important for governments' policy-making and people's decision making. In this paper, we predict the air quality of the next 48 hours for each monitoring station, considering air quality data, meteorology data, and weather forecast data. Based on domain knowledge about air pollution, we propose a deep neural network (DNN)-based approach (entitled DeepAir), which consists of a spatial transformation component and a deep distributed fusion network. Considering air pollutants' spatial correlations, the former component converts the spatially sparse air quality data into a consistent input to simulate the pollutant sources. The latter network adopts a neural distributed architecture to fuse heterogeneous urban data for simultaneously capturing the factors affecting air quality, e.g. meteorological conditions. We deployed DeepAir in our AirPollutionPrediction system, providing fine-grained air quality forecasts for 300+ Chinese cities every hour. Experimental results on three years of data from nine Chinese cities demonstrate the advantages of DeepAir over 10 baseline methods. Compared with the previous online approach in the AirPollutionPrediction system, we achieve 2.4%, 12.2%, and 63.2% relative accuracy improvements on short-term, long-term, and sudden-change predictions, respectively.
【Keywords】: air quality prediction; deep learning; urban computing
【Paper Link】 【Pages】:974-983
【Authors】: Rex Ying ; Ruining He ; Kaifeng Chen ; Pong Eksombatchai ; William L. Hamilton ; Jure Leskovec
【Abstract】: Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains an unsolved challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. Overall, we can train on and embed graphs that are four orders of magnitude larger than typical GCN implementations. We show how GCN embeddings can be used to make high-quality recommendations in various settings at Pinterest, which has a massive underlying graph with 3 billion nodes representing pins and boards, and 17 billion edges. According to offline metrics, user studies, as well as A/B tests, our approach generates higher-quality recommendations than comparable deep learning based systems. To our knowledge, this is by far the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
【Keywords】: deep learning; graph convolutional networks; recommender systems; scalability
【Paper Link】 【Pages】:984-992
【Authors】: Zhuoning Yuan ; Xun Zhou ; Tianbao Yang
【Abstract】: Predicting traffic accidents is a crucial problem to improving transportation and public safety as well as safe routing. The problem is also challenging due to the rareness of accidents in space and time and spatial heterogeneity of the environment (e.g., urban vs. rural). Most previous research on traffic accident prediction conducted by domain researchers simply applied classical prediction models on limited data without addressing the above challenges properly, thus leading to unsatisfactory performance. A small number of recent works have attempted to use deep learning for traffic accident prediction. However, they either ignore time information or use only data from a small and homogeneous study area (a city), without handling spatial heterogeneity and temporal auto-correlation properly at the same time. In this paper we perform a comprehensive study on the traffic accident prediction problem using the Convolutional Long Short-Term Memory (ConvLSTM) neural network model. A number of detailed features such as weather, environment, road condition, and traffic volume are extracted from big datasets over the state of Iowa across 8 years. To address the spatial heterogeneity challenge in the data, we propose a Hetero-ConvLSTM framework, where a few novel ideas are implemented on top of the basic ConvLSTM model, such as incorporating spatial graph features and spatial model ensemble. Extensive experiments on the 8-year data over the entire state of Iowa show that the proposed framework makes reasonably accurate predictions and significantly improves the prediction accuracy over baseline approaches.
【Keywords】: convolutional lstm; deep learning; spatial heterogeneity; traffic accident prediction
【Paper Link】 【Pages】:993-1001
【Authors】: Yanhao Zhang ; Pan Pan ; Yun Zheng ; Kang Zhao ; Yingya Zhang ; Xiaofeng Ren ; Rong Jin
【Abstract】: This paper introduces the large-scale visual search algorithm and system infrastructure at Alibaba. The following challenges are discussed in the e-commerce setting at Alibaba: (a) how to handle heterogeneous image data and bridge the gap between real-shot images from user queries and online images; (b) how to deal with large-scale indexing for massive updating data; (c) how to train deep models for effective feature representation without huge human annotation; (d) how to improve user engagement by considering the quality of the content. We take advantage of the large image collection at Alibaba and state-of-the-art deep learning techniques to perform visual search at scale. We present solutions and implementation details to overcome those problems and also share our lessons learned from building such a large-scale commercial visual search engine. Specifically, a model- and search-based fusion approach is introduced to effectively predict categories. Also, we propose a deep CNN model for joint detection and feature learning by mining user click behavior. The binary index engine is designed to scale up indexing without compromising recall and precision. Finally, we integrate all the stages into an end-to-end system architecture, which can simultaneously achieve highly efficient and scalable performance adapted to real-shot images. Extensive experiments demonstrate the advancement of each module in our system. We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.
【Keywords】: deep learning; detection and recognition; visual search
【Paper Link】 【Pages】:1002-1011
【Authors】: Yutao Zhang ; Fanjin Zhang ; Peiran Yao ; Jie Tang
【Abstract】: AMiner is a free online academic search and mining system that has collected more than 130,000,000 researcher profiles and over 200,000,000 papers from multiple publication databases [25]. In this paper, we present the implementation and deployment of name disambiguation, a core component of AMiner. The problem has been studied for decades but remains largely unsolved. In AMiner, we conducted a systematic investigation into the problem and propose a comprehensive framework to address it. We propose a novel representation learning method that incorporates both global and local information, and present an end-to-end cluster size estimation method that is significantly better than the traditional BIC-based method. To improve accuracy, we involve human annotators in the disambiguation process. We carefully evaluate the proposed framework on large real-world data, and experimental results show that the proposed solution achieves clearly better performance (+7-35% in terms of F1-score) than several state-of-the-art methods, including GHOST [5], Zhang et al. [33], and Louppe et al. [17]. Finally, the algorithm has been deployed in AMiner to deal with the disambiguation problem at the billion scale, which further demonstrates both the effectiveness and efficiency of the proposed framework.
【Keywords】: clustering; entity resolution; metric learning; name disambiguation
【Paper Link】 【Pages】:1012-1020
【Authors】: Bo Zhao ; Koichiro Narita ; Burkay Orten ; John Egan
【Abstract】: Notifications (including emails, mobile / desktop push notifications, SMS, etc.) are very effective channels for online services to engage with users and drive user engagement metrics and other business metrics. One of the most important and challenging problems in a production notification system is to decide the right frequency for each user. In this paper, we propose a novel machine learning approach to decide notification volume for each user such that long term user engagement is optimized. We will also discuss a few practical issues and design choices we have made. The new system has been deployed to production at Pinterest in mid 2017 and significantly reduced notification volume and improved CTR of notifications and site engagement metrics compared with the previous machine learning approach.
【Keywords】: machine learning; notification volume optimization
【Paper Link】 【Pages】:1021-1030
【Authors】: Jun Zhao ; Guang Qiu ; Ziyu Guan ; Wei Zhao ; Xiaofei He
【Abstract】: Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auctions, due to the randomness of user query behavior and the nature of the platform, usually adopt keyword-level bidding strategies. In contrast, display advertising (DA), a relatively simpler auction scenario, has taken advantage of real-time bidding (RTB) to boost performance for advertisers. In this paper, we consider the RTB problem in sponsored search auctions, named SS-RTB. SS-RTB has a much more complex dynamic environment, due to stochastic user query behavior and more complex bidding policies based on the multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling the complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary between two days. Motivated by the observation that auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at the hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide a bidding model for impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system in the e-commerce search auction platform of Alibaba. Empirical experiments of offline evaluation and online A/B tests demonstrate the effectiveness of our method.
【Keywords】: bidding optimization; reinforcement learning; sponsored search auction
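To make the control-by-model idea concrete, here is a toy tabular Q-learning sketch over hour-level states and a handful of candidate bidding models; the simulated environment, reward, and all hyperparameters are invented for illustration and do not reflect the paper's robust MDP formulation.

```python
# A toy sketch under assumed discretizations: states are hourly aggregates,
# actions index candidate bidding models, and plain Q-learning stands in for
# the paper's full RL machinery.
import numpy as np

n_states, n_actions = 24, 5          # 24 hourly aggregate states, 5 candidate bid models
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(1)

def simulate_hour(state, action):
    """Hypothetical environment: returns (reward, next_state)."""
    reward = rng.normal(loc=action * 0.1, scale=1.0)   # stand-in for realized ad value
    return reward, (state + 1) % n_states

state = 0
for step in range(10_000):
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    reward, nxt = simulate_hour(state, action)
    # Standard Q-learning target; the paper additionally corrects the reward/sampling.
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt
print(Q.argmax(axis=1))   # chosen bidding model per hour
```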
【Paper Link】 【Pages】:1031-1039
【Authors】: Kui Zhao ; Yuechuan Li ; Zhaoqian Shuai ; Cheng Yang
【Abstract】: Many machine intelligence techniques are developed in E-commerce, and one of the most essential components is the representation of IDs, including user ID, item ID, product ID, store ID, brand ID, category ID, etc. Classical encoding-based methods (like one-hot encoding) are inefficient in that they suffer from sparsity problems due to their high dimensionality, and they cannot reflect the relationships among IDs, either homogeneous or heterogeneous ones. In this paper, we propose an embedding-based framework to learn and transfer the representation of IDs. As implicit feedback from users, a tremendous amount of item ID sequences can be easily collected from interactive sessions. By jointly using these informative sequences and the structural connections among IDs, all types of IDs can be embedded into one low-dimensional semantic space. Subsequently, the learned representations are utilized and transferred in four scenarios: (i) measuring the similarity between items, (ii) transferring from seen items to unseen items, (iii) transferring across different domains, and (iv) transferring across different tasks. We deploy and evaluate the proposed approach in the Hema App and the results validate its effectiveness.
【Keywords】: e-commerce; ids embedding; neural networks; representation learning
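A minimal sketch of learning item-ID embeddings from user session sequences with skip-gram (assuming the gensim library is available); the sessions and hyperparameters are made up, and the paper's joint use of structural connections among heterogeneous ID types is not modeled here.

```python
# A minimal sketch: treat each session's item-ID sequence like a sentence and
# train skip-gram embeddings, in the spirit of (but far simpler than) the paper.
from gensim.models import Word2Vec

sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9", "item_2"],
    ["item_1", "item_2", "item_9"],
]

model = Word2Vec(sentences=sessions, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50, seed=0)

# Similarity between items as measured in the learned embedding space.
print(model.wv.similarity("item_3", "item_9"))
print(model.wv.most_similar("item_7", topn=2))
```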
【Paper Link】 【Pages】:1040-1048
【Authors】: Xiangyu Zhao ; Liang Zhang ; Zhuoye Ding ; Long Xia ; Jiliang Tang ; Dawei Yin
【Abstract】: Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items in a trial-and-error fashion and receiving reinforcements from users' feedback on these items. Users' feedback can be positive or negative, and both types of feedback have great potential to boost recommendations. However, the amount of negative feedback is much larger than that of positive feedback, so incorporating them simultaneously is challenging since the positive feedback could be buried by the negative. In this paper, we develop a novel approach to incorporate both into the proposed deep recommender system (DEERS) framework. Experimental results based on real-world e-commerce data demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to understand the importance of both positive and negative feedback in recommendations.
【Keywords】: deep reinforcement learning; pairwise deep Q-network; recommender system
【Paper Link】 【Pages】:1049-1058
【Authors】: Guineng Zheng ; Subhabrata Mukherjee ; Xin Luna Dong ; Feifei Li
【Abstract】: Extraction of missing attribute values aims to find values describing an attribute of interest in free text input. Most past work on this problem assumes a closed world, with the possible set of values known beforehand, or uses dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do this with limited human annotation or supervision? We study this problem in the context of product catalogs, which often have missing values for many attributes of interest. In this work, we leverage product profile information such as titles and descriptions to discover missing values of product attributes. We develop a novel deep tagging model, OpenTag, for this extraction problem with the following contributions: (1) we formalize the problem as a sequence tagging task, and propose a joint model exploiting recurrent neural networks (specifically, bidirectional LSTMs) to capture context and semantics, and Conditional Random Fields (CRF) to enforce tagging consistency; (2) we develop a novel attention mechanism to provide interpretable explanations of our model's decisions; (3) we propose a novel sampling strategy based on active learning to reduce the burden of human annotation. OpenTag does not use any dictionary or hand-crafted features as in prior work. Extensive experiments on real-life datasets in different domains show that OpenTag with our active learning strategy discovers new attribute values from as few as 150 annotated samples (a 3.3x reduction in annotation effort) with a high F-score of 83%, outperforming state-of-the-art models.
【Keywords】: active learning; attention mechanism; deep sequence tagging; imputation; neural networks; open extraction
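The backbone of an OpenTag-style tagger can be sketched as a BiLSTM emitting per-token tag scores; the CRF layer, attention mechanism, and active-learning loop described above are omitted, and all sizes below are assumptions.

```python
# A minimal BiLSTM sequence tagger in PyTorch (CRF and attention omitted);
# vocabulary size, tag set, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=50, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)    # scores for {B, I, O}-style tags

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                          # (batch, seq_len, n_tags)

model = BiLSTMTagger(vocab_size=1000, n_tags=3)
tokens = torch.randint(0, 1000, (2, 8))             # two toy product titles
tags = torch.randint(0, 3, (2, 8))                   # toy gold tags
loss = nn.CrossEntropyLoss()(model(tokens).reshape(-1, 3), tags.reshape(-1))
loss.backward()                                      # gradients for one training step
```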
【Paper Link】 【Pages】:1059-1068
【Authors】: Guorui Zhou ; Xiaoqiang Zhu ; Chengru Song ; Ying Fan ; Han Zhu ; Xiao Ma ; Yanghui Yan ; Junqi Jin ; Han Li ; Kun Gai
【Abstract】: Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently, deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods, large-scale sparse input features are first mapped into low-dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated together and fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, regardless of what the candidate ads are. The use of a fixed-length vector becomes a bottleneck, making it difficult for Embedding&MLP methods to capture users' diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model, Deep Interest Network (DIN), which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, greatly improving the expressive ability of the model. Besides, we develop two techniques, mini-batch aware regularization and a data adaptive activation function, which help in training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN has now been successfully deployed in the online display advertising system at Alibaba, serving the main traffic.
【Keywords】: click-through rate prediction; display advertising; e-commerce
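A rough PyTorch sketch of a DIN-style local activation unit: behavior embeddings are pooled with weights that depend on the candidate ad, so the resulting user representation varies per ad. The dimensions and the small scoring MLP are illustrative assumptions, not the production architecture.

```python
# A rough sketch of ad-dependent attention pooling over user behaviors;
# all dimensions and the scoring MLP are assumptions for illustration.
import torch
import torch.nn as nn

class LocalActivationPooling(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(3 * dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, behaviors, ad):
        # behaviors: (batch, seq_len, dim); ad: (batch, dim)
        ad_exp = ad.unsqueeze(1).expand_as(behaviors)
        feats = torch.cat([behaviors, ad_exp, behaviors * ad_exp], dim=-1)
        w = self.score(feats)                       # (batch, seq_len, 1), ad-dependent weights
        return (w * behaviors).sum(dim=1)           # user interest vector w.r.t. this ad

pool = LocalActivationPooling()
user_vec = pool(torch.randn(4, 10, 16), torch.randn(4, 16))
print(user_vec.shape)                               # torch.Size([4, 16])
```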
【Paper Link】 【Pages】:1069-1078
【Authors】: Xiao Zhou ; Anastasios Noulas ; Cecilia Mascolo ; Zhongxiang Zhao
【Abstract】: Cultural activity is an inherent aspect of urban life and the success of a modern city is largely determined by its capacity to offer generous cultural entertainment to its citizens. To this end, the optimal allocation of cultural establishments and related resources across urban regions becomes of vital importance, as it can reduce financial costs in terms of planning and, more generally, improve quality of life in the city. In this paper, we make use of a large longitudinal dataset of user location check-ins from the online social network WeChat to develop a data-driven framework for cultural planning in the city of Beijing. We exploit rich spatio-temporal representations of user activity at cultural venues and use a novel extended version of the traditional latent Dirichlet allocation model that incorporates temporal information to identify latent patterns of urban cultural interactions. Using the characteristic typologies of mobile user cultural activities emitted by the model, we determine the levels of demand for different types of cultural resources across urban areas. We then compare those with the corresponding levels of supply as driven by the presence and spatial reach of cultural venues in local areas to obtain high-resolution maps that indicate urban regions with a lack of cultural resources, and thus give suggestions for further urban cultural planning and investment optimisation.
【Keywords】: pattern mining; spatial accessibility; spatio-temporal analysis; topic modeling; urban computing
【Paper Link】 【Pages】:1079-1088
【Authors】: Han Zhu ; Xiang Li ; Pengye Zhang ; Guozheng Li ; Jie He ; Han Li ; Kun Gai
【Abstract】: Model-based methods for recommender systems have been studied extensively in recent years. In systems with a large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full-corpus retrieval extremely difficult. To overcome the calculation barriers, models such as matrix factorization resort to the inner product form (i.e., modeling user-item preference as the inner product of user and item latent factors) and indexes to facilitate efficient approximate k-nearest neighbor search. However, it remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the calculation cost. In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with large corpus. We propose a novel tree-based method which can provide logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution and hence facilitate both training and prediction. Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results on the Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.
【Keywords】: implicit feedback; recommender systems; tree-based learning
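The coarse-to-fine retrieval idea can be illustrated with a toy beam search over a complete binary tree: at each level only the children of the current top-k nodes are scored, so the number of model evaluations grows logarithmically with the corpus size. The random scoring function below is a stand-in for the learned user-node preference model.

```python
# A toy beam search over a complete binary tree; score() is a hypothetical
# stand-in for the paper's learned deep user-node model.
import heapq
import numpy as np

rng = np.random.default_rng(0)
depth, beam = 10, 8                       # tree with 2**depth leaves (items)

def score(user, node):
    """Hypothetical user-node preference model."""
    return float(rng.random())

def retrieve(user, k=beam):
    frontier = [1]                        # root, in 1-based heap numbering
    for _ in range(depth):
        children = [c for n in frontier for c in (2 * n, 2 * n + 1)]
        frontier = heapq.nlargest(k, children, key=lambda n: score(user, n))
    return frontier                       # top-k leaves, i.e. retrieved items

print(retrieve(user="u42"))               # only O(depth * beam) nodes were scored
```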
【Paper Link】 【Pages】:1089-1098
【Authors】: Rediet Abebe ; Jon M. Kleinberg ; David C. Parkes ; Charalampos E. Tsourakakis
【Abstract】: A long line of work in social psychology has studied variations in people's susceptibility to persuasion -- the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation on social networks: in addition to considering interventions that directly modify people's intrinsic opinions, it is also natural to consider those that modify people's susceptibility to persuasion. Here, we adopt a popular model for social opinion dynamics, and formalize the opinion maximization and minimization problems where interventions happen at the level of susceptibility. We show that modeling interventions at the level of susceptibility leads to an interesting family of new questions in network opinion dynamics. We find that the questions are quite different depending on whether or not there is an overall budget constraining the number of agents we can target. We give a polynomial-time algorithm for finding the optimal target-set to optimize the sum of opinions when there are no budget constraints on the size of the target-set. We show that this problem is NP-hard when there is a budget, and that the objective function is neither submodular nor supermodular. Finally, we propose a heuristic for the budgeted opinion optimization problem and show its efficacy at finding target-sets that optimize the sum of opinions on real-world networks, including a Twitter network with real opinion estimates.
【Keywords】: influence models; opinion dynamics; social networks
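For context, a minimal numpy sketch of a Friedkin-Johnsen-style model in which a susceptibility parameter alpha_i controls how far agent i moves toward its neighbors' opinions; the particular closed-form equilibrium used below and the tiny graph are illustrative assumptions, and the paper's contribution is choosing which alpha_i to intervene on.

```python
# A sketch of one common formulation:
#   z_i = (1 - alpha_i) * s_i + alpha_i * (average of neighbors' z),
# whose equilibrium solves (I - diag(alpha) W) z = (1 - alpha) * s.
# Graph, intrinsic opinions s, and susceptibilities alpha are made up.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)        # row-stochastic neighbor averaging
s = np.array([0.9, 0.8, 0.2, 0.1])          # intrinsic opinions
alpha = np.array([0.5, 0.5, 0.9, 0.1])      # susceptibility to persuasion

z = np.linalg.solve(np.eye(4) - np.diag(alpha) @ W, (1 - alpha) * s)
print(z, z.sum())                            # expressed opinions and their sum
```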
【Paper Link】 【Pages】:1099-1108
【Authors】: Ayan Acharya ; Joydeep Ghosh ; Mingyuan Zhou
【Abstract】: The abundance of digital text has led to extensive research on topic models that reason about documents using latent representations. Since for many online or streaming textual sources such as news outlets, the number and nature of topics change over time, there have been several efforts to address such situations using dynamic versions of topic models. Unfortunately, existing approaches require more complex inference when their model parameters are varied over time, resulting in high computational complexity and performance degradation. This paper introduces the DM-DTM, a dual Markov chain dynamic topic model, for characterizing a corpus that evolves over time. This model uses a gamma Markov chain and a Dirichlet Markov chain to allow the topic popularities and word-topic assignments, respectively, to vary smoothly over time. Novel applications of the Negative-Binomial augmentation trick result in simple, efficient, closed-form updates of all the required conditional posteriors, resulting in far lower computational requirements as well as less sensitivity to initial conditions, as compared to existing approaches. Moreover, via a gamma process prior, the number of desired topics is inferred directly from the data rather than being pre-specified, and can vary as the data changes. Empirical comparisons using multiple real-world corpora demonstrate a clear superiority of DM-DTM over strong baselines for both static and dynamic topic models.
【Keywords】: CRT augmentation; dynamic topic model; gibbs sampling
【Paper Link】 【Pages】:1109-1118
【Authors】: Aris Anagnostopoulos ; Carlos Castillo ; Adriano Fazzone ; Stefano Leonardi ; Evimaria Terzi
【Abstract】: Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of "in-sourcing" work continue to be the dominant form of employment in most companies. This means that, at least for the time being, freelancing and salaried employment will continue to co-exist. In this paper, we provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. We call this model team formation with outsourcing. In our model, tasks arrive in an online fashion: neither the number nor the composition of the tasks is known a priori. At any point in time, there is a team of hired workers who receive a fixed salary independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired, at some cost. Additionally, some parts of the arriving tasks can be outsourced and thus completed by non-team members, at a premium. Our contribution is an efficient online cost-minimizing algorithm for hiring and firing team members and outsourcing tasks. We present theoretical bounds obtained using a primal-dual scheme, proving that our algorithms have a logarithmic competitive approximation ratio. We complement these results with experiments using semi-synthetic datasets based on actual task requirements and worker skills from three large online labor marketplaces.
【Keywords】: online primal-dual; team formation
【Paper Link】 【Pages】:1119-1127
【Authors】: Olivier Bachem ; Mario Lucic ; Andreas Krause
【Abstract】: Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-means clustering as well as soft and hard Bregman clustering. The algorithm is substantially faster than existing constructions, embarrassingly parallel, and the resulting coresets are smaller. We further show that the proposed approach naturally generalizes to statistical k-means clustering and that, compared to existing results, it can be used to compute smaller summaries for empirical risk minimization. In extensive experiments, we demonstrate that the proposed algorithm outperforms existing data summarization strategies in practice.
【Keywords】: big data; clustering; coresets; distributed algorithms
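A short numpy sketch of the lightweight-coreset sampling distribution as suggested by the abstract: half uniform, half proportional to squared distance from the data mean, with inverse-probability weights attached to the sampled points. The data and sizes are made up.

```python
# A minimal sketch of lightweight-coreset sampling, under the assumption that
# the distribution is q(x) = 1/(2n) + d(x, mean)^2 / (2 * sum of all d^2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))             # full data set
m = 200                                       # coreset size

d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
q = 0.5 / len(X) + 0.5 * d2 / d2.sum()        # sampling probabilities (sum to 1)
idx = rng.choice(len(X), size=m, replace=True, p=q)
coreset, weights = X[idx], 1.0 / (m * q[idx])  # weighted summary for k-means etc.
print(coreset.shape, weights.sum())           # weights roughly sum to n
```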
【Paper Link】 【Pages】:1128-1137
【Authors】: Zilong Bai ; Buyue Qian ; Ian Davidson
【Abstract】: Block models of graphs are used in a wide variety of domains as they find not only clusters (the blocks) but also interactions within and between the blocks. However, existing approaches primarily focus on either structural graphs (i.e., graphs from MRI scans) or behavioral graphs (i.e., graphs from fMRI scans). In both cases the block model's interaction or mixing matrix can be useful for understanding potential interaction (for structural graphs) and actual interaction (for behavioral graphs) between the blocks. In this paper we explore finding block models where there is both a structural network and multiple behavioral graphs. This poses significant modeling challenges: consider, for example, the case where there is strong behavioral connectivity but no structural connectivity between two nodes. We show why existing multi-graph settings such as multi-view learning are insufficient and instead propose a novel model to address the problem. Our method not only learns structurally and behaviorally cohesive blocks of nodes but also finds structurally and behaviorally feasible block interactions. We show in numerical evaluations on synthetic data that our method outperforms baseline approaches in recovering the ground-truth factor matrices in increasingly complex situations. We further apply our method to real-world datasets from two different domains: (1) brain imaging data (a multi-cohort fMRI study) and, to show its versatility, (2) Twitter (following network and retweet behavior), and gain insights into the information flow and underlying generating mechanisms of these complex data.
【Keywords】: block modeling; brain imaging data; model discovery; structural-behavioral networks
【Paper Link】 【Pages】:1138-1147
【Authors】: MohammadHossein Bateni ; Hossein Esfandiari ; Vahab S. Mirrokni
【Abstract】: We present distributed algorithms for several classes of submodular optimization problems such as k-cover, set cover, facility location, and probabilistic coverage. The new algorithms enjoy almost optimal space complexity, optimal approximation guarantees, and optimal communication complexity (and run in only four rounds of computation), addressing major shortcomings of prior work. We first present a distributed algorithm for k-cover using only Õ(n) space per machine, and then extend it to several submodular optimization problems, improving previous results for all the above problems; e.g., our algorithm for the facility location problem improves the space of the best-known algorithm (Lindgren et al.). Our algorithms are implementable in various distributed frameworks such as MapReduce and RAM models. On the hardness side, we demonstrate the limitations of uniform sampling via an information-theoretic argument. Furthermore, we perform an extensive empirical study of our algorithms (implemented in MapReduce) on a variety of datasets. We observe that using sketches 30-600 times smaller than the input, one can solve the coverage maximization problem with quality very close to that of the state-of-the-art single-machine algorithm. Finally, we demonstrate an application of our algorithm in large-scale feature selection.
【Keywords】: distributed algorithm; dominating set; set cover; submodular maximization
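As a baseline for intuition (not the distributed, sketch-based algorithm above), the classic single-machine greedy rule for max-k-cover picks, at each step, the set covering the most still-uncovered elements; the tiny instance below is invented.

```python
# Plain greedy max-k-cover on a toy instance; this single-machine baseline is
# what distributed coverage algorithms aim to match at scale.
sets = {
    "s1": {1, 2, 3, 4},
    "s2": {3, 4, 5},
    "s3": {5, 6, 7, 8},
    "s4": {1, 8},
}
k = 2
covered, chosen = set(), []
for _ in range(k):
    best = max(sets, key=lambda s: len(sets[s] - covered))  # largest marginal gain
    chosen.append(best)
    covered |= sets[best]
print(chosen, covered)   # greedy achieves the (1 - 1/e) guarantee for this objective
```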
【Paper Link】 【Pages】:1148-1157
【Authors】: Austin R. Benson ; Ravi Kumar ; Andrew Tomkins
【Abstract】: Sequential behavior such as sending emails, gathering in groups, tagging posts, or authoring academic papers may be characterized by a set of recipients, attendees, tags, or coauthors respectively. Such "sequences of sets" show complex repetition behavior, sometimes repeating prior sets wholesale, and sometimes creating new sets from partial copies or partial merges of earlier sets. In this paper, we provide a stochastic model to capture these patterns. The model has two classes of parameters. First, a correlation parameter determines how much of an earlier set will contribute to a future set. Second, a vector of recency parameters captures the fact that a set in a sequence is more similar to recent sets than to more distant ones. Comparing against a strong baseline, we find that modeling both the correlation and recency structures is required for high accuracy. We also find that both parameter classes vary widely across domains, so they must be optimized on a per-dataset basis. We present the model in detail, provide a theoretical examination of its asymptotic behavior, and perform a set of detailed experiments on its predictive performance.
【Keywords】: repeat consumption; sequences; sets
【Paper Link】 【Pages】:1158-1166
【Authors】: Lei Cai ; Zhengyang Wang ; Hongyang Gao ; Dinggang Shen ; Shuiwang Ji
【Abstract】: Multi-modality data are widely used in clinical applications, such as tumor detection and brain disease diagnosis. Different modalities can usually provide complementary information, which commonly leads to improved performance. However, some modalities are commonly missing for some subjects due to various technical and practical reasons. As a result, multi-modality data are usually incomplete, raising the multi-modality missing data completion problem. In this work, we formulate the problem as a conditional image generation task and propose an encoder-decoder deep neural network to tackle this problem. Specifically, the model takes the existing modality as input and generates the missing modality. By employing an auxiliary adversarial loss, our model is able to generate high-quality missing modality images. At the same time, we propose to incorporate the available category information of subjects in training to enable the model to generate more informative images. We evaluate our method on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, where positron emission tomography (PET) modalities are missing. Experimental results show that the trained network can generate high-quality PET modalities based on existing magnetic resonance imaging (MRI) modalities, and provide complementary information to improve the detection and tracking of Alzheimer's disease. Our results also show that the proposed methods generate higher quality images than baseline methods as measured by various image quality statistics.
【Keywords】: adversarial loss function; deep learning; disease diagnosis; missing data completion
【Paper Link】 【Pages】:1167-1176
【Authors】: Chen Chen ; Ruiyue Peng ; Lei Ying ; Hanghang Tong
【Abstract】: Network connectivity optimization, which aims to manipulate network connectivity by changing its underlying topology, is a fundamental task behind a wealth of high-impact data mining applications, ranging from immunization, critical infrastructure construction, social collaboration mining, bioinformatics analysis, to intelligent transportation system design. To tackle its exponential computation complexity, greedy algorithms have been extensively used for network connectivity optimization by exploiting its diminishing returns property. Despite the empirical success, two key challenges largely remain open. First, on the theoretical side, the hardness as well as the approximability of the general network connectivity optimization problem remain largely open except for a few special instances. Second, on the algorithmic side, current algorithms often struggle to balance optimization quality and computational efficiency. In this paper, we systematically address these two challenges for the network connectivity optimization problem. First, we reveal some fundamental limits by proving that, for a wide range of network connectivity optimization problems, (1) they are NP-hard and (2) (1-1/e) is the optimal approximation ratio for any polynomial-time algorithm. Second, we propose an effective, scalable and general algorithm (CONTAIN) to carefully balance the optimization quality and the computational efficiency.
【Keywords】: graph mining; network connectivity
【Paper Link】 【Pages】:1177-1186
【Authors】: Hongxu Chen ; Hongzhi Yin ; Weiqing Wang ; Hao Wang ; Quoc Viet Hung Nguyen ; Xue Li
【Abstract】: Heterogeneous information network embedding aims to embed heterogeneous information networks (HINs) into low-dimensional spaces, in which each vertex is represented as a low-dimensional vector and both the global and local network structures of the original space are preserved. However, most existing heterogeneous information network embedding models adopt the dot product to measure proximity in the low-dimensional space, and thus they can only preserve first-order proximity and are insufficient to capture the global structure. Compared with homogeneous information networks, there are multiple types of links (i.e., multiple relations) in HINs, and the link distribution w.r.t. relations is highly skewed. To address these challenging issues, we propose a novel heterogeneous information network embedding model, PME, based on metric learning to capture both first-order and second-order proximities in a unified way. To alleviate the potential geometrical inflexibility of existing metric learning approaches, we propose to build object and relation embeddings in separate object and relation spaces rather than in a common space. Afterwards, we learn embeddings by first projecting vertices from the object space to the corresponding relation space and then calculating the proximity between the projected vertices. To overcome the heavy skewness of the link distribution w.r.t. relations and avoid "over-sampling" or "under-sampling" for each relation, we propose a novel loss-aware adaptive sampling approach for model optimization. Extensive experiments have been conducted on a large-scale HIN dataset, and the experimental results show the superiority of our proposed PME model in terms of prediction accuracy and scalability.
【Keywords】: heterogenous network embedding; link prediction
【Paper Link】 【Pages】:1187-1196
【Authors】: Shi-Yong Chen ; Yang Yu ; Qing Da ; Jun Tan ; Hai-Kuan Huang ; Hai-Hong Tang
【Abstract】: Deep reinforcement learning has shown great potential in improving system performance autonomously, by learning from interactions with the environment. However, traditional reinforcement learning approaches are designed to work in static environments. In many real-world problems, the environments are commonly dynamic, in which case the performance of reinforcement learning approaches can degrade drastically. A direct cause of the performance degradation is the high-variance and biased estimation of the reward, due to the distribution shift in dynamic environments. In this paper, we propose two techniques to alleviate the unstable reward estimation problem in dynamic environments, the stratified sampling replay strategy and the approximate regretted reward, which address the problem from the sample aspect and the reward aspect, respectively. Integrating the two techniques with Double DQN, we propose the Robust DQN method. We apply Robust DQN to the tip recommendation system on the Taobao online retail trading platform. We first describe the highly dynamic nature of this recommendation application, and then carry out an online A/B test to examine Robust DQN. The results show that Robust DQN can effectively stabilize the value estimation and, therefore, improve performance in this real-world dynamic environment.
【Keywords】: approximate regretted reward; dynamic environment; recommendation; reinforcement learning; stratified sampling replay
【Paper Link】 【Pages】:1197-1205
【Authors】: Xi Chen ; Jefrey Lijffijt ; Tijl De Bie
【Abstract】: Controversy, disagreement, conflict, polarization and opinion divergence in social networks have been the subject of much recent research. In particular, researchers have addressed the question of how such concepts can be quantified given people's prior opinions, and how they can be optimized by influencing the opinion of a small number of people or by editing the network's connectivity. Here, rather than optimizing such concepts given a specific set of prior opinions, we study whether they can be optimized in the average case and in the worst case over all sets of prior opinions. In particular, we derive the worst-case and average-case conflict risk of networks, and we propose algorithms for optimizing these. For some measures of conflict, these are non-convex optimization problems with many local minima. We provide a theoretical and empirical analysis of the nature of some of these local minima, and show how they are related to existing organizational structures. Empirical results show how a small number of edits quickly decreases a network's conflict risk, both average-case and worst-case. Furthermore, they show that minimizing average-case conflict risk often does not reduce worst-case conflict risk. Minimizing worst-case conflict risk, on the other hand, while computationally more challenging, is generally effective at minimizing both worst-case and average-case conflict risk.
【Keywords】: conflict; controversy; disagreement measures; social networks
【Paper Link】 【Pages】:1206-1215
【Authors】: Xiaojun Chen ; Weijun Hong ; Feiping Nie ; Dan He ; Min Yang ; Joshua Zhexue Huang
【Abstract】: During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexity hinders their application to large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution obtained by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has quadratic time complexity, a significant reduction compared with the cubic time complexity of traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexity is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high-quality anchors that best represent the whole data set, we propose Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments were conducted on both synthetic and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.
【Keywords】: clustering; large-scale data; normalized cut
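A rough sketch of the anchor idea behind FNC: choose anchors (with plain k-means here rather than the paper's Balanced k-means) and represent every point by its similarities to the anchors, so that downstream clustering operates on an n-by-m matrix instead of an n-by-n one. The bandwidth and sizes are assumptions.

```python
# A sketch of anchor-based similarity: anchors via ordinary k-means, then an
# n-by-m representative similarity matrix; bandwidth and sizes are assumed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 10))
m = 50                                              # number of anchors

anchors = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_
d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
Z = np.exp(-d2 / d2.mean())                         # representative similarity matrix
print(Z.shape)                                      # (5000, 50): linear in n, not quadratic
```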
【Paper Link】 【Pages】:1216-1225
【Authors】: Yihong Chen ; Bei Chen ; Xuguang Duan ; Jian-Guang Lou ; Yue Wang ; Wenwu Zhu ; Yong Cao
【Abstract】: Almost all knowledge-empowered applications rely upon accurate knowledge, which has to be either collected manually at high cost, or extracted automatically with non-negligible errors. In this paper, we study 20 Questions, an online interactive game where each question-response pair corresponds to a fact about the target entity, to acquire highly accurate knowledge effectively with nearly zero labor cost. Knowledge acquisition via 20 Questions presents two main challenges to the intelligent agent playing games with human players. The first is to seek enough information and identify the target entity with as few questions as possible, while the second is to leverage the remaining questioning opportunities to acquire valuable knowledge effectively, both of which depend on good questioning strategies. To address these challenges, we propose the Learning-to-Ask (LA) framework, within which the agent learns smart questioning strategies for information seeking and knowledge acquisition by means of deep reinforcement learning and generalized matrix factorization, respectively. In addition, a Bayesian approach to representing knowledge is adopted to ensure robustness to noisy user responses. Simulation experiments on real data show that LA is able to equip the agent with effective questioning strategies, which result in high winning rates and rapid knowledge acquisition. Moreover, the questioning strategies for information seeking and knowledge acquisition boost the performance of each other, allowing the agent to start with a relatively small knowledge set and quickly improve its knowledge base in the absence of constant human supervision.
【Keywords】: 20 questions; generalized matrix factorization; information seeking; knowledge acquisition; reinforcement learning
【Paper Link】 【Pages】:1226-1234
【Authors】: Yongjun Chen ; Hongyang Gao ; Lei Cai ; Min Shi ; Dinggang Shen ; Shuiwang Ji
【Abstract】: Deep learning methods have shown great success in pixel-wise prediction tasks. One of the most popular methods employs an encoder-decoder network in which deconvolutional layers are used for up-sampling feature maps. However, a key limitation of the deconvolutional layer is that it suffers from the checkerboard artifact problem, which harms the prediction accuracy. This is caused by the independency among adjacent pixels on the output feature maps. Previous work only addressed the checkerboard artifact issue of deconvolutional layers in 2D space. Since the number of intermediate feature maps needed to generate a deconvolutional layer grows exponentially with dimensionality, it is more challenging to solve this issue in higher dimensions. In this work, we propose the voxel deconvolutional layer (VoxelDCL) to solve the checkerboard artifact problem of deconvolutional layers in 3D space. We also provide an efficient approach to implement VoxelDCL. To demonstrate the effectiveness of VoxelDCL, we build four variations of voxel deconvolutional networks (VoxelDCN) based on the U-Net architecture with VoxelDCL. We apply our networks to volumetric brain image labeling tasks using the ADNI and LONI LPBA40 datasets. The experimental results show that the proposed iVoxelDCNa achieves improved performance in all experiments. It reaches a dice ratio of 83.34% on the ADNI dataset and 79.12% on the LONI LPBA40 dataset, improvements of 1.39% and 2.21%, respectively, over the baseline. In addition, all the variations of VoxelDCN we proposed outperform the baseline methods on the above datasets, which demonstrates the effectiveness of our methods.
【Keywords】: deep learning; volumetric brain image labeling; voxel deconvolutional layer; voxel deconvolutional networks
【Paper Link】 【Pages】:1235-1243
【Authors】: Evangelia Christakopoulou ; George Karypis
【Abstract】: Users' behaviors are driven by their preferences across various aspects of items they are potentially interested in purchasing, viewing, etc. Latent space approaches model these aspects in the form of latent factors. Although such approaches have been shown to lead to good results, the aspects that are important to different users can vary. In many domains, there may be a set of aspects that all users care about and a set of aspects that are specific to different subsets of users. To explicitly capture this, we consider models in which there are some latent factors that capture the shared aspects and some user-subset-specific latent factors that capture the aspects that the different subsets of users care about. In particular, we propose two latent space models, rGLSVD and sGLSVD, that combine such global and user-subset-specific sets of latent factors. The rGLSVD model assigns the users to different subsets based on their rating patterns and then estimates a global model and a set of user-subset-specific local models whose numbers of latent dimensions can vary. The sGLSVD model estimates both global and user-subset-specific local models by keeping the number of latent dimensions the same among these models but optimizes the grouping of the users in order to achieve the best approximation. Our experiments on various real-world datasets show that the proposed approaches significantly outperform state-of-the-art latent space top-N recommendation approaches.
【Keywords】: clustering; collaborative filtering; latent space models; local models
【Paper Link】 【Pages】:1244-1253
【Authors】: Lingyang Chu ; Xia Hu ; Juhua Hu ; Lanjun Wang ; Jian Pei
【Abstract】: Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medicine. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed-form solution named OpenBox to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply OpenBox to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. Extensive experiments on both synthetic and real-world data sets clearly demonstrate the exactness and consistency of our interpretation.
【Keywords】: closed form; deep neural network; exact and consistent interpretation
【Paper Link】 【Pages】:1254-1262
【Authors】: Adam D. Cobb ; Richard Everett ; Andrew Markham ; Stephen J. Roberts
【Abstract】: In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from only agent trajectories without requiring knowledge of the environment, and also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.
【Keywords】: animal tracking; gaussian processes; gps data; multi-agent systems; potential fields
【Paper Link】 【Pages】:1263-1271
【Authors】: David Cohen-Steiner ; Weihao Kong ; Christian Sohler ; Gregory Valiant
【Abstract】: The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$ of $G$ in the regime where the graph is too large to explicitly calculate the spectrum. We present a sublinear time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation $\widetilde{\lambda} = (\widetilde{\lambda}_1,\dots,\widetilde{\lambda}_{|V|})$ such that $\|\widetilde{\lambda} - \lambda\|_1 \le \varepsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\varepsilon))$, which is independent of the size of the graph, $|V|$. We demonstrate the practical viability of our algorithm on synthetically generated graphs, and on 15 different real-world graphs from the Stanford Large Network Dataset Collection, including social networks, academic collaboration graphs, and road networks. For the smallest of these graphs, we are able to validate the accuracy of our algorithm by explicitly calculating the true spectrum; for the larger graphs, such a calculation is computationally prohibitive. The spectra of these real-world networks reveal insights into the structural similarities and differences between them, illustrating the potential value of our algorithm for efficiently approximating the spectrum of large networks.
【Keywords】: method of moments; random walks; spectral graph theory; sublinear algorithms
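For reference, the quantity being approximated can be computed directly (at cubic cost) on a small graph: the eigenvalues of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}. This brute-force computation is exactly what the sublinear algorithm above avoids on large graphs; the random graph below is an illustrative assumption.

```python
# Direct (non-sublinear) computation of the normalized Laplacian spectrum of a
# small random graph, for comparison against an approximate spectrum.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric, no self-loops
deg = A.sum(axis=1); deg[deg == 0] = 1.0            # guard against isolated nodes
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
eigvals = np.linalg.eigvalsh(L)                     # spectrum lies in [0, 2]
print(eigvals.min(), eigvals.max())
```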
【Paper Link】 【Pages】:1272-1281
【Authors】: Alessio Conte ; Tiziano De Matteis ; Daniele De Sensi ; Roberto Grossi ; Andrea Marino ; Luca Versari
【Abstract】: This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.
【Keywords】: community discovery; graph enumeration; k-plexes; parallel programming
【Paper Link】 【Pages】:1282-1291
【Authors】: Alessio Conte ; Gaspare Ferraro ; Roberto Grossi ; Andrea Marino ; Kunihiko Sadakane ; Takeaki Uno
【Abstract】: We study node similarity in labeled networks, using the label sequences found in paths of bounded length q leading to the nodes. (This recalls the q-grams employed in document resemblance, based on the Jaccard distance.) When applied to networks, the challenge is two-fold: the number of q-grams generated from labeled paths grows exponentially with q, and their frequency should be taken into account: this leads to a variation of the Jaccard index known as Bray-Curtis index for multisets. We describe nSimGram, a suite of fast algorithms for node similarity with q-grams, based on a novel blend of color coding, probabilistic counting, sketches, and string algorithms, where the universe of elements to sample is exponential. We provide experimental evidence that our measure is effective and our running times scale to deal with large real-world networks.
【Keywords】: approximation; color coding; graph algorithms; node similarity
【Paper Link】 【Pages】:1292-1300
【Authors】: Yogesh Dahiya ; Dimitris Konomis ; David P. Woodruff
【Abstract】: Over the last ten years, tremendous speedups for problems in randomized numerical linear algebra such as low rank approximation and regression have been made possible via the technique of randomized data dimensionality reduction, also known as sketching. In theory, such algorithms have led to optimal input sparsity time algorithms for a wide array of problems. While several scattered implementations of such methods exist, the goal of this work is to provide a comprehensive comparison of such methods to alternative approaches. We investigate least squares regression, iteratively reweighted least squares, logistic regression, robust regression with Huber and Bisquare loss functions, leverage score computation, Frobenius norm low rank approximation, and entrywise $\ell_1$-low rank approximation. We give various implementation techniques to speed up several of these algorithms, and the resulting implementations demonstrate the tradeoffs of such techniques in practice.
【Keywords】: logistic regression; low rank approximation; regression; robust methods; sketching
【Paper Link】 【Pages】:1301-1309
【Authors】: Shimin Di ; Jingshu Peng ; Yanyan Shen ; Lei Chen
【Abstract】: Transfer learning has gained increasing attention due to the inferior performance of machine learning algorithms with insufficient training data. Most previous homogeneous or heterogeneous transfer learning works aim to learn a mapping function between feature spaces based on the inherent correspondence across the source and target domains or on labeled instances. However, in many real-world applications, existing methods may not be robust when the correspondence across domains is noisy or labeled instances are not representative. In this paper, we develop a novel transfer learning framework called Transfer Learning via Feature Isomorphism Discovery (abbreviated as TLFid), which has high tolerance for noisy correspondence between domains as well as scarce or non-existent labeled instances. More specifically, we propose a feature isomorphism approach to discovering common substructures across feature spaces and learning a feature mapping function from the target domain to the source domain. We evaluate the performance of TLFid on cross-lingual sentiment classification tasks. The results show that our method achieves significant improvement in terms of accuracy compared with state-of-the-art methods.
【Keywords】: cross-lingual; subgraph isomorphism; transfer learning
【Paper Link】 【Pages】:1310-1319
【Authors】: Yi Ding ; Weiqing Liu ; Jiang Bian ; Daoqiang Zhang ; Tie-Yan Liu
【Abstract】: Stock trading is a popular investment approach in the real world. However, since they lack sufficient domain knowledge and experience, it is very difficult for common investors to analyze thousands of stocks manually. Algorithmic investment provides another rational way to formulate human knowledge as a trading agent. However, it still requires well-built knowledge and experience to design effective trading algorithms in such a volatile market. Fortunately, various kinds of historical trading records are easy to obtain in this big-data era, so it is invaluable to extract the trading knowledge hidden in the data to help people make better decisions. In this paper, we propose a reinforcement learning driven Investor-Imitator framework to formalize trading knowledge by imitating an investor's behavior with a set of logic descriptors. In particular, to instantiate specific logic descriptors, we introduce the Rank-Invest model, which can keep the diversity of logic descriptors by learning to optimize different evaluation metrics. In the experiment, we first simulate three types of investors, representing different degrees of information disclosure we may meet in the real market. By learning towards these investors, we can tell the inherent trading logic of the target investor with the Investor-Imitator empirically, and the extracted interpretable knowledge can help us better understand and construct trading portfolios. Experimental results in this paper sufficiently demonstrate the designed purpose of Investor-Imitator, making it an applicable and meaningful intelligent trading framework for financial investment research.
【Keywords】: data mining; finance; financial investment; reinforcement learning
【Paper Link】 【Pages】:1320-1329
【Authors】: Claire Donnat ; Marinka Zitnik ; David Hallac ; Jure Leskovec
【Abstract】: Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave's real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%.
【Keywords】: graph signal processing; graphs; node embeddings; representation learning; structural roles; structural similarity; unsupervised learning
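A compact sketch of the GraphWave recipe as described in the abstract: diffuse heat from every node, then summarize each node's wavelet coefficients with a few samples of their empirical characteristic function. The diffusion scale, evaluation points, and random graph are assumptions.

```python
# A sketch of heat-wavelet structural embeddings: heat kernel columns are
# summarized via the empirical characteristic function at a few sample points.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 60
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A                       # combinatorial Laplacian

t = 2.0                                              # assumed diffusion scale
Psi = expm(-t * L)                                   # heat kernel; column j = wavelet at node j
ts = np.linspace(0.5, 10.0, 8)                       # evaluation points of the char. function
phi = np.exp(1j * Psi[:, :, None] * ts)              # e^{i s * coefficient}
emb = phi.mean(axis=0)                               # characteristic function per node
features = np.concatenate([emb.real, emb.imag], axis=1)   # (n, 16) structural embedding
print(features.shape)
```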
【Paper Link】 【Pages】:1330-1338
【Authors】: Bowen Du ; Yongxin Tong ; Zimu Zhou ; Qian Tao ; Wenjun Zhou
【Abstract】: Cars of the future have been predicted to be shared and electric. There has been a rapid growth in electric vehicle (EV) sharing services worldwide in recent years. For EV-sharing platforms to excel, it is essential for them to offer private charging infrastructure for exclusive use that meets the charging demand of their clients. Particularly, they need to plan not only the places to build charging stations, but also the number of chargers per station, to maximally satisfy the requirements on global charging coverage and local charging demand. Existing research efforts are either inapplicable due to their different problem formulations or operate at a coarse granularity. In this paper, we formulate the Electric Vehicle Charger Planning (EVCP) problem especially for EV-sharing. We prove that the EVCP problem is NP-hard, and design an approximation algorithm to solve the problem with a theoretical bound of $1-\frac{1}{e}$. We also devise some optimization techniques to speed up the solution. Extensive experiments on real-world datasets validate the effectiveness and the efficiency of our proposed solutions.
【Keywords】: electric vehicles; location selection; submodularity
【Paper Link】 【Pages】:1339-1347
【Authors】: Boxin Du ; Hanghang Tong
【Abstract】: The Sylvester equation offers a powerful and unifying primitive for a variety of important graph mining tasks, including network alignment, graph kernels, node similarity, subgraph matching, etc. A major bottleneck of the Sylvester equation lies in its high computational complexity. Despite tremendous effort, state-of-the-art methods still require a complexity that is at least quadratic in the number of nodes of the graphs, even with approximations. In this paper, we propose a family of Krylov subspace based algorithms (FASTEN) to speed up and scale up the computation of the Sylvester equation for graph mining. The key idea of the proposed methods is to project the original equivalent linear system onto a Kronecker Krylov subspace. We further exploit (1) the implicit representation of the solution matrix as well as the associated computation, and (2) the decomposition of the original Sylvester equation into a set of inter-correlated Sylvester equations of smaller size. The proposed algorithms bear two distinctive features. First, they provide the exact solutions without any approximation error. Second, they significantly reduce the time and space complexity for solving the Sylvester equation, with two of the proposed algorithms having linear complexity in both time and space. Experimental evaluations on a diverse set of real networks demonstrate that our methods (1) are up to 10,000x faster than the Conjugate Gradient method, the best known competitor that outputs the exact solution, and (2) scale up to million-node graphs.
【Keywords】:
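For readers unfamiliar with the primitive, a Sylvester equation AX + XB = C can be solved directly with SciPy for small dense matrices; the paper's contribution is Krylov-subspace methods that scale this primitive to graphs with millions of nodes, which the snippet below does not attempt. The matrices here are random placeholders.

```python
# Solving a small dense Sylvester equation A X + X B = C directly with SciPy.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(4, 3))

X = solve_sylvester(A, B, C)            # solves A X + X B = C
print(np.allclose(A @ X + X @ B, C))    # True
```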
【Paper Link】 【Pages】:1348-1357
【Authors】: Changying Du ; Changde Du ; Xingyu Xie ; Chen Zhang ; Hao Wang
【Abstract】: Many important data mining problems can be modeled as learning a (bidirectional) multidimensional mapping between two data domains. Based on generative adversarial networks (GANs), particularly conditional ones, cross-domain joint distribution matching is an increasingly popular kind of method for addressing such problems. Though significant advances have been achieved, there are still two main disadvantages of existing models, i.e., the requirement of large amounts of paired training samples and the notorious instability of training. In this paper, we propose a multi-view adversarially learned inference (ALI) model, termed MALI, to address these issues. Unlike the common practice of learning direct domain mappings, our model relies on shared latent representations of both domains and can generate an arbitrary number of paired fake samples, thanks to which a very small number of paired samples (together with sufficient unpaired ones) is usually enough for learning good mappings. Extending the vanilla ALI model, we design novel discriminators to judge the quality of generated samples (both paired and unpaired), and provide a theoretical analysis of our new formulation. Experiments on image-to-image translation, image-to-attribute generation (multi-label classification), and attribute-to-image generation tasks demonstrate that our semi-supervised learning framework yields significant performance improvements over existing ones. Results on cross-modality retrieval show that our latent space based method can achieve competitive similarity search performance at relatively fast speed, compared to methods that compute similarities in the high-dimensional data space.
【Keywords】: adversarially learned inference; attribute-image generation; cross-modality retrieval; generative adversarial networks; image translation; joint distribution matching; multi-label classification; multi-view learning
【Paper Link】 【Pages】:1358-1367
【Authors】: Mengnan Du ; Ninghao Liu ; Qingquan Song ; Xia Hu
【Abstract】: While deep neural networks (DNN) have become an effective computational tool, the prediction results are often criticized by the lack of interpretability, which is essential in many real-world applications such as health informatics. Existing attempts based on local interpretations aim to identify relevant features contributing the most to the prediction of DNN by monitoring the neighborhood of a given input. They usually simply ignore the intermediate layers of the DNN that might contain rich information for interpretation. To bridge the gap, in this paper, we propose to investigate a guided feature inversion framework for taking advantage of the deep architectures towards effective interpretation. The proposed framework not only determines the contribution of each feature in the input but also provides insights into the decision-making process of DNN models. By further interacting with the neuron of the target category at the output layer of the DNN, we enforce the interpretation result to be class-discriminative. We apply the proposed interpretation model to different CNN architectures to provide explanations for image data and conduct extensive experiments on ImageNet and PASCAL VOC07 datasets. The interpretation results demonstrate the effectiveness of our proposed framework in providing class-discriminative interpretation for DNN-based prediction.
【Keywords】: deep learning; guided feature inversion; intermediate layers; machine learning interpretation
【Paper Link】 【Pages】:1368-1377
【Authors】: Otmar Ertl
【Abstract】: Minwise hashing has become a standard tool to calculate signatures which allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the unweighted case, the calculation of signatures for weighted sets is still a time consuming task. BagMinHash is a new algorithm that can be orders of magnitude faster than current state of the art without any particular restrictions or assumptions on weights or data dimensionality. Applied to the special case of unweighted sets, it represents the first efficient algorithm producing independent signature components. A series of tests finally verifies the new algorithm and also reveals limitations of other approaches published in the recent past.
【Keywords】: consistent weighted sampling; jaccard similarity; locality-sensitive hashing; sketching algorithms; weighted minwise hashing
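To make the signature idea above concrete, the following minimal sketch (ordinary unweighted minwise hashing with illustrative hash salts, not the BagMinHash algorithm itself) shows how the fraction of matching signature components directly estimates the Jaccard similarity of two sets:

    import random

    def minhash_signature(items, num_hashes=128, seed=0):
        # One signature component per hash function: the minimum hash
        # value over all items in the set (unweighted minwise hashing).
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        # The fraction of matching components estimates the Jaccard
        # similarity of the underlying sets.
        matches = sum(a == b for a, b in zip(sig_a, sig_b))
        return matches / len(sig_a)

    a = {"kdd", "2018", "hashing", "jaccard"}
    b = {"kdd", "2018", "sketching", "jaccard"}
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))  # close to 0.6

The weighted case that BagMinHash targets generalizes this so that each element contributes in proportion to its weight.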
【Paper Link】 【Pages】:1378-1386
【Authors】: Dhivya Eswaran ; Christos Faloutsos ; Sudipto Guha ; Nina Mishra
【Abstract】: How do we spot interesting events from e-mail or transportation logs? How can we detect port scans or denial-of-service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near-bicliques) in near real-time using sublinear memory? To this end, we propose a randomized sketching-based approach called SpotLight, which guarantees that an anomalous graph is mapped 'far' away from 'normal' instances in the sketch space with high probability for an appropriate choice of parameters. Extensive experiments on real-world datasets show that SpotLight (a) improves accuracy by at least 8.4% compared to prior approaches, (b) is fast and can process millions of edges within a few minutes, (c) scales linearly with the number of edges and sketching dimensions, and (d) leads to interesting discoveries in practice.
【Keywords】: anomaly detection; graph sketching; streaming graphs
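As a rough illustration of the sketching idea (the hashing scheme and parameters below are assumptions for exposition, not the exact SpotLight construction), each sketch dimension can accumulate the weight of edges whose endpoints fall into a randomly chosen source/destination subset, so a snapshot containing a suddenly appearing dense subgraph lands far from normal snapshots in sketch space:

    import hashlib

    def in_subset(node, dim, side, p=0.2):
        # Deterministically assign each node to dimension `dim`'s source or
        # destination subset with probability roughly p, via hashing.
        h = hashlib.sha256(f"{dim}-{side}-{node}".encode()).digest()
        return int.from_bytes(h[:4], "big") / 2**32 < p

    def sketch(edges, k=50):
        # edges: iterable of (source, destination, weight) for one snapshot.
        vec = [0.0] * k
        for u, v, w in edges:
            for d in range(k):
                if in_subset(u, d, "src") and in_subset(v, d, "dst"):
                    vec[d] += w
        return vec

    normal = [("a", "b", 1.0), ("b", "c", 1.0)]
    attack = normal + [(f"s{i}", f"t{j}", 1.0) for i in range(20) for j in range(20)]
    print(sketch(normal)[:5], sketch(attack)[:5])  # attack sketch has much larger entries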
【Paper Link】 【Pages】:1387-1395
【Authors】: Ian Fox ; Lynn Ang ; Mamta Jaiswal ; Rodica Pop-Busui ; Jenna Wiens
【Abstract】: In many forecasting applications, it is valuable to predict not only the value of a signal at a certain time point in the future, but also the values leading up to that point. This is especially true in clinical applications, where the future state of the patient can be less important than the patient's overall trajectory. This requires multi-step forecasting, a forecasting variant where one aims to predict multiple values in the future simultaneously. Standard methods to accomplish this can propagate error from prediction to prediction, reducing quality over the long term. In light of these challenges, we propose multi-output deep architectures for multi-step forecasting in which we explicitly model the distribution of future values of the signal over a prediction horizon. We apply these techniques to the challenging and clinically relevant task of blood glucose forecasting. Through a series of experiments on a real-world dataset consisting of 550K blood glucose measurements, we demonstrate the effectiveness of our proposed approaches in capturing the underlying signal dynamics. Compared to existing shallow and deep methods, we find that our proposed approaches improve performance individually and capture complementary information, leading to a large improvement over the baseline when combined (4.87 vs. 5.31 absolute percentage error (APE)). Overall, the results suggest the efficacy of our proposed approach in predicting blood glucose level and multi-step forecasting more generally.
【Keywords】: blood glucose forecasting; deep learning for time series; function forecasting; machine learning for healthcare; multi-output forecasting; multi-step forecasting
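The difference between recursive one-step forecasting, which feeds back its own errors, and direct multi-output forecasting over a horizon can be illustrated with a deliberately simple linear model (the synthetic series, lags and horizon below are toy assumptions, not the paper's deep architectures):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def make_windows(series, lags, horizon):
        # X: the last `lags` values; Y: the next `horizon` values.
        X, Y = [], []
        for t in range(lags, len(series) - horizon + 1):
            X.append(series[t - lags:t])
            Y.append(series[t:t + horizon])
        return np.array(X), np.array(Y)

    series = np.sin(np.linspace(0, 40, 800)) + 0.1 * np.random.randn(800)
    X, Y = make_windows(series, lags=12, horizon=6)

    # Multi-output: one model jointly predicts all 6 future steps.
    multi = LinearRegression().fit(X, Y)

    # Recursive: a one-step model repeatedly fed its own predictions.
    single = LinearRegression().fit(X, Y[:, 0])
    window = list(series[-12:])
    recursive = []
    for _ in range(6):
        nxt = single.predict([window[-12:]])[0]
        recursive.append(nxt)
        window.append(nxt)

    print(multi.predict(series[-12:].reshape(1, -1))[0])
    print(recursive)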
【Paper Link】 【Pages】:1396-1405
【Authors】: Weijie Fu ; Meng Wang ; Shijie Hao ; Xindong Wu
【Abstract】: We study the problem of active learning for multi-class classification on large-scale datasets. In this setting, the existing active learning approaches built upon uncertainty measures are ineffective for discovering unknown regions, and those based on expected error reduction are inefficient owing to their huge time costs. To overcome the above issues, this paper proposes a novel query selection criterion called approximated error reduction (AER). In AER, the error reduction of each candidate is estimated based on an expected impact over all datapoints and an approximated ratio between the error reduction and the impact over its nearby datapoints. In particular, we utilize hierarchical anchor graphs to construct the candidate set as well as the nearby datapoint sets of these candidates. The benefit of this strategy is that it enables a hierarchical expansion of candidates with the increase of labels, and allows us to further accelerate the AER estimation. We finally introduce AER into an efficient semi-supervised classifier for scalable active learning. Experiments on publicly available datasets with the sizes varying from thousands to millions demonstrate the effectiveness of our approach.
【Keywords】: active learning; efficient algorithms; query selection
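For context on why expected-error-reduction criteria are costly at scale (the bottleneck that AER approximates), a minimal sketch of the classic criterion is shown below: every candidate requires retraining one model per possible label, so the cost grows quickly with the pool size. The logistic model and the error proxy are illustrative assumptions, not the proposed AER estimator.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def expected_error_reduction(model, X_lab, y_lab, X_pool):
        # Classic expected-error-reduction query selection: for each candidate,
        # average the resulting pool uncertainty over its possible labels.
        scores = []
        probs = model.predict_proba(X_pool)
        for i, p_i in enumerate(probs):
            expected_err = 0.0
            for label, p in zip(model.classes_, p_i):
                X_new = np.vstack([X_lab, X_pool[i]])
                y_new = np.append(y_lab, label)
                m = LogisticRegression(max_iter=200).fit(X_new, y_new)
                # Error proxy: 1 - max posterior, averaged over the pool.
                expected_err += p * np.mean(1 - m.predict_proba(X_pool).max(axis=1))
            scores.append(expected_err)
        return int(np.argmin(scores))  # query the candidate minimizing expected error

    rng = np.random.RandomState(0)
    X = rng.randn(60, 2)
    y = (X[:, 0] > 0).astype(int)
    model = LogisticRegression().fit(X[:6], y[:6])
    print("query index:", expected_error_reduction(model, X[:6], y[:6], X[6:]))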
【Paper Link】 【Pages】:1406-1415
【Authors】: Hongchang Gao ; Heng Huang
【Abstract】: Network embedding has attracted increasing attention in recent data mining research, with many real-world applications. Network embedding learns low-dimensional representations for nodes in a network. A popular family of existing methods, including DeepWalk, Node2Vec, and LINE, learns node representations by pushing positive context nodes toward the anchor node while pushing negative context nodes away from it in the low-dimensional vector space. When sampling the negative context nodes, these methods usually employ a predefined sampling distribution based on node popularity. However, this sampling distribution often fails to capture the real informativeness of each node and cannot reflect the training state. To address these problems, we propose a novel self-paced network embedding method. Specifically, our method adaptively captures the informativeness of each node based on the current training state and samples negative context nodes according to their informativeness. The proposed self-paced sampling strategy gradually selects more difficult negative context nodes as training progresses, leading to better node representations. Moreover, to better capture node informativeness for learning node representations, we extend our method to the generative adversarial network framework, which has a larger capacity to discover node informativeness. Extensive experiments on benchmark network datasets validate the effectiveness of the proposed methods.
【Keywords】: informativeness; network embedding; self-paced sampling strategy
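The predefined popularity-based negative sampling that the abstract argues against can be sketched as follows (a degree^0.75 distribution in the style of word2vec; the exponent and toy degrees are assumptions, and the proposed self-paced strategy would replace this fixed distribution with one that adapts to the training state):

    import numpy as np

    def popularity_negative_sampler(degrees, power=0.75, seed=0):
        # Predefined sampling distribution based on node popularity
        # (degree^0.75, as popularized by word2vec-style embeddings).
        p = np.asarray(degrees, dtype=float) ** power
        p /= p.sum()
        rng = np.random.RandomState(seed)
        def sample(num_negatives, exclude):
            negs = []
            while len(negs) < num_negatives:
                node = rng.choice(len(p), p=p)
                if node not in exclude:
                    negs.append(int(node))
            return negs
        return sample

    degrees = [50, 3, 7, 1, 20, 4]                # toy node degrees
    sampler = popularity_negative_sampler(degrees)
    print(sampler(num_negatives=5, exclude={0}))  # negatives for anchor node 0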
【Paper Link】 【Pages】:1416-1424
【Authors】: Hongyang Gao ; Zhengyang Wang ; Shuiwang Ji
【Abstract】: Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, in generic graphs the number of neighboring units is neither fixed nor ordered, thereby hindering the application of convolutional operations. Here, we address these challenges by proposing the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolutional operations on generic graphs. To enable model training on large-scale graphs, we propose a sub-graph training method to reduce the excessive memory and computational resource requirements suffered by prior methods on graph convolutions. Our experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that our methods achieve consistently better performance on the Cora, Citeseer, and Pubmed citation networks and the protein-protein interaction network dataset. Our results also indicate that the proposed methods using the sub-graph training strategy are more efficient than prior approaches.
【Keywords】: deep learning; graph convolutional networks; graph mining; large-scale learning
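A minimal sketch of the value-ranking step (shapes and zero-padding behavior are illustrative assumptions): for each feature, keep the k largest values among a node's neighbors, so every node yields a fixed-size grid that a regular 1-D convolution can consume.

    import numpy as np

    def select_k_largest(features, adjacency, k):
        # features: (num_nodes, d) node features; adjacency: list of neighbor
        # lists. Returns (num_nodes, k, d): per feature, the k largest values
        # among each node's neighbors, zero-padded if it has fewer than k.
        num_nodes, d = features.shape
        out = np.zeros((num_nodes, k, d))
        for v, nbrs in enumerate(adjacency):
            if not nbrs:
                continue
            vals = features[nbrs]                      # (|N(v)|, d)
            vals = np.sort(vals, axis=0)[::-1]         # rank per feature, descending
            out[v, :min(k, len(nbrs))] = vals[:k]
            # Stacking the node's own features on top yields a (k+1) x d grid
            # on which an ordinary 1-D convolution can slide.
        return out

    X = np.array([[1., 0.], [0., 2.], [3., 1.], [2., 2.]])
    adj = [[1, 2, 3], [0], [0, 3], [0, 2]]
    print(select_k_largest(X, adj, k=2).shape)  # (4, 2, 2)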
【Paper Link】 【Pages】:1425-1434
【Authors】: Nandani Garg ; Sayan Ranu
【Abstract】: We study the problem of route recommendation to idle taxi drivers such that the distance between the taxi and an anticipated customer request is minimized. Minimizing the distance to the next anticipated customer leads to more productivity for the taxi driver and less waiting time for the customer. To anticipate when and where future customer requests are likely to come from and accordingly recommend routes, we develop a route recommendation engine called MDM: Minimizing Distance through Monte Carlo Tree Search. In contrast to existing techniques, MDM employs a continuous learning platform where the underlying model to predict future customer requests is dynamically updated. Extensive experiments on real taxi data from New York and San Francisco reveal that MDM is up to 70% better than the state of the art and robust to anomalous events such as concerts, sporting events, etc.
【Keywords】: monte carlo tree search; multi-armed bandit learning; taxi route recommendation
【Paper Link】 【Pages】:1435-1444
【Authors】: Kamran Ghasedi Dizaji ; Xiaoqian Wang ; Heng Huang
【Abstract】: Gene expression profiling provides comprehensive characterization of cellular states under different experimental conditions, and thus contributes to the prosperity of many fields of biomedical research. Despite the rapid development of gene expression profiling, genome-wide profiling of large libraries is still expensive and difficult. Because there are significant correlations between gene expression patterns, previous studies introduced regression models for predicting target gene expressions from landmark gene profiles. These models formulate gene expression inference in a completely supervised manner, which requires a large labeled dataset (i.e., paired landmark and target gene expressions). However, collecting whole-genome expressions is much more expensive than collecting the landmark genes. To address this issue and take advantage of cheap unlabeled data (i.e., landmark genes), we propose a novel semi-supervised deep generative model for target gene expression inference. Our model is based on a generative adversarial network (GAN) that approximates the joint distribution of landmark and target genes, and an inference network that learns the conditional distribution of target genes given the landmark genes. We employ reliable data generated by our GAN model as extra training pairs to improve the training of our inference model, and utilize the trustworthy predictions of the inference network to enhance the adversarial training of our GAN network. We evaluate our model on the prediction of two types of gene expression data and identify clear advantages over the counterparts.
【Keywords】: deep generative model; gene expression inference; semi-supervised learning
【Paper Link】 【Pages】:1445-1454
【Authors】: Fabian Gieseke ; Christian Igel
【Abstract】: Without access to large compute clusters, building random forests on large datasets is still a challenging problem. This is, in particular, the case if fully-grown trees are desired. We propose a simple yet effective framework that allows us to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity hardware. The basic idea is a multi-level construction scheme, which builds top trees for small random subsets of the available data and subsequently distributes all training instances to the top trees' leaves for further processing. While conceptually simple, the overall efficiency crucially depends on the particular implementation of the different phases. The practical merits of our approach are demonstrated using dense datasets with hundreds of millions of training instances.
【Keywords】: classification and regression trees; ensemble methods; large-scale data analytics; machine learning; random forests; supervised learning
【Paper Link】 【Pages】:1455-1464
【Authors】: Lin Gong ; Hongning Wang
【Abstract】: User modeling is critical for understanding user intents, yet it is challenging because user intents are diverse and not directly observable. Most existing works exploit specific types of behavior signals for user modeling, e.g., opinionated data or network structure, but the dependency among different types of user-generated data is neglected. We focus on self-consistency across multiple modalities of user-generated data to model user intents. A probabilistic generative model is developed to integrate two companion learning tasks: opinionated content modeling and social network structure modeling for users. Individual users are modeled as a mixture over the instances of paired learning tasks to capture their behavior heterogeneity, and the tasks are clustered by sharing a global prior distribution to capture the homogeneity among users. Extensive experimental evaluations on large collections of Amazon and Yelp reviews with social network structures confirm the effectiveness of the proposed solution. The learned user models are interpretable and predictive: they enable more accurate sentiment classification and item/friend recommendations than the corresponding baselines that only model a single type of user behavior.
【Keywords】: sentiment analysis; social network; user behavior modeling
【Paper Link】 【Pages】:1465-1474
【Authors】: Alexander Gorovits ; Ekta Gujral ; Evangelos E. Papalexakis ; Petko Bogdanov
【Abstract】: Communities are essential building blocks of complex networks and have attracted significant research attention in terms of modeling and detection algorithms. Common across models is the premise that node pairs sharing communities are likely to interact more strongly. Moreover, in the most general setting a node may be a member of multiple communities, and thus interact with more than one cohesive group of other nodes. If node interactions are observed over a long period and aggregated into a single static network, the communities may be hard to discern due to their in-network overlap. Alternatively, if interactions are observed over short time periods, the communities may be only partially observable. How can we detect communities at an appropriate temporal resolution that resonates with their natural periods of activity? We propose LARC, a general framework for jointly learning the overlapping community structure and the periods of activity of communities directly from temporal interaction data. We formulate the problem as an optimization task coupling community fit and smooth temporal activation over time. To the best of our knowledge, the tensor version of LARC is the first tensor-based community detection method to introduce such smoothness constraints. We propose efficient algorithms for the problem, achieving a 2.6x quality improvement over all baselines on high-temporal-resolution datasets, and consistently detecting better-quality communities for different levels of data aggregation and varying community overlap. In addition, LARC elucidates interpretable temporal patterns of community activity corresponding to botnet attacks, transportation change points, and public forum interaction trends, while being computationally practical, requiring only a few minutes on large real datasets. Finally, LARC provides a comprehensive unsupervised parameter estimation methodology, yielding high accuracy and making it easy to use for practitioners.
【Keywords】: community activation; dynamic graphs; overlapping community detection; tensor factorization
【Paper Link】 【Pages】:1475-1484
【Authors】: Bin Gu ; Xiao-Tong Yuan ; Songcan Chen ; Heng Huang
【Abstract】: Semi-supervised learning is especially important in data mining applications because it can make use of plentiful unlabeled data to train high-quality learning models. The Semi-Supervised Support Vector Machine (S3VM) is a powerful semi-supervised learning model. However, its high computational cost and non-convexity severely impede the S3VM method in large-scale applications. Although several learning algorithms have been proposed for S3VM, scaling up S3VM is still an open problem. To address this challenging problem, we propose a new incremental learning algorithm to scale up S3VM (IL-S3VM) based on the path-following technique in the framework of Difference of Convex (DC) programming. Traditional DC programming based algorithms need multiple outer loops and are not suitable for incremental learning, and traditional path-following algorithms are limited to convex problems. Our new IL-S3VM algorithm, based on the path-following technique, can directly update the solution of S3VM to converge to a local minimum within one outer loop, so that efficient incremental learning can be achieved. More importantly, we provide a finite convergence analysis for the new algorithm. To the best of our knowledge, IL-S3VM is the first efficient path-following algorithm for a non-convex problem (i.e., S3VM) with a local minimum convergence guarantee. Experimental results on a variety of benchmark datasets not only confirm the finite convergence of IL-S3VM, but also show a large reduction in computational time compared with existing batch and incremental learning algorithms, while retaining similar generalization performance.
【Keywords】: difference of convex programming; incremental learning; path following algorithm; semi-supervised support vector machine
【Paper Link】 【Pages】:1485-1493
【Abstract】: Learning-based hashing has recently received considerable attention due to its capability of supporting efficient storage and retrieval of high-dimensional data such as images, videos, and documents. In this paper, we propose a learning-based hashing algorithm called "Robust Rotated Supervised Discrete Hashing" (R2SDH), extending the previous work on "Supervised Discrete Hashing" (SDH). In R2SDH, correntropy is adopted to replace the least square regression (LSR) model in SDH for better robustness. Furthermore, since commonly used distance metrics such as cosine and Euclidean distance are invariant to rotational transformation, rotation is integrated into the original zero-one label matrix used in SDH as additional freedom to promote flexibility without sacrificing accuracy. The rotation matrix is learned through an optimization procedure. Experimental results on three image datasets (MNIST, CIFAR-10, and NUS-WIDE) confirm that R2SDH generally outperforms SDH.
【Keywords】: robust m-estimator; rotation; supervised discrete hashing
【Paper Link】 【Pages】:1494-1503
【Authors】: Yufei Han ; Guolei Sun ; Yun Shen ; Xiangliang Zhang
【Abstract】: Tremendous efforts have been dedicated to improving the effectiveness of multi-label learning with incomplete label assignments. Most current techniques assume that the input features of data instances are complete. Nevertheless, the co-occurrence of highly incomplete features and weak label assignments is a challenging and widely perceived issue in real-world multi-label learning applications, due to a number of practical reasons including incomplete data collection, moderate labels from annotators, etc. Existing multi-label learning algorithms are not directly applicable when the observed features are highly incomplete. In this work, we attack this problem by proposing a weakly supervised multi-label learning approach based on the idea of collaborative embedding. This approach provides a flexible framework for efficient multi-label classification in both transductive and inductive modes, by coupling the reconstruction of missing features and weak label assignments in a joint optimisation framework. It is designed to collaboratively recover feature and label information, and to extract the predictive association between the feature profile and the multi-label tag of the same data instance. Substantial experiments on public benchmark datasets and real security event data validate that our proposed method provides distinctly more accurate transductive and inductive classification than other state-of-the-art algorithms.
【Keywords】: highly incomplete feature; multi-label learning; weak labels
【Paper Link】 【Pages】:1504-1511
【Authors】: Raphael A. Hauser ; Armin Eftekhari ; Heinrich F. Matzinger
【Abstract】: Principal Component Analysis (PCA) finds the best linear representation for data and is an indispensable tool in many learning tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. There are yet other ways of interpreting PCA that are rarely exploited in practice, largely because it is not known how to reliably solve the corresponding non-convex optimisation programs. In this paper, we consider one such interpretation of principal components as the directions that preserve most of the "volume" of the dataset. Our main contribution is a theorem that shows that the corresponding non-convex program has no spurious local optima, and is therefore amenable to many convex solvers. We also confirm our findings numerically.
【Keywords】: non-convex optimization; principal component analysis; saddle point property
【Paper Link】 【Pages】:1512-1520
【Authors】: William Herlands ; Edward McFowland III ; Andrew Gordon Wilson ; Daniel B. Neill
【Abstract】: Inferring causal relationships in observational data is crucial for understanding scientific and social processes. We develop the first statistical machine learning approach for automatically discovering regression discontinuity designs (RDDs), a quasi-experimental setup often used in econometrics. Our method identifies interpretable, localized RDDs in arbitrary dimensional data and can seamlessly compute treatment effects without expert supervision. By applying the technique to a variety of synthetic and real datasets, we demonstrate robust performance under adverse conditions including unobserved variables, substantial noise, and model misspecification.
【Keywords】: anomalous pattern detection; econometrics; machine learning; natural experiments; regression discontinuity
【Paper Link】 【Pages】:1521-1530
【Authors】: Junyuan Hong ; Huanhuan Chen ; Feng Lin
【Abstract】: In this paper, we focus on subspace-based learning problems, where data elements are linear subspaces instead of vectors. To handle this kind of data, Grassmann kernels were proposed to measure the space structure and are used with classifiers, e.g., Support Vector Machines (SVMs). However, existing discriminative algorithms mostly ignore the instability of subspaces, which can cause classifiers to be misled by disturbed instances. We therefore propose considering all potential disturbances of subspaces in the learning process to obtain more robust classifiers. First, we derive the dual optimization of linear classifiers with disturbances subject to a known distribution, resulting in a new kernel, the Disturbance Grassmann (DG) kernel. Second, we investigate two kinds of disturbance, related to the subspace matrix and to the singular values of bases, with which we extend the Projection kernel on Grassmann manifolds to two new kernels. Experiments on action data indicate that the proposed kernels perform better than state-of-the-art subspace-based methods, even in noisier environments.
【Keywords】: classification; grassmann manifolds; kernel; noise; subspace; supervised learning
【Paper Link】 【Pages】:1531-1540
【Authors】: Binbin Hu ; Chuan Shi ; Wayne Xin Zhao ; Philip S. Yu
【Abstract】: Heterogeneous information networks (HINs) have been widely adopted in recommender systems due to their excellence in modeling complex context information. Although existing HIN-based recommendation methods have achieved performance improvements to some extent, they have two major shortcomings. First, these models seldom learn an explicit representation for paths or meta-paths in the recommendation task. Second, they do not consider the mutual effect between the meta-path and the involved user-item pair in an interaction. To address these issues, we develop a novel deep neural network with a co-attention mechanism for leveraging rich meta-path based context for top-N recommendation. We elaborately design a three-way neural interaction model by explicitly incorporating meta-path based context. To construct the meta-path based context, we propose a priority-based sampling technique to select high-quality path instances. Our model learns effective representations for users, items, and meta-path based context for implementing a powerful interaction function. The co-attention mechanism improves the representations for meta-path based context, users, and items in a mutually enhancing way. Extensive experiments on three real-world datasets have demonstrated the effectiveness of the proposed model. In particular, the proposed model performs well in the cold-start scenario and has potentially good interpretability for the recommendation results.
【Keywords】: attention mechanism; deep learning; heterogeneous information network; recommender system
【Paper Link】 【Pages】:1541-1550
【Authors】: Mengdi Huai ; Chenglin Miao ; Yaliang Li ; Qiuling Suo ; Lu Su ; Aidong Zhang
【Abstract】: Metric learning aims to learn a good distance metric that can capture the relationships among instances, and its importance has long been recognized in many fields. In the traditional settings of metric learning, an implicit assumption is that the associated labels of the instances are deterministic. However, in many real-world applications, the associated labels come naturally with probabilities instead of deterministic values. Thus, the existing metric learning methods cannot work well in these applications. To tackle this challenge, in this paper, we study how to effectively learn the distance metric from datasets that contain probabilistic information, and then propose two novel metric learning mechanisms for two types of probabilistic labels, i.e., the instance-wise probabilistic label and the group-wise probabilistic label. Compared with the existing metric learning methods, our proposed mechanisms are capable of learning distance metrics directly from the probabilistic labels with high accuracy. We also theoretically analyze the two proposed mechanisms and provide theoretical bounds on the sample complexity for both of them. Additionally, extensive experiments based on real-world datasets are conducted to verify the desirable properties of the proposed mechanisms.
【Keywords】: distance measure; metric learning; probabilistic labels
【Paper Link】 【Pages】:1551-1560
【Authors】: Biwei Huang ; Kun Zhang ; Yizhu Lin ; Bernhard Schölkopf ; Clark Glymour
【Abstract】: Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery, constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones. However, most of them need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions, which limit their applicability. In practice the precise information of the underlying model class is usually unknown. If the above assumptions are violated, both spurious and missing edges may result. In this paper, we introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables, without assuming particular model classes. In particular, we exploit regression in RKHS to capture the dependence in a nonparametric way. The resulting causal discovery approach produces asymptotically correct results in rather general cases, which may have nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables. Experimental results on both synthetic and real-world data demonstrate the efficacy of our proposed approach.
【Keywords】:
【Paper Link】 【Pages】:1561-1570
【Authors】: Qiang Huang ; Guihong Ma ; Jianlin Feng ; Qiong Fang ; Anthony K. H. Tung
【Abstract】: The problem of Approximate Maximum Inner Product (AMIP) search has received increasing attention due to its wide applications. Interestingly, based on asymmetric transformation, the problem can be reduced to Approximate Nearest Neighbor (ANN) search, and hence Locality-Sensitive Hashing (LSH) can be leveraged to find solutions. However, existing asymmetric transformations such as L2-ALSH and XBOX suffer from large distortion error in reducing AMIP search to ANN search, such that the results of AMIP search can be arbitrarily bad. In this paper, we propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search. On the one hand, we propose a novel Query Normalized First (QNF) transformation to significantly reduce the distortion error. On the other hand, by adopting the homocentric hypersphere partition strategy, we can not only improve search efficiency with early-stop pruning, but also achieve higher search accuracy by further reducing the distortion error within a limited data range. Our theoretical studies show that H2-ALSH enjoys a guarantee on search accuracy. Experimental results over four real datasets demonstrate that H2-ALSH significantly outperforms the state-of-the-art schemes.
【Keywords】: locality-sensitive hashing; maximum inner product search; nearest neighbor search
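For readers unfamiliar with the reduction mentioned above, the well-known XBOX transformation (one of the baselines the abstract cites, not the proposed QNF transformation) appends a single coordinate so that maximum inner product search becomes nearest neighbor search:

    import numpy as np

    def xbox_transform(data, queries):
        # Asymmetric transformation (XBOX): append sqrt(M^2 - ||x||^2) to each
        # data point and 0 to each query. For a fixed query q,
        # ||P(x) - Q(q)||^2 = M^2 + ||q||^2 - 2 <x, q>, so the nearest
        # neighbor in the transformed space is the maximum-inner-product point.
        norms = np.linalg.norm(data, axis=1)
        M = norms.max()
        P = np.hstack([data, np.sqrt(M**2 - norms**2).reshape(-1, 1)])
        Q = np.hstack([queries, np.zeros((len(queries), 1))])
        return P, Q

    rng = np.random.RandomState(0)
    data, queries = rng.randn(1000, 16), rng.randn(3, 16)
    P, Q = xbox_transform(data, queries)
    exact = data @ queries.T                           # true inner products
    nn = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1).argmin(0)
    print(np.all(nn == exact.argmax(0)))               # True: MIP point == NN point

In practice an LSH index replaces the exhaustive distance computation above; the distortion error the abstract discusses arises because the appended coordinate stretches distances unevenly across points with different norms.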
【Paper Link】 【Pages】:1571-1579
【Authors】: Sheng-Jun Huang ; Miao Xu ; Ming-Kun Xie ; Masashi Sugiyama ; Gang Niu ; Songcan Chen
【Abstract】: Feature missing is a serious problem in many applications; it can lower the quality of training data and significantly degrade learning performance. Because feature acquisition usually involves special devices or complex processes, it is expensive to acquire all feature values for a whole dataset. On the other hand, features may be correlated with each other, and some values may be recoverable from the others. It is thus important to decide which features are most informative for recovering the other features as well as for improving the learning performance. In this paper, we aim to train an effective classification model with the least acquisition cost by jointly performing active feature querying and supervised matrix completion. When completing the feature matrix, a novel objective function is proposed to simultaneously minimize the reconstruction error on observed entries and the supervised loss on training data. When querying a feature value, the most uncertain entry is actively selected based on the variance over previous iterations. In addition, a bi-objective optimization method is presented for cost-aware active selection when features bear different acquisition costs. The effectiveness of the proposed approach is validated by both theoretical analysis and experimental study.
【Keywords】: active learning; feature acquisition; matrix completion
【Paper Link】 【Pages】:1580-1588
【Authors】: Sheng-Jun Huang ; Jia-Wei Zhao ; Zhao-Yang Liu
【Abstract】: Deep convolutional neural networks have achieved great success in various applications. However, training an effective DNN model for a specific task is rather challenging because it requires prior knowledge or experience to design the network architecture, a repeated trial-and-error process to tune the parameters, and a large set of labeled data to train the model. In this paper, we propose to overcome these challenges by actively adapting a pre-trained model to a new task with fewer labeled examples. Specifically, the pre-trained model is iteratively fine-tuned based on the most useful examples. The examples are actively selected based on a novel criterion, which jointly estimates the potential contribution of an instance to optimizing the feature representation and to improving the classification model for the target task. On one hand, the pre-trained model brings plentiful information from its original task, avoiding redesign of the network architecture or training from scratch; on the other hand, the labeling cost can be significantly reduced by active label querying. Experiments on multiple datasets and different pre-trained models demonstrate that the proposed approach achieves cost-effective training of DNNs.
【Keywords】: active learning; deep learning; model adaptation
【Paper Link】 【Pages】:1589-1598
【Authors】: Jun-Yong Jeong ; Chi-Hyuck Jun
【Abstract】: We consider multi-task learning, which simultaneously learns related prediction tasks to improve generalization performance. We factorize a coefficient matrix as the product of two matrices based on a low-rank assumption. These matrices have sparsity patterns that simultaneously perform variable selection and learn an overlapping group structure among the tasks. The resulting bi-convex objective function is minimized by alternating optimization, where the sub-problems are solved using the alternating direction method of multipliers and accelerated proximal gradient descent. Moreover, we provide a performance bound for the proposed method. The effectiveness of the proposed method is validated on both synthetic and real-world datasets.
【Keywords】: k-support norm; low-rank; multi-task learning; sparse representation
【Paper Link】 【Pages】:1599-1607
【Authors】: Kishlay Jha ; Guangxu Xun ; Yaqing Wang ; Vishrawas Gopalakrishnan ; Aidong Zhang
【Abstract】: Given two topics of interest (A, C) that are otherwise disconnected - for instance two concepts: a disease ("Migraine") and a therapeutic substance ("Magnesium") - this paper attempts to find the conceptual bridges (e.g., serotonin (B)) that connect them in a novel way. This problem of mining implicit linkage is known as hypotheses generation, and its potential to accelerate scientific progress is widely recognized. Almost all prior studies tackling this problem ignore the temporal dynamics of concepts. This is limiting because the semantic meaning of a concept is known to evolve over time. To overcome this issue, we define the problem as mining time-aware top-k conceptual bridges, and in doing so provide a systematic approach to formalizing it. Specifically, the proposed model first extracts relevant entities from the corpus, represents them in time-specific latent spaces, and then reasons upon them to generate novel and experimentally testable hypotheses. The key challenge in this approach is to learn a mapping function that encodes the temporal characteristics of concepts and aligns the across-time latent spaces. To solve this, we propose an effective algorithm that learns a precise mapping sensitive to both the global and local semantics of the input query. Both qualitative and quantitative evaluations performed on the largest available biomedical corpus substantiate the importance of leveraging the evolutionary semantics of medical concepts and suggest that the generated hypotheses are novel and worthy of clinical trials.
【Keywords】: hypotheses generation; temporal dynamics; word embeddings
【Paper Link】 【Pages】:1608-1616
【Authors】: Bo Jin ; Haoyu Yang ; Leilei Sun ; Chuanren Liu ; Yue Qu ; Jianing Tong
【Abstract】: Recent years have witnessed an opportunity for improving healthcare efficiency and quality by mining Electronic Medical Records (EMRs). This paper aims to develop a treatment engine, which learns from historical EMR data and provides a patient with next-period prescriptions based on the patient's disease conditions, laboratory results, and treatment records. Importantly, the engine takes into consideration both treatment records and physical examination sequences, which are not only heterogeneous and temporal in nature but also often have different record frequencies and lengths. Moreover, the engine combines static information (e.g., demographics) with the temporal sequences to provide personalized treatment prescriptions to patients. To this end, a novel Long Short-Term Memory (LSTM) learning framework is proposed to model inter-correlations of different types of medical sequences through connections between hidden neurons. With this framework, we develop three multifaceted LSTM models: Fully Connected Heterogeneous LSTM, Partially Connected Heterogeneous LSTM, and Decomposed Heterogeneous LSTM. Experiments are conducted on two datasets: the public MIMIC-III ICU data and data from several Chinese hospitals. The experimental results reveal the effectiveness of the framework and the three models. This work is important and meaningful for both academia and practitioners in the realm of medical treatment and prediction, as well as in other application fields where intelligent decision support is becoming pervasive.
【Keywords】: emrs; prescription prediction; temporal sequences; treatment
【Paper Link】 【Pages】:1617-1626
【Authors】: Kun Kuang ; Peng Cui ; Susan Athey ; Ruoxuan Xiong ; Bo Li
【Abstract】: In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the distribution on which the classifier will be used to make predictions. Traditional methods correct the distribution shift by reweighting training data with the ratio of the density between test and training data. However, in many applications training takes place without prior knowledge of the testing distribution. Recently, methods have been proposed to address the shift by learning the underlying causal structure, but those methods rely on diversity arising from multiple training data sets, and they further have complexity limitations in high dimensions. In this paper, we propose a novel Deep Global Balancing Regression (DGBR) algorithm to jointly optimize a deep auto-encoder model for feature selection and a global balancing model for stable prediction across unknown environments. The global balancing model constructs balancing weights that facilitate estimation of partial effects of features (holding fixed all other features), a problem that is challenging in high dimensions, and thus helps to identify stable, causal relationships between features and outcomes. The deep auto-encoder model is designed to reduce the dimensionality of the feature space, thus making global balancing easier. We show, both theoretically and with empirical experiments, that our algorithm can make stable predictions across unknown environments. Our experiments on both synthetic and real datasets demonstrate that our algorithm outperforms the state-of-the-art methods for stable prediction across unknown environments.
【Keywords】: confounder(variable) balancing; stability; stable prediction; unknown environments
【Paper Link】 【Pages】:1627-1636
【Authors】: Atsutoshi Kumagai ; Tomoharu Iwata
【Abstract】: We propose a method for learning the dynamics of the decision boundary to maintain classification performance without additional labeled data. In various applications, such as spam-mail classification, the decision boundary dynamically changes over time. Accordingly, the performance of classifiers deteriorates quickly unless they are retrained using additional labeled data. However, continuously preparing such data is expensive or impossible. The proposed method alleviates this deterioration in performance by using newly obtained unlabeled data, which are easy to prepare, as well as labeled data collected beforehand. With the proposed method, the dynamics of the decision boundary are modeled by Gaussian processes. To exploit information on the decision boundary from unlabeled data, the proposed method assumes the low-density separation criterion: the decision boundary should not cross high-density regions, but instead lie in low-density regions. We incorporate this criterion into our framework in a principled manner by introducing entropy posterior regularization on the posterior of the classifier parameters, on the basis of the generic regularized Bayesian framework. We developed an efficient inference algorithm for the model based on variational Bayesian inference. The effectiveness of the proposed method was demonstrated through experiments using two synthetic and four real-world data sets.
【Keywords】: concept drift; semi-supervised learning; transfer learning
【Paper Link】 【Pages】:1637-1645
【Authors】: Hung Le ; Truyen Tran ; Svetha Venkatesh
【Abstract】: One of the core tasks in multi-view learning is to capture relations among views. For sequential data, the relations not only span across views, but also extend throughout the view length to form long-term intra-view and inter-view interactions. In this paper, we present a new memory augmented neural network that aims to model these complex interactions between two asynchronous sequential views. Our model uses two encoders for reading from and writing to two external memories for encoding input views. The intra-view interactions and the long-term dependencies are captured by the use of memories during this encoding process. There are two modes of memory accessing in our system: late-fusion and early-fusion, corresponding to late and early inter-view interactions. In the late-fusion mode, the two memories are separated, containing only view-specific contents. In the early-fusion mode, the two memories share the same addressing space, allowing cross-memory accessing. In both cases, the knowledge from the memories will be combined by a decoder to make predictions over the output space. The resulting dual memory neural computer is demonstrated on a comprehensive set of experiments, including a synthetic task of summing two sequences and the tasks of drug prescription and disease progression in healthcare. The results demonstrate competitive performance over both traditional algorithms and deep learning methods designed for multi-view problems.
【Keywords】: healthcare; memory augmented neural networks; multi-view sequential learning
【Paper Link】 【Pages】:1646-1655
【Authors】: Ching-pei Lee ; Cong Han Lim ; Stephen J. Wright
【Abstract】: We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations, and we describe how to maintain an approximation of the Hessian and solve subproblems efficiently in a distributed manner. The proposed method enjoys global linear convergence for a broad range of non-strongly convex problems that includes the most commonly used ERMs, thus requiring lower communication complexity. It also converges on non-convex problems, so has the potential to be used on applications such as deep learning. Initial computational results on convex problems demonstrate that our method significantly improves on communication cost and running time over the current state-of-the-art methods.
【Keywords】: distributed optimization; empirical risk minimization; inexact method; nonsmooth optimization; proximal method; quasi-newton; regularized optimization; variable metrics
【Paper Link】 【Pages】:1656-1665
【Authors】: Jaewoo Lee ; Daniel Kifer
【Abstract】: Iterative algorithms, like gradient descent, are common tools for solving a variety of problems, such as model fitting. For this reason, there is interest in creating differentially private versions of them. However, their conversion to differentially private algorithms is often naive. For instance, a fixed number of iterations are chosen, the privacy budget is split evenly among them, and at each iteration, parameters are updated with a noisy gradient. In this paper, we show that gradient-based algorithms can be improved by a more careful allocation of privacy budget per iteration. Intuitively, at the beginning of the optimization, gradients are expected to be large, so that they do not need to be measured as accurately. However, as the parameters approach their optimal values, the gradients decrease and hence need to be measured more accurately. We add a basic line-search capability that helps the algorithm decide when more accurate gradient measurements are necessary. Our gradient descent algorithm works with the recently introduced zCDP version of differential privacy. It outperforms prior algorithms for model fitting and is competitive with the state-of-the-art for (ε, δ)-differential privacy, a strictly weaker definition than zCDP.
【Keywords】: differential privacy; erm; gradient descent
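A minimal sketch of noisy gradient descent under zCDP with the naive even budget split that the paper improves on (the clipping threshold, learning rate and sensitivity accounting below are illustrative assumptions; the paper's adaptive allocation and line search are omitted):

    import numpy as np

    def dp_gradient_descent(X, y, rho_total, T=50, clip=1.0, lr=0.5, seed=0):
        # Noisy gradient descent for logistic regression under zCDP.
        # Naive allocation: split the privacy budget evenly, rho_i = rho_total / T.
        # (An adaptive scheme would spend less budget early, when gradients are
        # large, and more later; that logic is not shown here.)
        rng = np.random.RandomState(seed)
        n, d = X.shape
        w = np.zeros(d)
        rho_i = rho_total / T
        for _ in range(T):
            margins = y * (X @ w)
            grads = -(y / (1 + np.exp(margins)))[:, None] * X    # per-example gradients
            norms = np.maximum(np.linalg.norm(grads, axis=1), 1e-12)
            grads *= np.minimum(1.0, clip / norms)[:, None]      # clip to L2 norm `clip`
            g = grads.mean(axis=0)                               # sensitivity roughly clip / n
            sigma = clip / (n * np.sqrt(2 * rho_i))              # Gaussian noise for rho_i-zCDP
            w -= lr * (g + rng.normal(0, sigma, size=d))
        return w

    rng = np.random.RandomState(1)
    X = rng.randn(2000, 5)
    y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + 0.1 * rng.randn(2000))
    print(dp_gradient_descent(X, y, rho_total=0.5))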
【Paper Link】 【Pages】:1666-1674
【Authors】: John Boaz Lee ; Ryan A. Rossi ; Xiangnan Kong
【Abstract】: Graph classification is a problem with practical applications in many different domains. To solve this problem, one usually calculates certain graph statistics (i.e., graph features) that help discriminate between graphs of different classes. When calculating such features, most existing approaches process the entire graph. In a graphlet-based approach, for instance, the entire graph is processed to get the total count of different graphlets or subgraphs. In many real-world applications, however, graphs can be noisy with discriminative patterns confined to certain regions in the graph only. In this work, we study the problem of attention-based graph classification. The use of attention allows us to focus on small but informative parts of the graph, avoiding noise in the rest of the graph. We present a novel RNN model, called the Graph Attention Model (GAM), that processes only a portion of the graph by adaptively selecting a sequence of "informative" nodes. Experimental results on multiple real-world datasets show that the proposed method is competitive against various well-known methods in graph classification even though our method is limited to only a portion of the graph.
【Keywords】: attentional processing; deep learning; graph mining; reinforcement learning
【Paper Link】 【Pages】:1675-1684
【Authors】: Qi Li ; Meng Jiang ; Xikun Zhang ; Meng Qu ; Timothy P. Hanratty ; Jing Gao ; Jiawei Han
【Abstract】: Pattern-based methods have been successful in information extraction and NLP research. Previous approaches learn the quality of a textual pattern as relatedness to a certain task based on statistics of its individual content (e.g., length, frequency) and hundreds of carefully annotated labels. However, patterns of good content quality may generate heavily conflicting information due to the large gap between relatedness and correctness. Evaluating the correctness of information is critical in (entity, attribute, value)-tuple extraction. In this work, we propose a novel method, called TruePIE, that finds reliable patterns which can extract not only related but also correct information. TruePIE adopts the self-training framework and repeats the training-predicting-extracting process to gradually discover more and more reliable patterns. To better represent the textual patterns, pattern embeddings are formulated so that patterns with similar semantic meanings are embedded close to each other. The embeddings jointly consider the local pattern information and the distributional information of the extractions. To overcome the challenge of lacking supervision on patterns' reliability, TruePIE can automatically generate high-quality training patterns from a couple of seed patterns by applying arity constraints to distinguish highly reliable patterns (i.e., positive patterns) from highly unreliable patterns (i.e., negative patterns). Experiments on a huge news dataset (over 25GB) demonstrate that the proposed TruePIE significantly outperforms baseline methods on each of three tasks: reliable tuple extraction, reliable pattern extraction, and negative pattern extraction.
【Keywords】: information extraction; pattern embedding; pattern reliability; textual patterns
【Paper Link】 【Pages】:1685-1694
【Authors】: Shuai Li ; Yasin Abbasi-Yadkori ; Branislav Kveton ; S. Muthukrishnan ; Vishwa Vinay ; Zheng Wen
【Abstract】: Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.
【Keywords】: click models; importance sampling; offline evaluation; ranking
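For contrast with the proposed click-model-based estimators, the basic inverse-propensity-scoring estimator of the expected number of clicks from logged data can be sketched as follows (the logging probabilities and toy policy are assumptions; the variance of this estimator grows with the number of possible lists, which is the problem the paper addresses):

    import numpy as np

    def ips_expected_clicks(logs, target_policy):
        # logs: list of (query, shown_list, clicks, logging_prob) tuples, where
        # logging_prob is the probability that the logging policy showed that list.
        # target_policy(query, shown_list) returns the probability the new policy
        # would show the same list. Standard importance-sampling estimator.
        total, n = 0.0, 0
        for query, shown, clicks, p_log in logs:
            w = target_policy(query, shown) / p_log      # importance weight
            total += w * sum(clicks)
            n += 1
        return total / n

    # Toy logged data: two candidate lists per query, logged uniformly at random.
    logs = [("q1", ("a", "b"), [1, 0], 0.5),
            ("q1", ("b", "a"), [0, 0], 0.5),
            ("q2", ("c", "d"), [1, 1], 0.5)]
    new_policy = lambda q, lst: 1.0 if lst[0] in ("a", "c") else 0.0  # deterministic reranker
    print(ips_expected_clicks(logs, new_policy))  # 2.0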
【Paper Link】 【Pages】:1695-1704
【Authors】: Yaguang Li ; Kun Fu ; Zheng Wang ; Cyrus Shahabi ; Jieping Ye ; Yan Liu
【Abstract】: One crucial task in intelligent transportation systems is estimating the duration of a potential trip given the origin location, destination location and departure time. Most existing approaches for travel time estimation assume that the route of the trip is given, which does not hold in real-world applications, since the route can change dynamically due to traffic conditions, user preferences, etc. As inferring the path from the origin and the destination can be time-consuming and error-prone, it is desirable to perform origin-destination travel time estimation, which aims to predict the travel time without online route information. This problem is challenging mainly due to the limited amount of information available and the complicated spatiotemporal dependencies. In this paper, we propose a MUlti-task Representation learning model for Arrival Time estimation (MURAT). This model produces meaningful representations that preserve various trip properties in the real world while leveraging the underlying road network and spatiotemporal prior knowledge. Furthermore, we propose a multi-task learning framework that utilizes the path information of historical trips during the training phase, which boosts performance. Experimental results on two large-scale real-world datasets show that the proposed approach achieves clear improvements over state-of-the-art methods.
【Keywords】: multi-task learning; representation learning; travel time estimation
【Paper Link】 【Pages】:1705-1714
【Authors】: Yaliang Li ; Chenglin Miao ; Lu Su ; Jing Gao ; Qi Li ; Bolin Ding ; Zhan Qin ; Kui Ren
【Abstract】: Soliciting answers from online users is an efficient and effective solution to many challenging tasks. Due to the variety in the quality of users, it is important to infer their ability to provide correct answers during aggregation. Therefore, truth discovery methods can be used to automatically capture the user quality and aggregate user-contributed answers via a weighted combination. Despite the fact that truth discovery is an effective tool for answer aggregation, existing work falls short of the protection towards the privacy of participating users. To fill this gap, we propose perturbation-based mechanisms that provide users with privacy guarantees and maintain the accuracy of aggregated answers. We first present a one-layer mechanism, in which all the users adopt the same probability to perturb their answers. Aggregation is then conducted on perturbed answers but the aggregation accuracy could drop accordingly. To improve the utility, a two-layer mechanism is proposed where users are allowed to sample their own probabilities from a hyper distribution. We theoretically compare the one-layer and two-layer mechanisms, and prove that they provide the same privacy guarantee while the two-layer mechanism delivers better utility. This advantage is brought by the fact that the two-layer mechanism can utilize the estimated user quality information from truth discovery to reduce the accuracy loss caused by perturbation, which is confirmed by experimental results on real-world datasets. Experimental results also demonstrate the effectiveness of the proposed two-layer mechanism in privacy protection with tolerable accuracy loss in aggregation.
【Keywords】: differential privacy; truth discovery; two-layer mechanism
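A minimal sketch of the one-layer idea (the flip probability, user qualities and weighting below are illustrative assumptions, not the paper's calibrated mechanisms): each user applies randomized response to a binary answer, and the server aggregates the perturbed answers with reliability weights in the style of truth discovery.

    import numpy as np

    def perturb(answer, p_keep, rng):
        # One-layer mechanism: every user keeps the true binary answer with
        # probability p_keep and flips it otherwise (randomized response).
        return answer if rng.rand() < p_keep else 1 - answer

    def weighted_aggregate(answers, weights):
        # Truth-discovery-style aggregation: weighted vote, where weights
        # reflect estimated user reliability.
        score = np.dot(weights, answers) / weights.sum()
        return int(score >= 0.5)

    rng = np.random.RandomState(0)
    truth = 1
    true_answers = [truth if rng.rand() < q else 1 - truth       # user quality q
                    for q in (0.9, 0.9, 0.8, 0.6, 0.55)]
    perturbed = np.array([perturb(a, p_keep=0.8, rng=rng) for a in true_answers])
    weights = np.array([0.9, 0.9, 0.8, 0.6, 0.55])               # e.g. estimated by truth discovery
    print("aggregated answer:", weighted_aggregate(perturbed, weights))

The two-layer mechanism described above additionally lets each user draw a personal keep probability from a hyper distribution, which the abstract argues yields better utility at the same privacy level.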
【Paper Link】 【Pages】:1715-1723
【Authors】: Yan Li ; Jieping Ye
【Abstract】: Semi-supervised learning is a branch of machine learning that aims to make full use of both labeled and unlabeled instances to improve prediction performance. The size of modern real-world datasets is ever-growing, so acquiring label information for them is extraordinarily difficult and costly. Therefore, deep semi-supervised learning is becoming more and more popular. Most existing deep semi-supervised learning methods are built under the generative-model based scheme, where the data distribution is approximated via input data reconstruction. However, this scheme does not naturally work on discrete data, e.g., text; in addition, learning a good data representation is sometimes directly opposed to the goal of learning a high-performance prediction model. To address these issues, we reformulate semi-supervised learning as a model-based reinforcement learning problem and propose an adversarial networks based framework. The proposed framework contains two networks: a predictor network for target estimation and a judge network for evaluation. The judge network iteratively generates proper rewards to guide the training of the predictor network, and the predictor network is trained via policy gradient. Based on this framework, we propose a recurrent neural network based model for semi-supervised text classification. We conduct comprehensive experimental analysis on several real-world benchmark text datasets, and the results show that our method outperforms other competing state-of-the-art methods.
【Keywords】: adversarial networks; policy gradients; semi-supervised learning; text classification
【Paper Link】 【Pages】:1724-1733
【Authors】: Yexin Li ; Yu Zheng ; Qiang Yang
【Abstract】: Bike-sharing systems are widely deployed in many major cities, but jammed and empty stations in them lead to severe customer loss. Currently, operators try to constantly reposition bikes among stations while the system is operating. However, how to reposition efficiently so as to minimize customer loss over a long period remains unsolved. We propose a spatio-temporal reinforcement learning based bike repositioning model to deal with this problem. First, an inter-independent inner-balance clustering algorithm is proposed to cluster stations into groups. The clusters obtained have two properties: each cluster is inner-balanced and independent from the others. As many trikes reposition bikes in a very large system simultaneously, clustering is necessary to reduce the problem complexity. Second, we allocate multiple trikes to each cluster to conduct inner-cluster bike repositioning. A spatio-temporal reinforcement learning model is designed for each cluster to learn a repositioning policy in it, aiming to minimize its customer loss over a long period. To learn each model, we design a deep neural network to estimate its optimal long-term value function, from which the optimal policy can be easily inferred. Besides formulating the model in a multi-agent way, we further reduce its training complexity with two spatio-temporal pruning rules. Third, we design a system simulator based on two predictors to train and evaluate the repositioning model. Experiments on real-world datasets from Citi Bike are conducted to confirm the effectiveness of our model.
【Keywords】: bike-sharing system; dynamic bike reposition; reinforcement learning
【Paper Link】 【Pages】:1734-1743
【Authors】: Zhi Li ; Hongke Zhao ; Qi Liu ; Zhenya Huang ; Tao Mei ; Enhong Chen
【Abstract】: In modern e-commerce, the behaviors of customers contain rich information, e.g., consumption habits and the dynamics of preferences. Recently, session-based recommendations have become popular for exploring the temporal characteristics of customers' interactive behaviors. However, existing works mainly exploit short-term behaviors without fully taking customers' long-term stable preferences and their evolution into account. In this paper, we propose a novel Behavior-Intensive Neural Network (BINN) for next-item recommendation by incorporating both users' historical stable preferences and present consumption motivations. Specifically, BINN contains two main components, i.e., Neural Item Embedding and Discriminative Behaviors Learning. First, a novel item embedding method based on user interactions is developed to obtain a unified representation for each item. Then, with the embedded items and the interactive behaviors over item sequences, BINN discriminatively learns the historical preferences and present motivations of the target users. Thus, BINN can better recommend next items for the target users. Finally, to evaluate the performance of BINN, we conduct extensive experiments on two real-world datasets, i.e., Tianchi and JD. The experimental results clearly demonstrate the effectiveness of BINN compared with several state-of-the-art methods.
【Keywords】: item embedding; next-item recommendation; recurrent neural networks; sequential behaviors
【Paper Link】 【Pages】:1744-1753
【Authors】: Defu Lian ; Kai Zheng ; Vincent W. Zheng ; Yong Ge ; Longbing Cao ; Ivor W. Tsang ; Xing Xie
【Abstract】: Information network embedding is an effective way for efficient graph analytics. However, it still faces computational challenges in problems such as link prediction and node recommendation, particularly with the increasing scale of networks. Hashing is a promising approach for accelerating these problems by orders of magnitude. However, no prior studies have focused on seeking binary codes for information networks that preserve high-order proximity. Since matrix factorization (MF) unifies and outperforms several well-known embedding methods with high-order proximity preserved, we propose an MF-based Information Network Hashing (INH-MF) algorithm to learn binary codes which can preserve high-order proximity. We also suggest Hamming subspace learning, which only updates partial binary codes each time, to scale up INH-MF. We finally evaluate INH-MF on four real-world information network datasets with respect to the tasks of node classification and node recommendation. The results demonstrate that INH-MF can perform significantly better than competing learning to hash baselines in both tasks, and surprisingly outperforms network embedding methods, including DeepWalk, LINE and NetMF, in the task of node recommendation. The source code of INH-MF is available online at https://github.com/DefuLian/network.
【Keywords】:
【Paper Link】 【Pages】:1754-1763
【Authors】: Jianxun Lian ; Xiaohuan Zhou ; Fuzheng Zhang ; Zhongxia Chen ; Xing Xie ; Guangzhong Sun
【Abstract】: Combinatorial features are essential for the success of many commercial models. Manually crafting these features usually comes with high cost due to the variety, volume and velocity of raw data in web-scale systems. Factorization based models, which measure interactions in terms of vector product, can learn patterns of combinatorial features automatically and generalize to unseen features as well. With the great success of deep neural networks (DNNs) in various fields, recently researchers have proposed several DNN-based factorization models to learn both low- and high-order feature interactions. Despite their powerful ability to learn an arbitrary function from data, plain DNNs generate feature interactions implicitly and at the bit-wise level. In this paper, we propose a novel Compressed Interaction Network (CIN), which aims to generate feature interactions in an explicit fashion and at the vector-wise level. We show that the CIN shares some functionalities with convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We further combine a CIN and a classical DNN into one unified model, and name this new model eXtreme Deep Factorization Machine (xDeepFM). On one hand, the xDeepFM is able to learn certain bounded-degree feature interactions explicitly; on the other hand, it can learn arbitrary low- and high-order feature interactions implicitly. We conduct comprehensive experiments on three real-world datasets. Our results demonstrate that xDeepFM outperforms state-of-the-art models. We have released the source code of xDeepFM at https://github.com/Leavingseason/xDeepFM.
【Keywords】: deep learning; factorization machines; feature interactions; neural network; recommender systems
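As a rough illustration of the vector-wise interactions that a Compressed Interaction Network computes, the following NumPy sketch implements one CIN layer. The tensor shapes and the einsum formulation are my reading of the description above, not the released xDeepFM code.

import numpy as np

def cin_layer(x0, xk, W):
    # x0: (m, D)           field embeddings of the raw input
    # xk: (H_k, D)         feature map from the previous CIN layer (x0 itself for the first layer)
    # W:  (H_next, H_k, m) learnable weights for the next feature map
    z = np.einsum('hd,md->hmd', xk, x0)      # all vector-wise (Hadamard) interactions
    return np.einsum('phm,hmd->pd', W, z)    # compress them into H_next D-dimensional feature vectors

# hypothetical usage with 4 fields, embedding size 8, and 5 output feature maps
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))
x1 = cin_layer(x0, x0, rng.normal(size=(5, 4, 4)))   # shape (5, 8)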
【Paper Link】 【Pages】:1764-1773
【Authors】: Shangsong Liang ; Xiangliang Zhang ; Zhaochun Ren ; Evangelos Kanoulas
【Abstract】: In this paper, we study the problem of dynamic user profiling in Twitter. We address the problem by proposing a dynamic user and word embedding model (DUWE), a scalable black-box variational inference algorithm, and a streaming keyword diversification model (SKDM). DUWE dynamically tracks the semantic representations of users and words over time and models their embeddings in the same space so that their similarities can be effectively measured. Our inference algorithm works with a convex objective function that ensures the robustness of the learnt embeddings. SKDM aims at retrieving top-K relevant and diversified keywords to profile users' dynamic interests. Experiments on a Twitter dataset demonstrate that our proposed embedding algorithms outperform state-of-the-art non-dynamic and dynamic embedding and topic models.
【Keywords】: dynamic model; profiling; word embeddings
【Paper Link】 【Pages】:1774-1783
【Authors】: Kaixiang Lin ; Renyu Zhao ; Zhe Xu ; Jiayu Zhou
【Abstract】: Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not only can significantly improve the utilization of transportation resources but also increase the revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including two concrete algorithms, namely contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts. We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.
【Keywords】: deep reinforcement learning; fleet management; multi-agent reinforcement learning
【Paper Link】 【Pages】:1784-1793
【Authors】: Boyang Liu ; Pang-Ning Tan ; Jiayu Zhou
【Abstract】: Multilevel modeling and multi-task learning are two widely used approaches for modeling nested (multi-level) data, which contain observations that can be clustered into groups, characterized by their group-level features. Despite the similarity of the problems they address, the explicit relationship between multilevel modeling and multi-task learning has not been carefully examined. In this paper, we present a comparative analysis between the two methods to illustrate their strengths and limitations when applied to two-level nested data. We provide a detailed analysis demonstrating the equivalence of their formulations under a mild condition from an optimization perspective. We also demonstrate their limitations in terms of their predictive performance and especially, their difficulty in identifying potential cross-scale interactions between the local and group-level features when applied to datasets with either a small number of groups or limited training examples per group. To overcome these limitations, we propose a novel method for disaggregating the coarse-scale values of the group-level features in the nested data. Experimental results on both synthetic and real-world data show that the disaggregated group-level features can help enhance the prediction accuracy of the models significantly and identify the cross-scale interactions more effectively.
【Keywords】: multi-task learning; multilevel modeling; nested data
【Paper Link】 【Pages】:1794-1802
【Authors】: Jie Liu ; Zhicheng He ; Lai Wei ; Yalou Huang
【Abstract】: This paper concerns the problem of network embedding (NE), whose aim is to learn low-dimensional representations for nodes in networks. Such dense vector representations offer great promise for many network analysis problems. However, existing NE approaches are still faced with challenges posed by the characteristics of complex networks in real-world applications. First, for many real-world networks associated with rich content information, previous NE methods tend to learn separated content and structure representations for each node, which requires a post-processing of combination. The empirical and simple combination strategies often make the final vector suboptimal. Second, the existing NE methods preserve the structure information by considering a short and fixed neighborhood scope, such as the first- and/or the second-order proximities. However, it is hard to decide the scope of the neighborhood when facing a complex problem. To this end, we propose a novel sequence-to-sequence-based NE framework referred to as the Self-Translation Network Embedding (STNE) model. With the sequences generated by random walks on a network, STNE learns the mapping that translates each sequence itself from the content sequence to the node sequence. On the one hand, the bi-directional LSTM encoder of STNE fuses the content and structure information seamlessly from the raw input. On the other hand, high-order proximity can be flexibly learned with the memories of LSTM to capture long-range structural information. By such self-translation from content to node, the learned hidden representations can be adopted as node embeddings. Extensive experimental results based on three real-world datasets demonstrate that the proposed STNE outperforms the state-of-the-art NE approaches. To facilitate reproduction and further study, we provide the code and datasets at http://dm.nankai.edu.cn/code/STNE.rar.
【Keywords】: feature learning; network embedding; sequence to sequence
【Paper Link】 【Pages】:1803-1811
【Authors】: Ninghao Liu ; Hongxia Yang ; Xia Hu
【Abstract】: Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection and fraud detection. These applications have an intrinsic adversarial nature where intelligent attackers can adaptively change their behaviors to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of applied ML models or the specific applications such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impedes them from understanding the vulnerabilities of the models or improving their robustness. To bridge the gap, in this paper, we propose to investigate whether model interpretation could potentially help adversarial detection. Specifically, we develop a novel adversary-resistant detection framework by utilizing the interpretation of ML models. The interpretation process explains the mechanism of how the target ML model makes a prediction for a given instance, thus providing more insights for crafting adversarial samples. The robustness of detectors is then improved through adversarial training with the adversarial samples. A data-driven method is also developed to empirically estimate the costs of adversaries in feature manipulation. Our approach is model-agnostic and can be applied to various types of classification models. Our experimental results on two real-world datasets demonstrate the effectiveness of interpretation-based attacks and how the estimated feature manipulation cost affects the behavior of adversaries.
【Keywords】: adversarial detection; machine learning interpretation; spammer detection
【Paper Link】 【Pages】:1812-1820
【Authors】: Ninghao Liu ; Xiao Huang ; Jundong Li ; Xia Hu
【Abstract】: Network embedding has been increasingly used in many network analytics applications to generate low-dimensional vector representations, so that many off-the-shelf models can be applied to solve a wide variety of data mining tasks. However, similar to many other machine learning methods, network embedding results remain hard for users to understand. Each dimension in the embedding space usually does not have any specific meaning, thus it is difficult to comprehend how the embedding instances are distributed in the reconstructed space. In addition, heterogeneous content information may be incorporated into network embedding, so it is challenging to specify which source of information is effective in generating the embedding results. In this paper, we investigate the interpretation of network embedding, aiming to understand how instances are distributed in embedding space, as well as to explore the factors that lead to the embedding results. We resort to the post-hoc interpretation scheme, so that our approach can be applied to different types of embedding methods. Specifically, the interpretation of network embedding is presented in the form of a taxonomy. Effective objectives and corresponding algorithms are developed towards building the taxonomy. We also design several metrics to evaluate interpretation results. Experiments on real-world datasets from different domains demonstrate that, compared with the state-of-the-art alternatives, our approach produces effective and meaningful interpretations of embedding results.
【Keywords】: machine learning interpretation; network embedding; taxonomy
【Paper Link】 【Pages】:1821-1830
【Authors】: Qi Liu ; Zai Huang ; Zhenya Huang ; Chuanren Liu ; Enhong Chen ; Yu Su ; Guoping Hu
【Abstract】: In online education systems, finding similar exercises is a fundamental task of many applications, such as exercise retrieval and student modeling. Several approaches have been proposed for this task by simply using the specific textual content (e.g. the same knowledge concepts or similar words) in exercises. However, the problem of how to systematically exploit the rich semantic information embedded in multiple heterogeneous data (e.g. texts and images) to precisely retrieve similar exercises remains pretty much open. To this end, in this paper, we develop a novel Multimodal Attention-based Neural Network (MANN) framework for finding similar exercises in large-scale online education systems by learning a unified semantic representation from the heterogeneous data. In MANN, given exercises with texts, images and knowledge concepts, we first apply a convolutional neural network to extract image representations and use an embedding layer for representing concepts. Then, we design an attention-based long short-term memory network to learn a unified semantic representation of each exercise in a multimodal way. Here, two attention strategies are proposed to capture the associations of texts and images, and texts and knowledge concepts, respectively. Moreover, with a Similarity Attention, the similar parts in each exercise pair are also measured. Finally, we develop a pairwise training strategy for returning similar exercises. Extensive experimental results on real-world data clearly validate the effectiveness and the interpretation power of MANN.
【Keywords】: heterogeneous data; online education systems; similar exercises
【Paper Link】 【Pages】:1831-1839
【Authors】: Qiao Liu ; Yifu Zeng ; Refuoe Mokhosi ; Haibin Zhang
【Abstract】: Predicting users' actions based on anonymous sessions is a challenging problem in web-based behavioral modeling research, mainly due to the uncertainty of user behavior and the limited information. Recent advances in recurrent neural networks have led to promising approaches to solving this problem, with the long short-term memory model proving effective in capturing users' general interests from previous clicks. However, none of the existing approaches explicitly take the effects of users' current actions on their next moves into account. In this study, we argue that a long-term memory model may be insufficient for modeling long sessions that usually contain user interest drift caused by unintended clicks. A novel short-term attention/memory priority model is proposed as a remedy, which is capable of capturing users' general interests from the long-term memory of a session context, whilst taking into account users' current interests from the short-term memory of the last clicks. The validity and efficacy of the proposed attention mechanism are extensively evaluated on three benchmark data sets from the RecSys Challenge 2015 and CIKM Cup 2016. The numerical results show that our model achieves state-of-the-art performance in all the tests.
【Keywords】: attention model; behavior modeling; neural networks; representation learning; session-based recommendation
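To show how short-term (last-click) and long-term (whole-session) signals can be combined by attention, here is a simplified NumPy sketch with a bilinear attention score. The exact scoring function, dimensions, and read-out used by the proposed model may differ; this is only an illustration of the general mechanism.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def session_attention(item_embs, W_att):
    # item_embs: (t, d) embeddings of the clicks in the current session, in order
    last = item_embs[-1]                  # short-term memory: the last click
    scores = item_embs @ W_att @ last     # attend to each click w.r.t. the last click
    alpha = softmax(scores)
    general = alpha @ item_embs           # attended summary of the user's general interests
    return general, last                  # both signals would feed the next-item scorer

# hypothetical usage: a session of 6 clicks with 16-dimensional item embeddings
rng = np.random.default_rng(0)
general, last = session_attention(rng.normal(size=(6, 16)), rng.normal(size=(16, 16)))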
【Paper Link】 【Pages】:1840-1849
【Authors】: Xinyue Liu ; Xiangnan Kong ; Philip S. Yu
【Abstract】: Influence maximization (IM) aims to maximize the number of users who become aware of a product by finding a set of seed users to expose in a social network. Previous IM models mainly focus on optimizing the spread of product consumption, which assumes that all users are potential customers and more exposures lead to better profit. However, in the real-world scenario, some people may not like the product and may express negative opinions after consuming, which damages the product reputation and harms the long-term profit. Only a portion of the users in the social network, called target users, are potential customers who like the product and will spread positive opinions. In this paper, we consider a problem called AcTive Opinion Maximization (ATOM), where the goal is to find a set of seed users to maximize the overall opinion spread toward a target product in a multi-round campaign. Different from previous works, we do not assume the user opinion is known before consumption; instead, it is derived from user preference data. The ATOM problem has essential applications in viral marketing, such as reputation building and precision advertising. Given its significance, the ATOM problem is profoundly challenging due to the hardness of estimating user opinion in a multi-round campaign. Moreover, the processes of opinion estimation and influence propagation intertwine with each other, which requires the model to consider the two components collectively. We propose an active learning framework called CONE (aCtive OpinioN Estimator) to address the above challenges. Experimental results on two real-world datasets demonstrate that CONE improves the total opinion spread in a social network.
【Keywords】: active learning; influence maximization; matrix factorization; opinion maximization; social networks; viral marketing
【Paper Link】 【Pages】:1850-1859
【Authors】: Yiding Liu ; Kaiqi Zhao ; Gao Cong
【Abstract】: With the proliferation of mobile devices and location-based services, rich geo-tagged data is becoming prevalent, and this offers great opportunities to understand different geographical regions (e.g., shopping areas). However, the huge number of regions with complicated spatial information is expensive for people to explore and understand. To solve this issue, we study the problem of searching similar regions given a user-specified query region. The problem is challenging in both similarity definition and search efficiency. To tackle the two challenges, we propose a novel solution equipped with (1) a deep learning approach to learning the similarity that considers both object attributes and the relative locations between objects; and (2) an efficient branch and bound search algorithm for finding top-N similar regions. Moreover, we propose an approximation method to further improve the efficiency by slightly sacrificing the accuracy. Our experiments on three real world datasets demonstrate that our solution improves both the accuracy and search efficiency by a significant margin compared with the state-of-the-art methods.
【Keywords】: metric learning; similarity search; spatial data
【Paper Link】 【Pages】:1860-1869
【Authors】: Zemin Liu ; Vincent W. Zheng ; Zhou Zhao ; Zhao Li ; Hongxia Yang ; Minghui Wu ; Jing Ying
【Abstract】: Semantic proximity search on heterogeneous graph is an important task, and is useful for many applications. It aims to measure the proximity between two nodes on a heterogeneous graph w.r.t. some given semantic relation. Prior work often tries to measure the semantic proximity by paths connecting a query object and a target object. Despite the success of such path-based approaches, they often modeled the paths in a weakly coupled manner, which overlooked the rich interactions among paths. In this paper, we introduce a novel concept of interactive paths to model the inter-dependency among multiple paths between a query object and a target object. We then propose an Interactive Paths Embedding (IPE) model, which learns low-dimensional representations for the resulting interactive-paths structures for proximity estimation. We conduct experiments on seven relations with four different types of heterogeneous graphs, and show that our model outperforms the state-of-the-art baselines.
【Keywords】: heterogeneous graph; interactive paths embedding; semantic proximity search
【Paper Link】 【Pages】:1870-1879
【Authors】: Zheng Liu ; Xing Xie ; Lei Chen
【Abstract】: Collaborator Recommendation is a useful application in exploiting big academic data. However, existing works leave out the contextual restriction (i.e., research topics) of people's academic collaboration, thus cannot recommend suitable collaborators for the required research topics. In this work, we propose Context-aware Collaborator Recommendation (CACR), which aims to recommend high-potential new collaborators for people's context-restricted requests. To this end, we design a novel recommendation framework, which consists of two fundamental components: the Collaborative Entity Embedding network (CEE) and the Hierarchical Factorization Model (HFM). In particular, CEE jointly represents researchers and research topics as compact vectors based on their co-occurrence relationships, whereby capturing researchers' context-aware collaboration tendencies and topics' underlying semantics. Meanwhile, HFM extracts researchers' activenesses and conservativenesses, which reflect their intensities of making academic collaborations and tendencies of working with non-collaborated fellows. The extracted activenesses and conservativenesses work collaboratively with the context-aware collaboration tendencies, such that high-quality recommendation can be produced. Extensive experimental studies are conducted with large-scale academic data, whose results verify the effectiveness of our proposed approaches.
【Keywords】: academic data mining; collaborator recommendation; context-aware recommendation
【Paper Link】 【Pages】:1880-1889
【Authors】: Pan Lu ; Lei Ji ; Wei Zhang ; Nan Duan ; Ming Zhou ; Jianyong Wang
【Abstract】: Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanisms. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes might be unrelated to the VQA task and have limited semantic capacities. To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build up a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset via a semantic similarity module, in which each data instance consists of an image, a corresponding question, a correct answer and a supporting relation fact. A well-defined relation detector is then adopted to predict visual question-related relation facts. We further propose a multi-step attention model composed of visual attention and semantic attention sequentially to extract related visual knowledge and semantic knowledge. We conduct comprehensive experiments on the two benchmark datasets, demonstrating that our model achieves state-of-the-art performance and verifying the benefit of considering visual relation facts.
【Keywords】: attention network; question answering; relation fact mining; semantic knowledge; visual question answering
【Paper Link】 【Pages】:1890-1899
【Authors】: Chen Luo ; Zhengzhang Chen ; Lu An Tang ; Anshumali Shrivastava ; Zhichun Li ; Haifeng Chen ; Jieping Ye
【Abstract】: The latent behavior of an information system that can exhibit extreme events, such as system faults or cyber-attacks, is complex. Recently, the invariant network has been shown to be a powerful way of characterizing complex system behaviors. Structures and evolutions of the invariant network, in particular the vanishing correlations, can shed light on identifying causal anomalies and performing system diagnosis. However, due to the dynamic and complex nature of real-world information systems, learning a reliable invariant network in a new environment often requires continuously collecting and analyzing system surveillance data for several weeks or even months. Although the invariant networks learned from old environments have some common entities and entity relationships, these networks cannot be directly borrowed for the new environment due to the domain variety problem. To avoid this prohibitively time- and resource-consuming network building process, we propose TINET, a knowledge transfer based model for accelerating invariant network construction. In particular, we first propose an entity estimation model to estimate the probability of each source domain entity being included in the final invariant network of the target domain. Then, we propose a dependency construction model for constructing the unbiased dependency relationships by solving a two-constraint optimization problem. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of TINET. We also apply TINET to a real enterprise security system for intrusion detection. TINET achieves superior detection performance with at least 20 days of lead time and more than 75% accuracy.
【Keywords】: dependency construction; enterprise security system; entity embedding; heterogeneous categorical event; invariant network; knowledge transfer
【Paper Link】 【Pages】:1900-1909
【Authors】: Luo Luo ; Wenpeng Zhang ; Zhihua Zhang ; Wenwu Zhu ; Tong Zhang ; Jian Pei
【Abstract】: Factorization Machine (FM) is a supervised machine learning model for feature engineering, which is widely used in many real-world applications. In this paper, we consider the case where the data samples arrive sequentially. The existing convex formulation for online FM has strong theoretical guarantees and stable performance in practice, but its computational cost is typically expensive when the data is high-dimensional. To address this weakness, we devise a novel online learning algorithm called Sketched Follow-The-Regularizer-Leader (SFTRL). SFTRL represents the parameters of FM implicitly by maintaining low-rank matrices and updates the parameters via sketching. More specifically, we propose Generalized Frequent Directions to approximate indefinite symmetric matrices in a streaming way, so that the sum of historical gradients for FM can be estimated efficiently with a tighter error bound. With mild assumptions, we prove that the regret bound of SFTRL is close to that of the standard FTRL. Experimental results show that SFTRL achieves better prediction quality than the state-of-the-art online FM algorithms with much lower time and space complexity.
【Keywords】: convex online learning; factorization machine; follow-the-regularized-leader; sketching
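The sketching idea can be seen in the basic Frequent Directions algorithm that the Generalized Frequent Directions above builds on. This NumPy sketch maintains an ell x d matrix whose Gram matrix approximates that of the full stream; it assumes d >= ell and covers only the standard (positive semi-definite) case, not the paper's extension to indefinite matrices.

import numpy as np

def frequent_directions(rows, ell, d):
    # rows: iterable of d-dimensional vectors arriving one at a time; returns an (ell, d) sketch B
    B = np.zeros((ell, d))
    free = list(range(ell))                      # indices of currently zero rows
    for a in rows:
        if not free:                             # sketch is full: shrink it via SVD
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            B = s[:, None] * Vt                  # at least the last row becomes zero again
            free = [i for i in range(ell) if s[i] == 0.0]
        B[free.pop()] = a
    return B

# hypothetical usage: sketch a stream of 1000 gradient-like vectors of dimension 50 into 10 directions
rng = np.random.default_rng(0)
B = frequent_directions((rng.normal(size=50) for _ in range(1000)), ell=10, d=50)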
【Paper Link】 【Pages】:1910-1919
【Authors】: Fenglong Ma ; Jing Gao ; Qiuling Suo ; Quanzeng You ; Jing Zhou ; Aidong Zhang
【Abstract】: Predicting the risk of potential diseases from Electronic Health Records (EHR) has attracted considerable attention in recent years, especially with the development of deep learning techniques. Compared with traditional machine learning models, deep learning based approaches achieve superior performance on the risk prediction task. However, none of the existing work explicitly takes prior medical knowledge (such as the relationships between diseases and corresponding risk factors) into account. In the medical domain, knowledge is usually represented by discrete and arbitrary rules. Thus, how to integrate such medical rules into existing risk prediction models to improve the performance is a challenge. To tackle this challenge, we propose a novel and general framework called PRIME for the risk prediction task, which can successfully incorporate discrete prior medical knowledge into all of the state-of-the-art predictive models using the posterior regularization technique. Different from traditional posterior regularization, we do not need to manually set a bound for each piece of prior medical knowledge when modeling the desired distribution of the target disease on patients. Moreover, the proposed PRIME can automatically learn the importance of different prior knowledge with a log-linear model. Experimental results on three real medical datasets demonstrate the effectiveness of the proposed framework for the task of risk prediction.
【Keywords】: healthcare informatics; posterior regularization; prior medical knowledge
【Paper Link】 【Pages】:1920-1929
【Authors】: Jianxin Ma ; Peng Cui ; Xiao Wang ; Wenwu Zhu
【Abstract】: Network embedding learns low-dimensional representations for vertices, while preserving the inter-vertex similarity reflected by the network structure. The neighborhood structure of a vertex is usually closely related to an underlying hierarchical taxonomy: the vertices are associated with successively broader categories that can be organized hierarchically. Categories at different levels reflect similarity of different granularity. The hierarchy of the taxonomy therefore requires that the learned representations support multiple levels of granularity. Moreover, the hierarchical taxonomy enables information to flow between vertices via their common categories, and thus provides an effective mechanism for alleviating data scarcity. However, incorporating the hierarchical taxonomy into network embedding poses a great challenge (since the taxonomy is generally unknown), and it is neglected by the existing approaches. In this paper, we propose NetHiex, a NETwork embedding model that captures the latent HIErarchical taXonomy. In our model, a vertex representation consists of multiple components that are associated with categories of different granularity. The representations of both the vertices and the categories are co-regularized. We employ the nested Chinese restaurant process to guide the search for the most plausible hierarchical taxonomy. The network structure is then recovered from the latent representations via a Bernoulli distribution. The whole model is unified within a nonparametric probabilistic framework. A scalable expectation-maximization algorithm is derived for optimization. Empirical results demonstrate that NetHiex achieves significant performance gains over the state-of-the-art.
【Keywords】: hierarchical taxonomy; nested chinese restaurant process; network embedding; network representation learning
【Paper Link】 【Pages】:1930-1939
【Authors】: Jiaqi Ma ; Zhe Zhao ; Xinyang Yi ; Jilin Chen ; Lichan Hong ; Ed H. Chi
【Abstract】: Neural-based multi-task learning has been successfully used in many real-world large-scale applications such as recommendation systems. For example, in movie recommendations, beyond providing users with movies they tend to purchase and watch, the system might also optimize for users liking the movies afterwards. With multi-task learning, we aim to build a single model that learns these multiple goals and tasks simultaneously. However, the prediction quality of commonly used multi-task models is often sensitive to the relationships between tasks. It is therefore important to study the modeling tradeoffs between task-specific objectives and inter-task relationships. In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data. We adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing the expert submodels across all tasks, while also having a gating network trained to optimize each task. To validate our approach on data with different levels of task relatedness, we first apply it to a synthetic dataset where we control the task relatedness. We show that the proposed approach performs better than baseline methods when the tasks are less related. We also show that the MMoE structure results in an additional trainability benefit, depending on different levels of randomness in the training data and model initialization. Furthermore, we demonstrate the performance improvements by MMoE on real tasks including a binary classification benchmark, and a large-scale content recommendation system at Google.
【Keywords】: mixture of experts; multi-task learning; neural network; recommendation system
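A minimal NumPy sketch of the Multi-gate Mixture-of-Experts forward pass described above: shared experts, one softmax gate per task, and task-specific towers. The layer sizes, tanh activations, and linear towers are illustrative choices, not the configuration used in the paper.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    # x: (d,) input; expert_ws: list of (d, h) shared experts;
    # gate_ws: one (d, n_experts) gate per task; tower_ws: one (h,) tower per task
    experts = np.stack([np.tanh(x @ W) for W in expert_ws])   # (n_experts, h)
    outputs = []
    for Wg, Wt in zip(gate_ws, tower_ws):
        gate = softmax(x @ Wg)                                # task-specific mixture weights
        outputs.append(float((gate @ experts) @ Wt))          # task tower on the mixed experts
    return outputs

# hypothetical usage: 3 experts shared by 2 tasks
rng = np.random.default_rng(0)
d, h, n_experts, n_tasks = 8, 4, 3, 2
y = mmoe_forward(rng.normal(size=d),
                 [rng.normal(size=(d, h)) for _ in range(n_experts)],
                 [rng.normal(size=(d, n_experts)) for _ in range(n_tasks)],
                 [rng.normal(size=h) for _ in range(n_tasks)])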
【Paper Link】 【Pages】:1953-1962
【Authors】: Chaitanya Manapragada ; Geoffrey I. Webb ; Mahsa Salehi
【Abstract】: We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree---"Extremely Fast Decision Tree'', a minor modification to the MOA implementation of Hoeffding Tree---obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.
【Keywords】: classification; decision trees; incremental learning
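Both Hoeffding Tree and Hoeffding Anytime Tree rest on the same statistical test: accept a split (or, for the anytime variant, replace the split currently in use) only when the observed gain advantage exceeds a Hoeffding bound. A small sketch of that test follows; the tie-breaking threshold and the example numbers are illustrative, not the MOA defaults.

import math

def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the observed mean of n i.i.d. values lies within
    # this epsilon of the true mean (values bounded in an interval of width value_range).
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def prefer_new_split(gain_candidate, gain_incumbent, value_range, delta, n, tau=0.05):
    # Hoeffding Tree compares the two best candidate attributes; Hoeffding Anytime Tree
    # additionally compares the best candidate against the split currently in use.
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_candidate - gain_incumbent > eps) or (eps < tau)

# e.g. with information gain (range about 1 bit for binary classes) after 500 examples at a node
print(prefer_new_split(0.42, 0.30, value_range=1.0, delta=1e-7, n=500))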
【Paper Link】 【Pages】:1963-1972
【Authors】: Emaad A. Manzoor ; Hemank Lamba ; Leman Akoglu
【Abstract】: This work addresses the outlier detection problem for feature-evolving streams, which has not been studied before. In this setting both (1) data points may evolve, with feature values changing, as well as (2) feature space may evolve, with newly-emerging features over time. This is notably different from row-streams, where points with fixed features arrive one at a time. We propose a density-based ensemble outlier detector, called xStream, for this more extreme streaming setting which has the following key properties: (1) it is a constant-space and constant-time (per incoming update) algorithm, (2) it measures outlierness at multiple scales or granularities, and it can handle (3i) high-dimensionality through distance-preserving projections and (3ii) non-stationarity via O(1)-time model updates as the stream progresses. In addition, xStream can address the outlier detection problem for the (less general) disk-resident static as well as row-streaming settings. We evaluate xStream rigorously on numerous real-life datasets in all three settings: static, row-stream, and feature-evolving stream. Experiments under static and row-streaming scenarios show that xStream is as competitive as state-of-the-art detectors and particularly effective in high-dimensions with noise. We also demonstrate that our solution is fast and accurate with modest space overhead for evolving streams, on which there exists no competition.
【Keywords】: anomaly detection; data streams; evolving feature spaces; outlier detection
【Paper Link】 【Pages】:1973-1982
【Authors】: Dominik Mautz ; Wei Ye ; Claudia Plant ; Christian Böhm
【Abstract】: A huge object collection in high-dimensional space can often be clustered in more than one way, for instance, objects could be clustered by their shape or alternatively by their color. Each grouping represents a different view of the data set. The new research field of non-redundant clustering addresses this class of problems. In this paper, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically, as well as in extensive experiments.
【Keywords】: clustering; k-means; non-redundant; subspace
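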
【Paper Link】 【Pages】:1983-1992
【Authors】: Denis Moreira dos Reis ; André Gustavo Maletzke ; Diego Furtado Silva ; Gustavo E. A. P. A. Batista
【Abstract】: Many real-world applications in the batch and data stream settings with data shift pose restrictions to the access to class labels after the deployment of a classification or quantification model. However, a significant portion of the data stream literature assumes that actual labels are instantaneously available after issuing their corresponding classifications. In this paper, we explore a different set of assumptions without relying on the availability of class labels. We assume that, although the distribution of the data may change over time, it will switch between one of a handful of well-known distributions. Still, we allow the proportions of the classes to vary. In these conditions, we propose the first method that can accurately identify the correct context of data samples and simultaneously estimate the proportion of the positive class. This estimate can be further used to adjust a classification decision threshold and improve classification accuracy. Finally, the method is very efficient regarding time and memory requirements, fitting data stream applications.
【Keywords】: classification; concept drift; counting; dataset shift; quantification; recurrent contexts
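One standard way to turn a classifier into the positive-class proportion estimate referred to above is adjusted classify-and-count, which corrects the raw predicted-positive rate using the classifier's true and false positive rates estimated in the matching context. The sketch below is the textbook estimator, shown only to illustrate the idea; it is not necessarily the exact estimator proposed in the paper.

import numpy as np

def adjusted_classify_and_count(scores, threshold, tpr, fpr):
    # scores: classifier scores on unlabeled target data; tpr / fpr: estimated on
    # labeled data from the context the stream is currently believed to be in
    cc = float(np.mean(scores >= threshold))        # raw predicted-positive rate
    if tpr - fpr <= 1e-12:
        return cc                                   # degenerate classifier: no correction possible
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

# hypothetical usage: the estimate can then be used to re-tune the classification threshold
p_pos = adjusted_classify_and_count(np.array([0.9, 0.2, 0.7, 0.4, 0.8]), 0.5, tpr=0.85, fpr=0.10)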
【Paper Link】 【Pages】:1993-2002
【Authors】: Gyoung S. Na ; Dong Hyun Kim ; Hwanjo Yu
【Abstract】: With the rapidly growing demand for detecting outliers in data streams, many studies have aimed to extend the well-known outlier detection algorithm Local Outlier Factor (LOF) to data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) a large amount of memory space is required, and 2) a long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF, that effectively overcomes these limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposed algorithms do not require any prior knowledge or assumptions about the data distribution. Moreover, we accelerate the execution time of DILOF by about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.
【Keywords】: data streams; density-based sampling; outlier detection
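For reference, the batch Local Outlier Factor that DILOF adapts to streams is available off the shelf; the snippet below scores a small synthetic batch with scikit-learn. The streaming summarization and the sequence-of-outliers strategy contributed by the paper are not shown here.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),    # inlier cloud
               rng.normal(6.0, 0.3, size=(5, 2))])     # a small group of outliers
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                 # -1 for outliers, 1 for inliers
scores = -lof.negative_outlier_factor_      # larger means more outlying
print(int((labels == -1).sum()), "points flagged")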
【Paper Link】 【Pages】:2003-2011
【Authors】: Khanh Nguyen ; Trung Le ; Tu Dinh Nguyen ; Dinh Q. Phung ; Geoffrey I. Webb
【Abstract】: Kernel methods are powerful supervised machine learning models thanks to their strong generalization ability, which allows them to generalize effectively to unseen data even when trained on limited data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) - a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
【Keywords】: bayesian inference; big data; kernel methods; multiclass supervised learning; random feature; stein divergence; variational method
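The "random feature" ingredient listed in the keywords can be illustrated with classic random Fourier features, which approximate the RBF kernel with an explicit low-dimensional map and are one common way to sidestep the curse of kernelization. This generic sketch is not the paper's specific construction.

import numpy as np

def random_fourier_features(X, n_features, gamma, seed=0):
    # Approximates k(x, y) = exp(-gamma * ||x - y||^2) by z(x) . z(y)
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# hypothetical sanity check: the feature Gram matrix should be close to the exact RBF kernel
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5)
approx_kernel = Z @ Z.T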
【Paper Link】 【Pages】:2012-2021
【Authors】: Feiping Nie ; Zhanxuan Hu ; Xuelong Li
【Abstract】: This paper proposes a novel algorithm, named Non-Convex Calibrated Multi-Task Learning (NC-CMTL), for learning multiple related regression tasks jointly. Instead of utilizing the nuclear norm, NC-CMTL adopts a non-convex low rank regularizer to explore the shared information among different tasks. In addition, considering that the regularization parameter for each regression task depends on its noise level, we replace the least squares loss function by a square-root loss function. Computationally, as the proposed model has a nonsmooth loss function and a non-convex regularization term, we construct an efficient re-weighted method to optimize it. Theoretically, we first present the convergence analysis of the constructed method, and then prove that the derived solution is a stationary point of the original problem. Particularly, the regularizer and optimization method used in this paper are also suitable for other rank minimization problems. Numerical experiments on both synthetic and real data illustrate the advantages of NC-CMTL over several state-of-the-art methods.
【Keywords】: clustering; dimensionality reduction; multi-task learning
【Paper Link】 【Pages】:2022-2030
【Authors】: Feiping Nie ; Lai Tian ; Xuelong Li
【Abstract】: In this paper, we make a multiview extension of the spectral rotation technique raised in single view spectral clustering research. Since spectral rotation is closely related to Procrustes Analysis for points matching, we point out that the classical Procrustes Average approach can be used for multiview clustering. Besides, we show that directly applying Procrustes Average (PA) to multiview tasks may not be optimal theoretically and empirically, since it does not take the clustering capacity differences of different views into consideration. To overcome this deficiency, we propose an Adaptively Weighted Procrustes (AWP) approach. Our new AWP weights views with their clustering capacities and forms a weighted Procrustes Average problem accordingly. The computational complexity of the optimization algorithm for the new model is analyzed and its convergence is guaranteed. Experiments on five real-world datasets demonstrate the effectiveness and efficiency of the new models.
【Keywords】: clustering; multiview data; procrustes analysis
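The building block here is the orthogonal Procrustes problem: finding the rotation that best aligns one view's spectral embedding with a common target. The NumPy sketch below shows that solution and a weighted Procrustes average with fixed weights; AWP's contribution is that the view weights are derived from the views' clustering capacities rather than fixed as in this illustration.

import numpy as np

def orthogonal_procrustes(Y, G):
    # Rotation R minimizing ||Y R - G||_F  (Y, G of shape (n, k))
    U, _, Vt = np.linalg.svd(Y.T @ G)
    return U @ Vt

def weighted_procrustes_average(views, weights, n_iter=20):
    # views: list of (n, k) spectral embeddings, one per view; weights: per-view scalars
    G = views[0].copy()
    for _ in range(n_iter):
        rotated = [Y @ orthogonal_procrustes(Y, G) for Y in views]
        G = sum(w * R for w, R in zip(weights, rotated)) / sum(weights)
    return G

# hypothetical usage with two noisy copies of the same embedding
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
views = [base + 0.05 * rng.normal(size=base.shape) for _ in range(2)]
G = weighted_procrustes_average(views, weights=[1.0, 1.0])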
【Paper Link】 【Pages】:2031-2040
【Authors】: Chaoyue Niu ; Zhenzhe Zheng ; Fan Wu ; Shaojie Tang ; Xiaofeng Gao ; Guihai Chen
【Abstract】: With the commoditization of personal privacy, pricing private data has become an intriguing problem. In this paper, we study noisy aggregate statistics trading from the perspective of a data broker in data markets. We thus propose ERATO, which enables aggrEgate statistics pRicing over privATe cOrrelated data. On one hand, ERATO guarantees arbitrage freeness against cunning data consumers. On the other hand, ERATO compensates data owners for their privacy losses using both bottom-up and top-down designs. We further apply ERATO to three practical aggregate statistics, namely weighted sum, probability distribution fitting, and degree distribution, and extensively evaluate their performances on MovieLens dataset, 2009 RECS dataset, and two SNAP large social network datasets, respectively. Our analysis and evaluation results reveal that ERATO well balances utility and privacy, achieves arbitrage freeness, and compensates data owners more fairly than differential privacy based approaches.
【Keywords】: data correlation; data privacy; data trading
【Paper Link】 【Pages】:2041-2050
【Authors】: Guansong Pang ; Longbing Cao ; Ling Chen ; Huan Liu
【Abstract】: Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands/millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving the data regularity information and learning the representations independently of subsequent outlier detection methods, which can result in suboptimal and unstable performance of detecting irregularities (i.e., outliers). This paper introduces a ranking model-based framework, called RAMODO, to address this issue. RAMODO unifies representation learning and outlier detection to learn low-dimensional representations that are tailored for a state-of-the-art outlier detection approach - the random distance-based approach. This customized learning yields more optimal and stable representations for the targeted outlier detectors. Additionally, RAMODO can leverage little labeled data as prior knowledge to learn more expressive and application-relevant representations. We instantiate RAMODO to an efficient method called REPEN to demonstrate the performance of RAMODO. Extensive empirical results on eight real-world ultrahigh dimensional data sets show that REPEN (i) enables a random distance-based detector to obtain significantly better AUC performance and two orders of magnitude speedup; (ii) performs substantially better and more stably than four state-of-the-art representation learning methods; and (iii) leverages less than 1% labeled data to achieve up to 32% AUC improvement.
【Keywords】: anomaly detection; dimension reduction; high-dimensional data; outlier detection; prior knowledge; representation learning; ultrahigh-dimensional data
【Paper Link】 【Pages】:2051-2059
【Authors】: Himchan Park ; Min-Soo Kim
【Abstract】: Nowadays, many researchers and industry groups often suffer from the lack of a variety of large-scale real graphs. Although many synthetic graph generation methods (or models) such as R-MAT and BA have been developed, their output graphs tend to be quite different from real-world graphs in terms of graph properties. Although there are a few graph upscaling methods such as Gscaler, they still fail to preserve important properties of the original graph, or fail to upscale at all due to running out of memory or excessive runtime. In this paper, we propose a novel graph upscaling method called EvoGraph that can upscale the original graph while preserving its properties regardless of the scale factor. It determines and attaches new edges to the real graph using the preferential attachment mechanism in an effective and efficient way. Through extensive experiments, we have demonstrated that EvoGraph significantly outperforms the state-of-the-art graph upscaling method Gscaler in terms of preserving graph properties and performance measures such as runtime, memory usage, and scalability.
【Keywords】: barabasi-albert; graph generation; graph upscaling; parallel computation; preferential attachment
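The preferential attachment mechanism mentioned above can be sketched in a few lines: when attaching new edges, sample existing endpoints with probability proportional to their degree. This is the classic BA-style step, shown only to illustrate the mechanism; EvoGraph's actual edge-determination and parallelization are more involved.

import random

def grow_by_preferential_attachment(edges, n_new_nodes, seed=0):
    # edges: list of (u, v) pairs of the original graph, with integer node ids
    rng = random.Random(seed)
    pool = [v for e in edges for v in e]     # each node appears in proportion to its degree
    next_id = max(pool) + 1
    new_edges = []
    for i in range(n_new_nodes):
        u = next_id + i                      # a newly added node ...
        v = rng.choice(pool)                 # ... attaches preferentially to high-degree nodes
        new_edges.append((u, v))
        pool.extend((u, v))
    return edges + new_edges

# hypothetical usage: add 100 nodes to a toy seed graph
upscaled = grow_by_preferential_attachment([(0, 1), (1, 2), (2, 0)], 100)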
【Paper Link】 【Pages】:2060-2069
【Authors】: Georgina Peake ; Jun Wang
【Abstract】: The wide-scale use of machine learning algorithms to drive decision-making has highlighted the critical importance of ensuring the interpretability of such models in order to engender trust in their output. The state-of-the-art recommendation systems use black-box latent factor models that provide no explanation of why a recommendation has been made, as they abstract their decision processes to a high-dimensional latent space which is beyond the direct comprehension of humans. We propose a novel approach for extracting explanations from latent factor recommendation systems by training association rules on the output of a matrix factorisation black-box model. By taking advantage of the interpretable structure of association rules, we demonstrate that the predictive accuracy of the recommendation model can be maintained whilst yielding explanations with high fidelity to the black-box model on a unique industry dataset. Our approach mitigates the accuracy-interpretability trade-off whilst avoiding the need to sacrifice flexibility or use external data sources. We also contribute to the ill-defined problem of evaluating interpretability.
【Keywords】: association rules; black-box; explanations; interpretability; latent factor models; recommendation systems; white-box
【Paper Link】 【Pages】:2070-2079
【Authors】: Leonardo Pellegrina ; Fabio Vandin
【Abstract】: The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We study a variant of the problem that requires to mine the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output and provide theoretical evidence of its effectiveness. TopKWY crucially relies on a novel strategy to explore statistically significant patterns and on several key implementation choices, which may be of independent interest. Our extensive experimental evaluation shows that TopKWY enables the extraction of the most significant patterns from large datasets which could not be analyzed by the state-of-the-art. In addition, TopKWY improves over the state-of-the-art even for the extraction of all significant patterns.
【Keywords】: hypothesis testing; statistical pattern mining; top-k patterns
【Paper Link】 【Pages】:2080-2089
【Authors】: Ioakeim Perros ; Evangelos E. Papalexakis ; Haesun Park ; Richard W. Vuduc ; Xiaowei Yan ; Christopher deFilippi ; Walter F. Stewart ; Jimeng Sun
【Abstract】: This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set. These scores are easy to interpret: a score of zero indicates no feature contribution and higher scores indicate distinct levels of feature importance. At its core, SUSTain relies on: a) a problem partitioning into integer-constrained subproblems, so that they can be optimally solved in an efficient manner; and b) organizing the order of the subproblems' solution, to promote reuse of shared intermediate results. We propose two variants, SUSTain_M and SUSTain_T, to handle both matrix and tensor inputs, respectively. We evaluate SUSTain against several state-of-the-art baselines on both synthetic and real Electronic Health Record (EHR) datasets. Comparing to those baselines, SUSTain shows either significantly better fit or orders of magnitude speedups that achieve a comparable fit (up to 425× faster). We apply SUSTain to EHR datasets to extract patient phenotypes (i.e., clinically meaningful patient clusters). Furthermore, 87% of them were validated as clinically meaningful phenotypes related to heart failure by a cardiologist.
【Keywords】: matrix factorization; phenotyping; tensor factorization; unsupervised learning
【Paper Link】 【Pages】:2090-2099
【Authors】: Jean Pouget-Abadie ; Vahab S. Mirrokni ; David C. Parkes ; Edoardo M. Airoldi
【Abstract】: Cluster-based randomized experiments are popular designs for mitigating the bias of standard estimators when interference is present and classical causal inference and experimental design assumptions (such as SUTVA or ITR) do not hold. Without exact knowledge of the interference structure, it can be challenging to understand which partitioning of the experimental units is optimal to minimize the estimation bias. In this paper, we introduce a monotonicity condition under which a novel two-stage experimental design allows us to determine which of two cluster-based designs yields the least biased estimator. We then consider the setting of online advertising auctions and show that reserve price experiments satisfy the monotonicity condition, so the proposed framework and methodology apply. We validate our findings on an advertising auction dataset.
【Keywords】: causal inference; potential outcomes; violations of sutva
【Paper Link】 【Pages】:2100-2109
【Authors】: Abdulhakim Ali Qahtan ; Ahmed K. Elmagarmid ; Raul Castro Fernandez ; Mourad Ouzzani ; Nan Tang
【Abstract】: Missing values are common in real-world data and may seriously affect data analytics such as simple statistics and hypothesis testing. Generally speaking, there are two types of missing values: explicitly missing values (i.e. NULL values), and implicitly missing values (a.k.a. disguised missing values (DMVs)) such as "11111111" for a phone number and "Some college" for education. While detecting explicitly missing values is trivial, detecting DMVs is not; the essential challenge is the lack of standardization about how DMVs are generated. In this paper, we present FAHES, a robust system for detecting DMVs from two angles: DMVs as detectable outliers and as detectable inliers. For DMVs as outliers, we propose a syntactic outlier detection module for categorical data, and a density-based outlier detection module for numerical values. For DMVs as inliers, we propose a method that detects DMVs which follow either missing-completely-at-random or missing-at-random models. The robustness of FAHES is achieved through an ensemble technique that is inspired by outlier ensembles. Our extensive experiments using real-world data sets show that FAHES delivers better results than existing solutions.
【Keywords】: disguised missing value; numerical outliers; syntactic outliers; syntactic patterns
【Paper Link】 【Pages】:2110-2119
【Authors】: Jiezhong Qiu ; Jian Tang ; Hao Ma ; Yuxiao Dong ; Kuansan Wang ; Jie Tang
【Abstract】: Social and information networking activities such as on Facebook, Twitter, WeChat, and Weibo have become an indispensable part of our everyday life, where we can easily access friends' behaviors and are in turn influenced by them. Consequently, an effective social influence prediction for each user is critical for a variety of applications such as online recommendation and advertising. Conventional social influence prediction approaches typically design various hand-crafted rules to extract user- and network-specific features. However, their effectiveness heavily relies on the knowledge of domain experts. As a result, it is usually difficult to generalize them into different domains. Inspired by the recent success of deep neural networks in a wide range of computing applications, we design an end-to-end framework, DeepInf, to learn users' latent feature representation for predicting social influence. In general, DeepInf takes a user's local network as the input to a graph neural network for learning her latent social representation. We design strategies to incorporate both network structures and user-specific features into convolutional neural and attention networks. Extensive experiments on Open Academic Graph, Twitter, Weibo, and Digg, representing different types of social and information networks, demonstrate that the proposed end-to-end model, DeepInf, significantly outperforms traditional feature engineering-based approaches, suggesting the effectiveness of representation learning for social applications.
【Keywords】: graph attention; graph convolution; network embedding; representation learning; social influence; social networks
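As context for the "graph convolution" keyword above, here is a minimal dense NumPy version of one symmetric-normalized graph-convolution layer applied to a user's local network. DeepInf also explores graph attention layers, and its actual input construction (ego networks, user-specific features) is not reproduced in this sketch.

import numpy as np

def gcn_layer(A, X, W):
    # A: (n, n) adjacency of the local network, X: (n, f) node features, W: (f, f_out) weights
    A_hat = A + np.eye(A.shape[0])                                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]    # D^-1/2 (A + I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)                        # propagate, transform, ReLU

# hypothetical usage on a 4-node ego network with 5 input features and 8 hidden units
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
H = gcn_layer(A, rng.normal(size=(4, 5)), rng.normal(size=(5, 8)))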
【Paper Link】 【Pages】:2120-2129
【Authors】: Reihaneh Rabbany ; David Bayani ; Artur Dubrawski
【Abstract】: How can we help an investigator to efficiently connect the dots and uncover the network of individuals involved in a criminal activity based on the evidence of their connections, such as visiting the same address, or transacting with the same bank account? We formulate this problem as Active Search of Connections, which finds target entities that share evidence of different types with a given lead, where their relevance to the case is queried interactively from the investigator. We present RedThread, an efficient solution for inferring related and relevant nodes while incorporating the user's feedback to guide the inference. Our experiments focus on case building for combating human trafficking, where the investigator follows leads to expose organized activities, i.e. different escort advertisements that are connected and possibly orchestrated. RedThread is a local algorithm and enables online case building when mining millions of ads posted in one of the largest classified advertising websites. The results of RedThread are interpretable, as they explain how the results are connected to the initial lead. We experimentally show that RedThread learns the importance of the different types and different pieces of evidence, while the former could be transferred between cases.
【Keywords】: active learning; graph construction; link inference
【Paper Link】 【Pages】:2130-2139
【Authors】: Matteo Riondato ; Fabio Vandin
【Abstract】: We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures that makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which results in small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and vastly speeds up the discovery of subgroups compared to analyzing the whole dataset.
【Keywords】: pattern mining; statistical learning theory
【Paper Link】 【Pages】:2140-2149
【Authors】: Mrinmaya Sachan ; Eric P. Xing
【Abstract】: This paper introduces Parsing to Programs, a framework that combines ideas from parsing and probabilistic programming for situated question answering. As a case study, we build a system that solves pre-university level Newtonian physics questions. Our approach represents domain knowledge of Newtonian physics as programs. When presented with a novel question, the system learns a formal representation of the question by combining interpretations from the question text and any associated diagram. Finally, the system uses this formal representation to solve the questions using the domain knowledge. We collect a new dataset of Newtonian physics questions from a number of textbooks and use it to train our system. The system achieves near human performance on held-out textbook questions and section 1 of AP Physics C mechanics - both on practice questions as well as on freely available actual exams held in 1998 and 2012.
【Keywords】: diagram parsing; expert systems; question answering; semantic parsing
【Paper Link】 【Pages】:2150-2159
【Authors】: Seyed-Vahid Sanei-Mehri ; Ahmet Erdem Sariyüce ; Srikanta Tirthapura
【Abstract】: We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.
【Keywords】: bipartite networks; butterfly; graph algorithms; motif counting; randomized algorithms
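As a point of reference for the randomized estimators above, the butterfly count of a small bipartite graph can be computed exactly by enumerating pairs of right-side vertices and counting their common left-side neighbours. The sketch below is a minimal exact baseline in Python, not the paper's sampling algorithms; the edge list and vertex names are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def count_butterflies(edges):
    """Count butterflies (2x2 bicliques) in a bipartite graph given as (left, right) edges."""
    adj = defaultdict(set)                     # right-neighbours of each left vertex
    for u, v in edges:
        adj[u].add(v)
    # Each pair of right vertices sharing c left neighbours contributes C(c, 2) butterflies.
    pair_common = defaultdict(int)
    for nbrs in adj.values():
        for v1, v2 in combinations(sorted(nbrs), 2):
            pair_common[(v1, v2)] += 1
    return sum(comb(c, 2) for c in pair_common.values())

edges = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 2)]
print(count_butterflies(edges))                # one butterfly: {a, b} x {1, 2}
```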
【Paper Link】 【Pages】:2160-2169
【Authors】: Seiya Satoh ; Yoshinobu Takahashi ; Hiroshi Yamakawa
【Abstract】: Equivalence structure (ES) extraction allows for finding correspondence relations between different sequential datasets. A K-dimensional ES is a set of K-tuples that specifies K-dimensional sequences considered equivalent. Whether or not two K-dimensional sequences are equivalent is decided based on comparisons of all of their subsequences. ES extraction can be used as preprocessing for transfer learning or imitation learning, as well as for the analysis of multidimensional sequences. A recently proposed method called incremental search (IS) was much faster than brute-force search. However, IS can still take a long time to obtain ESs, because ESs obtained by IS can be subsets of other ESs and such subsets must be removed in the process. In this paper, we propose a new fast method called pairwise incremental search (PIS). In the process of PIS, the aforementioned problem about subsets of ESs does not exist, because the elements of ESs are searched pairwise. As shown by the results of two experiments we conducted, PIS was 48 times faster than IS in an experiment using synthetic datasets and 171 times faster in an experiment using motion capture datasets.
【Keywords】: equivalence structure; imitation learning; transfer learning
【Paper Link】 【Pages】:2170-2179
【Authors】: Ying Shan ; Jian Jiao ; Jie Zhu ; J. C. Mao
【Abstract】: Rapid advances in GPU hardware and multiple areas of Deep Learning open up a new opportunity for billion-scale information retrieval with exhaustive search. Building on top of the powerful concept of semantic learning, this paper proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval. The model has the unique ability to refine a base binary vector by progressively adding binary residual vectors to meet the desired accuracy. The refined vector enables efficient implementation of exhaustive similarity computation with bit-wise operations, followed by a near-lossless k-NN selection algorithm, also proposed in this paper. The proposed algorithms are integrated into an end-to-end multi-GPU system that retrieves thousands of top items from over a billion candidates in real-time. The RBE model and the retrieval system were evaluated with data from a major paid search engine. When measured against the state-of-the-art model for binary representation and the full precision model for semantic embedding, RBE significantly outperformed the former, and filled in over 80% of the AUC gap in-between. Experiments comparing with our production retrieval system also demonstrated superior performance. While the primary focus of this paper is to build RBE based on a particular class of semantic models, generalizing to other types is straightforward, as exemplified by two different models at the end of the paper.
【Keywords】: binary network; cdssm; cntk; deep learning; deep neural network (dnn); gpu; information retrieval; k-nn; semantic embedding; sponsored search
【Paper Link】 【Pages】:2180-2189
【Authors】: Jiaming Shen ; Zeqiu Wu ; Dongming Lei ; Chao Zhang ; Xiang Ren ; Michelle T. Vanni ; Brian M. Sadler ; Jiawei Han
【Abstract】: Taxonomies are of great value to many knowledge-rich applications. Since manual taxonomy curation requires enormous human effort, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the is-a relation. Such a restriction limits their applicability to more diverse real-world tasks where parent-child pairs may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus, and allow users to input a seed taxonomy that serves as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates a key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node as forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly-expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.
【Keywords】: hierarchical tree expansion; set expansion; taxonomy construction; weakly-supervised relation extraction
【Paper Link】 【Pages】:2190-2199
【Authors】: Yu Shi ; Qi Zhu ; Fang Guo ; Chao Zhang ; Jiawei Han
【Abstract】: Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also potentially incompatible semantics, which poses special challenges to embedding learning in HINs. With the intention to preserve the rich yet potentially incompatible information in HIN embedding, we propose to study the problem of comprehensive transcription of heterogeneous information networks. The comprehensive transcription of HINs also provides an easy-to-use approach to unleash the power of HINs, since it requires no additional supervision, expertise, or feature engineering. To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics. To corroborate the efficacy of HEER, we conducted experiments on two large-scale real-world datasets with an edge reconstruction task and multiple case studies. Experiment results demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics. The code and data are available at https://github.com/GentleZhu/HEER.
【Keywords】: graph mining; heterogeneous information networks; network embedding; representation learning
【Paper Link】 【Pages】:2200-2209
【Authors】: Md Amran Siddiqui ; Alan Fern ; Thomas G. Dietterich ; Ryan Wright ; Alec Theriault ; David W. Archer
【Abstract】: Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. This can be exceedingly difficult and time consuming when most high-ranking anomalies are false positives and not interesting from an application perspective. In this paper, we study how to reduce the analyst's effort by incorporating their feedback about whether the anomalies they investigate are of interest or not. In particular, the feedback will be used to adjust the anomaly ranking after every analyst interaction, ideally moving anomalies of interest closer to the top. Our main contribution is to formulate this problem within the framework of online convex optimization, which yields an efficient and extremely simple approach to incorporating feedback compared to the prior state-of-the-art. We instantiate this approach for the powerful class of tree-based anomaly detectors and conduct experiments on a range of benchmark datasets. The results demonstrate the utility of incorporating feedback and advantages of our approach over the state-of-the-art. In addition, we present results on a significant cybersecurity application where the goal is to detect red-team attacks in real system audit data. We show that our approach for incorporating feedback is able to significantly reduce the time required to identify malicious system entities across multiple attacks on multiple operating systems.
【Keywords】: anomaly detection; anomaly detection feedback; anomaly detection on security; feedback in linear model; online convex optimization
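The abstract above describes re-ranking anomalies by incorporating analyst feedback through online convex optimization over a linear scoring function. The toy sketch below illustrates that loop under strong simplifying assumptions: scores are a weighted sum of (here random) tree-derived features, and each analyst label triggers a projected gradient step. It is not the authors' algorithm; `leaf_features`, `is_interesting`, and the learning rate are hypothetical.

```python
import numpy as np

def feedback_loop(leaf_features, is_interesting, rounds=10, lr=0.5):
    """Toy feedback loop: score = w . phi(x); after the analyst labels the top-ranked
    unseen instance, take a gradient step that pushes uninteresting instances down and
    interesting ones up, then project the weights back onto the simplex-like region."""
    n, d = leaf_features.shape
    w = np.ones(d) / d                         # uniform weights ~ the unmodified detector
    shown = set()
    for _ in range(rounds):
        scores = leaf_features @ w
        top = max((i for i in range(n) if i not in shown), key=lambda i: scores[i])
        shown.add(top)
        grad = -leaf_features[top] if is_interesting[top] else leaf_features[top]
        w = np.clip(w - lr * grad, 0.0, None)  # keep weights non-negative
        w /= w.sum() or 1.0
    return w

rng = np.random.default_rng(0)
phi = rng.random((20, 5))                      # stand-in for tree-leaf features
labels = rng.random(20) < 0.2                  # stand-in for analyst feedback
print(feedback_loop(phi, labels))
```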
【Paper Link】 【Pages】:2210-2218
【Authors】: Alban Siffer ; Pierre-Alain Fouque ; Alexandre Termier ; Christine Largouët
【Abstract】: Understanding data distributions is one of the most fundamental research topics in data analysis. The literature provides many powerful statistical learning algorithms for gaining knowledge of the underlying distribution given multivariate observations, which may reveal dependences between features, the appearance of clusters, or the presence of outliers. Before such deep investigations, we propose the folding test of unimodality. As a simple statistical description, it allows one to detect whether the data are gathered around a single mode or not (unimodal or multimodal). To the best of our knowledge, this is the first multivariate and purely statistical unimodality test. It makes no distribution assumption and relies only on a straightforward p-value. Through real-world data experiments, we show its relevance and how it could be useful for clustering.
【Keywords】: multivariate statistics; unimodality test
【Paper Link】 【Pages】:2219-2228
【Authors】: Ashudeep Singh ; Thorsten Joachims
【Abstract】: Rankings are ubiquitous in the online world today. As we have transitioned from finding books in libraries to ranking products, jobs, job applicants, opinions and potential romantic partners, there is a substantial precedent that ranking systems have a responsibility not only to their users but also to the items being ranked. To address these often conflicting responsibilities, we propose a conceptual and computational framework that allows the formulation of fairness constraints on rankings in terms of exposure allocation. As part of this framework, we develop efficient algorithms for finding rankings that maximize the utility for the user while provably satisfying a specifiable notion of fairness. Since fairness goals can be application specific, we show how a broad range of fairness constraints can be implemented using our framework, including forms of demographic parity, disparate treatment, and disparate impact constraints. We illustrate the effect of these constraints by providing empirical results on two ranking problems.
【Keywords】: algorithmic bias; equal opportunity; fairness; fairness in rankings; position bias
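The framework described above maximizes user utility subject to exposure-based fairness constraints. Under standard position-bias assumptions this becomes a linear program over a doubly stochastic matrix P, where P[i, j] is the probability of showing item i at rank j. The sketch below sets up such an LP with a demographic-parity-style exposure constraint using SciPy; the relevance values, position biases, and group assignments are illustrative, and the conversion of P into concrete rankings (e.g., a Birkhoff-von-Neumann decomposition) is omitted.

```python
import numpy as np
from scipy.optimize import linprog

relevance = np.array([0.9, 0.8, 0.1])          # user utility of each item
position_bias = np.array([1.0, 0.6, 0.3])      # exposure received at each rank
group = np.array([0, 1, 1])                    # group membership of each item
n = len(relevance)

# Maximize expected utility sum_ij relevance[i] * bias[j] * P[i, j]  =>  minimize its negative.
c = -(relevance[:, None] * position_bias[None, :]).ravel()

A_eq, b_eq = [], []
for i in range(n):                              # each item occupies exactly one rank in expectation
    row = np.zeros((n, n)); row[i, :] = 1; A_eq.append(row.ravel()); b_eq.append(1.0)
for j in range(n):                              # each rank holds exactly one item in expectation
    row = np.zeros((n, n)); row[:, j] = 1; A_eq.append(row.ravel()); b_eq.append(1.0)
# Demographic parity on exposure: mean exposure of group 0 equals mean exposure of group 1.
row = np.zeros((n, n))
for i in range(n):
    sign = 1.0 if group[i] == 0 else -1.0
    row[i, :] = sign * position_bias / (group == group[i]).sum()
A_eq.append(row.ravel()); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=[(0, 1)] * (n * n))
print(res.x.reshape(n, n).round(3))             # fairness-constrained ranking probabilities
```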
【Paper Link】 【Pages】:2229-2238
【Authors】: Dongjin Song ; Ning Xia ; Wei Cheng ; Haifeng Chen ; Dacheng Tao
【Abstract】: Multivariate time series data are becoming increasingly common in numerous real world applications, e.g., power plant monitoring, health care, wearable devices, automobile, etc. As a result, multivariate time series retrieval, i.e., given the current multivariate time series segment, how to obtain its relevant time series segments in the historical data (or in the database), attracts a significant amount of interest in many fields. Building such a system, however, is challenging since it requires a compact representation of the raw time series which can explicitly encode the temporal dynamics as well as the correlations (interactions) between different pairs of time series (sensors). Furthermore, it requires query efficiency and expects a returned ranking list with high precision on the top. Despite the fact that various approaches have been developed, few of them can jointly resolve these two challenges. To cope with this issue, in this paper we propose a Deep r-th root of Rank Supervised Joint Binary Embedding (Deep r-RSJBE) to perform multivariate time series retrieval. Given a raw multivariate time series segment, we employ Long Short-Term Memory (LSTM) units to encode the temporal dynamics and utilize Convolutional Neural Networks (CNNs) to encode the correlations (interactions) between different pairs of time series (sensors). Subsequently, a joint binary embedding is pursued to incorporate both the temporal dynamics and the correlations. Finally, we develop a novel r-th root ranking loss to optimize the precision at the top of a Hamming distance ranking list. Thorough empirical studies based upon three publicly available time series datasets demonstrate the effectiveness and the efficiency of Deep r-RSJBE.
【Keywords】: deep learning; multivariate time series retrieval; r-th root ranking loss; supervised binary embedding
【Paper Link】 【Pages】:2239-2248
【Authors】: Till Speicher ; Hoda Heidari ; Nina Grgic-Hlaca ; Krishna P. Gummadi ; Adish Singla ; Adrian Weller ; Muhammad Bilal Zafar
【Abstract】: Discrimination via algorithmic decision making has received considerable attention. Prior work largely focuses on defining conditions for fairness, but does not define satisfactory measures of algorithmic unfairness. In this paper, we focus on the following question: Given two unfair algorithms, how should we determine which of the two is more unfair? Our core idea is to use existing inequality indices from economics to measure how unequally the outcomes of an algorithm benefit different individuals or groups in a population. Our work offers a justified and general framework to compare and contrast the (un)fairness of algorithmic predictors. This unifying approach enables us to quantify unfairness both at the individual and the group level. Further, our work reveals overlooked tradeoffs between different fairness notions: using our proposed measures, the overall individual-level unfairness of an algorithm can be decomposed into a between-group and a within-group component. Earlier methods are typically designed to tackle only between-group unfairness, which may be justified for legal or other reasons. However, we demonstrate that minimizing exclusively the between-group component may, in fact, increase the within-group, and hence the overall unfairness. We characterize and illustrate the tradeoffs between our measures of (un)fairness and the prediction accuracy.
【Keywords】: algorithmic decision making; fairness in machine learning; fairness measures; generalized entropy; group fairness; individual fairness; inequality indices; subgroup decomposability
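For concreteness, a common instantiation of the ideas above is the generalized entropy index over individual benefits, which decomposes exactly into a between-group and a within-group component. The sketch below is a plain NumPy illustration, not code from the paper; the benefit vector `b` and group labels are made up.

```python
import numpy as np

def generalized_entropy(b, alpha=2):
    """Generalized entropy index of benefit vector b (valid for alpha not in {0, 1})."""
    b = np.asarray(b, dtype=float)
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1.0) / (alpha * (alpha - 1))

def decompose(b, groups, alpha=2):
    """Split the overall index into between-group and within-group components."""
    b, groups = np.asarray(b, float), np.asarray(groups)
    n, mu = len(b), b.mean()
    smoothed = np.empty_like(b)                 # every benefit replaced by its group mean
    within = 0.0
    for g in np.unique(groups):
        bg = b[groups == g]
        smoothed[groups == g] = bg.mean()
        weight = (len(bg) / n) * (bg.mean() / mu) ** alpha
        within += weight * generalized_entropy(bg, alpha)
    return generalized_entropy(smoothed, alpha), within

b = np.array([1.0, 2.0, 1.0, 3.0, 0.5, 2.5])
groups = np.array([0, 0, 0, 1, 1, 1])
between, within = decompose(b, groups)
print(between, within, between + within, generalized_entropy(b))   # decomposition is exact
```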
【Paper Link】 【Pages】:2249-2258
【Authors】: Lichao Sun ; Weiran Huang ; Philip S. Yu ; Wei Chen
【Abstract】: In this paper, we study the Multi-Round Influence Maximization (MRIM) problem, where influence propagates in multiple rounds independently from possibly different seed sets, and the goal is to select seeds for each round to maximize the expected number of nodes that are activated in at least one round. The MRIM problem models viral marketing scenarios in which advertisers conduct multiple rounds of viral marketing to promote one product. We consider two different settings: 1) the non-adaptive MRIM, where the advertiser needs to determine the seed sets for all rounds at the very beginning, and 2) the adaptive MRIM, where the advertiser can select seed sets adaptively based on the propagation results in the previous rounds. For the non-adaptive setting, we design two algorithms that exhibit an interesting tradeoff between efficiency and effectiveness: a cross-round greedy algorithm that selects seeds at a global level and achieves a $1/2 - \varepsilon$ approximation ratio, and a within-round greedy algorithm that selects seeds round by round and achieves a $1-e^{-(1-1/e)} -\varepsilon \approx 0.46 - \varepsilon$ approximation ratio but saves running time by a factor related to the number of rounds. For the adaptive setting, we design an adaptive algorithm that guarantees a $1-e^{-(1-1/e)} -\varepsilon$ approximation to the adaptive optimal solution. In all cases, we further design scalable algorithms based on the reverse influence sampling approach and achieve near-linear running time. We conduct experiments on several real-world networks and demonstrate that our algorithms are effective for the MRIM task.
【Keywords】: greedy algorithm; influence maximization; triggering model
【Paper Link】 【Pages】:2259-2268
【Authors】: Mengying Sun ; Inci M. Baytas ; Liang Zhan ; Zhangyang Wang ; Jiayu Zhou
【Abstract】: Over the past decade a wide spectrum of machine learning models have been developed to model neurodegenerative diseases, associating biomarkers, especially non-intrusive neuroimaging markers, with key clinical scores measuring the cognitive status of patients. Multi-task learning (MTL) has been commonly utilized by these studies to address high dimensionality and small cohort size challenges. However, most existing MTL approaches are based on linear models and suffer from two major limitations: 1) they cannot explicitly consider upper/lower bounds in these clinical scores; 2) they lack the capability to capture complicated non-linear interactions among the variables. In this paper, we propose Subspace Network, an efficient deep modeling approach for non-linear multi-task censored regression. Each layer of the subspace network performs a multi-task censored regression to improve upon the predictions from the last layer via sketching a low-dimensional subspace to perform knowledge transfer among learning tasks. Under mild assumptions, for each layer the parametric subspace can be recovered using only one pass of training data. Empirical results demonstrate that the proposed subspace network quickly picks up the correct parameter subspaces, and outperforms the state of the art in predicting neurodegenerative clinical scores using information in brain imaging.
【Keywords】: censoring; subspace; deep network; multi-task learning
【Paper Link】 【Pages】:2269-2278
【Authors】: Ying Sun ; Hengshu Zhu ; Fuzhen Zhuang ; Jingjing Gu ; Qing He
【Abstract】: Urban Region-of-Interest (ROI) refers to the integrated urban areas with specific functionalities that attract people's attention and activities, such as recreational business districts, transportation hubs, and city landmarks. Indeed, at the macro level, ROI is one of the representatives of agglomeration economies, and plays an important role in urban business planning. At the micro level, ROI provides a useful venue for understanding the urban lives, demands and mobilities of people. However, due to the vague and diversified nature of ROI, there is still a lack of quantitative ways to investigate ROIs in a holistic manner. To this end, in this paper we propose a systematic study on ROI analysis through mining large-scale online map query logs, which provides a new data-driven research paradigm for ROI detection and profiling. Specifically, we first divide the urban area into small region grids, and calculate their PageRank value as visiting popularity based on the transition information extracted from map queries. Then, we propose a density-based clustering method for merging neighboring region grids with high popularity into integrated ROIs. After that, to further explore the profiles of different ROIs, we develop a spatial-temporal latent factor model URPTM (Urban Roi Profiling Topic Model) to identify the latent travel patterns and Point-of-Interest (POI) demands of ROI visitors. Finally, we conduct extensive experiments to empirically evaluate our approaches based on the large-scale real-world data collected from Beijing. Indeed, by visualizing the results obtained from URPTM, we can successfully obtain many meaningful travel patterns and interesting discoveries on urban lives.
【Keywords】:
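A minimal sketch of the detection stage described above: score region grids by PageRank over the map-query transition graph, keep the most popular grids, and merge adjacent ones into candidate ROIs. It assumes networkx, represents grids as (row, col) cells, and substitutes a simple connected-component merge for the paper's density-based clustering; the URPTM profiling step is not shown, and all numbers are illustrative.

```python
import networkx as nx

def detect_roi_grids(transitions, popularity_quantile=0.8):
    """Rank region grids by PageRank over query transitions, then flood-fill adjacent
    high-popularity grids into candidate ROIs."""
    G = nx.DiGraph()
    G.add_weighted_edges_from(transitions)          # (origin_grid, destination_grid, count)
    pr = nx.pagerank(G, weight="weight")
    cutoff = sorted(pr.values())[int(popularity_quantile * (len(pr) - 1))]
    hot = {g for g, score in pr.items() if score >= cutoff}

    def neighbours(grid):                           # 4-neighbourhood on the grid lattice
        r, c = grid
        return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

    rois, seen = [], set()
    for g in hot:
        if g in seen:
            continue
        roi, frontier = set(), [g]
        while frontier:
            cur = frontier.pop()
            if cur in seen or cur not in hot:
                continue
            seen.add(cur); roi.add(cur)
            frontier.extend(neighbours(cur))
        rois.append(roi)
    return rois

transitions = [((0, 0), (0, 1), 30), ((0, 1), (0, 0), 25),
               ((0, 1), (5, 5), 2),  ((5, 5), (0, 0), 1)]
print(detect_roi_grids(transitions))                # the two busy adjacent grids form one ROI
```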
【Paper Link】 【Pages】:2279-2288
【Authors】: Charles A. Sutton ; Timothy Hobson ; James Geddes ; Rich Caruana
【Abstract】: Many analyses in data science are not one-off projects, but are repeated over multiple data samples, such as once per month, once per quarter, and so on. For example, if a data scientist performs an analysis in 2017 that saves a significant amount of money, then she will likely be asked to perform the same analysis on data from 2018. But more data analyses mean more effort spent in data wrangling. We introduce the data diff problem, which attempts to turn this problem into an opportunity. When the repeated data samples are compared against each other, inconsistencies may be indicative of underlying issues in data quality. By analogy to the text diff utility, the data diff problem is to find a "patch", that is, a transformation in a specified domain-specific language, that transforms the data samples so that they are identically distributed. We present a prototype tool for data diff that formalizes the problem as a bipartite matching problem, calibrating its parameters using a bootstrap procedure. The tool is evaluated quantitatively and through a case study on an open government data set.
【Keywords】: automl; data mining; data wrangling; sequential analysis
【Paper Link】 【Pages】:2289-2298
【Authors】: Jiaxi Tang ; Ke Wang
【Abstract】: We propose a novel way to train ranking models, such as recommender systems, that are both effective and efficient. Knowledge distillation (KD) was shown to be successful in image recognition to achieve both effectiveness and efficiency. We propose a KD technique for learning to rank problems, called ranking distillation (RD). Specifically, we train a smaller student model to learn to rank documents/items from both the training data and the supervision of a larger teacher model. The student model achieves a similar ranking performance to that of the large teacher model, but its smaller model size makes the online inference more efficient. RD is flexible because it is orthogonal to the choices of ranking models for the teacher and student. We address the challenges of RD for ranking problems. The experiments on public data sets and state-of-the-art recommendation models showed that RD achieves its design purposes: the student model learnt with RD has less than half the size of the teacher model while achieving a ranking performance similar to the teacher model and much better than the student model learnt without RD.
【Keywords】: knowledge transfer; learning to rank; model compression; recommender system
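A toy illustration of the distillation idea above: the student is trained both on the observed positives and on the teacher's top-K items, treated here as soft positives under a simple pointwise logistic loss. This is only a sketch of the general recipe, not the paper's weighted, position-aware RD loss; the scores and item indices are made up.

```python
import numpy as np

def ranking_distillation_loss(student_scores, positive_items, teacher_topk, lam=0.5):
    """Toy RD-style objective: fit the observed positives plus the teacher's top-K
    items, both treated as positives under a pointwise logistic loss."""
    def logistic_pos(scores, items):
        s = scores[list(items)]
        return -np.log(1.0 / (1.0 + np.exp(-s))).mean()
    return logistic_pos(student_scores, positive_items) + lam * logistic_pos(student_scores, teacher_topk)

scores = np.array([2.0, 0.1, -1.0, 0.5, 1.5])       # student scores for 5 items
print(ranking_distillation_loss(scores, positive_items=[0], teacher_topk=[4, 3]))
```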
【Paper Link】 【Pages】:2299-2308
【Authors】: Yi Tay ; Luu Anh Tuan ; Siu Cheung Hui
【Abstract】: Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer and is targeted at improving the representation learning process. There are several advantages to this design, e.g., it allows an arbitrary number of attention mechanisms to be casted, allowing for multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides greater extents of explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best performing score to date on the well-studied TrecQA dataset.
【Keywords】: attention mechanism; co-attention; conversation modeling; deep learning; information retrieval; intra-attention; learning to rank; neural networks; neural ranking models; qa; question answering
【Paper Link】 【Pages】:2309-2318
【Authors】: Yi Tay ; Anh Tuan Luu ; Siu Cheung Hui
【Abstract】: Many recent state-of-the-art recommender systems such as D-ATT, TransNet and DeepCoNN exploit reviews for representation learning. This paper proposes a new neural architecture for recommendation with reviews. Our model operates on a multi-hierarchical paradigm and is based on the intuition that not all reviews are created equal, i.e., only a selected few are important. The importance, however, should be dynamically inferred depending on the current target. To this end, we propose a review-by-review pointer-based learning scheme that extracts important reviews from user and item reviews and subsequently matches them in a word-by-word fashion. This enables not only the most informative reviews to be utilized for prediction but also a deeper word-level interaction. Our pointer-based method operates with a gumbel-softmax based pointer mechanism that enables the incorporation of discrete vectors within differentiable neural architectures. Our pointer mechanism is co-attentive in nature, learning pointers which are co-dependent on user-item relationships. Finally, we propose a multi-pointer learning scheme that learns to combine multiple views of user-item interactions. We demonstrate the effectiveness of our proposed model via extensive experiments on 24 benchmark datasets from Amazon and Yelp. Empirical results show that our approach significantly outperforms existing state-of-the-art models, with up to 19% and 71% relative improvement when compared to TransNet and DeepCoNN respectively. We study the behavior of our multi-pointer learning mechanism, shedding light on 'evidence aggregation' patterns in review-based recommender systems.
【Keywords】: attention mechanism; collaborative filtering; deep learning; information retrieval; natural language processing; pointer networks; recommendation; review rating prediction; review-based recommender systems
【Paper Link】 【Pages】:2319-2328
【Authors】: Daniel Ting
【Abstract】: The Count-Min sketch is an important and well-studied data summarization method. It can estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the Count-Min sketch depends on characteristics of the underlying data. This has led to a number of count estimation procedures which work well in one scenario but perform poorly in others. A practitioner is faced with two basic, unanswered questions. Given an estimate, what is its error? Which estimation procedure should be chosen when the data is unknown? We provide answers to these questions. We derive new count estimators, including a provably optimal estimator, which match or improve upon previous estimators in all scenarios. We also provide practical, tight error bounds at query time for all estimators and methods to tune sketch parameters using these bounds. The key observation is that the full distribution of errors in each counter can be empirically estimated from the sketch itself. By first estimating this distribution, count estimation becomes a statistical estimation and inference problem with a known error distribution. This provides both a principled way to derive new and optimal estimators as well as a way to study the error and properties of existing estimators.
【Keywords】: countmin; data sketching; nonparametric estimation
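For context, the sketch below is a minimal Count-Min implementation with the classic minimum-over-rows estimator, which always over-estimates the true count; the paper's contribution is to go beyond this estimator by modelling the error distribution empirically. The hash construction and parameters here are illustrative.

```python
import random

class CountMinSketch:
    """Minimal Count-Min sketch: depth rows of width counters with seeded hashes."""
    def __init__(self, width=2048, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        self.seeds = [rng.randrange(1 << 30) for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.seeds[row], item)) % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.tables[r][self._index(r, item)] += count

    def estimate(self, item):
        # Classic estimator: minimum over rows; never under-estimates the true count.
        return min(self.tables[r][self._index(r, item)] for r in range(self.depth))

cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    cms.add(token)
print(cms.estimate("a"))   # >= 3, and typically exactly 3 at this sketch width
```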
【Paper Link】 【Pages】:2329-2337
【Authors】: Kai Ming Ting ; Yue Zhu ; Zhi-Hua Zhou
【Abstract】: This paper investigates data dependent kernels that are derived directly from data. This has been an outstanding issue for about two decades which hampered the development of kernel-based methods. We introduce Isolation Kernel which is solely dependent on data distribution, requiring neither class information nor explicit learning to be a classifier. In contrast, existing data dependent kernels rely heavily on class information and explicit learning to produce a classifier. We show that Isolation Kernel approximates well to a data independent kernel function called Laplacian kernel under uniform density distribution. With this revelation, Isolation Kernel can be viewed as a data dependent kernel that adapts a data independent kernel to the structure of a dataset. We also provide a reason why the proposed new data dependent kernel enables SVM (which employs a kernel through other means) to improve its predictive accuracy. The key differences between Random Forest kernel and Isolation Kernel are discussed to examine the reasons why the latter is a more successful tree-based kernel.
【Keywords】: data dependent kernel; isolation forest; random forest; svm classifiers
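The following sketch conveys the flavour of a data-dependent, partition-based kernel: similarity is the fraction of random partitionings in which two points fall into the same cell. Note that it uses nearest-neighbour (Voronoi) cells built from small random subsamples as a simplification; the paper's Isolation Kernel is defined via isolation-forest partitioning, so treat this as an assumption-laden illustration rather than the paper's construction.

```python
import numpy as np

def partition_kernel(X, Y=None, psi=8, t=200, rng=None):
    """Similarity = fraction of t random partitionings (Voronoi cells induced by psi
    sampled points) in which two points land in the same cell."""
    rng = rng or np.random.default_rng(0)
    Y = X if Y is None else Y
    same = np.zeros((len(X), len(Y)))
    for _ in range(t):
        centers = X[rng.choice(len(X), size=psi, replace=False)]
        cx = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        cy = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        same += (cx[:, None] == cy[None, :])
    return same / t

X = np.random.default_rng(1).normal(size=(50, 2))
K = partition_kernel(X)
print(K.shape, K.diagonal().min())   # diagonal entries are 1: a point always shares its own cell
```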
【Paper Link】 【Pages】:2338-2346
【Authors】: Federico Tomasi ; Veronica Tozzo ; Saverio Salzo ; Alessandro Verri
【Abstract】: In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL), a method for multivariate time-series graphical modelling that considers the influence of hidden or unmeasurable factors. The estimation of the contribution of the latent factors is embedded in the model which produces both sparse and low-rank components for each time point. In particular, the first component represents the connectivity structure of observable variables of the system, while the second represents the influence of hidden factors, assumed to be few with respect to the observed variables. Our model includes temporal consistency on both components, providing an accurate evolutionary pattern of the system. We derive a tractable optimisation algorithm based on alternating direction method of multipliers, and develop a scalable and efficient implementation which exploits proximity operators in closed form. LTGL is extensively validated on synthetic data, achieving optimal performance in terms of accuracy, structure learning and scalability with respect to ground truth and state-of-the-art methods for graphical inference. We conclude with the application of LTGL to real case studies, from biology and finance, to illustrate how our method can be successfully employed to gain insights on multivariate time-series data.
【Keywords】: convex optimization; graphical models; latent variables; network inference; time-series
【Paper Link】 【Pages】:2347-2356
【Authors】: Anton Tsitsulin ; Davide Mottin ; Panagiotis Karras ; Alexander M. Bronstein ; Emmanuel Müller
【Abstract】: Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it "hears" the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency.
【Keywords】: data mining; graph comparison; graph geometry; graph representation; graph signature; graph similarity; graph theory; graph topology; heat kernel; heat kernel analysis
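A compact illustration of the heat-trace idea above: the signature h(t) = trace(exp(-tL)) of the normalized Laplacian L, evaluated on a grid of scales t, gives a permutation-invariant vector that can be compared across graphs. The sketch below computes it by full eigendecomposition, which is only feasible for small graphs (the paper uses spectral approximations for large ones); the two toy adjacency matrices are illustrative.

```python
import numpy as np

def heat_trace_signature(adjacency, times):
    """Heat-trace signature h(t) = trace(exp(-t * L)) of the normalized Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)
    return np.array([np.exp(-t * eigvals).sum() for t in times])

# Two tiny graphs compared by the Euclidean distance between their signatures.
A_triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
ts = np.logspace(-2, 2, 32)
print(np.linalg.norm(heat_trace_signature(A_triangle, ts) - heat_trace_signature(A_path, ts)))
```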
【Paper Link】 【Pages】:2357-2366
【Authors】: Ke Tu ; Peng Cui ; Xiao Wang ; Philip S. Yu ; Wenwu Zhu
【Abstract】: Network embedding aims to preserve vertex similarity in an embedding space. Existing approaches usually define the similarity by direct links or common neighborhoods between nodes, i.e. structural equivalence. However, vertexes which reside in different parts of the network may have similar roles or positions, i.e. regular equivalence, which is largely ignored by the literature of network embedding. Regular equivalence is defined in a recursive way that two regularly equivalent vertexes have network neighbors which are also regularly equivalent. Accordingly, we propose a new approach named Deep Recursive Network Embedding (DRNE) to learn network embeddings with regular equivalence. More specifically, we propose a layer normalized LSTM to represent each node by aggregating the representations of their neighborhoods in a recursive way. We theoretically prove that some popular and typical centrality measures which are consistent with regular equivalence are optimal solutions of our model. This is also demonstrated by empirical results that the learned node representations can well predict the indexes of regular equivalence and related centrality scores. Furthermore, the learned node representations can be directly used for end applications like structural role classification in networks, and the experimental results show that our method can consistently outperform centrality-based methods and other state-of-the-art network embedding methods.
【Keywords】: network embedding; recurrent neural network; regular equivalence
【Paper Link】 【Pages】:2367-2376
【Authors】: Jan N. van Rijn ; Frank Hutter
【Abstract】: With the advent of automated machine learning, automated hyperparameter optimization methods are by now routinely used in data mining. However, this progress is not yet matched by equal progress on automatic analyses that yield information beyond performance-optimizing hyperparameter settings. In this work, we aim to answer the following two questions: Given an algorithm, what are generally its most important hyperparameters, and what are typically good values for these? We present methodology and a framework to answer these questions based on meta-learning across many datasets. We apply this methodology using the experimental meta-data available on OpenML to determine the most important hyperparameters of support vector machines, random forests and Adaboost, and to infer priors for all their hyperparameters. The results, obtained fully automatically, provide a quantitative basis to focus efforts in both manual algorithm design and in automated hyperparameter optimization. The conducted experiments confirm that the hyperparameters selected by the proposed method are indeed the most important ones and that the obtained priors also lead to statistically significant improvements in hyperparameter optimization.
【Keywords】: hyperparameter importance; hyperparameter optimization; meta-learning
【Paper Link】 【Pages】:2377-2386
【Authors】: Thomas Vandal ; Evan Kodra ; Jennifer G. Dy ; Sangram Ganguly ; Ramakrishna R. Nemani ; Auroop R. Ganguly
【Abstract】: Deep Learning (DL) methods have been transforming computer vision with innovative adaptations to other domains including climate change. For DL to pervade Science and Engineering (S&E) applications where risk management is a core component, well-characterized uncertainty estimates must accompany predictions. However, S&E observations and model-simulations often follow heavily skewed distributions and are not well modeled with DL approaches, since they usually optimize a Gaussian, or Euclidean, likelihood loss. Recent developments in Bayesian Deep Learning (BDL), which attempts to capture uncertainties from noisy observations, aleatoric, and from unknown model parameters, epistemic, provide us a foundation. Here we present a discrete-continuous BDL model with Gaussian and lognormal likelihoods for uncertainty quantification (UQ). We demonstrate the approach by developing UQ estimates on "DeepSD'', a super-resolution based DL model for Statistical Downscaling (SD) in climate applied to precipitation, which follows an extremely skewed distribution. We find that the discrete-continuous models outperform a basic Gaussian distribution in terms of predictive accuracy and uncertainty calibration. Furthermore, we find that the lognormal distribution, which can handle skewed distributions, produces quality uncertainty estimates at the extremes. Such results may be important across S&E, as well as other domains such as finance and economics, where extremes are often of significant interest. Furthermore, to our knowledge, this is the first UQ model in SD where both aleatoric and epistemic uncertainties are characterized.
【Keywords】: bayesian deep learning; climate downscaling; precipitation estimation; super-resolution; uncertainty quantification
【Paper Link】 【Pages】:2387-2396
【Authors】: Chi Wang ; Kaushik Chakrabarti
【Abstract】: We study how to efficiently solve a primitive data exploration problem: Given two ad-hoc predicates which define two subsets of a relational table, find the top-K attributes whose distributions in the two subsets deviate most from each other. The deviation is measured by $\ell_1$ or $\ell_2$ distance. The exact approach is to query the full table to calculate the deviation for each attribute and then sort them. It is too expensive for large tables. Researchers have proposed heuristic sampling solutions to avoid accessing the entire table for all attributes. However, these solutions have no theoretical guarantee of correctness and their speedup over the exact approach is limited. In this paper, we develop an adaptive querying solution with probabilistic guarantee of correctness and near-optimal sample complexity. We perform experiments in both synthetic and real-world datasets. Compared to the exact approach implemented with a commercial DBMS, previous sampling solutions achieve up to 2× speedup with erroneous answers. Our solution can produce 25× speedup with near-zero error in the answer.
【Keywords】: exploratory analysis; multi-dimensional data; sampling
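The exact (full-scan) baseline described above is straightforward: for each attribute, compare its value distribution under the two predicates and rank attributes by the distance. The pandas sketch below implements that baseline with the $\ell_1$ distance on a made-up table; the paper's contribution is the adaptive sampling scheme that avoids scanning the full table, which is not shown here.

```python
import pandas as pd

def top_k_deviating_attributes(df, pred_a, pred_b, attributes, k=3):
    """Rank attributes by the L1 distance between their value distributions under two predicates."""
    sub_a, sub_b = df[pred_a(df)], df[pred_b(df)]
    scores = {}
    for col in attributes:
        pa = sub_a[col].value_counts(normalize=True)
        pb = sub_b[col].value_counts(normalize=True)
        support = pa.index.union(pb.index)
        scores[col] = (pa.reindex(support, fill_value=0) - pb.reindex(support, fill_value=0)).abs().sum()
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

df = pd.DataFrame({
    "dept":   ["toys", "toys", "food", "food", "food", "toys"],
    "region": ["east", "west", "east", "east", "west", "east"],
    "year":   [2017, 2018, 2017, 2018, 2018, 2018],
})
print(top_k_deviating_attributes(df, lambda d: d.year == 2017, lambda d: d.year == 2018,
                                 ["dept", "region"], k=2))
```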
【Paper Link】 【Pages】:2397-2406
【Authors】: Daheng Wang ; Meng Jiang ; Qingkai Zeng ; Zachary Eberhart ; Nitesh V. Chawla
【Abstract】: Contextual behavior modeling uses data from multiple contexts to discover patterns for predictive analysis. However, existing behavior prediction models often face difficulties when scaling for massive datasets. In this work, we formulate a behavior as a set of context items of different types (such as decision makers, operators, goals and resources), consider an observable itemset as a behavior success, and propose a novel scalable method, "multi-type itemset embedding", to learn the context items' representations preserving the success structures. Unlike most of existing embedding methods that learn pair-wise proximity from connection between a behavior and one of its items, our method learns item embeddings collectively from interaction among all multi-type items of a behavior, based on which we develop a novel framework, LearnSuc, for (1) predicting the success rate of any set of items and (2) finding complementary items which maximize the probability of success when incorporated into an itemset. Extensive experiments demonstrate both the effectiveness and efficiency of the proposed framework.
【Keywords】: behavior data embedding; behavior modeling; itemset embedding; recommender systems; representation learning
【Paper Link】 【Pages】:2407-2416
【Authors】: Ji Wang ; Jianguo Zhang ; Weidong Bao ; Xiaomin Zhu ; Bokai Cao ; Philip S. Yu
【Abstract】: The increasing demand for on-device deep learning services calls for a highly efficient manner to deploy deep neural networks (DNNs) on mobile devices with limited capacity. The cloud-based solution is a promising approach to enabling deep learning applications on mobile devices where large portions of a DNN are offloaded to the cloud. However, revealing data to the cloud leads to potential privacy risk. To benefit from the cloud data center without the privacy risk, we design, evaluate, and implement a cloud-based framework ARDEN which partitions the DNN across mobile devices and cloud data centers. A simple data transformation is performed on the mobile device, while the resource-hungry training and the complex inference rely on the cloud data center. To protect the sensitive information, a lightweight privacy-preserving mechanism consisting of arbitrary data nullification and random noise addition is introduced, which provides strong privacy guarantee. A rigorous privacy budget analysis is given. Nonetheless, the private perturbation to the original data inevitably has a negative impact on the performance of further inference on the cloud side. To mitigate this influence, we propose a noisy training method to enhance the cloud-side network robustness to perturbed data. Through the sophisticated design, ARDEN can not only preserve privacy but also improve the inference performance. To validate the proposed ARDEN, a series of experiments based on three image datasets and a real mobile application are conducted. The experimental results demonstrate the effectiveness of ARDEN. Finally, we implement ARDEN on a demo system to verify its practicality.
【Keywords】: deep learning; differential privacy; mobile cloud
【Paper Link】 【Pages】:2417-2426
【Authors】: Jiaxuan Wang ; Jeeheh Oh ; Haozhu Wang ; Jenna Wiens
【Abstract】: In many settings, it is important that a model be capable of providing reasons for its predictions (i.e., the model must be interpretable). However, the model's reasoning may not conform with well-established knowledge. In such cases, while interpretable, the model lacks credibility. In this work, we formally define credibility in the linear setting and focus on techniques for learning models that are both accurate and credible. In particular, we propose a regularization penalty, expert yielded estimates (EYE), that incorporates expert knowledge about well-known relationships among covariates and the outcome of interest. We give both theoretical and empirical results comparing our proposed method to several other regularization techniques. Across a range of settings, experiments on both synthetic and real data show that models learned using the EYE penalty are significantly more credible than those learned using other penalties. Applied to two large-scale patient risk stratification tasks, our proposed technique results in a model whose top features overlap significantly with known clinical risk factors, while still achieving good predictive performance.
【Keywords】: model interpretability; regularization
【Paper Link】 【Pages】:2427-2436
【Authors】: Jing Wang ; Min-Ling Zhang
【Abstract】: Partial label (PL) learning aims to induce a multi-class classifier from training examples where each of them is associated with a set of candidate labels, among which only one is valid. It is well-known that the problem of class-imbalance stands as a major factor affecting the generalization performance of a multi-class classifier, and this problem becomes more pronounced as the ground-truth label of each PL training example is not directly accessible to the learning approach. To mitigate the negative influence of class-imbalance on partial label learning, a novel class-imbalance aware approach named CIMAP is proposed by adapting over-sampling techniques for handling PL training examples. Firstly, for each PL training example, CIMAP disambiguates its candidate label set by estimating the confidence of each class label being the ground-truth one via weighted k-nearest neighbor aggregation. After that, the original PL training set is replenished for model induction by over-sampling existing PL training examples via manipulation of the disambiguation results. Extensive experiments on artificial as well as real-world PL data sets show that CIMAP serves as an effective data-level approach to mitigate the class-imbalance problem for partial label learning.
【Keywords】: class-imbalance; over-sampling; partial label learning
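A small sketch of the disambiguation step described above: the confidence of each candidate label is estimated by weighted k-nearest-neighbour voting restricted to the candidate sets. This is an illustrative reconstruction under simple assumptions (Euclidean distances, inverse-distance weights), not the full CIMAP procedure, and the subsequent over-sampling step is omitted.

```python
import numpy as np

def disambiguate(X, candidate_sets, k=3):
    """Estimate per-example confidences of candidate labels via weighted kNN voting."""
    n = len(X)
    confidences = []
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                  # exclude the example itself
        nbrs = np.argsort(d)[:k]
        w = 1.0 / (d[nbrs] + 1e-8)
        conf = {lab: sum(wj for j, wj in zip(nbrs, w) if lab in candidate_sets[j])
                for lab in candidate_sets[i]}
        total = sum(conf.values()) or 1.0
        confidences.append({lab: c / total for lab, c in conf.items()})
    return confidences

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
cands = [{0, 1}, {0}, {0, 2}, {2}, {1, 2}]
print(disambiguate(X, cands))
```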
【Paper Link】 【Pages】:2437-2446
【Authors】: Jingyuan Wang ; Ze Wang ; Jianfeng Li ; Junjie Wu
【Abstract】: Recent years have witnessed the unprecedented rise of time series from almost all kinds of academic and industrial fields. Various types of deep neural network models have been introduced to time series analysis, but the important frequency information still lacks effective modeling. In light of this, in this paper we propose a wavelet-based neural network structure called multilevel Wavelet Decomposition Network (mWDN) for building frequency-aware deep learning models for time series analysis. mWDN preserves the advantage of multilevel discrete wavelet decomposition in frequency learning while enabling the fine-tuning of all parameters under a deep neural network framework. Based on mWDN, we further propose two deep learning models called Residual Classification Flow (RCF) and multi-frequency Long Short-Term Memory (mLSTM) for time series classification and forecasting, respectively. The two models take all or partial mWDN decomposed sub-series in different frequencies as input, and resort to the back propagation algorithm to learn all the parameters globally, which enables seamless embedding of wavelet-based frequency analysis into deep learning frameworks. Extensive experiments on 40 UCR datasets and a real-world user volume dataset demonstrate the excellent performance of our time series models based on mWDN. In particular, we propose an importance analysis method for mWDN based models, which successfully identifies those time-series elements and mWDN layers that are crucially important to time series analysis. This indeed indicates the interpretability advantage of mWDN, and can be viewed as an in-depth exploration of interpretable deep learning.
【Keywords】: epidemic propagation; intracity epidemic control and prevention; metapopulation; network inference
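To make the frequency-decomposition idea above concrete, the sketch below performs a multilevel discrete wavelet decomposition with fixed Haar filters: at each level the current approximation is split into a low-frequency approximation and a high-frequency detail series. In mWDN such filter weights are fine-tuned inside the network; the fixed-filter NumPy version here is only an illustration.

```python
import numpy as np

def haar_decompose(x, levels=3):
    """Multilevel Haar wavelet decomposition: at each level, split the current
    approximation into low-frequency (approximation) and high-frequency (detail) parts."""
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                                  # pad to even length
            approx = np.append(approx, approx[-1])
        low = (approx[0::2] + approx[1::2]) / np.sqrt(2)     # low-pass filter + downsample
        high = (approx[0::2] - approx[1::2]) / np.sqrt(2)    # high-pass filter + downsample
        details.append(high)
        approx = low
    return approx, details

signal = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.1 * np.random.default_rng(0).normal(size=64)
approx, details = haar_decompose(signal)
print(len(approx), [len(d) for d in details])                # 8, [32, 16, 8]
```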
【Paper Link】 【Pages】:2447-2456
【Authors】: Lu Wang ; Wei Zhang ; Xiaofeng He ; Hongyuan Zha
【Abstract】: Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) become a key to successfully improve practical clinical outcomes. Prior relevant studies recommend treatments using either supervised learning (e.g. matching the indicator signal which denotes doctor prescriptions) or reinforcement learning (e.g. maximizing an evaluation signal which indicates cumulative reward from survival rates). However, none of these studies have considered combining the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses them into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle complex relations among multiple medications, diseases and individual characteristics. The "actor'' in the framework is adjusted by both the indicator signal and evaluation signal to ensure effective prescription and low mortality. RNN is further utilized to solve the Partially-Observed Markov Decision Process (POMDP) problem due to lack of fully observed states in real world applications. Experiments on the publicly available real-world dataset, i.e., MIMIC-3, illustrate that our model can reduce the estimated mortality, while providing promising accuracy in matching doctors' prescriptions.
【Keywords】: deep sequential recommendation; dynamic treatment regime; supervised reinforcement learning
【Paper Link】 【Pages】:2457-2466
【Authors】: Pengyang Wang ; Yanjie Fu ; Jiawei Zhang ; Pengfei Wang ; Yu Zheng ; Charu C. Aggarwal
【Abstract】: Driving is a complex activity that requires multi-level skilled operations (e.g., acceleration, braking, turning). Analyzing driving behavior can help us assess driver performances, improve traffic safety, and, ultimately, promote the development of intelligent and resilient transportation systems. While some efforts have been made for analyzing driving behavior, existing methods can be improved via representation learning by jointly exploring the peer and temporal dependencies of driving behavior. To that end, in this paper, we develop a Peer and Temporal-Aware Representation Learning based framework (PTARL) for driving behavior analysis with GPS trajectory data. Specifically, we first detect the driving operations and states of each driver from GPS traces. Then, we derive a sequence of multi-view driving state transition graphs from the driving state sequences, in order to characterize a driver's driving behavior that varies over time. In addition, we develop a peer and temporal-aware representation learning method to learn a sequence of time-varying yet relational vectorized representations from the driving state transition graphs. The proposed method can simultaneously model both the graph-graph peer dependency and the current-past temporal dependency in a unified optimization framework. Also, we provide effective solutions for the optimization problem. Moreover, we exploit the learned representations of driving behavior to score driving performances and detect dangerous regions. Finally, extensive experimental results with big trajectory data demonstrate the enhanced performance of the proposed method for driving behavior analysis.
【Keywords】: driving behavior analysis; representation learning; spatio-temporal graphs
【Paper Link】 【Pages】:2467-2475
【Authors】: Qinyong Wang ; Hongzhi Yin ; Zhiting Hu ; Defu Lian ; Hao Wang ; Zi Huang
【Abstract】: With the increasing popularity of various social media and E-commerce platforms, large volumes of user behaviour data (e.g., user transaction data, rating and review data) are being continually generated at unprecedented and ever-increasing scales. It is more realistic and practical to study recommender systems with inputs of streaming data. User-generated streaming data presents unique properties, such as being temporally ordered, continuous and high-velocity, which pose tremendous new challenges for the once very successful recommendation techniques. Although a few temporal or sequential recommender models have recently been developed based on recurrent neural models, most of them can only be applied to the session-based recommendation scenario, due to their short-term memories and the limited capability of capturing users' long-term stable interests. In this paper, we propose a streaming recommender model based on neural memory networks with external memories to capture and store both long-term stable interests and short-term dynamic interests in a unified way. An adaptive negative sampling framework based on Generative Adversarial Nets (GAN) is developed to optimize our proposed streaming recommender model, which effectively overcomes the limitations of classical negative sampling approaches and improves both effectiveness and efficiency of the model parameter inference. Extensive experiments have been conducted on two large-scale recommendation datasets, and the experimental results show the superiority of our proposed streaming recommender model in the streaming recommendation scenario.
【Keywords】: collaborative filtering; memory networks; streaming recommender systems
【Paper Link】 【Pages】:2476-2485
【Authors】: Yunhe Wang ; Chang Xu ; Jiayan Qiu ; Chao Xu ; Dacheng Tao
【Abstract】: Compressing convolutional neural networks (CNNs) is essential for transferring the success of CNNs to a wide variety of applications to mobile devices. In contrast to directly recognizing subtle weights or filters as redundant in a given CNN, this paper presents an evolutionary method to automatically eliminate redundant convolution filters. We represent each compressed network as a binary individual of specific fitness. Then, the population is upgraded at each evolutionary iteration using genetic operations. As a result, an extremely compact CNN is generated using the fittest individual, which has the original network structure and can be directly deployed in any off-the-shelf deep learning libraries. In this approach, either large or small convolution filters can be redundant, and filters in the compressed network are more distinct. In addition, since the number of filters in each convolutional layer is reduced, the number of filter channels and the size of feature maps are also decreased, naturally improving both the compression and speed-up ratios. Experiments on benchmark deep CNN models suggest the superiority of the proposed algorithm over the state-of-the-art compression methods, e.g. combined with the parameter refining approach, we can reduce the storage requirement and the floating-point multiplications of ResNet-50 by a factor of 14.64x and 5.19x, respectively, without affecting its accuracy.
【Keywords】: CNN acceleration; deep learning; evolutionary algorithm; network compression
【Paper Link】 【Pages】:2486-2495
【Authors】: Zhengyang Wang ; Shuiwang Ji
【Abstract】: Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for various tasks like semantic image segmentation, object detection, audio generation, video modeling, and machine translation. However, dilated convolutions suffer from the gridding artifacts, which hampers the performance of DCNNs with dilated convolutions. In this work, we propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions. Unlike existing models, which explore solutions by focusing on a block of cascaded dilated convolutional layers, our methods address the gridding artifacts by smoothing the dilated convolution itself. By analyzing them in both the original operation and the decomposition views, we further point out that the two degridding approaches are intrinsically related and define separable and shared (SS) operations, which generalize the proposed methods. We evaluate our methods thoroughly on two datasets and visualize the smoothing effect through effective receptive field analysis. Experimental results show that our methods yield significant and consistent improvements on the performance of DCNNs with dilated convolutions, while adding negligible amounts of extra training parameters.
【Keywords】: atrous convolutions; deep learning; dilated convolutions; gridding artifacts
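A hedged PyTorch sketch of the smoothing idea the abstract points to: a small depthwise convolution mixes neighbouring pixels before the dilated convolution samples its sparse grid, so the periodic subgroups that cause gridding interact. This illustrates the general idea only; the paper's separable and shared (SS) operations differ in their exact form.

```python
import torch
import torch.nn as nn

class SmoothedDilatedConv(nn.Module):
    def __init__(self, channels, dilation=2):
        super().__init__()
        # Depthwise smoothing kernel of size (2*dilation - 1) mixes neighbouring
        # pixels before the dilated convolution samples its sparse grid.
        self.smooth = nn.Conv2d(channels, channels, kernel_size=2 * dilation - 1,
                                padding=dilation - 1, groups=channels, bias=False)
        self.dilated = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, height, width); spatial size is preserved.
        return self.dilated(self.smooth(x))
```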
【Paper Link】 【Pages】:2496-2505
【Authors】: Hua Wei ; Guanjie Zheng ; Huaxiu Yao ; Zhenhui Li
【Abstract】: The intelligent traffic light control is critical for an efficient transportation system. While existing traffic lights are mostly operated by hand-crafted rules, an intelligent traffic light control system should be dynamically adjusted to real-time traffic. There is an emerging trend of using deep reinforcement learning technique for traffic light control and recent studies have shown promising results. However, existing studies have not yet tested the methods on the real-world traffic data and they only focus on studying the rewards without interpreting the policies. In this paper, we propose a more effective deep reinforcement learning model for traffic light control. We test our method on a large-scale real traffic dataset obtained from surveillance cameras. We also show some interesting case studies of policies learned from the real data.
【Keywords】: reinforcement learning; traffic light control
【Paper Link】 【Pages】:2506-2515
【Authors】: Lingfei Wu ; Pin-Yu Chen ; Ian En-Hsu Yen ; Fangli Xu ; Yinglong Xia ; Charu Aggarwal
【Abstract】: Spectral clustering is one of the most effective clustering approaches for capturing hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing similarity graphs and computing the subsequent eigendecomposition. Although a number of methods have been proposed to accelerate spectral clustering, most of them incur considerable information loss in the original data in order to reduce the computational bottlenecks. In this paper, we present a novel scalable spectral clustering method using Random Binning features (RB) to simultaneously accelerate both similarity graph construction and the eigendecomposition. Specifically, we implicitly approximate the graph similarity (kernel) matrix by the inner product of a large sparse feature matrix generated by RB. Then we introduce a state-of-the-art SVD solver to effectively compute the eigenvectors of this large matrix for spectral clustering. Using these two building blocks, we reduce the computational cost from quadratic to linear in the number of data points while achieving similar accuracy. Our theoretical analysis shows that spectral clustering via RB converges faster to the exact spectral clustering than the standard Random Feature approximation. Extensive experiments on 8 benchmarks show that the proposed method either outperforms or matches the state-of-the-art methods in both accuracy and runtime. Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
【Keywords】: graph construction; large-scale graph; random binning features; spectral clustering
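A hedged sketch of spectral clustering through an explicit feature map, the general mechanism the abstract relies on: the n x n similarity matrix is never materialised because K is approximated as Z Z^T, and the Laplacian eigenvectors are obtained from a truncated SVD of the normalised feature matrix. Random Fourier features (scikit-learn's RBFSampler) stand in here for the paper's Random Binning features; names and constants are ours.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans
from sklearn.kernel_approximation import RBFSampler

def approx_spectral_clustering(X, n_clusters, n_features=2000, gamma=1.0, seed=0):
    # Explicit feature map Z so that the kernel matrix K ~= Z @ Z.T is never materialised.
    Z = RBFSampler(gamma=gamma, n_components=n_features, random_state=seed).fit_transform(X)
    # Approximate node degrees under the implicit similarity graph.
    d = Z @ (Z.T @ np.ones(Z.shape[0]))
    Z_norm = Z / np.sqrt(np.maximum(d, 1e-12))[:, None]
    # Top singular vectors of the normalised feature matrix play the role of the
    # Laplacian eigenvectors; the cost stays linear in the number of points.
    U, _, _ = svds(Z_norm, k=n_clusters)
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(U)
```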
【Paper Link】 【Pages】:2516-2525
【Authors】: Weichang Wu ; Junchi Yan ; Xiaokang Yang ; Hongyuan Zha
【Abstract】: This paper presents a factorial marked temporal point process model together with efficient learning methods. In conventional (multi-dimensional) marked temporal point process models, an event is often encoded by a single discrete variable (marker). We describe factorial marked point processes, whereby each time-stamped event is factored into multiple markers. Accordingly, the size of the infectivity matrix modeling the effect between pairwise markers grows exponentially with the number of discrete markers. We propose a decoupled learning method with two learning procedures: i) directly solving the model based on two techniques, the Alternating Direction Method of Multipliers and the Fast Iterative Shrinkage-Thresholding Algorithm; ii) a reformulation that transforms the original problem into a Logistic Regression model for more efficient learning. Moreover, a sparse group regularizer is added to identify the key profile features and event labels. Empirical results on real-world datasets demonstrate the efficiency of our decoupled and reformulated method.
【Keywords】: alternating direction method of multipliers; decoupled learning; factorial temporal point process; fast iterative shrinkage-thresholding algorithm
【Paper Link】 【Pages】:2526-2535
【Authors】: Wush Chi-Hsuan Wu ; Mi-Yen Yeh ; Ming-Syan Chen
【Abstract】: We generalize the winning price model to incorporate deep learning models with different distributions and propose an algorithm to learn from historical bidding information, where the winning prices are either fully observed or partially observed. We study whether deep learning models that have been successful for click-through rate prediction can also enhance the prediction of the winning price. We also study how different winning price distributions affect the learning results. Experimental results show that the deep learning models indeed boost the prediction quality when they are learned on the historically observed data. In addition, the deep learning models on the unobserved data are improved after learning from the censored data. The main advantage of the proposed generalized deep learning model is that it provides more flexibility in modeling the winning price and improves performance under the possibly various winning price distributions and model structures that arise in practice.
【Keywords】: deep learning; demand-side platform; display advertising; learning with partial labels; real-time bidding
【Paper Link】 【Pages】:2536-2544
【Authors】: Yongkai Wu ; Lu Zhang ; Xintao Wu
【Abstract】: Predictive models learned from historical data are widely used to help companies and organizations make decisions. However, they may treat certain groups unfairly, raising concerns about fairness and discrimination. In this paper, we study the fairness-aware ranking problem, which aims to discover discrimination in ranked datasets and reconstruct the fair ranking. Existing methods in fairness-aware ranking are mainly based on statistical parity, which cannot measure the true discriminatory effect since discrimination is causal. On the other hand, existing methods in causal-based anti-discrimination learning focus on classification problems and cannot be directly applied to handle ranked data. To address these limitations, we propose to map the rank position to a continuous score variable that represents the qualification of the candidates. Then, we build a causal graph that consists of both the discrete profile attributes and the continuous score. The path-specific effect technique is extended to the mixed-variable causal graph to identify both direct and indirect discrimination. The relationship between the path-specific effects for the ranked data and those for the binary decision is theoretically analyzed. Finally, algorithms for discovering and removing discrimination from a ranked dataset are developed. Experiments on a real-world dataset show the effectiveness of our approaches.
【Keywords】: causal graph; direct and indirect discrimination; discrimination-aware machine learning; fair ranking
【Paper Link】 【Pages】:2545-2554
【Authors】: Miao Xie ; Zhe Jiang ; Arpan Man Sainju
【Abstract】: Flood extent mapping plays a crucial role in disaster management and national water forecasting. Unfortunately, traditional classification methods are often hampered by the existence of noise, obstacles and heterogeneity in spectral features, as well as implicit anisotropic spatial dependency across class labels. In this paper, we propose the geographical hidden Markov tree, a probabilistic graphical model that generalizes the common hidden Markov model from a one-dimensional sequence to a two-dimensional map. Partial-order class dependency is incorporated in the hidden class layer with a reverse tree structure. We also investigate computational algorithms for reverse tree construction, model parameter learning and class inference. Extensive evaluations on both synthetic and real-world datasets show that the proposed model outperforms multiple baselines in flood mapping, and our algorithms are scalable to large data sizes.
【Keywords】: geographical hidden markov tree; spatial classification
【Paper Link】 【Pages】:2555-2564
【Authors】: Jie Xu ; Lei Luo ; Cheng Deng ; Heng Huang
【Abstract】: Distance metric learning is an important research topic with many real-world applications. Most existing metric learning methods aim to learn an optimal Mahalanobis distance matrix M, under which data samples from the same class are forced to be close to each other and those from different classes are pushed far away. The Mahalanobis distance matrix M can be factorized as M = L'L, and the Mahalanobis distance induced by L is equivalent to the Euclidean distance after linear projection of the feature vectors onto the rows of L. However, the Euclidean distance is only suitable for characterizing Gaussian noise, thus traditional metric learning algorithms are not robust enough to achieve good performance when applied to occluded data, which often appear in image and video data mining applications. To overcome this limitation, we propose a new robust metric learning approach by introducing the maximum correntropy criterion to deal with real-world malicious occlusions or corruptions. In our new model, we enforce the intra-class reconstruction residual of each sample to be smaller than the inter-class reconstruction residual by a large margin. Meanwhile, we employ the correntropy induced metric to fit the reconstruction residual, which has been proved to be useful in non-Gaussian data processing. Leveraging the half-quadratic optimization technique, we derive an efficient algorithm to solve the proposed new model and provide its convergence guarantee as well. Extensive experiments on various occluded data sets indicate that our proposed model can achieve more promising performance than other related methods.
【Keywords】: maximum correntropy criterion; robust metric learning
【Paper Link】 【Pages】:2565-2573
【Authors】: Yanbo Xu ; Siddharth Biswal ; Shriprasad R. Deshpande ; Kevin O. Maher ; Jimeng Sun
【Abstract】: With the improvement of medical data capturing, vast amounts of continuous patient monitoring data, e.g., electrocardiogram (ECG), real-time vital signs and medications, have become available for clinical decision support at intensive care units (ICUs). However, it becomes increasingly challenging to model such data, due to the high density of the monitoring data, heterogeneous data types and the requirement for interpretable models. Integration of these high-density monitoring data with the discrete clinical events (including diagnosis, medications, labs) is challenging but potentially rewarding, since the richness and granularity of such multimodal data increase the possibilities for accurate detection of complex problems and for predicting outcomes (e.g., length of stay and mortality). We propose the Recurrent Attentive and Intensive Model (RAIM) for jointly analyzing continuous monitoring data and discrete clinical events. RAIM introduces an efficient attention mechanism for continuous monitoring data (e.g., ECG), which is guided by discrete clinical events (e.g., medication usage). We apply RAIM to predicting physiological decompensation and length of stay for critically ill patients in the ICU. With evaluations on the MIMIC-III Waveform Database Matched Subset, we obtain an AUC-ROC score of 90.18% for predicting decompensation and an accuracy of 86.82% for forecasting length of stay with our final model, which outperforms our six baseline models.
【Keywords】: attention model; deep neural network; ecg waveforms; electronic health records; intensive care units; multimodal; time series
【Paper Link】 【Pages】:2574-2583
【Authors】: Rui Yan ; Dongyan Zhao
【Abstract】: Having automatic conversations between human and computer is regarded as one of the hardest problems in computer science. Conversational systems are of growing importance due to their promising potential and commercial value as virtual assistants and chatbots. To build such systems with adequate intelligence is challenging, and requires abundant resources, including the acquisition of big conversational data and interdisciplinary techniques such as content analysis, text mining, and retrieval. The arrival of the big data era reveals the feasibility of creating a conversational system empowered by data-driven approaches. Now we are able to collect an extremely large number of human-human conversations on the Web, and organize them to launch human-computer conversational systems. Given a human-issued utterance, i.e., a query, a conversational system will search for appropriate responses, conduct relevance ranking using context information, and then output the highly relevant result. In this paper, we propose a novel context modeling framework with end-to-end neural networks for human-computer conversational systems. The proposed model is general and unified. In the experiments, we demonstrate the effectiveness of the proposed model for human-computer conversations using the p@1, MAP, nDCG, and MRR metrics.
【Keywords】: context modeling; conversational system; retrieval model
【Paper Link】 【Pages】:2584-2593
【Authors】: Tong Yang ; Junzhi Gong ; Haowei Zhang ; Lei Zou ; Lei Shi ; Xiaoming Li
【Abstract】: Data stream processing is a fundamental issue in many fields, such as data mining, databases, and network traffic measurement. There are five typical tasks in data stream processing: frequency estimation, heavy hitter detection, heavy change detection, frequency distribution estimation, and entropy estimation. Different algorithms are proposed for different tasks, but they seldom achieve high accuracy and high speed at the same time. To address this issue, we propose a novel data structure named HeavyGuardian. The key idea is to intelligently separate and guard the information of hot items while approximately recording the frequencies of cold items. We deploy HeavyGuardian on the above five typical tasks. Extensive experimental results show that HeavyGuardian achieves both much higher accuracy and higher speed than the state-of-the-art solutions for each of the five typical tasks. The source code of HeavyGuardian and other related algorithms is available on GitHub.
【Keywords】: data stream processing; data structure; probabilistic and approximate data
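A hedged, simplified sketch of the "guard hot items, approximate cold items" idea described above. The bucket layout, the probabilistic decay-based eviction and all constants are our own illustrative choices, not the paper's exact HeavyGuardian structure.

```python
import random

class HotColdBucket:
    def __init__(self, hot_slots=8, cold_counters=16, decay=1.08):
        self.hot = {}                     # guarded hot items with (near-)exact counters
        self.hot_slots = hot_slots
        self.cold = [0] * cold_counters   # small shared counters for cold items
        self.decay = decay

    def insert(self, item):
        if item in self.hot:
            self.hot[item] += 1
            return
        if len(self.hot) < self.hot_slots:
            self.hot[item] = 1
            return
        # Guarded part is full: probabilistically decay the weakest hot item;
        # if it reaches zero, replace it, otherwise count the new item as cold.
        weakest = min(self.hot, key=self.hot.get)
        if random.random() < self.decay ** (-self.hot[weakest]):
            self.hot[weakest] -= 1
            if self.hot[weakest] == 0:
                del self.hot[weakest]
                self.hot[item] = 1
                return
        self.cold[hash(item) % len(self.cold)] += 1

    def query(self, item):
        return self.hot.get(item, self.cold[hash(item) % len(self.cold)])
```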
【Paper Link】 【Pages】:2594-2603
【Authors】: Yang Yang ; Yi-Feng Wu ; De-Chuan Zhan ; Zhi-Bin Liu ; Yuan Jiang
【Abstract】: In real-world applications, complex objects usually have multiple labels and can be represented by multiple modal representations; e.g., complex articles contain both text and image information and carry multiple annotations. Previous methods assume that the homogeneous multi-modal data are consistent, while in real applications the raw data are disordered, i.e., an article is constituted by a variable number of inconsistent text and image instances. To solve this problem, Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such tasks and has exhibited excellent performance. Besides, how to effectively utilize label correlation is also a challenging issue. In this paper, we propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which learns the label prediction and exploits label correlation simultaneously based on Optimal Transport, by considering the consistency principle between the bag-level predictions of different modalities and the learned latent ground label metric. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed method.
【Keywords】: multi-instance; multi-label; multi-modal; optimal transport
【Paper Link】 【Pages】:2604-2613
【Authors】: Ali Batuhan Yardim ; Victor Kristof ; Lucas Maystre ; Matthias Grossglauser
【Abstract】: As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects and whether the edit survives or not. We apply our model on Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and in addition it enables us to discover interesting structure in the data.
【Keywords】: collaborative filtering; peer-production systems; ranking; user-generated content
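A hedged sketch of the edit-survival model the abstract above describes: the acceptance probability is a logistic function of editor skill, component difficulty and a low-rank user-component interaction, trained by stochastic gradient steps on observed outcomes. All parameter names, dimensions and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 1000, 500, 8
skill = rng.normal(0, 0.1, n_users)        # per-editor skill
difficulty = rng.normal(0, 0.1, n_items)   # per-component difficulty
U = rng.normal(0, 0.1, (n_users, dim))     # editor embeddings (interaction term)
V = rng.normal(0, 0.1, (n_items, dim))     # component embeddings (interaction term)

def p_accept(u, i):
    # Probability that editor u's edit to component i survives.
    logit = skill[u] - difficulty[i] + U[u] @ V[i]
    return 1.0 / (1.0 + np.exp(-logit))

def sgd_step(u, i, accepted, lr=0.05):
    # One stochastic gradient step on the log-likelihood of the observed outcome (0 or 1).
    g = p_accept(u, i) - accepted
    skill[u] -= lr * g
    difficulty[i] += lr * g
    U[u], V[i] = U[u] - lr * g * V[i], V[i] - lr * g * U[u]
```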
【Paper Link】 【Pages】:2614-2623
【Authors】: Abdurrahman Yasar ; Ümit V. Çatalyürek
【Abstract】: Integrating data from heterogeneous sources is often modeled as merging graphs. Given two or more "compatible'' but non-isomorphic graphs, the first step is to identify a graph alignment, where a potentially partial mapping of vertices between two graphs is computed. A significant portion of the literature on this problem only takes the global structure of the input graphs into account. Only more recent works additionally use vertex and edge attributes to achieve a more accurate alignment. However, these methods are not designed to scale to map large graphs arising in many modern applications. We propose a new iterative graph aligner, gsaNA, that uses the global structure of the graphs to significantly reduce the problem size and align large graphs with a minimal loss of information. Concretely, we show that our proposed technique is highly flexible, can be used to achieve higher recall, and is orders of magnitude faster than the current state-of-the-art techniques.
【Keywords】: graph alignment; graph matching; labeled graph; network alignment
【Paper Link】 【Pages】:2624-2633
【Authors】: Zeyang Ye ; Lihao Zhang ; Keli Xiao ; Wenjun Zhou ; Yong Ge ; Yuefan Deng
【Abstract】: The classic mobile sequential recommendation (MSR) problem aims to provide the optimal route to taxi drivers for minimizing the potential travel distance before they meet their next passengers. However, the problem is designed from the view of a single user and may lead to overlapping recommendations and cause traffic problems. Existing approaches usually contain an offline pruning process with extremely high computational cost, given a large number of pick-up points. To this end, we formalize a new multi-user MSR (MMSR) problem that locates optimal routes for a group of drivers with different starting positions. We develop two efficient methods, PSAD and PSAD-M, for solving the MMSR problem by combining parallel computing and simulated annealing. Our methods outperform several existing approaches, especially for high-dimensional MMSR problems, with a record-breaking performance of 180x speedup using 384 cores.
【Keywords】: mobile sequential recommendation; parallel computing; potential travel distance; simulated annealing
【Paper Link】 【Pages】:2634-2642
【Authors】: Jianhua Yin ; Daren Chao ; Zhongkun Liu ; Wei Zhang ; Xiaohui Yu ; Jianyong Wang
【Abstract】: Short text stream clustering has become an increasingly important problem due to the explosive growth of short text in diverse social media. In this paper, we propose a model-based short text stream clustering algorithm (MStream) which can deal with the concept drift problem and the sparsity problem naturally. The MStream algorithm can achieve state-of-the-art performance with only one pass over the stream, and can perform even better when we allow multiple iterations over each batch. We further propose an improved algorithm of MStream with forgetting rules, called MStreamF, which can efficiently delete outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF can achieve better performance than three baselines on several real datasets.
【Keywords】: dirichlet process; mixture model; text stream clustering
【Paper Link】 【Pages】:2643-2652
【Authors】: Yu Yin ; Zhenya Huang ; Enhong Chen ; Qi Liu ; Fuzheng Zhang ; Xing Xie ; Guoping Hu
【Abstract】: Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task, as not only should the content objects be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying images with more complex content (e.g., structured code), which often follows a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework following a two-stage "where-to-what'' solution. Specifically, we first decide "where-to-look'' through a novel spotlight mechanism that focuses on different areas of the original image following its structure. Then, we decide "what-to-write'' by developing a GRU based network that uses the spotlight areas to transcribe the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and recurrent modeling, respectively. We also design a reinforcement method to refine the STN framework by self-improving the spotlight mechanism. We conduct extensive experiments on several structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.
【Keywords】: reinforcement learning; spotlight transcribing network; structural image
【Paper Link】 【Pages】:2653-2662
【Authors】: Tomoki Yoshida ; Ichiro Takeuchi ; Masayuki Karasuyama
【Abstract】: We study safe screening for metric learning. Distance metric learning can optimize a metric over a set of triplets, each one of which is defined by a pair of same class instances and an instance in a different class. However, the number of possible triplets is quite huge even for a small dataset. Our safe triplet screening identifies triplets which can be safely removed from the optimization problem without losing the optimality. Compared with existing safe screening studies, triplet screening is particularly significant because of (1) the huge number of possible triplets, and (2) the semi-definite constraint in the optimization. We derive several variants of screening rules, and analyze their relationships. Numerical experiments on benchmark datasets demonstrate the effectiveness of safe triplet screening.
【Keywords】: convex optimization; metric learning; safe screening
【Paper Link】 【Pages】:2663-2671
【Authors】: Wenchao Yu ; Cheng Zheng ; Wei Cheng ; Charu C. Aggarwal ; Dongjin Song ; Bo Zong ; Haifeng Chen ; Wei Wang
【Abstract】: The problem of network representation learning, also known as network embedding, arises in many machine learning tasks assuming that there exist a small number of variabilities in the vertex representations which can capture the "semantics" of the original network structure. Most existing network embedding models, with shallow or deep architectures, learn vertex representations from the sampled vertex sequences such that the low-dimensional embeddings preserve the locality property and/or global reconstruction capability. The resultant representations, however, are difficult for model generalization due to the intrinsic sparsity of sampled sequences from the input network. As such, an ideal approach to address the problem is to generate vertex representations by learning a probability density function over the sampled sequences. However, in many cases, such a distribution in a low-dimensional manifold may not always have an analytic form. In this study, we propose to learn the network representations with adversarially regularized autoencoders (NetRA). NetRA learns smoothly regularized vertex representations that well capture the network structure through jointly considering both locality-preserving and global reconstruction constraints. The joint inference is encapsulated in a generative adversarial training process to circumvent the requirement of an explicit prior distribution, and thus obtains better generalization performance. We demonstrate empirically how well key properties of the network structure are captured and the effectiveness of NetRA on a variety of tasks, including network reconstruction, link prediction, and multi-label classification.
【Keywords】: autoencoder; gans; generative adversarial networks; network embedding
【Paper Link】 【Pages】:2672-2681
【Authors】: Wenchao Yu ; Wei Cheng ; Charu C. Aggarwal ; Kai Zhang ; Haifeng Chen ; Wei Wang
【Abstract】: Massive and dynamic networks arise in many practical applications such as social media, security and public health. Given an evolutionary network, it is crucial to detect structural anomalies, such as vertices and edges whose "behaviors'' deviate from the underlying majority of the network, in a real-time fashion. Recently, network embedding has proven to be a powerful tool for learning low-dimensional representations of vertices in networks that can capture and preserve the network structure. However, most existing network embedding approaches are designed for static networks, and thus may not be perfectly suited for a dynamic environment in which the network representation has to be constantly updated. In this paper, we propose a novel approach, NetWalk, for anomaly detection in dynamic networks by learning network representations which can be updated dynamically as the network evolves. We first encode the vertices of the dynamic network to vector representations by clique embedding, which jointly minimizes the pairwise distance of vertex representations of each walk derived from the dynamic networks, with the deep autoencoder reconstruction error serving as a global regularization. The vector representations can be computed with constant space requirements using reservoir sampling. On the basis of the learned low-dimensional vertex representations, a clustering-based technique is employed to incrementally and dynamically detect network anomalies. Compared with existing approaches, NetWalk has several advantages: 1) the network embedding can be updated dynamically, 2) streaming network nodes and edges can be encoded efficiently with constant memory space usage, 3) it is flexible enough to be applied to different types of networks, and 4) network anomalies can be detected in real-time. Extensive experiments on four real datasets demonstrate the effectiveness of NetWalk.
【Keywords】: anomaly detection; clique embedding; deep autoencoder; dynamic network embedding
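For reference, the sketch below shows standard reservoir sampling, the constant-memory ingredient the abstract above invokes for encoding streaming nodes and edges; it is the generic algorithm, not NetWalk's full pipeline.

```python
import random

def reservoir_sample(stream, k, seed=0):
    # Keeps a uniform sample of size k over an arbitrarily long stream in O(k) memory.
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)   # each item survives with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```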
【Paper Link】 【Pages】:2682-2691
【Authors】: Chengxi Zang ; Peng Cui ; Wenwu Zhu
【Abstract】: To fit empirical data distributions and then interpret them in a generative way is a common research paradigm for understanding the structure and dynamics underlying the data in various disciplines. However, previous works mainly attempt to fit or interpret empirical data distributions in a case-by-case way. Faced with complex data distributions in the real world, can we fit and interpret them by a unified but parsimonious parametric model? In this paper, we view the complex empirical data as being generated by a dynamic system which takes uniform randomness as input. By modeling the generative dynamics of data, we showcase a four-parameter dynamic model together with inference and simulation algorithms, which is able to fit and generate a family of distributions, ranging from Gaussian, Exponential, Power Law, and Stretched Exponential (Weibull), to their complex variants with multi-scale complexities. Rather than a black box, our model can be interpreted through a unified differential equation, which captures the underlying generative dynamics. More powerful models can be constructed by our framework in a principled way. We validate our model on various synthetic datasets. We then apply our model to 16 real-world datasets from different disciplines. We show the systematic biases incurred when fitting these datasets with the most widely used methods and show the superiority of our model. In short, our model potentially provides a framework to fit complex distributions in empirical data, and more importantly, to understand their generative mechanisms.
【Keywords】: complex distribution; dynamic model; heavy-tailed distribution; interpretability; survival analysis
【Paper Link】 【Pages】:2692-2700
【Authors】: Bang Zhang ; Lelin Zhang ; Ting Guo ; Yang Wang ; Fang Chen
【Abstract】: Urbanization is a global trend that we have all witnessed in the past decades. It brings us both opportunities and challenges. On the one hand, the urban system is one of the most sophisticated social-economic systems and is responsible for efficiently providing supplies that meet the demand of residents in various domains, e.g., dwelling, education, entertainment, and healthcare. On the other hand, significant diversity and inequality exist in the development patterns of urban systems, which makes urban data analysis difficult. Different urban regions often exhibit diverse urbanization patterns and provide distinct urban functions, e.g., commercial and residential areas offer significantly different urban functions. It is desirable to develop the data analytic capabilities for discovering the underlying cross-domain urbanization patterns, clustering urban regions based on their function similarity and predicting region popularity in specified domains. Previous studies in the urban data analysis area often just focus on individual domains and rarely consider the cross-domain urban development patterns hidden in different urban regions. In this paper, we propose the infinite urbanization process (IUP) model for simultaneous urban region function discovery and region popularity prediction. The IUP model is a generative Bayesian nonparametric process that is capable of describing a potentially infinite number of urbanization patterns. It is developed within the supervised topic modelling framework and is supported by a novel hierarchical spatial distance dependent Bayesian nonparametric prior over the spatial region partition space. The empirical study conducted on real-world datasets shows promising outcomes compared with the state-of-the-art techniques.
【Keywords】: bayesian nonparametric; topic modelling; urban computing; urban function discovery
【Paper Link】 【Pages】:2701-2709
【Authors】: Chao Zhang ; Fangbo Tao ; Xiusi Chen ; Jiaming Shen ; Meng Jiang ; Brian M. Sadler ; Michelle Vanni ; Jiawei Han
【Abstract】: Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
【Keywords】: taxonomy construction; text mining; word embedding
【Paper Link】 【Pages】:2710-2719
【Authors】: Chen Zhang ; Yijun Wang ; Can Chen ; Changying Du ; Hongzhi Yin ; Hao Wang
【Abstract】: Stock comments from analysts contain important consulting information for investors to foresee stock volatility and market trends. Existing studies on stock comments have usually focused on capturing coarse-grained opinion polarities or understanding market fundamentals. However, investors are often overwhelmed and confused by massive comments with huge noise and ambiguous opinions. Therefore, there is an emerging need for a fine-grained stock comment analysis tool to identify more reliable stock comments. To this end, this paper provides a solution called StockAssIstant for modeling the reliability of stock comments by considering multiple factors, such as stock price trends, comment content, and the performances of analysts, in a holistic manner. Specifically, we first analyze the pattern of analysts' opinion dynamics from historical comments. Then, we extract key features from the time-series constructed by using the semantic information in the comment text, stock prices and the historical behaviors of analysts. Based on these features, we propose an ensemble learning based approach for measuring the reliability of comments. Finally, we conduct extensive experiments and provide a trading simulation on real-world stock data. The experimental results and the profit achieved by the simulated trading over a 12-month period clearly validate the effectiveness of our approach for modeling the reliability of stock comments.
【Keywords】: reliability modeling; stock comment; time-series
【Paper Link】 【Pages】:2720-2728
【Authors】: Chenwei Zhang ; Yaliang Li ; Nan Du ; Wei Fan ; Philip S. Yu
【Abstract】: Online healthcare services can provide the general public with ubiquitous access to medical knowledge and reduce medical information access cost for both individuals and societies. However, expanding the scale of high-quality yet structured medical knowledge usually comes with tedious efforts in data preparation and human annotation. To promote the benefits while minimizing the data requirement in expanding medical knowledge, we introduce a generative perspective to study the relational medical entity pair discovery problem. A generative model named Conditional Relationship Variational Autoencoder is proposed to discover meaningful and novel medical entity pairs by purely learning from the expression diversity in the existing relational medical entity pairs. Unlike discriminative approaches where high-quality contexts and candidate medical entity pairs are carefully prepared to be examined by the model, the proposed model generates novel entity pairs directly by sampling from a learned latent space without further data requirement. The proposed model explores the generative modeling capacity for medical entity pairs while incorporating deep learning for hands-free feature engineering. It is not only able to generate meaningful medical entity pairs that are not yet observed, but also can generate entity pairs for a specific medical relationship. The proposed model adjusts the initial representations of medical entities by addressing their relational commonalities. Quantitative and qualitative evaluations on real-world relational medical entity pairs demonstrate the effectiveness of the proposed method in generating relational medical entity pairs that are meaningful and novel.
【Keywords】: generative modeling; knowledge discovery; medical entity pair
【Paper Link】 【Pages】:2729-2737
【Authors】: Hengtong Zhang ; Yaliang Li ; Fenglong Ma ; Jing Gao ; Lu Su
【Abstract】: Truth discovery has attracted increasingly more attention due to its ability to distill trustworthy information from noisy multi-sourced data without any supervision. However, most existing truth discovery methods are designed for structured data, and cannot meet the strong need to extract trustworthy information from raw text data as text data has its unique characteristics. The major challenges of inferring true information on text data stem from the multifactorial property of text answers (i.e., an answer may contain multiple key factors) and the diversity of word usages (i.e., different words may have the same semantic meaning). To tackle these challenges, in this paper, we propose a novel truth discovery method, named "TextTruth", which jointly groups the keywords extracted from the answers of a specific question into multiple interpretable factors, and infers the trustworthiness of both answer factors and answer providers. After that, the answers to each question can be ranked based on the estimated trustworthiness of factors. The proposed method works in an unsupervised manner, and thus can be applied to various application scenarios that involve text data. Experiments on three real-world datasets show that the proposed TextTruth model can accurately select trustworthy answers, even when these answers are formed by multiple factors.
【Keywords】: text mining; truth discovery; unsupervised learning
【Paper Link】 【Pages】:2738-2747
【Authors】: Jing Zhang ; Xindong Wu
【Abstract】: When acquiring labels from crowdsourcing platforms, a task may be designed to include multiple labels, and the values of each label may belong to a set of various distinct options, which is the so-called multi-class multi-label annotation. To improve the quality of labels, one task is independently completed by a group of heterogeneous crowdsourced workers. Then, the true values of the multiple labels of each task are inferred from these repeated noisy labels. In this paper, we propose a novel probabilistic method, which includes a multi-class multi-label dependency (MCMLD) model, to address this problem. The proposed method assumes that label correlation exists in both the unknown true labels and the noisy crowdsourced labels. Thus, it introduces a mixture of multiple independent multinoulli distributions to capture the correlation among the labels. Finally, the unknown true values of the multiple labels of each task, together with a set of confusion matrices modeling the reliability of the workers, can be jointly inferred through an EM algorithm. Experiments with three simulated typical crowdsourcing scenarios and a real-world dataset consistently show that our proposed MCMLD method significantly outperforms several competitive alternatives. Furthermore, if the labels are strongly correlated, the advantage of MCMLD is even more remarkable.
【Keywords】: crowdsourcing; label aggregation; maximum likelihood estimation; mixture models; probabilistic graphical models
【Paper Link】 【Pages】:2748-2757
【Authors】: Ping Zhang ; Zhifeng Bao ; Yuchen Li ; Guoliang Li ; Yipeng Zhang ; Zhiyong Peng
【Abstract】: In this paper we propose and study the problem of trajectory-driven influential billboard placement: given a set of billboards U (each with a location and a cost), a database of trajectories T and a budget B, find a set of billboards within the budget that influences the largest number of trajectories. One core challenge is to identify and reduce the overlap of the influence from different billboards on the same trajectories, while keeping the budget constraint in consideration. We show that this problem is NP-hard and present an enumeration-based algorithm with a (1-1/e) approximation ratio. However, the enumeration can be very costly when the number of billboards is large. By exploiting the locality property of billboards' influence, we propose a partition-based framework. It partitions the billboards into a set of small clusters, computes the locally influential billboards for each cluster, and merges them to generate the global solution. Since the local solutions can be obtained much more efficiently than the global one, this framework greatly reduces the computation cost; meanwhile it achieves a non-trivial approximation ratio guarantee. We then propose a further pruning method that discards billboards with low marginal influence, while achieving the same approximation ratio as the partition-based framework. Experiments on real datasets verify the efficiency and effectiveness of our methods.
【Keywords】: influence maximization; outdoor advertising; trajectory
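A hedged sketch of a simple cost-benefit greedy baseline for the placement problem described above (not the paper's partition-based framework): repeatedly pick the billboard with the best marginal influence per unit cost that still fits the budget, discounting trajectories already covered. Data structures and names are our own assumptions.

```python
def greedy_billboards(billboards, cost, influenced, budget):
    """billboards: iterable of ids; cost[b]: cost of b; influenced[b]: set of trajectory ids."""
    chosen, covered, spent = [], set(), 0.0
    remaining = set(billboards)
    while remaining:
        best, best_ratio = None, 0.0
        for b in remaining:
            if spent + cost[b] > budget:
                continue
            gain = len(influenced[b] - covered)   # marginal influence, discounting overlap
            if cost[b] > 0 and gain / cost[b] > best_ratio:
                best, best_ratio = b, gain / cost[b]
        if best is None:
            break
        chosen.append(best)
        covered |= influenced[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen, covered
```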
【Paper Link】 【Pages】:2758-2767
【Authors】: Yan Zhang ; Haoyu Wang ; Defu Lian ; Ivor W. Tsang ; Hongzhi Yin ; Guowu Yang
【Abstract】: The efficiency of top-k recommendation is vital to large-scale recommender systems. Hashing is not only an efficient alternative but also complementary to distributed computing, and also a practical and effective option in a computing environment with limited resources. Hashing techniques improve the efficiency of online recommendation by representing users and items by binary codes. However, objective functions of existing methods are not consistent with ultimate goals of recommender systems, and are often optimized via discrete coordinate descent, easily getting stuck in a local optimum. To this end, we propose a Discrete Ranking-based Matrix Factorization (DRMF) algorithm based on each user's pairwise preferences, and formulate it into binary quadratic programming problems to learn binary codes. Due to non-convexity and binary constraints, we further propose self-paced learning for improving the optimization, to include pairwise preferences gradually from easy to complex. We finally evaluate the proposed algorithm on three public real-world datasets, and show that the proposed algorithm outperforms the state-of-the-art hashing-based recommendation algorithms, and even achieves comparable performance to matrix factorization methods.
【Keywords】: binary quadratic programming; hashing; ranking-based matrix factorization; self-paced learning
【Paper Link】 【Pages】:2768-2777
【Authors】: Yifan Zhang ; Peilin Zhao ; Jiezhang Cao ; Wenye Ma ; Junzhou Huang ; Qingyao Wu ; Mingkui Tan
【Abstract】: This paper investigates Online Active Learning (OAL) for imbalanced unlabeled datastreams, where only a budget of labels can be queried to optimize some cost-sensitive performance measure. OAL can solve many real-world problems, such as anomaly detection in healthcare, finance and network security. In these problems, there are two key challenges: the query budget is often limited, and the ratio between the two classes is highly imbalanced. To address these challenges, existing work on OAL adopts either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and uses first-order methods to optimize the cost-sensitive measure. However, this may incur two deficiencies: (1) poor ability to handle imbalanced data due to the isolated asymmetric strategy; (2) relatively slow convergence due to the first-order optimization. In this paper, we propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy (merging both the asymmetric losses and queries strategies) and second-order optimization. We theoretically analyze its bounds, and also empirically evaluate it on four real-world online anomaly detection tasks. Promising results confirm the effectiveness and robustness of the proposed algorithm in various application domains.
【Keywords】: active learning; anomaly detection; cost-sensitive learning; imbalance data; online learning; query budget
【Paper Link】 【Pages】:2778-2786
【Authors】: Ziwei Zhang ; Peng Cui ; Xiao Wang ; Jian Pei ; Xuanrong Yao ; Wenwu Zhu
【Abstract】: Network embedding has received increasing research attention in recent years. The existing methods show that the high-order proximity plays a key role in capturing the underlying structure of the network. However, two fundamental problems in preserving the high-order proximity remain unsolved. First, all the existing methods can only preserve fixed-order proximities, despite that proximities of different orders are often desired for distinct networks and target applications. Second, given a certain order proximity, the existing methods cannot guarantee accuracy and efficiency simultaneously. To address these challenges, we propose AROPE (arbitrary-order proximity preserved embedding), a novel network embedding method based on SVD framework. We theoretically prove the eigen-decomposition reweighting theorem, revealing the intrinsic relationship between proximities of different orders. With this theorem, we propose a scalable eigen-decomposition solution to derive the embedding vectors and shift them between proximities of arbitrary orders. Theoretical analysis is provided to guarantee that i) our method has a low marginal cost in shifting the embedding vectors across different orders, ii) given a certain order, our method can get the global optimal solutions, and iii) the overall time complexity of our method is linear with respect to network size. Extensive experimental results on several large-scale networks demonstrate that our proposed method greatly and consistently outperforms the baselines in various tasks including network reconstruction, link prediction and node classification.
【Keywords】: arbitrary-order proximity; network embedding; network representation learning
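A hedged, simplified sketch of the eigen-decomposition reweighting idea named in the abstract above: a truncated eigen-decomposition of the adjacency matrix is computed once, and embeddings for any weighted combination of proximity orders are obtained by reweighting the eigenvalues with the polynomial weights. The exact selection and scaling rules of AROPE are omitted; function and variable names are ours.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def arbitrary_order_embedding(A, dim, weights):
    # A: sparse symmetric adjacency matrix; weights: [w1, w2, ...] for A, A^2, ...
    vals, vecs = eigsh(A, k=dim, which='LA')              # computed once, reused for all orders
    reweighted = sum(w * vals ** (q + 1) for q, w in enumerate(weights))
    content = vecs * np.sqrt(np.abs(reweighted))          # content/context pair so that
    context = content * np.sign(reweighted)               # content @ context.T ~= sum_q w_q A^q
    return content, context
```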
【Paper Link】 【Pages】:2787-2796
【Authors】: Liang Zhao ; Amir Alipour-Fanid ; Martin Slawski ; Kai Zeng
【Abstract】: As machine learning methods are utilized in more and more real-world applications involving constraints on computational budgets, the systematic integration of such constraints into the process of model selection and model optimization is required to an increasing extent. A specific computational resource in this regard is the time needed for evaluating predictions on test instances. There is meanwhile a substantial body of work concerned with the joint optimization of accuracy and test-time efficiency by considering the time costs of feature generation and model prediction. During the feature generation process, significant redundant computations across different features occur in many applications. Although the elimination of such redundancies would reduce the time cost substantially, there has been little research in this area due to substantial technical challenges involved, especially: 1) the lack of an effective formulation for feature computation dependency; and 2) the nonconvex and discrete nature of the optimization over feature computation dependency. In order to address these problems, this paper first proposes a heterogeneous hypergraph to represent the feature computation dependency, after which a framework is proposed that jointly optimizes the accuracy and the exact test-time cost based on a given feature computational dependency. A continuous tight approximation to this original problem is proposed based on a non-monotone nonconvex regularization term. Finally, an effective nonconvex optimization algorithm is proposed to solve the problem, along with a theoretical analysis of the convergence conditions. Extensive experiments on eight synthetic datasets and six real-world datasets demonstrate the proposed models' outstanding performance in terms of both accuracy and prediction-time cost.
【Keywords】: cost-efficient classification; feature computational dependency; nonconvex optimization
【Paper Link】 【Pages】:2797-2806
【Authors】: Yan Zhao ; Shuo Shang ; Yu Wang ; Bolong Zheng ; Quoc Viet Hung Nguyen ; Kai Zheng
【Abstract】: The pervasiveness of GPS-enabled devices and wireless communication technologies results in massive trajectory data, incurring high costs for storage, transmission, and query processing. To alleviate this problem, in this paper we propose a novel framework for compressing trajectory data, REST (Reference-based Spatio-temporal trajectory compression), in which a raw trajectory is represented by the concatenation of a series of historical (sub-)trajectories (called reference trajectories) that form the compressed trajectory within a given spatio-temporal deviation threshold. In order to construct a reference trajectory set that can most benefit the subsequent compression, we propose three kinds of techniques to select reference trajectories wisely from a large dataset such that the resulting reference set is more compact yet covers most footprints of trajectories in the area of interest. To address the computational issue caused by the large number of combinations of reference trajectories that may exist for resembling a given trajectory, we propose efficient greedy algorithms that run in the blink of an eye and dynamic programming algorithms that can achieve the optimal compression ratio. Compared to existing work on trajectory compression, our framework makes few assumptions about the data, such as movement within a road network or movement with constant direction and speed, and achieves better compression performance with fairly small spatio-temporal loss. Extensive experiments on a real taxi trajectory dataset demonstrate the superiority of our framework over existing representative approaches in terms of both compression ratio and efficiency.
【Keywords】: compression algorithm; spatio-temporal data; trajectory
【Paper Link】 【Pages】:2807-2816
【Authors】: Dawei Zhou ; Jingrui He ; Hongxia Yang ; Wei Fan
【Abstract】: In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization becomes a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separability nature of the rare categories from the majority classes, together with the availability of the multi-modal representation of the examples, poses a new research question: how can we learn a salient rare category oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, which facilitates the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning that simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare category oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data demonstrate that our proposed SPARC algorithm: (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes; (2) outperforms the existing methods on rare category characterization tasks.
【Keywords】: network embedding; rare category analysis; self-paced learning
【Paper Link】 【Pages】:2817-2826
【Authors】: Yao Zhou ; Arun Reddy Nelakurthi ; Jingrui He
【Abstract】: With the increasing demand for large amount of labeled data, crowdsourcing has been used in many large-scale data mining applications. However, most existing works in crowdsourcing mainly focus on label inference and incentive design. In this paper, we address a different problem of adaptive crowd teaching, which is a sub-area of machine teaching in the context of crowdsourcing. Compared with machines, human beings are extremely good at learning a specific target concept (e.g., classifying the images into given categories) and they can also easily transfer the learned concepts into similar learning tasks. Therefore, a more effective way of utilizing crowdsourcing is by supervising the crowd to label in the form of teaching. In order to perform the teaching and expertise estimation simultaneously, we propose an adaptive teaching framework named JEDI to construct the personalized optimal teaching set for the crowdsourcing workers. In JEDI teaching, the teacher assumes that each learner has an exponentially decayed memory. Furthermore, it ensures comprehensiveness in the learning process by carefully balancing teaching diversity and learner's accurate learning in terms of teaching usefulness. Finally, we validate the effectiveness and efficacy of JEDI teaching in comparison with the state-of-the-art techniques on multiple data sets with both synthetic learners and real crowdsourcing workers.
【Keywords】: crowd teaching; exponentially decayed memory; human learner
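A hedged toy sketch of the exponentially decayed memory assumption the abstract above attributes to the learner: older teaching examples contribute exponentially less to the learner's current estimate. The particular decayed-memory estimator below is an illustrative assumption, not the JEDI teaching algorithm.

```python
import numpy as np

def learner_estimate(teaching_examples, labels, beta=0.5):
    """teaching_examples: (t, d) array in teaching order; labels: (t,) array in {-1, +1}."""
    t = len(labels)
    decay = beta ** np.arange(t - 1, -1, -1)      # the most recent example is weighted most
    w = (decay * labels) @ teaching_examples      # decayed-memory linear estimate of the concept
    return w / np.linalg.norm(w)
```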
【Paper Link】 【Pages】:2827-2836
【Authors】: Dingyuan Zhu ; Peng Cui ; Daixin Wang ; Wenwu Zhu
【Abstract】: Network embedding, aiming to embed a network into a low-dimensional vector space while preserving the inherent structural properties of the network, has attracted considerable attention recently. Most of the existing embedding methods embed nodes as point vectors in a low-dimensional continuous space. In this way, the formation of an edge is deterministic and determined only by the positions of the nodes. However, the formation and evolution of real-world networks are full of uncertainties, which makes these methods not optimal. To address this problem, we propose a novel Deep Variational Network Embedding in Wasserstein Space (DVNE) in this paper. The proposed method learns a Gaussian distribution in the Wasserstein space as the latent representation of each node, which can simultaneously preserve the network structure and model the uncertainty of nodes. Specifically, we use the 2-Wasserstein distance as the similarity measure between the distributions, which can well preserve the transitivity in the network with a linear computational cost. Moreover, our method captures the mathematical relevance of mean and variance through the deep variational model, so that the position of a node is captured by the mean vector and its uncertainty by the variance. Additionally, our method captures both the local and global network structure by preserving the first-order and second-order proximity in the network. Our experimental results demonstrate that our method can effectively model the uncertainty of nodes in networks, and shows a substantial gain on real-world applications such as link prediction and multi-label classification compared with the state-of-the-art methods.
【Keywords】: deep learning; network embedding; wasserstein space
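For concreteness, the sketch below computes the closed-form 2-Wasserstein distance between two Gaussians with diagonal covariances, the node similarity measure named in the abstract above; the function name is ours.

```python
import numpy as np

def w2_diag_gaussians(mu1, var1, mu2, var2):
    # W2^2 = ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2 for diagonal covariances.
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2))
```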
【Paper Link】 【Pages】:2837-2846
【Authors】: Hongyuan Zhu ; Qi Liu ; Nicholas Jing Yuan ; Chuan Qin ; Jiawei Li ; Kun Zhang ; Guang Zhou ; Furu Wei ; Yuanchun Xu ; Enhong Chen
【Abstract】: With the development of knowledge of music composition and the recent increase in demand, an increasing number of companies and research institutes have begun to study the automatic generation of music. However, previous models have limitations when applied to song generation, which requires both a melody and an arrangement. Besides, many critical factors related to the quality of a song, such as chord progression and rhythm patterns, are not well addressed. In particular, the problem of how to ensure the harmony of multi-track music is still underexplored. To this end, we present a focused study on pop music generation, in which we take into consideration both the influence of chord and rhythm on melody generation and the harmony of the music arrangement. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track together with several accompaniment tracks played by several types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions. Then, we propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, where the results demonstrate the effectiveness of XiaoIce Band.
【Keywords】: harmony evaluation; melody and arrangement generation; multi-task joint learning; music generation
【Paper Link】 【Pages】:2847-2856
【Authors】: Daniel Zügner ; Amir Akbarnejad ; Stephan Günnemann
【Abstract】: Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, there is currently no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node's features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm, Nettack, exploiting incremental computations. Our experimental study shows that the accuracy of node classification drops significantly even when performing only a few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given.
【Keywords】: adversarial machine learning; graph convolutional networks; graph mining; network mining; semi-supervised learning
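The sketch below shows a brute-force version of the greedy perturbation idea against a linearized two-layer GCN surrogate (logits computed as the doubly propagated features times a weight matrix): it tries each candidate edge flip for a target node and keeps the one that most reduces the classification margin. This is not Nettack itself, which adds incremental score updates and unnoticeability constraints; the graph, weights, and label below are random toy data.

import numpy as np

def normalized_adj(A):
    A_hat = A + np.eye(len(A))                     # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def margin(A, X, W, target, true_class):
    logits = (normalized_adj(A) @ normalized_adj(A) @ X @ W)[target]
    others = np.delete(logits, true_class)
    return logits[true_class] - others.max()       # negative margin means misclassified

rng = np.random.default_rng(0)
n, d, c, target, y = 8, 5, 3, 0, 0                 # y: assumed true class of the target node
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A += A.T                        # random undirected toy graph
X, W = rng.normal(size=(n, d)), rng.normal(size=(d, c))

best = None
for u in range(n):                                 # greedily pick a single structure flip
    if u == target:
        continue
    A_pert = A.copy()
    A_pert[target, u] = A_pert[u, target] = 1 - A_pert[target, u]
    m = margin(A_pert, X, W, target, y)
    if best is None or m < best[0]:
        best = (m, u)
print("clean margin:", margin(A, X, W, target, y), "best flip:", best)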
【Paper Link】 【Pages】:2857-2866
【Authors】: Yuan Zuo ; Guannan Liu ; Hao Lin ; Jia Guo ; Xiaoqian Hu ; Junjie Wu
【Abstract】: Given the rich real-life applications of network mining and the surge of representation learning in recent years, network embedding has become a focal point of growing research interest in both academia and industry. Nevertheless, the complete temporal formation process of a network, characterized by sequential interactive events between nodes, has seldom been modeled in existing studies, which calls for further research on the so-called temporal network embedding problem. In light of this, we introduce the concept of a neighborhood formation sequence to describe the evolution of a node, where temporal excitation effects exist between neighbors in the sequence, and we propose a Hawkes process based Temporal Network Embedding (HTNE) method. HTNE integrates the Hawkes process into network embedding so as to capture the influence of historical neighbors on current neighbors. In particular, interactions between low-dimensional vectors are fed into the Hawkes process as the base rate and the temporal influence, respectively. In addition, an attention mechanism is integrated into HTNE to better determine the influence of a node's historical neighbors on its current neighbors. Experiments on three large-scale real-life networks demonstrate that the embeddings learned by HTNE outperform state-of-the-art methods on various tasks, including node classification, link prediction, and embedding visualization. In particular, temporal recommendation based on arrival rates inferred from node embeddings shows the excellent predictive power of the proposed model.
【Keywords】: hawkes process; learning representation; network embedding; temporal network
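The sketch below illustrates a Hawkes-style conditional intensity driven by node embeddings, in the spirit of the description above: the base rate comes from source-target embedding similarity, and historical neighbors add exponentially decaying excitation. The exact parameterization (negative squared distances, a single shared decay rate) is an assumption made for the example, not necessarily HTNE's.

import numpy as np

def intensity(e_src, e_tgt, history, t, decay=1.0):
    """history: list of (embedding_of_past_neighbor, event_time) pairs for the source node."""
    base = -np.sum((e_src - e_tgt) ** 2)                       # similarity-driven base rate
    excite = sum(-np.sum((e_h - e_tgt) ** 2) * np.exp(-decay * (t - t_h))
                 for e_h, t_h in history)                      # decaying influence of past neighbors
    return base + excite                                       # typically passed through a positivity transform

rng = np.random.default_rng(1)
e_u, e_v = rng.normal(size=8), rng.normal(size=8)
past = [(rng.normal(size=8), 0.5), (rng.normal(size=8), 0.9)]  # two historical neighbors of u
print(intensity(e_u, e_v, past, t=1.0))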
【Paper Link】 【Pages】:2867
【Authors】: John M. Abowd
【Abstract】: The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census. Motivation: The Census Bureau conducted internal research that confirmed that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses had serious vulnerabilities that were exposed by the Dinur and Nissim (2003) database reconstruction theorem. We designed a differentially private publication system that directly addressed these vulnerabilities while preserving the fitness for use of the core statistical products. Problem statement: Designing and engineering production differential privacy systems requires two primary components: (1) inventing and constructing algorithms that deliver maximum accuracy for a given privacy-loss budget and (2) ensuring that the privacy-loss budget can be directly controlled by the policy-makers who must choose an appropriate point on the accuracy-privacy-loss tradeoff. The first problem lies in the domain of computer science. The second lies in the domain of economics. Approach: The algorithms under development for the 2020 Census focus on the data used to draw legislative districts and to enforce the 1965 Voting Rights Act (VRA). These algorithms efficiently distribute the noise injected by differential privacy. The Data Stewardship Executive Policy Committee selects the privacy-loss parameter after reviewing accuracy-privacy-loss graphs.
【Keywords】: differential privacy; economics of privacy
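As a textbook illustration of the accuracy-privacy-loss tradeoff mentioned above (this is the plain Laplace mechanism, not the Census Bureau's production algorithms), the snippet below shows how a smaller privacy-loss parameter epsilon yields stronger privacy but noisier published counts.

import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=np.random.default_rng(0)):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_count + rng.laplace(scale=sensitivity / epsilon)

true_count = 1200                                   # hypothetical block-level population count
for eps in [0.1, 0.5, 1.0, 4.0]:
    noisy = np.array([laplace_count(true_count, eps) for _ in range(1000)])
    print(f"epsilon={eps:<4} mean absolute error ~ {np.mean(np.abs(noisy - true_count)):.1f}")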
【Paper Link】 【Pages】:2868
【Authors】: Mayur Datar
【Abstract】: In this talk we will give a very brief overview of Flipkart, highlighting the major milestones in its journey so far and some of the latest market-size numbers. We will enumerate some of the important data science challenges in an e-commerce company like ours. Finally, we will cover one of these problems, demand forecasting, in some detail to highlight some of our recent work: accurate demand forecasts can help on-line retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time-series and regression approaches fail to model. We propose a neural network architecture called AR-MDN that simultaneously models associative factors, time-series trends, and the variance in the demand.
【Keywords】: data science; demand forecasting; e-commerce; neural network; time series forecasting
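As a rough sketch of how a network can model both the level and the variance of demand, the snippet below implements a generic mixture-density output head. It is not the AR-MDN architecture itself (the recurrent and associative components are omitted), and all layer sizes and data are invented for the example.

import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim=32, k=3):
        super().__init__()
        self.pi = nn.Linear(in_dim, k)           # mixture weights
        self.mu = nn.Linear(in_dim, k)           # component means
        self.log_sigma = nn.Linear(in_dim, k)    # component scales (log for positivity)

    def forward(self, h):
        return torch.softmax(self.pi(h), -1), self.mu(h), self.log_sigma(h).exp()

def nll(pi, mu, sigma, y):
    # Negative log-likelihood of observed demand y under the Gaussian mixture.
    comp = torch.distributions.Normal(mu, sigma).log_prob(y.unsqueeze(-1))
    return -torch.logsumexp(comp + pi.log(), dim=-1).mean()

h = torch.randn(16, 32)                          # toy features for 16 product-days
pi, mu, sigma = MDNHead()(h)
print(nll(pi, mu, sigma, torch.rand(16) * 10))   # loss to minimize during training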
【Paper Link】 【Pages】:2869
【Authors】: Xin Luna Dong
【Abstract】: Knowledge graphs have been used to support a wide range of applications and to enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk we describe four scientific directions we are investigating in building and using such a knowledge graph. First, we have been developing advanced extraction technologies to harvest product knowledge from semi-structured sources on the web and from textual product profiles. Our annotation-based extraction tool selects a few webpages (typically below 10 pages) from a website for annotation, and can derive XPaths to extract from the whole website with average precision and recall of 97% [1]. Our distantly supervised extraction tool, CERES, uses an existing knowledge graph to automatically generate (noisy) training labels, and can obtain a precision over 90% when extracting from long-tail websites in various languages [1]. Our OpenTag technique extends state-of-the-art techniques such as Recurrent Neural Networks (RNNs) and Conditional Random Fields with attention and active learning, achieving over 90% precision and recall in extracting attribute values (including values unseen in training data) from product titles, descriptions, and bullets [3].
【Keywords】: data cleaning; entity linkage; graph mining; human-in-the-loop; knowledge extraction; knowledge fusion
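The toy snippet below illustrates the distant-supervision idea behind CERES: values already present in a knowledge graph are matched against raw text to generate (noisy) training labels without manual annotation. The miniature "knowledge graph", product title, and simple string matching are invented for the example; the real pipeline operates at web scale over DOM trees rather than whitespace tokens.

# Tiny seed knowledge graph: attribute -> known values (all invented).
seed_kg = {"brand": ["acme"], "flavor": ["vanilla", "chocolate"]}

def distant_labels(title):
    tokens = title.lower().split()
    labels = ["O"] * len(tokens)               # BIO-style tags, "O" = not an attribute value
    for attr, values in seed_kg.items():
        for i, tok in enumerate(tokens):
            if tok in values:
                labels[i] = f"B-{attr}"        # noisy positive label from the KG match
    return list(zip(tokens, labels))

print(distant_labels("Acme vanilla protein shake 500ml"))
# -> [('acme', 'B-brand'), ('vanilla', 'B-flavor'), ('protein', 'O'), ('shake', 'O'), ('500ml', 'O')]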
【Paper Link】 【Pages】:2870
【Authors】: Li Fan
【Abstract】: Pinterest's mission is to help you discover and do what you love, whether that's finding the perfect recipe for your family dinner or pulling together an outfit. To achieve this level of personalization, with 200M+ active users and billions of recommendations every day, we live on machine learning. From object detection and classification to ads auction model tuning, machine learning is used in numerous components of our system. With limited resources as a medium-sized company but fast-growing demand from passionate users, we have to balance cutting-edge technological advancement with practical system implementations that can be put in place within a short amount of time. In this talk, I will review Pinterest's approach of carefully balancing simplicity and functionality, and how we reached our current stage of system design.
【Keywords】: machine learning; pinterest; simplicity
【Paper Link】 【Pages】:2871
【Authors】: James Hodson
【Abstract】: Finance is the efficient allocation of capital to achieve individual and societal objectives. Finance runs on information, from the number of ships under construction in the ports of Dalian to the beliefs of investors in a marketplace; we want to put capital to work in the places where it will have the biggest impact. Increasingly, large new data sources are facilitating our understanding of how individuals, teams, companies, governments, and other entities operate, enabling new types of modelling that can unlock value and accelerate growth. In this talk, we will explore several open questions at the intersection of individuals, jobs, companies, and financial markets, where new data brings the promise of new understanding. How do people and firms interact to create economic value? How do we accelerate this value creation? How can the KDD community help?
【Keywords】: dynamic network structure; economics; finance; labour market; machine learning
【Paper Link】 【Pages】:2872-2873
【Authors】: Foster Provost ; James Hodson ; Jeannette M. Wing ; Qiang Yang ; Jennifer Neville
【Abstract】: The explosion of interest in KDD and other Data Science/Machine Learning/AI conferences is just one of the many signs that these technologies are no longer confined to the realms of academia and a handful of tech companies. As our daily lives seamlessly integrate more and more data-driven applications, people's excitement is tempered by worry about the technologies' potential to disrupt their existence. Having worked for almost 30 years to design and develop these technologies, the KDD community should now examine and debate the impact of machine learning and AI on the broader world. Beyond the hype, where do we stand with respect to the dangers? What role can our community play in alleviating concerns about AI taking jobs, or taking over? How can the value derived from data be distributed fairly? Are concerns about inequity well founded, or are they largely problems of perception? What can be done to bring data hunger and data-sharing concerns into equilibrium? How do we prepare people to interact with intelligent systems at scale? And can we unleash the incredible responsiveness of the KDD community toward longer-term, more impactful projects across sectors that are essential for social good, such as Health, Environmental Sustainability, and Public Welfare?
【Keywords】: ai dangers; artificial intelligence; data-driven applications; society
【Paper Link】 【Pages】:2874
【Authors】: Hema Raghavan
【Abstract】: At LinkedIn our mission is to build active communities for all of our members, so that members can disseminate or seek professional content at the right time on the right channel. We mine a variety of data sources, including LinkedIn's Economic Graph and member activity on the site, and use large-scale machine learning algorithms to recommend connections to people members may know, helping them build active communities. We build real-time recommendations to disseminate information so that members never miss a relevant conversation going on in any of the communities they are part of. In this talk we will showcase how we are tackling some of the most challenging problems in internet-scale social network analysis, streaming algorithms, and multi-objective optimization.
【Keywords】:
【Paper Link】 【Pages】:2875
【Authors】: Suju Rajan
【Abstract】: Machine learning literature on computational advertising tends to focus on the simplistic CTR prediction problem, which, while relevant, is only the tip of the iceberg in terms of the challenges in the field. There is also very little appreciation for the scale at which real-time-bidding systems operate (200B bid requests/day) or for the increasingly adversarial ecosystem, all of which heavily constrain the set of feasible solutions. In this talk, I'll highlight some recent efforts in developing models that better encapsulate the journey of an ad, from its first display to a user through to its effect on an actual purchase.
【Keywords】: computational advertising; real-time-bidding
【Paper Link】 【Pages】:2876
【Authors】: Christopher Ré
【Abstract】: In the last few years, deep learning models have simultaneously achieved high quality on conventionally challenging tasks and become easy-to-use commodity tools. These factors, combined with ease of deployment compared to traditional software, have led to deep learning models replacing production software stacks not only in traditional machine-learning-driven products such as translation and search, but also in many previously heuristic-based applications. This new mode of software construction and deployment has been called Software 2.0 [2]. A key bottleneck in the construction of Software 2.0 applications is the need for large, high-quality training sets for each task. This talk describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks. In Snorkel applications, instead of tediously hand-labeling individual data items, a user implicitly defines large training sets by writing programs, called labeling functions, that assign labels to subsets of data points, albeit noisily. This idea of using multiple, imperfect sources of labels builds on work in distant supervision. However, if ignored, the uneven (and unknown) accuracies and coverages of the user-provided labeling functions can easily lead to suboptimal results. Example: suppose we have two training sets, T1 and T2, produced by two processes (or labeling functions). T1 has high accuracy, say 90%, but low yield, labeling 10k points, while T2 has lower accuracy, 60%, but higher yield, 1M points. If we put the training sets together, we have a set T of 1.01M points with overall accuracy 60.3%. This could be distressing for a user: a model trained on T1 seems to lose quality when trained on all of T. Naively combining the training sets fails to account for the different origins of T1 and T2. Snorkel addresses this challenge of uneven training-source quality by automatically learning a statistical model of the labeling functions' accuracies and correlation structure. The lack of hand-labeled data when learning this model raises several statistical challenges, including estimating accuracies, learning correlations, and selecting features that refine labeling function quality [1,3,4]. Snorkel then uses this model to combine and reweight the labeling functions' labels, producing a set of probabilistic training labels and thereby passing along key provenance information about the training data. Our experimental results and theory show that estimating and accounting for the quality of the labeling functions in this way can lead to improved training-set labels and boost downstream application quality, potentially by large margins, e.g., more than ten points of F1 score in NLP applications. Exploiting the varied quality of supervision is a key building block for managing the Software 2.0 stack, but it is far from the only technique. Indeed, recent extensions of these core themes have led to projects that automatically generate data augmentations, synthesize labeling functions, and programmatically define multi-task supervision. This does not even touch the many new opportunities for deployment and systems in Software 2.0. Hence, we contend there is a broad research agenda motivated by Software 2.0. Although only two years old, the Snorkel project powers applications in major tech companies and scientific efforts. It is used in traditional machine learning applications such as natural language processing, medical imaging, and prediction. Perhaps more excitingly for the Software 2.0 vision, it is also used in traditional enterprise applications such as data cleaning, data integration, and semi-structured extraction, areas where machine learning has traditionally been difficult to deploy. For more information about the formal underpinnings and applications of Snorkel, we refer to Snorkel.stanford.edu for open-source code, tutorials, and links to technical papers.
【Keywords】: data labeling; deep learning; hand-labeled data; label synthesis; software 2.0
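Below is a quick check of the arithmetic in the example above, together with a toy illustration of why modeling source accuracies matters. Snorkel estimates such accuracies without ground truth; here they are simply given, and the log-odds weighted vote is a standard construction used for illustration rather than Snorkel's actual generative model.

import math

n1, acc1 = 10_000, 0.90        # T1: small but accurate
n2, acc2 = 1_000_000, 0.60     # T2: large but noisy
pooled_acc = (n1 * acc1 + n2 * acc2) / (n1 + n2)
print(f"naively pooled accuracy: {pooled_acc:.1%}")   # ~60.3%, as stated in the abstract

def weighted_vote(labels_and_accs):
    # Combine binary labels (+1/-1) using log-odds weights derived from source accuracies.
    score = sum(y * math.log(a / (1 - a)) for y, a in labels_and_accs)
    return 1 if score > 0 else -1

# When both sources label the same point, the accurate source dominates the noisy one.
print(weighted_vote([(+1, 0.90), (-1, 0.60)]))        # -> +1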
【Paper Link】 【Pages】:2877
【Authors】: Joseph Sirosh
【Abstract】: AI for Earth puts Microsoft's cloud and AI tools in the hands of those working to solve global environmental challenges. Land cover mapping is part of Microsoft's AI for Earth program, which was created to fundamentally change the way that society monitors, models, and ultimately manages Earth's natural resources. To power the land cover mapping work, deep neural networks (DNNs) are used to perform land use classification on tens of terabytes of high-resolution satellite imagery from the National Agriculture Imagery Program (NAIP). However, DNN inference is challenging to run cost-effectively and to deploy in large-scale online services with low latency and good price/performance. Microsoft Project Brainwave is a hardware architecture designed to enable high-performance real-time AI computation, deployed on field-programmable gate arrays (FPGAs). This wave of hardware innovation will fundamentally transform latency and price/performance for large-scale use of DNNs. In this session, we will walk through how FPGAs are used within Microsoft and how we can tap the power of FPGAs for real-time AI. We will share how we are able to perform land cover classification on 20 terabytes of high-resolution satellite imagery from NAIP in ten minutes, at a rate of over 415,000 inferences per second.
【Keywords】: FPGA; artificial intelligence; deep neural networks; land cover classification; national agriculture imagery program; real-time AI
【Paper Link】 【Pages】:2878
【Authors】: Alex Smola
【Abstract】: Over the past decade, Deep Learning has revolutionized much of Data Mining and Artificial Intelligence. Several factors have contributed to this virtuous cycle, primarily the ready availability of data in the cloud and a shift in the hardware resources that can be used for computation, mostly away from memory-intensive models toward compute-intensive ones. For instance, large amounts of image and video data are available thanks to cheap and ubiquitous sensors; processing them is only possible with equally copious amounts of low-precision computation. At the same time, expressive machine learning frameworks have allowed statistical modelers to design complex models with ease and to deploy them at scale, thus increasing the demand for computation even further. In this talk I will illustrate how these interaction cycles are likely to shape machine learning in the future.
【Keywords】: AI algorithms; AI hardware; algorithms; cloud; data; hardware
【Paper Link】 【Pages】:2879
【Authors】: Jen Walraven
【Abstract】: Netflix entered the world of content production with its first Original title in 2012 and has since grown to produce over 700 Original titles around the world. Spanning pre-production (planning, budgeting, etc.), production (principal photography), post-production (editing, sound mixing, etc.), and localization and quality control (subtitle creation, resolving technical glitches, etc.), content production is a complex operation that consumes and generates significant amounts of data. Throughout this process, the application of analytics, machine learning, and optimization can unlock deeper insight. Translating this insight into actionable recommendations alongside creative teams can introduce tremendous efficiency and scalability into the production process. In this talk, we'll discuss how data science can help tackle critical challenges in the production space, as well as opportunities on the horizon in a transforming entertainment industry.
【Keywords】: content production; data science; entertainment; machine learning; optimization
【Paper Link】 【Pages】:2880
【Authors】: Eric Xing
【Abstract】: The rise of Big Data and AI computing has led to new demands for machine learning systems that learn complex models with millions to billions of parameters, promising adequate capacity to digest massive datasets and offer powerful, real-time predictive analytics on top of them. In this talk, I discuss a recent trend toward building new distributed frameworks for AI at massive scale known as "system and ML algorithm co-design", or SysML: system designs are tailored to the unique properties of ML algorithms, and algorithms are re-designed to better fit the system architecture. I show how one can exploit the statistical and algorithmic characteristics unique to ML programs, but not typical of traditional computer programs, when designing the system architecture, achieving a significant, universal, and theoretically sound power-up of ML programs across the board. I also give a brief introduction to the Petuum system, built on such interdisciplinary innovations, which aims to dramatically improve the adoption of AI solutions by lowering the barrier of entry to AI technologies via automatic machine learning. I show how, through automatable, product-grade, hardware-agnostic, standardized building blocks that can be assembled and customized, AI users can liberate themselves from the demanding experience of algorithm programming and system tuning, and easily experiment with different AI methods, parameters, and speed/resource trade-offs, either by themselves or automatically. To put this in a broader context, recent discussions about AI in both the research community and the general public have championed a novelistic view of AI: that AI can mimic, surpass, threaten, or even destroy mankind. Such discussions are fueled mainly by recent advances in deep learning experimentation and applications, which are, however, often plagued by craftiness, a lack of interpretability, and poor generalizability. I will discuss a different view of AI as a rigorous engineering discipline and as a commodity, where standardization, modularity, repeatability, reusability, and transparency are commonly expected, just as in civil engineering, where builders apply principles and techniques from all the sciences to construct reliable buildings. I will discuss how such a view sets a different focus, approach, metric, and expectation for AI research and engineering, which we have practiced in our SysML work.
【Keywords】: SysML; distributed machine learning; machine learning; systems architecture