Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017. ACM 【DBLP Link】
【Paper Link】 【Pages】:1
【Authors】: Stephen Robertson
【Abstract】: To me, an awareness of history is a fundamental requirement for progress; and I believe that we in the field of information retrieval are currently ill-served in this domain, or at least not as aware as we should be. While it is true that a researcher in IR is expected to acquire some knowledge of what has gone before, this knowledge is typically fairly narrow in scope. There are some IR researchers who regard IR as a branch of computer science, for example, despite the fact that the field has a long and venerable history entirely outside the domain of computers, as well as a considerable current presence in academic departments well away from CSDs. Outside of our immediate community, there is a widespread belief that web search engines arose out of nothing, a totally new invention for the totally new world of the web. This is unfortunate. It is true that there were huge changes in the information retrieval world in the second half of the twentieth century. The ideas, methods and systems that were around in 2000 (particularly the early web search engines, up to and including Google) seem at first glance to be completely different from the ideas, methods and systems that were around in 1950. But these developments involved in large measure the usual process of evolution rather than revolution, and the usual mix of steps large or small, forwards or backwards or sideways. Some of these steps were taken in the world of IR research, and some in the domain of practical commercial systems.
【Keywords】: history; text search; web search engines
【Paper Link】 【Pages】:3
【Authors】: Yoelle Maarek
【Abstract】: Web Mail has significantly changed in the last decade. It keeps growing, with 90% of its traffic being generated by automated scripts or "machines" [1]. At the same time, major mail services offer more and more free storage, ranging from 15GB for Gmail and Outlook.com to 1TB for Yahoo mail. As a result, we keep accumulating messages in our inbox, rarely deleting (and sometimes not even opening) many [2]. Our inbox has become a big store mixing important information, such as e-tickets or bills, together with newsletters or promotions from which we forgot to unsubscribe. Search is therefore a critical mechanism for retrieving the specific messages we need. Unfortunately, search in mail is far from being as trusted (and used) as search on the Web today. Everything is personal and often private, from the content of the mailbox to the search strategies, users' needs and queries, thus making traditional Web search techniques inapplicable "as is". Failure is evident when we can't find a message that we remember having read, and this increases our frustration. Most mail search services return sorted-by-time results so that we can scan results chronologically, which increases our perception of perfect recall. At the same time, the ranking mechanism drops less relevant results to prevent them from being ranked first when recent. So in order to increase a (false) perception of recall, these systems actually hurt recall! Ranking results by mail-specific relevance would actually increase search success [3], yet it is not widely adopted, with the exception of Inbox by Gmail and Yahoo Mail, which show a few relevant results on top of traditional ranked-by-time results [4]. In addition, many of us still struggle with expressing our needs, typically issuing very short queries, as in the early days of the Web [5].
In this talk, I will first highlight the key characteristics of mail search and how they differ from Web search, in terms of searchers' needs and behavior [2,5]. I will then present recent progress in mail ranking [3,4] as well as in query assistance tools [5,6]. Finally, I will discuss directions for future research, and the need to revisit mail search and invent search mechanisms specifically tailored to the personal data store that our inbox has become.
【Keywords】: mail search
【Paper Link】 【Pages】:5-14
【Authors】: Gordon V. Cormack ; Maura R. Grossman
【Abstract】: Technology-assisted review ("TAR") systems seek to achieve "total recall"; that is, to approach, as nearly as possible, the ideal of 100% recall and 100% precision, while minimizing human review effort. The literature reports that TAR methods using relevance feedback can achieve considerably greater than the 65% recall and 65% precision reported by Voorhees as the "practical upper bound on retrieval performance... since that is the level at which humans agree with one another" (Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 2000). This work argues that in order to build - as well as to evaluate - TAR systems that approach 100% recall and 100% precision, it is necessary to model human assessment, not as absolute ground truth, but as an indirect indicator of the amorphous property known as "relevance." The choice of model impacts both the evaluation of system effectiveness and the simulation of relevance feedback. Models are presented that better fit available data than the infallible ground-truth model. These models suggest ways to improve TAR-system effectiveness so that hybrid human-computer systems can improve on both the accuracy and efficiency of human review alone. This hypothesis is tested by simulating TAR using two datasets: the TREC 4 AdHoc collection, and a dataset consisting of 401,960 email messages that were manually reviewed and classified by a single individual, Roger, in his official capacity as Senior State Records Archivist. The results using the TREC 4 data show that TAR achieves higher recall and higher precision than the assessments by either of two independent NIST assessors, and blind adjudication of the email dataset, conducted by Roger more than two years after his original review, shows that he could have achieved the same recall and better precision, while reviewing substantially fewer than 401,960 emails, had he employed TAR in place of exhaustive manual review.
【Keywords】: continuous active learning; high recall; relevance assessment; total recall
【Paper Link】 【Pages】:15-24
【Authors】: Ye Chen ; Ke Zhou ; Yiqun Liu ; Min Zhang ; Shaoping Ma
【Abstract】: As in most information retrieval (IR) studies, evaluation plays an essential part in Web search research. Both offline and online evaluation metrics are adopted in measuring the performance of search engines. Offline metrics are usually based on relevance judgments of query-document pairs from assessors, while online metrics exploit the user behavior data, such as clicks, collected from search engines to compare search algorithms. Although both types of IR evaluation metrics have achieved success, to what extent they can predict user satisfaction remains under-investigated. To shed light on this research question, we meta-evaluate a series of existing online and offline metrics to study how well they infer actual search user satisfaction in different search scenarios. We find that both types of evaluation metrics significantly correlate with user satisfaction, while they reflect satisfaction from different perspectives for different search tasks. Offline metrics better align with user satisfaction in homogeneous search (i.e. ten blue links), whereas online metrics outperform when vertical results are federated. Finally, we also propose to incorporate mouse hover information into existing online evaluation metrics, and empirically show that they better align with search user satisfaction than click-based online metrics.
【Keywords】: evaluation metrics; online evaluation; search satisfaction
【Paper Link】 【Pages】:25-34
【Authors】: Tetsuya Sakai
【Abstract】: Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as "the probability that the hypothesis is (in)correct" and "the probability that the true parameter value falls within the interval is 95%," we can easily obtain P(H|D) and credible intervals, which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of the hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass's Δ.
【Keywords】: bayesian hypothesis tests; confidence intervals; credible intervals; effect sizes; hamiltonian monte carlo; markov chain monte carlo; p-values; statistical significance
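The paired Bayesian comparison described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' tool: it assumes a normal likelihood with a noninformative prior (so the posterior of the mean difference is a shifted, scaled Student-t and no MCMC is needed), and the per-topic scores are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-topic effectiveness scores for two systems (illustrative only).
system_a = np.array([0.42, 0.55, 0.61, 0.38, 0.70, 0.49, 0.58, 0.44, 0.66, 0.52])
system_b = np.array([0.40, 0.50, 0.55, 0.41, 0.63, 0.45, 0.51, 0.46, 0.60, 0.48])

d = system_a - system_b          # paired per-topic differences
n = len(d)

# Under a normal likelihood with a vague prior, the posterior of the mean
# difference mu is d_bar + (s / sqrt(n)) * t_{n-1}; sample it directly.
posterior = d.mean() + d.std(ddof=1) / np.sqrt(n) * rng.standard_t(n - 1, size=100_000)

p_h_given_d = (posterior > 0).mean()                      # P(H: mu > 0 | D)
eap = posterior.mean()                                    # Expected A Posteriori value
ci_low, ci_high = np.percentile(posterior, [2.5, 97.5])   # 95% credible interval

print(f"P(mu > 0 | D) = {p_h_given_d:.3f}")
print(f"EAP = {eap:.4f}, 95% credible interval = [{ci_low:.4f}, {ci_high:.4f}]")
```

Unlike a p-value, `p_h_given_d` is directly the posterior probability that system A's mean score exceeds system B's, and the credible interval contains the true mean difference with 95% posterior probability.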
【Paper Link】 【Pages】:35-44
【Authors】: Xiaolu Lu ; Alistair Moffat ; J. Shane Culpepper
【Abstract】: Increasing test collection sizes and limited judgment budgets create measurement challenges for IR batch evaluations, challenges that are greater when using deep effectiveness metrics than when using shallow metrics, because of the increased likelihood that unjudged documents will be encountered. Here we study the problem of metric score adjustment, with the goal of accurately estimating system performance when using deep metrics and limited judgment sets, assuming that dynamic score adjustment is required per topic because of the variability in the number of relevant documents. We seek to induce system orderings that are as close as possible to the orderings that would arise if full judgments were available. Starting with depth-based pooling, and no prior knowledge of sampling probabilities, the first phase of our two-stage process computes a background gain for each document based on rank-level statistics. The second stage then accounts for the distributional variance of relevant documents. We also exploit the frequency statistics of pooled relevant documents in order to set a threshold for dynamically determining the set of topics to be adjusted. Taken together, our results show that: (i) better score estimates can be achieved when compared to previous work; (ii) by setting a global threshold, we are able to adapt our methods to different collections; and (iii) the proposed estimation methods reliably approximate the system orderings achieved when many more relevance judgments are available. We also consider pools generated by a two-strata sampling approach.
【Keywords】: pooling; relevance assessment; shallow judgments; test collection
【Paper Link】 【Pages】:45-54
【Authors】: Yuxin Su ; Irwin King ; Michael R. Lyu
【Abstract】: Many learning-to-rank (LtR) algorithms focus on the query-independent model, in which query and document do not lie in the same feature space, and the rankers rely on a feature ensemble over the query-document pair instead of the similarity between the query instance and documents. However, existing algorithms do not consider local structures in the query-document feature space, and are fragile to irrelevant, noisy features. In this paper, we propose a novel Riemannian metric learning algorithm to capture the local structures and develop a robust LtR algorithm. First, we design a concept called the ideal candidate document to introduce metric learning to the query-independent model. Previous metric learning algorithms that aim to find an optimal metric space are only suitable for the query-dependent model, in which the query instance and documents belong to the same feature space and the similarity is directly computed from the metric space. Then we extend the new and extremely fast global Geometric Mean Metric Learning (GMML) algorithm to develop a localized GMML, namely L-GMML. Based on the combination of locally learned metrics, we employ the popular Normalized Discounted Cumulative Gain (NDCG) scorer and Weighted Approximate Rank Pairwise (WARP) loss to optimize the ideal candidate document for each query candidate set. Finally, we can quickly evaluate all candidates via the similarity between the ideal candidate document and other candidates. By leveraging the ability of metric learning algorithms to describe complex structural information, our approach gives us a principled and efficient way to perform LtR tasks. The experiments on real-world datasets demonstrate that our proposed L-GMML algorithm outperforms state-of-the-art metric learning to rank methods and popular query-independent LtR algorithms regarding accuracy and computational efficiency.
【Keywords】: distance metric learning; learning to rank; local metric learning
【Paper Link】 【Pages】:55-64
【Authors】: Chenyan Xiong ; Zhuyun Dai ; Jamie Callan ; Zhiyuan Liu ; Russell Power
【Abstract】: This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over prior state-of-the-art feature-based and neural models, and explain the source of K-NRM's advantage: its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
【Keywords】: embedding; kernel pooling; neural ir; ranking; relevance model
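The kernel-pooling step the abstract describes can be sketched as follows. This is a schematic NumPy rendering under stated assumptions, not the paper's implementation: embeddings are random stand-ins, and the kernel means and widths are illustrative choices (one near-exact-match kernel plus soft-match kernels spread over the cosine range).

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in L2-normalised word embeddings: 3 query words, 5 document words.
q = rng.normal(size=(3, 50)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(5, 50)); d /= np.linalg.norm(d, axis=1, keepdims=True)

M = q @ d.T  # translation matrix of cosine similarities, shape (3, 5)

# One narrow "exact match" kernel (mu=1) plus soft-match kernels over [-1, 1].
mus = np.array([1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9])
sigmas = np.array([1e-3] + [0.1] * 10)

# Kernel pooling: for each kernel k, sum_j exp(-(M_ij - mu_k)^2 / (2 sigma_k^2)),
# then log, then sum over query words i -> one soft-match feature per kernel.
kernel_scores = np.exp(-(M[:, :, None] - mus) ** 2 / (2 * sigmas ** 2))  # (3, 5, 11)
soft_tf = kernel_scores.sum(axis=1)                        # pool over document words
features = np.log(np.clip(soft_tf, 1e-10, None)).sum(axis=0)  # pool over query words

print(features.shape)  # feature vector fed to the learning-to-rank layer
```

In the full model, the learning-to-rank layer is a learned linear combination of these features, and gradients flow back through the kernels into the embeddings.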
【Paper Link】 【Pages】:65-74
【Authors】: Mostafa Dehghani ; Hamed Zamani ; Aliaksei Severyn ; Jaap Kamps ; W. Bruce Croft
【Abstract】: Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this end, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representations). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.
【Keywords】: ad-hoc retrieval; deep learning; deep neural network; ranking model; weak supervision
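The weak-supervision setup above can be illustrated with a toy BM25 labeler. This is a minimal sketch, not the paper's pipeline: the corpus and query are invented, and the BM25 formula is the standard Okapi variant; the point is only that BM25 scores induce pairwise preferences that can serve as training labels for a neural ranker without any human judgments.

```python
import math
from collections import Counter

# Toy corpus (in the paper, weak labels come from BM25 over a large query log).
docs = ["deep learning for ranking", "bm25 is a strong baseline",
        "neural networks learn representations", "ranking with weak supervision"]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))  # document frequencies

def bm25(query, doc, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one tokenized document for a query string."""
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Weak supervision: BM25 scores define which document of a pair should be
# ranked higher; these pairs become training instances for the neural model.
query = "weak supervision ranking"
scores = [bm25(query, d) for d in tokenized]
pairs = [(i, j) for i in range(N) for j in range(N) if scores[i] > scores[j]]
print(scores)
print(pairs)
```

A pair-wise neural ranker would then be trained to reproduce these preferences, with the hope (borne out in the paper) of generalizing beyond the weak labeler itself.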
【Paper Link】 【Pages】:75-84
【Authors】: Suthee Chaidaroon ; Yi Fang
【Abstract】: As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack the expressiveness and flexibility needed to learn effective representations. The recent advances of deep learning in a wide range of applications have demonstrated its capability to learn robust and powerful feature representations for complex data. In particular, deep generative models naturally combine the expressiveness of probabilistic generative models with the high capacity of deep neural networks, which is very suitable for text modeling. However, little work has leveraged the recent progress in deep learning for text hashing. In this paper, we propose a series of novel deep document generative models for text hashing. The first proposed model is unsupervised, while the second one is supervised by utilizing document labels/tags for hashing. The third model further considers document-specific factors that affect the generation of words. The probabilistic generative formulation of the proposed models provides a principled framework for model extension, uncertainty estimation, simulation, and interpretability. Based on variational inference and reparameterization, the proposed models can be interpreted as encoder-decoder deep neural networks and thus are capable of learning complex nonlinear distributed representations of the original documents. We conduct a comprehensive set of experiments on four public testbeds. The experimental results demonstrate the effectiveness of the proposed supervised learning models for text hashing.
【Keywords】: deep learning; semantic hashing; variational autoencoder
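The retrieval side of semantic hashing can be sketched independently of the generative model. This is an assumption-laden illustration: the latent vectors below are random stand-ins for encoder outputs, and median thresholding is one common (not the paper's specific) way to binarize continuous latents into hash codes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the encoder's latent mean vectors: 6 documents, 8 dimensions.
latent = rng.normal(size=(6, 8))

# Binarize each dimension at its median over the collection -> compact codes.
codes = (latent > np.median(latent, axis=0)).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Similarity search reduces to ranking documents by Hamming distance
# to the query document's code, which is extremely cheap at scale.
query_code = codes[0]
ranked = sorted(range(1, 6), key=lambda i: hamming(query_code, codes[i]))
print(ranked)
```

The appeal of hashing is exactly this last step: Hamming-distance ranking over short binary codes is orders of magnitude cheaper than dense-vector similarity search over the full collection.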
【Paper Link】 【Pages】:85-94
【Authors】: Chuan-Ju Wang ; Ting-Hsiang Wang ; Hsiu-Wei Yang ; Bo-Sin Chang ; Ming-Feng Tsai
【Abstract】: This paper proposes an item concept embedding (ICE) framework to model item concepts via textual information. Specifically, in the proposed framework there are two stages: graph construction and embedding learning. In the first stage, we propose a generalized network construction method to build a network involving heterogeneous nodes and a mixture of both homogeneous and heterogeneous relations. The second stage leverages the concept of neighborhood proximity to learn the embeddings of both items and words. With the proposed carefully designed ICE networks, the resulting embedding facilitates both homogeneous and heterogeneous retrieval, including item-to-item and word-to-item retrieval. Moreover, as a distributed embedding approach, the proposed ICE approach not only generates related retrieval results but also delivers more diverse results than traditional keyword-matching-based approaches. As our experiments on two real-world datasets show, ICE encodes useful textual information and thus outperforms traditional methods in various item classification and retrieval tasks.
【Keywords】: concept embedding; conceptual retrieval; information network; textual information
【Paper Link】 【Pages】:95-104
【Authors】: Pengjie Ren ; Zhumin Chen ; Zhaochun Ren ; Furu Wei ; Jun Ma ; Maarten de Rijke
【Abstract】: As a framework for extractive summarization, sentence regression has achieved state-of-the-art performance in several widely-used practical systems. The most challenging task within the sentence regression framework is to identify discriminative features to encode a sentence into a feature vector. So far, sentence regression approaches have neglected to use features that capture contextual relations among sentences. We propose a neural network model, Contextual Relation-based Summarization (CRSum), to take advantage of contextual relations among sentences so as to improve the performance of sentence regression. Specifically, we first use sentence relations with a word-level attentive pooling convolutional neural network to construct sentence representations. Then, we use contextual relations with a sentence-level attentive pooling recurrent neural network to construct context representations. Finally, CRSum automatically learns useful contextual features by jointly learning representations of sentences and similarity scores between a sentence and sentences in its context. Using a two-level attention mechanism, CRSum is able to pay attention to important content, i.e., words and sentences, in the surrounding context of a given sentence. We carry out extensive experiments on six benchmark datasets. CRSum alone can achieve comparable performance with state-of-the-art approaches; when combined with a few basic surface features, it significantly outperforms the state-of-the-art in terms of multiple ROUGE metrics.
【Keywords】: contextual sentence relation; extractive summarization; neural network
【Paper Link】 【Pages】:105-114
【Authors】: Raphael R. Campos ; Sérgio D. Canuto ; Thiago Salles ; Clebson C. A. de Sá ; Marcos André Gonçalves
【Abstract】: Random Forest (RF) is one of the most successful strategies for automated classification tasks. Motivated by the RF success, recently proposed RF-based classification approaches leverage the central RF idea of aggregating a large number of low-correlated trees, which are inherently parallelizable and provide exceptional generalization capabilities. In this context, this work brings several new contributions to this line of research. First, we propose a new RF-based strategy (BERT) that applies the boosting technique in bags of extremely randomized trees. Second, we empirically demonstrate that this new strategy, as well as the recently proposed BROOF and LazyNN_RF classifiers, complement each other, motivating us to stack them to produce an even more effective classifier. To the best of our knowledge, this is the first strategy to effectively combine the three main ensemble strategies: stacking, bagging (the cornerstone of RFs) and boosting. Finally, we exploit an efficient and unbiased stacking strategy based on out-of-bag (OOB) samples to considerably speed up the very costly training process of the stacking procedure. Our experiments on several datasets covering two high-dimensional and noisy domains, topic and sentiment classification, provide strong evidence in favor of the benefits of our RF-based solutions. We show that BERT is among the top performers in the vast majority of analyzed cases, while retaining the unique benefits of RF classifiers (explainability, parallelization, ease of parameterization).
We also show that stacking only the recently proposed RF-based classifiers and BERT using our OOB-based strategy is not only significantly faster than recently proposed stacking strategies (up to six times) but also much more effective, with gains up to 21% and 17% on MacroF1 and MicroF1, respectively, over the best base method, and of 5% and 6% over a stacking of traditional methods, performing no worse than a complete stacking of methods at a much lower computational effort.
【Keywords】: bagging; boosting; classification; ensemble; random forests; stacking
【Paper Link】 【Pages】:115-124
【Authors】: Jingzhou Liu ; Wei-Cheng Chang ; Yuexin Wu ; Yiming Yang
【Abstract】: Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels could reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years by the development of new machine learning methods, such as tree induction with large-margin partitions of the instance spaces and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its big successes in other related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models which are tailored for multi-label classification in particular. With a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets where the number of labels is up to 670,000, we show that the proposed CNN approach successfully scaled to the largest datasets, and consistently produced the best or the second best results on all the datasets. On the Wikipedia dataset with over 2 million documents and 500,000 labels in particular, it outperformed the second best method by 11.7%~15.3% in precision@K and by 11.5%~11.7% in NDCG@K for K = 1, 3, 5.
【Keywords】: convolutional neural network; deep learning; extreme text classification; multi-label
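The precision@K and NDCG@K metrics reported in the abstract above are standard for extreme multi-label classification and can be computed as follows. This is a generic sketch with an invented toy label list, not the paper's evaluation code; it uses the usual binary-gain, log2-discount formulation.

```python
import numpy as np

def precision_at_k(ranked_labels, k):
    # ranked_labels: 1/0 ground-truth relevance of the predicted labels,
    # ordered by decreasing model score.
    return sum(ranked_labels[:k]) / k

def ndcg_at_k(ranked_labels, k):
    gains = np.asarray(ranked_labels[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # 1/log2(rank + 1)
    dcg = (gains * discounts).sum()
    ideal = np.sort(np.asarray(ranked_labels, dtype=float))[::-1][:k]
    idcg = (ideal * discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: ground-truth relevance of one document's top predicted labels.
ranked = [1, 0, 1, 1, 0, 0, 1]
for k in (1, 3, 5):
    print(f"P@{k} = {precision_at_k(ranked, k):.3f}, "
          f"nDCG@{k} = {ndcg_at_k(ranked, k):.3f}")
```

In XMTC these are averaged over all test documents; small K (1, 3, 5) is used because only a handful of the hundreds of thousands of labels are relevant per document.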
【Paper Link】 【Pages】:125-134
【Authors】: Ashlee Edwards ; Diane Kelly
【Abstract】: One of the primary ways researchers have characterized engagement during information search is by increases in search behaviors, such as queries and clicks. However, studies have shown that frustration is also characterized by increases in these same behaviors. This research examines the differences in the search behaviors and physiologies of people who are engaged or frustrated during search. A 2x2 within-subject laboratory experiment was conducted with 40 participants. Engagement was induced by manipulating task interest and frustration was induced by manipulating the quality of the search results. Participants' interactions and physiological responses were recorded, and after they searched, they evaluated their levels of engagement, frustration and stress. Participants reported significantly greater levels of engagement when completing tasks that interested them and significantly less engagement during searches with poor results quality. For all search behaviors measured, only two significant differences were found according to task interest: participants had more scrolls and longer query intervals when searching for interesting tasks, suggesting greater interaction with content. Significant differences were found for nine behaviors according to results quality, including queries issued, number of SERPs displayed and number of SERP clicks, suggesting these are potentially better indicators of frustration rather than engagement. When presented with poor quality results, participants had significantly higher heart rates than when presented with normal quality results. Finally, participants had lower heart rates and greater skin conductance responses when conducting interesting tasks than when conducting uninteresting tasks. 
This research provides insight into the differences in search behaviors and physiologies of participants when they are engaged versus frustrated and presents techniques that can be used by those wishing to induce engagement and frustration during laboratory IIR studies.
【Keywords】: emotion; engagement; frustration; interactive information retrieval; physiological signals; search behavior
【Paper Link】 【Pages】:135-144
【Authors】: David Maxwell ; Leif Azzopardi ; Yashar Moshfeghi
【Abstract】: The design and presentation of a Search Engine Results Page (SERP) has been subject to much research. With many contemporary aspects of the SERP now under scrutiny, work still remains in investigating more traditional SERP components, such as the result summary. Prior studies have examined a variety of different aspects of result summaries, but in this paper we investigate the influence of result summary length on search behaviour, performance and user experience. To this end, we designed and conducted a within-subjects experiment using the TREC AQUAINT news collection with 53 participants. Using Kullback-Leibler distance as a measure of information gain, we examined result summaries of different lengths and selected four conditions where the change in information gain was the greatest: (i) title only; (ii) title plus one snippet; (iii) title plus two snippets; and (iv) title plus four snippets. Findings show that participants broadly preferred longer result summaries, as they were perceived to be more informative. However, their performance in terms of correctly identifying relevant documents was similar across all four conditions. Furthermore, while the participants felt that longer summaries were more informative, empirical observations suggest otherwise; while participants were more likely to click on relevant items given longer summaries, they were also more likely to click on non-relevant items. These observations show, first, that longer is not necessarily better, though participants perceived that to be the case; and second, that there is a positive relationship between the length and informativeness of summaries and their attractiveness (i.e. clickthrough rates). These findings show that there are tensions between perception and performance that need to be taken into account when designing result summaries.
【Keywords】: information retrieval; interactive information retrieval; result summary; search; search behavior; search performance; serp; snippets; user study
【Paper Link】 【Pages】:145-154
【Authors】: Bahareh Sarrafzadeh ; Edward Lank
【Abstract】: In information retrieval and information visualization, hierarchies are a common tool to structure information into topics or facets, and network visualizations such as knowledge graphs link related concepts within a domain. In this paper, we explore a multi-layer extension to knowledge graphs, hierarchical knowledge graphs (HKGs), that combines hierarchical and network visualizations into a unified data representation. Through interaction logs, we show that HKGs preserve the benefits of single-layer knowledge graphs at conveying domain knowledge while incorporating the sense-making advantages of hierarchies for knowledge-seeking tasks. Specifically, this paper describes our algorithm to construct these visualizations, analyzes interaction logs to quantitatively demonstrate performance parity with networks and performance advantages over hierarchies, and synthesizes data from interaction logs, interviews, and think-alouds on a testbed data set to demonstrate the utility of the unified hierarchy+network structure in our HKGs.
【Keywords】: exploratory search; hierarchies; information seeking; knowledge graphs; representations of search results
【Paper Link】 【Pages】:155-164
【Authors】: Morgan Harvey ; Matthew Pointon
【Abstract】: Smart phones and tablets are rapidly becoming our main method of accessing information and are frequently used to perform on-the-go search tasks. Mobile devices are commonly used in situations where attention must be divided, such as when walking down a street. Research suggests that this increases cognitive load and, therefore, may have an impact on performance. In this work we conducted a laboratory experiment with both device types in which we simulated everyday, common mobile situations that may cause fragmented attention, impact search performance and affect user perception. Our results showed that the fragmented attention induced by the simulated conditions significantly affected both participants' objective and perceived search performance, as well as how hurried they felt and how engaged they were in the tasks. Furthermore, the type of device used also impacted how users felt about the search tasks, how well they performed and the amount of time they spent engaged in the tasks. These novel insights provide useful information to inform the design of future interfaces for mobile search and give us a greater understanding of how context and device size affect search behaviour and user experience.
【Keywords】: cognition; experimentation; fragmented attention; mobile search; search experience; user study
【Paper Link】 【Pages】:165-174
【Authors】: Rishabh Mehrotra ; Imed Zitouni ; Ahmed Hassan Awadallah ; Ahmed El Kholy ; Madian Khabsa
【Abstract】: Detecting and understanding implicit measures of user satisfaction are essential for meaningful experimentation aimed at enhancing web search quality. While most existing studies on satisfaction prediction rely on users' click activity and query reformulation behavior, often such signals are not available for all search sessions and, as a result, are not useful in predicting satisfaction. On the other hand, user interaction data (such as mouse cursor movement) is far richer than click data alone and can provide useful signals for predicting user satisfaction. In this work, we take a holistic view of user interaction with the search engine result page (SERP) and construct detailed universal interaction sequences of user activity. We propose novel ways of leveraging the universal interaction sequences to automatically extract informative, interpretable subsequences. In addition to extracting frequent, discriminatory and interleaved subsequences, we propose a Hawkes process model to incorporate temporal aspects of user interaction. Through extensive experimentation we show that encoding the extracted subsequences as features enables us to achieve significant improvements in predicting user satisfaction. We additionally present an analysis of the correlation between various subsequences and user satisfaction. Finally, we demonstrate the usefulness of the proposed approach in covering abandonment cases. Our findings provide a valuable tool for fine-grained analysis of user interaction behavior for metric development.
【Keywords】: hawkes process; interaction sequences; satisfaction; subsequences
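The Hawkes process used above for the temporal aspects of interaction is self-exciting: each event transiently raises the intensity of future events. A minimal univariate sketch with an exponential kernel (parameter values are illustrative, not the paper's):

```python
import math

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process with an
    exponential kernel: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# Intensity spikes right after a burst of interactions, then decays back
# toward the base rate mu.
events = [1.0, 1.1, 1.2]
assert hawkes_intensity(1.3, events) > hawkes_intensity(5.0, events)
```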
【Paper Link】 【Pages】:175-184
【Authors】: Chunfeng Yang ; Huan Yan ; Donghan Yu ; Yong Li ; Dah Ming Chiu
【Abstract】: As online video services continue to grow in popularity, video content providers compete hard for more eyeball engagement. Some users visit multiple video sites to enjoy videos of their interest while some visit exclusively one site. However, due to the isolation of data, mining and exploiting user behaviors across multiple video websites has so far remained unexplored. In this work, we model user preferences in six popular video websites with user viewing records obtained from a large ISP in China. The empirical study shows that users exhibit both consistent cross-site interests as well as site-specific interests. To represent this dichotomous pattern of user preferences, we propose a generative model, Multi-site Probabilistic Factorization (MPF), to capture both the cross-site and the site-specific preferences. We also discuss the design principle of our model by analyzing the sources of the observed site-specific user preferences, namely site peculiarity and data sparsity. Through extensive recommendation validation, we show that our MPF model achieves the best results compared to several state-of-the-art factorization models, with significant improvements in F-measure of 12.96%, 8.24% and 6.88%, respectively. Our findings provide insights on the value of integrating user data from multiple sites, which stimulates collaboration between video service providers.
【Keywords】: multi-site transfer learning; recommender systems; user modeling
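The dichotomy of cross-site and site-specific preferences described above can be illustrated with a toy additive factor model (the vectors below are invented; MPF itself is a probabilistic factorization learned from data, not this hand-set sketch):

```python
import numpy as np

u_shared = np.array([1.0, 0.5])               # cross-site user interest
u_site = {"siteA": np.array([0.2, 0.0]),      # site-specific deviations
          "siteB": np.array([-0.3, 0.4])}
item = np.array([0.8, 0.6])                   # latent item factor

def affinity(site):
    """User-item affinity = (shared factor + site-specific factor) . item."""
    return float((u_shared + u_site[site]) @ item)

# The shared factor drives both scores; site factors shift them apart.
assert affinity("siteA") != affinity("siteB")
assert affinity("siteA") > 0 and affinity("siteB") > 0
```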
【Paper Link】 【Pages】:185-194
【Authors】: Xiang Wang ; Xiangnan He ; Liqiang Nie ; Tat-Seng Chua
【Abstract】: Online platforms can be divided into information-oriented and social-oriented domains. The former refers to forums or E-commerce sites that emphasize user-item interactions, like Trip.com and Amazon, whereas the latter refers to social networking services (SNSs) that have rich user-user connections, such as Facebook and Twitter. Despite their heterogeneity, these two domains can be bridged by a few overlapping users, dubbed bridge users. In this work, we address the problem of cross-domain social recommendation, i.e., recommending relevant items of information domains to potential users of social networks. To our knowledge, this is a new problem that has rarely been studied before. Existing cross-domain recommender systems are unsuitable for this task since they have either focused on homogeneous information domains or assumed that users are fully overlapped. Towards this end, we present a novel Neural Social Collaborative Ranking (NSCR) approach, which seamlessly sews up the user-item interactions in information domains and user-user connections in SNSs. In the information domain part, the attributes of users and items are leveraged to strengthen the embedding learning of users and items. In the SNS part, the embeddings of bridge users are propagated to learn the embeddings of other non-bridge users. Extensive experiments on two real-world datasets demonstrate the effectiveness and rationality of our NSCR method.
【Keywords】: cross-domain recommendation; deep collaborative filtering; deep learning; neural network
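The propagation of bridge-user embeddings to non-bridge users can be caricatured as one step of neighborhood averaging over the social graph (a toy sketch under that simplifying assumption, not NSCR's actual learning procedure):

```python
import numpy as np

# Embeddings of bridge users, learned in the information domain (made up).
bridge_emb = {"u1": np.array([1.0, 0.0]),
              "u2": np.array([0.0, 1.0])}

def propagate(bridge_friends):
    """Toy smoothing step: estimate a non-bridge user's embedding as the
    mean of her bridge friends' embeddings."""
    return np.mean([bridge_emb[f] for f in bridge_friends], axis=0)

v = propagate(["u1", "u2"])   # non-bridge user connected to both
assert np.allclose(v, [0.5, 0.5])
```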
【Paper Link】 【Pages】:195-204
【Authors】: Aleksandr Farseev ; Ivan Samborskii ; Andrey Filchenkov ; Tat-Seng Chua
【Abstract】: Venue category recommendation is an essential application for the tourism and advertisement industries, as it can suggest attractive localities in close proximity to a user's current location. Considering that many adults use more than three social networks simultaneously, it is reasonable to leverage this rapidly growing multi-source social media data to boost venue recommendation performance. Another way to achieve better recommendation performance is to utilize group knowledge, which can diversify recommendation output. Taking these two aspects into account, we introduce a novel cross-network collaborative recommendation framework, C3R, which utilizes both individual and group knowledge, while being trained on data from multiple social media sources. Group knowledge is derived with a new cross-source user community detection approach, which utilizes both the inter-source relationship and the ability of sources to complement each other. To fully utilize multi-source multi-view data, we process user-generated content by employing state-of-the-art text, image, and location processing techniques. Our experimental results demonstrate the superiority of our multi-source framework over state-of-the-art baselines and different data source combinations. In addition, we suggest a new approach for automatic construction of the inter-network relationship graph from the data, which eliminates the need for pre-defined domain knowledge.
【Keywords】: cross-domain recommendation; cross-source recommendation; data fusion; grassmann manifolds; multi-layer clustering; multi-source clustering; multi-source learning; multi-view learning; recommender systems; spectral clustering; user community detection
【Paper Link】 【Pages】:205-214
【Authors】: Xiang Chen ; Bowei Chen ; Mohan S. Kankanhalli
【Abstract】: Displaying banner advertisements (in short, ads) on webpages has usually been discussed as an Internet economics topic where a publisher uses auction models to sell an online user's page view to advertisers and the one with the highest bid can have her ad displayed to the user. This is also called real-time bidding (RTB), and the ad displaying process ensures that the publisher's benefit is maximized or that there is an equilibrium in ad auctions. However, the benefits of the other two stakeholders - the advertiser and the user - have rarely been discussed. In this paper, we propose a two-stage computational framework that selects a banner ad based on the optimized trade-offs among all stakeholders. The first stage is still auction based and the second stage re-ranks ads by considering the benefits of all stakeholders. Our metric variables are: the publisher's revenue, the advertiser's utility, the ad memorability, the ad click-through rate (CTR), the contextual relevance, and the visual saliency. To the best of our knowledge, this is the first work that optimizes trade-offs among all stakeholders in RTB by incorporating multimedia metrics. An algorithm is also proposed to determine the optimal weights of the metric variables. We use both ad auction datasets and multimedia datasets to validate the proposed framework. Our experimental results show that the publisher can significantly improve the other stakeholders' benefits by slightly reducing her revenue in the short term. In the long run, advertisers and users will be more engaged, and the increased demand for advertising together with the increased supply of page views can then boost the publisher's revenue.
【Keywords】: ad recommendation; displaying advertising; real-time bidding; trade-offs optimization
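The second-stage re-ranking described above can be illustrated as a weighted sum over normalized stakeholder metrics (metric names and weights below are hypothetical; the paper proposes an algorithm to learn the optimal weights):

```python
def rerank(ads, weights):
    """Re-rank auction-stage ads by a weighted sum of stakeholder
    metrics, each assumed normalized to [0, 1]."""
    def score(ad):
        return sum(w * ad[metric] for metric, w in weights.items())
    return sorted(ads, key=score, reverse=True)

ads = [
    {"id": "a", "revenue": 0.9, "ctr": 0.2, "saliency": 0.1},
    {"id": "b", "revenue": 0.7, "ctr": 0.6, "saliency": 0.8},
]
weights = {"revenue": 0.4, "ctr": 0.3, "saliency": 0.3}
ranked = rerank(ads, weights)
# "b" wins despite lower revenue once user-side metrics are weighted in.
assert ranked[0]["id"] == "b"
```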
【Paper Link】 【Pages】:215-224
【Authors】: Rocío Cañamares ; Pablo Castells
【Abstract】: We develop a probabilistic formulation giving rise to a formal version of heuristic k nearest-neighbor (kNN) collaborative filtering. Different independence assumptions in our scheme lead to user-based, item-based, normalized and non-normalized variants that match in structure the traditional formulations, while showing equivalent empirical effectiveness. The probabilistic formulation provides a principled explanation why kNN is an effective recommendation strategy, and identifies a key condition for this to be the case. Moreover, a natural explanation arises for the bias of kNN towards recommending popular items. Thereupon the kNN variants are shown to fall into two groups with similar trends in behavior, corresponding to two different notions of item popularity. We show experiments where the comparative performance of the two groups of algorithms changes substantially, which suggests that the performance measurements and comparison may heavily depend on statistical properties of the input data sample.
【Keywords】: algorithmic bias; collaborative filtering; evaluation; popularity; probabilistic models; recommender systems
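As background, the heuristic user-based kNN scheme that the paper formalizes can be sketched as a similarity-weighted average over the k most similar users (toy ratings below; the paper's probabilistic variants differ in normalization and independence assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    num = sum(u[i] * v[i] for i in shared)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def knn_predict(target, item, ratings, k=2):
    """User-based kNN: similarity-weighted average of the k most
    similar neighbors' ratings for the item."""
    neighbors = sorted(
        ((cosine(ratings[target], r), r[item])
         for u, r in ratings.items() if u != target and item in r),
        reverse=True)[:k]
    den = sum(sim for sim, _ in neighbors)
    return sum(sim * rating for sim, rating in neighbors) / den if den else 0.0

ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i2": 3, "i3": 4},
    "u3": {"i1": 1, "i3": 2},
}
pred = knn_predict("u1", "i3", ratings)   # pulled toward u2, the closer neighbor
assert 3.3 < pred < 3.4
```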
【Paper Link】 【Pages】:225-234
【Authors】: Zhaofan Qiu ; Yingwei Pan ; Ting Yao ; Tao Mei
【Abstract】: Hashing has been a widely-adopted technique for nearest neighbor search in large-scale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can suffer from the robustness problem as the data at training and test stage may come from different distributions. This paper explores generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverage largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data to hashing. Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations into hash codes, and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses, i.e., an adversarial loss to correctly label each sample as synthetic or real, a triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets, and a classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to state-of-the-art deep hash models.
【Keywords】: cnn; gans; hashing; similarity learning
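The triplet ranking loss mentioned above commonly takes a hinge form: the anchor should be closer to the positive than to the negative by at least a margin. A minimal sketch with a made-up margin (the paper applies such a loss to learned hash representations):

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: penalize unless the anchor is closer
    to the positive than to the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, margin + d_pos - d_neg)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # similar item: small distance to anchor
n = np.array([2.0, 0.0])   # dissimilar item: large distance to anchor
assert triplet_ranking_loss(a, p, n) == 0.0   # ordering satisfied with margin
assert triplet_ranking_loss(a, n, p) > 0.0    # ordering violated: positive loss
```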
【Paper Link】 【Pages】:235-244
【Authors】: Liu Yang ; Susan T. Dumais ; Paul N. Bennett ; Ahmed Hassan Awadallah
【Abstract】: Email is still among the most popular online activities. People spend a significant amount of time sending, reading and responding to email in order to communicate with others, manage tasks and archive personal information. Most previous research on email is based on either relatively small data samples from user surveys and interviews, or on consumer email accounts such as those from Yahoo! Mail or Gmail. Much less has been published on how people interact with enterprise email even though it contains less automatically generated commercial email and involves more organizational behavior than is evident in personal accounts. In this paper, we extend previous work on predicting email reply behavior by looking at enterprise settings and considering more than dyadic communications. We characterize the influence of various factors such as email content and metadata, historical interaction features and temporal features on email reply behavior. We also develop models to predict whether a recipient will reply to an email and how long it will take to do so. Experiments with the publicly-available Avocado email collection show that our methods outperform all baselines with large gains. We also analyze the importance of different features on reply behavior predictions. Our findings provide new insights about how people interact with enterprise email and have implications for the design of the next generation of email clients.
【Keywords】: email reply behavior; information overload; user behavior modeling
【Paper Link】 【Pages】:245-254
【Authors】: Chao Zhang ; Keyang Zhang ; Quan Yuan ; Fangbo Tao ; Luming Zhang ; Tim Hanratty ; Jiawei Han
【Abstract】: Spatiotemporal activity modeling is an important task for applications like tour recommendation and place search. The recently developed geographical topic models have demonstrated compelling results in using geo-tagged social media (GTSM) for spatiotemporal activity modeling. Nevertheless, they all operate in batch and cannot dynamically accommodate the latest information in the GTSM stream to reveal up-to-date spatiotemporal activities. We propose ReAct, a method that processes continuous GTSM streams and obtains recency-aware spatiotemporal activity models on the fly. Distinguished from existing topic-based methods, ReAct embeds all the regions, hours, and keywords into the same latent space to capture their correlations. To generate high-quality embeddings, it adopts a novel semi-supervised multimodal embedding paradigm that leverages the activity category information to guide the embedding process. Furthermore, as new records arrive continuously, it employs strategies to effectively incorporate the new information while preserving the knowledge encoded in previous embeddings. Our experiments on the geo-tagged tweet streams in two major cities have shown that ReAct significantly outperforms existing methods for location and activity retrieval tasks.
【Keywords】: information retrieval; location-based service; multimodal embedding; online learning; representation learning; social media; spatiotemporal data mining
【Paper Link】 【Pages】:255-264
【Authors】: Shuo Zhang ; Krisztian Balog
【Abstract】: Tables are among the most powerful and practical tools for organizing and working with data. Our motivation is to equip spreadsheet programs with smart assistance capabilities. We concentrate on one particular family of tables, namely, tables with an entity focus. We introduce and focus on two specific tasks: populating rows with additional instances (entities) and populating columns with new headings. We develop generative probabilistic models for both tasks. For estimating the components of these models, we consider a knowledge base as well as a large table corpus. Our experimental evaluation simulates the various stages of the user entering content into an actual table. A detailed analysis of the results shows that the models' components are complementary and that our methods outperform existing approaches from the literature.
【Keywords】: intelligent table assistance; semantic search; table completion
【Paper Link】 【Pages】:265-274
【Authors】: Jin Young Kim ; Nick Craswell ; Susan T. Dumais ; Filip Radlinski ; Fang Liu
【Abstract】: Email has been a dominant form of communication for many years, and email search is an important problem. In contrast to other search settings, such as web search, there have been few studies of user behavior and models of email search success. Research in email search is challenging for many reasons, including the personal and private nature of the collection. Third-party judges cannot look at email search queries or email message content, requiring new modeling techniques. In this study, we built an opt-in client application which monitors a user's email search activity and then pops up an in-situ survey when a search session is finished. We then merged the survey data with server-side behavioral logs. This approach allows us to study the relationship between session-level outcome and user behavior, and then build a model to predict success for email search based on behavioral interaction patterns. Our results show that generative models (Markov chains) of success can predict the session-level success of email search better than baseline heuristics and discriminative models (random forests). The success model makes use of email-specific log activities such as reply, forward and move, as well as generic signals such as clicks with long dwell time. The learned model is highly interpretable, and reusable in that it can be applied to unlabeled interaction logs in the future.
【Keywords】: email search; log analysis; user study
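A generative Markov-chain success model of the kind described can be sketched as: fit per-class transition probabilities over logged actions, then score a session by its log-likelihood under each class. The action names and toy sessions below are made up, not from the Avocado logs:

```python
import math
from collections import defaultdict

def train_markov(sessions):
    """Estimate first-order transition log-probabilities from action sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        model[a] = {b: math.log(c / total) for b, c in nxt.items()}
    return model

def log_likelihood(seq, model, floor=-10.0):
    """Score a session under a trained chain; unseen transitions get a floor."""
    return sum(model.get(a, {}).get(b, floor) for a, b in zip(seq, seq[1:]))

success = [["query", "click", "reply"], ["query", "click", "move"]]
failure = [["query", "click", "query", "abandon"]]
m_success, m_failure = train_markov(success), train_markov(failure)

s = ["query", "click", "reply"]
assert log_likelihood(s, m_success) > log_likelihood(s, m_failure)
```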
【Paper Link】 【Pages】:275-284
【Authors】: Xiaohui Xie ; Yiqun Liu ; Xiaochuan Wang ; Meng Wang ; Zhijing Wu ; Yingying Wu ; Min Zhang ; Shaoping Ma
【Abstract】: Image search engines show results differently from general Web search engines in three key ways: (1) most Web-based image search engines adopt a two-dimensional result placement instead of a linear result list; (2) image searches show snapshots instead of snippets (query-dependent abstracts of landing pages) on search engine result pages (SERPs); and (3) pagination is usually not (explicitly) supported on image search SERPs, and users can view results without having to click on the "next page" button. Compared with the extensive study of user behavior in general Web search scenarios, there has been no thorough investigation of how the different interaction mechanisms of image search engines affect users' examination behavior. To shed light on this research question, we conducted an eye-tracking study to investigate users' examination behavior in image searches. We focus on the impacts of factors in examination including position, visual saliency, edge density, the existence of textual information, and human faces in result images. Three interesting findings indicate users' behavior biases: (1) instead of the traditional "Golden Triangle" phenomenon in the user examination patterns of general Web search, we observe a middle-position bias; (2) besides the position factor, the content of image results (e.g., visual saliency) affects examination behavior; and (3) some popular behavior assumptions in general Web search (e.g., the examination hypothesis) do not hold in image search scenarios. We predict users' examination behavior with different impact factors. Results show that combining position and visual content features can improve prediction in image searches.
【Keywords】: examination behavior; eye-tracking; image search
【Paper Link】 【Pages】:285-294
【Authors】: Rishabh Mehrotra ; Emine Yilmaz
【Abstract】: A significant fraction of search queries originates from some real-world information need or task [13]. In order to improve the search experience of end users, it is important to have accurate representations of tasks. As a result, a significant amount of research has been devoted to extracting proper representations of tasks in order to enable search systems to help users complete their tasks, as well as to provide the end user with better query suggestions [9], better recommendations [41], satisfaction prediction [36] and improved task-based personalization [24, 38]. Most existing task extraction methodologies focus on representing tasks as flat structures. However, tasks often tend to have multiple subtasks associated with them, and a more naturalistic representation of tasks would be in terms of a hierarchy, where each task can be composed of multiple (sub)tasks. To this end, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks and subtasks. We evaluate our method on real-world query log data through both quantitative and crowdsourced experiments and highlight the importance of considering task/subtask hierarchies.
【Keywords】: bayesian non-parametrics; hierarchical model; search tasks
【Paper Link】 【Pages】:295-304
【Authors】: Kevin Ong ; Kalervo Järvelin ; Mark Sanderson ; Falk Scholer
【Abstract】: This paper investigates if Information Foraging Theory can be used to understand differences in user behavior when searching on mobile and desktop web search systems. Two groups of thirty-six participants were recruited to carry out six identical web search tasks on desktop or on mobile. The search tasks were prepared with a different number and distribution of relevant documents on the first result page. Search behaviors on mobile and desktop were measurably different. Desktop participants viewed and clicked on more results but saved fewer as relevant, compared to mobile participants, when information scent level increased. Mobile participants achieved higher search accuracy than desktop participants for tasks with increasing numbers of relevant search results. Conversely, desktop participants were more accurate than mobile participants for tasks with an equal number of relevant results that were more distributed across the results page. Overall, both an increased number and better positioning of relevant search results improved the ability of participants to locate relevant results on both desktop and mobile. Participants spent more time and issued more queries on desktop, but abandoned less and saved more results for initial queries on mobile.
【Keywords】: information foraging theory; search process; search stopping
【Paper Link】 【Pages】:305-314
【Authors】: Bora Edizel ; Amin Mantrach ; Xiao Bai
【Abstract】: Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This inevitably requires a lot of engineering effort to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at the character level and the other working at the word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specifically, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce as output a click-through rate prediction. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at the character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system.
【Keywords】: ctr prediction; deep learning; nlp; online advertising; sponsored search
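Character-level models like the one described consume query and ad texts as one-hot character matrices. A sketch of that input encoding (the alphabet and maximum length are illustrative choices, not the paper's):

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "   # 37 symbols (illustrative)

def char_one_hot(text, max_len=16):
    """Encode text as a (max_len, |alphabet|) one-hot matrix, the usual
    input format for a character-level CNN; out-of-alphabet characters
    are skipped and long texts are truncated."""
    mat = np.zeros((max_len, len(ALPHABET)))
    for i, ch in enumerate(text.lower()[:max_len]):
        j = ALPHABET.find(ch)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

x = char_one_hot("cheap flights")
assert x.shape == (16, 37)
assert x.sum() == len("cheap flights")   # one active cell per character
```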
【Paper Link】 【Pages】:315-324
【Authors】: Xu Chen ; Yongfeng Zhang ; Qingyao Ai ; Hongteng Xu ; Junchi Yan ; Zheng Qin
【Abstract】: Key frames play an important role in many video applications, such as on-line movie preview and video information retrieval. Although a number of key frame selection methods have been proposed in the past, existing technologies mainly focus on how to precisely summarize the video content, but seldom take user preferences into consideration. However, in real scenarios, people may have diverse interests in the content of the same video, and thus they may be attracted by quite different key frames, which makes the selection of key frames an inherently personalized process. In this paper, we propose and investigate the problem of personalized key frame recommendation to bridge the above gap. To do so, we make use of video images and user time-synchronized comments to design a novel key frame recommender that can simultaneously model visual and textual features in a unified framework. By personalizing based on a user's previously reviewed frames and posted comments, we are able to encode different user interests in a unified multi-modal space, and can thus select key frames in a personalized manner, which, to the best of our knowledge, has not been attempted before in the field of video content analysis. Experimental results show that our method performs better than its competitors on various measures.
【Keywords】: collaborative filtering; key frame; personalization; recommender systems; video content analysis
【Paper Link】 【Pages】:325-334
【Authors】: Kwan Hui Lim ; Jeffrey Chan ; Shanika Karunasekera ; Christopher Leckie
【Abstract】: Personalized itinerary recommendation is a complex and time-consuming problem, due to the need to recommend popular attractions that are aligned to the interest preferences of a tourist, and to plan these attraction visits as an itinerary that has to be completed within a specific time limit. Furthermore, many existing itinerary recommendation systems do not automatically determine and consider queuing times at attractions in the recommended itinerary, which varies based on the time of visit to the attraction, e.g., longer queuing times at peak hours. To solve these challenges, we propose the PersQ algorithm for recommending personalized itineraries that take into consideration attraction popularity, user interests and queuing times. We also implement a framework that utilizes geo-tagged photos to derive attraction popularity, user interests and queuing times, which PersQ uses to recommend personalized and queue-aware itineraries. We demonstrate the effectiveness of PersQ in the context of five major theme parks, based on a Flickr dataset spanning nine years. Experimental results show that PersQ outperforms various state-of-the-art baselines, in terms of various queuing-time related metrics, itinerary popularity, user interest alignment, recall, precision and F1-score.
【Keywords】: personalization; tour recommendations; trip planning; user interests
【Paper Link】 【Pages】:335-344
【Authors】: Jingyuan Chen ; Hanwang Zhang ; Xiangnan He ; Liqiang Nie ; Wei Liu ; Tat-Seng Chua
【Abstract】: Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads, etc.), which can be collected at a larger scale with a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well-designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and component-level implicitness which blurs the underlying users' preferences. The item-level implicitness means that users' preferences on items (e.g., photos, videos, songs, etc.) are unknown, while the component-level implicitness means that inside each item users' preferences on different components (e.g., regions in an image, frames of a video, etc.) are unknown. For example, a "view" on a video does not provide any specific information about how the user likes the video (i.e., item-level) and which parts of the video the user is interested in (i.e., component-level). In this paper, we introduce a novel attention mechanism in CF to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF). Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, starting from any content feature extraction network (e.g., CNN for images/videos), which learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD.
Through extensive experiments on two real-world multimedia Web services: Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.
【Keywords】: attention; collaborative filtering; implicit feedback; multimedia recommendation
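Component-level attention of the kind described can be caricatured as softmax-weighted pooling over component features. A toy sketch, not the paper's two-module network (which scores components with a learned sub-network rather than the plain dot product assumed here):

```python
import numpy as np

def attention_pool(components, user_vec):
    """Score each component feature vector against a user vector,
    softmax the scores, and return the weighted sum of components
    as the attended item representation."""
    scores = components @ user_vec              # (n_components,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # numerically stable softmax
    return weights @ components

user = np.array([1.0, 0.0])
frames = np.array([[0.9, 0.1],    # frame aligned with the user's interest
                   [0.0, 1.0]])   # frame the user tends to ignore
item_vec = attention_pool(frames, user)
# The pooled representation leans toward the first (interesting) frame.
assert item_vec[0] > item_vec[1]
```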
【Paper Link】 【Pages】:345-354
【Authors】: Piji Li ; Zihao Wang ; Zhaochun Ren ; Lidong Bing ; Wai Lam
【Abstract】: Recently, some E-commerce sites have launched a new interaction box called Tips on their mobile apps. Users can express their experience and feelings or provide suggestions using short texts, typically a few words or one sentence. In essence, writing tips and giving a numerical rating are two facets of the same user product assessment action, both expressing the user's experience and feelings. Jointly modeling these two facets is helpful for designing a better recommendation system. While some existing models integrate text information such as item specifications or user reviews into user and item latent factors to improve rating prediction, no existing work considers tips for improving recommendation quality. We propose a deep-learning-based framework named NRT which can simultaneously predict precise ratings and generate abstractive tips with good linguistic quality, simulating user experience and feelings. For abstractive tips generation, gated recurrent neural networks are employed to "translate" user and item latent representations into a concise sentence. Extensive experiments on benchmark datasets from different domains show that NRT achieves significant improvements over the state-of-the-art methods. Moreover, the generated tips can vividly predict the user experience and feelings.
【Keywords】: deep learning; rating prediction; tips generation
【Paper Link】 【Pages】:355-364
【Authors】: Xiangnan He ; Tat-Seng Chua
【Abstract】: Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features. Factorization Machines (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as Wide&Deep by Google and DeepCross by Microsoft, their deep structure makes them difficult to train. In this paper, we propose a novel model, Neural Factorization Machine (NFM), for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural networks in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that with only one hidden layer, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.
【Keywords】: deep learning; factorization machines; neural networks; recommendation; regression; sparse data
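The Bi-Interaction pooling at the heart of NFM can be computed in linear time using the same sum-of-squares identity that makes FMs efficient. The sketch below is an illustrative NumPy implementation of that pooling layer, not the authors' code; the function name and shapes are our own.

```python
import numpy as np

def bi_interaction_pooling(x, V):
    """Bi-Interaction pooling: the element-wise products of all pairs of
    weighted feature embeddings, summed in linear time via the identity
      sum_{i<j} (x_i * v_i) * (x_j * v_j)
        = 0.5 * ((sum_i x_i * v_i)^2 - sum_i (x_i * v_i)^2).
    x: feature values, shape (n,);  V: embedding matrix, shape (n, k)."""
    xv = x[:, None] * V                   # weighted embeddings, (n, k)
    square_of_sum = np.sum(xv, axis=0) ** 2
    sum_of_squares = np.sum(xv ** 2, axis=0)
    return 0.5 * (square_of_sum - sum_of_squares)  # (k,) vector for the MLP
```

The returned k-dimensional vector is what NFM feeds to its hidden layers; FM corresponds to summing this vector's components directly.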
【Paper Link】 【Pages】:365-374
【Authors】: Renqin Cai ; Chi Wang ; Hongning Wang
【Abstract】: One important way for people to make their voices heard is to comment on the articles they have read online, such as news reports and each other's posts. The user-generated comments together with the commented documents form a unique correspondence structure. Properly modeling the dependency in such data is thus vital for obtaining accurate insight into people's opinions and attention. In this work, we develop a Commented Correspondence Topic Model to model correspondence in commented text data. We focus on two levels of correspondence. First, to capture topic-level correspondence, we treat the topic assignments in commented documents as the prior for their comments' topic proportions. This captures the thematic dependency between commented documents and their comments. Second, to capture word-level correspondence, we utilize the Dirichlet compound multinomial distribution to model topics. This captures the word repetition patterns within the commented data. By integrating these two aspects, our model demonstrates encouraging performance in capturing the correspondence structure, providing improved results in modeling user-generated content, spam comment detection, and sentence-based comment retrieval compared with state-of-the-art topic model solutions for correspondence modeling.
【Keywords】: social media; text correspondence modeling; topic models; user comments
【Paper Link】 【Pages】:375-384
【Authors】: Bei Shi ; Wai Lam ; Shoaib Jameel ; Steven Schockaert ; Kwun Ping Lai
【Abstract】: Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step'' methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
【Keywords】: document modeling; topic model; word embedding
【Paper Link】 【Pages】:385-394
【Authors】: Flavio Chierichetti ; Ravi Kumar ; Bo Pang
【Abstract】: About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically.
【Keywords】: double pareto; generative model; word frequency distribution
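A quick way to see the phenomenon this paper describes is to plot a corpus's rank-frequency pairs on a log-log scale: Zipf's law predicts a single straight line, while the documented two-slope shape appears as two joined linear regimes. A minimal helper for producing those pairs (our own sketch, not the paper's generative model):

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return (log rank, log frequency) pairs for a token stream.
    Under Zipf's law these points fall on one straight line; many real
    corpora instead show two differently sloped segments joined smoothly."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return [(math.log(r), math.log(f)) for r, f in enumerate(freqs, start=1)]
```

Fitting separate slopes to the head and tail of these points is one simple way to quantify the departure from a pure power law.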
【Paper Link】 【Pages】:395-404
【Authors】: Peter Bailey ; Alistair Moffat ; Falk Scholer ; Paul Thomas
【Abstract】: A search engine that can return the ideal results for a person's information need, independent of the specific query that is used to express that need, would be preferable to one that is overly swayed by the individual terms used; search engines should be consistent in the presence of syntactic query variations responding to the same information need. In this paper we examine the retrieval consistency of a set of five systems responding to syntactic query variations over one hundred topics, working with the UQV100 test collection, and using Rank-Biased Overlap (RBO) relative to a centroid ranking over the query variations per topic as a measure of consistency. We also introduce a new data fusion algorithm, Rank-Biased Centroid (RBC), for constructing a centroid ranking over a set of rankings from query variations for a topic. RBC is compared with alternative data fusion algorithms. Our results indicate that consistency is positively correlated to a moderate degree with "deep'' relevance measures. However, it is only weakly correlated with "shallow'' relevance measures, as well as measures of topic complexity and variety in query expression. These findings support the notion that consistency is an independent property of a search engine's retrieval effectiveness.
【Keywords】: retrieval consistency; semantic effectiveness; test collections
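RBO scores the agreement of two rankings with a top-weighted, geometrically decaying weight p. The sketch below implements the truncated prefix form of RBO only (without extrapolation, and without the paper's RBC centroid-fusion step); the function and parameter names are our own.

```python
def rbo(run1, run2, p=0.9, depth=None):
    """Truncated Rank-Biased Overlap:
    (1 - p) * sum_{d=1..depth} p^(d-1) * |prefix_d(run1) & prefix_d(run2)| / d."""
    depth = depth or min(len(run1), len(run2))
    seen1, seen2, score = set(), set(), 0.0
    for d in range(1, depth + 1):
        seen1.add(run1[d - 1])
        seen2.add(run2[d - 1])
        score += p ** (d - 1) * len(seen1 & seen2) / d
    return (1 - p) * score
```

For identical rankings of length n the truncated score is 1 - p^n, approaching 1 as the evaluation depth grows; disjoint rankings score 0.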
【Paper Link】 【Pages】:405-414
【Authors】: Jiepu Jiang ; Daqing He ; James Allan
【Abstract】: To address concerns of TREC-style relevance judgments, we explore two improvements. The first one seeks to make relevance judgments contextual, collecting in situ feedback of users in an interactive search session and embracing usefulness as the primary judgment criterion. The second one collects multidimensional assessments to complement relevance or usefulness judgments, with four distinct alternative aspects examined in this paper - novelty, understandability, reliability, and effort. We evaluate different types of judgments by correlating them with six user experience measures collected from a lab user study. Results show that switching from TREC-style relevance criteria to usefulness is fruitful, but in situ judgments do not exhibit clear benefits over the judgments collected without context. In contrast, combining relevance or usefulness with the four alternative judgments consistently improves the correlation with user experience measures, suggesting future IR systems should adopt multi-aspect search result judgments in development and evaluation. We further examine implicit feedback techniques for predicting these judgments. We find that click dwell time, a popular indicator of search result quality, is able to predict some but not all dimensions of the judgments. We enrich the current implicit feedback methods using post-click user interaction in a search session and achieve better prediction for all six dimensions of judgments.
【Keywords】: implicit feedback; relevance judgment; search experience
【Paper Link】 【Pages】:415-424
【Authors】: Adam Roegiest ; Luchen Tan ; Jimmy Lin
【Abstract】: Real-time push notification systems monitor continuous document streams such as social media posts and alert users to relevant content directly on their mobile devices. We describe a user study of such systems in the context of the TREC 2016 Real-Time Summarization Track, where system updates are immediately delivered as push notifications to the mobile devices of a cohort of users. Our study represents, to our knowledge, the first deployment of an interleaved evaluation framework for prospective information needs, and also provides an opportunity to examine user behavior in a realistic setting. Results of our online in-situ evaluation are correlated against the results of a more traditional post-hoc batch evaluation. We observe substantial correlations between many online and batch evaluation metrics, especially for those that share the same basic design (e.g., are utility-based). For some metrics, we observe little correlation, but are able to identify the volume of messages that a system pushes as one major source of differences.
【Keywords】: microblogs; trec; user study
【Paper Link】 【Pages】:425-434
【Authors】: Fan Zhang ; Yiqun Liu ; Xin Li ; Min Zhang ; Yinghui Xu ; Shaoping Ma
【Abstract】: The design of a Web search evaluation metric is closely related to how the user's interaction process is modeled. Each behavioral model results in a different metric used to evaluate search performance. In these models and the user behavior assumptions behind them, when a user ends a search session is one of the prime concerns, because it is highly related to both benefit and cost estimation. Existing metric designs usually adopt some simplified criteria to decide the stopping point: (1) an upper limit for benefit (e.g. RR, AP); (2) an upper limit for cost (e.g. Precision@k, DCG@k). However, in many practical search sessions (e.g. exploratory search), the stopping criterion is more complex than these simplified cases. Analyzing the benefit and cost of actual users' search sessions, we find that the stopping criteria vary with search tasks and are usually combined effects of both benefit and cost factors. Inspired by a popular computer game named Bejeweled, we propose a Bejeweled Player Model (BPM) to simulate users' search interaction processes and evaluate their search performance. In the BPM, a user stops when he/she either has found sufficient useful information or has no more patience to continue. Given this assumption, a new evaluation framework based on upper limits (either fixed or changing as the search proceeds) for both benefit and cost is proposed. We show how to derive a new metric from the framework and demonstrate that it can be adopted to revise traditional metrics like Discounted Cumulative Gain (DCG), Expected Reciprocal Rank (ERR) and Average Precision (AP). To show the effectiveness of the proposed framework, we compare it with a number of existing metrics in terms of correlation between the metrics and user satisfaction, based on a dataset that collects users' explicit satisfaction feedback and assessors' relevance judgements. Experimental results show that the framework correlates better with user satisfaction feedback.
【Keywords】: benefit and cost; evaluation metrics; user model
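The BPM stopping assumption (quit when either a benefit budget or a cost budget is exhausted) can be phrased as a short scan over a ranked list. The snippet below is our simplified illustration of that idea with fixed limits, not the paper's calibrated metric:

```python
def bpm_benefit(gains, costs, benefit_limit, cost_limit):
    """Scan results top-down, accumulating benefit (gain) and cost; the
    simulated user stops as soon as either upper limit has been reached,
    and the benefit accumulated so far is the session's score."""
    benefit = cost = 0.0
    for gain, c in zip(gains, costs):
        if benefit >= benefit_limit or cost >= cost_limit:
            break  # enough useful information found, or patience exhausted
        benefit += gain
        cost += c
    return benefit
```

Fixing only the cost limit recovers cutoff-style metrics, and fixing only the benefit limit recovers recall-style stopping, which is why the framework can be used to revise metrics such as DCG, ERR, and AP.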
【Paper Link】 【Pages】:435-444
【Authors】: Cheng Luo ; Yiqun Liu ; Tetsuya Sakai ; Fan Zhang ; Min Zhang ; Shaoping Ma
【Abstract】: Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the result cannot be obtained if users only read part of its contents. The benefit brought by the result is affected by the user's reading behavior, and the internal gain distribution (over the height) should be modeled to get a more accurate estimate. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of users' browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain (HBG), which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.
【Keywords】: evaluation metric; mobile search; user behavior
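Height-Biased Gain replaces rank-based discounting with discounting in vertical screen space. As a rough illustration of that idea only: the snippet below uses a simple exponential decay over pixel offset with a made-up `decay` parameter, rather than the gain and discount distributions calibrated from the paper's user study.

```python
import math

def height_biased_gain(gains, heights, decay=1000.0):
    """Illustrative HBG-style score: each result's gain is discounted by
    how far down the scrollable page (in pixels) the result begins, so a
    tall early result pushes later results into heavily discounted space."""
    score, offset = 0.0, 0.0
    for gain, height in zip(gains, heights):
        score += gain * math.exp(-offset / decay)
        offset += height  # the next result starts below this one
    return score
```

Note that under such a metric, two rankings with identical gains but differently sized result cards score differently, which rank-based discounts cannot express.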
【Paper Link】 【Pages】:445-454
【Authors】: Ruey-Cheng Chen ; Luke Gallagher ; Roi Blanco ; J. Shane Culpepper
【Abstract】: Complex machine learning models are now an integral part of modern, large-scale retrieval systems. However, collection size growth continues to outpace advances in efficiency improvements in the learning models which achieve the highest effectiveness. In this paper, we re-examine the importance of tightly integrating feature costs into multi-stage learning-to-rank (LTR) IR systems. We present a novel approach to optimizing cascaded ranking models which can directly leverage a variety of different state-of-the-art LTR rankers such as LambdaMART and Gradient Boosted Decision Trees. Using our cascade model, we conclusively show that feature costs and the number of documents being re-ranked in each stage of the cascade can be balanced to maximize both efficiency and effectiveness. Finally, we also demonstrate that our cascade model can easily be deployed on commonly used collections to achieve state-of-the-art effectiveness results while only using a subset of the features required by the full model.
【Keywords】: cascade ranking model; efficiency-effectiveness tradeoffs; learning to rank
【Paper Link】 【Pages】:455-464
【Authors】: Fuli Feng ; Liqiang Nie ; Xiang Wang ; Richang Hong ; Tat-Seng Chua
【Abstract】: Many professional organizations produce regular reports of social indicators to monitor social progress. Despite their reasonable results and societal value, early efforts on social indicator computing suffer from three problems: 1) labor-intensive data gathering, 2) insufficient data, and 3) data fusion that relies on experts. To address these problems, we present a novel graph-based multi-channel ranking scheme for social indicator computation that explores rich multi-channel Web data. For each channel, the scheme represents the semi-structured and unstructured data with simple graphs and hypergraphs, respectively. It then groups the channels into different clusters according to their correlations. After that, it uses a unified model to learn the cluster-wise common spaces, perform ranking separately upon each space, and fuse these rankings to produce the final one. We take Chinese university ranking as a case study and validate our scheme on a real-world dataset. It is worth emphasizing that our scheme is applicable to the computation of other social indicators, such as educational attainment.
【Keywords】: computational social indicators; university ranking
【Paper Link】 【Pages】:465-474
【Authors】: Nimrod Raifer ; Fiana Raiber ; Moshe Tennenholtz ; Oren Kurland
【Abstract】: In competitive search settings such as the Web, there is an ongoing ranking competition between document authors (publishers) for certain queries. The goal is to have documents highly ranked, and the means is document manipulation applied in response to rankings. Existing retrieval models, and their theoretical underpinnings (e.g., the probability ranking principle), do not account for post-ranking corpus dynamics driven by this strategic behavior of publishers. However, these dynamics have a major effect on retrieval effectiveness, since they affect content availability in the corpus. Furthermore, while manipulation strategies observed over the Web were reported in past literature, they were not analyzed as ongoing, changing, post-ranking response strategies, nor were they connected to the foundations of classical ad hoc retrieval models (e.g., content-based document-query surface-level similarities and document relevance priors). We present a novel theoretical and empirical analysis of the strategic behavior of publishers using these foundations. Empirical analysis of controlled ranking competitions that we organized reveals a key strategy of publishers: making their documents (gradually) become similar to documents ranked the highest in previous rankings. Our theoretical analysis of the ranking competition as a repeated game, and its minmax regret equilibrium, yields a result that supports the merits of this publishing strategy. We further show that whether documents will be promoted to the highest rank in our competitions can be predicted with high accuracy, without explicit knowledge of the ranking function. The prediction utilizes very few features which quantify changes of documents, specifically with respect to those previously ranked the highest.
【Keywords】: ad hoc retrieval; game theory; ranking competition
【Paper Link】 【Pages】:475-484
【Authors】: Shubhra Kanti Karmaker Santu ; Parikshit Sondhi ; ChengXiang Zhai
【Abstract】: E-Commerce (E-Com) search is an important emerging application of information retrieval. Learning to Rank (LETOR) is a generally effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of LETOR for web search has been well studied, its use for E-Com search has not yet been well explored. In this paper, we discuss the practical challenges in applying learning to rank methods to E-Com search, including the challenges in feature representation, obtaining reliable relevance judgments, and optimally exploiting multiple user feedback signals such as click rates, add-to-cart ratios, order rates, and revenue. We study these new challenges using experiments on industry data sets and report several interesting findings that can provide guidance on how to optimally apply LETOR to E-Com search: First, popularity-based features defined solely on product items are very useful, and LETOR methods were able to effectively optimize their combination with relevance-based features. Second, query attribute sparsity raises challenges for LETOR, and selecting features to reduce or avoid sparsity is beneficial. Third, while crowdsourcing is often useful for obtaining relevance judgments for Web search, it does not work as well for E-Com search due to the difficulty of eliciting sufficiently fine-grained relevance judgments. Finally, among the multiple feedback signals, the order rate is found to be the most robust training objective, followed by click rate, while add-to-cart ratio seems least robust, suggesting that an effective practical strategy may be to initially use click rates for training and gradually shift to using order rates as they become available.
【Keywords】: e-commerce search; information retrieval; learning to rank
【Paper Link】 【Pages】:485-494
【Authors】: Rafael Glater ; Rodrygo L. T. Santos ; Nivio Ziviani
【Abstract】: Query understanding is a challenging task primarily due to the inherent ambiguity of natural language. A common strategy for improving the understanding of natural language queries is to annotate them with semantic information mined from a knowledge base. Nevertheless, queries with different intents may arguably benefit from specialized annotation strategies. For instance, some queries could be effectively annotated with a single entity or an entity attribute, others could be better represented by a list of entities of a single type or by entities of multiple distinct types, and others may be simply ambiguous. In this paper, we propose a framework for learning semantic query annotations suitable to the target intent of each individual query. Thorough experiments on a publicly available benchmark show that our proposed approach can significantly improve state-of-the-art intent-agnostic approaches based on Markov random fields and learning to rank. Our results further demonstrate the consistent effectiveness of our approach for queries of various target intents, lengths, and difficulty levels, as well as its robustness to noise in intent detection.
【Keywords】: intent-aware; learning to rank; semantic query annotation
【Paper Link】 【Pages】:495-504
【Authors】: Craig MacDonald ; Nicola Tonellotto ; Iadh Ounis
【Abstract】: To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which harms user satisfaction if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose predicting the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which is used to establish the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and a 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
【Keywords】: complex operators; proximity retrieval; query efficiency prediction; query rewriting
【Paper Link】 【Pages】:505-514
【Authors】: Hamed Zamani ; W. Bruce Croft
【Abstract】: Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.
【Keywords】: embedding vector; neural network; query classification; query expansion; word representation
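Whichever embedding model is used, the query-expansion step evaluated here boils down to ranking vocabulary terms by similarity to the query in the embedding space. A generic sketch (cosine similarity to the query-term centroid; the names and the dense embedding matrix `E`, one row per vocabulary term, are our own assumptions):

```python
import numpy as np

def expand_query(query_terms, vocab, E, k=5):
    """Return the k vocabulary terms closest (by cosine similarity) to the
    centroid of the query terms' embedding vectors, excluding the query
    terms themselves.  vocab: list of terms;  E: (|vocab|, dim) matrix."""
    q = E[[vocab.index(t) for t in query_terms]].mean(axis=0)
    sims = (E @ q) / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-12)
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] not in query_terms][:k]
```

The paper's point is about what `E` encodes: a relevance-based embedding should place good expansion terms near the query even when they rarely co-occur with it in running text.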
【Paper Link】 【Pages】:515-524
【Authors】: Jun Wang ; Lantao Yu ; Weinan Zhang ; Yu Gong ; Yinghui Xu ; Benyou Wang ; Peng Zhang ; Dell Zhang
【Abstract】: This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, focusing on predicting relevant documents given a query, and discriminative retrieval, focusing on predicting relevancy given a query-document pair. We propose a game-theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains of as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.
【Keywords】: adversarial training; information retrieval; information retrieval models; question answering; recommender systems; web search
【Paper Link】 【Pages】:525-534
【Authors】: Jung Hyun Kim ; Mao-Lin Li ; K. Selçuk Candan ; Maria Luisa Sapino
【Abstract】: Measures of node ranking, such as personalized PageRank, are utilized in many web and social-network based prediction and recommendation applications. Despite their effectiveness when the underlying graph is certain, however, these measures become difficult to apply in the presence of uncertainties, as they are not designed for graphs that include uncertain information, such as edges that mutually exclude each other. While there are several ways to naively extend existing techniques (such as trying to encode uncertainties as edge weights or computing all possible scenarios), as we discuss in this paper, these either lead to large errors or are very expensive to compute, as the number of possible worlds can grow exponentially with the amount of uncertainty. To tackle this challenge, we propose an efficient Uncertain Personalized PageRank (UPPR) algorithm to approximately compute personalized PageRank values on an uncertain graph with edge uncertainties. UPPR avoids enumeration of all possible worlds, yet it is able to achieve comparable accuracy by carefully encoding edge uncertainties in a data structure that leads to fast approximations. Experimental results show that UPPR is very efficient in terms of execution time, and its accuracy is comparable to or better than that of more costly alternatives.
【Keywords】: personalized pagerank; uncertain edges; uncertainty
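On a certain (uncertainty-free) graph, personalized PageRank itself is a short power iteration; UPPR's contribution is approximating it when edges are uncertain without enumerating possible worlds. Below is only the certain-graph baseline, as our own sketch (column-stochastic convention, assumed damping factor 0.85):

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.85, iters=100):
    """Power iteration for personalized PageRank.
    A[i, j] = 1.0 if there is an edge j -> i; `seed` is the restart vector
    (all mass on one node gives that node's personalized view)."""
    out_deg = A.sum(axis=0)
    out_deg[out_deg == 0] = 1.0     # avoid division by zero at sink nodes
    P = A / out_deg                 # column-stochastic transition matrix
    s = seed / seed.sum()
    r = s.copy()
    for _ in range(iters):
        r = alpha * (P @ r) + (1 - alpha) * s  # follow edges or restart
    return r
```

Each uncertain edge would force a choice of possible world here; running this over all 2^m worlds and averaging is the exact but exponential-cost computation that UPPR is designed to avoid.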
【Paper Link】 【Pages】:535-544
【Authors】: Long Xia ; Jun Xu ; Yanyan Lan ; Jiafeng Guo ; Wei Zeng ; Xueqi Cheng
【Abstract】: In this paper we address the issue of learning diverse ranking models for search result diversification. Typical methods treat the problem of constructing a diverse ranking as a process of sequential document selection. At each ranking position, the document that can provide the largest amount of additional information to the users is selected, because search users usually browse the documents in a top-down manner. Thus, to select an optimal document for a position, it is critical for a diverse ranking model to capture the utility of information the user has perceived from the preceding documents. Existing methods usually calculate the ranking scores (e.g., the marginal relevance) directly based on the query and the selected documents, with heuristic rules or handcrafted features. The utility the user has perceived at each rank, however, is not explicitly modeled. In this paper, we present a novel diverse ranking model on the basis of a continuous-state Markov decision process (MDP) in which the user-perceived utility is modeled as part of the MDP state. Our model, referred to as MDP-DIV, sequentially takes the action of selecting one document according to the current state, and then updates the state for choosing the next action. The transitions of the states are modeled in a recurrent manner and the model parameters are learned with policy gradient. Experimental results based on the TREC benchmarks show that MDP-DIV can significantly outperform the state-of-the-art baselines.
【Keywords】: learning to rank; markov decision process; search result diversification
【Paper Link】 【Pages】:545-554
【Authors】: Zhengbao Jiang ; Ji-Rong Wen ; Zhicheng Dou ; Wayne Xin Zhao ; Jian-Yun Nie ; Ming Yue
【Abstract】: Search result diversification aims to retrieve diverse results to satisfy as many different information needs as possible. Supervised methods have been proposed recently to learn ranking functions and they have been shown to produce superior results to unsupervised methods. However, these methods use implicit approaches based on the principle of Maximal Marginal Relevance (MMR). In this paper, we propose a learning framework for explicit result diversification where subtopics are explicitly modeled. Based on the information contained in the sequence of selected documents, we use attention mechanism to capture the subtopics to be focused on while selecting the next document, which naturally fits our task of document selection for diversification. The framework is implemented using recurrent neural networks and max-pooling which combine distributed representations and traditional relevance features. Our experiments show that the proposed method significantly outperforms all the existing methods.
【Keywords】: attention; search result diversification; subtopics
【Paper Link】 【Pages】:555-564
【Authors】: Rohail Syed ; Kevyn Collins-Thompson
【Abstract】: While search technology is widely used for learning-oriented information needs, the results provided by popular services such as Web search engines are optimized primarily for generic relevance, not effective learning outcomes. As a result, the typical information trail that a user must follow while searching to achieve a learning goal may be an inefficient one involving unnecessarily easy or difficult content, or material that is irrelevant to actual learning progress relative to a user's existing knowledge. We address this problem by introducing a novel theoretical framework, algorithms, and empirical analysis of an information retrieval model that is optimized for learning outcomes instead of generic relevance. We do this by formulating an optimization problem that incorporates a cognitive learning model into a retrieval objective, and then give an algorithm for an efficient approximate solution to find the search results that represent the best 'training set' for a human learner. Our model can personalize results for an individual user's learning goals, as well as account for the effort required to achieve those goals for a given set of retrieval results. We investigate the effectiveness and efficiency of our retrieval framework relative to a commercial search engine baseline ('Google') through a crowdsourced user study involving a vocabulary learning task, and demonstrate the effectiveness of personalized results from our model on word learning outcomes.
【Keywords】: assessment of learning in search; intrinsic diversity; retrieval models and ranking
【Paper Link】 【Pages】:565-574
【Authors】: Sergei Ivanov ; Konstantinos Theocharidis ; Manolis Terrovitis ; Panagiotis Karras
【Abstract】: How do we create content that will become viral in a whole network after we share it with friends or followers? Significant research activity has been dedicated to the problem of strategically selecting a seed set of initial adopters so as to maximize a meme's spread in a network. This line of work assumes that the success of such a campaign depends solely on the choice of a tunable seed set of adopters, while the way users perceive the propagated meme is fixed. Yet, in many real-world settings, the opposite holds: a meme's propagation depends on users' perceptions of its tunable characteristics, while the set of initiators is fixed. In this paper, we address the natural problem that arises in such circumstances: suggest content, expressed as a limited set of attributes, for a creative promotion campaign that starts out from a given seed set of initiators, so as to maximize its expected spread over a social network. To our knowledge, no previous work addresses this problem. We find that the problem is NP-hard and inapproximable. As a tight approximation guarantee is not admissible, we design an efficient heuristic, Explore-Update, as well as a conventional Greedy solution. Our experimental evaluation demonstrates that Explore-Update selects near-optimal attribute sets on real data, achieves 30% higher spread than baselines, and runs an order of magnitude faster than the Greedy solution.
【Keywords】: content recommendation; influence maximization; meme propagation; viral spread
【Paper Link】 【Pages】:575-584
【Authors】: David Elsweiler ; Christoph Trattner ; Morgan Harvey
【Abstract】: By incorporating healthiness into the food recommendation / ranking process we have the potential to improve the eating habits of a growing number of people who use the Internet as a source of food inspiration. In this paper, using insights gained from various data sources, we explore the feasibility of substituting meals that would typically be recommended to users with similar, healthier dishes. First, by analysing a recipe collection sourced from Allrecipes.com, we quantify the potential for finding replacement recipes, which are comparable but have different nutritional characteristics and are nevertheless highly rated by users. Building on this, we present two controlled user studies (n=107, n=111) investigating how people perceive and select recipes. We show participants are unable to reliably identify which recipe contains the most fat, as their answers are biased by a lack of information, misleading cues and limited nutritional knowledge. By applying machine learning techniques to predict the preferred recipes, good performance can be achieved using low-level image features and recipe meta-data as predictors. Despite being unable to consciously determine which of two recipes contains the most fat, participants on average select the recipe with the most fat as their preference. The importance of image features reveals that recipe choices are often visually driven. A final user study (n=138) investigates to what extent the predictive models can be used to select recipe replacements such that users can be "nudged" towards choosing healthier recipes. Our findings have important implications for online food systems.
【Keywords】: behavioural change; food recsys; human decision making; information behaviour
【Paper Link】 【Pages】:585-594
【Authors】: Da Cao ; Liqiang Nie ; Xiangnan He ; Xiaochi Wei ; Shunzhi Zhu ; Tat-Seng Chua
【Abstract】: Existing recommender algorithms have mainly focused on recommending individual items by utilizing user-item interactions. However, little attention has been paid to recommending user generated lists (e.g., playlists and booklists). On one hand, user generated lists contain rich signals about item co-occurrence, as items within a list are usually gathered based on a specific theme. On the other hand, a user's preference over a list also indicates her preference over items within the list. We believe that 1) if the rich relevance signal within user generated lists can be properly leveraged, an enhanced recommendation for individual items can be provided, and 2) if user-item and user-list interactions are properly utilized, and the relationship between a list and its contained items is discovered, the performance of user-item and user-list recommendations can be mutually reinforced. Towards this end, we devise embedding factorization models, which extend the traditional factorization method by incorporating item-item (item-item-list) co-occurrence with embedding-based algorithms. Specifically, we employ a factorization model to capture users' preferences over items and lists, and utilize embedding-based models to discover the co-occurrence information among items and lists. The gap between the two types of models is bridged by sharing items' latent factors. Remarkably, our proposed framework is capable of solving the new-item cold-start problem, where items have never been consumed by users but exist in user generated lists. Overall performance comparisons and micro-level analyses demonstrate the promising performance of our proposed approaches.
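The bridging idea in the abstract, sharing one set of item latent factors between a factorization component and an embedding-style co-occurrence component, can be sketched schematically (this is an illustrative toy with random, untrained vectors, not the paper's model; all names are hypothetical):

```python
import numpy as np

# Schematic sketch of shared item factors: user-item and user-list scoring
# happen in the same latent space, so list co-occurrence can inform item
# recommendation. Vectors here are random stand-ins for learned factors.
rng = np.random.default_rng(1)
K = 4                                        # latent dimension
user = rng.standard_normal(K)
item_factors = rng.standard_normal((5, K))   # shared by both model components

# Represent a user generated list by the mean of its items' shared factors.
list_factor = item_factors[[0, 2, 3]].mean(axis=0)

def score(u, v):
    """Preference score as a dot product in the shared latent space."""
    return float(u @ v)

# An individual item and a whole list are scored with the same user vector.
print(score(user, item_factors[0]))
print(score(user, list_factor))
```

By linearity, the list score equals the average of its member items' scores, which is one simple way the two views stay consistent when factors are shared.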
【Keywords】: co-occurrence information; cold-start problem; embedding-based model; factorization model; recommender systems
【Paper Link】 【Pages】:595-604
【Authors】: Fumin Shen ; Yadong Mu ; Yang Yang ; Wei Liu ; Li Liu ; Jingkuan Song ; Heng Tao Shen
【Abstract】: This paper proposes a generic formulation that significantly expedites the training and deployment of image classification models, particularly under the scenarios of many image categories and high feature dimensions. As the core idea, our method represents both the images and learned classifiers using binary hash codes, which are simultaneously learned from the training data. Classifying an image thereby reduces to retrieving its nearest class codes in the Hamming space. Specifically, we formulate multiclass image classification as an optimization problem over binary variables. The optimization proceeds alternately over the binary classifiers and image hash codes. Profiting from the special property of binary codes, we show that the sub-problems can be efficiently solved through either a binary quadratic program (BQP) or a linear program. In particular, for attacking the BQP problem, we propose a novel bit-flipping procedure which enjoys high efficacy and a local optimality guarantee. Our formulation supports a large family of empirical loss functions and is specifically instantiated with exponential and linear losses. Comprehensive evaluations are conducted on several representative image benchmarks. The experiments consistently exhibit reduced computational and memory complexities of model training and deployment, without sacrificing classification accuracy.
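The classification step described above, retrieving the nearest class code in Hamming space, can be sketched as follows (a minimal illustration with random codes standing in for the learned ones; the paper's actual training procedure is not reproduced):

```python
import numpy as np

# Assume binary class codes were already learned; here they are random.
rng = np.random.default_rng(0)
CODE_BITS = 64
NUM_CLASSES = 10
class_codes = rng.integers(0, 2, size=(NUM_CLASSES, CODE_BITS), dtype=np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two 0/1 code vectors."""
    return int(np.count_nonzero(a != b))

def classify(image_code, class_codes):
    """Return the index of the class whose code is nearest in Hamming distance."""
    dists = [hamming_distance(image_code, c) for c in class_codes]
    return int(np.argmin(dists))

# Simulate an image hashed near class 3's code by flipping two bits.
image_code = class_codes[3].copy()
image_code[:2] ^= 1
print(classify(image_code, class_codes))  # 3: nearest code despite the noise
```

Because Hamming distance over packed bits is extremely cheap, this lookup is what makes deployment fast even with many categories.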
【Keywords】: binary codes; classification; hashing
【Paper Link】 【Pages】:605-614
【Authors】: Bob Goodwin ; Michael Hopcroft ; Dan Luu ; Alex Clemmer ; Mihaela Curmei ; Sameh Elnikety ; Yuxiong He
【Abstract】: Since the mid-90s there has been a widely-held belief that signature files are inferior to inverted files for text indexing. In recent years the Bing search engine has developed and deployed an index based on bit-sliced signatures. This index, known as BitFunnel, replaced an existing production system based on an inverted index. The driving factor behind the shift away from the inverted index was operational cost savings. This paper describes algorithmic innovations and changes in the cloud computing landscape that led us to reconsider and eventually field a technology that was once considered unusable. The BitFunnel algorithm directly addresses four fundamental limitations in bit-sliced block signatures. At the same time, our mapping of the algorithm onto a cluster offers opportunities to avoid other costs associated with signatures. We show these innovations yield a significant efficiency gain versus classic bit-sliced signatures and then compare BitFunnel with Partitioned Elias-Fano Indexes, MG4J, and Lucene.
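The bit-sliced matching idea behind signature indexes like BitFunnel can be sketched as a toy (this is an illustrative single-machine model, not the production layout; row counts and hashing are arbitrary choices): each document gets a Bloom-filter signature, signatures are stored column-wise so one machine word per row covers many documents, and a query ANDs the rows its terms hash to.

```python
NUM_ROWS = 16      # signature width in bits
HASHES = 3         # hash functions per term

def term_rows(term):
    """Rows (bit positions) a term sets, via simple salted hashes."""
    return [hash((term, k)) % NUM_ROWS for k in range(HASHES)]

def build_index(docs):
    """rows[r] is an int whose bit d is set iff doc d sets signature bit r."""
    rows = [0] * NUM_ROWS
    for d, words in enumerate(docs):
        for w in words:
            for r in term_rows(w):
                rows[r] |= 1 << d
    return rows

def candidates(rows, query_terms, num_docs):
    """AND the rows of every query term; surviving bits are candidate docs."""
    acc = (1 << num_docs) - 1
    for t in query_terms:
        for r in term_rows(t):
            acc &= rows[r]
    return [d for d in range(num_docs) if acc >> d & 1]

docs = [["cat", "dog"], ["dog", "fish"], ["cat", "fish"]]
rows = build_index(docs)
# Always contains docs 0 and 2; Bloom filters may add false positives.
print(candidates(rows, ["cat"], len(docs)))
```

True matches always survive the AND; the false positives inherent to Bloom signatures are among the limitations the paper's innovations address.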
【Keywords】: bit-sliced signatures; bitvector; bloom filters; intersection; inverted indexes; query processing; search engines; signature files
【Paper Link】 【Pages】:615-624
【Authors】: Giulio Ermanno Pibiri ; Rossano Venturini
【Abstract】: The efficient indexing of large and sparse N-gram datasets is crucial in several applications in Information Retrieval, Natural Language Processing and Machine Learning. Because of the stringent efficiency requirements, dealing with billions of N-grams poses the challenge of introducing a compressed representation that preserves the query processing speed. In this paper we study the problem of reducing the space required by the representation of such datasets, maintaining the capability of looking up a given N-gram within microseconds. For this purpose we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art software packages. In particular, we present a trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we are able to lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, we show that our technique introduces a negligible penalty at query time.
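The context-based remapping idea can be illustrated in miniature (a conceptual sketch, not the paper's trie or its Elias-Fano machinery): instead of a global vocabulary ID, each word following a context is encoded by its rank among that context's followers, a much smaller integer that compresses well.

```python
from collections import defaultdict

# Toy bigram data: (context word, following word).
bigrams = [("the", "cat"), ("the", "dog"), ("a", "cat"), ("the", "cat")]

followers = defaultdict(list)          # context -> ordered distinct followers
for ctx, w in bigrams:
    if w not in followers[ctx]:
        followers[ctx].append(w)

def encode(ctx, w):
    """Small integer: rank of w among the words seen after ctx."""
    return followers[ctx].index(w)

def decode(ctx, code):
    return followers[ctx][code]

print(encode("the", "dog"))  # 1: 'dog' is the second distinct follower of 'the'
```

Since few distinct words follow any given context in natural language, these ranks stay tiny even for huge vocabularies, which is the source of the space savings described above.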
【Keywords】: data compression; elias-fano; language models; performance
【Paper Link】 【Pages】:625-634
【Authors】: Antonio Mallia ; Giuseppe Ottaviano ; Elia Porciani ; Nicola Tonellotto ; Rossano Venturini
【Abstract】: Query processing is one of the main bottlenecks in large-scale search engines. Retrieving the top k most relevant documents for a given query can be extremely expensive, as it involves scoring large amounts of documents. Several dynamic pruning techniques have been introduced in the literature to tackle this problem, such as BlockMaxWAND, which splits the inverted index into constant-sized blocks and stores the maximum document-term scores per block; this information can be used during query execution to safely skip low-score documents, producing many-fold speedups over exhaustive methods. We introduce a refinement for BlockMaxWAND that uses variable-sized blocks, rather than constant-sized ones. We set up the problem of deciding the block partitioning as an optimization problem which maximizes how accurately the block upper bounds represent the underlying scores, and describe an efficient algorithm to find an approximate solution, with provable approximation guarantees. Through an extensive experimental analysis we show that our method significantly outperforms the state of the art, roughly by a factor of two. We also introduce a compressed data structure to represent the additional block information, providing a compression ratio of roughly 50%, while incurring only a small speed degradation, no more than 10% with respect to its uncompressed counterpart.
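The block-max skipping mechanism that both the baseline and the refinement rely on can be sketched as follows (a toy with constant-sized blocks and made-up scores; the paper's contribution is choosing variable-sized blocks so these upper bounds are tighter):

```python
# A posting list as (doc_id, score) pairs, split into blocks of two postings,
# with the maximum score stored per block as a safe upper bound.
postings = [(1, 0.2), (2, 0.1), (5, 0.9), (7, 0.3), (8, 0.05), (9, 0.15)]
BLOCK = 2

blocks = [postings[i:i + BLOCK] for i in range(0, len(postings), BLOCK)]
block_max = [max(s for _, s in b) for b in blocks]

def score_above(threshold):
    """Score only blocks whose upper bound can exceed the threshold."""
    hits, scored = [], 0
    for b, bmax in zip(blocks, block_max):
        if bmax <= threshold:
            continue                  # safe skip: no doc here can qualify
        for d, s in b:
            scored += 1
            if s > threshold:
                hits.append(d)
    return hits, scored

hits, scored = score_above(0.25)
print(hits, scored)  # [5, 7] 2 -- two of three blocks were skipped unscored
```

The looser a block's maximum is relative to its typical scores, the fewer skips fire; that gap is exactly what the variable-sized partitioning optimizes.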
【Keywords】: blockmaxwand; dynamic pruning; search efficiency; top-k query processing
【Paper Link】 【Pages】:635-644
【Authors】: Jinfeng Li ; James Cheng ; Fan Yang ; Yuzhen Huang ; Yunjian Zhao ; Xiao Yan ; Ruihao Zhao
【Abstract】: Locality Sensitive Hashing (LSH) algorithms are widely adopted to index similar items in high dimensional space for approximate nearest neighbor search. As the volume of real-world datasets keeps growing, it has become necessary to develop distributed LSH solutions. Implementing a distributed LSH algorithm from scratch requires high development costs, thus most existing solutions are developed on general-purpose platforms such as Hadoop and Spark. However, we argue that these platforms are both hard to use for programming LSH algorithms and inefficient for LSH computation. We propose LoSHa, a distributed computing framework that reduces the development cost by designing a tailor-made, general programming interface and achieves high efficiency by exploring LSH-specific system implementation and optimizations. We show that many LSH algorithms can be easily expressed in LoSHa's API. We evaluate LoSHa and also compare with general-purpose platforms on the same LSH algorithms. Our results show that LoSHa's performance can be an order of magnitude faster, while the implementations on LoSHa are even more intuitive and require only a few lines of code.
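The kind of algorithm LoSHa is built to distribute can be illustrated with the classic sign-random-projection LSH (a minimal single-machine sketch; it shows only the bucketing idea, not LoSHa's API or distribution strategy):

```python
import numpy as np

# Hash a vector to the sign pattern of its projections onto random
# hyperplanes: similar vectors tend to land in the same bucket.
rng = np.random.default_rng(42)
DIM, NUM_PLANES = 8, 6
planes = rng.standard_normal((NUM_PLANES, DIM))

def lsh_key(vec):
    """Concatenate the sign bits of the vector's projections."""
    return tuple((planes @ vec > 0).astype(int))

base = rng.standard_normal(DIM)
near = base + 0.01 * rng.standard_normal(DIM)   # tiny perturbation
far = -base                                     # opposite direction

print(lsh_key(base) == lsh_key(near))  # usually True: signs rarely flip
print(lsh_key(base) == lsh_key(far))   # False: every sign bit flips
```

Candidate pairs are then only those vectors sharing a bucket key, which is what makes approximate nearest neighbor search cheap at scale.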
【Keywords】: distributed similarity search; locality sensitive hashing; retrieval of high-dimensional data
【Paper Link】 【Pages】:645-654
【Authors】: Qingyao Ai ; Yongfeng Zhang ; Keping Bi ; Xu Chen ; W. Bruce Croft
【Abstract】: Product search is an important part of online shopping. In contrast to many search tasks, the objectives of product search are not confined to retrieving relevant products. Instead, it focuses on finding items that satisfy the needs of individuals and lead to a user purchase. The unique characteristics of product search make search personalization essential for both customers and e-shopping companies. Purchase behavior is highly personal in online shopping and users often provide rich feedback about their decisions (e.g. product reviews). However, the severe mismatch found in the language of queries, products and users makes traditional retrieval models based on bag-of-words assumptions less suitable for personalization in product search. In this paper, we propose a hierarchical embedding model to learn semantic representations for entities (i.e. words, products, users and queries) from different levels with their associated language data. Our contributions are three-fold: (1) our work is one of the initial studies on personalized product search; (2) our hierarchical embedding model is the first latent space model that jointly learns distributed representations for queries, products and users with a deep neural network; (3) each component of our network is designed as a generative model so that the whole structure is explainable and extendable. Following the methodology of previous studies, we constructed personalized product search benchmarks with Amazon product data. Experiments show that our hierarchical embedding model significantly outperforms existing product search baselines on multiple benchmark datasets.
【Keywords】: latent space model; personalization; product search; representation learning
【Paper Link】 【Pages】:655-664
【Authors】: Zhiyong Cheng ; Jialie Shen ; Liqiang Nie ; Tat-Seng Chua ; Mohan S. Kankanhalli
【Abstract】: With the advancement of mobile computing technology and cloud-based music streaming services, user-centered music retrieval has become increasingly important. User-specific information has a fundamental impact on personal music preferences and interests. However, existing research pays little attention to the modeling and integration of user-specific information in music retrieval algorithms/models to facilitate music search. In this paper, we propose a novel model, named User-Information-Aware Music Interest Topic (UIA-MIT) model. The model is able to effectively capture the influence of user-specific information on music preferences, and further associate users' music preferences and search terms under the same latent space. Based on this model, a user-information-aware retrieval system is developed, which can search and re-rank the results based on age- and/or gender-specific music preferences. A comprehensive experimental study demonstrates that our methods can significantly improve the search accuracy over existing text-based music retrieval methods.
【Keywords】: re-ranking; semantic music retrieval; topic model; user demographic information
【Paper Link】 【Pages】:665-674
【Authors】: Rachid Guerraoui ; Anne-Marie Kermarrec ; Mahsa Taziki
【Abstract】: Recommenders are becoming one of the main ways to navigate the Internet. They recommend appropriate items to users based on their clicks, i.e., likes, ratings, purchases, etc. These clicks are key to providing relevant recommendations and, in this sense, have a significant utility. Since clicks reflect the preferences of users, they also raise privacy concerns. At first glance, there seems to be an inherent trade-off between the utility and privacy effects of a click. Nevertheless, a closer look reveals that the situation is more subtle: some clicks do improve utility without compromising privacy, whereas others decrease utility while hampering privacy. In this paper, for the first time, we propose a way to quantify the exact utility and privacy effects of each user click. More specifically, we show how to compute the privacy effect (disclosure risk) of a click using an information-theoretic approach, as well as its utility, using a commonality-based approach. We determine precisely when utility and privacy are antagonistic and when they are not. To illustrate our metrics, we apply them to recommendation traces from the Movielens and Jester datasets. We show, for instance, that, considering the Movielens dataset, 5.94% of the clicks improve the recommender utility without loss of privacy, whereas 16.43% of the clicks induce a high privacy risk without any utility gain. An appealing application of our metrics is what we call a click-advisor, a visual user-aware clicking platform that helps users decide whether it is actually worth clicking on an item or not (after evaluating its potential utility and privacy effects using our techniques). Using a game-theoretic approach, we evaluate several user clicking strategies.
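The intuition behind an information-theoretic disclosure risk can be illustrated with a toy surprisal measure (a hypothetical stand-in, not the paper's actual metric): a click on a rare item reveals more about a user than a click on a popular one.

```python
import math
from collections import Counter

# Toy click log across all users; item popularity proxies p(item).
clicks = ["matrix", "matrix", "inception", "matrix", "obscure_film"]
counts = Counter(clicks)
total = sum(counts.values())

def disclosure_risk(item):
    """Surprisal of clicking the item, in bits: rarer clicks disclose more."""
    return -math.log2(counts[item] / total)

# A click on the rare item carries more identifying information.
print(disclosure_risk("matrix") < disclosure_risk("obscure_film"))  # True
```

Under such a measure, clicks on widely shared items can add recommender utility (commonality) at low privacy cost, which is the regime where utility and privacy stop being antagonistic.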
We highlight in particular what we define as a smart strategy, leading to a Nash equilibrium, where every user reaches the maximum possible privacy while preserving the average overall recommender utility for all users (with respect to the case where user clicks are based solely on their genuine preferences, i.e., without consulting the click-advisor).
【Keywords】: click; disclosure risk; nash equilibrium; privacy; recommenders; user strategies; utility
【Paper Link】 【Pages】:675-684
【Authors】: Joanna Asia Biega ; Rishiraj Saha Roy ; Gerhard Weikum
【Abstract】: Online service providers gather vast amounts of data to build user profiles. Such profiles improve service quality through personalization, but may also intrude on user privacy and incur discrimination risks. In this work, we propose a framework which leverages solidarity in a large community to scramble user interaction histories. While this is beneficial for anti-profiling, the potential downside is that individual user utility, in terms of the quality of search results or recommendations, may severely degrade. To reconcile privacy and user utility and control their trade-off, we develop quantitative models for these dimensions and effective strategies for assigning user interactions to Mediator Accounts. We demonstrate the viability of our framework by experiments in two different application areas (search and recommender systems), using two large datasets.
【Keywords】: anti-profiling; mediator accounts; personalization; privacy; profile scrambling; recommender systems; search engines; user utility
【Paper Link】 【Pages】:685-694
【Authors】: Rui Yan ; Dongyan Zhao ; Weinan E.
【Abstract】: Conversation systems are of growing importance since they enable an easy interaction interface between humans and computers: using natural language. To build a conversation system with adequate intelligence is challenging, and requires abundant resources including the acquisition of big data and interdisciplinary techniques, such as information retrieval and natural language processing. Along with the prosperity of Web 2.0, the massive data available greatly facilitate data-driven methods such as deep learning for human-computer conversation systems. Owing to the diversity of Web resources, a retrieval-based conversation system will come up with at least some results from the immense repository for any user inputs. Given a human issued message, i.e., query, a traditional conversation system would provide a response after adequate training and learning of how to respond. In this paper, we propose a new task for conversation systems: joint learning of response ranking featured with next utterance suggestion. We assume that the new conversation mode is more proactive and keeps users engaged. We examine the assumption in experiments. Besides, to address the joint learning task, we propose a novel Dual-LSTM Chain Model to couple response ranking and next utterance suggestion simultaneously. From the experimental results, we demonstrate the usefulness of the proposed task and the effectiveness of the proposed model.
【Keywords】: conversation system; joint learning; neural networks; next utterance suggestion; response ranking
【Paper Link】 【Pages】:695-704
【Authors】: Yi Tay ; Minh C. Phan ; Anh Tuan Luu ; Siu Cheung Hui
【Abstract】: We describe a new deep learning architecture for learning to rank question answer pairs. Our approach extends the long short-term memory (LSTM) network with holographic composition to model the relationship between question and answer representations. As opposed to the neural tensor layer that has been adopted recently, holographic composition provides the benefits of a scalable and rich representation learning approach without incurring huge parameter costs. Overall, we present Holographic Dual LSTM (HD-LSTM), a unified architecture for both deep sentence modeling and semantic matching. Essentially, our model is trained end-to-end whereby the parameters of the LSTM are optimized in a way that best explains the correlation between question and answer representations. In addition, our proposed deep learning architecture requires no extensive feature engineering. Via extensive experiments, we show that HD-LSTM outperforms many other neural architectures on two popular benchmark QA datasets. Empirical studies confirm the effectiveness of holographic composition over the neural tensor layer.
【Keywords】: associative memory; deep learning; holographic composition; learning to rank; neural networks; question answering
【Paper Link】 【Pages】:705-714
【Authors】: Vineet Kumar ; Sachindra Joshi
【Abstract】: Intelligent personal assistants (IPAs) and interactive question answering (IQA) systems frequently encounter incomplete follow-up questions. The incomplete follow-up questions only make sense when seen in conjunction with the conversation context: the previous question and answer. Thus, IQA and IPA systems need to utilize the conversation context in order to handle the incomplete follow-up questions and generate an appropriate response. In this work, we present a retrieval based sequence to sequence learning system that can generate the complete (or intended) question for an incomplete follow-up question (given the conversation context). We can train our system using only a small labeled dataset (with only a few thousand conversations), by decomposing the original problem into two simpler and independent problems. The first problem focuses solely on selecting the candidate complete questions from a library of question templates (built offline using the small labeled conversations dataset). In the second problem, we re-rank the selected candidate questions using a neural language model (trained on millions of unlabelled questions independently). Our system can achieve a BLEU score of 42.91, as compared to 29.11 using an existing generation based approach. We further demonstrate the utility of our system as a plug-in module to an existing QA pipeline. Our system, when added as a plug-in module, enables Siri to achieve an improvement of 131.57% in answering incomplete follow-up questions.
【Keywords】: follow-up question resolution; intelligent personal assistants; interactive question answering; query reformulation; retrieval based sequence to sequence learning
【Paper Link】 【Pages】:715-724
【Authors】: Sosuke Shiga ; Hideo Joho ; Roi Blanco ; Johanne R. Trippas ; Mark Sanderson
【Abstract】: The increase of voice-based interaction has changed the way people seek information, making search more conversational. Development of effective conversational approaches to search requires better understanding of how people express information needs in dialogue. This paper describes the creation and examination of over 32K spoken utterances collected during 34 hours of collaborative search tasks. The contribution of this work is three-fold. First, we propose a model of conversational information needs (CINs) based on a synthesis of relevant theories in Information Seeking and Retrieval. Second, we show several behavioural patterns of CINs based on the proposed model. Third, we identify effective feature groups that may be useful for detecting CINs categories from conversations. This paper concludes with a discussion of how these findings can facilitate the advancement of conversational search applications.
【Keywords】: collaborative search; conversation; information needs
【Paper Link】 【Pages】:733-742
【Authors】: Haoran Huang ; Qi Zhang ; Jindou Wu ; Xuanjing Huang
【Abstract】: Every day, social media users send millions of microblogs on every imaginable topic. If we could predict which topics a user will join in the future, it would be easy to determine what topics will become popular and what kinds of users a topic may attract. It also can be of great interest for many applications. In this study, we investigate the problem of predicting whether a user will join a topic based on his posting history. We introduce a novel deep convolutional neural network with an external neural memory and an attention mechanism to address this problem. A user's posting history and topics were modeled with an external neural memory architecture. Convolutional neural network based matching methods were used to construct the relations between users and topics. Final decisions were made based on these matching results. To train and evaluate the proposed method, we collected a large-scale dataset from Twitter. The experimental results demonstrated that the proposed method could perform significantly better than other methods. Compared to state-of-the-art deep neural networks, our approach achieves a relative improvement of 18.2\% in F1-score and 28.9\% in [email protected]
【Keywords】: convolutional neural network; social media; topic prediction
【Paper Link】 【Pages】:743-752
【Authors】: Cheng Cao ; Hancheng Ge ; Haokai Lu ; Xia Hu ; James Caverlee
【Abstract】: User interests and expertise are valuable but often hidden resources on social media. For example, Twitter Lists and LinkedIn's Skill Tags provide a partial perspective on what users are known for (by aggregating crowd tagging knowledge), but the vast majority of users are untagged; their interests and expertise are essentially hidden from important applications such as personalized recommendation, community detection, and expert mining. A natural approach to overcome these limitations is to intelligently learn user topical profiles by exploiting information from multiple, heterogeneous footprints: for instance, Twitter users who post similar hashtags may have similar interests, and YouTube users who upvote the same videos may have similar preferences. And yet identifying "similar" users by exploiting similarity in such a footprint space often provides conflicting evidence, leading to poor-quality user profiles. In this paper, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints. We show how these footprints can be embedded in a generalized optimization framework that takes into account pairwise relations among all footprints for robustly learning user profiles. Through extensive experiments, we find the proposed model is capable of learning high-quality user topical profiles, and leads to a 10-15% improvement in precision and mean average error versus a cross-triadic factorization state-of-the-art baseline.
【Keywords】: social media; user behavior; user profile
【Paper Link】 【Pages】:753-762
【Authors】: Yuan Zhang ; Tianshu Lyu ; Yan Zhang
【Abstract】: Recently, online social networks are becoming increasingly popular platforms for social interactions. Understanding how information propagates in such networks is important for personalization and recommendation in social search. In this paper, we propose a Hierarchical Community-level Information Diffusion (HCID) model to capture the information diffusion process in social networks. We introduce the notion of users' topic popularity to enable our model to depict an information diffusion process that is both topic-aware (which topic the information is concerned with) and source-aware (where the information comes from). Instead of assuming homogeneity of social communities, we propose the notion of community hierarchy, where information diffusion across inter-level communities is uni-directional from the higher levels to the lower ones. We design a Gibbs sampling algorithm to infer model parameters and propose prediction methods for two information diffusion prediction tasks, retweet prediction and cascade prediction. Comparison experiments are conducted on two real datasets. Results show that our model achieves substantial improvements over existing work.
【Keywords】: communities; information diffusion; social networks
【Paper Link】 【Pages】:763-772
【Authors】: Chenyan Xiong ; Jamie Callan ; Tie-Yan Liu
【Abstract】: This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are generated by the interactions between the two representations, incorporating information from the word space, the entity space, and the cross-space connections through the knowledge graph. To handle the uncertainties from the automatically constructed entity representations, an attention-based ranking model AttR-Duet is developed. With back-propagation from ranking labels, the model learns simultaneously how to demote noisy entities and how to rank documents with the word-entity duet. Evaluation results on the TREC Web Track ad-hoc task demonstrate that all of the four-way interactions in the duet are useful, the attention mechanism successfully steers the model away from noisy entities, and together they significantly outperform both word-based and entity-based learning to rank systems.
【Keywords】: document ranking; entity-based search; text representation
【Paper Link】 【Pages】:773-782
【Authors】: Faegheh Hasibi ; Krisztian Balog ; Svein Erik Bratsberg
【Abstract】: Entity cards are being used frequently in modern web search engines to offer a concise overview of an entity directly on the results page. These cards are composed of various elements, one of them being the entity summary: a selection of facts describing the entity from an underlying knowledge base. These summaries, while presenting a synopsis of the entity, can also directly address users' information needs. In this paper, we make the first effort towards generating and evaluating such factual summaries. We introduce and address the novel problem of dynamic entity summarization for entity cards, and break it down to two specific subtasks: fact ranking and summary generation. We perform an extensive evaluation of our method using crowdsourcing. Our results show the effectiveness of our fact ranking approach and validate that users prefer dynamic summaries over static ones.
【Keywords】: entity cards; entity summarization; user interfaces
【Paper Link】 【Pages】:783-792
【Authors】: Shoaib Jameel ; Zied Bouraoui ; Steven Schockaert
【Abstract】: We propose a new class of methods for learning vector space embeddings of entities. While most existing methods focus on modelling similarity, our primary aim is to learn embeddings that are interpretable, in the sense that query terms have a direct geometric representation in the vector space. Intuitively, we want all entities that have some property (i.e. for which a given term is relevant) to be located in some well-defined region of the space. This is achieved by imposing max-margin constraints that are derived from a bag-of-words representation of the entities. The resulting vector spaces provide us with a natural vehicle for identifying entities that have a given property (or ranking them according to how much they have the property), and conversely, for describing what a given set of entities has in common. As we show in our experiments, our models lead to a substantially better performance in a range of entity-oriented search tasks, such as list completion and entity ranking.
【Keywords】: entity embedding; entity ranking; list completion; maximum margin
【Paper Link】 【Pages】:793-796
【Authors】: Luchen Tan ; Gaurav Baruah ; Jimmy Lin
【Abstract】: Information retrieval test collections are typically built using data from large-scale evaluations in international forums such as TREC, CLEF, and NTCIR. Previous validation studies on pool-based test collections for ad hoc retrieval have examined their reusability to accurately assess the effectiveness of systems that did not participate in the original evaluation. To our knowledge, the reusability of test collections derived from "living labs" evaluations, based on logs of user activity, has not been explored. In this paper, we performed a "leave-one-out" analysis of human judgment data derived from the TREC 2016 Real-Time Summarization Track and show that those judgments do not appear to be reusable. While this finding is limited to one specific evaluation, it does call into question the reusability of test collections built from living labs in general, and at the very least suggests the need for additional work in validating such experimental instruments.
【Keywords】: interleaved evaluations; push notifications; reusability; user studies
【Paper Link】 【Pages】:797-800
【Authors】: Haotian Zhang ; Jinfeng Rao ; Jimmy J. Lin ; Mark D. Smucker
【Abstract】: We propose a heuristic called "one answer per document" for automatically extracting high-quality negative examples for answer selection in question answering. Starting with a collection of question-answer pairs from the popular TrecQA dataset, we identify the original documents from which the answers were drawn. Sentences from these source documents that contain query terms (aside from the answers) are selected as negative examples. Training on the original data plus these negative examples yields improvements in effectiveness by a margin that is comparable to successive recent publications on this dataset. Our technique is completely unsupervised, which means that the gains come essentially for free. We confirm that the improvements can be directly attributed to our heuristic, as other approaches to extracting comparable amounts of training data are not effective. Beyond the empirical validation of this heuristic, we also share our improved TrecQA dataset with the community to support further work in answer selection.
【Keywords】: deep learning; distant supervision; question answering; trec
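The "one answer per document" heuristic above can be expressed very compactly. The sketch below is a minimal illustration under stated assumptions: the function name, the whitespace tokenization, and the exact-match check for the answer sentence are all illustrative choices of ours, not the authors' implementation.

```python
def extract_negatives(question, answer, source_sentences):
    """Select negative examples under the "one answer per document"
    heuristic: every sentence from the answer's source document that
    shares a query term but is not the answer itself is a negative."""
    q_terms = set(question.lower().split())
    negatives = []
    for sent in source_sentences:
        if sent == answer:
            continue  # the single true answer in this document
        if q_terms & set(sent.lower().split()):  # shares a query term
            negatives.append(sent)
    return negatives
```

Because the selection relies only on the question, the answer, and the source document, the procedure needs no human labels, which is why the abstract describes the gains as coming "essentially for free".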
【Paper Link】 【Pages】:801-804
【Authors】: Xiao Shen ; Fu-Lai Chung ; Sitong Mao
【Abstract】: When tackling the large-scale influence maximization (IM) problem, one effective strategy is to employ graph sparsification as a pre-processing step, removing a fraction of edges to make the original networks more concise and tractable for the task. In this work, a Cross-Network Graph Sparsification (CNGS) model is proposed to leverage the influence backbone knowledge pre-detected in a source network to predict and remove the edges least likely to contribute to the influence propagation in the target networks. Experimental results demonstrate that conducting graph sparsification with the proposed CNGS model obtains a good trade-off between the efficiency and effectiveness of IM, i.e., existing IM greedy algorithms can run more efficiently, while the loss of influence spread can be kept as small as possible in the sparse target networks.
【Keywords】: cross-network; domain adaptation; feature incompatibility; graph sparsification; influence maximization; self-training
【Paper Link】 【Pages】:805-808
【Authors】: Daniel Valcarce ; Javier Parapar ; Álvaro Barreiro
【Abstract】: Given the diversity of recommendation algorithms, choosing one technique is becoming increasingly difficult. In this paper, we explore methods for combining multiple recommendation approaches. We studied rank aggregation methods that have been proposed for the metasearch task (i.e., fusing the outputs of different search engines) but have never been applied to merge top-N recommender systems. These methods require no training data nor parameter tuning. We analysed two families of methods: voting-based and score-based approaches. These rank aggregation techniques yield significant improvements over state-of-the-art top-N recommenders. In particular, score-based methods yielded good results; however, some voting techniques were also competitive without using score information, which may be unavailable in some recommendation scenarios. The studied methods not only improve the state of the art of recommendation algorithms but they are also simple and efficient.
【Keywords】: borda count; condorcet; metasearch; recommender systems
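Borda count, one of the voting-based rank aggregation methods named in the keywords above, is simple to sketch: each recommender contributes points to an item according to its rank, and the fused list orders items by total points. This is a generic illustration of the classical method, not the paper's specific experimental setup; the function name and scoring convention (an item at rank r in a list of length L earns L - r points) are our assumptions.

```python
from collections import defaultdict

def borda_fuse(rankings, top_n=10):
    """Fuse several top-N item rankings with Borda count.

    rankings: list of ranked item lists, best item first.
    Returns the top_n items by total Borda points."""
    scores = defaultdict(int)
    for ranking in rankings:
        L = len(ranking)
        for r, item in enumerate(ranking):
            scores[item] += L - r  # higher rank -> more points
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    return [item for item, _ in ordered][:top_n]
```

Note that, as the abstract emphasizes, this uses only rank positions: no relevance scores, training data, or parameter tuning are required, which is what makes such methods applicable when recommender scores are unavailable or incomparable.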
【Paper Link】 【Pages】:809-812
【Authors】: Jianliang Gao ; Bo Song ; Zheng Chen ; Weimao Ke ; Wanying Ding ; Xiaohua Hu
【Abstract】: In this paper, we propose a novel k-anonymization scheme to counter deanonymization queries on social networks. With this scheme, all entities are protected by k-anonymization, which means the attackers cannot re-identify a target with confidence higher than 1/k. The proposed scheme minimizes the modification on original networks, and accordingly maximizes the utility preservation of published data while achieving k-anonymization privacy protection. Extensive experiments on real data sets demonstrate the effectiveness of the proposed scheme, where the efficacy of the k-anonymized networks is verified with the distributions of pagerank, betweenness, and their Kolmogorov-Smirnov (K-S) test.
【Keywords】: deanonymization query; h-index; privacy protection
【Paper Link】 【Pages】:813-816
【Authors】: Peipei Li ; Junjie Yao ; Liping Wang ; Xuemin Lin
【Abstract】: With the pervasive availability of smart devices, billions of users' trajectories are recorded and collected. The aggregated human behaviors reveal users' interests and characteristics, becoming invaluable for reflecting their demographic preferences, i.e., gender, age, marital status, and even personality and occupation. Occupation profiling from trajectory data is an attractive option for advertisement targeting and other applications, without severe privacy concerns. However, it faces great difficulties due to sparsity and vagueness. This paper proposes a novel approach, SPOT (Selecting occuPation frOm Trajectories). We first carefully analyze and report the variance in trajectory patterns across occupational categories in a large real dataset. We then design novel ways to extract users' content, location, and transition preferences, and finally present a comprehensive occupation prediction method based on Continuous Conditional Random Fields (C-CRF). Empirical studies confirm that the new approach works surprisingly well, and it demonstrates the discriminative power of trajectory data in revealing occupational preference.
【Keywords】: check-in analysis; trajectory mining; user profile
【Paper Link】 【Pages】:817-820
【Authors】: Wanyu Chen ; Fei Cai ; Honghui Chen ; Maarten de Rijke
【Abstract】: Query suggestions help users refine their queries after they input an initial query. We consider the task of generating query suggestions that are personalized and diversified. We propose a personalized query suggestion diversification model (PQSD), where a user's long-term search behavior is injected into a basic greedy query suggestion diversification model (G-QSD) that considers a user's search context in their current session. Query aspects are identified through clicked documents based on the Open Directory Project (ODP). We quantify the improvement of PQSD over a state-of-the-art baseline using the AOL query log and show that it beats the baseline in terms of metrics used in query suggestion ranking and diversification. The experimental results show that PQSD achieves the best performance when only queries with clicked documents are taken as search context rather than all queries.
【Keywords】: diversification; personalization; query suggestion
【Paper Link】 【Pages】:821-824
【Authors】: Ying Zhang ; Li Yu ; Xue Zhao ; Xiaojie Yuan ; Lei Xu
【Abstract】: This paper studies an emotion classification problem, which aims to classify online news comments to one of fine-grained emotion categories, e.g. happy, sad, and angry, etc. Neural networks have been widely used and achieved great success in sentiment classification. However, there must be sufficient labeled comments available for training neural networks, which usually requires labor-intensive and time-consuming manual labeling. One of the most effective solutions is to apply transfer learning, which uses abundant labeled comments from a source news domain to help the classification for another target domain with limited amount of labeled data. Still, the comments from different domains can have very different word distributions, which makes it difficult to transfer knowledge from one domain to another. In this paper, we accomplish cross-domain emotion tagging based on an advanced neural network BLSTM (bidirectional long short-term memory) with "domain translation'', which can overcome the difference between domains. A weighted linear transformation is utilized to "translate'' knowledge from source to target domain. An extensive set of experimental results on four datasets from popular online news services demonstrates the effectiveness of our proposed models.
【Keywords】: emotion tagging; neural networks; transfer learning
【Paper Link】 【Pages】:825-828
【Authors】: Cheng Wang ; Jieren Zhou ; Bo Yang
【Abstract】: In this paper we aim at addressing the correlation between two critical factors in mobile social networks (MSNs): the social-relationship networking among users and the spatial mobility pattern of users. Specifically, we investigate the impact of users' spatial distribution on their social relationship formation in MSNs. Based on the geolocation data (check-in records) and social relation data of MSN users, we propose a model, called the neighborhood-cardinality-based model (NCBM), to describe this impact by taking into account both the multiple home-points/hotspots property of spatial mobility and the long-tailed social relationship degree distribution of MSN users. We define a fundamental quantity for each user, the so-called neighborhood cardinality, to measure how many other MSN users visit his nearby area within a given range, and how often. The core of NCBM is a principle: the probability that a user, say u, is followed by another user, say v, obeys a power law distribution of the neighborhood cardinality of user u. The proposed formation model is evaluated on two large check-in datasets: Brightkite and Gowalla. Our experimental results indicate that the proposed formation model provides a useful paradigm for capturing the correlation between MSN users' mobility patterns and social relationships.
【Keywords】: digital footprint; mobile social networks; mobility model; social network analysis; social relationship formation
【Paper Link】 【Pages】:829-832
【Authors】: Yunan Ye ; Zhou Zhao ; Yimeng Li ; Long Chen ; Jun Xiao ; Yueting Zhuang
【Abstract】: Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question. However, the existing visual question answering approaches mainly tackle the problem of static image questions, which may be ineffective for video question answering due to insufficient modeling of the temporal dynamics of video contents. In this paper, we study the problem of video question answering by modeling its temporal dynamics with a frame-level attention mechanism. We propose an attribute-augmented attention network learning framework that enables joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate a multi-step reasoning process into our proposed attention network to further improve the performance. We construct a large-scale video question answering dataset, and conduct experiments on both multiple-choice and open-ended video question answering tasks to show the effectiveness of the proposed method.
【Keywords】: attribute; video question answering; visual information retrieval
【Paper Link】 【Pages】:833-836
【Authors】: Zhou Zhao ; Qifan Yang ; Hanqing Lu ; Min Yang ; Jun Xiao ; Fei Wu ; Yueting Zhuang
【Abstract】: With the rapid development of mobile devices, point-of-interest (POI) suggestion has become a popular online web service, which provides attractive and interesting locations to users. In order to provide interesting POIs, many existing POI recommendation works learn the latent representations of users and POIs from users' past visited POIs, which suffers from the sparsity problem of POI data. In this paper, we consider the problem of POI suggestion from the viewpoint of learning geosocial multimedia network representations. We propose a novel max-margin metric geosocial multimedia network representation learning framework that exploits users' check-in behavior and their social relations. We then develop a random-walk based learning method with max-margin metric network embedding. We evaluate our method on a large-scale geosocial multimedia network dataset and show that it achieves better performance than other state-of-the-art solutions.
【Keywords】: network representation; poi suggestion
【Paper Link】 【Pages】:837-840
【Authors】: Zhuyun Dai ; Yubin Kim ; Jamie Callan
【Abstract】: We present a learning-to-rank approach for resource selection. We develop features for resource ranking and present a training approach that does not require human judgments. Our method is well-suited to environments with a large number of resources such as selective search, is an improvement over the state-of-the-art in resource selection for selective search, and is statistically equivalent to exhaustive search even for recall-oriented metrics such as [email protected], an area in which selective search was lacking.
【Keywords】: federated search; resource selection; selective search
【Paper Link】 【Pages】:841-844
【Authors】: Qiang Liu ; Shu Wu ; Liang Wang
【Abstract】: Visual information is an important factor in recommender systems. Some studies have been done to model user preferences for visual recommendation. Usually, an item consists of two fundamental components: style and category. Conventional methods model items in a common visual feature space. In these methods, visual representations can only capture the categorical information of items and fail to capture their styles. Style information indicates the preferences of users and has a significant effect in visual recommendation. Accordingly, we propose a DeepStyle method for learning style features of items and sensing preferences of users. Experiments conducted on two real-world datasets illustrate the effectiveness of DeepStyle for visual recommendation.
【Keywords】: style features; user preferences; visual recommendation
【Paper Link】 【Pages】:845-848
【Authors】: Darío Garigliotti ; Faegheh Hasibi ; Krisztian Balog
【Abstract】: Identifying the target types of entity-bearing queries can help improve retrieval performance as well as the overall search experience. In this work, we address the problem of automatically detecting the target types of a query with respect to a type taxonomy. We propose a supervised learning approach with a rich variety of features. Using a purpose-built test collection, we show that our approach outperforms existing methods by a remarkable margin.
【Keywords】: entity search; query types; query understanding; semantic search
【Paper Link】 【Pages】:849-852
【Authors】: Saar Kuzi ; David Carmel ; Alex Libov ; Ariel Raviv
【Abstract】: This work studies the effectiveness of query expansion for email search. Three state-of-the-art expansion methods are examined: 1) a global translation-based expansion model; 2) a personalized word-embedding model; 3) the classical pseudo-relevance-feedback model. Experiments were conducted with two mail datasets extracted from a large query log of a Web mail service. Our results demonstrate the significant contribution of query expansion for measuring the similarity between the query and email messages. On the other hand, the contribution of expansion methods to a well-trained learning-to-rank scoring function that exploits many relevance signals was found to be modest.
【Keywords】: email search; query expansion
【Paper Link】 【Pages】:853-856
【Authors】: Bevan Koopman ; Liam Cripwell ; Guido Zuccon
【Abstract】: This paper investigates how automated query generation methods can be used to derive effective ad-hoc queries from verbose patient narratives. In a clinical setting, automatic query generation provides a means of retrieving information relevant to a clinician, based on a patient record, but without the need for the clinician to manually author a query. Given verbose patient narratives, we evaluated a number of query reduction methods, both generic and domain specific. Comparison was made against human-generated queries, both in terms of retrieval effectiveness and the characteristics of human queries. Query reduction was an effective means of generating ad-hoc queries from narratives. However, human-generated queries were still significantly more effective than automatically generated queries. Further improvements were possible if the parameters of the query reduction methods were set on a per-query basis and a means of predicting these settings was developed. Under ideal conditions, automated methods can exceed humans. Effective human queries were found to contain many novel keywords not found in the narrative. Automated reduction methods may be handicapped in that they only use terms from the narrative. Future work, therefore, may be directed toward better understanding effective human queries and toward automated query rewriting methods that attempt to model the inference of novel terms by exploiting semantic inference processes.
【Keywords】: clinical search; information retrieval; query generation
【Paper Link】 【Pages】:857-860
【Authors】: Ikumi Suzuki ; Kazuo Hara
【Abstract】: Graph construction is an important process in graph-based semi-supervised learning. Presently, the mutual kNN graph is the most preferred, as it reduces hub nodes, which can be a cause of failure during label propagation. However, the mutual kNN graph, which is usually very sparse, suffers from the over-sparsification problem: although the number of edges connecting nodes that have different labels decreases in the mutual kNN graph, the number of edges connecting nodes that have the same labels also decreases. In addition, over-sparsification can produce a disconnected graph, which is undesirable for label propagation. We therefore present a new graph construction method, the centered kNN graph, which not only reduces hub nodes but also avoids the over-sparsification problem.
【Keywords】: graph-based ssl; mutual knn graph; over sparsification problem
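The mutual kNN graph discussed above keeps an edge (i, j) only when i is among j's k nearest neighbours and vice versa, which is the source of both its hub reduction and its over-sparsification. A minimal sketch with Euclidean distances (the function name and brute-force distance computation are our illustrative choices; the paper's centered kNN construction is not reproduced here):

```python
import numpy as np

def mutual_knn_graph(X, k):
    """Boolean adjacency matrix of the mutual kNN graph:
    i and j are connected only if each is among the other's
    k nearest neighbours under Euclidean distance."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a node is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]   # each row's k nearest neighbours
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, knn[i]] = True              # directed kNN edges
    return A & A.T                       # keep only reciprocated edges
```

The final intersection `A & A.T` is exactly where edges are lost: a point that is nobody's nearest neighbour (the opposite of a hub) can end up isolated, producing the disconnected graphs the abstract warns about.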
【Paper Link】 【Pages】:861-864
【Authors】: Shenghao Liu ; Bang Wang ; Minghua Xu
【Abstract】: Event recommendation has become an important issue in event-based social networks (EBSNs). In this paper, we study how to exploit diverse relations in an EBSN, as well as individual historical preferences, to recommend preferred events. We first construct a hybrid graph consisting of different types of nodes to represent the available entities in an EBSN. The graph uses explicit relations as edges to connect nodes of different types, while transferring implicit relations of event attributes to interconnect the event nodes. After executing random walks on the graph, we obtain the candidate events with high convergence probabilities. We next extract a user preference from his attended events to further compute his interest similarities to his candidate events. The recommended event list is then obtained by combining the two similarity scores. Datasets from a real EBSN are used to examine the proposed scheme, and experimental results validate its superiority over peer schemes.
【Keywords】: cold-start problem; event recommendation; event-based social networks; graph-based random walking
【Paper Link】 【Pages】:865-868
【Authors】: Nir Levine ; Haggai Roitman ; Doron Cohen
【Abstract】: The session search task aims at best serving the user's information need given her previous search behavior during the session. We propose an extended relevance model that captures the user's dynamic information need in the session. Our relevance modelling approach is directly driven by the user's query reformulation (change) decisions and the estimate of how much the user's search behavior affects such decisions. Overall, we demonstrate that the proposed approach significantly boosts session search performance.
【Keywords】: relevance model; session search
【Paper Link】 【Pages】:869-872
【Authors】: Haggai Roitman
【Abstract】: We address the problem of query performance prediction (QPP) using reference lists. To date, no previous QPP method has been fully successful in generating and utilizing several pseudo-effective and pseudo-ineffective reference lists. In this work, we try to fill the gaps. We first propose a novel unsupervised approach for generating and selecting both types of reference lists using query perturbation and statistical inference. We then propose an enhanced QPP approach that utilizes both types of selected reference lists.
【Keywords】: query performance prediction; reference lists
【Paper Link】 【Pages】:873-876
【Authors】: Azad Abad ; Moin Nabi ; Alessandro Moschitti
【Abstract】: In this paper, we introduce a general iterative human-machine collaborative method for training crowdsource workers: the classifier (i.e., the machine) selects the highest quality examples for training the crowdsource workers (i.e., the humans). Then, the latter annotate the lower quality examples such that the classifier can be re-trained with more accurate examples. This process can be iterated several times. We tested our approach on two different tasks, Relation Extraction and Community Question Answering, in two different languages, English and Arabic, respectively. Our experimental results show a significant improvement in creating Gold Standard data over distant supervision or crowdsourcing without worker training. At the same time, our method approaches the performance of state-of-the-art methods that use expensive Gold Standard data for training workers.
【Keywords】: community question answering; crowdsourcing; human in the loop; relation extraction; self-training
【Paper Link】 【Pages】:877-880
【Authors】: Atsushi Ushiku ; Shinsuke Mori ; Hirotaka Kameko ; Yoshimasa Tsuruoka
【Abstract】: There are many databases of game records available online. In order to retrieve a game state from such a database, users usually need to specify the target state in a domain-specific language, which may be difficult to learn for novice users. In this work, we propose a search system that allows users to retrieve game states from a game record database by using keywords. In our approach, we first train a neural network model for symbol grounding using a small number of pairs of a game state and a commentary on it. We then apply it to all the states in the database to associate each of them with characteristic terms and their scores. The enhanced database thus enables users to search for a state using keywords. To evaluate the performance of the proposed method, we conducted experiments of game state retrieval using game records of Shogi (Japanese chess) with commentaries. The results demonstrate that our approach gives significantly better results than full-text search and an LSTM language model.
【Keywords】: search of nonlinguistic data; shogi; symbol grounding
【Paper Link】 【Pages】:881-884
【Authors】: Ryen W. White ; Ryan Ma
【Abstract】: Result ranking in commercial web search engines is based on a wide array of signals, from keywords appearing on web pages to behavioral (clickthrough) data aggregated across many users or from the current user only. The recent emergence of wearable devices has enabled the collection of physiological data such as heart rate, skin temperature, and galvanic skin response at a population scale. These data are useful for many public health tasks, but they may also provide novel clues about people's interests and intentions as they engage in online activities. In this paper, we focus on heart rate and show that there are strong relationships between heart rate and various measures of user interest in a search result. We integrate features of heart rate, including heart rate dynamics, as additional attributes in a competitive machine-learned web search ranking algorithm. We show that we can obtain significant relevance improvements from this physiological sensing that vary depending on the search topic.
【Keywords】: physiological sensing; search engines; wearables
【Paper Link】 【Pages】:885-888
【Authors】: Jiajin Huang ; Jian Wang ; Ning Zhong
【Abstract】: Top-N recommendation tasks aim to solve the information overload problem for users in the information age. As a user's decision may be affected by correlations among items, we incorporate such correlations with the user and item latent factors to propose a Poisson-regression-based method for top-N recommendation tasks. By incorporating prior knowledge and using a sparse structure assumption, this method learns the latent factors and the structure of the item-item correlation matrix through the alternating direction method of multipliers (ADMM). Preliminary experimental results on two real-world datasets show the improved performance of our approach.
【Keywords】: item-item correlations; poisson regression; recommender systems
【Paper Link】 【Pages】:889-892
【Authors】: Yong Cheng ; Fei Huang ; Lian Zhou ; Cheng Jin ; Yuejie Zhang ; Tao Zhang
【Abstract】: A novel hierarchical multimodal attention-based model is developed in this paper to generate more accurate and descriptive captions for images. Our model is an "end-to-end" neural network which contains three related sub-networks: a deep convolutional neural network to encode image contents, a recurrent neural network to identify the objects in images sequentially, and a multimodal attention-based recurrent neural network to generate image captions. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are both applied, thus each caption word can be generated with multimodal attention on the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets have obtained very positive results.
【Keywords】: hierarchical recurrent neural network; image captioning; long-short term memory model; multimodal attention
【Paper Link】 【Pages】:893-896
【Authors】: Gang Hu ; Jie Shao ; Fumin Shen ; Zi Huang ; Heng Tao Shen
【Abstract】: Travel route planning aims to mine users' attributes and recommend personalized routes. Building interest models for users and understanding their real intentions pose great challenges. This paper presents an approach that mines the user interest model from multi-source social media (e.g., travelogues and check-in records) and understands the user's real intention from active behavior such as point of interest (POI) inputs. In order to unify heterogeneous data from different sources, a topical package is built as the measurement space. Based on the topical package, a user topical package is modeled to capture user interest, and a route topical package is constructed to describe the attributes of each route. The user's active behavior can also be considered during route planning, where top-ranked routes are finally recommended. The proposed multi-source topical package (MSTP) approach is evaluated on a real dataset and compared with two state-of-the-art methods. The results show that MSTP performs better at providing personalized travel routes.
【Keywords】: route planning; social media; topical package; user interest
【Paper Link】 【Pages】:897-900
【Authors】: Michael R. Evans ; Dragomir Yankov ; Pavel Berkhin ; Pavel Yudin ; Florin Teodorescu ; Wei Wu
【Abstract】: Image search is a popular application on web search engines. Issuing a location-related query in image search engines often returns multiple images of maps among the top ranked results. Traditionally, clicking on such images either opens the image in a new browser tab or takes users to a web page containing the image. However, finding the area of intent on an interactive web map is a manual process. In this paper, we describe a novel system, LiveMaps, for analyzing and retrieving an appropriate map viewport for a given image of a map. This allows annotation of images of maps returned by image search engines, letting users directly open a link to an interactive map centered on the location of interest. LiveMaps works in several stages. It first checks whether the input image represents a map. If yes, the system attempts to identify what geographical area this map image represents. In the process, we use textual as well as visual information extracted from the image. Finally, we construct an interactive map object capturing the geographical area inferred for the image. Evaluation results on a dataset of highly ranked location images indicate that our system constructs very precise map representations while also achieving good coverage.
【Keywords】: geographic information retrieval; image search; map search
【Paper Link】 【Pages】:901-904
【Authors】: Nicola Ferro ; Mark Sanderson
【Abstract】: Understanding the factors comprising IR system effectiveness is of primary importance for comparing different IR systems. Effectiveness is traditionally broken down, using ANOVA, into a topic and a system effect, but this leaves out a key component of our evaluation paradigm: the collections of documents. We break down effectiveness into topic, system and sub-corpus effects and compare this to the traditional breakdown, considering what happens when different evaluation measures come into play. We found that sub-corpora are a significant effect, the consideration of which allows us to be more accurate in estimating which systems are significantly different. We also found that the sub-corpora affect different evaluation measures in different ways, and this may impact which systems are considered significantly different.
【Keywords】: anova; effectiveness model; experimental evaluation; glmm; retrieval effectiveness; sub-corpus effect
【Paper Link】 【Pages】:905-908
【Authors】: Maura R. Grossman ; Gordon V. Cormack ; Adam Roegiest
【Abstract】: In the TREC Total Recall Track (2015-2016), participating teams could employ either fully automatic or human-assisted ("semi-automatic") methods to select documents for relevance assessment by a simulated human reviewer. According to the TREC 2016 evaluation, the fully automatic baseline method achieved a recall-precision breakeven ("R-precision") score of 0.71, while the two semi-automatic efforts achieved scores of 0.67 and 0.51. In this work, we investigate the extent to which the observed effectiveness of the different methods may be confounded by chance, by inconsistent adherence to the Track guidelines, by selection bias in the evaluation method, or by discordant relevance assessments. We find no evidence that any of these factors could yield relative effectiveness scores inconsistent with the official TREC 2016 ranking.
【Keywords】: document categorization; high recall information retrieval; relevance assessment; total recall
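The recall-precision breakeven ("R-precision") scores quoted above have a standard definition: precision at rank R, where R is the number of relevant documents for the topic (at that cutoff, precision and recall coincide). A minimal sketch of the measure; the function name and set-based relevance representation are our illustrative choices.

```python
def r_precision(ranked_docs, relevant):
    """R-precision: precision at rank R, where R = |relevant|.
    At this cutoff precision equals recall, hence "breakeven"."""
    R = len(relevant)
    if R == 0:
        return 0.0
    hits = sum(1 for d in ranked_docs[:R] if d in relevant)
    return hits / R
```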
【Paper Link】 【Pages】:909-912
【Authors】: Huasha Zhao ; Luo Si ; Xiaogang Li ; Qiong Zhang
【Abstract】: Push notification is a key component of E-commerce mobile applications, which has been extensively used for user growth and engagement. The effectiveness of a push notification is generally measured by message open rate. A push message can contain a recommended product, shopping news, and so on, but often only one or two items can be shown in the push message due to the limit of display space. This paper proposes a mixture model approach for predicting push message open rate for a post-purchase complementary product recommendation task. The mixture model is trained to learn latent prediction contexts, which are determined by user and item profiles, and then make open rate predictions accordingly. The item with the highest predicted open rate is then chosen to be included in the push notification message for each user. The parameters of the mixture model are optimized using an EM algorithm. A set of experiments are conducted to evaluate the proposed method live with a popular E-Commerce mobile app. The results show that the proposed method is superior to several existing solutions by a significant margin.
【Keywords】: e-commerce; mixture model; probabilistic latent class model; push notification
【Paper Link】 【Pages】:913-916
【Authors】: Haihui Tan ; Ziyu Lu ; Wenjie Li
【Abstract】: The massive amount of noisy and redundant information in text streams makes it a challenge for users to acquire timely and relevant information in social media. Real-time notification pushing on text streams is of practical importance. In this paper, we formulate real-time pushing on a text stream as a sequential decision making problem and propose a Neural Network based Reinforcement Learning (NNRL) algorithm for real-time decision making, e.g., pushing or skipping the incoming text, while considering both history dependencies and future uncertainty. A novel Q-Network which contains a Long Short Term Memory (LSTM) layer and three fully connected neural network layers is designed to maximize the long-term rewards. Experimental results on real data from the TREC 2016 Real-time Summarization track show that our algorithm significantly outperforms state-of-the-art methods.
【Keywords】: deep reinforcement learning; real-time pushing; text stream
【Paper Link】 【Pages】:917-920
【Authors】: Jianlong Wu ; Zhouchen Lin ; Hongbin Zha
【Abstract】: Cross-modal retrieval has received much attention in recent years. A common approach is to project multi-modality data into a shared subspace and then perform retrieval there. However, nearly all existing methods directly adopt the space defined by the binary class label information, without learning, as the shared subspace for regression. In this paper, we first adopt the spectral regression method to learn the optimal latent space shared by data of all modalities under orthogonal constraints. Then we construct a graph model to project the multi-modality data into the latent space. Finally, we combine these two processes to jointly learn the latent space and the regression. We conduct extensive experiments on multiple benchmark datasets, and our proposed method outperforms the state-of-the-art approaches.
【Keywords】: cross-modal matching; latent subspace learning; regression
【Paper Link】 【Pages】:921-924
【Authors】: Jing Zhang ; Victor S. Sheng ; Tao Li
【Abstract】: This paper proposes a novel general label aggregation method for both binary and multi-class labeling in crowdsourcing, namely Bi-Layer Clustering (BLC), which clusters two layers of features - the conceptual-level and the physical-level features - to infer true labels of instances. BLC first clusters the instances using the conceptual-level features extracted from their multiple noisy labels and then performs clustering again using the physical-level features. It can facilitate tracking the uncertainty changes of the instances, so that the integrated labels that are likely to be falsely inferred on the conceptual layer can be easily corrected using the estimated labels on the physical layer. Experimental results on two real-world crowdsourcing data sets show that BLC outperforms seven state-of-the-art methods.
【Keywords】: clustering; crowdsourcing; inference; label aggregation
【Paper Link】 【Pages】:925-928
【Authors】: Yao Cheng ; Xiaoou Chen ; Deshun Yang ; Xiaoshuo Xu
【Abstract】: Chroma is a widespread feature for cover song recognition, as it is robust against non-tonal components and independent of timbre and specific instruments. However, Chroma is derived from the spectrogram and thus provides only a coarse approximation of the musical score. In this paper, we propose a similar but more effective feature, Note Class Profile (NCP), derived with music transcription techniques. NCP is a multi-dimensional time series, each column of which denotes the energy distribution over 12 note classes. Experimental results on benchmark datasets demonstrate its superior performance over existing music features. In addition, the NCP feature can be further enhanced as music transcription techniques develop. The source code can be found on GitHub.
【Keywords】: cover song recognition; dynamic programming
【Paper Link】 【Pages】:929-932
【Authors】: Fei Huang ; Yong Cheng ; Cheng Jin ; Yuejie Zhang ; Tao Zhang
【Abstract】: Fine-grained Sketch-based Image Retrieval (Fine-grained SBIR), which uses hand-drawn sketches to search for target object images, has been an emerging topic over the last few years. The difficulties of this task come not only from the ambiguous and abstract characteristics of sketches, which carry little useful information, but also from the cross-modal gap at both the visual and semantic levels. However, images on the web are usually exhibited with multimodal content. In this paper, we treat Fine-grained SBIR as a cross-modal retrieval problem and propose a deep multimodal embedding model that exploits all the beneficial multimodal information sources in sketches and images. In our experiments on a large quantity of public data, we show that the proposed method outperforms the state-of-the-art methods for Fine-grained SBIR.
【Keywords】: deep multimodal embedding; fine-grained sketch-based image retrieval (fine-grained sbir); multimodal ranking loss
【Paper Link】 【Pages】:933-936
【Authors】: Minh C. Phan ; Aixin Sun ; Yi Tay
【Abstract】: Cross-Device User Linking is the task of detecting the same users given their browsing logs on different devices (e.g., tablet, mobile phone, PC). The problem was introduced in the CIKM Cup 2016 together with a new dataset provided by the Data-Centric Alliance (DCA). In this paper, we present an insightful analysis of the dataset and propose a solution that links users based on their visited URLs, visiting times, and profile embeddings. We cast the problem as pairwise classification and use gradient boosting as the learning-to-rank model. Our model works on a set of features extracted from URLs, titles, time, and session data derived from user device logs. The model outperforms the best solution in the CIKM Cup by a large margin.
【Keywords】: cross-device user linking; entity resolution
【Paper Link】 【Pages】:937-940
【Authors】: Michal Horovitz ; Liane Lewin-Eytan ; Alex Libov ; Yoelle Maarek ; Ariel Raviv
【Abstract】: Recent research on mail search has shown that the longer the query, the better the quality of results; yet a majority of mail queries remain very short, and searchers struggle with formulating queries. A known mechanism to assist users in this task is query auto-completion, which has been highly successful in Web search, where it leverages huge logs of queries issued by hundreds of millions of users. This approach cannot be applied directly to mail search, as personal query logs are small, mailboxes are not shared, and other users' queries do not necessarily generalize. We therefore propose to leverage the mailbox content to generate suggestions, taking advantage of mail-specific features. We then compare this approach to a recent study that augments an individual user's mail search history with query logs from "similar users", where similarity is driven by demographics. Finally, we show how combining both types of approaches not only improves suggestion quality but also increases the chance that the desired message is retrieved. We validate our claims via a manual qualitative evaluation and large-scale quantitative experiments conducted on the query log of Yahoo Mail.
【Keywords】: mail search; query auto completion; query suggestion
【Paper Link】 【Pages】:941-944
【Authors】: Ziming Li ; Maarten de Rijke
【Abstract】: Document ranking is a central problem in many areas, including information retrieval and recommendation. The goal of learning to rank is to automatically create ranking models from training data. The performance of ranking models is strongly affected by the quality and quantity of training data, and collecting large-scale training samples with relevance labels involves human labor that is time-consuming and expensive. Selective sampling and active learning techniques have been developed and proven effective in addressing this problem. However, most active learning methods do not scale well and need to rebuild the model after selected samples are added to the training set. We propose a sampling method that selects a set of instances and labels the full set only once, before training the ranking model. Our method is based on hierarchical agglomerative clustering with average linkage, and we also report the performance of other linkage criteria that measure the distance between two clusters of query-document pairs. Another difference from previous hierarchical clustering approaches is that we cluster only the instances belonging to the same query, which usually outperforms the baselines.
【Keywords】: active learning; hierarchical clustering; learning to rank
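The query-wise selective sampling described above can be illustrated with a small average-linkage agglomerative clustering sketch over one query's instances; the medoid-per-cluster selection and all function names here are illustrative assumptions, not the paper's exact procedure:

```python
def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(dists, A, B):
    """Average pairwise distance between two clusters of indices."""
    return sum(dists[i][j] for i in A for j in B) / (len(A) * len(B))

def select_samples(instances, n_clusters):
    """Agglomerative (average-linkage) clustering of one query's
    query-document instances; returns one medoid index per cluster,
    i.e., the instances to label once before training."""
    n = len(instances)
    dists = [[euclid(a, b) for b in instances] for a in instances]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(dists, clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # medoid: member minimizing summed distance to its cluster mates
    picks = [min(c, key=lambda i: sum(dists[i][j] for j in c)) for c in clusters]
    return sorted(picks)
```

Running this per query, rather than over the pooled instances, mirrors the paper's query-wise clustering idea.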
【Paper Link】 【Pages】:945-948
【Authors】: Zheng Wei ; Jun Xu ; Yanyan Lan ; Jiafeng Guo ; Xueqi Cheng
【Abstract】: One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures such as normalized discounted cumulative gain (NDCG). Existing methods usually focus on optimizing a specific evaluation measure calculated at a fixed position, e.g., NDCG at a fixed position K. In information retrieval, however, evaluation measures, including the widely used NDCG and P@K, are usually designed to evaluate the document ranking at all ranking positions, which provides much richer information than measuring the ranking at a single position. It is therefore interesting to ask whether we can devise an algorithm that leverages the measures calculated at all ranking positions to learn a better ranking model. In this paper, we propose a novel learning to rank model based on a Markov decision process (MDP), referred to as MDPRank. In the learning phase of MDPRank, the construction of a document ranking is treated as sequential decision making, where each step corresponds to an action of selecting a document for the corresponding position. The REINFORCE policy gradient algorithm is adopted to train the model parameters. The evaluation measures calculated at every ranking position serve as the immediate rewards for the corresponding actions, guiding the learning algorithm to adjust the model parameters so that the measure is optimized. Experimental results on LETOR benchmark datasets show that MDPRank can outperform state-of-the-art baselines.
【Keywords】: learning to rank; markov decision process
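The per-position immediate rewards described above can be illustrated with DCG gains, together with a toy softmax policy that builds a ranking sequentially; the linear scoring function and sampling details are illustrative assumptions, not the authors' exact model:

```python
import math
import random

def position_rewards(labels_in_rank_order):
    """Immediate reward for placing a document of relevance l at rank
    t (1-based): its DCG gain (2**l - 1) / log2(t + 1).  Summing the
    per-position rewards recovers the DCG of the whole ranking."""
    return [(2 ** l - 1) / math.log2(t + 2)
            for t, l in enumerate(labels_in_rank_order)]

def sample_episode(w, docs, rng):
    """Build a ranking as a sequence of decisions: at each step, draw
    the next document from a softmax over linear scores w . x, as a
    stand-in for the policy in an MDP formulation of ranking."""
    remaining = list(range(len(docs)))
    ranking = []
    while remaining:
        scores = [sum(wi * xi for wi, xi in zip(w, docs[i])) for i in remaining]
        m = max(scores)
        probs = [math.exp(s - m) for s in scores]
        total = sum(probs)
        r = rng.random()
        acc = 0.0
        for p, i in zip(probs, remaining):
            acc += p / total
            if r <= acc:
                ranking.append(i)
                remaining.remove(i)
                break
        else:  # guard against floating-point shortfall
            ranking.append(remaining.pop())
    return ranking
```

In a REINFORCE-style update, the return from each step (the summed future DCG gains) would weight the log-probability gradient of the chosen action.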
【Paper Link】 【Pages】:949-952
【Authors】: Tomohiro Manabe ; Akiomi Nishida ; Makoto P. Kato ; Takehiro Yamamoto ; Sumio Fujita
【Abstract】: We present one of the world's first attempts to examine the feasibility of multileaving evaluation of document rankings on a large-scale commercial community Question Answering (cQA) service. As a natural extension of interleaving evaluation, multileaving merges more than two input rankings into one and measures the search user satisfaction of each input ranking on the basis of user clicks on the multileaved ranking. We evaluated the adequacy of two major multileaving methods, team draft multileaving (TDM) and optimized multileaving (OM), proposing practical implementations for live services. Our experimental results demonstrate that multileaving methods can precisely evaluate the effectiveness of five rankings of differing quality using clicks from real users. Moreover, we conclude that OM is more efficient than TDM, based on the observation that most OM evaluation results converged after around 40,000 impressions of multileaved rankings, and on an in-depth analysis of the methods' characteristics.
【Keywords】: evaluation of document ranking; interleaving and multileaving; qa search and retrieval
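Team draft multileaving, one of the two methods evaluated above, can be sketched as follows; the click-credit function and all names are illustrative assumptions rather than the authors' production implementation:

```python
import random

def team_draft_multileave(rankings, length, seed=0):
    """Team draft multileaving: in each round the teams (input rankings)
    draft in a random order, and each team contributes its highest-ranked
    document not yet in the merged list.  Returns the merged list and,
    for each slot, the team that contributed it."""
    rng = random.Random(seed)
    merged, teams, used = [], [], set()
    while len(merged) < length:
        order = list(range(len(rankings)))
        rng.shuffle(order)
        for t in order:
            if len(merged) >= length:
                break
            pick = next((d for d in rankings[t] if d not in used), None)
            if pick is not None:
                merged.append(pick)
                teams.append(t)
                used.add(pick)
        if all(d in used for r in rankings for d in r):
            break  # every input document has been placed
    return merged, teams

def credit(teams, clicked_slots, n_teams):
    """Attribute each click to the team whose pick filled the slot."""
    counts = [0] * n_teams
    for s in clicked_slots:
        counts[teams[s]] += 1
    return counts
```

Aggregating the click credits over many impressions estimates which input ranking users prefer.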
【Paper Link】 【Pages】:953-956
【Authors】: Ning Gao ; Douglas W. Oard ; Mark Dredze
【Abstract】: Searching conversational speech poses several new challenges, among which is how the searcher will make sense of what they find. This paper describes our initial experiments with a freely available collection of Enron telephone conversations. Our goal is to help the user make sense of search results by finding information about mentioned people, places and organizations. Because automated entity recognition is not yet sufficiently accurate on conversational telephone speech, we ask the user to transcribe just the name, and to indicate where in the recording it was heard. We then seek to link that mention to other mentions of the same entity in a variety of sources (in our experiments, in email and in Wikipedia). We cast this as an entity linking problem, and achieve promising results by utilizing social network features to help compensate for the limited accuracy of automatic transcription for this challenging content.
【Keywords】: entity linking; knowledge base; speech retrieval
【Paper Link】 【Pages】:957-960
【Authors】: Shuai Zhang ; Lina Yao ; Xiwei Xu
【Abstract】: Collaborative filtering (CF) has been successfully used to provide users with personalized products and services. However, dealing with the increasing sparseness of the user-item matrix remains a challenge. To tackle this issue, hybrid CF, such as combining CF with content-based filtering and leveraging side information about users and items, has been extensively studied to enhance performance. However, most of these approaches depend on hand-crafted feature engineering, which is usually noise-prone and biased by the particular feature extraction and selection schemes. In this paper, we propose a new hybrid model that generalizes the contractive auto-encoder paradigm into a matrix factorization framework with good scalability and computational efficiency; it jointly models content information to obtain effective and compact representations, and leverages implicit user feedback to make accurate recommendations. Extensive experiments conducted over three large-scale real datasets indicate that the proposed approach outperforms the compared methods for item recommendation.
【Keywords】: collaborative filtering; contractive auto-encoders; deep learning
【Paper Link】 【Pages】:961-964
【Authors】: Guy Feigenblat ; Haggai Roitman ; Odellia Boni ; David Konopnicki
【Abstract】: We present a novel unsupervised query-focused multi-document summarization approach. To this end, we generate a summary by extracting a subset of sentences using the Cross-Entropy (CE) method. The proposed approach is generic and requires no domain knowledge. In an evaluation over the DUC 2005-2007 datasets against several state-of-the-art baseline methods, we demonstrate that our approach is both effective and efficient.
【Keywords】: cross entropy method; multidocument summarization
【Paper Link】 【Pages】:965-968
【Authors】: Hyun-Je Song ; A.-Yeong Kim ; Seong-Bae Park
【Abstract】: The number of natural language queries submitted to search engines is increasing as search environments diversify. However, legacy search engines are still optimized for short keyword queries, so the use of natural language queries degrades their retrieval performance. This paper proposes a novel method to translate a natural language query into a relevant keyword query, retrieving better search results without changing the engines. The proposed method formulates the translation as a generation task: it generates a keyword query from a natural language query while preserving its semantics. A recurrent neural network encoder-decoder architecture is adopted as the generator of keyword queries, and an attention mechanism is used to cope with long natural language queries.
【Keywords】: neural machine translation; query reformulation; rnn encoder-decoder; translation natural language query into keyword query
【Paper Link】 【Pages】:969-972
【Authors】: Zhitao Wang ; Chengyao Chen ; Wenjie Li
【Abstract】: In this paper, we propose a predictive network representation learning (PNRL) model to solve the structural link prediction problem. The proposed model defines two learning objectives, i.e., observed structure preservation and hidden link prediction. To integrate the two objectives in a unified model, we develop an effective sampling strategy that selects certain edges in a given network as assumed hidden links and regards the rest of the network structure as observed when training the model. By jointly optimizing the two objectives, the model not only enhances the predictive ability of node representations but also learns additional link prediction knowledge in the representation space. Experiments on four real-world datasets demonstrate the superiority of the proposed model over other popular and state-of-the-art approaches.
【Keywords】: link prediction; network representation learning
【Paper Link】 【Pages】:973-976
【Authors】: Fangzhao Wu ; Jia Zhang ; Zhigang Yuan ; Sixing Wu ; Yongfeng Huang ; Jun Yan
【Abstract】: Sentence-level sentiment classification is important for understanding users' fine-grained opinions. Existing methods for sentence-level sentiment classification are mainly based on supervised learning. However, it is difficult to obtain sentiment labels for sentences, since manual annotation is expensive and time-consuming. In this paper, we propose an approach for sentence-level sentiment classification without requiring sentence labels. More specifically, we propose a unified framework that incorporates two types of weak supervision, i.e., document-level and word-level sentiment labels, to learn the sentence-level sentiment classifier. In addition, the contextual information of sentences and words extracted from unlabeled sentences is incorporated into our approach to enhance the learning of the sentiment classifier. Experiments on benchmark datasets show that our approach can effectively improve the performance of sentence-level sentiment classification.
【Keywords】: sentence-level; sentiment classification; weak supervision
【Paper Link】 【Pages】:977-980
【Authors】: Theodore Vasiloudis ; Hossein Vahabi ; Ross Kravitz ; Valery Rashkov
【Abstract】: Session length is a very important aspect in determining a user's satisfaction with a media streaming service. Being able to predict how long a session will last can be of great use for various downstream tasks, such as recommendations and ad scheduling. Most of the related literature on user interaction duration has focused on dwell time for websites, usually in the context of approximating post-click satisfaction, either in search results or display ads. In this work we present the first analysis of session length in a mobile-focused online service, using a real-world dataset from a major music streaming service. We use survival analysis techniques to show that the characteristics of the length distributions can differ significantly between users, and use gradient boosted trees with appropriate objectives to predict the length of a session using only information available at its beginning. Our evaluation on real-world data illustrates that our proposed technique outperforms the considered baseline.
【Keywords】: dwell time; session length; survival analysis; user behavior
【Paper Link】 【Pages】:981-984
【Authors】: Djoerd Hiemstra ; Claudia Hauff ; Leif Azzopardi
【Abstract】: People tend to type short queries; however, the belief is that longer queries are more effective. Consequently, a number of attempts have been made to encourage and motivate people to enter longer queries. While most have failed, a recent attempt, conducted in a laboratory setup, in which the query box has a halo or glow effect that changes as the query becomes longer, has been shown to increase query length by one term on average. In this paper, we test whether a similar increase is observed when the same component is deployed in a production system for site search and used by real end users. To this end, we conducted two separate experiments, varying the rate at which the color of the halo changes. In both experiments users were assigned to one of two conditions: halo and no-halo. The experiments were run over a fifty-day period, with 3,506 unique users submitting over six thousand queries. In both experiments, however, we observed no significant difference in query length. We also did not find that longer queries resulted in greater retrieval performance. While we did not reproduce the previous findings, our results indicate that the query halo effect is sensitive to performance and task, limiting its applicability to other contexts.
【Keywords】: a/b testing; halo; query length; research replication
【Paper Link】 【Pages】:985-988
【Authors】: Yifan Chen ; Xiang Zhao ; Maarten de Rijke
【Abstract】: In this paper, we leverage high-dimensional side information to enhance top-N recommendations. To reduce the impact of the curse of high dimensionality, we incorporate a dimensionality reduction method, Locality Preserving Projection (LPP), into the recommendation model. A joint learning model is proposed to achieve dimensionality reduction and recommendation simultaneously and iteratively. Specifically, item similarities generated by the recommendation model are used as the weights of the adjacency graph for LPP, while the projections are used to bias the learning of item similarity. Employing LPP for recommendation not only preserves locality but also improves item similarity. Our experimental results illustrate that the proposed method is superior to state-of-the-art methods.
【Keywords】: dimensionality reduction; high dimensionality; side information; top-n recommendation
【Paper Link】 【Pages】:989-992
【Authors】: Masayuki Okamoto ; Zifei Shan ; Ryohei Orihara
【Abstract】: Patent engineers spend significant time analyzing patent claim structures to grasp the range of technology covered or to compare similar patents in the same patent family. Though patent claims are the most important section of a patent, they are hard for a human to examine. In this paper, we propose an information-extraction-based technique for grasping the patent claim structure. We confirmed that our approach is promising through empirical evaluation of entity mention extraction and the relation extraction method. We also built a preliminary interface to visualize patent structures, compare patents, and search for similar patents.
【Keywords】: information extraction; patent analysis; visualization
【Paper Link】 【Pages】:993-996
【Authors】: Qin Chen ; Qinmin Hu ; Jimmy Xiangji Huang ; Liang He ; Weijie An
【Abstract】: Attention based recurrent neural networks (RNN) have shown great success for question answering (QA) in recent years. Although significant improvements have been achieved over non-attentive models, position information is not well studied within the attention-based framework. Motivated by the effectiveness of using word positional context to enhance information retrieval, we assume that if a word from the question (i.e., a question word) occurs in an answer sentence, its neighboring words should be given more attention, since they intuitively contain more valuable information for question answering than words far away. Based on this assumption, we propose a positional attention based RNN model, which incorporates the positional context of the question words into the answers' attentive representations. Experiments on two benchmark datasets show the clear advantages of our proposed model. Specifically, we achieve a maximum improvement of 8.83% over the classical attention based RNN model in terms of mean average precision. Furthermore, our model is comparable to, if not better than, the state-of-the-art approaches for question answering.
【Keywords】: neural networks; positional attention; question answering
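The positional assumption above (neighbors of question words deserve more attention) can be illustrated with a simple distance-based weighting; the Gaussian decay and its sigma parameter are assumptions for illustration, not the paper's exact formulation:

```python
import math

def positional_attention(answer_len, question_positions, sigma=2.0):
    """Positional weight for each answer-token index: a Gaussian of the
    distance to the nearest occurrence of a question word, normalized to
    sum to one, so tokens near question words get more attention."""
    if not question_positions:
        # no question word occurs in the answer: fall back to uniform
        return [1.0 / answer_len] * answer_len
    w = [math.exp(-min(abs(i - q) for q in question_positions) ** 2
                  / (2.0 * sigma ** 2))
         for i in range(answer_len)]
    s = sum(w)
    return [x / s for x in w]
```

In a full model these positional weights would be combined with content-based attention scores before forming the answer representation.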
【Paper Link】 【Pages】:997-1000
【Authors】: Zhiwei Liu ; Yang Yang ; Zi Huang ; Fumin Shen ; Dongxiang Zhang ; Heng Tao Shen
【Abstract】: Social media has become one of the most credible sources for delivering messages, breaking news, and events. Predicting the future dynamics of an event at a very early stage is highly valuable, e.g., helping companies anticipate marketing trends before the event matures. However, this prediction is non-trivial because (a) social events are always mixed with "noise" under the same topic, and (b) the information available at an early stage is too sparse and limited to support accurate prediction. To overcome these two problems, we design an event early embedding model (EEEM) that can 1) extract social events from noise, 2) find previous similar events, and 3) predict the future dynamics of a new event. Extensive experiments conducted on a large-scale Twitter dataset demonstrate the capacity of our model to extract events, and its promising predictive performance when considering both volume and content information.
【Keywords】: content information; early prediction; social events; volume dynamics
【Paper Link】 【Pages】:1001-1004
【Authors】: Yaqian Duan ; Xinze Wang ; Yang Yang ; Zi Huang ; Ning Xie ; Heng Tao Shen
【Abstract】: Predicting the popularity of a Point of Interest (POI) has become increasingly crucial for location-based services, such as POI recommendation. Most existing methods seldom achieve satisfactory performance due to the scarcity of POI information, which tends to confine recommendations to popular scenic spots and to ignore unpopular attractions of potentially great value. In this paper, we propose a novel approach, termed Hierarchical Multi-Clue Fusion (HMCF), for predicting the popularity of POIs. Specifically, we devise an effective hierarchy that comprehensively describes a POI by integrating various types of media information (e.g., image and text) from multiple social sources. For each individual POI, we simultaneously inject semantic knowledge and multi-clue representative power. We collect a multi-source POI dataset from four widely used tourism platforms. Extensive experimental results show that the proposed method significantly improves the prediction of attractions' popularity compared with several baselines.
【Keywords】: hierarchical structure; multi-view learning; multiple sources; poi popularity prediction
【Paper Link】 【Pages】:1005-1008
【Authors】: Georgios Balikas ; Simon Moura ; Massih-Reza Amini
【Abstract】: Traditional sentiment analysis approaches tackle problems like ternary (3-category) and fine-grained (5-category) classification by learning the tasks separately. We argue that such classification tasks are correlated and we propose a multitask approach based on a recurrent neural network that benefits by jointly learning them. Our study demonstrates the potential of multitask models on this type of problems and improves the state-of-the-art results in the fine-grained sentiment classification problem.
【Keywords】: bilstm; deep learning; multitask learning; sentiment analysis; text classification; text mining; twitter analysis
【Paper Link】 【Pages】:1009-1012
【Authors】: Naoto Takeda ; Yohei Seki ; Mimpei Morishita ; Yoichi Inagaki
【Abstract】: We propose a method to clarify the evolution of users' information needs related to a user's interests and actions based upon life events such as "childbirth." First, we extract topic transitions using dynamic topic models from blogs posted by users who have experienced life events. Next, we select the topics by computing the differences in topic probabilities before and after the life event. We evaluated our method based on three life events: "childbirth," "finding employment," and "marriage." Our method selected life event-relevant topics such as "child development," "working life," and "wedding ceremony." We found mothers' information needs such as "how to introduce baby food," employees' information needs such as "preparing an induction programme," and couples' information needs such as "wedding reception planning" in each topic.
【Keywords】: dtms (dynamic topic models); information needs; life event
【Paper Link】 【Pages】:1013-1016
【Authors】: Chaoran Cui ; Huidi Fang ; Xiang Deng ; Xiushan Nie ; Hongshuai Dai ; Yilong Yin
【Abstract】: Aesthetics has become increasingly prominent in image search for enhancing user satisfaction, and image aesthetics assessment has therefore emerged as a promising research topic in recent years. In this paper, distinguished from existing studies relying on a single label, we propose to quantify image aesthetics with a distribution over quality levels. The distribution representation can effectively characterize the disagreement among users' aesthetic perceptions of the same image. Our framework is built on label distribution learning, in which the reliability of training examples and the correlations between quality levels are fully taken into account. Extensive experiments on two benchmark datasets verified the potential of our approach for aesthetics assessment. The role of aesthetics in image search was also rigorously investigated.
【Keywords】: aesthetics assessment; image search; label distribution learning
【Paper Link】 【Pages】:1017-1020
【Authors】: Ruey-Cheng Chen ; Evi Yulianti ; Mark Sanderson ; W. Bruce Croft
【Abstract】: Incorporating conventional, unsupervised features into a neural architecture has the potential to improve modeling effectiveness, but this aspect is often overlooked in research on deep learning models for information retrieval. We investigate this incorporation in the context of answer sentence selection, and show that combining a set of query matching, readability, and query focus features into a simple convolutional neural network can lead to markedly increased effectiveness. Our results on two standard question-answering datasets show the effectiveness of the combined model.
【Keywords】: answer sentence selection; convolutional neural networks; external features
【Paper Link】 【Pages】:1021-1024
【Authors】: Min Yang ; Zhou Zhao ; Wei Zhao ; Xiaojun Chen ; Jia Zhu ; Lianqiang Zhou ; Zigang Cao
【Abstract】: In this paper, we propose a novel personalized response generation model via domain adaptation (PRG-DM). First, we learn the human responding style from large general data (without user-specific information). Second, we fine-tune the model on a small amount of personalized data to generate personalized responses with a dual learning mechanism. Moreover, we propose three new rewards to characterize good conversations that are personalized, informative, and grammatical, and employ the policy gradient method to generate highly rewarded responses. Experimental results show that our model can generate better personalized responses for different users.
【Keywords】: domain adaptation; reinforcement learning; response generation
【Paper Link】 【Pages】:1025-1028
【Authors】: Jiwon Hong ; Sang-Wook Kim ; Mina Rho ; YoonHee Choi ; Yoonsik Tak
【Abstract】: Smart TVs are rapidly replacing conventional TVs. In a number of countries, set-top boxes (STB) are widely used to relay TV channels to smart TVs. In such cases, smart TVs cannot identify which TV channel they are receiving. This situation makes it challenging for smart TVs to provide their users with a variety of personalized services, such as context-aware services and recommendation services. In this paper, we introduce our TV channel matching system that resolves such problems. We propose strategies for scaling-out the matching system and improving its accuracy.
【Keywords】: context awareness; distributed indexing; image retrieval; smart tv; tv channel matching
【Paper Link】 【Pages】:1029-1032
【Authors】: Yu Zhang ; Yan Zhang
【Abstract】: Influence maximization, the foundation of viral marketing, aims to find the top-K seed nodes maximizing influence spread under certain spreading models. In this paper, we study influence maximization from a game perspective. We propose a Coordination Game model, in which every individual makes decisions based on the benefit of coordinating with their network neighbors, to study information propagation. Our model generalizes some existing models, such as the Majority Vote model and the Linear Threshold model. Under the generalized model, we study the hardness of influence maximization and the approximation guarantee of the greedy algorithm. We also combine several strategies to accelerate the algorithm. Experimental results show that after this acceleration, our algorithm significantly outperforms other heuristics, and is three orders of magnitude faster than the original greedy method.
【Keywords】: coordination game model; influence maximization; social networks; viral marketing
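A majority-vote style special case of the diffusion above, together with the greedy seed-selection loop, might look like the following sketch; the uniform adoption threshold theta and the adjacency-list graph representation are illustrative assumptions, not the paper's general coordination-game dynamics:

```python
def spread(graph, seeds, theta=0.5):
    """Deterministic majority-vote style diffusion: a node adopts once
    at least a fraction theta of its neighbors have adopted.  graph maps
    each node to its neighbor list."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in graph.items():
            if v not in active and nbrs:
                if sum(1 for u in nbrs if u in active) / len(nbrs) >= theta:
                    active.add(v)
                    changed = True
    return active

def greedy_seeds(graph, k, theta=0.5):
    """Greedy hill-climbing: repeatedly add the node whose inclusion
    yields the largest final spread."""
    seeds = []
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: len(spread(graph, seeds + [v], theta)))
        seeds.append(best)
    return seeds
```

The acceleration strategies mentioned in the abstract would replace the naive full re-simulation inside the greedy loop.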
【Paper Link】 【Pages】:1033-1036
【Authors】: Razieh Rahimi ; Azadeh Shakery
【Abstract】: Online learning to rank for information retrieval has shown great promise in optimization of Web search results based on user interactions. However, online learning to rank has been used only in the monolingual setting where queries and documents are in the same language. In this work, we present the first empirical study of optimizing a model for Cross-Language Information Retrieval (CLIR) based on implicit feedback inferred from user interactions. We show that ranking models for CLIR with acceptable performance can be learned in an online setting although ranking features are noisy because of the language mismatch.
【Keywords】: cross-language information retrieval; learning to rank; online learning
【Paper Link】 【Pages】:1037-1040
【Authors】: Shenglin Zhao ; Irwin King ; Michael R. Lyu ; Jia Zeng ; Mingxuan Yuan
【Abstract】: The rapid progress of urbanization has modernized many people's lives. It is accompanied by a growing variety of shops (e.g., restaurants and fitness centers) serving an increasing number of citizens, which means business opportunities for investors. Nevertheless, it is difficult for investors to seize such opportunities, because deciding what kind of business to open, and where, is not easy. In this paper, we take on this challenge and define the business opportunity mining problem, which recommends new business categories for a partitioned business district. Specifically, we exploit data from location-based social networks (LBSNs) to mine business opportunities, guiding business owners to open new commercial shops of certain categories in a particular area. First, we define the properties of a business district and propose a greedy algorithm to partition a city into districts. Next, we propose an embedding model to learn latent representations of categories, which captures the functional correlations among business categories. Furthermore, we propose a ranking model based on a pairwise loss to recommend categories for a specific district. Finally, we conduct experiments on Yelp data, and the results show that our proposed method outperforms the baseline methods and resolves the problem well.
【Keywords】: business intelligence; location-based services; ranking method; recommendation system
【Paper Link】 【Pages】:1041-1044
【Authors】: Nicola Ferro ; Claudio Lucchese ; Maria Maistro ; Raffaele Perego
【Abstract】: Ranking query results effectively by considering users' past behaviour and preferences is a primary concern for IR researchers in both academia and industry. In this context, LtR is widely believed to be the most effective solution for designing ranking models that account for user-interaction features, which have proved to remarkably impact IR effectiveness. In this paper, we explore the possibility of integrating the user dynamic directly into LtR algorithms. Specifically, we model with Markov chains the behaviour of users scanning a ranked result list, and we modify Lambdamart, a state-of-the-art LtR algorithm, to exploit a new discount loss function calibrated on the proposed Markovian model of the user dynamic. We evaluate the performance of the proposed approach on publicly available LtR datasets, finding that the improvements measured over the standard algorithm are statistically significant.
【Keywords】: lambdamart; learning to rank; user dynamic
【Paper Link】 【Pages】:1045-1048
【Authors】: Garrick Sherman ; Miles Efron
【Abstract】: Document expansion has been shown to improve the effectiveness of information retrieval systems by augmenting documents' term probability estimates with those of similar documents, producing higher quality document representations. We propose a method to further improve document models by utilizing external collections as part of the document expansion process. Our approach is based on relevance modeling, a popular form of pseudo-relevance feedback; however, where relevance modeling is concerned with query expansion, we are concerned with document expansion. Our experiments demonstrate that the proposed model improves ad-hoc document retrieval effectiveness on a variety of corpus types, with a particular benefit on more heterogeneous collections of documents.
【Keywords】: document expansion; information retrieval; language models; relevance models
【Paper Link】 【Pages】:1049-1052
【Authors】: Bing Bai ; Pierre-François Laquerre ; Richard G. Jackson ; Robert Stewart
【Abstract】: In medical practice, knowing the medical history of a patient is crucial for diagnosis and treatment suggestion. However, such information is often recorded in unstructured notes from doctors, potentially mixed with the medical history of family members and mentions of disorders for other reasons (e.g. as potential side-effects). In this work we designed a scheme to automatically extract the medical history of patients from a large healthcare database. More specifically, we first extracted medical conditions mentioned using a rule-based system and a medical gazetteer, then we classified whether such mentions reflected the patient's history or not. We designed our method to be simple and with little human intervention. Our results are very encouraging, supporting the potential for efficient and effective deployment in clinical practice.
【Keywords】: electronic health record; medical history; skip-gram; string kernels
【Paper Link】 【Pages】:1053-1056
【Authors】: Ismail Badache ; Mohand Boughanem
【Abstract】: A large amount of social feedback, expressed by social signals (e.g., like, +1, rating), is assigned to web resources. These signals are often exploited as additional sources of evidence in search engines. Our objective in this paper is to study the impact on retrieval of the new social signals called Facebook reactions (love, haha, angry, wow, sad). These reactions allow users to express more nuanced emotions compared to classic signals (e.g., like, share). First, we analyze these reactions and show how users employ them to interact with posts. Second, we evaluate the impact of each reaction on retrieval by comparing them to both a textual model without social features and the first classical signal (a like-based model). These social features are modeled as document priors and integrated into a language model. We conducted a series of experiments on an IMDb dataset. Our findings reveal that incorporating social features is a promising approach for improving retrieval ranking performance.
【Keywords】: facebook reactions; ranking; social ir; social signals
【Paper Link】 【Pages】:1057-1060
【Authors】: Arash Dargahi Nobari ; Sajad Sotudeh Gharebagh ; Mahmood Neshati
【Abstract】: Finding talented users on Stackoverflow can be challenging due to the term mismatch between queries and the content published there. In this paper, we propose two translation models to augment a given query with relevant words. The first model is based on a statistical approach, and the second is a word embedding model. Interestingly, the translations provided by these methods are not the same: while the first model mostly selects pieces of program code as translations, the second provides more semantically related words. Our experiments on a large dataset demonstrate the effectiveness of the proposed models.
【Keywords】: expertise retrieval; semantic matching; stackoverflow; statistical machine translation; talent acquisition
【Paper Link】 【Pages】:1061-1064
【Authors】: Liana Ermakova ; Josiane Mothe ; Anton Firsov
【Abstract】: Sentence ordering (SO) is a key component of verbal ability. It is also crucial for automatic text generation. While numerous researchers have developed methods to automatically evaluate the informativeness of generated content, the evaluation of readability is usually performed manually. In contrast, we present a self-sufficient metric for SO assessment based on the topic-comment structure of text. We show that this metric has high accuracy.
【Keywords】: comment; evaluation; information retrieval; information structure; sentence ordering; text coherence; topic; topic-comment structure
【Paper Link】 【Pages】:1065-1068
【Authors】: Ludovic Dos Santos ; Benjamin Piwowarski ; Patrick Gallinari
【Abstract】: Most collaborative filtering systems, such as matrix factorization, use vector representations for items and users. Those representations are deterministic, and do not allow modeling the uncertainty of the learned representation, which can be useful when a user has a small number of rated items (cold start), or when there is conflicting information about the behavior of a user or the ratings of an item. In this paper, we leverage recent works in learning Gaussian embeddings for the recommendation task. We show that this model performs well on three representative collections (Yahoo, Yelp and MovieLens) and analyze learned representations.
【Keywords】: gaussian embeddings; probabilistic representation; recommender system
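The point of Gaussian embeddings, as the abstract above describes, is that a predicted score carries uncertainty as well as a mean. A minimal sketch of that idea for independent Gaussian user/item embeddings with diagonal covariances (the function name, shapes, and the dot-product scoring rule are illustrative assumptions, not the paper's exact model):

```python
import numpy as np

def rating_distribution(mu_u, var_u, mu_i, var_i):
    """Mean and variance of the inner product of two independent Gaussian
    embeddings. mu_* are mean vectors; var_* are per-dimension variances
    (the diagonals of the covariance matrices).

    For x ~ N(mu_x, diag(var_x)) and y ~ N(mu_y, diag(var_y)):
      E[x.y]   = mu_x . mu_y
      Var[x.y] = sum(mu_x^2 * var_y) + sum(mu_y^2 * var_x) + sum(var_x * var_y)
    """
    mean = float(mu_u @ mu_i)
    var = float(mu_u ** 2 @ var_i + mu_i ** 2 @ var_u + var_u @ var_i)
    return mean, var
```

A user with few ratings (cold start) would get large `var_u`, so the predicted score's variance is high even when its mean looks confident, which is exactly the signal deterministic factorization models cannot express.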
【Paper Link】 【Pages】:1069-1072
【Authors】: Kaspar Beelen ; Evangelos Kanoulas ; Bob van de Velde
【Abstract】: This paper sets out to detect controversial news reports using online discussions as a source of information. We define controversy as a public discussion that divides society and demonstrate that a content and stylometric analysis of these debates yields useful signals for extracting disputed news items. Moreover, we argue that a debate-based approach could produce more generic models, since the discussion architectures we exploit to measure controversy occur on many different platforms.
【Keywords】: behavioral analysis; controversy detection; media analysis
【Paper Link】 【Pages】:1073-1076
【Authors】: Apurva Pathak ; Kshitiz Gupta ; Julian McAuley
【Abstract】: Many websites offer promotions in terms of bundled items that can be purchased together, usually at a discounted rate. 'Bundling' may be a means of increasing sales revenue, but may also be a means for content creators to expose users to new items that they may not have considered in isolation. In this paper, we seek to understand the semantics of what constitutes a 'good' bundle, in order to recommend existing bundles to users on the basis of their constituent products, as well as the more difficult task of generating new bundles that are personalized to a user. To do so we collect a new dataset from the Steam video game distribution platform, which is unique in that it contains both 'traditional' recommendation data (rating and purchase histories between users and items), as well as bundle purchase information. We assess issues such as bundle size and item compatibility, and show that these features, when combined with traditional matrix factorization techniques, can lead to highly effective bundle recommendation and generation.
【Keywords】: bpr; collaborative filtering; matrix factorization; steam bundle
【Paper Link】 【Pages】:1077-1080
【Authors】: Claudio Lucchese ; Franco Maria Nardini ; Salvatore Orlando ; Raffaele Perego ; Salvatore Trani
【Abstract】: In this paper we propose X-DART, a new Learning to Rank algorithm focusing on the training of robust and compact ranking models. Motivated by the observation that the last trees of MART models impact the prediction of only a few instances of the training set, we borrow from the DART algorithm the dropout strategy, which consists in temporarily dropping some of the trees from the ensemble while new weak learners are trained. Differently from DART, however, we drop these trees permanently, on the basis of smart choices driven by accuracy measured on the validation set. Experiments conducted on publicly available datasets show that X-DART outperforms DART, training models that provide the same effectiveness while employing up to 40% fewer trees.
【Keywords】: dropout; multiple additive regression trees; pruning
【Paper Link】 【Pages】:1081-1084
【Authors】: Melissa Ailem ; Aghiles Salah ; Mohamed Nadif
【Abstract】: Document clustering is central to modern information retrieval applications. Among existing models, non-negative matrix factorization (NMF) approaches have proven effective for this task. However, NMF approaches, like other models in this context, exhibit a major drawback: they use the bag-of-words representation and thus do not account for the sequential order in which words occur in documents. This is an important issue, since it may result in a significant loss of semantics. In this paper, we aim to address this issue and propose a new model which successfully integrates a word embedding model, word2vec, into an NMF framework so as to leverage the semantic relationships between words. Empirical results on several real-world datasets demonstrate the benefits of our model in terms of text document clustering as well as document/word embedding.
【Keywords】: non-negative matrix factorization; text document clustering; word embedding
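As a point of reference for the NMF component in the abstract above, here is a minimal sketch of plain multiplicative-update NMF for document clustering (the word2vec integration the paper proposes is not reproduced here; matrix shapes and hyperparameters are illustrative):

```python
import numpy as np

def nmf(X, k, iters=200, seed=0):
    """Factor a non-negative matrix X (docs x terms) into W (docs x k) and
    H (k x terms) with Lee-Seung multiplicative updates, which monotonically
    decrease the Frobenius reconstruction error ||X - WH||."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# A document's cluster is the index of its largest factor weight:
# clusters = nmf(X, k=2)[0].argmax(axis=1)
```

The drawback the paper targets is visible here: `X` is a bag-of-words matrix, so any information about word order never enters the factorization.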
【Paper Link】 【Pages】:1085-1088
【Authors】: Ali Montazeralghaem ; Hamed Zamani ; Azadeh Shakery
【Abstract】: Pseudo-relevance feedback (PRF) refers to a query expansion strategy based on top-retrieved documents, which has been shown to be highly effective in many retrieval models. Previous work has introduced a set of constraints (axioms) that should be satisfied by any PRF model. In this paper, we propose three additional constraints based on the proximity of feedback terms to the query terms in the feedback documents. As a case study, we consider the log-logistic model, a state-of-the-art PRF model that has been proven to be a successful method in satisfying the existing PRF constraints, and show that it does not satisfy the proposed constraints. We further modify the log-logistic model based on the proposed proximity-based constraints. Experiments on four TREC collections demonstrate the effectiveness of the proposed constraints. Our modification of the log-logistic model leads to significant and substantial (up to 15%) improvements. Furthermore, we show that the proposed proximity-based function outperforms the well-known Gaussian kernel, which does not satisfy all the proposed constraints.
【Keywords】: axiomatic analysis; pseudo-relevance feedback; query expansion; term position; term proximity
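For context, the Gaussian-kernel baseline mentioned at the end of the abstract can be sketched as follows: candidate expansion terms in a feedback document are scored by their positional proximity to query-term occurrences. This is the baseline the paper's proposed function outperforms, not the paper's own method, and the kernel width and tokenization are illustrative.

```python
import math
from collections import defaultdict

def proximity_weights(doc_terms, query_terms, sigma=10.0):
    """Score each candidate expansion term in a tokenized feedback document
    by summing a Gaussian kernel over its distances to query-term positions."""
    q_positions = [i for i, t in enumerate(doc_terms) if t in query_terms]
    scores = defaultdict(float)
    for i, t in enumerate(doc_terms):
        if t in query_terms:
            continue  # query terms themselves are not expansion candidates
        for j in q_positions:
            scores[t] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    return dict(scores)
```

Terms occurring right next to query terms accumulate weight near 1 per occurrence, while distant terms contribute almost nothing, which is the intuition the paper's proximity constraints formalize axiomatically.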
【Paper Link】 【Pages】:1089-1092
【Authors】: Tadele Tedla Damessie ; Thao P. Nghiem ; Falk Scholer ; J. Shane Culpepper
【Abstract】: In recent years, gathering relevance judgments from non-topic originators has become an increasingly important problem in Information Retrieval. Relevance judgments can be used to measure the effectiveness of a system, and are often needed to build supervised learning models in learning-to-rank retrieval systems. The two most popular approaches to gathering bronze-level judgments - where the judge is neither the originator of the information need for which relevance is being assessed nor a topic expert - are a controlled user study and crowdsourcing. However, judging comes at a cost (in time, and usually money), and the quality of the judgments can vary widely. In this work, we directly compare the reliability of judgments across three different types of bronze assessor groups. Our first group is a controlled Lab group; the second and third are two different crowdsourcing groups: CF-Document, where assessors were free to judge any number of documents for a topic, and CF-Topic, where judges were required to judge all of the documents for a single topic, in a manner similar to the Lab group. Our study shows that Lab assessors exhibit a higher level of agreement with a set of ground-truth judgments than CF-Topic and CF-Document assessors. Inter-rater agreement rates show analogous trends. These findings suggest that, in the absence of ground-truth data, agreement between assessors can be used to reliably gauge the quality of relevance judgments gathered from secondary assessors, and that controlled user studies are more likely to produce reliable judgments despite being more costly.
【Keywords】: inter-rater agreement; relevance assessment; test collections
【Paper Link】 【Pages】:1093-1096
【Authors】: Travis Ebesu ; Yi Fang
【Abstract】: The accelerating rate of scientific publication makes it difficult to find relevant citations or related work. Context-aware citation recommendation aims to solve this problem by providing a curated list of high-quality candidates given a short passage of text. Existing literature adopts bag-of-words representations, leading to the loss of valuable semantics, and lacks the ability to integrate metadata or generalize to manuscripts unseen in the training set. We propose a flexible encoder-decoder architecture called the Neural Citation Network (NCN), embodying a robust representation of the citation context with a max time delay neural network, further augmented with an attention mechanism and author networks. The recurrent neural network decoder consults this representation when determining the optimal paper to recommend based solely on its title. Quantitative results on the large-scale CiteSeer dataset reveal that NCN achieves a significant improvement over competitive baselines. Qualitative evidence highlights the effectiveness of the proposed end-to-end neural network, revealing a promising research direction for citation recommendation.
【Keywords】: citation recommendation; deep learning; neural machine translation
【Paper Link】 【Pages】:1097-1100
【Authors】: Graham McDonald ; Nicolás García-Pedrajas ; Craig Macdonald ; Iadh Ounis
【Abstract】: Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as confidential information, that is exempt from release. Therefore, government documents must be sensitivity reviewed prior to release, to identify and close any sensitive information. With the adoption of born-digital documents, such as email, there is a need for automatic sensitivity classification to assist digital sensitivity review. SVM classifiers and Part-of-Speech sequences have separately been shown to be promising for sensitivity classification. However, sequence classification methodologies, and specifically SVM kernel functions, have not been fully investigated for sensitivity classification. Therefore, in this work, we present an evaluation of five SVM kernel functions for sensitivity classification using POS sequences. Moreover, we show that an ensemble classifier that combines POS sequence classification with text classification can significantly improve sensitivity classification effectiveness (+6.09% F2) compared with a text classification baseline, according to McNemar's test of significance.
【Keywords】: ensemble learning; freedom of information; sensitive information; sequence classification; text classification
【Paper Link】 【Pages】:1101-1104
【Authors】: Xiangling Zhang ; Yueguo Chen ; Jun Chen ; Xiaoyong Du ; Ke Wang ; Ji-Rong Wen
【Abstract】: The entity set expansion problem is to expand a small set of seed entities to a more complete set of similar entities. It can be applied in applications such as web search, item recommendation and query expansion. Traditionally, people solve this problem by exploiting the co-occurrence of entities within web pages, where latent semantic correlation among seed entities cannot be revealed. We propose a novel approach to solve the problem using knowledge graphs, by considering the deficiency (e.g., incompleteness) of knowledge graphs. We design an effective ranking model based on the semantic features of seeds to retrieve the candidate entities. Extensive experiments on public datasets show that the proposed solution significantly outperforms the state-of-the-art techniques.
【Keywords】: entity search; entity set expansion; knowledge graph
【Paper Link】 【Pages】:1105-1108
【Authors】: Navid Rekabsaz ; Mihai Lupu ; Allan Hanbury ; Hamed Zamani
【Abstract】: Exploitation of the term relatedness provided by word embeddings has gained considerable attention in recent IR literature. However, an emerging question is whether this sort of relatedness fits the needs of IR with respect to retrieval effectiveness. While we observe a high potential of word embeddings as a resource for related terms, several cases of topic shifting deteriorate the final performance of the applied retrieval models. To address this issue, we revisit the use of global context (i.e., term co-occurrence in documents) to measure term relatedness. We hypothesize that, to avoid topic shifting, terms with high word embedding similarity should also often share similar global contexts. We therefore study the effectiveness of post-filtering related terms using various global-context relatedness measures. Experimental results show significant improvements in two out of three test collections, and support our initial hypothesis regarding the importance of considering global context in retrieval.
【Keywords】: global context; lsi; term relatedness; word embedding; word2vec
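The post-filtering idea in the abstract above can be illustrated with a minimal sketch: embedding-derived candidate terms are kept only if their global (document co-occurrence) profiles are also similar to the target term's profile. The matrix layout, vocabulary mapping, and threshold here are all hypothetical.

```python
import numpy as np

def global_context_filter(target, candidates, doc_term, vocab, threshold=0.3):
    """Keep candidate related terms whose document co-occurrence profile
    (a column of the docs x terms matrix) is cosine-similar to the target
    term's profile; drop the rest as likely topic shifts."""
    def profile(term):
        v = doc_term[:, vocab[term]].astype(float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    tgt = profile(target)
    # Profiles are unit-normalized, so the dot product is cosine similarity.
    return [c for c in candidates if float(profile(c) @ tgt) >= threshold]
```

A term that is embedding-similar but occurs in entirely different documents (the topic-shifting case) gets a near-zero cosine here and is filtered out.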
【Paper Link】 【Pages】:1109-1112
【Authors】: Bo-Wen Zhang ; Xu-Cheng Yin ; Fang Zhou ; Jian-Lin Jin
【Abstract】: Every summer holiday, several reading lists are recommended and emerge in the mass media, e.g., the New York Times and the BBC. However, these reading lists are built for the general public, with general topics, for certain purposes. What if we want books on a specific topic at a specific moment? How can we automatically generate the requested reading list for ourselves? In this paper, we propose a search framework for building a topical reading list at any time, where Relevance (between topics and books), Quality (of books), Timeliness (of popularity) and Diversity (of results) are each embedded into vector representations based on user-generated content and statistics from social media. We collected 8,197 real-world topics from 198 diverse groups on Librarything.com. The proposed methods are evaluated on this topic collection and on the public Social Book Search 2012-2016 (SBS) benchmarks. Experimental results demonstrate the robustness and effectiveness of our framework.
【Keywords】: diversity embedding; quality embedding; reading list; relevance embedding; timeliness embedding
【Paper Link】 【Pages】:1113-1116
【Authors】: Siliang Tang ; Jinjian Zhang ; Ning Zhang ; Fei Wu ; Jun Xiao ; Yueting Zhuang
【Abstract】: Distant supervision is a widely used approach for training relation extraction models. It generates noisy training samples by heuristically labeling a corpus using an existing knowledge base. Previous noise reduction methods for distant supervision fail to utilize information such as data credibility and sample confidence. In this paper, we propose a novel neural framework, named ENCORE (External Neural COnstraints REgularized distant supervision), which allows the integration of additional information into standard DS through regularization under multiple external neural networks. In ENCORE, a teacher-student co-training mechanism iteratively distills information from external neural networks into an existing relation extraction model. Experimental results demonstrate that, without adding any data or reshaping its original structure, ENCORE enhances a CNN-based relation extraction model by over 12%. The enhanced model also outperforms the state-of-the-art relation extraction method on the same dataset.
【Keywords】: distant supervision; neural network; relation extraction
【Paper Link】 【Pages】:1117-1120
【Authors】: Masaya Murata ; Kaoru Hiramatsu ; Shin'ichi Satoh
【Abstract】: We adopt the generalized Pareto distribution for the information-based model and show that the parameters can be estimated based on the mean excess function. The proposed information retrieval model corresponds to the extension of the divergence from independence and is designed to be data-driven. The proposed model is then applied to the specific object search called the instance search and the effectiveness is experimentally confirmed.
【Keywords】: divergence from independence; extreme value statistics; generalized pareto distribution; information retrieval; information-based model; instance search; video retrieval
【Paper Link】 【Pages】:1121-1124
【Authors】: Matthew Mitsui ; Jiqun Liu ; Nicholas J. Belkin ; Chirag Shah
【Abstract】: It has been shown that people attempt to accomplish a variety of intentions during the course of an information seeking session, and there is reason to believe that these different information seeking intentions can benefit from system support tailored to each such intention. We address the problem of predicting the presence of such intentions during an information seeking session, through analysis of observable user search behaviors. We present results of a study of 40 participants, each working on two different journalism tasks, which investigated how their search behaviors could indicate their intentions. Using 725 query-segments captured from this study, we demonstrate that information seeking intentions can be predicted with a simple classification model using a linear combination of search behavior features that can be logged with a browser plug-in.
【Keywords】: information seeking episode; information seeking intentions; motivating task; search intentions; search session analysis
【Paper Link】 【Pages】:1125-1128
【Authors】: Ben Carterette
【Abstract】: We analyze 5,792 IR conference papers published over 20 years to investigate how researchers have used and are using statistical significance testing in their experiments.
【Keywords】: evaluation; information retrieval; statistical significance
【Paper Link】 【Pages】:1129-1132
【Authors】: Ankan Saha ; Dhruv Arya
【Abstract】: Job Search is a core product at LinkedIn, which makes it essential to generate highly relevant results when a user searches for jobs on LinkedIn. Historically, job results were ranked using linear models built from a combination of user, job and query features. This paper describes new generalized mixed-effect models introduced in the context of ranking candidate job results for a job search performed on LinkedIn. We build a per-query model populated with coefficients corresponding to job features, in addition to the existing global model features. We describe the details of the new method, along with the challenges faced in launching such a model into production and making it efficient at very large scale. Our experiments show improvement over previous baseline ranking models in terms of offline metrics (both AUC and precision@k) as well as online metrics in production (job applies) which are of interest to us. The resulting method is more powerful and has also been successfully adopted in other applications at LinkedIn.
【Keywords】: generalized linear mixed effect models; information retrieval and search ranking
【Paper Link】 【Pages】:1133-1136
【Authors】: Arman Cohan ; Nazli Goharian
【Abstract】: Citation texts are sometimes not very informative, or in some cases inaccurate, by themselves; they need the appropriate context from the referenced paper to reflect its exact contributions. To address this problem, we propose an unsupervised model that uses distributed representations of words as well as domain knowledge to extract the appropriate context from the reference paper. Evaluation results show the effectiveness of our model, which significantly outperforms the state-of-the-art. We furthermore demonstrate how an effective contextualization method improves citation-based summarization of scientific articles.
【Keywords】: information retrieval; scientific text; text summarization
【Paper Link】 【Pages】:1137-1140
【Authors】: Shanshan Li ; James Caverlee ; Wei Niu ; Parisa Kaghazgaran
【Abstract】: With the rapid adoption of smartphones worldwide and the reliance on app marketplaces to discover new apps, these marketplaces are critical for connecting users with apps. And yet, the user reviews and ratings on these marketplaces may be strategically targeted by app developers. We investigate the use of crowdsourcing platforms to manipulate app reviews. We find that (i) apps targeted by crowdsourcing platforms are rated significantly higher on average than other apps; (ii) the reviews themselves arrive in bursts; (iii) app reviewers tend to repeat themselves by relying on some standard repeated text; and (iv) apps by the same developer tend to share a more similar language model: if one app has been targeted, it is likely that many of the other apps from the same developer have also been targeted.
【Keywords】: app reviews; crowdsourcing; manipulation; user behavior
【Paper Link】 【Pages】:1141-1144
【Authors】: Mohamed Abdel Maksoud ; Gaurav Pandey ; Shuaiqiang Wang
【Abstract】: We introduce CitySearcher, a vertical search engine that searches for cities when queried with an interest. In search engines generally, exploiting the semantics between words is favorable for performance improvement: even though ambiguous query words have multiple semantic meanings, search engines can return diversified results to satisfy different users' information needs. But for CitySearcher, mismatched semantic relationships can lead to extremely unsatisfactory results. For example, the city Sale would incorrectly rank high for the interest shopping because of semantic interpretations of the words. Thus, the main challenge for our system is to eliminate the mismatched semantic relationships that arise as a side effect of the semantic models; in the example above, we aim to ignore the semantics of a city's name, which is not indicative of the city's characteristics. In CitySearcher, we use word2vec, a popular word embedding technique, to estimate the semantics of words and create the initial ranking of cities. To reduce the effect of mismatched semantic relationships, we generate a set of learning features based on a novel clustering-based method. With the generated features, we then utilize learning-to-rank algorithms to rerank the cities. We use the English version of the Wikivoyage dataset to evaluate our system, sampling a very small dataset for training. Experimental results demonstrate the performance gain of our system over various standard retrieval techniques.
【Keywords】: feature engineering; information retrieval; word2vec
【Paper Link】 【Pages】:1145-1148
【Authors】: Giovanni Da San Martino ; Salvatore Romeo ; Alberto Barrón-Cedeño ; Shafiq R. Joty ; Lluís Màrquez ; Alessandro Moschitti ; Preslav Nakov
【Abstract】: We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.
【Keywords】: community question answering; cross-language approaches; distributed representations; kernel-based methods; neural networks; question retrieval
【Paper Link】 【Pages】:1149-1152
【Authors】: Amina Kadry ; Laura Dietz
【Abstract】: Our goal is to complement an entity ranking with human-readable explanations of how those retrieved entities are connected to the information need. Relation extraction technology should aid in finding such support passages, especially in combination with entities and query terms. This work explores how the current state of the art in unsupervised relation extraction (OpenIE) contributes to a solution for the task, assessing potential, limitations, and avenues for further investigation.
【Keywords】: entity-centric retrieval; information retrieval; open ie; relation extraction
【Paper Link】 【Pages】:1153-1156
【Authors】: Darío Garigliotti ; Krisztian Balog
【Abstract】: We address the problem of generating query suggestions to support users in completing their underlying tasks (which motivated them to search in the first place). Given an initial query, these query suggestions should provide a coverage of possible subtasks the user might be looking for. We propose a probabilistic modeling framework that obtains keyphrases from multiple sources and generates query suggestions from these keyphrases. Using the test suites of the TREC Tasks track, we evaluate and analyze each component of our model.
【Keywords】: query suggestions; supporting search tasks; task-based search
【Paper Link】 【Pages】:1157-1160
【Authors】: Keyang Xu ; Zhengzhong Liu ; Jamie Callan
【Abstract】: Many URLs on the Internet point to identical content, which increases the burden on web crawlers. Techniques that detect such URLs (known as URL de-duping) can save crawlers substantial resources such as bandwidth and storage. Traditional de-duping methods are usually limited to heavily engineered rule-matching strategies. In this work, we propose a novel URL de-duping framework based on sequence-to-sequence (Seq2Seq) neural networks. A single concise translation model can take the place of thousands of explicit rules. Experiments indicate that a vanilla Seq2Seq architecture yields robust and accurate results in detecting duplicate URLs. Furthermore, we demonstrate the efficiency of this framework in a real large-scale web environment.
【Keywords】: sequence-to-sequence neural network; url de-duplication; web crawling
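The heavily engineered rule-matching baselines that the abstract contrasts with can be illustrated with a minimal sketch. The rules and parameter names below are hypothetical examples, not taken from the paper, whose contribution is to replace such handcrafted rules with a learned Seq2Seq model:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Illustrative tracking parameters that typically do not change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}

def canonicalize(url: str) -> str:
    """Apply simple handcrafted rules: lowercase scheme and host, drop the
    fragment, remove tracking parameters, and sort the remaining query string."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k.lower() not in IGNORED_PARAMS]
    params.sort()
    return urlunsplit((scheme.lower(), netloc.lower(), path or "/",
                       urlencode(params), ""))

def is_duplicate(url_a: str, url_b: str) -> bool:
    """Two URLs are treated as duplicates if they share a canonical form."""
    return canonicalize(url_a) == canonicalize(url_b)
```

Each site tends to need its own rule set, which is exactly the maintenance cost a single translation model avoids.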
【Paper Link】 【Pages】:1161-1164
【Authors】: Yukai Miao ; Jianbin Qin ; Wei Wang
【Abstract】: In modern search engines, Knowledge Graphs have become a key component for knowledge discovery. When a user searches for an entity, existing systems usually provide a list of related entities, but they do not necessarily explain how the entities are related. With the help of knowledge graphs, however, we can generate a relatedness graph between any pair of existing entities. Existing methods for this problem are either graph-based or list-based, but all have limitations when dealing with the large, complex relatedness graph of two related entities. In this work, we investigate how to summarize relatedness graphs and how to use the summarized graphs to assist users in retrieving target information. We also implemented our approach in an online query system and performed experiments and evaluations on it. The results show that our method produces much better results than previous work.
【Keywords】: graph summarization; graph visualization; knowledge graph
【Paper Link】 【Pages】:1165-1168
【Authors】: Kenny Davila ; Richard Zanibbi
【Abstract】: Math-aware search engines need to support formulae in queries. Mathematical expressions are typically represented as trees defining their operational semantics or visual layout. We propose searching both formula representations using a three-layer model. The first layer selects candidates using spectral matching over tree node pairs. The second layer aligns a query with candidates and computes similarity scores based on structural matching. In the third layer, similarity scores are combined using linear regression. The two representations are combined using retrieval in parallel indices and regression over similarity scores. For NTCIR-12 Wikipedia Formula Browsing task relevance rankings, we see each layer increasing ranking quality and improved results when combining representations as measured by Bpref and nDCG scores.
【Keywords】: formula retrieval; operator tree; symbol layout tree
【Paper Link】 【Pages】:1169-1172
【Authors】: Jiaxin Mao ; Yiqun Liu ; Huan-Bo Luan ; Min Zhang ; Shaoping Ma ; Hengliang Luo ; Yuntao Zhang
【Abstract】: Usefulness judgment measures the user-perceived amount of useful information for the search task in the current search context. Understanding and predicting usefulness judgments are crucial for developing user-centric evaluation methods and providing contextualized results according to the search context. With a dataset collected in a laboratory user study, we systematically investigate the effects of a variety of content, context, and behavior factors on usefulness judgments and find that while user behavior factors are most important in determining usefulness judgments, content and context factors also have significant effects on them. We further adopt these factors as features to build prediction models for usefulness judgments. An AUC score of 0.909 in binary usefulness classification and a Pearson's correlation coefficient of 0.694 in usefulness regression demonstrate the effectiveness of our models. Our study sheds light on the understanding of the dynamics of the user-perceived usefulness of documents in a search session and provides implications for the evaluation and design of Web search engines.
【Keywords】: evaluation; usefulness; user behavior analysis
【Paper Link】 【Pages】:1173-1176
【Authors】: Jyun-Yu Jiang ; Pu-Jen Cheng ; Wei Wang
【Abstract】: Social coding and open source repositories have become more and more popular. Software developers have various alternatives for contributing to communities and collaborating with others. However, neither academia nor industry currently offers an effective recommender that suggests appropriate repositories to developers. Although existing one-class collaborative filtering (OCCF) approaches can be applied to this problem, they do not consider particular characteristics of social coding, such as the programming languages, which, to some extent, associate repositories with developers. The aim of this paper is to investigate the feasibility of leveraging user programming language preference to improve the performance of OCCF-based repository recommendation. Based on matrix factorization, we propose language-regularized matrix factorization (LRMF), which is regularized by the relationships between users' programming language preferences. Extensive experiments have been conducted on a real-world GitHub dataset. The results demonstrate that our framework significantly outperforms five competitive baselines.
【Keywords】: manifold regularization; open source repository; repository recommendation; user programming language preference
【Paper Link】 【Pages】:1177-1180
【Authors】: Mohammad Aliannejadi ; Fabio Crestani
【Abstract】: Personalized context-aware venue suggestion plays a critical role in satisfying the users' needs on location-based social networks (LBSNs). In this paper, we present a set of novel scores to measure the similarity between a user and a candidate venue in a new city. The scores are based on user's history of preferences in other cities as well as user's context. We address the data sparsity problem in venue recommendation with the aid of a proposed approach to predict contextually appropriate places. Furthermore, we show how to incorporate different scores to improve the performance of recommendation. The experimental results of our participation in the TREC 2016 Contextual Suggestion track show that our approach beats state-of-the-art strategies.
【Keywords】: context-awareness; contextual suggestion; location-based social networks; venue recommendation
【Paper Link】 【Pages】:1181-1184
【Authors】: Mossaab Bagdouri ; Douglas W. Oard
【Abstract】: This paper investigates techniques for answering microblog questions by searching a large community question answering website. Several question transformations are considered, some properties of the answering platform are examined, and how to select among the various available configurations in a learning-to-rank framework is studied.
【Keywords】: cqa; cross-platform question answering; microblogs
【Paper Link】 【Pages】:1185-1188
【Authors】: Todor Mihaylov ; Daniel Balchev ; Yasen Kiprov ; Ivan Koychev ; Preslav Nakov
【Abstract】: We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments, so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
【Keywords】: community question answering; goodness polarity lexicons; sentiment analysis
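The core PMI-based lexicon idea from the abstract can be sketched as follows. The smoothing constant and the presence-based counting are illustrative assumptions, not details taken from the paper:

```python
import math
from collections import Counter

def goodness_pmi(comments):
    """comments: list of (tokens, label) pairs with label in {"good", "bad"}.
    Returns {word: PMI(word, good) - PMI(word, bad)}, a goodness polarity
    score: positive values suggest good-answer vocabulary, negative bad."""
    word_label = Counter()
    word_count = Counter()
    label_count = Counter()
    total = 0
    for tokens, label in comments:
        for w in set(tokens):              # count presence per comment, not frequency
            word_label[(w, label)] += 1
            word_count[w] += 1
        label_count[label] += 1
        total += 1
    scores = {}
    for w in word_count:
        pmi = {}
        for label in ("good", "bad"):
            joint = word_label[(w, label)] + 0.5   # add-0.5 smoothing (assumption)
            pmi[label] = math.log(joint * total /
                                  (word_count[w] * label_count[label]))
        scores[w] = pmi["good"] - pmi["bad"]
    return scores
```

In the semi-supervised setting of the paper, the labels would come from a small seed set and the lexicon would then be grown over unlabeled comments.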
【Paper Link】 【Pages】:1189-1192
【Authors】: Dae Hoon Park ; Rikio Chiba
【Abstract】: Query auto-completion (QAC) systems suggest queries that complete a user's text as the user types each character. Such queries are typically selected among previously stored queries, based on specific attributes such as popularity. However, queries cannot be suggested if a user's text does not match any queries in the storage. In order to suggest queries for previously unseen text, we propose a neural language model that learns how to generate a query from a starting text, a prefix. Specifically, we employ a recurrent neural network to handle prefixes of variable length. We perform the first neural language model experiments for QAC, and we evaluate the proposed methods with a public data set. Empirical results show that the proposed methods are as effective as traditional methods for previously seen queries and are superior to the state-of-the-art QAC method for previously unseen queries.
【Keywords】: neural language model; query auto-completion; recurrent neural network
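The traditional select-from-storage approach that the abstract contrasts with is often called Most Popular Completion. A minimal sketch (illustrative, not the paper's implementation) makes the unseen-prefix limitation concrete:

```python
from collections import Counter

class MostPopularCompletion:
    """Baseline QAC: rank stored queries matching the typed prefix by past
    frequency. It returns nothing for prefixes that match no stored query --
    the gap a generative neural language model fills by producing a
    completion character by character."""
    def __init__(self, query_log):
        self.freq = Counter(query_log)

    def complete(self, prefix, k=3):
        matches = [(q, n) for q, n in self.freq.items() if q.startswith(prefix)]
        matches.sort(key=lambda x: (-x[1], x[0]))   # frequency, then alphabetical
        return [q for q, _ in matches[:k]]
```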
【Paper Link】 【Pages】:1193-1196
【Authors】: Avikalp Srivastava ; Madhav Datt ; Jaikrishna Chaparala ; Shubham Mangla ; Priyadarshi Patnaik
【Abstract】: Corporations spend millions of dollars on developing creative image-based promotional content to advertise to their user base on platforms like Twitter. Our paper is an initial study in which we propose a novel method to evaluate and improve the outreach of promotional images from corporations on Twitter, based purely on their describable aesthetic attributes. Existing work in aesthetics-based image analysis focuses exclusively on the attributes of digital photographs and is not applicable to advertisements because of the influence of inherent content- and context-based biases on outreach. Our paper identifies broad categories of biases affecting such images and describes a method for normalizing outreach scores to eliminate the effects of those biases, which enables us to subsequently examine the effects of certain handcrafted describable aesthetic features on image outreach. Optimizing for the features identified in this research is a simple way for corporations to complement their existing marketing strategy and gain significant improvements in user engagement on social media for promotional images.
【Keywords】: advertisement bias elimination; computational image aesthetics; probabilistic inference; curve shifting; social media advertisement optimization; twitter outreach scores; user engagement maximization
【Paper Link】 【Pages】:1197-1200
【Authors】: Niels Dalum Hansen ; Kåre Mølbak ; Ingemar J. Cox ; Christina Lioma
【Abstract】: Influenza-like illness (ILI) estimation from web search data is an important web analytics task. The basic idea is to use the frequencies of queries in web search logs that are correlated with past ILI activity as features when estimating current ILI activity. It has been noted that since influenza is seasonal, this approach can lead to spurious correlations with features/queries that also exhibit seasonality, but have no relationship with ILI. Spurious correlations can, in turn, degrade performance. To address this issue, we propose modeling the seasonal variation in ILI activity and selecting queries that are correlated with the residual of the seasonal model and the observed ILI signal. Experimental results show that re-ranking queries obtained by Google Correlate based on their correlation with the residual strongly favours ILI-related queries.
【Keywords】: ili estimation; query generation; query selection; regression model; time-series analysis
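The residual-correlation idea in the abstract can be sketched end to end: fit a simple seasonal model, subtract it from the observed ILI signal, and rank candidate queries by their correlation with what remains. The cycle-position-mean seasonal model below is an illustrative assumption, not necessarily the model used in the paper:

```python
import math

def seasonal_residual(series, period=52):
    """Subtract a simple seasonal model: the mean value at each position
    in the cycle (e.g. each week of the year for weekly ILI data)."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(series):
        sums[i % period] += v
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[i % period] for i, v in enumerate(series)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rerank_queries(ili_series, query_series, period=52):
    """Rank candidate queries by correlation with the ILI residual, so that
    merely-seasonal queries (correlated only with the seasonal component)
    are pushed down the ranking."""
    resid = seasonal_residual(ili_series, period)
    scored = [(name, pearson(freqs, resid))
              for name, freqs in query_series.items()]
    return sorted(scored, key=lambda t: -t[1])
```

A query whose frequency merely tracks the season scores near zero against the residual, while a query tracking genuine ILI deviations scores high.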
【Paper Link】 【Pages】:1201-1204
【Authors】: Mozhdeh Ariannezhad ; Ali Montazeralghaem ; Hamed Zamani ; Azadeh Shakery
【Abstract】: The number of terms in a query is a query-specific constant that is typically ignored in retrieval functions. However, previous studies have shown that the performance of retrieval models varies for different query lengths, and it usually degrades as query length increases. A possible reason for this issue is the extraneous terms in longer queries, which make it challenging for retrieval models to distinguish between the key and complementary concepts of the query. As a signal of a term's importance, inverse document frequency (IDF) can be used to discriminate query terms. In this paper, we propose a constraint to model the interaction between query length and IDF. Our theoretical analysis shows that current state-of-the-art retrieval models, such as BM25, do not satisfy the proposed constraint. We further analyze the BM25 model and suggest a modification that adapts BM25 so that it adheres to the new constraint. Our experiments on three TREC collections demonstrate that the proposed modification outperforms the baselines, especially for verbose queries.
【Keywords】: axiomatic analysis; query length; term discrimination; theoretical analysis; verbose queries
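For reference, a sketch of the standard BM25 scoring function the abstract analyzes. Note that no part of it conditions on the query length |Q|, which is the interaction the paper's proposed constraint targets (the paper's own modification is not reproduced here):

```python
import math

def bm25_term(tf, df, N, dl, avgdl, k1=1.2, b=0.75):
    """Standard BM25 contribution of one query term to one document's score:
    tf = term frequency in the document, df = document frequency of the term,
    N = collection size, dl = document length, avgdl = average doc length."""
    idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)   # +1 keeps IDF non-negative
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return idf * norm

def bm25(query_terms, doc_tf, df, N, dl, avgdl):
    """Document score: a sum over query terms, independent of len(query_terms)
    beyond the summation itself."""
    return sum(bm25_term(doc_tf.get(t, 0), df.get(t, 1), N, dl, avgdl)
               for t in query_terms)
```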
【Paper Link】 【Pages】:1205-1208
【Authors】: Adam Jatowt ; Daisuke Kawai ; Katsumi Tanaka
【Abstract】: Wikipedia is the result of a collaborative effort to represent human knowledge and make it accessible to the public. Many Wikipedia articles, however, lack key metadata. For example, a relatively large number of people described in Wikipedia have no information on their birth and death dates. In this paper, we propose to estimate entities' lifetimes using the link structure of Wikipedia, focusing on person entities. Our approach is based on propagating temporal information over links between Wikipedia articles.
【Keywords】: entity dating; temporal link analysis; wikipedia
【Paper Link】 【Pages】:1209-1212
【Authors】: Alberto Barrón-Cedeño ; Giovanni Da San Martino ; Simone Filice ; Alessandro Moschitti
【Abstract】: In many Information Retrieval tasks, the boundary between classes is not well defined, and assigning a document to a specific class may be complicated, even for humans. For instance, a document which is not directly related to the user's query may still contain relevant information. In this scenario, an option is to define an intermediate class collecting ambiguous instances. Yet some natural questions arise: Is this annotation strategy convenient? How should the intermediate class be treated? To answer these questions, we explored two community question answering datasets whose comments were originally annotated with three classes. We re-annotated a subset of instances considering a binary good vs. bad setting. Our main contribution is to show empirically that the inclusion of an intermediate class to assess Boolean relevance is not useful. Moreover, in case the data is already annotated with a 3-class strategy, the instances from the intermediate class can be safely removed at training time.
【Keywords】: community question answering; crowdsourcing; learning to rank; relevance assessment
【Paper Link】 【Pages】:1213-1216
【Authors】: Saeid Balaneshinkordan ; Alexander Kotov
【Abstract】: Although information retrieval models based on Markov Random Fields (MRF), such as Sequential Dependence Model and Weighted Sequential Dependence Model (WSDM), have been shown to outperform bag-of-words probabilistic and language modeling retrieval models by taking into account term dependencies, it is not known how to effectively account for term dependencies in query expansion methods based on pseudo-relevance feedback (PRF) for retrieval models of this type. In this paper, we propose Semantic Weighted Dependence Model (SWDM), a PRF based query expansion method for WSDM, which utilizes distributed low-dimensional word representations (i.e., word embeddings). Our method finds the closest unigrams to each query term in the embedding space and top retrieved documents and directly incorporates them into the retrieval function of WSDM. Experiments on TREC datasets indicate statistically significant improvement of SWDM over state-of-the-art MRF retrieval models, PRF methods for MRF retrieval models and embedding based query expansion methods for bag-of-words retrieval models.
【Keywords】: query expansion; semantic retrieval models; sequential dependence model; term dependence retrieval models; word embeddings
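The step of finding the closest unigrams to a query term in the embedding space, as described in the abstract, reduces to a cosine nearest-neighbor search. A minimal sketch with toy two-dimensional vectors (a real system would load trained word embeddings and plug the candidates into the WSDM retrieval function):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_unigrams(term, embeddings, k=2):
    """Return the k vocabulary words closest to `term` in embedding space,
    as candidate expansion unigrams for that query term."""
    q = embeddings[term]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != term]
    scored.sort(key=lambda t: -t[1])
    return [w for w, _ in scored[:k]]
```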
【Paper Link】 【Pages】:1217-1220
【Authors】: Jinfeng Rao ; Hua He ; Jimmy Lin
【Abstract】: In recent years, neural networks have been applied to many text processing problems. One example is learning a similarity function between pairs of text, which has applications to paraphrase extraction, plagiarism detection, question answering, and ad hoc retrieval. Within the information retrieval community, the convolutional neural network model proposed by Severyn and Moschitti in a SIGIR 2015 paper has gained prominence. This paper focuses on the problem of answer selection for question answering: we attempt to replicate the results of Severyn and Moschitti using their open-source code as well as to reproduce their results via a de novo (i.e., from scratch) implementation using a completely different deep learning toolkit. Our de novo implementation is instructive in ascertaining whether reported results generalize across toolkits, each of which has its own idiosyncrasies. We were able to successfully replicate and reproduce the reported results of Severyn and Moschitti, albeit with minor differences in effectiveness, affirming the overall design of their model. Additional ablation experiments break down the components of the model to show their contributions to overall effectiveness. Interestingly, we find that removing one component actually increases effectiveness and that a simplified model with only four word overlap features performs surprisingly well, even better than convolution feature maps alone.
【Keywords】: deep learning; question answering; reproducibility; trec
【Paper Link】 【Pages】:1221-1223
【Authors】: Bhaskar Mitra ; Fernando Diaz ; Nick Craswell
【Abstract】: In recent years, the information retrieval (IR) community has witnessed the first successful applications of deep neural network models to short-text matching and ad-hoc retrieval tasks. However, the two communities - focused on deep neural networks and on IR - have less in common when it comes to the choice of programming languages. Indri, an indexing framework popularly used by the IR community, is written in C++, while Torch, a popular machine learning library for deep learning, is written in the light-weight scripting language Lua. To bridge this gap, we introduce Luandri (pronounced "laundry"), a simple interface for exposing the search capabilities of Indri to Torch models implemented in Lua.
【Keywords】: application programming interface; information retrieval; neural networks
【Paper Link】 【Pages】:1225-1228
【Authors】: Royal Sequiera ; Jimmy Lin
【Abstract】: Due to Twitter's terms of service that forbid redistribution of content, creating publicly downloadable collections of tweets for research purposes has been a perpetual problem for the research community. Some collections are distributed by making available the ids of the tweets that comprise the collection and providing tools to fetch the actual content; this approach has scalability limitations. In other cases, evaluation organizers have set up APIs that provide access to collections for specific tasks, without exposing the underlying content. This is a workable solution, but difficult to sustain over the long term since someone has to maintain the APIs. We have noticed that the non-profit Internet Archive has been making available for public download captures of the so-called Twitter "spritzer" stream, which is the same source as the Tweets2013 collection used in the TREC 2013 and 2014 Microblog Tracks. We analyzed both datasets in terms of content overlap and retrieval baselines to show that the Internet Archive data can serve as a drop-in replacement for the Tweets2013 collection, thereby providing the research community with, finally, a downloadable collection of tweets. Beyond this finding, we also study the impact of tweet deletions over time and how they affect the test collections.
【Keywords】: microblog retrieval; trec; twitter
【Paper Link】 【Pages】:1229-1232
【Authors】: Jun Harashima ; Yuichiro Someya ; Yohei Kikuta
【Abstract】: In food-related services, image information is as important as text information for users. For example, in recipe search services, users find recipes based not only on text but also on images. To promote studies on food images, many datasets have recently been published. However, they have the following three limitations: most of the datasets include only thousands of images; they cover only images of finished dishes, not images taken during the cooking process; and the images are not linked to any recipes. In this study, we construct the Cookpad Image Dataset, a novel collection of food images taken from Cookpad, the largest recipe search service in the world. The dataset includes more than 1.64 million images of finished dishes, making it the largest among existing datasets. Additionally, it includes more than 3.10 million images taken during the cooking process. To the best of our knowledge, no other dataset includes such images. Furthermore, the dataset is designed to link to an existing recipe corpus, and thus a variety of recipe texts, such as the title, description, ingredients, and process, is available for each image. In this paper, we describe our dataset's features in detail and compare it with existing datasets.
【Keywords】: food image; image collection; recipe
【Paper Link】 【Pages】:1233-1236
【Authors】: Cheng Luo ; Yukun Zheng ; Yiqun Liu ; Xiaochuan Wang ; Jingfang Xu ; Min Zhang ; Shaoping Ma
【Abstract】: Web collections are essential for much Web-based research, such as Web information retrieval (IR), Web data mining, and corpus linguistics. However, it is usually expensive and time-consuming to collect a large number of Web pages in a lab-based environment, so publicly available collections are a necessity for this research. In this study, we present a Chinese Web collection, SogouT-16, which is the largest free-of-charge public Chinese Web collection so far. We provide a variety of descriptive characteristics of SogouT-16 and discuss its adoption in a newly-designed ad-hoc retrieval task in NTCIR-13, We Want Web. SogouT-16 also provides an online retrieval service and contains a number of auxiliary resources, including a hyperlink structure graph, query logs, word embeddings, etc. We believe that SogouT-16 will provide new opportunities for novel investigations and applications in IR and other related communities.
【Keywords】: search evaluation; test collection; web corpus
【Paper Link】 【Pages】:1237-1240
【Authors】: Harrisen Scells ; Guido Zuccon ; Bevan Koopman ; Anthony Deacon ; Leif Azzopardi ; Shlomo Geva
【Abstract】: This paper introduces a test collection for evaluating the effectiveness of different methods used to retrieve research studies for inclusion in systematic reviews. Systematic reviews appraise and synthesise studies that meet specific inclusion criteria. Systematic reviews intended for a biomedical science audience use boolean queries with many, often complex, search clauses to retrieve studies; these are then manually screened to determine eligibility for inclusion in the review. This process is expensive and time consuming. The development of systems that improve retrieval effectiveness will have an immediate impact by reducing the complexity and resources required for this process. Our test collection consists of approximately 26 million research studies extracted from the freely available MEDLINE database, 94 review (query) topics extracted from Cochrane systematic reviews, and corresponding relevance assessments. Tasks for which the collection can be used for information retrieval system evaluation are described and the use of the collection to evaluate common baselines within one such task is demonstrated. The test collection is available at https://github.com/ielab/SIGIR2017-PICO-Collection.
【Keywords】: evidence based medicine; ir evaluation; systematic reviews; test collection
【Paper Link】 【Pages】:1241-1244
【Authors】: Dietmar Schabus ; Marcin Skowron ; Martin Trapp
【Abstract】: In this paper we introduce a new data set consisting of user comments posted to the website of a German-language Austrian newspaper. Professional forum moderators have annotated 11,773 posts according to seven categories they considered crucial for the efficient moderation of online discussions in the context of news articles. In addition to this taxonomy and annotated posts, the data set contains one million unlabeled posts. Our experimental results using six methods establish a first baseline for predicting these categories. The data and our code are available for research purposes from https://ofai.github.io/million-post-corpus.
【Keywords】: online discussions
【Paper Link】 【Pages】:1245-1248
【Authors】: Sumit Sidana ; Charlotte Laclau ; Massih-Reza Amini ; Gilles Vandelle ; André Bois-Crettez
【Abstract】: In this paper, we describe a novel, publicly available collection for recommendation systems that records the behavior of customers of Kelkoo (https://www.kelkoo.com/), the European leader in eCommerce advertising, during one month. This dataset gathers implicit feedback, in the form of clicks, from users who have interacted with over 56 million offers displayed by Kelkoo, along with a rich set of contextual features regarding both customers and offers. In conjunction with a detailed description of the dataset, we show the performance of six state-of-the-art recommender models and raise some questions on how to encompass the existing contextual information in the system.
【Keywords】: collection; e-advertising; implicit feedback; recommender systems
【Paper Link】 【Pages】:1249-1252
【Authors】: Anastasia Giachanou ; Ida Mele ; Fabio Crestani
【Abstract】: The advent of social media has given the opportunity to users to publicly express and share their opinion about any topic. Public opinion is very important for the interested entities that can leverage such information in the process of making decisions. In addition, identifying sentiment changes and the likely causes that have triggered them allows interested parties to adjust their strategies and attract more positive sentiment. With the aim to facilitate research on this problem, we describe a collection of tweets that can be used for detecting and ranking the likely triggers of sentiment spikes towards different entities. To build the collection, we first group tweets by topic which are then manually annotated according to sentiment polarity and strength. We believe that this collection can be useful for further research on detecting sentiment change triggers, sentiment analysis and sentiment prediction.
【Keywords】: sentiment spikes; social media; test collections
【Paper Link】 【Pages】:1253-1256
【Authors】: Peilin Yang ; Hui Fang ; Jimmy Lin
【Abstract】: Software toolkits play an essential role in information retrieval research. Most open-source toolkits developed by academics are designed to facilitate the evaluation of retrieval models over standard test collections. Efforts are generally directed toward better ranking and less attention is usually given to scalability and other operational considerations. On the other hand, Lucene has become the de facto platform in industry for building search applications (outside a small number of companies that deploy custom infrastructure). Compared to academic IR toolkits, Lucene can handle heterogeneous web collections at scale, but lacks systematic support for evaluation over standard test collections. This paper introduces Anserini, a new information retrieval toolkit that aims to provide the best of both worlds, to better align information retrieval practice and research. Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks. Our initial efforts have focused on three functionalities: scalable, multi-threaded inverted indexing to handle modern web-scale collections, streamlined IR evaluation for ad hoc retrieval on standard test collections, and an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines right out of the box. Experiments verify that our system is both efficient and effective, providing a solid foundation to support future research.
【Keywords】: multi-threaded inverted indexing; open-source toolkits; reproducibility; trec test collections
【Paper Link】 【Pages】:1257-1260
【Authors】: Benjamin Kille ; Andreas Lommatzsch ; Frank Hopfgartner ; Martha Larson ; Arjen P. de Vries
【Abstract】: Recommender System research has evolved to focus on developing algorithms capable of high performance in online systems. This development calls for a new evaluation infrastructure that supports multi-dimensional evaluation of recommender systems. Today's researchers should analyze algorithms with respect to a variety of aspects including predictive performance and scalability. Researchers need to subject algorithms to realistic conditions in online A/B tests. We introduce two resources supporting such evaluation methodologies: the new data set of stream recommendation interactions released for CLEF NewsREEL 2017, and the new Open Recommendation Platform (ORP). The data set allows researchers to study a stream recommendation problem closely by "replaying" it locally, and ORP makes it possible to take this evaluation "live" in a living lab scenario. Specifically, ORP allows researchers to deploy their algorithms in a live stream to carry out A/B tests. To our knowledge, NewsREEL is the first online news recommender system resource to be put at the disposal of the research community. In order to encourage others to develop comparable resources for a wide range of domains, we present a list of practical lessons learned in the development of the dataset and ORP.
【Keywords】: multi-dimensional benchmarking; recommender system; streams
【Paper Link】 【Pages】:1261-1264
【Authors】: Matthias Hagen ; Martin Potthast ; Marcel Gohsen ; Anja Rathgeber ; Benno Stein
【Abstract】: We present a new large-scale collection of 54,772 queries with manually annotated spelling corrections. For 9,170 of the queries (16.74%), spelling variants that are different from the original query are proposed. With its size, our new corpus is an order of magnitude larger than other publicly available query spelling corpora. In addition to releasing the new large-scale corpus, we also provide an implementation of the winner of the 2011 Microsoft Speller Challenge and compare it on the different publicly available corpora to spelling corrections mined from Google and Bing. This way, we also shed some light on the spelling correction performance of state-of-the-art commercial search systems.
【Keywords】: query log mining; query spelling accuracy; query spelling corpus; query spelling correction
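A corpus like this is typically used to benchmark candidate-generation-and-ranking spelling correctors. As a minimal illustrative sketch (a simple edit-distance-1 candidate generator ranked by unigram frequency — an assumption for illustration, not the Speller Challenge winner's actual method; `edits1` and `correct` are hypothetical names):

```python
def edits1(word):
    """All strings at edit distance 1: deletes, transposes, replaces, inserts."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, unigram_counts):
    """Prefer a known word; else the highest-frequency known candidate at
    edit distance 1; else fall back to the input unchanged."""
    candidates = ({word} & unigram_counts.keys()) \
        or (edits1(word) & unigram_counts.keys()) \
        or {word}
    return max(candidates, key=lambda w: unigram_counts.get(w, 0))
```

A real corrector would also model error probabilities and query context; this sketch shows only the candidate/ranking skeleton such corpora evaluate.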
【Paper Link】 【Pages】:1265-1268
【Authors】: Faegheh Hasibi ; Fedor Nikolaev ; Chenyan Xiong ; Krisztian Balog ; Svein Erik Bratsberg ; Alexander Kotov ; Jamie Callan
【Abstract】: The DBpedia-entity collection has been used as a standard test collection for entity search in recent years. We develop and release a new version of this test collection, DBpedia-Entity v2, which uses a more recent DBpedia dump and a unified candidate result pool from the same set of retrieval models. Relevance judgments are also collected in a uniform way, using the same group of crowdsourcing workers, following the same assessment guidelines. The result is an up-to-date and consistent test collection. To facilitate further research, we also provide details about the pre-processing and indexing steps, and include baseline results from both classical and recently developed entity search methods.
【Keywords】: dbpedia; entity retrieval; entity search; semantic search; test collection
【Paper Link】 【Pages】:1269-1272
【Authors】: Mohammad Aliannejadi ; Ida Mele ; Fabio Crestani
【Abstract】: Suggesting personalized venues helps users to find interesting places on location-based social networks (LBSNs). Although there are many LBSNs online, none of them is known to have thorough information about all venues. The Contextual Suggestion track at TREC aimed at providing a collection consisting of places as well as user context to enable researchers to examine and compare different approaches under the same evaluation setting. However, the officially released collection of the track did not meet many participants' needs related to venue content, online reviews, and user context. That is why almost all successful systems chose to crawl information from different LBSNs. For example, one of the best systems proposed in the TREC 2016 Contextual Suggestion track crawled data from multiple LBSNs and enriched it with venue-context appropriateness ratings collected using a crowdsourcing platform. Such a collection enabled the system to better predict a venue's appropriateness to a given user's context. In this paper, we release both collections that were used by the system above. We believe that these datasets give other researchers the opportunity to compare their approaches with the top systems in the track, as well as to explore different methods for predicting contextually appropriate venues.
【Keywords】: collection; context-awareness; venue suggestion
【Paper Link】 【Pages】:1273-1276
【Authors】: Pedro Saleiro ; Natasa Milic-Frayling ; Eduarda Mendes Rodrigues ; Carlos Soares
【Abstract】: Improvements of entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in a tabular form where columns represent entity types and the table structure implies one or more relationships among the entities. Editorial work involves creating natural language queries based on relationships represented by the entries in the table. We have publicly released the RELink test collection comprising 600 queries and relevance judgments obtained from a sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples of entities that are extracted from columns and labelled by corresponding entity types and relationships they represent. In order to facilitate research in complex E-R retrieval, we have created and released as open source the RELink Framework that includes Apache Lucene indexing and search specifically tailored to E-R retrieval. RELink includes entity and relationship indexing based on the ClueWeb-09-B Web collection with FACC1 text span annotations linked to Wikipedia entities. With ready-to-use search resources and a comprehensive test collection, we support the community in pursuing E-R research at scale.
【Keywords】: entity-relationship retrieval
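The table-to-tuples construction described above can be sketched as follows (an illustrative simplification, not the RELink tooling itself; `extract_relationship_tuples` is a hypothetical helper):

```python
def extract_relationship_tuples(columns, rows):
    """Treat each table column as an entity type; each row then yields one
    tuple of (entity_type, entity) pairs assumed to stand in the
    relationship the table implies."""
    return [tuple(zip(columns, row)) for row in rows]
```

In the actual collection, editors then write a natural-language query (e.g. "Who founded which company?") whose relevant answers are exactly these tuples.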
【Paper Link】 【Pages】:1277-1280
【Authors】: Patrick Ernst ; Arunav Mishra ; Avishek Anand ; Vinay Setty
【Abstract】: We demonstrate BioNex, a system to mine, rank and visualize biomedical news events. BioNex takes biomedical queries such as "Ebola virus disease" and retrieves the k most relevant news events for them. To achieve this we first mine the generic news events by clustering them on a daily basis using general named entities and textual features. These clusters are also tagged with disambiguated biomedical entities which aid in biomedical news event exploration. The clusters are then used to compute the importance scores for the event clusters based on a combination of textual, semantic, popularity and historical importance features. BioNex also visualizes the retrieved event clusters to highlight the top news events and corresponding news articles for the given query. The visualization also provides the context for news events using (1) a chain of historically relevant news event clusters, and (2) other non-biomedical events from the same day.
【Keywords】: biological event exploration; biomedical entities; event clustering
【Paper Link】 【Pages】:1281-1284
【Authors】: Claudio Lucchese ; Cristina Ioana Muntean ; Franco Maria Nardini ; Raffaele Perego ; Salvatore Trani
【Abstract】: In this demo paper we propose RankEval, an open-source tool for the analysis and evaluation of Learning-to-Rank (LtR) models based on ensembles of regression trees. Gradient Boosted Regression Trees (GBRT) is a flexible statistical learning technique for classification and regression, and it is the state of the art for training effective LtR solutions. Indeed, the success of GBRT fostered the development of several open-source LtR libraries targeting efficiency of the learning phase and effectiveness of the resulting models. However, these libraries offer only very limited help for the tuning and evaluation of the trained models. In addition, the implementations provided for even the most traditional IR evaluation metrics differ from library to library, thus making the objective evaluation and comparison between trained models a difficult task. RankEval addresses these issues by providing a common ground for LtR libraries that offers useful and interoperable tools for a comprehensive comparison and in-depth analysis of ranking models.
【Keywords】: analysis; evaluation; learning to rank
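The abstract notes that even traditional IR metric implementations differ from library to library; NDCG is the classic example, since libraries disagree on gain and discount conventions. A minimal sketch of one common convention (the exponential-gain variant — an assumption for illustration, not necessarily RankEval's choice):

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain using the common 2^rel - 1 gain and
    log2(rank + 1) discount."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Other libraries use linear gain (`rel` instead of `2^rel - 1`), which yields different scores on the same run — exactly the comparability problem the tool targets.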
【Paper Link】 【Pages】:1285-1288
【Authors】: Derek Wu ; Hongning Wang
【Abstract】: We develop an aspect-based sentiment analysis system named ReviewMiner. It analyzes opinions expressed about an entity in an online review at the level of topical aspects to discover each individual reviewer's latent opinion on each aspect as well as his/her relative emphasis on different aspects when forming the overall judgment of the entity. The system personalizes the retrieved results according to users' input preferences over the identified aspects, recommends similar items based on the detailed aspect-level opinions, and summarizes aspect-level opinions in textual, temporal and spatial dimensions. The unique multi-modal opinion summarization and visualization mechanisms provide users with rich perspectives to digest information from user-generated opinionated content for making informed decisions.
【Keywords】: aspect-based sentiment analysis; personalization; review mining
【Paper Link】 【Pages】:1289-1292
【Authors】: Faegheh Hasibi ; Krisztian Balog ; Darío Garigliotti ; Shuo Zhang
【Abstract】: We introduce Nordlys, a toolkit for entity-oriented and semantic search. It provides functionality for entity cataloging, entity retrieval, entity linking, and target type identification. Nordlys may be used as a Python library or as a RESTful API, and also comes with a web-based user interface. The toolkit is open source and is available at http://nordlys.cc.
【Keywords】: entity linking; entity retrieval; result presentation; semantic search
【Paper Link】 【Pages】:1293-1296
【Authors】: Thomas Wilhelm-Stein ; Stefan Kahl ; Maximilian Eibl
【Abstract】: Predicting the performance of individual components of information retrieval systems, in particular the complex interactions between those components, is still challenging. Therefore, professionals are needed for the implementation and configuration of retrieval systems and retrieval components. Our web-based application, called Xtrieval Web Lab, enables newcomers and learners to gain practical knowledge about the information retrieval process. They can arrange a multitude of components of retrieval systems and evaluate them with real world data without utilizing a programming language. Game mechanics guide the learners in their discovery process and motivate them.
【Keywords】: evaluation; game mechanics; information retrieval; software-based teaching environment
【Paper Link】 【Pages】:1297-1300
【Authors】: Jeong Woo Son ; Wonjoo Park ; Sang-Yun Lee ; Jinwoo Kim ; Sun-Joong Kim
【Abstract】: Broadcasting contents are the most plausible resources for services based on video content. Even though a huge amount of broadcasting content has already been produced, there is rarely a system that analyzes broadcasting content and generates information about it to support content retrieval and recommendation services. This paper proposes a new system for this purpose. In the proposed system, a broadcasting content is segmented into semantic units, scenes, based on its multiple characteristics. The proposed system analyzes scenes and generates their keywords, topics, and stories. Connections among scenes are automatically established based on shared keywords, similar topics, and consistency in stories. To support operators, the proposed system offers two tools: Scene Studio and SceneViz. We also provide several Open APIs in the proposed system to supply information and connections to service providers. The feasibility of the proposed system is shown with numerical evaluations of the quality of the generated information. We also introduce two video clip services implemented with our system.
【Keywords】: broadcasting content; content mining; scene segmentation; smart media; topic model
【Paper Link】 【Pages】:1301-1304
【Authors】: Enrique Amigó ; Jorge Carrillo de Albornoz ; Mario Almagro-Cádiz ; Julio Gonzalo ; Javier Rodríguez-Vidal ; Felisa Verdejo
【Abstract】: The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems that makes results completely comparable and publicly available for the whole research community. For researchers working on a given test collection, the framework allows them to: (i) evaluate results in a way compliant with measurement theory and with state-of-the-art evaluation practices in the field; (ii) quantitatively and qualitatively compare their results with the state of the art; (iii) provide their results as reusable data to the scientific community; (iv) automatically generate evaluation figures and (low-level) interpretation of the results, both as a PDF report and as LaTeX source. For researchers running a challenge (a comparative evaluation campaign on shared data), the framework helps them to manage, store and evaluate submissions, and to preserve ground truth and system output data for future use by the research community. EvALL can be tested at http://evall.uned.es.
【Keywords】: evaluation; evaluation infrastructure; evaluation metrics; information access; information retrieval
【Paper Link】 【Pages】:1305-1308
【Authors】: Jin Yao Chin ; Sourav S. Bhowmick ; Adam Jatowt
【Abstract】: Tweet summarization aims to find a group of representative tweets for a specific topic. In recent times, there have been several research efforts toward devising a variety of techniques to summarize tweets on Twitter. However, these techniques are either not personal (i.e., they do not consider only tweets in the timeline of a specific user) or are too expensive to be realized on a mobile device. Given that 80% of active Twitter users access the site on mobile devices, in this demonstration we present a lightweight, personalized, on-demand, topic modeling-based tweets summarization engine called TOTEM, designed for such devices. Specifically, TOTEM summarizes the most recent tweets on a user's timeline and enables her to visualize and navigate representative topics and associated tweets in a user-friendly tap-and-swipe manner.
【Keywords】: mobile device; personal; summarization; topic modeling; tweets
【Paper Link】 【Pages】:1309-1312
【Authors】: Sosuke Kato ; Riku Togashi ; Hideyuki Maeda ; Sumio Fujita ; Tetsuya Sakai
【Abstract】: Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation for letting the SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.
【Keywords】: community question answering; long short-term memory; question answering
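For context on the kind of traditional baseline this demo compares against, a minimal BM25 scorer can be sketched as follows (the standard formula with common default parameters `k1=1.2`, `b=0.75`; an illustrative sketch, not the demo system's implementation):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avgdl, k1=1.2, b=0.75):
    """Score one document for a query with the classic BM25 formula:
    idf uses the +1 smoothing variant to keep scores non-negative."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        denom = tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score
```

In the demo's setting, this scorer would rank candidate answers from the Q&A knowledge source, against which the LSTM system's retrieval quality and latency are compared.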
【Paper Link】 【Pages】:1313-1316
【Authors】: Vuong Thanh Tung ; Giulio Jacucci ; Tuukka Ruotsalo
【Abstract】: We demonstrate proactive information retrieval via screen surveillance. A user's digital activities are continuously monitored by capturing all content on the user's screen using optical character recognition. This covers all applications and services in use, and draws on each individual user's computer activity, such as Web browsing, email, instant messaging, and word processing. Topic modeling is then applied to detect the user's topical activity context to retrieve information. We demonstrate a system that proactively retrieves information from a user's activity history observed on the screen while the user is performing unseen activities on a personal computer. We report an evaluation with ten participants that shows high user satisfaction and retrieval effectiveness. Our demonstration and experimental results show that surveillance of a user's screen can be used to build an extremely rich model of a user's digital activities across application boundaries and enable effective proactive information retrieval.
【Keywords】: proactive information retrieval; screen surveillance; user modeling
【Paper Link】 【Pages】:1317-1320
【Authors】: Ba Quan Truong ; Sourav S. Bhowmick ; Curtis E. Dyreson ; Hong Jing Khok
【Abstract】: Despite a decade of research on XML keyword search (XKS), demonstration of a high quality XKS system has still eluded the information retrieval community. Existing XKS engines primarily suffer from two limitations. First, although the smallest lowest common ancestor (SLCA) algorithm (or a variant, e.g., ELCA) is widely accepted as a meaningful way to identify subtrees containing the query keywords, SLCA typically performs poorly on documents with missing elements, i.e., (sub)elements that are optional, or appear in some instances of an element type but not all. Second, since keyword search can be ambiguous with multiple possible interpretations, it is desirable for an XKS engine to automatically expand the original query by providing a classification of different possible interpretations of the query w.r.t. the original results. However, existing XKS systems do not support such result-based query expansion. We demonstrate ASTERIX, an innovative XKS engine that addresses these limitations.
【Keywords】: ambiguous; keyword search; missing elements; query expansion; slca; xml
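The SLCA semantics the abstract refers to can be sketched over Dewey-style node identifiers, where each XML node is a tuple path from the root (an illustrative brute-force version, not ASTERIX's algorithm; real engines use much more efficient indexed computations):

```python
def subtree_contains(paths_by_keyword, node):
    """True if the subtree rooted at `node` (a Dewey path tuple) contains
    at least one match for every query keyword."""
    return all(any(p[:len(node)] == node for p in paths)
               for paths in paths_by_keyword.values())

def slca(paths_by_keyword):
    """Smallest lowest common ancestors: nodes whose subtree covers all
    keywords, but none of whose descendants' subtrees do."""
    # Candidate ancestors: every prefix of every match path, plus the root.
    candidates = {p[:i] for paths in paths_by_keyword.values()
                  for p in paths for i in range(1, len(p) + 1)}
    candidates.add(())
    covering = {c for c in candidates if subtree_contains(paths_by_keyword, c)}
    return {c for c in covering
            if not any(d != c and d[:len(c)] == c for d in covering)}
```

On documents with missing (optional) elements, this is exactly where SLCA can return unhelpfully deep or shallow subtrees — the limitation ASTERIX targets.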
【Paper Link】 【Pages】:1321-1324
【Authors】: Aldo Lipani ; Mihai Lupu ; Allan Hanbury
【Abstract】: Every year more than 25 test collections are built across the main Information Retrieval (IR) evaluation campaigns. They are extremely important in IR because they become the evaluation praxis for the years that follow. Test collections are built mostly using the pooling method. The main advantage of this method is that it drastically reduces the number of documents to be judged. It does so at the cost of introducing biases, which are sometimes aggravated by a non-optimal configuration. In this paper we develop a novel visualization technique for the pooling method, and integrate it in a demo application named Visual Pool. This demo application enables the user to interact with the pooling method with ease, and provides visual hints that help analyze existing test collections and build better ones.
【Keywords】: pooling method; pooling strategies; test collections; visualization
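In its simplest depth-k form, the pooling method reduces to taking the union of the top-k results of each contributing run, and only pooled documents are judged (a minimal sketch; real campaigns use more elaborate pooling strategies, which is precisely what the tool visualizes):

```python
def depth_k_pool(runs, k):
    """Depth-k pooling: the union of the top-k documents from each run,
    where each run is a ranked list of document ids."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool
```

Documents outside the pool are typically assumed non-relevant, which is the source of the pooling bias mentioned in the abstract.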
【Paper Link】 【Pages】:1325-1328
【Authors】: Nimesh Ghelani ; Salman Mohammed ; Shine Wang ; Jimmy Lin
【Abstract】: We present a system for identifying interesting social media posts on Twitter and delivering them to users' mobile devices in real time as push notifications. In our problem formulation, users are interested in broad topics such as politics, sports, and entertainment: our system processes tweets in real time to identify relevant, novel, and salient content. There are three interesting aspects to our work: First, instead of attempting to tame the cacophony of unfiltered tweets, we exploit a smaller, but still sizeable, collection of curated tweet streams corresponding to the Twitter accounts of different media outlets. Second, we apply distant supervision to extract topic labels from curated streams that have a specific focus, which can then be leveraged to build high-quality topic classifiers essentially "for free". Finally, our system delivers content via Twitter direct messages, supporting in situ interactions modeled after conversations with intelligent agents. These ideas are demonstrated in an end-to-end working prototype.
【Keywords】: conversational agents; distant supervision; event detection; twitter
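The distant-supervision step described above — deriving topic labels from the known focus of curated accounts — can be sketched as follows (illustrative only; `build_training_set` and the account-to-topic map are hypothetical names, not the authors' code):

```python
def build_training_set(tweets, account_topics):
    """Label each tweet with the topic of the curated account that posted
    it; tweets from accounts with no known focus are skipped."""
    labeled = []
    for account, text in tweets:
        topic = account_topics.get(account)
        if topic is not None:
            labeled.append((text, topic))
    return labeled
```

The labeled pairs can then train an ordinary text classifier, giving topic labels "for free" without manual annotation, as the abstract describes.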
【Paper Link】 【Pages】:1329-1332
【Authors】: Bevan Koopman ; Guido Zuccon ; Jack Russell
【Abstract】: Evidence-based medicine (EBM) is the practice of making clinical decisions based on rigorous scientific evidence. EBM relies on effective access to peer-reviewed literature - a task hampered by both the exponential growth of medical literature and a lack of efficient and effective means of searching and presenting this literature. This paper describes a search engine specifically designed for searching medical literature for the purpose of EBM and in a clinical decision support setting.
【Keywords】: evidence-based medicine; medical information retrieval; search engines
【Paper Link】 【Pages】:1333-1336
【Authors】: Giuseppe Amato ; Paolo Bolettieri ; Vinicius Monteiro de Lira ; Cristina Ioana Muntean ; Raffaele Perego ; Chiara Renso
【Abstract】: An increasing number of people share their thoughts and the images of their lives on social media platforms. People are exposed to food in their everyday lives and share on-line what they are eating by means of photos taken to their dishes. The hashtag #foodporn is constantly among the popular hashtags in Twitter and food photos are the second most popular subject in Instagram after selfies. The system that we propose, WorldFoodMap, captures the stream of food photos from social media and, thanks to a CNN food image classifier, identifies the categories of food that people are sharing. By collecting food images from the Twitter stream and associating food category and location to them, WorldFoodMap permits to investigate and interactively visualize the popularity and trends of the shared food all over the world.
【Keywords】: deep neural network; food image recognition; social media streaming
【Paper Link】 【Pages】:1337-1340
【Authors】: Rolf Jagerman ; Carsten Eickhoff ; Maarten de Rijke
【Abstract】: Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale up to the massive size of today's publicly available Web-scale data sets. The state-of-the-art approaches rely on custom strategies, implementations and hardware to facilitate their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include convenient usage of existing data processing pipelines and eliminating the need for disk writes as data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135× more data and 10× more topics without sacrificing model quality.
【Keywords】: latent dirichlet allocation; parameter server; topic modeling
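For context, the core update that such systems parallelize is the collapsed Gibbs sampling step of LDA. A single-draw sketch is below (the standard textbook form, where p(z=k) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ); this is not APS-LDA's parameter-server implementation, and the function and argument names are hypothetical):

```python
import random

def sample_topic(word, doc, doc_topic, topic_word, topic_totals,
                 n_topics, vocab_size, alpha=0.1, beta=0.01):
    """One collapsed Gibbs draw for the topic of `word` in `doc`.
    Count tables are plain dicts: doc_topic[(doc, k)], topic_word[(k, word)],
    topic_totals[k], all excluding the token currently being resampled."""
    weights = []
    for k in range(n_topics):
        p = ((doc_topic.get((doc, k), 0) + alpha)
             * (topic_word.get((k, word), 0) + beta)
             / (topic_totals.get(k, 0) + vocab_size * beta))
        weights.append(p)
    # Draw a topic proportionally to the unnormalized weights.
    r = random.random() * sum(weights)
    for k, w in enumerate(weights):
        r -= w
        if r <= 0:
            return k
    return n_topics - 1
```

At Web scale, the count tables are exactly the shared state that an asynchronous parameter server distributes across workers.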
【Paper Link】 【Pages】:1341-1344
【Authors】: Yingwei Pan ; Zhaofan Qiu ; Ting Yao ; Houqiang Li ; Tao Mei
【Abstract】: We demonstrate a video captioning bot, named Seeing Bot, which can generate a natural language description about what it is seeing in near real time. Specifically, given a live streaming video, Seeing Bot runs two pre-learned and complementary captioning modules in parallel - one for generating an image-level caption for each sampled frame, and the other for generating a video-level caption for each sampled video clip. In particular, both the image and video captioning modules are boosted by incorporating semantic attributes which enrich the generated descriptions, leading to human-level caption generation. A visual-semantic embedding model is then exploited to rank and select the final caption from the two parallel modules by considering the semantic relevance between the video content and the generated captions. Seeing Bot finally converts the generated description to speech and sends the speech to an end user via an earphone. Our demonstration works on arbitrary videos in the wild and supports live video captioning.
【Keywords】: chitchat bot; deep convolutional neural networks; image captioning; multi-view embedding; video captioning
【Paper Link】 【Pages】:1345
【Authors】: David Hawking
【Abstract】: Twenty years ago, I fell into the trap of believing that good performance on TREC Ad Hoc could be turned into commercial success via an enterprise search (ES) start-up. Although we achieved a measure of success, the reality was very different from my expectations. It turned out that end-users cared about search relevance only to the extent of reaching for buckets of vitriol when search failed to find what they wanted, and people making purchasing decisions were more interested in other things: what repositories can be included in the search, how responsive search is to updates, what security models are provided, the appearance of the search pages, and achieving internal business goals -- even at the expense of end-user needs. Organizations often purchase search technology as part of another system, such as a content management system, or a records management system, leaving other repositories unsearchable or incompatibly searchable. Causes of enterprise search failure include dimensions studied in IR but an ES system often fails because of the way it is configured, or because it fails to cover information resources that end-users need. The dream of a comprehensive, highly relevant, fully secure single search interface to all of an organization's information is very rarely achieved. There is a huge market potential for a killer ES product but the biggest challenges in exploiting it lie outside the scope of IR research. I look forward to sharing lessons learned from attempting to commercialize IR technology, and new perspectives from my current employment at Bing.
【Keywords】: enterprise search; industry experience; ir start-ups
【Paper Link】 【Pages】:1347
【Authors】: Dhruv Arya ; Ganesh Venkataraman
【Abstract】: The mission of LinkedIn is to connect the world's professionals to make them more productive and successful. LinkedIn operates the world's largest professional network on the Internet with more than 500 Million members in over 200 countries. Core to realizing the mission is to help people find jobs. In this paper, we describe how job recommendations are powered by a search index and some practical challenges involved in scaling such a system.
【Keywords】: information retrieval; recommender systems
【Paper Link】 【Pages】:1349
【Authors】: Fernando Diaz
【Abstract】: Spotify provides users with access to a massive repository of streaming music. While some aspects of music access are familiar to the information retrieval community (e.g. semistructured data, item recommendation), nuances of the music domain require the development of new models of user understanding, intent modeling, relevance, and content understanding. These models can be studied using the large amount of content and usage data at Spotify, allowing us to extend previous results in the music information retrieval community. In this presentation, we will highlight the research involved in developing Spotify and outline a research program for large scale music access.
【Keywords】: music information retrieval
【Paper Link】 【Pages】:1351
【Authors】: Ido Guy ; Kira Radinsky
【Abstract】: Electronic commerce continues to gain popularity in recent years. On eBay, one of the largest on-line marketplaces in the world, millions of new listings (items) are submitted by a variety of sellers every day. This results in a rich, diverse inventory characterized by a particularly long tail. In addition, many items in the inventory lack basic structured information, such as product identifiers, brand, category, and other properties, due to sellers' tendency to input unstructured information only, namely title and description. Such an inventory therefore requires a range of large-scale solutions to assist in organizing the data and gaining business insights. In 2016, eBay acquired SalesPredict to help structure its unstructured data. In this proposed presentation, we will share the story of a research startup from its inception until its acquisition and integration as eBay's data science team. We will review the numerous challenges from research and engineering perspectives of a startup and the principal challenges the eBay data science organization deals with today. These include the identification of duplicate, similar, and related products; the extraction of name-value attributes from item titles and descriptions; the matching of items entered by sellers to catalog products; the ranking of item titles based on their likelihood to serve as "good" product titles; and the creation of "browse node" pages to address complex search queries from potential buyers. We will describe how the eBay data science team approaches these challenges and some of the solutions already launched to production. These solutions involve the use of large-scale machine learning, information retrieval, and natural language processing techniques, and should therefore be of interest to the SIGIR audience at large.
【Keywords】:
【Paper Link】 【Pages】:1353
【Authors】: Sudong Chung
【Abstract】: In the past decades, we have found ourselves spending more and more time on the Internet. As we surf the Internet longer, we are exposed to more digital advertisements. Some of us have become used to them and even take them for granted; others sought a remedy and use ad-blocking software. This could be the end of the story, especially if we were not computer scientists. But who is making those advertisements, and why? As we know, television networks make money by broadcasting TV commercials to their viewers; they use that money to create content and make a profit. In other words, we watch commercials in exchange for watching their shows. (Some ad-free television networks are instead funded by monthly subscription fees.) But why do we feel that we see more ads online than before? Are we right about it? The answer is yes. We are seeing more ads than before, not only because we spend more time online but also because marketing money has shifted from offline to online channels. The increase in digital marketing has several causes: 1) the increase in time spent online, 2) the decline of publishers' offline business, and 3) innovations in digital advertising technology and creatives. In this talk, I will cover the history of digital advertising and the recent innovations in advertising technologies that make ads more relevant to the individual viewer, and I will explain how advanced machine learning and massive user data are behind these innovations.
【Keywords】: ad tech; big data; digital advertising
【Paper Link】 【Pages】:1355
【Authors】: Anton Firsov
【Abstract】: The amount of available data grows every day, and that data can help us make better decisions. However, with growing volume and variety it becomes increasingly difficult to find the necessary data. Traditional search engines such as ElasticSearch or Apache Solr are primarily designed to search for text documents. A search for data, in contrast, has its own specifics: there is less text and more structure. Knoema's search engine is designed specifically to search for data, leveraging the data's structure to obtain better results than document-oriented search engines.
【Keywords】: conversational search interaction; intelligent personal assistants; ir; structured data search
【Paper Link】 【Pages】:1357
【Authors】: Ricardo A. Baeza-Yates
【Abstract】: Queries are often ambiguous and can be interpreted in many ways, even by humans. Hence, semantic query understanding's primary objective is to understand the intention behind the query. This implies first predicting the language used to express the query. Second, parsing the query according to that language. Third, extracting the entities and concepts mentioned in the query. Finally, based on all this information, we predict one or more possible intentions with a certain probability, which is particularly important for ambiguous queries. These scores will be one of the inputs for the final semantic ranking. For example, given the query "bond", possible results for query understanding are a financial instrument, the movie character, a chemical reaction, or a term for endearment. Semantic ranking refers to ranking search results using semantic information. In a standard search engine, a rank is computed by using signals or features coming from the search query, from the documents in the collection being searched and from the search context, such as the language and device being used. Using semantic processing, we also add semantic features that come from concepts present in the knowledge base that appear in the query and semantically match documents in the collection. To do this efficiently, all documents are preprocessed semantically to build an index that includes semantic annotations. To accomplish semantic ranking, we use machine learning in several stages. The first stage selects the data sources that we should use to answer the query. In the second stage, each data source generates a set of answers using "learning to rank." The third and final stage ranks these data sources, selecting and ordering the intentions as well as the answers inside each intention (e.g., news) that will appear in the final composite answer. All these techniques are language independent, but may use language dependent features.
【Keywords】: learning to rank; query intention; query understanding; semantic ranking; semantic search
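The final step the abstract describes assigns each candidate interpretation a probability. As a rough, hypothetical sketch (the intent names and raw scores below are invented, not taken from the paper), a softmax over per-intent scores yields the distribution that feeds the semantic ranker:

```python
import math

def intent_probabilities(scores):
    """Convert raw intent scores into a probability distribution (softmax)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {intent: math.exp(s - m) for intent, s in scores.items()}
    z = sum(exp.values())
    return {intent: e / z for intent, e in exp.items()}

# Hypothetical raw scores for the ambiguous query "bond".
scores = {"financial instrument": 2.1, "movie character": 1.8,
          "chemical bond": 0.9, "term of endearment": 0.2}
probs = intent_probabilities(scores)
top = max(probs, key=probs.get)
```

Downstream ranking can then weight each intention's answers by its probability rather than committing to a single interpretation.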
【Paper Link】 【Pages】:1359
【Authors】: Barbara Poblete
【Abstract】: In this talk I will describe "Twicalli", a real-time earthquake detection system based on citizen sensors. The system has been publicly available for over a year at http://twicalli.cl, and is currently in use as a decision support tool by the National Seismology Office and by the Hydrographic and Oceanographic Service in Chile. The novelty of our system lies in the fact that it achieves a very good precision/recall tradeoff for earthquakes of all magnitude ranges that were reported on Twitter. Our earthquake detection methodology is simple, efficient, and unsupervised, and it can detect earthquakes reported globally in any language and any region. This complements existing approaches that are either: i) supervised and customized to a particular geographical region, which makes them very expensive to scale geographically and keep up-to-date, or ii) unsupervised with low earthquake recall. The evaluation of our system, performed over a 9-month period, shows that our solution is competitive with the best state-of-the-art methods, providing very good precision and recall performance for a wide range of earthquake magnitudes.
【Keywords】: data visualization; decision support; emergency management; event detection; social media; social sensing; twitter
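The abstract does not disclose Twicalli's detection algorithm, but a minimal unsupervised burst detector over per-minute tweet counts illustrates the general idea behind language-independent event detection from citizen sensors (the counts, window, and threshold below are all hypothetical):

```python
import statistics

def detect_bursts(counts, window=5, k=3.0):
    """Flag time steps where the tweet count exceeds the recent mean by k std devs."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        std = statistics.pstdev(history) or 1.0  # avoid a zero threshold on flat history
        if counts[i] > mean + k * std:
            bursts.append(i)
    return bursts

# Hypothetical per-minute counts of earthquake-related tweets; the spike at
# index 7 plays the role of a citizen-reported earthquake.
counts = [3, 4, 2, 5, 3, 4, 3, 80, 60, 5, 4]
bursts = detect_bursts(counts)
```

Because the detector only looks at posting rates, not message content, it is language- and region-agnostic, matching the unsupervised, global flavor the abstract emphasizes.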
【Paper Link】 【Pages】:1361
【Authors】: Jingfang Xu ; Feifei Zhai ; Zhengshan Xue
【Abstract】: In recent years, more and more Chinese people desire access to the large amount of foreign-language information available worldwide, in order to understand what is happening all over the world. However, the language barrier is always a problem for them. In order to break the language barrier and connect Chinese people to foreign-language information, Sogou has built a cross-lingual information retrieval (CLIR) system named Sogou English (http://english.sogou.com), which enables Chinese people to search and browse foreign-language information in Chinese. When the user inputs a Chinese query, Sogou English first translates the Chinese query into English, then searches over the Internet, and finally translates the search results into Chinese so that users can understand them better. Hence, with Sogou English, people can read and browse information from the English-language web without actually knowing English.
【Keywords】: cross-lingual information retrieval; information retrieval; neural machine translation; search engine
【Paper Link】 【Pages】:1363
【Authors】: Hideyuki Maeda
【Abstract】: We present a Euclidean embedding image representation that ranks auction item images, across a wide range of the semantic similarity spectrum, in order of relevance to a given query image, and does so much more effectively than the baseline method in terms of a graded relevance measure. Our method uses a three-stream deep convolutional Siamese network to learn a distance metric, and we leverage the search query logs of the auction item search of the largest auction service in Japan. Unlike previous approaches, we define inter-image relevance on the basis of the user queries in the logs used to search for each auction item, which enables us to acquire an image representation that preserves features reflecting user intents in a real e-commerce setting.
【Keywords】: euclidean embedding; search by image; search query logs
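A triplet setup of the kind described learns a Euclidean metric in which images sharing a query sit closer to each other than to unrelated images. A minimal sketch of the triplet hinge loss on toy 2-d embeddings (the paper's actual network architecture and margin are not specified here):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the positive is closer than the negative by the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings: the positive shares the anchor's query; the negative does not.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
loss = triplet_loss(anchor, positive, negative)
```

Training drives this loss toward zero across many (anchor, positive, negative) triples mined from the query logs, so that plain Euclidean distance in the learned space reflects query-level relevance.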
【Paper Link】 【Pages】:1365
【Authors】: Pavel Serdyukov
【Abstract】: Online search evaluation, and A/B testing in particular, is an irreplaceable tool for modern search engines. Typically, online experiments last for several days or weeks and require a considerable portion of the search traffic. Despite the increasing need for running more experiments, the amount of that traffic is limited. This situation leads to the problem of finding new key performance metrics with higher sensitivity and lower variance. Recently, we proposed a number of techniques to alleviate this need for larger sample sizes in A/B experiments. One approach was based on formulating the quest for a sensitive metric as a data-driven machine learning problem of finding a sensitive metric combination \cite{Kharitonov2017}. We assumed that each single observation in these experiments is assigned a vector of metrics (features) describing it. After that, we learned a linear combination of these metrics, such that the learned combination can be considered a metric itself, and (a) agrees with the preference direction in the seed experiments according to a baseline ground-truth metric, and (b) achieves a higher sensitivity than the baseline ground-truth metric. Another approach addressed the problem of delays in the treatment effects, which cause low sensitivity of the metrics and require A/B experiments of longer duration or with a larger set of users drawn from limited traffic \cite{Drutsa2017}. We found that a delayed treatment effect of a metric can be revealed through the daily time series of the metric's measurements over the days of an A/B test. So, we proposed several metrics that learn models of the trend in such time series and use them to quantify the changes in user behavior.
Finally, in another study \cite{Poyarkov2016}, we addressed the problem of variance reduction for user engagement metrics and developed a general framework that allows us to incorporate both the existing state-of-the-art approaches to variance reduction and some novel ones based on advanced machine learning techniques. The expected value of the key metric for a given user consists of two components: (1) the expected value for this user irrespective of the treatment assignment, and (2) the treatment effect for this user. The expectation of the first component does not depend on the treatment assignment and does not contribute to the actual average treatment effect, but may increase the variance of its estimation. If we knew the value of the first component, we could subtract it from the key metric and obtain a new metric with decreased variance. However, since we cannot evaluate the first component exactly, we propose to predict it based on attributes of the user that are independent of the treatment exposure. Therefore, we propose to utilize, instead of the average value of a key metric, its average deviation from its predicted value. In this way, the problem of variance reduction is reduced to the problem of finding the best predictor for the key metric that is not aware of the treatment exposure. In our general approach, we apply gradient boosted decision trees and achieve a significantly greater variance reduction than the state of the art.
【Keywords】: a/b testing; online evaluation; online metrics
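The variance-reduction idea from \cite{Poyarkov2016} — replacing a key metric with its deviation from a treatment-independent prediction — can be sketched on synthetic data. Here the predictor is simply the user's pre-experiment activity, a deliberately crude stand-in for the paper's gradient boosted decision trees:

```python
import random
import statistics

random.seed(0)

# Synthetic users: pre-experiment activity strongly predicts the in-experiment metric.
pre = [random.gauss(10, 3) for _ in range(1000)]
metric = [p + random.gauss(0, 1) for p in pre]  # key metric measured during the experiment

# The predictor may only use treatment-independent attributes; here we use
# pre-experiment activity directly as the prediction of the key metric.
adjusted = [m - p for m, p in zip(metric, pre)]

var_raw = statistics.pvariance(metric)
var_adj = statistics.pvariance(adjusted)
```

The adjusted metric has the same average treatment effect (the prediction is independent of assignment, so it cancels between variants) but far lower variance, which is what makes smaller or shorter experiments detectable.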
【Paper Link】 【Pages】:1367
【Authors】: Inho Kang
【Abstract】: Naver has been the most popular search engine in South Korea for over a decade. As a search portal, Naver aims to match a user's search intentions to information from web pages and databases, and to connect users based on shared interests, so as to provide the best way to find information. Over the past decade, Naver has been trying to better understand Korean users, queries, and web pages for PC and mobile search. In 2002, Naver introduced Knowledge-IN, a forerunner of community question answering, to find out the needs of users and topic experts. Users can direct specific inquiries to appropriate topic experts through their search results. In addition to PC and mobile, Naver is trying to enable users to access relevant information using any other device or interface. In detecting common interest groups and good creators, Naver adds device and interface factors: not only the content, but also the delivery media types are important in satisfying users on various devices. Deep learning (DL) based methods have made tremendous progress in image and text classification. With DL-based methods, not only queries and text documents, but also images, videos, live streams, locations, etc. are classified and linked to detect common interest groups, and to select and rank good creators and good delivery types in each group. With DL, Naver seeks to provide search results that meet user needs more precisely while learning and improving on the fly. In this talk, I will cover some efforts and challenges in understanding and satisfying users on various devices.
【Keywords】: common interest group; deep learning; search portal; user understanding
【Paper Link】 【Pages】:1369
【Authors】: Joel Mackenzie
【Abstract】: With the growing popularity of the World Wide Web and the increasing accessibility of smart devices, data is being generated at a faster rate than ever before. This presents scalability challenges to web-scale search systems: how can we efficiently index, store, and retrieve such a vast amount of data? A large body of prior research has attempted to address many facets of this question, with the invention of a range of index storage and retrieval frameworks that are able to efficiently answer most queries. However, the current literature generally focuses on improving the mean or median query processing time in a given system. In the proposed PhD project, we focus on improving the efficiency of high-percentile tail latencies in large-scale IR systems while minimising end-to-end effectiveness loss. Although there is a wealth of prior research on improving the efficiency of large-scale IR systems, the most relevant prior work involves predicting long-running queries and processing them in various ways to avoid large query processing times. Prediction is often done through pre-trained models based on both static and dynamic features of queries and documents. Many different approaches to reducing the processing time of long-running queries have been proposed, including parallelising queries that are predicted to run slowly, scheduling queries based on their predicted run time, and selecting or modifying the query processing algorithm depending on the load of the system.
Considering the specific focus on tail latencies in large-scale IR systems, the proposed research aims to: (i) study what causes large tail latencies to occur in large-scale web search systems, (ii) propose a framework to mitigate tail latencies in multi-stage retrieval through the prediction of a vast range of query-specific efficiency parameters, (iii) experiment with mixed-mode query semantics to provide efficient and effective querying to reduce tail latencies, and (iv) propose a time-bounded solution for Document-at-a-Time (DaaT) query processing which is suitable for current web search systems. As a preliminary study, Crane et al. compared some state-of-the-art query processing strategies across many modern collections. They found that although modern DaaT dynamic pruning strategies are very efficient for ranked disjunctive processing, they have a much larger variance in processing times than Score-at-a-Time (SaaT) strategies, which have a similar efficiency profile regardless of query length or the size of the required result set. Furthermore, Mackenzie et al. explored the efficiency trade-offs for paragraph retrieval in a multi-stage question answering system. They found that DaaT dynamic pruning strategies could efficiently retrieve the top-1,000 candidate paragraphs for very long queries. Extending prior work, Mackenzie et al. showed how a range of per-query efficiency settings can be accurately predicted such that 99.99 percent of queries are serviced in less than 200 ms without noticeable effectiveness loss. In addition, a reference list framework was used for training models such that no relevance judgements or annotations were required. Future work will focus on improving the candidate generation stage in large-scale multi-stage retrieval systems.
This will include further exploration of index layouts, traversal strategies, and query rewriting, with the aim of improving early stage efficiency to reduce the system tail latency, while potentially improving end-to-end effectiveness.
【Keywords】: efficiency; scalability; tail latency
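Why focus on the tail rather than the mean or median? A nearest-rank percentile computation over hypothetical per-query latencies shows how a healthy median can coexist with a painful tail, which is exactly the gap this project targets:

```python
import math

def percentile(latencies, p):
    """Nearest-rank percentile of a list of per-query latencies (in ms)."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies: 95 fast queries, a few slow ones, one tail outlier.
latencies = [10] * 95 + [50] * 4 + [900]
median = percentile(latencies, 50)   # looks perfectly healthy
p99 = percentile(latencies, 99)
p100 = percentile(latencies, 100)    # the tail the median never shows
```

A system optimised only for the median would report 10 ms here while one in a hundred users waits nearly a second, which is why tail-latency metrics (p99 and beyond) are the object of study.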
【Paper Link】 【Pages】:1371
【Authors】: Amira Ghenai
【Abstract】: People regularly use web search and social media to investigate health-related issues. This type of Internet data might contain misinformation, i.e., incorrect information that contradicts current established medical understanding. If people are influenced by the misinformation presented in these sources, they can make harmful decisions about their health. My research goal is to investigate the effect of Internet data on people's health. Working with my colleagues on this topic, our current findings suggest that there is a potential for people to be harmed by search engine results. Furthermore, we have built a high-precision approach to track misinformation in social media. In this paper, I briefly discuss my current work, including key background references. Thereafter, I propose a research plan to understand possible mechanisms of misinformation's effect on people and its possible impacts on public health. Finally, I explain the research methodology suggested for carrying out this plan.
【Keywords】: health search; information retrieval; misinformation; public health monitoring; rumor; social computing for health; social media analysis; user study
【Paper Link】 【Pages】:1373
【Authors】: Ziying Yang
【Abstract】: Conventionally, relevance judgments have been assessed using ordinal relevance scales such as binary and Sormunen categories [9]. Such judgments record how much overlap there is between the document and the topic. However, they have been argued to be unreliable and not objective [3, 5, 10] because: (1) documents are usually assessed by a limited number of experts, who hold different viewpoints on relevance owing to individual factors such as gender, age and background [1]; (2) the distinctions between relevance levels expected by disparate types of users may be diverse [7]; (3) assessors' judging criteria drift in varying degrees as more documents are judged [8]; and (4) many judgment ties are generated when ordinal scales are used. In order to gain a better understanding of users' perceptions of relevance and collect data with high fidelity, we propose to use the Pairwise Preference technique [2] to collect relevance judgments from a crowdsourcing platform. From the collected judgments, a computed rank list containing all judged documents for each topic will be generated, with the goal of having fewer relevance ties.
【Keywords】: crowdsourcing; pairwise preference; relevance assessment
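One simple way to turn collected pairwise preferences into the proposed rank list with few ties is to order documents by the number of comparisons they won (a Copeland-style aggregation; the document IDs and judgments below are hypothetical, and the thesis may adopt a different aggregation model):

```python
from collections import Counter

def rank_by_wins(preferences):
    """Order documents by how many pairwise comparisons each one won."""
    wins = Counter()
    docs = set()
    for winner, loser in preferences:
        wins[winner] += 1
        docs.update((winner, loser))
    # More wins first; ties in win counts would remain ties here.
    return sorted(docs, key=lambda d: -wins[d])

# Hypothetical crowd judgments: (preferred document, other document).
prefs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3"), ("d1", "d4"),
         ("d2", "d4"), ("d3", "d4")]
ranking = rank_by_wins(prefs)
```

With a full set of consistent comparisons the win counts are all distinct, producing a total order with no relevance ties; in practice the crowd data will be noisier and a probabilistic model (e.g. Bradley-Terry) may be preferable.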
【Paper Link】 【Pages】:1375
【Authors】: Sandeep Avula
【Abstract】: The popularity of messaging platforms such as Slack has given rise to thousands of different chatbots that users can engage with individually or as a group. The proposed dissertation research will investigate the use of searchbots (i.e., chatbots that perform specific search operations) during collaborative information-seeking tasks. Specifically, we will address the following research goals. RG1: Our first research goal will be to investigate the use of searchbots in a collaborative search scenario. The goal of collaborative search is to develop systems that help two or more people collaborate synchronously or asynchronously on information-seeking tasks. Collaborative search systems such as SearchTogether~\cite{Morris2007}, Coagmento~\cite{shah2010coagmento}, CollabSearch~\cite{Yue2012}, and ResultsSpace~\cite{Capra2012} allow users to share information, communicate asynchronously or in real-time, and provide interactive visualizations that raise awareness of each user's search activities, allowing users to learn from each other's search strategies and avoid duplicating work. Prior research shows that while people often search in pairs and in larger groups, they do so without the use of specialized search tools and instead coordinate via "out-of-channel" communication tools such as email, text messaging, phone, and social media~\cite{morris2008survey,Morris2013}. Our first goal will be to investigate the use of searchbots during real-time collaborative search tasks. Our interest in the use of searchbots for collaborative search echoes a suggestion made by Morris~\cite{Morris2013} to develop lightweight collaborative search tools over existing communication platforms.
【Keywords】: chatbots; collaborative search; intelligent agents
【Paper Link】 【Pages】:1377
【Authors】: Anjie Fang
【Abstract】: In the past decade, the use of social media networks (e.g. Twitter) has increased dramatically, making them the main channels for the mass public to express their opinions, ideas and preferences, especially during an election or a referendum. Both researchers and the public are interested in understanding what topics are discussed during a real social event, what the trends of the discussed topics are, and what the future topical trend will be. Indeed, modelling such topics as well as their trends offers opportunities for social scientists to continue a long-standing line of research, i.e. examining the information exchange between people in different communities. We argue that computing science approaches can adequately assist social scientists to extract topics from social media data, to predict their topical trends, and to classify a social media user (e.g. a Twitter user) into a community. However, while topic modelling approaches and classification techniques have been widely used, challenges still exist, such as: 1) existing topic modelling approaches can generate topics lacking coherence for social media data; 2) it is not easy to evaluate the coherence of topics; 3) it can be challenging to generate a large training dataset for developing a social media user classifier. Hence, we identify four tasks to solve these problems and assist social scientists. Initially, we aim to propose topic coherence metrics that effectively evaluate the coherence of topics generated by topic modelling approaches. Such metrics are required to align with human judgements. Since topic modelling approaches cannot always generate useful topics, it is necessary to present users with the most coherent topics using the coherence metrics. Moreover, an effective coherence metric helps us evaluate the performance of our proposed topic modelling approaches. The second task is to propose a topic modelling approach that generates more coherent topics for social media data.
We argue that using the time dimension of social media posts helps a topic modelling approach to distinguish differences in word usage over time, and thus allows it to generate topics with higher coherence as well as their trends. A more coherent topic with its trend allows social scientists to quickly identify the topic's subject and to focus on analysing the connections between the extracted topics and social events, e.g., an election. Third, we aim to model and predict the topical trend. Given the timestamps of social media posts within topics, a topical trend can be modelled as a continuous distribution over time. Therefore, we argue that the future trends of topics can be predicted by estimating the density function of their continuous time distribution. By examining the future topical trend, social scientists can ensure the timeliness of the events they focus on. Politicians and policymakers can keep abreast of the topics that remain salient over time. Finally, we aim to offer a general method that can quickly obtain a large training dataset for constructing a social media user classifier. A social media post contains hashtags and entities. These hashtags (e.g. "#YesScot" in the Scottish Independence Referendum) and entities (e.g. job titles or party names) can reflect the community affiliation of a social media user. We argue that a large and reliable training dataset can be obtained by distinguishing the usage of these hashtags and entities. Using the obtained training dataset, a social media user community classifier can be quickly built, and then used to assist in examining the different topics discussed in communities. In conclusion, we have identified four aspects of assisting social scientists to better understand the topics discussed on social media networks. We believe that the proposed tools and approaches can help to examine the exchange of topics among communities on social media networks.
【Keywords】: social media
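A common family of coherence metrics of the kind the first task targets scores a topic's top words by their normalized pointwise mutual information (NPMI) over document co-occurrences. A minimal sketch over toy documents (the thesis's actual metrics may differ; the word lists below are invented):

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words, estimated from
    document-level co-occurrence probabilities."""
    n = len(documents)
    docsets = [set(d) for d in documents]

    def p(*words):  # fraction of documents containing all the given words
        return sum(all(w in ds for w in words) for ds in docsets) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        pj = p(w1, w2) + eps  # eps guards against log(0) for unseen pairs
        npmi = math.log(pj / (p(w1) * p(w2) + eps)) / -math.log(pj)
        scores.append(npmi)
    return sum(scores) / len(scores)

# Toy reference corpus of tokenised posts.
docs = [["election", "vote", "party"], ["election", "vote"],
        ["music", "guitar"], ["music", "guitar", "vote"]]
coherent = npmi_coherence(["election", "vote"], docs)      # words that co-occur
incoherent = npmi_coherence(["election", "guitar"], docs)  # words that never do
```

Words that habitually co-occur score near +1, words that never co-occur near -1, which is the property that lets such metrics align with human judgements of topic quality.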
【Paper Link】 【Pages】:1379
【Authors】: Esraa Ali
【Abstract】: Faceted Search Systems (FSS) have gained prominence in research as one of the exploratory search approaches that support complex search tasks. They provide facets to educate users about the information space and allow them to refine their search query and navigate back and forth between resources on a single results page. As the amount of information in the collection being searched increases, so does the number of associated facets, which can make it impractical to display all of the facets at once. To tackle this problem, FSS employ methods for facet ranking. Ranking methods can be based on the information structure, the textual queries issued by the user, or the usage logs. Such methods reflect neither the importance of the facets nor the user's interests. I focus on the problem of ranking facets from knowledge bases (KBs) and Linked Open Data (LOD). KBs have the advantage of containing high-quality structured data. With the increasing size and complexity of LOD datasets, the task of deciding which facets should be shown to the user, and in which order, becomes more difficult. Moreover, the idea of personalizing exploratory search can be challenging and tricky, since personalization in IR (specifically in precision-oriented search engines) implicitly implies narrowing and focusing the information space to retrieve the most relevant results according to the user's interests and desires. On the contrary, exploratory search systems are typically recall-oriented: they favor covering as much of the information space as possible, and they encourage diversifying the user's knowledge to help them learn and discover the unknown. The generation of a ranked list of facets should be a dynamic process for a number of reasons. First of all, manually setting up facets is a time-consuming task which relies upon domain experts. Second, it is not practical on large, multi-domain datasets.
Even one-off automatic facet generation and ranking might not be suitable for data that changes and grows over time. Lastly, the relevance of facets can be user, query and context dependent. I am proposing a personalized approach to the dynamic ranking of facets. The approach combines different sources of information to recommend the most relevant facets. The first source is the knowledge base from which the facets are originally generated. The second is facets generated from the top-ranked documents in a search system: the user's search query is submitted to a general search engine and the top-ranked documents are used to add context to the ranking process. Finally, the third source is a user-interest profile, which is collected from social media and the user's behavior in the system. These sources contribute to the final ranking score to reflect the importance of facets without ignoring user interests. My proposed research aims to answer the following research questions: RQ1: To what extent does the addition of features from the results retrieved by a general web search engine improve the computation of facet relevance? RQ2: What is the most effective method to incorporate personal interests and usage data into the ranking process? RQ3: Does personalising facet ranking have a measurable impact upon the user's search experience?
【Keywords】: exploratory search; faceted search; knowledge bases; personalization
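The proposed combination of the three evidence sources can be sketched as a weighted linear score per facet. The facet names, per-source scores, and weights below are entirely hypothetical; learning the weights is part of the open research question:

```python
def combine_facet_scores(facets, weights=(0.4, 0.4, 0.2)):
    """Rank facets by a weighted sum of knowledge-base, query-context,
    and user-profile scores."""
    w_kb, w_ctx, w_user = weights
    ranked = sorted(
        facets,
        key=lambda f: -(w_kb * f["kb"] + w_ctx * f["context"] + w_user * f["profile"]),
    )
    return [f["name"] for f in ranked]

# Hypothetical facet scores from the three sources for a movie-domain query.
facets = [
    {"name": "genre",    "kb": 0.9, "context": 0.2, "profile": 0.1},
    {"name": "director", "kb": 0.5, "context": 0.9, "profile": 0.8},
    {"name": "runtime",  "kb": 0.3, "context": 0.1, "profile": 0.2},
]
ranking = combine_facet_scores(facets)
```

Keeping the profile weight modest is one way to personalize without collapsing the information space, preserving the recall-oriented character of exploratory search.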
【Paper Link】 【Pages】:1381
【Authors】: Ke Yuan
【Abstract】: With the rapid availability and growth of mathematical information worldwide, the question of how to effectively retrieve relevant information about mathematical formulae has attracted increasing attention from researchers in mathematical information retrieval (MIR). Existing methods mainly focus on the appearance similarity between formulae. However, there is more important formula-related information that could be explored, for instance, link relations between formulae, formula contexts, and temporal information. In this study, I propose a novel formula feature modeling method for mathematical information retrieval. In more detail, three new formula features are proposed for better representing mathematical formulae: formula-related concept features extracted from link structure (the Formula Citation Graph, FCG), essential semantic features extracted from the descriptive textual information of formulae through Recurrent Neural Networks (RNNs), and temporal features extracted from time-related information. All these features can be used to index and retrieve formulae.
【Keywords】: formula citation graph (fcg); formula features; mathematical information retrieval (mir)
【Paper Link】 【Pages】:1383
【Authors】: Jarana Manotumruksa
【Abstract】: In recent years, vast amounts of user-generated data have been created on Location-Based Social Networks (LBSNs) such as Yelp and Foursquare. Making effective personalised venue suggestions to users based on their preferences and surrounding context is a challenging task. Context-Aware Venue Recommendation (CAVR) is an emerging topic that has gained a lot of attention from researchers, where the context can be, for example, the user's current location. Matrix Factorisation (MF) is one of the most popular collaborative filtering techniques; it can be used to predict a user's rating of a venue by exploiting explicit feedback (e.g. users' ratings of venues). However, such explicit feedback may not be available, particularly for inactive users, while implicit feedback is easier to obtain from LBSNs, as it does not require users to explicitly express their satisfaction with the venues. In addition, MF-based approaches usually suffer from the sparsity problem, where users/venues have very few ratings, hindering prediction accuracy. Although previous works on user-venue rating prediction have proposed to alleviate the sparsity problem by leveraging user-generated data such as social information from LBSNs, research that investigates the usefulness of Deep Neural Network (DNN) algorithms in alleviating the sparsity problem for CAVR remains untouched or only partially studied.
【Keywords】: bayesian personalised ranking (bpr); context-aware venue recommendation (cavr); deep neural network models (dnn); location-based social networks (lbsns); matrix factorisation (mf); personalised pairwise ranking framework with multiple sampling criteria (prfmc)
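The MF baseline the abstract builds on predicts a user's rating of a venue as the dot product of two latent factor vectors, fitted by stochastic gradient descent on the squared rating error. A minimal sketch with 2-d factors and toy data (the learning rate, regularisation, and initial factors are illustrative, not the thesis's settings):

```python
def predict(p_u, q_i):
    """Predicted rating: dot product of user and venue latent factors."""
    return sum(pu * qi for pu, qi in zip(p_u, q_i))

def sgd_step(p_u, q_i, rating, lr=0.05, reg=0.01):
    """One regularised stochastic gradient step on the squared rating error."""
    err = rating - predict(p_u, q_i)
    new_p = [pu + lr * (err * qi - reg * pu) for pu, qi in zip(p_u, q_i)]
    new_q = [qi + lr * (err * pu - reg * qi) for pu, qi in zip(p_u, q_i)]
    return new_p, new_q

# One observed (user, venue, rating) triple with small random-ish initial factors.
p_u, q_i, rating = [0.1, 0.2], [0.3, 0.1], 4.0
before = abs(rating - predict(p_u, q_i))
for _ in range(200):
    p_u, q_i = sgd_step(p_u, q_i, rating)
after = abs(rating - predict(p_u, q_i))
```

The sparsity problem the abstract describes is visible even here: each factor vector is updated only by the ratings its user or venue participates in, so users with very few ratings get poorly constrained factors.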
【Paper Link】 【Pages】:1385
【Authors】: Moumita Basu
【Abstract】: In recent years, several disaster events (e.g., earthquakes in Nepal-India and Italy, terror attacks in Paris and Brussels) have proven the crucial role of Online Social Media (OSM) in providing actionable situational information. However, in such media, the crucial information is typically obscured by a lot of insignificant information (e.g., personal opinions, prayers for victims). Moreover, when time is critical, owing to the rapid speed and huge volume of microblogs, it is infeasible for human subjects to go through all the tweets posted. Hence, automated IR methods are needed to extract the relevant information from the deluge of posts. Though several methodologies have been developed for tasks like classification and summarization of social media data posted during disasters [5], there are still several research challenges that need to be addressed to effectively utilise social media data (e.g., microblogs) for aiding disaster relief operations.
【Keywords】: disaster relief; microblogs; online social media
【Paper Link】 【Pages】:1387-1389
【Authors】: Ben Carterette
【Abstract】: The past 20 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference), and the increased practice of statistical hypothesis testing to determine whether measured improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work in information retrieval (IR) increasingly cannot be published unless it has been evaluated using a well-constructed test collection and shown to produce a statistically significant improvement over a good baseline. But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Most researchers and developers treat them as a "black box": evaluation results go in and a p-value comes out. But because significance is such an important factor in determining what research directions to explore and what is published, using p-values obtained without thought can have consequences for everyone doing research in IR. Ioannidis has argued that the main consequence in the biomedical sciences is that most published research findings are false; could that be the case in IR as well?
【Keywords】: evaluation; information retrieval; reproducibility; statistical significance testing
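Instead of treating the significance test as a black box, one can make the null hypothesis explicit. A paired randomization (permutation) test flips the sign of each per-topic score difference at random: under the null, either system is equally likely to win on any given topic. The per-topic scores below are invented for illustration:

```python
import random
import statistics

def permutation_test(a, b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic score differences.
    Returns the estimated p-value."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.mean(diffs))
    extreme = 0
    for _ in range(trials):
        # Under the null, each pairing's sign is arbitrary: flip it at random.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical per-topic average-precision scores for two systems on ten topics.
system_a = [0.42, 0.51, 0.38, 0.60, 0.47, 0.55, 0.49, 0.44, 0.58, 0.52]
system_b = [0.40, 0.45, 0.35, 0.52, 0.44, 0.50, 0.46, 0.41, 0.53, 0.49]
p_value = permutation_test(system_a, system_b)
```

Because the p-value here is just the fraction of random sign assignments at least as extreme as the observed difference, the test's assumptions are visible rather than hidden, which is the kind of understanding the tutorial argues for.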
【Paper Link】 【Pages】:1391-1393
【Authors】: Dhruv Arya ; Ganesh Venkataraman ; Aman Grover ; Krishnaram Kenthapadi
【Abstract】: Modern social media search and recommender systems require complex query formulation that incorporates both the user's context and their explicit search queries. Users expect these systems to be fast and to provide results relevant to their query and context. With millions of documents to choose from, these systems utilize a multi-pass scoring function to narrow the results and provide the most relevant ones to users. Candidate selection is required to sift through all the documents in the index and select a relevant few to be ranked by subsequent scoring functions. It thus becomes crucial to narrow down the document set while keeping relevant documents in the resulting set. In this tutorial we survey various candidate selection techniques and dive deep into case studies on a large-scale social media platform. In the latter half we provide a hands-on tutorial where we explore building these candidate selection models on a real-world dataset and see how to balance the tradeoff between relevance and latency.
【Keywords】: candidate selection; information retrieval; personalization; recommender systems; search
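The multi-pass pattern the tutorial surveys can be sketched in a few lines: a cheap index-wide filter produces a bounded candidate set, and the expensive scorer only ever sees those candidates. The toy index and both scoring functions below are illustrative, not the tutorial's models:

```python
def search(query_terms, index, k_candidates=100, k_final=10):
    """Two-pass retrieval: cheap term-overlap filter, then a costlier rerank."""
    # Pass 1: candidate selection by raw term overlap (cheap, runs index-wide).
    candidates = sorted(
        index, key=lambda doc: -len(query_terms & index[doc]))[:k_candidates]

    # Pass 2: a more expensive scorer, applied only to the surviving candidates.
    def expensive_score(doc):
        terms = index[doc]
        return len(query_terms & terms) / (len(terms) ** 0.5)  # length-normalised

    return sorted(candidates, key=expensive_score, reverse=True)[:k_final]

# Toy inverted view of three documents as term sets.
index = {
    "d1": {"social", "media", "search"},
    "d2": {"social", "media", "search", "recommendation", "jobs"},
    "d3": {"cooking"},
}
results = search({"social", "search"}, index)
```

The latency/relevance tradeoff lives in `k_candidates`: a smaller cut is faster but risks dropping documents the expensive pass would have ranked highly.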
【Paper Link】 【Pages】:1395-1397
【Authors】: Alex Deng ; Pavel Dmitriev ; Somit Gupta ; Ron Kohavi ; Paul Raff ; Lukas Vermeer
【Abstract】: The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to back-end algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now utilized to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications presents many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and on mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
【Keywords】: a/b testing; experimentation
【Paper Link】 【Pages】:1399-1401
【Authors】: ChengXiang Zhai
【Abstract】: Text data include all kinds of natural language text such as web pages, news articles, scientific literature, emails, enterprise documents, and social media posts. As text data continue to grow quickly, it is increasingly important to develop intelligent systems to help people manage and make use of vast amounts of text data ("big text data"). As a new family of effective general approaches to text data retrieval and analysis, probabilistic topic models, notably Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and many extensions of them, have been studied actively in the past decade with widespread applications. These topic models are powerful tools for extracting and analyzing latent topics contained in text data; they also provide a general and robust latent semantic representation of text data, thus improving many applications in information retrieval and text mining. Since they are general and robust, they can be applied to text data in any natural language and about any topics. This tutorial will systematically review the major research progress in probabilistic topic models and discuss their applications in text retrieval and text mining. The tutorial will provide (1) an in-depth explanation of the basic concepts, underlying principles, and the two basic topic models (i.e., PLSA and LDA) that have widespread applications, (2) a broad overview of all the major representative topic models (which are usually extensions of PLSA or LDA), and (3) a discussion of major challenges and future research directions. The tutorial should be appealing to anyone who would like to learn about topic models, how and why they work, their widespread applications, and the remaining research challenges to be solved, especially graduate students, researchers who want to develop new topic models, and practitioners who want to apply topic models to solve application problems.
Attendees are expected to have basic knowledge of probability and statistics.
【Keywords】: lda; plsa; probabilistic models; statistical language models; text mining
【Paper Link】 【Pages】:1403-1406
【Authors】: Tom Kenter ; Alexey Borisov ; Christophe Van Gysel ; Mostafa Dehghani ; Maarten de Rijke ; Bhaskar Mitra
【Abstract】: Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.
【Keywords】: language models; learning to rank; personalization; probabilistic retrieval models; question answering
【Paper Link】 【Pages】:1407-1410
【Authors】: Ian Soboroff
【Abstract】: This is a full-day tutorial on building and validating test collections. The intended audience is advanced students who find themselves in need of a test collection, or actually in the process of building a test collection, to support their own research. Not everyone can talk TREC, CLEF, INEX, or NTCIR into running a track to build the collection they need. The goal of this tutorial is to lay out issues, procedures, pitfalls, and practical advice.
【Keywords】: tutorial
【Paper Link】 【Pages】:1411-1414
【Authors】: Diane Kelly ; Anita Crescenzi
【Abstract】: This full-day tutorial provides general instruction about the design of controlled laboratory experiments that are conducted in order to better understand human information interaction and retrieval. Different data collection methods and procedures are described, with an emphasis on self-report measures and scales. This tutorial also introduces the use of statistical power analysis for sample size estimation and introduces and demonstrates two data analysis procedures, Multilevel Modeling and Structural Equation Modeling, that allow for examination of the whole set of variables present in interactive information retrieval (IIR) experiments, along with their various effect sizes. The goals of the tutorial are to increase participants' (1) understanding of the uses of controlled laboratory experiments with human participants; (2) understanding of the technical vocabulary and procedures associated with such experiments; and (3) confidence in conducting and evaluating IIR experiments. Ultimately, we hope our tutorial will increase research capacity and research quality in IR by providing instruction about best practices to those contemplating interactive IR experiments.
【Keywords】: data collection; information retrieval experiments with users; quantitative data analysis methods; self-report data; study design
【Paper Link】 【Pages】:1415-1418
【Authors】: Guido Zuccon ; Bevan Koopman
【Abstract】: The HS2017 tutorial will cover topics from an area of information retrieval (IR) with significant societal impact - health search. Whether it is searching patient records, helping medical professionals find best-practice evidence, or helping the public locate reliable and readable health information online, health search is a challenging area for IR research with an actively growing community and many open problems. This tutorial will provide attendees with a full stack of knowledge on health search, from understanding users and their problems to practical, hands-on sessions on current tools and techniques, current campaigns and evaluation resources, as well as important open questions and future directions.
【Keywords】: consumer health search; domain specific information retrieval; domain specific search; health information retrieval; health information seeking; health search; medical information retrieval
【Paper Link】 【Pages】:1419-1420
【Authors】: Enrique Amigó ; Hui Fang ; Stefano Mizzaro ; ChengXiang Zhai
【Abstract】: This is the first workshop on the emerging interdisciplinary research area of applying axiomatic thinking to information retrieval (IR) and related tasks. The workshop aims to help foster collaboration of researchers working on different perspectives of axiomatic thinking and encourage discussion and research on general methodological issues related to applying axiomatic thinking to IR and related tasks.
【Keywords】: axiomatic thinking
【Paper Link】 【Pages】:1421-1422
【Authors】: Muthu Kumar Chandrasekaran ; Kokil Jaidka ; Philipp Mayr
【Abstract】: The large scale of scholarly publications poses a challenge for scholars in information seeking and sensemaking. Bibliometrics, information retrieval (IR), text mining and NLP techniques could help in these search and look-up activities, but are not yet widely used. This workshop is intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, text mining and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The BIRNDL workshop at SIGIR 2017 will incorporate an invited talk, paper sessions and the third edition of the Computational Linguistics (CL) Scientific Summarization Shared Task.
【Keywords】: citation analysis; digital libraries; information extraction; information retrieval; natural language processing; scientometrics; summarization
【Paper Link】 【Pages】:1423-1424
【Authors】: Hideo Joho ; Lawrence Cavedon ; Jaime Arguello ; Milad Shokouhi ; Filip Radlinski
【Abstract】: Recent advances in commercial conversational services that allow naturally spoken and typed interaction, particularly for well-formulated questions and commands, have increased the need for more human-centric interactions in information retrieval. The First International Workshop on Conversational Approaches to Information Retrieval (CAIR`17) brings together academic and industrial researchers to create a forum for research on conversational approaches to search. A specific focus is on techniques that support complex and multi-turn user-machine dialogues for information access and retrieval, and multimodal interfaces for interacting with such systems. We invite submissions addressing all modalities of conversation, including speech-based, text-based, and multimodal interaction. We also welcome studies of human-human interaction (e.g., collaborative search) that can inform the design of conversational search applications, and work on evaluation of conversational approaches.
【Keywords】: conversational search; discourse and dialogue; information retrieval
【Paper Link】 【Pages】:1425-1426
【Authors】: Jon Degenhardt ; Surya Kallumadi ; Maarten de Rijke ; Luo Si ; Andrew Trotman ; Yinghui Xu
【Abstract】: eCommerce Information Retrieval has received little attention in the academic literature, yet it is an essential component of some of the largest web sites (such as eBay, Amazon, Airbnb, Alibaba, Taobao, Target, Facebook, and others). SIGIR has for several years seen sponsorship from these kinds of organizations, who clearly value the importance of research into Information Retrieval. This workshop brings together researchers and practitioners of eCommerce IR to discuss topics unique to it, to set a research agenda, and to examine how to build a dataset for research into this fascinating topic. eCommerce IR is ripe for research and has a unique set of problems. For example, in eCommerce search there may be no hypertext links between documents (products); there is a click stream, but more importantly, there is often a buy stream. eCommerce problems are wide in scope and range from user interaction modalities (the kinds of search seen when buying are different from those of web-page search, and it is not clear how shopping and buying relate to standard web-search interaction models) through to dynamic updates of a rapidly changing collection on auction sites, and the experiential nature of some products (such as Airbnb bookings).
【Keywords】: ecommerce; product search; recommendation
【Paper Link】 【Pages】:1427-1428
【Authors】: Laura Dietz ; Chenyan Xiong ; Edgar Meij
【Abstract】: Knowledge graphs have been used throughout the history of information retrieval for a variety of tasks. Technological advances in knowledge acquisition and alignment technology from the last few years gave rise to a body of new approaches for utilizing knowledge graphs in text retrieval tasks. It is therefore time to consolidate the community efforts in studying how knowledge graph technology can be employed in information retrieval systems in the most effective way. It is also time to start a dialogue with researchers working on knowledge acquisition and alignment to ensure that resulting technologies and algorithms meet the demands posed by information retrieval tasks. The goal of this workshop is to bring together a community of researchers and practitioners who are interested in using, aligning, and constructing knowledge graphs and similar semantic resources for information retrieval applications.
【Keywords】: entities; information retrieval; knowledge graphs
【Paper Link】 【Pages】:1429-1430
【Authors】: Leif Azzopardi ; Matt Crane ; Hui Fang ; Grant Ingersoll ; Jimmy Lin ; Yashar Moshfeghi ; Harrisen Scells ; Peilin Yang ; Guido Zuccon
【Abstract】: As an empirical discipline, information access and retrieval research requires substantial software infrastructure to index and search large collections. This workshop is motivated by the desire to better align information retrieval research with the practice of building search applications from the perspective of open-source information retrieval systems. Our goal is to promote the use of Lucene for information access and retrieval research.
【Keywords】: information retrieval toolkits; open-source software
【Paper Link】 【Pages】:1431-1432
【Authors】: Nick Craswell ; W. Bruce Croft ; Maarten de Rijke ; Jiafeng Guo ; Bhaskar Mitra
【Abstract】: In recent years, deep neural networks have yielded significant performance improvements in application areas such as speech recognition, computer vision, and machine translation. This has led to expectations in the information retrieval (IR) community that these novel machine learning approaches are likely to demonstrate a similar scale of breakthroughs on IR tasks within the next couple of years. In the Neu-IR (pronounced "new IR") 2016 workshop, however, there was a growing concern that the lack of availability of large scale training and evaluation datasets may be hindering the research community from making adequate progress in this area. It was also highlighted that the community would benefit from establishing a shared public repository of neural IR models and shared evaluation resources for better reproducibility and speed of experimentation. After the first successful Neu-IR workshop at SIGIR 2016, our goal this year will be to host a highly interactive full-day workshop to bring the neural IR community together to specifically address these key challenges facing this line of research. The workshop will request the community to submit proposals on generating large scale benchmark collections, building a shared model repository, and standardizing frameworks appropriate for evaluating deep neural network models. In addition, the workshop will provide a forum for the growing community of IR researchers to present their recent (published and unpublished) work involving (shallow or deep) neural network based approaches in an interactive poster session.
【Keywords】: deep learning; information retrieval; neural networks
【Paper Link】 【Pages】:1433-1434
【Authors】: Key-Sun Choi ; Teruko Mitamura ; Piek Vossen ; Jin-Dong Kim ; Axel-Cyrille Ngonga Ngomo
【Abstract】: Over the past years, several challenges and calls for research projects have pointed out the dire need for pushing natural language interfaces. In this context, the importance of Semantic Web data as a premier knowledge source is rapidly increasing. But we are still far from having accurate natural language interfaces that allow handling complex information needs in a user-centric and highly performant manner. The development of such interfaces requires collaboration across a range of different fields, including natural language processing, information extraction, knowledge base construction and population, reasoning, and question answering. With the goal of joining forces in the collaborative development of natural language QA systems, the second OKBQA workshop is organized within the 40th SIGIR conference.
【Keywords】: knowledge base; natural language processing; question-answering