39. SIGIR 2016: Pisa, Italy

Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016. ACM

Paper Num: 234 || Session Num: 32

Keynote I 1

1. Understanding Human Language: Can NLP and Deep Learning Help?

【Pages】:1

【Authors】: Christopher D. Manning

【Abstract】: There is a lot of overlap between the core problems of information retrieval (IR) and natural language processing (NLP). An IR system gains from understanding a user need and from understanding documents, and hence being able to determine whether a document has information that satisfies the user need. Much of NLP is about the same thing: Natural language understanding aims to understand the meaning of questions and documents and meaning relationships. The exciting recent application of deep learning approaches in NLP has brought new tools for effectively understanding language semantics. In principle, there should be a lot of synergy, though in practice the concerns of IR on large systems and macro-scale understanding have tended to contrast with the emphasis in NLP on language structure and micro-scale understanding. My talk will emphasize the two topics of how NLP can contribute to understanding textual relationships and how deep learning approaches substantially aid in this goal. One basic, and very successful, tool has been the new generation of distributed word representations: neural word embeddings. However, beyond just word meanings, we need to understand how to compose the meanings of larger pieces of text. Two requirements for that are good ways to understand the structure of human language utterances and ways to compose their meanings. Deep learning methods can help with both tasks. Finally, we need to understand relationships between pieces of text, to be able to do tasks such as Natural Language Inference (or Recognizing Textual Entailment) and Question Answering, and I will look at some of our recent work in these areas, both with and without the help of neural networks.

【Keywords】: compositionality; deep learning; natural language inference; natural language processing; question answering; recognizing textual entailment; word vectors

Keynote II 1

2. Big Data in Climate: Opportunities and Challenges for Machine Learning.

【Pages】:3

【Authors】: Vipin Kumar

【Abstract】: This talk will present an overview of research being done in a large interdisciplinary project on the development of novel data mining and machine learning approaches for analyzing massive amounts of climate and ecosystem data now available from satellite and ground-based sensors, and physics-based climate model simulations. These information-rich data sets offer huge potential for monitoring, understanding, and predicting the behavior of the Earth's ecosystem and for advancing the science of global change. This talk will discuss challenges in analyzing such data sets and some of our research results in mapping the dynamics of surface water globally as well as detecting deforestation and fires in tropical forests using data from Earth observing satellites.

【Keywords】: big data; identifying tropical forest fires; monitoring global water dynamics; predictive models for imperfect, incomplete, and heterogeneous data; rare class detection; spatio-temporal data

Evaluation I 3

3. Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015.

【Pages】:5-14

【Authors】: Tetsuya Sakai

【Abstract】: We conducted a systematic review of 840 SIGIR full papers and 215 TOIS papers published between 2006 and 2015. The original objective of the study was to identify IR effectiveness experiments that are seriously underpowered (i.e., the sample size is far too small so that the probability of missing a real difference is extremely high) or overpowered (i.e., the sample size is so large that a difference will be considered statistically significant even if the actual effect size is extremely small). However, it quickly became clear to us that many IR effectiveness papers either lack significance testing or fail to report p-values and/or test statistics, which prevents us from conducting power analysis. Hence we first report on how IR researchers (fail to) report on significance test results, what types of tests they use, and how the reporting practices may have changed over the last decade. From those papers that reported enough information for us to conduct power analysis, we identify extremely overpowered and underpowered experiments, as well as appropriate sample sizes for future experiments. The raw results of our systematic survey of 1,055 papers and our R scripts for power analysis are available online. Our hope is that this study will help improve the reporting practices and experimental designs of future IR effectiveness studies.

【Keywords】: effect sizes; evaluation; power analysis; sample sizes; statistical power; statistical significance; systematic review
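The notions of underpowered and overpowered experiments in the abstract above can be illustrated with a small calculation. The following is a minimal sketch, assuming a two-sided test on paired per-topic score differences and a normal approximation to the test statistic; it is not the authors' R scripts, and the function names `approx_power` and `min_sample_size` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided test on n paired differences.

    effect_size is the standardized mean difference (mean / std of the
    per-topic differences). Uses a normal approximation, which is an
    assumption of this sketch, not an exact t-test power computation.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect_size * sqrt(n)     # expected value of the test statistic
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def min_sample_size(effect_size, target_power=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches target_power."""
    if effect_size == 0:
        raise ValueError("effect_size must be nonzero")
    n = 2
    while approx_power(effect_size, n, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    print(round(approx_power(0.5, 10), 3))    # few topics: low power
    print(round(approx_power(0.01, 200000), 3))  # tiny effect, huge sample
    print(min_sample_size(0.5))
```

With a very large sample, even a standardized effect of 0.01 is detected with high probability, which is the overpowered case; with few topics, a moderate effect is often missed, which is the underpowered case.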

4. Bayesian Performance Comparison of Text Classifiers.

【Pages】:15-24

【Authors】: Dell Zhang ; Jun Wang ; Emine Yilmaz ; Xiaoling Wang ; Yuxin Zhou

【Abstract】: How can we know whether one classifier is really better than the other? In the area of text classification, since the publication of Yang and Liu's seminal SIGIR-1999 paper, it has become a standard practice for researchers to apply null-hypothesis significance testing (NHST) on their experimental results in order to establish the superiority of a classifier. However, such a frequentist approach has a number of inherent deficiencies and limitations, e.g., the inability to accept the null hypothesis (that the two classifiers perform equally well), the difficulty of comparing commonly-used multivariate performance measures like F1 scores instead of accuracy, and so on. In this paper, we propose a novel Bayesian approach to the performance comparison of text classifiers, and argue its advantages over the traditional frequentist approach based on t-test etc. In contrast to the existing probabilistic model for F1 scores which is unpaired, our proposed model takes the correlation between classifiers into account and thus achieves greater statistical power. Using several typical text classification algorithms and a benchmark dataset, we demonstrate that our approach provides rich information about the difference between two classifiers' performances.

【Keywords】: bayesian inference; hypothesis testing; performance evaluation; text classification

5. A General Linear Mixed Models Approach to Study System Component Effects.

【Pages】:25-34

【Authors】: Nicola Ferro ; Gianmaria Silvello

【Abstract】: Topic variance has a greater effect on performances than system variance but it cannot be controlled by system developers who can only try to cope with it. On the other hand, system variance is important on its own, since it is what system developers may affect directly by changing system components and it determines the differences among systems. In this paper, we address the problem of studying system variance in order to better understand how much system components contribute to overall performances. To this end, we propose a methodology based on the General Linear Mixed Model (GLMM) to develop statistical models able to isolate system variance, component effects as well as their interaction by relying on a Grid of Points (GoP) containing all the combinations of analysed components. We apply the proposed methodology to the analysis of TREC Ad-hoc data in order to show how it works and discuss some interesting outcomes of this new kind of analysis. Finally, we extend the analysis to different evaluation measures, showing how they impact on the sources of variance.

【Keywords】: component effects; component-based evaluation; generalized linear mixed models; grid of points

Speech and Conversation Systems 3

6. Searching by Talking: Analysis of Voice Queries on Mobile Web Search.

【Pages】:35-44

【Authors】: Ido Guy

【Abstract】: The growing popularity of mobile search and the advancement in voice recognition technologies have opened the door for web search users to speak their queries, rather than type them. While this kind of voice search is still in its infancy, it is gradually becoming more widespread. In this paper, we examine the logs of a commercial search engine's mobile interface, and compare the spoken queries to the typed-in queries. We place special emphasis on the semantic and syntactic characteristics of the two types of queries. Our analysis suggests that voice queries focus more on audio-visual content and question answering, and less on social networking and adult domains. We also conduct an empirical evaluation showing that the language of voice queries is closer to natural language than typed queries. Our analysis reveals further differences between voice and text search, which have implications for the design of future voice-enabled search tools.

【Keywords】: mobile search; spoken queries; voice search

7. Predicting User Satisfaction with Intelligent Assistants.

【Pages】:45-54

【Authors】: Julia Kiseleva ; Kyle Williams ; Ahmed Hassan Awadallah ; Aidan C. Crook ; Imed Zitouni ; Tasos Anastasakos

【Abstract】: There is a rapid growth in the use of voice-controlled intelligent personal assistants on mobile devices, such as Microsoft's Cortana, Google Now, and Apple's Siri. They significantly change the way users interact with search systems, not only because of the voice control use and touch gestures, but also due to the dialogue-style nature of the interactions and their ability to preserve context across different queries. Predicting success and failure of such search dialogues is a new problem, and an important one for evaluating and further improving intelligent assistants. While clicks in web search have been extensively used to infer user satisfaction, their significance in search dialogues is lower due to the partial replacement of clicks with voice control, direct and voice answers, and touch gestures. In this paper, we propose an automatic method to predict user satisfaction with intelligent assistants that exploits all the interaction signals, including voice commands and physical touch gestures on the device. First, we conduct an extensive user study to measure user satisfaction with intelligent assistants, and simultaneously record all user interactions. Second, we show that the dialogue style of interaction makes it necessary to evaluate the user experience at the overall task level as opposed to the query level. Third, we train a model to predict user satisfaction, and find that interaction signals that capture the user reading patterns have a high impact: when including all available interaction signals, we are able to improve the prediction accuracy of user satisfaction from 71% to 81% over a baseline that utilizes only click and query features.

【Keywords】: intelligent assistant; mobile search; spoken dialogue system; user experience; user satisfaction; user study

8. Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System.

【Pages】:55-64

【Authors】: Rui Yan ; Yiping Song ; Hua Wu

【Abstract】: To establish an automatic conversation system between humans and computers is regarded as one of the most hardcore problems in computer science, which involves interdisciplinary techniques in information retrieval, natural language processing, artificial intelligence, etc. The challenges lie in how to respond so as to maintain a relevant and continuous conversation with humans. Along with the prosperity of Web 2.0, we are now able to collect massive amounts of conversational data, which are publicly available. This presents a great opportunity to launch automatic conversation systems. Owing to the diversity of Web resources, a retrieval-based conversation system will be able to find at least some responses from the massive repository for any user inputs. Given a human-issued message, i.e., query, our system would provide a reply after adequate training and learning of how to respond. In this paper, we propose a retrieval-based conversation system with the deep learning-to-respond schema through a deep neural network framework driven by web data. The proposed model is general and unified for different conversation scenarios in open domain. We incorporate the impact of multiple data inputs, and formulate various features and factors with optimization into the deep learning framework. In the experiments, we investigate the effectiveness of the proposed deep neural network structures with better combinations of all different evidence. We demonstrate significant performance improvement against a series of standard and state-of-the-art baselines in terms of p@1, MAP, nDCG, and MRR for conversational purposes.

【Keywords】: contextual modeling; conversation system; deep neural networks; learning-to-respond
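The abstract above reports results in terms of p@1, MAP, and MRR. As a reminder of what these measures compute, here is a minimal sketch assuming binary relevance labels listed in ranked order (1 = relevant response, 0 = not relevant). The function names are hypothetical, and average precision is normalized by the number of relevant items that appear in the ranked list, which assumes every relevant candidate was retrieved.

```python
def precision_at_1(rels):
    """1.0 if the top-ranked item is relevant, else 0.0."""
    return float(bool(rels[0])) if rels else 0.0


def reciprocal_rank(rels):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(rels):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over several ranked lists (MAP, MRR, ...)."""
    return sum(metric(r) for r in runs) / len(runs)


if __name__ == "__main__":
    runs = [[0, 1, 0, 1], [1, 0, 0, 0]]
    print(mean_over_queries(precision_at_1, runs))    # P@1
    print(mean_over_queries(reciprocal_rank, runs))   # MRR
    print(mean_over_queries(average_precision, runs)) # MAP
```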

Retrieval Models 3

9. Document Retrieval Using Entity-Based Language Models.

【Pages】:65-74

【Authors】: Hadas Raviv ; Oren Kurland ; David Carmel

【Abstract】: We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by some entity-linking tool. The key principle of the language models is accounting, simultaneously, for the uncertainty inherent in the entity-markup process and the balance between using entity-based and term-based information. Empirical evaluation demonstrates the merits of using the language models for retrieval. For example, the performance transcends that of a state-of-the-art term proximity method. We also show that the language models can be effectively used for cluster-based document retrieval and query expansion.

【Keywords】: document retrieval; entity-based language models

10. Engineering Quality and Reliability in Technology-Assisted Review.

【Pages】:75-84

【Authors】: Gordon V. Cormack ; Maura R. Grossman

【Abstract】: The objective of technology-assisted review ("TAR") is to find as much relevant information as possible with reasonable effort. Quality is a measure of the extent to which a TAR method achieves this objective, while reliability is a measure of how consistently it achieves an acceptable result. We are concerned with how to define, measure, and achieve high quality and high reliability in TAR. When quality is defined using the traditional goal-post method of specifying a minimum acceptable recall threshold, the quality and reliability of a TAR method are both, by definition, equal to the probability of achieving the threshold. Assuming this definition of quality and reliability, we show how to augment any TAR method to achieve guaranteed reliability, for a quantifiable level of additional review effort. We demonstrate this result by augmenting the TAR method supplied as the baseline model implementation for the TREC 2015 Total Recall Track, measuring reliability and effort for 555 topics from eight test collections. While our empirical results corroborate our claim of guaranteed reliability, we observe that the augmentation strategy may entail disproportionate effort, especially when the number of relevant documents is low. To address this limitation, we propose stopping criteria for the model implementation that may be applied with no additional review effort, while achieving empirical reliability that compares favorably to the provably reliable method. We further argue that optimizing reliability according to the traditional goal-post method is inconsistent with certain subjective aspects of quality, and that optimizing a Taguchi quality loss function may be more apt.

【Keywords】: continuous active learning; e-discovery; electronic discovery; predictive coding; quality; relevance feedback; reliability; systematic review; technology-assisted review; test collections
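Under the goal-post definition described in the abstract above, reliability is the probability that a review reaches a minimum acceptable recall. A minimal sketch of the empirical version of that quantity follows; the 0.7 threshold and the per-topic counts are hypothetical illustration values, not figures from the paper.

```python
def recall(relevant_found, relevant_total):
    """Fraction of all relevant documents that the review found."""
    if relevant_total == 0:
        raise ValueError("topic has no relevant documents")
    return relevant_found / relevant_total


def empirical_reliability(recalls, threshold=0.7):
    """Share of topics (or runs) whose recall meets the goal-post threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    if not recalls:
        raise ValueError("need at least one recall value")
    return sum(r >= threshold for r in recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical (found, total) counts for four topics.
    topics = [(90, 100), (13, 20), (3, 4), (8, 10)]
    recalls = [recall(found, total) for found, total in topics]
    print(recalls)
    print(empirical_reliability(recalls))
```

A method can have high average recall and still have low reliability in this sense if it misses the threshold on many individual topics.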

11. A Sequential Decision Formulation of the Interface Card Model for Interactive IR.

【Pages】:85-94

【Authors】: Yinan Zhang ; ChengXiang Zhai

【Abstract】: The Interface Card model is a promising new theoretical framework for modeling and optimizing interactive retrieval interfaces, but how to systematically instantiate it to solve concrete interface optimization problems remains an open challenge. We propose a novel formulation of the Interface Card model based on sequential decision theory, leading to a general framework for formal modeling of user states and stopping actions. The proposed framework naturally connects optimization of interactive retrieval with Markov Decision Processes and Partially Observable Markov Decision Processes, and enables the use of reinforcement learning algorithms for optimizing interactive retrieval interfaces. Simulation and user study experiments demonstrate the effectiveness of the proposed model in automatically adjusting the interface layout in adaptation to inferred user stopping tendencies in addition to user interaction and screen size.

【Keywords】: interface card model

Learning-to-rank 3

12. Generalized BROOF-L2R: A General Framework for Learning to Rank Based on Boosting and Random Forests.

【Pages】:95-104

【Authors】: Clebson C. A. de Sá ; Marcos André Gonçalves ; Daniel Xavier de Sousa ; Thiago Salles

【Abstract】: The task of retrieving information that really matters to the users is considered hard when taking into consideration the large and increasing amount of available information. To improve the effectiveness of this information seeking task, systems have relied on the combination of many predictors by means of machine learning methods, a task also known as learning to rank (L2R). The most effective learning methods for this task are based on ensembles of trees (e.g., Random Forests) and/or boosting techniques (e.g., RankBoost, MART, LambdaMART). In this paper, we propose a general framework that smoothly combines ensembles of additive trees, specifically Random Forests, with Boosting in an original way for the task of L2R. In particular, we exploit out-of-bag samples as well as a selective weight updating strategy (according to the out-of-bag samples) to effectively enhance the ranking performance. We instantiate such a general framework by considering different loss functions, different ways of weighting the weak learners as well as different types of weak learners. In our experiments our rankers were able to outperform all state-of-the-art baselines in all considered datasets, using just a small percentage of the original training set and faster convergence rates.

【Keywords】: boosting; learning to rank; random forests

13. An Optimization Framework for Remapping and Reweighting Noisy Relevance Labels.

【Pages】:105-114

【Authors】: Yury Ustinovskiy ; Valentina Fedorova ; Gleb Gusev ; Pavel Serdyukov

【Abstract】: Relevance labels are an essential part of any learning to rank framework. The rapid development of crowdsourcing platforms led to a significant reduction of the cost of manual labeling. This makes it possible to collect very large sets of labeled documents to train a ranking algorithm. However, relevance labels acquired via crowdsourcing are typically coarse and noisy, so certain consensus models are used to measure the quality of labels and to reduce the noise. This noise is likely to affect a ranker trained on such labels, and, since none of the existing consensus models directly optimizes ranking quality, one has to apply some heuristics to utilize the output of a consensus model in a ranking algorithm, e.g., to use majority voting among workers to get consensus labels. The major goal of this paper is to unify existing approaches to consensus modeling and noise reduction within a learning to rank framework. Namely, we present a machine learning algorithm aimed at improving the performance of a ranker trained on a crowdsourced dataset by proper remapping of labels and reweighting of samples. In the experimental part, we use several characteristics of workers/labels extracted via various consensus models in order to learn the remapping and reweighting functions. Our experiments on a large-scale dataset demonstrate that we can significantly improve state-of-the-art machine-learning algorithms by incorporating our framework.

【Keywords】: IR theory; consensus models; learning to rank

14. Learning to Rank with Selection Bias in Personal Search.

【Pages】:115-124

【Authors】: Xuanhui Wang ; Michael Bendersky ; Donald Metzler ; Marc Najork

【Abstract】: Click-through data has proven to be a critical resource for improving search ranking quality. Though a large amount of click data can be easily collected by search engines, various biases make it difficult to fully leverage this type of data. In the past, many click models have been proposed and successfully used to estimate the relevance for individual query-document pairs in the context of web search. These click models typically require a large quantity of clicks for each individual pair and this makes them difficult to apply in systems where click data is highly sparse due to personalized corpora and information needs, e.g., personal search. In this paper, we study the problem of how to leverage sparse click data in personal search and introduce a novel selection bias problem and address it in the learning-to-rank framework. This paper proposes a few bias estimation methods, including a novel query-dependent one that captures queries with similar results and can successfully deal with sparse data. We empirically demonstrate that learning-to-rank that accounts for query-dependent selection bias yields significant improvements in search effectiveness through online experiments with one of the world's largest personal search engines.

【Keywords】: learning-to-rank; personal search; selection bias

Music and Math 3

15. On Effective Personalized Music Retrieval by Exploring Online User Behaviors.

【Pages】:125-134

【Authors】: Zhiyong Cheng ; Jialie Shen ; Steven C. H. Hoi

【Abstract】: In this paper, we study the problem of personalized text based music retrieval which takes users' music preferences on songs into account via the analysis of online listening behaviours and social tags. Toward this goal, a novel Dual-Layer Music Preference Topic Model (DL-MPTM) is proposed to construct a latent music interest space and characterize the correlations among (user, song, term). Based on the DL-MPTM, we further develop an effective personalized music retrieval system. To evaluate the system's performance, extensive experimental studies have been conducted over two test collections to compare the proposed method with the state-of-the-art music retrieval methods. The results demonstrate that our proposed method significantly outperforms those approaches in terms of personalized search accuracy.

【Keywords】: personalized; semantic music retrieval; topic model

16. Semantification of Identifiers in Mathematics for Better Math Information Retrieval.

【Pages】:135-144

【Authors】: Moritz Schubotz ; Alexey Grigorev ; Marcus Leich ; Howard S. Cohl ; Norman Meuschke ; Bela Gipp ; Abdou S. Youssef ; Volker Markl

【Abstract】: Mathematical formulae are essential in science, but face challenges of ambiguity, due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we disambiguate mathematical identifiers. By regarding formulae and natural text as one monolithic information source, we are able to extract the semantics of identifiers in a process we term Mathematical Language Processing (MLP). As scientific communities tend to establish standard (identifier) notations, we use the document domain to infer the actual meaning of an identifier. Therefore, we adapt the software development concept of namespaces to mathematical notation. Thus, we learn namespace definitions by clustering the MLP results and mapping those clusters to subject classification schemata. In addition, this gives fundamental insights into the usage of mathematical notations in science, technology, engineering and mathematics. Our gold standard based evaluation shows that MLP extracts relevant identifier-definitions. Moreover, we discover that identifier namespaces improve the performance of automated identifier-definition extraction, and elevate it to a level that cannot be achieved within the document context alone.

【Keywords】: MIR; MLP; definitions; identifiers; mathematical information retrieval; mathematical knowledge management; mathematical language processing; mathematics; mathoid; mathosphere; namespace discovery; wikipedia

17. Multi-Stage Math Formula Search: Using Appearance-Based Similarity Metrics at Scale.

【Pages】:145-154

【Authors】: Richard Zanibbi ; Kenny Davila ; Andrew Kane ; Frank Wm. Tompa

【Abstract】: When using a mathematical formula for search (query-by-expression), the suitability of retrieved formulae often depends more upon symbol identities and layout than deep mathematical semantics. Using a Symbol Layout Tree representation for formula appearance, we propose the Maximum Subtree Similarity (MSS) for ranking formulae based upon the subexpression whose symbols and layout best match a query formula. Because MSS is too expensive to apply against a complete collection, the Tangent-3 system first retrieves expressions using an inverted index over symbol pair relationships, ranking hits using the Dice coefficient; the top-k formulae are then re-ranked by MSS. Tangent-3 obtains state-of-the-art performance on the NTCIR-11 Wikipedia formula retrieval benchmark, and is efficient in terms of both space and time. Retrieval systems for other graphical forms, including chemical diagrams, flowcharts, figures, and tables, may benefit from adopting this approach.

【Keywords】: inverted index; mathematical information retrieval (MIR); query-by-expression; subtree similarity
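The two-stage pattern described in the abstract above (cheap candidate retrieval from an inverted index scored by the Dice coefficient, then re-ranking only the top-k with a costlier measure) can be sketched as follows. This is an illustration under stated assumptions, not the Tangent-3 implementation: formulae are represented as plain sets of hypothetical symbol-pair tuples, and `rerank_score` is a caller-supplied placeholder standing in for an expensive similarity such as MSS.

```python
import heapq
from collections import defaultdict


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def build_index(formulas):
    """Inverted index: symbol pair -> ids of formulae containing it."""
    index = defaultdict(set)
    for fid, pairs in formulas.items():
        for pair in pairs:
            index[pair].add(fid)
    return index


def two_stage_search(query, formulas, index, rerank_score, k=2):
    """Stage 1: top-k candidates by Dice. Stage 2: re-rank them."""
    candidates = set()
    for pair in query:
        candidates |= index.get(pair, set())
    top_k = heapq.nlargest(k, candidates,
                           key=lambda fid: dice(query, formulas[fid]))
    return sorted(top_k,
                  key=lambda fid: rerank_score(query, formulas[fid]),
                  reverse=True)


# Hypothetical toy collection; each tuple is (symbol, neighbouring symbol).
formulas = {
    "f1": {("x", "2"), ("x", "+"), ("+", "y")},
    "f2": {("x", "2")},
    "f3": {("a", "b")},
    "f4": {("x", "2"), ("x", "+"), ("+", "y"), ("y", "3"), ("y", "=")},
}
index = build_index(formulas)
query = {("x", "2"), ("x", "+"), ("+", "y")}

if __name__ == "__main__":
    # Placeholder re-ranker that simply prefers larger formulae.
    print(two_stage_search(query, formulas, index,
                           rerank_score=lambda q, f: len(f)))
```

The design point is that the expensive scorer only ever sees k items, so its cost is independent of collection size.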

Microblog 3

18. Explainable User Clustering in Short Text Streams.

【Pages】:155-164

【Authors】: Yukun Zhao ; Shangsong Liang ; Zhaochun Ren ; Jun Ma ; Emine Yilmaz ; Maarten de Rijke

【Abstract】: User clustering has been studied from different angles: behavior-based, to identify similar browsing or search patterns, and content-based, to identify shared interests. Once user clusters have been found, they can be used for recommendation and personalization. So far, content-based user clustering has mostly focused on static sets of relatively long documents. Given the dynamic nature of social media, there is a need to dynamically cluster users in the context of short text streams. User clustering in this setting is more challenging than in the case of long documents as it is difficult to capture the users' dynamic topic distributions in sparse data settings. To address this problem, we propose a dynamic user clustering topic model (or UCT for short). UCT adaptively tracks changes of each user's time-varying topic distribution based both on the short texts the user posts during a given time period and on the previously estimated distribution. To infer changes, we propose a Gibbs sampling algorithm where a set of word-pairs from each user is constructed for sampling. The clustering results are explainable and human-understandable, in contrast to many other clustering algorithms. For evaluation purposes, we work with a dataset consisting of users and tweets from each user. Experimental results demonstrate the effectiveness of our proposed clustering model compared to state-of-the-art baselines.

【Keywords】: short text processing; user clustering; user topic modeling

19. Topic Modeling for Short Texts with Auxiliary Word Embeddings.

【Pages】:165-174

【Authors】: Chenliang Li ; Haoran Wang ; Zhiqian Zhang ; Aixin Sun ; Zongyang Ma

【Abstract】: For many applications that require semantic understanding of short texts, inferring discriminative and coherent latent topics from short texts is a critical and fundamental task. Conventional topic models largely rely on word co-occurrences to derive topics from a collection of documents. However, due to the length of each document, short texts are much more sparse in terms of word co-occurrences. Data sparsity therefore becomes a bottleneck for conventional topic models to achieve good results on short texts. On the other hand, when a human being interprets a piece of short text, the understanding is not solely based on its content words, but also on her background knowledge (e.g., semantically related words). The recent advances in word embedding offer effective learning of word semantic relations from a large corpus. Exploiting such auxiliary word embeddings to enrich topic modeling for short texts is the main focus of this paper. To this end, we propose a simple, fast, and effective topic model for short texts, named GPU-DMM. Based on the Dirichlet Multinomial Mixture (DMM) model, GPU-DMM promotes the semantically related words under the same topic during the sampling process by using the generalized Polya urn (GPU) model. In this sense, the background knowledge about word semantic relatedness learned from millions of external documents can be easily exploited to improve topic modeling for short texts. Through extensive experiments on two real-world short text collections in two languages, we show that GPU-DMM achieves comparable or better topic representations than state-of-the-art models, measured by topic coherence. The learned topic representation leads to the best accuracy in a text classification task, which is used as an indirect evaluation.

【Keywords】: short texts; topic model; word embeddings

20. Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams.

【Pages】:175-184

【Authors】: Xin Qian ; Jimmy J. Lin ; Adam Roegiest

【Abstract】: We propose and validate a novel interleaved evaluation methodology for two complementary information seeking tasks on document streams: retrospective summarization and prospective notification. In the first, the user desires relevant and non-redundant documents that capture important aspects of an information need. In the second, the user wishes to receive timely, relevant, and non-redundant update notifications for a standing information need. Despite superficial similarities, interleaved evaluation methods for web ranking cannot be directly applied to these tasks; for example, existing techniques do not account for temporality or redundancy. Our proposed evaluation methodology consists of two components: a temporal interleaving strategy and a heuristic for credit assignment to handle redundancy. By simulating user interactions with interleaved results on submitted runs to the TREC 2014 tweet timeline generation (TTG) task and the TREC 2015 real-time filtering task, we demonstrate that our methodology yields system comparisons that accurately match the result of batch evaluations. Analysis further reveals weaknesses in current batch evaluation methodologies to suggest future directions for research.

【Keywords】: TREC; microblogs; push notifications; summarization; tweets

21. Learning Query and Document Relevance from a Web-scale Click Graph.

【Pages】:185-194

【Authors】: Shan Jiang ; Yuening Hu ; Changsung Kang ; Tim Daly Jr. ; Dawei Yin ; Yi Chang ; ChengXiang Zhai

【Abstract】: Click-through logs over query-document pairs provide rich and valuable information for multiple tasks in information retrieval. This paper proposes a vector propagation algorithm on the click graph to learn vector representations for both queries and documents in the same semantic space. The proposed approach incorporates both click and content information, and the produced vector representations can directly improve ranking performance for queries and documents that have been observed in the click log. For new queries and documents that are not in the click log, we propose a two-step framework to generate the vector representation, which significantly improves the coverage of our vectors while maintaining the high quality. Experiments on Web-scale search logs from a major commercial search engine demonstrate the effectiveness and scalability of the proposed method. Evaluation results show that NDCG scores are significantly improved against multiple baselines by using the proposed method both as a ranking model and as a feature in a learning-to-rank framework.

【Keywords】: click-through bipartite graph; query-document relevance; vector generation; vector propagation; web search

22. Click-based Hot Fixes for Underperforming Torso Queries.

Paper Link】 【Pages】:195-204

【Authors】: Masrour Zoghi ; Tomás Tunys ; Lihong Li ; Damien Jose ; Junyan Chen ; Chun Ming Chin ; Maarten de Rijke

【Abstract】: Ranking documents using their historical click-through rate (CTR) can improve relevance for frequently occurring queries, i.e., so-called head queries. It is difficult to use such click signals on non-head queries as they receive fewer clicks. In this paper, we address the challenge of dealing with torso queries on which the production ranker is performing poorly. Torso queries are queries that occur frequently enough so that they are not considered as tail queries and yet not frequently enough to be head queries either. They comprise a large portion of most commercial search engines' traffic, so the presence of a large number of underperforming torso queries can harm the overall performance significantly. We propose a practical method for dealing with such cases, drawing inspiration from the literature on learning to rank (LTR). Our method requires relatively few clicks from users to derive a strong re-ranking signal by comparing document relevance between pairs of documents instead of using absolute numbers of clicks per document. By infusing a modest amount of exploration into the ranked lists produced by a production ranker and extracting preferences between documents, we obtain substantial improvements over the production ranker in terms of page-level online metrics. We use an exploration dataset consisting of real user clicks from a large-scale commercial search engine to demonstrate the effectiveness of the method. We conduct further experimentation on public benchmark data using simulated clicks to gain insight into the inner workings of the proposed method. Our results indicate a need for LTR methods that make more explicit use of the query and other contextual information.

【Keywords】: learning to rank; underperforming queries

23. A Context-aware Time Model for Web Search.

Paper Link】 【Pages】:205-214

【Authors】: Alexey Borisov ; Ilya Markov ; Maarten de Rijke ; Pavel Serdyukov

【Abstract】: In web search, information about times between user actions has been shown to be a good indicator of users' satisfaction with the search results. Existing work uses the mean values of the observed times, or fits probability distributions to the observed times. This implies a context-independence assumption that the time elapsed between a pair of user actions does not depend on the context in which the first action takes place. We validate this assumption using logs of a commercial web search engine and discover that it does not always hold. For between 37% and 80% of query-result pairs, depending on the number of observations, the distributions of click dwell times have statistically significant differences in query sessions for which a given result (i) is the first item to be clicked and (ii) is not the first. To account for this context bias effect, we propose a context-aware time model (CATM). The CATM allows us (i) to predict times between user actions in contexts in which these actions were not observed, and (ii) to compute context-independent estimates of the times by predicting them in predefined contexts. Our experimental results show that the CATM provides better means than existing methods to predict and interpret times between user actions.

【Keywords】: time modeling; user behavior; web search

Question Answering 3

24. Novelty based Ranking of Human Answers for Community Questions.

Paper Link】 【Pages】:215-224

【Authors】: Adi Omari ; David Carmel ; Oleg Rokhlenko ; Idan Szpektor

【Abstract】: Questions and their corresponding answers within a community based question answering (CQA) site are frequently presented as top search results for Web search queries and viewed by millions of searchers daily. The number of answers for CQA questions ranges from a handful to dozens, and a searcher would typically be interested in the different suggestions presented in various answers for a question. Yet, especially when many answers are provided, the viewer may not want to sift through all answers but to read only the top ones. Prior work on answer ranking in CQA considered the qualitative notion of each answer separately, mainly whether it should be marked as best answer. We propose to promote CQA answers not only by their relevance to the question but also by the diversification and novelty qualities they hold compared to other answers. Specifically, we aim at ranking answers by the number of new aspects they introduce with respect to higher ranked answers (novelty), on top of their relevance estimation. This approach is common in Web search and information retrieval, yet it has not been addressed before in the CQA setting, which is quite different from classic document retrieval. We propose a novel answer ranking algorithm that borrows ideas from aspect ranking and multi-document summarization, but adapts them to our scenario. Answers are ranked in a greedy manner, taking into account their relevance to the question as well as their novelty compared to higher ranked answers and their coverage of important aspects. An experiment over a collection of Health questions, using a manually annotated gold-standard dataset, shows that considering novelty for answer ranking improves the quality of the ranked answer list.

【Keywords】: community-based question answering; diversification; novelty
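
The greedy relevance-plus-novelty ranking described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each answer already comes with a relevance score and a set of extracted aspects, and the names `greedy_novelty_rank`, `relevance`, `aspects`, and `trade_off` are hypothetical.

```python
def greedy_novelty_rank(relevance, aspects, trade_off=0.5):
    """Greedily rank answers by relevance plus the share of aspects not yet covered.

    relevance: dict mapping answer id -> relevance score in [0, 1] (assumed given)
    aspects:   dict mapping answer id -> set of aspect labels (assumed given)
    trade_off: weight on relevance; (1 - trade_off) is the weight on novelty
    """
    remaining = set(relevance)
    covered = set()   # aspects already introduced by higher-ranked answers
    ranking = []
    while remaining:
        def score(answer_id):
            answer_aspects = aspects[answer_id]
            new_aspects = len(answer_aspects - covered)
            novelty = new_aspects / len(answer_aspects) if answer_aspects else 0.0
            return trade_off * relevance[answer_id] + (1 - trade_off) * novelty

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        covered |= aspects[best]
        remaining.remove(best)
    return ranking


relevance = {"a1": 0.9, "a2": 0.85, "a3": 0.5}
aspects = {
    "a1": {"rest", "fluids"},
    "a2": {"rest", "fluids"},      # repeats what a1 already says
    "a3": {"see a doctor"},        # less relevant, but introduces a new aspect
}
print(greedy_novelty_rank(relevance, aspects))
```

With `trade_off=0.5`, the redundant answer `a2` drops below the less relevant but novel `a3`; with `trade_off=1.0` the ranking falls back to pure relevance order.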

25. That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search.

Paper Link】 【Pages】:225-234

【Authors】: Boaz Petersil ; Avihai Mejer ; Idan Szpektor ; Koby Crammer

【Abstract】: A fundamental task in Information Retrieval (IR) is term weighting. Early IR theory considered both the presence and absence of all terms in the lexicon for ranking and needed to weight them all. Yet, as the size of lexicons grew and models became too complex, common weighting models preferred to aggregate only the weights of the query terms that are matched in candidate documents. Thus, unmatched term contribution in these models is only considered indirectly, such as in probability smoothing with corpus distribution, or in weight normalization by document length. In this work we propose a novel term weighting model that directly assesses the weights of unmatched terms, and show its benefits. Specifically, we propose a Learning To Rank framework, in which features corresponding to matched terms are also "mirrored" in similar features that account only for unmatched terms. The relative importance of each feature is learned via a click-through query log. As a test case, we consider vertical search in Community-based Question Answering (CQA) sites from Web queries. Queries that result in viewing CQA content often contain fine-grained information needs and benefit more from unmatched term weighting. We assess our model both via manual evaluation and via automatic evaluation over a click-through log. Our results show consistent improvement in retrieval when unmatched information is taken into account. This holds both when only identical terms are considered matched, and when related terms are matched via distributional similarity.

【Keywords】: community-based question answering; document ranking; unmatched terms

26. When a Knowledge Base Is Not Enough: Question Answering over Knowledge Bases with External Text Data.

Paper Link】 【Pages】:235-244

【Authors】: Denis Savenkov ; Eugene Agichtein

【Abstract】: One of the major challenges for automated question answering over Knowledge Bases (KBQA) is translating a natural language question to the Knowledge Base (KB) entities and predicates. Previous systems have used a limited amount of training data to learn a lexicon that is later used for question answering. This approach does not make use of other potentially relevant text data, outside the KB, which could supplement the available information. We introduce a new system, Text2KB, that enriches question answering over a knowledge base by using external text data. Specifically, we revisit different phases in the KBQA process and demonstrate that text resources improve question interpretation, candidate generation and ranking. Building on a state-of-the-art traditional KBQA system, Text2KB utilizes web search results, community question answering and general text document collection data, to detect question topic entities, map question phrases to KB predicates, and to enrich the features of the candidates derived from the KB. Text2KB significantly improves performance over the baseline KBQA method, as measured on a popular WebQuestions dataset. The results and insights developed in this work can guide future efforts on combining textual and structured KB data for question answering.

【Keywords】: knowledge bases; question answering

Learning 3

27. Transfer Learning for Cross-Lingual Sentiment Classification with Weakly Shared Deep Neural Networks.

Paper Link】 【Pages】:245-254

【Authors】: Guangyou Zhou ; Zhao Zeng ; Jimmy Xiangji Huang ; Tingting He

【Abstract】: Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of source language data and that of target language data. To address this challenge, previous studies have made use of translated resources for sentiment classification in the target language, but the classification performance is far from satisfactory because of the language gap between the source language and the translated target language. In this paper, to address the above challenge, we present a novel deep neural network structure, called Weakly Shared Deep Neural Networks (WSDNNs), to transfer the cross-lingual information from a source language to a target language. To share the sentiment labels between two languages, we build multiple weakly shared layers of features. This allows us to represent both shared inter-language features and language-specific ones, making this structure more flexible and powerful in capturing the feature representations of both languages jointly. We conduct a set of experiments with cross-lingual sentiment classification tasks on multilingual Amazon product reviews. The empirical results show that our proposed approach significantly outperforms the state-of-the-art methods for cross-lingual sentiment classification, especially when labeled data is scarce.

【Keywords】: auto-encoders; cross-lingual; sentiment classification

28. Query to Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor Grammars.

Paper Link】 【Pages】:255-264

【Authors】: Ke Zhai ; Zornitsa Kozareva ; Yuening Hu ; Qi Li ; Weiwei Guo

【Abstract】: Web search queries provide a surprisingly large amount of information, which can be potentially organized and converted into a knowledgebase. In this paper, we focus on the problem of automatically identifying brand and product entities from a large collection of web queries in the online shopping domain. We propose an unsupervised approach based on adaptor grammars that neither requires any human annotation effort nor relies on any external resources. To reduce the noise and normalize the query patterns, we introduce a query standardization step, which groups multiple search patterns and word orderings together into their most frequent ones. We present three different sets of grammar rules used to infer query structures and extract brand and product entities. To give an objective assessment of the performance of our approach, we conduct experiments on a large collection of online shopping queries and intrinsically evaluate the knowledgebase generated by our method qualitatively and quantitatively. In addition, we also evaluate our framework on extrinsic tasks on query tagging and chunking. Our empirical studies show that the knowledgebase discovered by our approach is highly accurate, has good coverage and significantly improves the performance on the external tasks.

【Keywords】: Bayesian nonparametrics; adaptor grammars; context-free grammar; knowledgebase; named entity recognition; query processing

29. Learning for Efficient Supervised Query Expansion via Two-stage Feature Selection.

Paper Link】 【Pages】:265-274

【Authors】: Zhiwei Zhang ; Qifan Wang ; Luo Si ; Jianfeng Gao

【Abstract】: Query expansion (QE) is a well-known technique to improve retrieval effectiveness, which expands original queries with extra terms that are predicted to be relevant. A recent trend in the literature is Supervised Query Expansion (SQE), where supervised learning is introduced to better select expansion terms. However, an important but neglected issue for SQE is its efficiency, as applying SQE in retrieval can be much more time-consuming than applying Unsupervised Query Expansion (UQE) algorithms. In this paper, we point out that the cost of SQE mainly comes from term feature extraction, and propose a Two-stage Feature Selection framework (TFS) to address this problem. The first stage is adaptive expansion decision, which determines if a query is suitable for SQE or not. For unsuitable queries, SQE is skipped and no term features are extracted at all, which saves the most time. For those suitable queries, the second stage is cost constrained feature selection, which chooses a subset of effective yet inexpensive features for supervised learning. Extensive experiments on four corpora (including three academic and one industry corpus) show that our TFS framework can substantially reduce the time cost for SQE, while maintaining its effectiveness.

【Keywords】: efficiency; query expansion; supervised learning

Efficiency I 3

30. Leveraging Context-Free Grammar for Efficient Inverted Index Compression.

Paper Link】 【Pages】:275-284

【Authors】: Zhaohua Zhang ; Jiancong Tong ; Haibing Huang ; Jin Liang ; Tianlong Li ; Rebecca J. Stones ; Gang Wang ; Xiaoguang Liu

【Abstract】: Large-scale search engines need to answer thousands of queries per second over billions of documents, which is typically done by querying a large inverted index. Many highly optimized integer encoding techniques are applied to compress the inverted index and reduce the query processing time. In this paper, we propose a new grammar-based inverted index compression scheme, which can improve the performance of both index compression and query processing. Our approach identifies patterns (common subsequences of docIDs) among different posting lists and generates a context-free grammar to succinctly represent the inverted index. To further optimize the compression performance, we carefully redesign the index structure. Experiments show a reduction of up to 8.8% in space usage while decompression is up to 14% faster. We also design an efficient list intersection algorithm which utilizes the proposed grammar-based inverted index. We show that our scheme can be combined with common docID reassignment methods and encoding techniques, and yields about 14% to 27% higher throughput for AND queries by utilizing multiple threads.

【Keywords】: context-free grammar; inverted index compression; query processing

31. Fast and Compact Hamming Distance Index.

Paper Link】 【Pages】:285-294

【Authors】: Simon Gog ; Rossano Venturini

【Abstract】: Searching for similar objects in a collection is a core task of many applications in databases, pattern recognition, and information retrieval. As there exist similarity-preserving hash functions like SimHash, indexing these objects reduces to the solution of the Approximate Dictionary Queries problem. In this problem we have to index a collection of fixed-sized keys to efficiently retrieve all the keys which are at a Hamming distance at most k from a query key. In this paper we propose new solutions for the approximate dictionary queries problem. These solutions combine the use of succinct data structures with an efficient representation of the keys to significantly reduce the space usage of the state-of-the-art solutions without introducing any time penalty. Finally, by exploiting triangle inequality, we can also significantly speed up the query time of the existing solutions.

【Keywords】: hamming distance; indexing
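
To make the Approximate Dictionary Queries problem in the abstract above concrete, here is a naive linear-scan baseline: return every key within Hamming distance k of the query. It is only an illustration of the problem statement, not the succinct index proposed in the paper; keys are assumed to be fixed-size bit strings stored as Python integers, and the function names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fixed-size keys differ."""
    # XOR leaves a 1 exactly where the two keys disagree; count those bits.
    return bin(a ^ b).count("1")


def within_distance(keys, query, k):
    """Linear scan: all keys at Hamming distance at most k from the query."""
    return [key for key in keys if hamming_distance(key, query) <= k]


keys = [0b1010, 0b1011, 0b0101, 0b1110]
matches = within_distance(keys, query=0b1010, k=1)
print([format(key, "04b") for key in matches])
```

The scan costs one XOR and one bit count per stored key, which is the cost that a dedicated index aims to avoid on large collections.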

32. Fast First-Phase Candidate Generation for Cascading Rankers.

Paper Link】 【Pages】:295-304

【Authors】: Qi Wang ; Constantinos Dimopoulos ; Torsten Suel

【Abstract】: Current search engines use very complex ranking functions based on hundreds of features. While such functions return high-quality results, they create efficiency challenges as it is too costly to fully evaluate them on all documents in the union, or even intersection, of the query terms. To address this issue, search engines use a series of cascading rankers, starting with a very simple ranking function and then applying increasingly complex and expensive ranking functions on smaller and smaller sets of candidate results. Researchers have recently started studying several problems within this framework of query processing by cascading rankers; see, e.g., [5, 13, 17, 51]. We focus on one such problem, the design of the initial cascade. Thus, the goal is to very quickly identify a set of good candidate documents that should be passed to the second and further cascades. Previous work by Asadi and Lin [3, 5] showed that while a top-k computation on either the union or intersection gives good results, a further optimization using a global document ordering based on spam scores leads to a significant reduction in quality. Our contribution is to propose an alternative framework that builds specialized single-term and pairwise index structures, and then during query time selectively accesses these structures based on a cost budget and a set of early termination techniques. Using an end-to-end evaluation with a complex machine-learned ranker, we show that our approach finds candidates about an order of magnitude faster than a conjunctive top-k computation, while essentially matching the quality.

【Keywords】: candidate generation; cascading ranking; complex ranking function; inverted index; learning to rank; query processing; term pairs; top-k computation

Recommendation Systems I 3

33. Learning to Rank Features for Recommendation over Multiple Categories.

Paper Link】 【Pages】:305-314

【Authors】: Xu Chen ; Zheng Qin ; Yongfeng Zhang ; Tao Xu

【Abstract】: Incorporating phrase-level sentiment analysis on users' textual reviews for recommendation has become a popular method due to its explainable property for latent features and high prediction accuracy. However, the inherent limitations of the existing model make it difficult to (1) effectively distinguish the features that are most interesting to users, (2) maintain the recommendation performance especially when the set of items is scaled up to multiple categories, and (3) model users' implicit feedback on the product features. In this paper, motivated by these shortcomings, we first introduce a tensor matrix factorization algorithm to Learn to Rank user Preferences based on Phrase-level sentiment analysis across Multiple categories (LRPPM for short), and then by combining this technique with Collaborative Filtering (CF) method, we propose a novel model called LRPPM-CF to boost the performance of recommendation. Thorough experiments on two real-world datasets demonstrate that our proposed model is able to improve the performance in the tasks of capturing the features users are interested in and item recommendation by about 17%-24% and 7%-13%, respectively, as compared with several state-of-the-art methods.

【Keywords】: collaborative filtering; recommender systems; sentiment analysis; tensor factorization

34. How Much Novelty is Relevant?: It Depends on Your Curiosity.

Paper Link】 【Pages】:315-324

【Authors】: Pengfei Zhao ; Dik Lun Lee

【Abstract】: Traditional recommendation systems (RS's) aim to recommend items that are relevant to the user's interest. Unfortunately, the recommended items will soon become too familiar to the user and hence fail to arouse her interest. Discovery-oriented recommendation systems (DORS's) complement accuracy with "discover utilities" (DU's) such as novelty and diversity and optimize the tradeoff between the DU's and accuracy of the recommendations. Unfortunately, DORS's ignore an important fact that different users have different appetites for DU's. That is, highly curious users can accept highly novel and diversified recommendations whereas conservative users would behave in the opposite manner. In this paper, we propose a curiosity-based recommendation system (CBRS) framework which generates recommendations with a personalized amount of DU's to fit the user's curiosity level. The major contribution of this paper is a computational model of user curiosity, called Probabilistic Curiosity Model (PCM), which is based on the curiosity arousal theory and Wundt curve in psychology research. In PCM, we model a user's curiosity with a curiosity distribution function learnt from the user's access history and compute a curiousness score for each item representing how curious the user is about the item. CBRS then selects items which are both relevant and have high curiousness score, bounded by the constraint that the amount of DU's fits the user's DU appetite. We use joint optimization and co-factorization approaches to incorporate the curiosity signal into the recommendations. Extensive experiments have been performed to evaluate the performance of CBRS against the baselines using a music dataset from last.fm. The results show that compared to the baselines CBRS not only provides more personalized recommendations that adapt to the user's curiosity level but also improves the recommendation accuracy.

【Keywords】: curiosity; novelty; personalization; psychology; recommendation

35. Discrete Collaborative Filtering.

Paper Link】 【Pages】:325-334

【Authors】: Hanwang Zhang ; Fumin Shen ; Wei Liu ; Xiangnan He ; Huanbo Luan ; Tat-Seng Chua

【Abstract】: We address the efficiency problem of Collaborative Filtering (CF) by hashing users and items as latent vectors in the form of binary codes, so that user-item affinity can be efficiently calculated in a Hamming space. However, existing hashing methods for CF employ binary code learning procedures that mostly suffer from the challenging discrete constraints. Hence, those methods generally adopt a two-stage learning scheme composed of relaxed optimization via discarding the discrete constraints, followed by binary quantization. We argue that such a scheme will result in a large quantization loss, which especially compromises the performance of large-scale CF that resorts to longer binary codes. In this paper, we propose a principled CF hashing framework called Discrete Collaborative Filtering (DCF), which directly tackles the challenging discrete optimization that should have been treated adequately in hashing. The formulation of DCF has two advantages: 1) the Hamming similarity induced loss that preserves the intrinsic user-item similarity, and 2) the balanced and uncorrelated code constraints that yield compact yet informative binary codes. We devise a computationally efficient algorithm with a rigorous convergence proof of DCF. Through extensive experiments on several real-world benchmarks, we show that DCF consistently outperforms state-of-the-art CF hashing techniques; e.g., even when using only 8 bits, DCF is significantly better than other methods using 128 bits.

【Keywords】: collaborative filtering; discrete hashing; recommendation
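
The link between binary codes and Hamming space mentioned in the abstract above rests on a standard identity, shown here as a short worked equation. The notation is assumed for illustration and is not taken from the abstract: $b, d \in \{-1, +1\}^r$ are a user code and an item code of length $r$, and $d_H$ is their Hamming distance. Each agreeing position contributes $+1$ to the inner product and each disagreeing position contributes $-1$, so:

```latex
\[
b^{\top} d = (r - d_H(b, d)) - d_H(b, d) = r - 2\, d_H(b, d),
\qquad
\operatorname{sim}(b, d) = 1 - \frac{d_H(b, d)}{r} = \frac{1}{2} + \frac{1}{2r}\, b^{\top} d .
\]
```

Under this convention a normalized Hamming similarity is an affine function of the inner product of the two codes, so it can be evaluated with bit operations alone.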

User Needs 3

36. Understanding Information Need: An fMRI Study.

Paper Link】 【Pages】:335-344

【Authors】: Yashar Moshfeghi ; Peter Triantafillou ; Frank E. Pollick

【Abstract】: The raison d'être of IR is to satisfy human information need. But, do we really understand information need? Despite advances in the past few decades in both the IR and relevant scientific communities, this question is largely unanswered. We do not really understand how an information need emerges and how it is physically manifested. Information need refers to a complex concept: at the very initial state of the phenomenon (i.e. at a visceral level), even the searcher may not be aware of its existence. This renders the measuring of this concept (using traditional behaviour studies) nearly impossible. In this paper, we investigate the connection between an information need and brain activity. Using functional Magnetic Resonance Imaging (fMRI), we measured the brain activity of twenty-four participants while they performed a Question Answering (Q/A) Task, where the questions were carefully selected and developed from TREC-8 and TREC 2001 Q/A Track. The results of this experiment revealed a distributed network of brain regions commonly associated with activities related to information need and retrieval, and differing brain activity in processing scenarios when participants knew the answer to a given question and when they did not and needed to search. We believe our study and conclusions constitute an important step in unravelling the nature of information need and therefore better satisfying it.

【Keywords】: anomalous states of knowledge; fmri study; information need; information retrieval

37. User Behavior in Asynchronous Slow Search.

Paper Link】 【Pages】:345-354

【Authors】: Ryan Burton ; Kevyn Collins-Thompson

【Abstract】: Conventional Web search is predicated on returning results to users as quickly as possible. However, for some search tasks, users have reported a willingness to wait for the perfect set of results. In this work, we present the first study to analyze users' willingness to wait and their search success, when given a Web search system that embodies characteristics of slow search, where speed can be traded for an improvement in quality. We conducted a between-subjects user study involving tasks that required multiple queries to complete, providing a Web search system that gave users the option to additionally issue asynchronous queries for which results improve in relevance over time as users continued working. We analyze the resulting survey results and interaction log data to investigate how users spent their time while waiting, and how behavior and search outcomes change when users are given the option of using a system with asynchronous slow search capabilities. We find that when given a slow search system, users are able to perceive the improvement in quality over time, and find tasks to be easier compared to a baseline conventional Web search system. Additionally, we find that users continue to issue their own queries and examine additional documents while the slow search queries are processed in the background, and use the slow search feature more effectively as they gain exposure to its behavior across tasks. Our study significantly advances our understanding of the benefits and tradeoffs involved in providing slow search scenarios for Web search.

【Keywords】: interactive information retrieval; search behavior; slow search; user interfaces

38. Going back in Time: An Investigation of Social Media Re-finding.

Paper Link】 【Pages】:355-364

【Authors】: Florian Meier ; David Elsweiler

【Abstract】: Social Media (SM) has become a valuable information source to many in diverse situations. In IR, research has focused on real-time aspects and as such little is known about how long SM content is of value to users, if and how often it is re-accessed, the strategies people employ to re-access and if difficulties are experienced while doing so. We present results from a 5-month-long naturalistic, log-based study of user interaction with Twitter, which suggest re-finding to be a regular activity and that Tweets can offer utility for longer than one might think. We shed light on re-finding strategies revealing that remembered people are used as a stepping stone to Tweets rather than searching for content directly. Bookmarking strategies reported in the literature are used infrequently as a means to re-access. Finally, we show that by using statistical modelling it is possible to predict if a Tweet has future utility and is likely to be re-found. Our findings have implications for the design of social media search systems and interfaces, in particular for Twitter, to better support users in re-finding previously seen content.

【Keywords】: clickstream analysis; re-finding; twitter

Privacy, Advertising, and Products 3

39. R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities.

Paper Link】 【Pages】:365-374

【Authors】: Joanna Asia Biega ; Krishna P. Gummadi ; Ida Mele ; Dragan Milchevski ; Christos Tryfonopoulos ; Gerhard Weikum

【Abstract】: Privacy of Internet users is at stake because they expose personal information in posts created in online communities, in search queries, and other activities. An adversary that monitors a community may identify the users with the most sensitive properties and utilize this knowledge against them (e.g., by adjusting the pricing of goods or targeting ads of sensitive nature). Existing privacy models for structured data are inadequate to capture privacy risks from user posts. This paper presents a ranking-based approach to the assessment of privacy risks emerging from textual contents in online communities, focusing on sensitive topics, such as being depressed. We propose ranking as a means of modeling a rational adversary who targets the most afflicted users. To capture the adversary's background knowledge regarding vocabulary and correlations, we use latent topic models. We cast these considerations into the new model of R-Susceptibility, which can inform and alert users about their potential for being targeted, and devise measures for quantitative risk assessment. Experiments with real-world data show the feasibility of our approach.

【Keywords】: online communities; privacy; privacy models; privacy risks; sensitive states; sensitive topics; susceptibility; user ranking

40. Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising.

Paper Link】 【Pages】:375-384

【Authors】: Mihajlo Grbovic ; Nemanja Djuric ; Vladan Radosavljevic ; Fabrizio Silvestri ; Ricardo A. Baeza-Yates ; Andrew Feng ; Erik Ordentlich ; Lee Yang ; Gavin Owens

【Abstract】: Sponsored search represents a major source of revenue for web search engines. The advertising model brings a unique possibility for advertisers to target direct user intent communicated through a search query, usually done by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to a large number of unique queries, it is particularly challenging for advertisers to identify all relevant queries. For this reason search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advance match approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training of the embeddings. Finally, we present an approach for overcoming a cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage and incremental revenue. Lastly, as part of this study, we open sourced query embeddings that can be used to advance the field.

【Keywords】: ad retrieval; sponsored search; word embeddings

41. Retrieving Non-Redundant Questions to Summarize a Product Review.

Paper Link】 【Pages】:385-394

【Authors】: Mengwen Liu ; Yi Fang ; Dae Hoon Park ; Xiaohua Hu ; Zhengtao Yu

【Abstract】: Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question retrieval and question diversification. We first propose probabilistic retrieval models to locate candidate questions that are relevant to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm of submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative for individual reviews.

【Keywords】: diversification; question retrieval; review summarization
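
The diversification stage described in the abstract above relies on greedily maximizing a monotone submodular set function. The following is a minimal sketch of that general technique, not the authors' actual set function: it assumes a simple coverage objective (the number of distinct aspects covered), and the names `coverage`, `greedy_select`, and the toy `QUESTIONS` data are hypothetical.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], aspects: Dict[str, Set[str]]) -> int:
    """Assumed objective: number of distinct aspects covered by the selection.

    Coverage functions of this kind are monotone and submodular.
    """
    covered: Set[str] = set()
    for question in selected:
        covered |= aspects[question]
    return len(covered)


def greedy_select(aspects: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k questions, each time taking the largest marginal gain."""
    selected: List[str] = []
    remaining = sorted(aspects)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        base = coverage(selected, aspects)
        best = max(
            remaining,
            key=lambda q: coverage(selected + [q], aspects) - base,
        )
        if coverage(selected + [best], aspects) == base:
            break  # no remaining question adds a new aspect
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: candidate questions and the review aspects they touch.
QUESTIONS = {
    "q_battery": {"battery", "charging"},
    "q_battery_life": {"battery"},
    "q_screen": {"screen"},
    "q_camera": {"camera", "screen"},
}

if __name__ == "__main__":
    print(greedy_select(QUESTIONS, 2))
```

For a monotone submodular objective under a cardinality constraint, this greedy procedure is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is why it is a common choice for diversified selection.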

Novelty and Diversity 3

Paper Link】 【Pages】:395-404

【Authors】: Long Xia ; Jun Xu ; Yanyan Lan ; Jiafeng Guo ; Xueqi Cheng

【Abstract】: Search result diversification has attracted considerable attention as a means to tackle the ambiguous or multi-faceted information needs of users. One of the key problems in search result diversification is novelty, that is, how to measure the novelty of a candidate document with respect to other documents. In the heuristic approaches, the predefined document similarity functions are directly utilized for defining the novelty. In the learning approaches, the novelty is characterized based on a set of handcrafted features. Both the similarity functions and the features are difficult to manually design in real world due to the complexity of modeling the document novelty. In this paper, we propose to model the novelty of a document with a neural tensor network. Instead of manually defining the similarity functions or features, the new method automatically learns a nonlinear novelty function based on the preliminary representation of the candidate document and other documents. New diverse learning to rank models can be derived under the relational learning to rank framework. To determine the model parameters, loss functions are constructed and optimized with stochastic gradient descent. Extensive experiments on three public TREC datasets show that the new derived algorithms can significantly outperform the baselines, including the state-of-the-art relational learning to rank models.

【Keywords】: neural tensor network; relational learning to rank; search result diversification

43. ScentBar: A Query Suggestion Interface Visualizing the Amount of Missed Relevant Information for Intrinsically Diverse Search.

Paper Link】 【Pages】:405-414

【Authors】: Kazutoshi Umemoto ; Takehiro Yamamoto ; Katsumi Tanaka

【Abstract】: For intrinsically diverse tasks, in which collecting extensive information from different aspects of a topic is required, searchers often have difficulty formulating queries to explore diverse aspects and deciding when to stop searching. With the goal of helping searchers discover unexplored aspects and find the appropriate timing for search stopping in intrinsically diverse tasks, we propose ScentBar, a query suggestion interface visualizing the amount of important information that a user potentially misses collecting from the search results of individual queries. We define the amount of missed information for a query as the additional gain that can be obtained from unclicked search results of the query, where gain is formalized as a set-wise metric based on aspect importance, aspect novelty, and per-aspect document relevance and is estimated by using a state-of-the-art algorithm for subtopic mining and search result diversification. Results of a user study involving 24 participants showed that the proposed interface had the following advantages when the gain estimation algorithm worked reasonably: (1) ScentBar users stopped examining search results after collecting a greater amount of relevant information; (2) they issued queries whose search results contained more missed information; (3) they obtained higher gain, particularly at the late stage of their sessions; and (4) they obtained higher gain per unit time. These results suggest that the simple query visualization helps make the search process of intrinsically diverse tasks more efficient, unless inaccurate estimates of missed information are visualized.

【Keywords】: intrinsic diversity; query suggestion interface; search stopping

Paper Link】 【Pages】:415-424

【Authors】: Xiao-Jie Wang ; Zhicheng Dou ; Tetsuya Sakai ; Ji-Rong Wen

【Abstract】: Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships among the intents. In this paper, we introduce intent hierarchies to model the relationships among intents. Based on intent hierarchies, we propose several hierarchical measures that can consider the relationships among intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) Hierarchical measures are generally more discriminative and intuitive than existing measures using flat lists of intents; (2) When the queries have multilayer intent hierarchies, hierarchical measures are less correlated to existing measures, but can get more improvement in discriminative power; (3) Hierarchical measures are more intuitive in terms of diversity or relevance. The hierarchical measures using the whole intent hierarchies are more intuitive than only using the leaf nodes in terms of diversity and relevance.

【Keywords】: ambiguity; diversity; evaluation; hierarchy; novelty

Entities and Knowledge Graphs 3

45. Robust and Collective Entity Disambiguation through Semantic Embeddings.

Paper Link】 【Pages】:425-434

【Authors】: Stefan Zwicklbauer ; Christin Seifert ; Michael Granitzer

【Abstract】: Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. It finds its application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally so in facilitating artificial intelligence applications, such as Semantic Search, Reasoning and Question & Answering. We propose a new collective, graph-based disambiguation algorithm utilizing semantic entity and document embeddings for robust entity disambiguation. Robust thereby refers to the property of achieving better than state-of-the-art results over a wide range of very different data sets. Our approach is also able to abstain if no appropriate entity can be found for a specific surface form. Our evaluation shows that our approach achieves significantly (>5%) better results than all other publicly available disambiguation algorithms on 7 of 9 datasets without data set specific tuning. Moreover, we discuss the influence of the quality of the knowledge base on the disambiguation accuracy and indicate that our algorithm achieves better results than non-publicly available state-of-the-art algorithms.

【Keywords】: embeddings; entity disambiguation; neuronal networks

46. Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph.

Paper Link】 【Pages】:435-444

【Authors】: Fedor Nikolaev ; Alexander Kotov ; Nikita Zhiltsov

【Abstract】: Accurate projection of terms in free-text queries onto structured entity representations is one of the fundamental problems in entity retrieval from knowledge graphs. In this paper, we demonstrate that existing retrieval models for ad-hoc structured and unstructured document retrieval fall short of addressing this problem, due to their rigid assumptions. According to these assumptions, either all query concepts of the same type (unigrams and bigrams) are projected onto the fields of entity representations with identical weights or such projection is determined based only on one simple statistic, which makes it sensitive to data sparsity. To address this issue, we propose the Parametrized Fielded Sequential Dependence Model (PFSDM) and the Parametrized Fielded Full Dependence Model (PFFDM), two novel models for entity retrieval from knowledge graphs, which infer the user's intent behind each individual query concept by dynamically estimating its projection onto the fields of structured entity representations based on a small number of statistical and linguistic features. Experimental results obtained on several publicly available benchmarks indicate that PFSDM and PFFDM consistently outperform state-of-the-art retrieval models for the task of entity retrieval from knowledge graph.

【Keywords】: entity retrieval; feature-based models; knowledge graphs

47. Hierarchical Random Walk Inference in Knowledge Graphs.

Paper Link】 【Pages】:445-454

【Authors】: Qiao Liu ; Liuyi Jiang ; Minghao Han ; Yao Liu ; Zhiguang Qin

【Abstract】: Relational inference is a crucial technique for knowledge base population. The central problem in the study of relational inference is to infer unknown relations between entities from the facts given in the knowledge bases. Two popular families of models have recently been put forth to solve this problem: latent factor models and random-walk models. However, each has its pros and cons in terms of computational efficiency and inference accuracy. In this paper, we propose a hierarchical random-walk inference algorithm for relational learning in large scale graph-structured knowledge bases, which not only maintains the computational simplicity of the random-walk models, but also provides better inference accuracy than related works. The improvements come from two basic assumptions proposed in this paper. Firstly, we assume that although a relation between two entities is syntactically directional, the information conveyed by this relation is equally shared between the connected entities, thus all of the relations are semantically bidirectional. Secondly, we assume that the topology structures of the relation-specific subgraphs in knowledge bases can be exploited to improve the performance of the random-walk based relational inference algorithms. The proposed algorithm and ideas are validated with numerical results on experimental data sampled from practical knowledge bases, and the results are compared to state-of-the-art approaches.

【Keywords】: knowledge base; knowledge graphs; random walk model; relational inference; statistical relational learning

SIRIP I: Big companies, big data 4

48. When Watson Went to Work: Leveraging Cognitive Computing in the Real World.

Paper Link】 【Pages】:455-456

【Authors】: Aya Soffer ; David Konopnicki ; Haggai Roitman

【Abstract】:

【Keywords】: affective computing; cognitive computing; customer support applications; problem resolution; question answering

49. Ask Your TV: Real-Time Question Answering with Recurrent Neural Networks.

Paper Link】 【Pages】:457-458

【Authors】: Ferhan Türe ; Oliver Jojic

【Abstract】: Voice-based interfaces are very popular in today's world, and Comcast customers are no exception. Usage stats show that our new X1 TV platform receives millions of voice queries per day. As a result, expanding the coverage of our voice interface provides a critical competitive advantage, allowing customers to speak freely instead of having to stick to a rigid set of commands. The ultimate objective is to provide a more natural user experience and increase access to our knowledge graph (KG) and entertainment platform. We describe a real-time factoid question answering (QA) system, using our internal KG for training (i.e., generating labeled example question-answer pairs) and for retrieval at test time. We hope that this will inspire other companies to take advantage of readily available unlabeled data, machine learning and search technologies to build products that can improve customer experiences. Our approach consists of two steps: First, two neural network models are trained to predict a structured query from the free-form input question. Then, a search through all facts in the KG retrieves answers consistent with the structured query.

【Keywords】: comcast; deep learning; entity detection; information extraction; knowledge graph; question answering; recurrent neural networks; structured prediction; tv

50. Amazon Search: The Joy of Ranking Products.

Paper Link】 【Pages】:459-460

【Authors】: Daria Sorokina ; Erick Cantú-Paz

【Abstract】: Amazon is one of the world's largest e-commerce sites and Amazon Search powers the majority of Amazon's sales. As a consequence, even small improvements in relevance ranking both positively influence the shopping experience of millions of customers and significantly impact revenue. In the past, Amazon's product search engine consisted of several hand-tuned ranking functions using a handful of input features. A lot has changed since then. In this talk we are going to cover a number of relevance algorithms used in Amazon Search today. We will describe a general machine learning framework used for ranking within categories, blending separate rankings in All Product Search, NLP techniques used for matching queries and products, and algorithms targeted at unique tasks of specific categories --- books and fashion.

【Keywords】: amazon; e-commerce; search

Paper Link】 【Pages】:461-462

【Authors】: Viet Ha-Thuc ; Shakti Sinha

【Abstract】: LinkedIn search is deeply personalized - for the same queries, different searchers expect completely different results. This paper presents our approach to achieving this by mining various data sources available in LinkedIn to infer searchers' intents (such as hiring, job seeking, etc.), as well as extending the concept of homophily to capture the searcher-result similarities on many aspects. Then, learning-to-rank is applied to combine these signals with standard search features.

【Keywords】: federation; learning-to-rank; personalization

Evaluation II 3

52. When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Paper Link】 【Pages】:463-472

【Authors】: Jiaxin Mao ; Yiqun Liu ; Ke Zhou ; Jian-Yun Nie ; Jingtao Song ; Min Zhang ; Shaoping Ma ; Jiashen Sun ; Hengliang Luo

【Abstract】: Relevance is a fundamental concept in information retrieval (IR) studies. It is however often observed that relevance as annotated by secondary assessors may not necessarily mean usefulness and satisfaction perceived by users. In this study, we confirm the difference by a laboratory study in which we collect relevance annotations by external assessors, usefulness and user satisfaction information by users, for a set of search tasks. We also find that a measure based on usefulness rather than relevance annotated has a better correlation with user satisfaction. However, we show that external assessors are capable of annotating usefulness when provided with more search context information. In addition, we also show that it is possible to generate automatically usefulness labels when some training data is available. Our findings explain why traditional system-centric evaluation metrics are not well aligned with user satisfaction and suggest that a usefulness-based evaluation method can be defined to better reflect the quality of search systems perceived by the users.

【Keywords】: evaluation; relevance; usefulness; user satisfaction

53. How Many Workers to Ask?: Adaptive Exploration for Collecting High Quality Labels.

Paper Link】 【Pages】:473-482

【Authors】: Ittai Abraham ; Omar Alonso ; Vasilis Kandylas ; Rajesh Patel ; Steven Shelford ; Aleksandrs Slivkins

【Abstract】: Crowdsourcing has been part of the IR toolbox as a cheap and fast mechanism to obtain labels for system development and evaluation. Successful deployment of crowdsourcing at scale involves adjusting many variables, a very important one being the number of workers needed per human intelligence task (HIT). We consider the crowdsourcing task of learning the answer to simple multiple-choice HITs, which are representative of many relevance experiments. In order to provide statistically significant results, one often needs to ask multiple workers to answer the same HIT. A stopping rule is an algorithm that, given a HIT, decides for any given set of worker answers to stop and output an answer or iterate and ask one more worker. In contrast to other solutions that try to estimate worker performance and answer at the same time, our approach assumes the historical performance of a worker is known and tries to estimate the HIT difficulty and answer at the same time. The difficulty of the HIT decides how much weight to give to each worker's answer. In this paper we investigate how to devise better stopping rules given workers' performance quality scores. We suggest adaptive exploration as a promising approach for scalable and automatic creation of ground truth. We conduct a data analysis on an industrial crowdsourcing platform, and use the observations from this analysis to design new stopping rules that use the workers' quality scores in a non-trivial manner. We then perform a number of experiments using real-world datasets and simulated data, showing that our algorithm performs better than other approaches.

【Keywords】: adaptive algorithms; assessments; crowdsourcing; ground truth; label quality; multi-armed bandits
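
To make the notion of a stopping rule in the abstract above concrete, here is a minimal illustrative sketch. It is not the rule proposed in the paper: it assumes that each worker's known quality score is used directly as a vote weight, and that the rule stops once the weighted margin between the two leading answers reaches `threshold * sqrt(t)` after `t` workers, or once a worker budget is exhausted. The function name and both parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def run_stopping_rule(
    answers: Iterable[Tuple[str, float]],
    threshold: float = 1.5,
    max_workers: int = 10,
) -> Tuple[Optional[str], int]:
    """Ask workers one at a time; return (answer, number of workers asked).

    `answers` yields (label, quality) pairs, where quality is the worker's
    known historical score, used here as the weight of the vote (assumption).
    """
    totals: Dict[str, float] = defaultdict(float)
    asked = 0
    for label, quality in answers:
        asked += 1
        totals[label] += quality
        ranked = sorted(totals.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        margin = ranked[0] - runner_up
        # Stop when the leader is clearly ahead or the budget is spent.
        if margin >= threshold * math.sqrt(asked) or asked >= max_workers:
            break
    if not totals:
        return None, 0
    return max(totals, key=totals.get), asked


if __name__ == "__main__":
    # An easy HIT: reliable workers agree, so the rule stops early.
    easy = [("A", 0.9)] * 6
    print(run_stopping_rule(easy))
```

Because `answers` is consumed lazily, it can be a generator that only requests another worker when the rule has not yet stopped, which mirrors the "ask one more worker" loop described in the abstract.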

54. Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines.

Paper Link】 【Pages】:483-492

【Authors】: Bekir Taner Dinçer ; Craig Macdonald ; Iadh Ounis

【Abstract】: A robust retrieval system ensures that user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a given baseline system. However, using a particular, single system as the baseline suffers from the fact that retrieval performance highly varies among IR systems across topics. Thus, a single system would in general fail in providing enough information about the real baseline performance for every topic under consideration, and hence it would in general fail in measuring the real risk associated with any given system. Based upon the Chi-squared statistic, we propose a new measure ZRisk that exhibits more promise since it takes into account multiple baselines when measuring risk, and a derivative measure called GeoRisk, which enhances ZRisk by also taking into account the overall magnitude of effectiveness. This paper demonstrates the benefits of ZRisk and GeoRisk upon TREC data, and how to exploit GeoRisk for risk-sensitive learning to rank, thereby making use of multiple baselines within the learning objective function to obtain effective yet risk-averse/robust ranking systems. Experiments using 10,000 topics from the MSLR learning to rank dataset demonstrate the efficacy of the proposed Chi-square statistic-based objective function.

【Keywords】: baselines; learning-to-rank; risk-sensitive evaluation

Events 3

55. Event Digest: A Holistic View on Past Events.

Paper Link】 【Pages】:493-502

【Authors】: Arunav Mishra ; Klaus Berberich

【Abstract】: For a general user, easy access to vast amounts of online information available on past events has made retrospection much harder. We propose a problem of automatic event digest generation to aid effective and efficient retrospection. For this, in addition to text, a digest should maximize the reportage of time, geolocations, and entities to present a holistic view on the past event of interest. We propose a novel divergence-based framework that selects excerpts from an initial set of pseudo-relevant documents, such that the overall relevance is maximized, while avoiding redundancy in text, time, geolocations, and named entities, by treating them as independent dimensions of an event. Our method formulates the problem as an Integer Linear Program (ILP) for global inference to diversify across the event dimensions. Relevance and redundancy measures are defined based on JS-divergence between independent query and excerpt models estimated for each event dimension. Elaborate experiments on three real-world datasets are conducted to compare our methods against the state-of-the-art from the literature. Using Wikipedia articles as gold standard summaries in our evaluation, we find that the most holistic digest of an event is generated with our method that integrates all event dimensions. We compare all methods using standard Rouge-1, -2, and -SU4 along with Rouge-NP, and a novel weighted variant of Rouge.

【Keywords】: diversification; event digest; linking; semantic annotations
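
The abstract above defines relevance and redundancy through the Jensen-Shannon (JS) divergence between query and excerpt models. As a small worked example of that quantity alone, the sketch below computes the base-2 JS divergence between two discrete distributions represented as dictionaries. How the paper estimates its models and combines the divergences inside the ILP is not shown here; the function names are hypothetical.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in bits; assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    vocab = set(p) | set(q)
    p_full = {t: p.get(t, 0.0) for t in vocab}
    q_full = {t: q.get(t, 0.0) for t in vocab}
    # The mixture is positive wherever either input is, so KL is well defined.
    mixture = {t: 0.5 * (p_full[t] + q_full[t]) for t in vocab}
    return 0.5 * kl_divergence(p_full, mixture) + 0.5 * kl_divergence(q_full, mixture)


if __name__ == "__main__":
    query_model = {"earthquake": 0.5, "japan": 0.5}
    excerpt_model = {"earthquake": 0.25, "japan": 0.25, "tsunami": 0.5}
    print(js_divergence(query_model, excerpt_model))
```

Unlike plain KL divergence, the JS divergence is symmetric and stays finite when the two distributions have different supports, which makes it convenient for comparing sparse term, time, or entity models.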

56. Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events.

Paper Link】 【Pages】:503-512

【Authors】: Andreas Spitz ; Michael Gertz

【Abstract】: Real world events, such as historic incidents, typically contain both spatial and temporal aspects and involve a specific group of persons. This is reflected in the descriptions of events in textual sources, which contain mentions of named entities and dates. Given a large collection of documents, however, such descriptions may be incomplete in a single document, or spread across multiple documents. In these cases, it is beneficial to leverage partial information about the entities that are involved in an event to extract missing information. In this paper, we introduce the LOAD model for cross-document event extraction in large-scale document collections. The graph-based model relies on co-occurrences of named entities belonging to the classes locations, organizations, actors, and dates and puts them in the context of surrounding terms. As such, the model allows for efficient queries and can be updated incrementally in negligible time to reflect changes to the underlying document collection. We discuss the versatility of this approach for event summarization, the completion of partial event information, and the extraction of descriptions for named entities and dates. We create and provide a LOAD graph for the documents in the English Wikipedia from named entities extracted by state-of-the-art NER tools. Based on an evaluation set of historic data that include summaries of diverse events, we evaluate the resulting graph. We find that the model not only allows for near real-time retrieval of information from the underlying document collection, but also provides a comprehensive framework for browsing and summarizing event data.

【Keywords】: document indexing; entity linking; event extraction; event representation; named entities; ranking; summarization

57. GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams.

Paper Link】 【Pages】:513-522

【Authors】: Chao Zhang ; Guangyu Zhou ; Quan Yuan ; Honglei Zhuang ; Yu Zheng ; Lance M. Kaplan ; Shaowen Wang ; Jiawei Han

【Abstract】: The real-time discovery of local events (e.g., protests, crimes, disasters) is of great importance to various applications, such as crime monitoring, disaster alarming, and activity recommendation. While this task was nearly impossible years ago due to the lack of timely and reliable data sources, the recent explosive growth in geo-tagged tweet data brings new opportunities to it. That said, how to extract quality local events from geo-tagged tweet streams in real time remains largely unsolved so far. We propose GeoBurst, a method that enables effective and real-time local event detection from geo-tagged tweet streams. With a novel authority measure that captures the geo-topic correlations among tweets, GeoBurst first identifies several pivots in the query window. Such pivots serve as representative tweets for potential local events and naturally attract similar tweets to form candidate events. To select truly interesting local events from the candidate list, GeoBurst further summarizes continuous tweet streams and compares the candidates against historical activities to obtain spatiotemporally bursty ones. Finally, GeoBurst also features an updating module that finds new pivots with little time cost when the query window shifts. As such, GeoBurst is capable of monitoring continuous streams in real time. We used crowdsourcing to evaluate GeoBurst on two real-life data sets that contain millions of geo-tagged tweets. The results demonstrate that GeoBurst significantly outperforms state-of-the-art methods in precision, and is orders of magnitude faster.

【Keywords】: event detection; local event; social media; tweet; twitter

SIRIP II: Small companies, big ideas 3

Paper Link】 【Pages】:523-524

【Authors】: Manos Tsagkias ; Wouter Weerkamp

【Abstract】: 904Labs B.V. was founded in 2014 by Wouter Weerkamp, Manos Tsagkias, and Maarten de Rijke to commercialize state-of-the-art search engine technology. 904Labs' strategic product is a self-learning search engine for online retailers, which uses some of the most recent scientific developments in machine learning and search engine evaluation. 904Labs has raised about 200K in funding and has signed pilots with large international and national companies. Since its start, 904Labs has grown with two developers and two experienced business people. In this presentation we tell how to go from research to business and the challenges it brings along.

【Keywords】: product search; self-learning search; startup

59. Sedano: A News Stream Processor for Business.

Paper Link】 【Pages】:525-526

【Authors】: Ugo Scaiella ; Giacomo Berardi ; Giuliano Mega ; Roberto Santoro

【Abstract】: We present Sedano, a system for processing and indexing a continuous stream of business-related news. Sedano defines pipelines whose stages analyze and enrich news items (e.g., newspaper articles and press releases). News data coming from several content sources are stored, processed and then indexed in order to be consumed by Atoka, our business intelligence product. Atoka users can retrieve news about specific companies, filtering according to various facets. Sedano features both an entity-linking phase, which finds mentions of companies in news, and a classification phase, which classifies news according to a set of business events. Its flexible architecture allows Sedano to be deployed on commodity machines while being scalable and fault-tolerant.

【Keywords】: business intelligence; entity linking; news retrieval; text classification

60. Ranking Financial Tweets.

Paper Link】 【Pages】:527-528

【Authors】: Diego Ceccarelli ; Francesco Nidito ; Miles Osborne

【Abstract】: Recently Twitter has complemented traditional newswire as a source of valuable Financial information. Although there is a rich body of published research dealing with the task of ranking tweets, there has been little published research dealing with ranking tweets within a Financial context. Here we consider whether popularity factors within Twitter can be used as a signal for popularity within the domain of financial experts. Our results suggest that what interests Finance is not the same as what interests the users of Twitter.

【Keywords】: finance; ranking; tweet; twitter

Recommendation Systems II 3

61. Contextual Bandits in a Collaborative Environment.

Paper Link】 【Pages】:529-538

【Authors】: Qingyun Wu ; Huazheng Wang ; Quanquan Gu ; Hongning Wang

【Abstract】: Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice estimates the unknown bandit parameters pertaining to each user independently. This unfortunately ignores dependency among users and thus leads to suboptimal solutions, especially for the applications that have strong social components. In this paper, we develop a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating. We rigorously prove an improved upper regret bound of the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms. Extensive experiments on both synthetic and three large-scale real-world datasets verified the improvement of our proposed algorithm against several state-of-the-art contextual bandit algorithms.

【Keywords】: collaborative contextual bandits; online recommendations; reinforcement learning

62. Collaborative Filtering Bandits.

Paper Link】 【Pages】:539-548

【Authors】: Shuai Li ; Alexandros Karatzoglou ; Claudio Gentile

【Abstract】: Classical collaborative filtering, and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.

【Keywords】: bandits; clustering; collaborative filtering; computational advertising; filtering and recommending; online learning; recommender systems; regret

63. Fast Matrix Factorization for Online Recommendation with Implicit Feedback.

Paper Link】 【Pages】:549-558

【Authors】: Xiangnan He ; Hanwang Zhang ; Min-Yen Kan ; Tat-Seng Chua

【Abstract】: This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing works resort to assigning a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption is invalid in real-world settings. Second, most methods are also designed in an offline setting and fail to keep up with the dynamic nature of online data. We address the above two issues in learning MF models from implicit feedback. We first propose to weight the missing data based on item popularity, which is more effective and flexible than the uniform-weight assumption. However, such a non-uniform weighting poses an efficiency challenge in learning the model. To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing an MF model with variably-weighted missing data. We exploit this efficiency to then seamlessly devise an incremental update strategy that instantly refreshes an MF model given new feedback. Through comprehensive experiments on two public datasets in both offline and online protocols, we show that our implemented, open-source (https://github.com/hexiangnan/sigir16-eals) eALS consistently outperforms state-of-the-art implicit MF methods.

【Keywords】: ALS; coordinate descent; implicit feedback; item recommendation; matrix factorization; online learning
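
The popularity-based weighting of missing data described in the abstract above can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the form used here (weight proportional to item frequency raised to an exponent, scaled by an overall budget) is an assumption made for illustration, and the names `popularity_weights`, `c0`, and `alpha` are hypothetical rather than taken from the paper or its repository.

```python
from collections import Counter


def popularity_weights(interactions, c0=1.0, alpha=0.5):
    """Return one missing-data weight per item, growing with item popularity.

    Assumed form (not given in the abstract):
        c_i = c0 * f_i**alpha / sum_j f_j**alpha
    where f_i is the number of observed interactions with item i.
    """
    counts = Counter(item for _user, item in interactions)
    powered = {item: n ** alpha for item, n in counts.items()}
    total = sum(powered.values())
    return {item: c0 * p / total for item, p in powered.items()}


# Toy (user, item) feedback in which item "a" is the most popular.
feedback = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (1, "b")]
weights = popularity_weights(feedback, c0=1.0, alpha=0.5)
```

Under this assumed form, setting `alpha=0` falls back to a uniform weight for every item, which corresponds to the uniform-weight baseline the abstract argues against.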

64. Leveraging User Interaction Signals for Web Image Search.

Paper Link】 【Pages】:559-568

【Authors】: Neil O'Hare ; Paloma de Juan ; Rossano Schifanella ; Yunlong He ; Dawei Yin ; Yi Chang

【Abstract】: User interfaces for web image search engine results differ significantly from interfaces for traditional (text) web search results, supporting a richer interaction. In particular, users can see an enlarged image preview by hovering over a result image, and an 'image preview' page allows users to browse further enlarged versions of the results, and to click through to the referral page where the image is embedded. No existing work investigates the utility of these interactions as implicit relevance feedback for improving search ranking, beyond using clicks on images displayed in the search results page. In this paper we propose a number of implicit relevance feedback features based on these additional interactions: hover-through rate, 'converted-hover' rate, referral page click through, and a number of dwell time features. Also, since images are never self-contained, but always embedded in a referral page, we posit that clicks on other images that are embedded on the same referral webpage as a given image can carry useful relevance information about that image. We also posit that query-independent versions of implicit feedback features, while not expected to capture topical relevance, will carry feedback about the quality or attractiveness of images, an important dimension of relevance for web image search. In an extensive set of ranking experiments in a learning to rank framework, using a large annotated corpus, the proposed features give statistically significant gains of over 2% compared to a state-of-the-art baseline that uses standard click features.

【Keywords】: ranking; user behavior; web image search

65. Self-Paced Cross-Modal Subspace Matching.

Paper Link】 【Pages】:569-578

【Authors】: Jian Liang ; Zhihang Li ; Dong Cao ; Ran He ; Jingdong Wang

【Abstract】: Cross-modal matching methods match data from different modalities according to their similarities. Most existing methods utilize label information to reduce the semantic gap between different modalities. However, it is usually time-consuming to manually label large-scale data. This paper proposes a Self-Paced Cross-Modal Subspace Matching (SCSM) method for unsupervised multimodal data. We assume that multimodal data are pairwise and from several semantic groups, which form hard pairwise constraints and soft semantic group constraints respectively. Then, we formulate the unsupervised cross-modal matching problem as a non-convex joint feature learning and data grouping problem. Self-paced learning, which learns samples from 'easy' to 'complex', is further introduced to refine the grouping result. Moreover, a multimodal graph is constructed to preserve the relationship of both inter- and intra-modality similarity. An alternating minimization method is employed to minimize the non-convex optimization problem, followed by a discussion of its convergence analysis and computational complexity. Experimental results on four multimodal databases show that SCSM outperforms state-of-the-art cross-modal subspace learning methods.

【Keywords】: cross-modal matching; heterogeneous data; self-paced learning; unsupervised subspace learning

66. Composite Correlation Quantization for Efficient Multimodal Retrieval.

Paper Link】 【Pages】:579-588

【Authors】: Mingsheng Long ; Yue Cao ; Jianmin Wang ; Philip S. Yu

【Abstract】: Efficient similarity retrieval from large-scale multimodal databases is pervasive in modern search engines and social networks. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. While hashing methods have shown great potential in achieving this goal, current attempts generally fail to learn isomorphic hash codes in a seamless scheme, that is, they embed multiple modalities in a continuous isomorphic space and separately threshold embeddings into binary codes, which incurs substantial loss of retrieval accuracy. In this paper, we approach seamless multimodal hashing by proposing a novel Composite Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds correlation-maximal mappings that transform different modalities into an isomorphic latent space, and learns composite quantizers that convert the isomorphic latent features into compact binary codes. An optimization framework is devised to preserve both intra-modal similarity and inter-modal correlation through minimizing both reconstruction and quantization errors, which can be trained from both paired and partially paired data in linear time. A comprehensive set of experiments clearly shows the superior effectiveness and efficiency of CCQ against the state-of-the-art hashing methods for both unimodal and cross-modal retrieval.

【Keywords】: correlation analysis; hashing; multimodal retrieval; quantization

SIRIP III: Modeling and Evaluation 4

67. Principles for the Design of Online A/B Metrics.

Paper Link】 【Pages】:589-590

【Authors】: Widad Machmouchi ; Georg Buscher

【Abstract】: In this paper, we describe principles for designing metrics in the context of A/B experiments. We share some issues that come up in designing such experiments and provide solutions to avoid such pitfalls.

【Keywords】: a/b measurement; controlled experiments; metric design principles

68. Visual Recommendation Use Case for an Online Marketplace Platform: allegro.pl.

Paper Link】 【Pages】:591-594

【Authors】: Anna Wróblewska ; Lukasz Raczkowski

【Abstract】: In this paper we describe a small content-based visual recommendation project built as part of the Allegro online marketplace platform. We extracted relevant data only from images, as they are inherently better at capturing visual attributes than textual offer descriptions. We used several image descriptors to extract color and texture information in order to find visually similar items. We tested our results against available textual offer tags and also asked human users to subjectively assess the precision. Finally, we deployed the solution to our platform.

【Keywords】: cbir; content-based recommendations; e-commerce; elasticsearch; image processing; image search; lire; online auctions; opencv; visual search

69. AOL's Named Entity Resolver: Solving Disambiguation via Document Strongly Connected Components and Ad-Hoc Edges Construction.

Paper Link】 【Pages】:595-596

【Authors】: Roni Wiener ; Yonatan Ben-Simhon ; Anna Chen

【Abstract】: Named Entity Disambiguation is the task of disambiguating named entity mentions in unstructured text and linking them to their corresponding entries in a large knowledge base such as Freebase. Practically, each text match in a given document should be mapped to the correct entity out of the corresponding entities in the knowledge base or none of them if no correct entity is found (Empty Entry). The case of an empty entry makes the problem at hand more complex, but by solving it, one can successfully cope with missing and erroneous data as well as unknown entities. In this work we present AOL's Named Entity Resolver which was designed to handle real life scenarios including empty entries. As part of the automated news analysis platform, it processes over 500K news articles a day; entities from each article are extracted and disambiguated. According to our experiments, AOL's resolver shows much better results in disambiguating entities mapped to Wikipedia or Freebase compared to industry leading products.

【Keywords】: named entity disambiguation; named entity resolution

70. The Data Stack in Information Retrieval.

Paper Link】 【Pages】:597

【Authors】: Omar Alonso

【Abstract】: I propose to look at information retrieval applications from the perspective of the data stack infrastructure that is needed in research prototypes and production systems.

【Keywords】: applications; big data; data science; infrastructure; small data

Behavior Models and Applications 3

71. Predicting User Engagement with Direct Displays Using Mouse Cursor Information.

Paper Link】 【Pages】:599-608

【Authors】: Ioannis Arapakis ; Luis A. Leiva

【Abstract】: Predicting user engagement with direct displays (DD) is of paramount importance to commercial search engines, as well as to search performance evaluation. However, understanding within-content engagement on a web page is not a trivial task mainly because of two reasons: (1) engagement is subjective and different users may exhibit different behavioural patterns; (2) existing proxies of user engagement (e.g., clicks, dwell time) suffer from certain caveats, such as the well-known position bias, and are not as effective in discriminating between useful and non-useful components. In this paper, we conduct a crowdsourcing study and examine how users engage with a prominent web search engine component such as the knowledge module (KM) display. To this end, we collect and analyse more than 115k mouse cursor positions from 300 users, who perform a series of search tasks. Furthermore, we engineer a large number of meta-features which we use to predict different proxies of user engagement, including attention and usefulness. In our experiments, we demonstrate that our approach is able to predict more accurately different levels of user engagement and outperform existing baselines.

【Keywords】: direct displays; knowledge module; mouse cursor tracking; user engagement; web search

Paper Link】 【Pages】:609-618

【Authors】: Fernando Diaz ; Qi Guo ; Ryen W. White

【Abstract】: Search result examination is an important part of searching. High page load latency for landing pages (clicked results) can reduce the efficiency of the search process. Proactively prefetching landing pages in advance of clickthrough can save searchers valuable time. However, prefetching consumes resources that are wasted unless the prefetched results are requested by searchers. Balancing the costs in prefetching particular results against the benefits in reduced latency to searchers represents the search result prefetching challenge. We present methods that leverage searchers' cursor movements on search result pages in real time to dynamically estimate the result that searchers will request next. We demonstrate through large-scale log analysis that our approach significantly outperforms three strong baselines that prefetch results based on (i) the search engine result ranking, (ii) past clicks from all searchers for the query, or (iii) past clicks from the current searcher for the query. Our promising findings have implications for the design of search support that makes the search process more efficient.

【Keywords】: cursor-tracking; efficiency; prefetching

Paper Link】 【Pages】:619-628

【Authors】: Yiqun Liu ; Zeyang Liu ; Ke Zhou ; Meng Wang ; Huanbo Luan ; Chao Wang ; Min Zhang ; Shaoping Ma

【Abstract】: Predicting users' examination of search results is one of the key concerns in Web search related studies. With more and more heterogeneous components federated into search engine result pages (SERPs), it becomes difficult for traditional position-based models to accurately predict users' actual examination patterns. Therefore, a number of prior works investigate the connection between examination and users' explicit interaction behaviors (e.g., click-through, mouse movement). Although these works gain much success in predicting users' examination behavior on SERPs, they require the collection of large scale user behavior data, which makes it impossible to predict examination behavior on newly-generated SERPs. To predict user examination on SERPs containing heterogeneous components without user interaction information, we propose a new prediction model based on a visual saliency map and page content features. Visual saliency, which is designed to measure the likelihood of a given area to attract human visual attention, is used to predict users' attention distribution on heterogeneous search components. With an experimental search engine, we carefully design a user study in which users' examination behavior (eye movement) is recorded. Examination prediction results based on this collected data set demonstrate that visual saliency features significantly improve the performance of the examination model in heterogeneous search environments. We also found that saliency features help predict internal examination behavior within vertical results.

【Keywords】: eye tracking; user behavior analysis; visual saliency

Efficiency II 2

74. A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation.

Paper Link】 【Pages】:629-638

【Authors】: Xin Jin ; Tao Yang ; Xun Tang

【Abstract】: Machine-learned classification and ranking techniques often use ensembles to aggregate partial scores of feature vectors for high accuracy, and the runtime score computation can become expensive when employing a large number of ensembles. Previous work has shown that judicious use of the memory hierarchy in a modern CPU architecture can effectively shorten the time of score computation. However, different traversal methods and blocking parameter settings can exhibit different cache and cost behavior depending on data and architectural characteristics. It is very time-consuming to conduct an exhaustive search for performance comparison and optimum selection. This paper provides an analytic comparison of cache blocking methods on their data access performance with an approximation and proposes a fast guided sampling scheme to select a traversal method and blocking parameters for effective use of the memory hierarchy. The evaluation studies with three datasets show that within a reasonable amount of time, the proposed scheme can identify a highly competitive solution that significantly accelerates score calculation.

【Keywords】: cache locality; ensemble methods; query processing

75. Improved Caching Techniques for Large-Scale Image Hosting Services.

Paper Link】 【Pages】:639-648

【Authors】: Xiao Bai ; Berkant Barla Cambazoglu ; Archie Russell

【Abstract】: Commercial image serving systems, such as Flickr and Facebook, rely on large image caches to avoid the retrieval of requested images from the costly backend image store, as much as possible. Such systems serve the same image in different resolutions and, thus, in different sizes to different clients, depending on the properties of the clients' devices. The requested resolutions of images can be cached individually, as in the traditional caches, reducing the backend workload. However, a potentially better approach is to store relatively high-resolution images in the cache and resize them during the retrieval to obtain lower-resolution images. Having this kind of on-the-fly image resizing capability enables image serving systems to deploy more sophisticated caching policies and improve their serving performance further. In this paper, we formalize the static caching problem in image serving systems which provide on-the-fly image resizing functionality in their edge caches or regional caches. We propose two gain-based caching policies that construct a static, fixed-capacity cache to reduce the average serving time of images. The basic idea in the proposed policies is to identify the best resolution(s) of images to be cached so that the average serving time for future image retrieval requests is reduced. We conduct extensive experiments using real-life data access logs obtained from Flickr. We show that one of the proposed caching policies reduces the average response time of the service by up to 4.2% with respect to the best-performing baseline that mainly relies on the access frequency information to make the caching decisions. This improvement implies about 25% reduction in cache size under similar serving time constraints.

【Keywords】: cache hit rate; image caching; image hosting service; image resizing; response latency

Short Collection Papers 17

76. A Complete & Comprehensive Movie Review Dataset (CCMR).

Paper Link】 【Pages】:661-664

【Authors】: Xuezhi Cao ; Weiyue Huang ; Yong Yu

【Abstract】: Online review sites are widely used for various domains including movies and restaurants. These sites now have a strong influence on users during purchasing processes. There exist plenty of research works for review sites on various aspects, including item recommendation, user behavior analysis, etc. However, due to the lack of a complete and comprehensive dataset, there are still problems that remain to be solved. Therefore, in this paper we assemble and publish such a dataset (CCMR) for the community. CCMR surpasses existing datasets in terms of completeness, comprehensiveness and scale. Besides describing the dataset and its collecting methodology, we also propose several potential research topics that are made possible by having this dataset. Such topics include: (i) a statistical approach to reduce the impact of fake reviews and (ii) analyzing and modeling the influence of public opinion on users during rating actions. We further conduct preliminary analysis and experiments for both directions to show that they are promising.

【Keywords】: review sites; test collection; user behavior

77. A Cross-Platform Collection of Social Network Profiles.

Paper Link】 【Pages】:665-668

【Authors】: Maria Han Veiga ; Carsten Eickhoff

【Abstract】: The proliferation of Internet-enabled devices and services has led to a shifting balance between digital and analogue aspects of our everyday lives. In the face of this development there is a growing demand for the study of privacy hazards, the potential for unique user deanonymization and information leakage between the various social media profiles many of us maintain. To enable the structured study of such adversarial effects, this paper presents a dedicated dataset of cross-platform social network personas (i.e., the same person has accounts on multiple platforms). The corpus comprises 850 users who generate predominantly English content. Each user object contains the online footprint of the same person in three distinct social networks: Twitter, Instagram and Foursquare. In total, it encompasses over 2.5M tweets, 340k check-ins and 42k Instagram posts. We describe the collection methodology, characteristics of the dataset, and how to obtain it. Finally, we discuss a common use case, cross-platform user identification.

【Keywords】: benchmark; cross platform collection; dataset; linked users; online social networks; test collection

78. A Test Collection for Matching Patients to Clinical Trials.

Paper Link】 【Pages】:669-672

【Authors】: Bevan Koopman ; Guido Zuccon

【Abstract】: We present a test collection to study the use of search engines for matching eligible patients (the query) to clinical trials (the document). Clinical trials are experiments conducted in the development of new medical treatments, drugs or devices. Recruiting candidates for a trial is often a time-consuming and resource-intensive effort, and imposes delays or even the cancellation of trials. The collection described in this paper provides: i) a large corpus of clinical trials; ii) 60 patient case reports used as topics; iii) multiple query representations for a single topic (long, short and ad-hoc); iv) a user provided estimate of how many trials they expect each patient topic would be eligible for; and v) relevance assessments by medical professionals. The availability of such a collection allows researchers to investigate, among other questions: i) the effectiveness of retrieval methods for this task; ii) how multiple representations of an information need affect retrieval; iii) what influences relevance assessments in this context; and iv) whether automated matching of patients to trials improves patient recruitment. The collection is available at http://doi.org/10.4225/08/5714557510C17.

【Keywords】: clinical trials; information retrieval; medical information retrieval; test collections

79. ArabicWeb16: A New Crawl for Today's Arabic Web.

Paper Link】 【Pages】:673-676

【Authors】: Reem Suwaileh ; Mucahid Kutlu ; Nihal Fathima ; Tamer Elsayed ; Matthew Lease

【Abstract】: Web crawls provide valuable snapshots of the Web which enable a wide variety of research, be it distributional analysis to characterize Web properties or use of language, content analysis in social science, or Information Retrieval (IR) research to develop and evaluate effective search algorithms. While many English-centric Web crawls exist, existing public Arabic Web crawls are quite limited, limiting research and development. To remedy this, we present ArabicWeb16, a new public Web crawl of roughly 150M Arabic Web pages with significant coverage of dialectal Arabic as well as Modern Standard Arabic. For IR researchers, we expect ArabicWeb16 to support various research areas: ad-hoc search, question answering, filtering, cross-dialect search, dialect detection, entity search, blog search, and spam detection. Combined use with a separate Arabic Twitter dataset we are also collecting may provide further value.

【Keywords】: arabic search; evaluation; multi-dialect; web collection

80. Building Test Collections for Evaluating Temporal IR.

Paper Link】 【Pages】:677-680

【Authors】: Hideo Joho ; Adam Jatowt ; Roi Blanco ; Haitao Yu ; Shuhei Yamamoto

【Abstract】: Research on temporal aspects of information retrieval has recently gained considerable interest within the Information Retrieval (IR) community. This paper describes our efforts for building test collections for the purpose of fostering temporal IR research. In particular, we overview the test collections created at the two recent editions of Temporal Information Access (Temporalia) task organized at NTCIR-11 and NTCIR-12, report on selected results and discuss several observations we made during the task design and implementation. Finally, we outline further directions for constructing test collections suitable for temporal IR.

【Keywords】: temporal ir; temporal query intents; temporal search result diversification; test collections

81. DAJEE: A Dataset of Joint Educational Entities for Information Retrieval in Technology Enhanced Learning.

Paper Link】 【Pages】:681-684

【Authors】: Vladimir Estivill-Castro ; Carla Limongelli ; Matteo Lombardi ; Alessandro Marani

【Abstract】: In the Technology Enhanced Learning (TEL) community, the problem of conducting reproducible evaluations of recommender systems is still open, due to the lack of exhaustive benchmarks. The few public datasets available in TEL have limitations, being mostly small and local. Recently, Massive Open Online Courses (MOOC) are attracting many studies in TEL, mainly because of the huge amount of data for these courses and their potential for many applications in TEL. This paper presents DAJEE, a dataset built from the crawling of MOOCs hosted on the Coursera platform. DAJEE offers information on the usage of more than 20,000 resources in 407 courses by 484 instructors, with a conjunction of different educational entities in order to store the courses' structure and the instructors' teaching experiences.

【Keywords】: coursera; educational dataset; mooc; tel dataset

82. Evaluating Retrieval over Sessions: The TREC Session Track 2011-2014.

Paper Link】 【Pages】:685-688

【Authors】: Ben Carterette ; Paul D. Clough ; Mark M. Hall ; Evangelos Kanoulas ; Mark Sanderson

【Abstract】: Information Retrieval (IR) research has traditionally focused on serving the best results for a single query - so-called ad hoc retrieval. However, users typically search iteratively, refining and reformulating their queries during a session. A key challenge in the study of this interaction is the creation of suitable evaluation resources to assess the effectiveness of IR systems over sessions. This paper describes the TREC Session Track, which ran from 2010 through to 2014 and focussed on forming test collections that included various forms of implicit feedback. We describe the test collections; a brief analysis of the differences between datasets over the years; and the evaluation results that demonstrate that the use of user session data significantly improved effectiveness.

【Keywords】: evaluation; test collection; trec session track

83. EveTAR: A New Test Collection for Event Detection in Arabic Tweets.

Paper Link】 【Pages】:689-692

【Authors】: Hind Almerekhi ; Maram Hasanain ; Tamer Elsayed

【Abstract】: Research on event detection in Twitter is often obstructed by the lack of publicly-available evaluation mechanisms such as test collections; this problem is more severe when considering the scarcity of them in languages other than English. In this paper, we present EveTAR, the first publicly-available test collection for event detection in Arabic tweets. The collection includes a crawl of 590M Arabic tweets posted over a one-month period and covers 66 significant events (in 8 different categories) for which more than 134k relevance judgments were gathered using crowdsourcing with high average inter-annotator agreement (Kappa value of 0.6). We demonstrate the usability of the collection by evaluating 3 state-of-the-art event detection algorithms. The collection is also designed to support other retrieval tasks, as we show in our experiments with ad-hoc search systems.

【Keywords】: ad-hoc search; crowdsourcing; evaluation; twitter
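
To make the reported agreement figure concrete, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not say which kappa variant was used or how many annotators judged each item, so the two-annotator setting, the function name `cohen_kappa`, and the toy labels are all assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy relevance judgments (1 = relevant, 0 = not relevant) from two annotators.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on 8 of 10 items while chance agreement is 0.5, which gives a kappa of about 0.6.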

84. GNMID14: A Collection of 110 Million Global Music Identification Matches.

Paper Link】 【Pages】:693-696

【Authors】: Cameron Summers ; Greg Tronel ; Jason Cramer ; Aneesh Vartakavi ; Phillip Popp

【Abstract】: A new dataset is presented composed of music identification matches from Gracenote, a leading global music metadata company. Matches from January 1, 2014 to December 31, 2014 have been curated and made available as a public dataset called Gracenote Music Identification 2014, or GNMID14, at the following address: https://developer.gracenote.com/mid2014. This collection is the first significant music identification dataset and one of the largest music related datasets available containing more than 110M matches in 224 countries for 3M unique tracks, and 509K unique artists. It features geotemporal information (i.e. country and match date), genre and mood metadata. In this paper, we characterize the dataset and demonstrate its utility for Information Retrieval (IR) research.

【Keywords】: collection; content identification; dataset; fingerprint; genre; geotemporal; mood; music; similarity; test

Paper Link】 【Pages】:697-700

【Authors】: Suzan Verberne ; Bram Arends ; Wessel Kraaij ; Arjen P. de Vries

【Abstract】: We have collected the access logs for our university's web domain over a time span of 4.5 years. We now release the pre-processed data of a 3-month period for research into user navigation behavior. We preprocessed the data so that only successful GET requests of web pages by non-bot users are kept. The resulting 3-month collection comprises 9.6M page visits (190K unique URLs) by 744K unique visitors.

【Keywords】: access logs; data collection; graph clustering; link analysis; navigation behavior
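
The preprocessing step described in the abstract above (keeping only successful GET requests of web pages by non-bot users) can be sketched as a simple filter. The record layout, the reading of "successful" as a 2xx status, the page-suffix list, and the user-agent heuristic are all assumptions for illustration; the abstract does not describe how the authors implemented any of them.

```python
# Hypothetical heuristics; the real collection may use different rules.
BOT_MARKERS = ("bot", "crawler", "spider")
PAGE_SUFFIXES = ("/", ".html", ".htm", ".php")


def is_page_visit(record):
    """True for a successful GET of a web page by a client that is not a known bot."""
    agent = record["user_agent"].lower()
    path = record["path"].split("?", 1)[0].lower()  # ignore the query string
    return (
        record["method"] == "GET"
        and 200 <= record["status"] < 300
        and not any(marker in agent for marker in BOT_MARKERS)
        and path.endswith(PAGE_SUFFIXES)
    )


# Toy parsed log records; only the first satisfies every condition.
log = [
    {"method": "GET", "status": 200, "path": "/research/", "user_agent": "Mozilla/5.0"},
    {"method": "GET", "status": 404, "path": "/old.html", "user_agent": "Mozilla/5.0"},
    {"method": "POST", "status": 200, "path": "/form.php", "user_agent": "Mozilla/5.0"},
    {"method": "GET", "status": 200, "path": "/index.html", "user_agent": "Googlebot/2.1"},
    {"method": "GET", "status": 200, "path": "/logo.png", "user_agent": "Mozilla/5.0"},
]
kept = [record for record in log if is_page_visit(record)]
```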

86. New Collection Announcement: Focused Retrieval Over the Web.

Paper Link】 【Pages】:701-704

【Authors】: Ivan Habernal ; Maria Sukhareva ; Fiana Raiber ; Anna Shtok ; Oren Kurland ; Hadar Ronen ; Judit Bar-Ilan ; Iryna Gurevych

【Abstract】: Focused retrieval (a.k.a. passage retrieval) is important in its own right and as an intermediate step in question answering systems. We present a new Web-based collection for focused retrieval. The document corpus is Category A of the ClueWeb12 collection. Forty-nine queries from the educational domain were created. The 100 documents most highly ranked for each query by a highly effective learning-to-rank method were judged for relevance using crowdsourcing. All sentences in the relevant documents were judged for relevance.

【Keywords】: focused retrieval

87. NTCIR Lifelog: The First Test Collection for Lifelog Research.

Paper Link】 【Pages】:705-708

【Authors】: Cathal Gurrin ; Hideo Joho ; Frank Hopfgartner ; Liting Zhou ; Rami Albatal

【Abstract】: Test collections have a long history of supporting repeatable and comparable evaluation in Information Retrieval (IR). However, thus far, no shared test collection exists for IR systems that are designed to index and retrieve multimodal lifelog data. In this paper we introduce the first test collection for personal lifelog data, which has been employed for the NTCIR12-Lifelog task. The requirements for the test collection are motivated and the process of creating it is described, along with an overview of the collection. Finally, suggestions are given for possible applications of the test collection.

【Keywords】: evaluation; information retrieval; lifelog; multimodal; test collection

Paper Link】 【Pages】:709-712

【Authors】: Stewart Whiting ; Joemon M. Jose ; Omar Alonso

【Abstract】: In 2012, Sogou, a major Chinese web search engine, released a large-scale query log containing 43.5M user interactions, including submitted queries and clicked web page search results. This query log offers a deep sample of queries over a two-day period from 30th December 2011 to 1st January 2012. In August 2013, we identified 1.4M predominantly Chinese language unique search result URLs that were clicked at least three times in this query log. We crawled the HTML content of these URLs to construct the supplementary SOGOU-2012-CRAWL dataset, which we release in this work. A real large-scale query log with accompanying crawl such as this offers several opportunities for reproducible information retrieval (IR) research, including query classification, intent modelling and indexing strategy. In this paper we first detail the query log and crawl dataset construction and characteristics. Following this, to demonstrate potential applications we use the crawl to indicatively analyse various time-based patterns in web content and search behaviour. In particular, we study the distribution of language-independent date expressions in the crawled web content. Based on this, we propose a simple approach for modelling the past/present/future temporal intent of queries based on the date the query was submitted by the user, and the dates appearing in the clicked search results. We observe several prominent temporal patterns which may lead to novel time-aware IR approaches.

【Keywords】: chinese; crawl; intent; temporal; time

89. The BOLT IR Test Collections of Multilingual Passage Retrieval from Discussion Forums.

Paper Link】 【Pages】:713-716

【Authors】: Ian Soboroff ; Kira Griffitt ; Stephanie Strassel

【Abstract】: This paper describes a new test collection for passage retrieval from multilingual, informal text. The task being modeled is that of a monolingual English-speaking user who wishes to search discussion forum text in a foreign language. The system retrieves relevant short passages of text and presents them to the user, translated into English. The test collection contains more than 2 billion words of discussion thread text, 250 queries representing complex informational search needs, and manual relevance judgments of forum post passages, pooled from real systems. This information retrieval test collection is the first to combine multilingual search, passage retrieval, and informal online genre text.

【Keywords】: forum search; machine translation; passage retrieval

90. The Factoid Queries Collection.

Paper Link】 【Pages】:717-720

【Authors】: Ido Guy ; Dan Pelleg

【Abstract】: We present a collection of over 15,000 queries, issued to commercial web search engines, whose answer is a single fact. The collection was produced based on queries landing on questions within a large community question answering website, each with a best answer no longer than 3 words and an explicit reference to a Wikipedia page. We describe the collection generation process and provide a variety of descriptive characteristics, demonstrating the collection's uniqueness compared to existing datasets and its potential use for research of factoid question answering and retrieval.

【Keywords】: dataset; fact retrieval; factoid question answering

91. The LExR Collection for Expertise Retrieval in Academia.

Paper Link】 【Pages】:721-724

【Authors】: Vitor Mangaravite ; Rodrygo L. T. Santos ; Isac S. Ribeiro ; Marcos André Gonçalves ; Alberto H. F. Laender

【Abstract】: Expertise retrieval has been the subject of intense research over the past decade, particularly with the public availability of benchmark test collections for expertise retrieval in enterprises. Another domain which has seen comparatively less research on expertise retrieval is academic search. In this paper, we describe the Lattes Expertise Retrieval (LExR) test collection for research on academic expertise retrieval. LExR has been designed to provide a large-scale benchmark for two complementary expertise retrieval tasks, namely, expert profiling and expert finding. Unlike currently available test collections, which fully support only one of these tasks, LExR provides graded relevance judgments performed by expert judges separately for each task. In addition, LExR is both cross-organization and cross-area, encompassing candidate experts from all areas of knowledge working in research institutions all over Brazil. As a result, it constitutes a valuable resource for fostering new research directions on expertise retrieval in an academic setting.

【Keywords】: academic search; expertise retrieval

92. UQV100: A Test Collection with Query Variability.

Paper Link】 【Pages】:725-728

【Authors】: Peter Bailey ; Alistair Moffat ; Falk Scholer ; Paul Thomas

【Abstract】: We describe the UQV100 test collection, designed to incorporate variability from users. Information need "backstories" were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories, and provide the queries they would use; plus effort estimates of how many useful documents they would have to read to satisfy the need. A total of 10,835 queries were collected from 263 workers. After normalization and spell-correction, 5,764 unique variations remained; these were then used to construct a document pool via Indri-BM25 over the ClueWeb12-B corpus. Qualified crowd workers made relevance judgments relative to the backstories, using a relevance scale similar to the original TREC approach; first to a pool depth of ten per query, then deeper on a set of targeted documents. The backstories, query variations, normalized and spell-corrected queries, effort estimates, run outputs, and relevance judgments are made available collectively as the UQV100 test collection. We also make available the judging guidelines and the gold hits we used for crowd-worker qualification and spam detection. We believe this test collection will unlock new opportunities for novel investigations and analysis, including for problems such as task-intent retrieval performance and consistency (independent of query variation), query clustering, query difficulty prediction, and relevance feedback, among others.

【Keywords】: backstory; clueweb; information need; information retrieval; test collection; trec; user variability

Short Research Papers 87

93. A Dynamic Recurrent Model for Next Basket Recommendation.

Paper Link】 【Pages】:729-732

【Authors】: Feng Yu ; Qiang Liu ; Shu Wu ; Liang Wang ; Tieniu Tan

【Abstract】: Next basket recommendation has become an increasing concern. Most conventional models explore either sequential transaction features or general interests of users. Further, some works treat users' general interests and sequential behaviors as two totally divided matters, and then combine them in some way for next basket recommendation. Moreover, the state-of-the-art models are based on the assumption of Markov Chains (MC), which only capture local sequential features between two adjacent baskets. In this work, we propose a novel model, Dynamic REcurrent bAsket Model (DREAM), based on Recurrent Neural Network (RNN). DREAM not only learns a dynamic representation of a user but also captures global sequential features among baskets. The dynamic representation of a specific user can reveal the user's dynamic interests at different times, and the global sequential features reflect interactions of all baskets of the user over time. Experiment results on two public datasets indicate that DREAM is more effective than the state-of-the-art models for next basket recommendation.

【Keywords】: next basket recommendation; recurrent neural network

94. A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling.

Paper Link】 【Pages】:733-736

【Authors】: Fanghong Jian ; Jimmy Xiangji Huang ; Jiashu Zhao ; Tingting He ; Po Hu

【Abstract】: Traditional information retrieval (IR) models, in which a document is normally represented as a bag of words and their frequencies, capture the term-level and document-level information. Topic models, on the other hand, discover semantic topic-based information among words. In this paper, we consider term-based information and semantic information as two features of query terms and propose a simple enhancement for ad-hoc IR via topic modeling. In particular, three topic-based hybrid models, LDA-BM25, LDA-MATF and LDA-LM, are proposed. A series of experiments on eight standard datasets show that our proposed models can always outperform significantly the corresponding strong baselines over all datasets in terms of MAP and most of datasets in terms of P@5 and P@20. A direct comparison on eight standard datasets also indicates our proposed models are at least comparable to the state-of-the-art approaches.

【Keywords】: dirichlet language model; lda; probabilistic model

95. An Empirical Study of Learning to Rank for Entity Search.

Paper Link】 【Pages】:737-740

【Authors】: Jing Chen ; Chenyan Xiong ; Jamie Callan

【Abstract】: This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. State-of-the-art learning to rank methods learn models for ad-hoc entity search. Our experiments on an entity search test collection based on DBpedia confirm that learning to rank methods are as powerful for ranking entities as for ranking documents, and establish a new state-of-the-art for accuracy on this benchmark dataset.

【Keywords】: dbpedia; entity search; knowledge base; learning to rank

96. An Exploration of Evaluation Metrics for Mobile Push Notifications.

Paper Link】 【Pages】:741-744

【Authors】: Luchen Tan ; Adam Roegiest ; Jimmy J. Lin ; Charles L. A. Clarke

【Abstract】: How do we evaluate systems that filter social media streams and send users updates via push notifications on their mobile phones? Such notifications must be relevant, timely, and novel. In this paper, we explore various evaluation metrics for this task, focusing specifically on measuring relevance. We begin with an analysis of metrics deployed at the TREC 2015 Microblog evaluations. A simple change to the metrics, reflecting a different assumption, dramatically alters system rankings. Applying another metric, previously used in the TREC Microblog evaluations, again yields different system rankings. We find little correlation between a number of "reasonable" evaluation metrics, which suggests that system effectiveness depends on how you measure it---an undesirable state in IR evaluation. However, we argue that existing evaluation metrics can be generalized into a framework that uses the same underlying contingency table, but places different weights and penalties. Although we stop short of proposing the "one true metric", this framework can guide the future development of a family of metrics that more accurately models user needs.

【Keywords】: filtering; meta-evaluation; microblogs; trec; tweets

97. An Improved Multileaving Algorithm for Online Ranker Evaluation.

Paper Link】 【Pages】:745-748

【Authors】: Brian Brost ; Ingemar J. Cox ; Yevgeny Seldin ; Christina Lioma

【Abstract】: Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar sensitivity but can be more efficient than interleaving methods. This paper presents empirical results demonstrating that existing multileaving methods either do not scale well with the number of rankers, or, more problematically, can produce results which substantially differ from evaluation measures like NDCG. The latter problem is caused by the fact that they do not correctly account for the similarities that can occur between rankers being multileaved. We propose a new multileaving method for handling this problem and demonstrate that it substantially outperforms existing methods, in some cases reducing errors by as much as 50%.

【Keywords】: multileaving; online ranker evaluation

98. An Unsupervised Approach to Anomaly Detection in Music Datasets.

Paper Link】 【Pages】:749-752

【Authors】: Yen-Cheng Lu ; Chih-Wei Wu ; Chang-Tien Lu ; Alexander Lerch

【Abstract】: This paper presents an unsupervised method for systematically identifying anomalies in music datasets. The model integrates categorical regression and robust estimation techniques to infer anomalous scores in music clips. When applied to a music genre recognition dataset, the new method is able to detect corrupted, distorted, or mislabeled audio samples based on commonly used features in music information retrieval. The evaluation results show that the algorithm outperforms other anomaly detection methods and is capable of finding problematic samples identified by human experts. The proposed method introduces a preliminary framework for anomaly detection in music data that can serve as a useful tool to improve data integrity in the future.

【Keywords】: anomaly detection; data clean-up; music genre retrieval; music information retrieval

99. Anonymizing Query Logs by Differential Privacy.

Paper Link】 【Pages】:753-756

【Authors】: Sicong Zhang ; Grace Hui Yang ; Lisa Singh

【Abstract】: Query logs are valuable resources for Information Retrieval (IR) research. However, because they are also rich in private and personal information, the huge concern of leaking user privacy prevents query logs from being shared from the search companies to the broad research community. Bothered by the lack of good research data for years, the authors of this paper are motivated to explore ways to generate anonymized query logs that can still be effectively used to support the search task. We introduce a framework to anonymize query logs by differential privacy, the latest development in privacy research. The framework is empirically evaluated against multiple search algorithms on their retrieval utility, measured in standard IR evaluation metrics, using the anonymized logs. The experiments show that our framework is able to achieve a good balance between retrieval utility and privacy.

【Keywords】: differential privacy; privacy-preserving information retrieval; query log anonymization

100. Audio Features Affected by Music Expressiveness: Experimental Setup and Preliminary Results on Tuba Players.

Paper Link】 【Pages】:757-760

【Authors】: Alberto Introini ; Giorgio Presti ; Giuseppe Boccignone

【Abstract】: Within a Music Information Retrieval perspective, the goal of the study presented here is to investigate the impact on sound features of the musician's affective intention, namely when trying to intentionally convey emotional contents via expressiveness. A preliminary experiment has been performed involving 10 tuba players. The recordings have been analysed by extracting a variety of features, which have been subsequently evaluated by combining both classic and machine learning statistical techniques. Results are reported and discussed.

【Keywords】: music and emotions; music information retrieval; performance analysis; sentiment analysis; tuba

Paper Link】 【Pages】:761-764

【Authors】: Adam Fourney ; Susan T. Dumais

【Abstract】: Web search functionality is increasingly integrated into operating systems, software applications, and other interactive environments that extend beyond the traditional web browser. In particular, intelligent virtual assistants (e.g., Microsoft Cortana or Apple Siri) often "fall-back" to generic web search in cases where utterances fall outside the set of scenarios known to the agent. In this paper we analyze a 3 month log of web search queries posed via the Cortana virtual assistant. We report that, in this environment, users frequently ask questions that implicitly pertain to the systems or devices from which they are searching (e.g., asking: [how do I take a screenshot]). Unfortunately, accurately answering these implicit system queries poses significant challenges to general web search engines, due in part to the lack of available context. We show that such queries: (1) can be detected with high precision, (2) are common, and (3) can be automatically reformulated to substantially improve retrieval performance in these fall-through scenarios.

【Keywords】: implicit system search; virtual assistants

102. Axiomatic Analysis for Improving the Log-Logistic Feedback Model.

Paper Link】 【Pages】:765-768

【Authors】: Ali Montazeralghaem ; Hamed Zamani ; Azadeh Shakery

【Abstract】: Pseudo-relevance feedback (PRF) has been proven to be an effective query expansion strategy to improve retrieval performance. Several PRF methods have so far been proposed for many retrieval models. Recent theoretical studies of PRF methods show that most of the PRF methods do not satisfy all necessary constraints. Among all, the log-logistic model has been shown to be an effective method that satisfies most of the PRF constraints. In this paper, we first introduce two new PRF constraints. We further analyze the log-logistic feedback model and show that it does not satisfy these two constraints as well as the previously proposed "relevance effect" constraint. We then modify the log-logistic formulation to satisfy all these constraints. Experiments on three TREC newswire and web collections demonstrate that the proposed modification significantly outperforms the original log-logistic model, in all collections.

【Keywords】: axiomatic analysis; pseudo-relevance feedback; query expansion; semantic similarity; theoretical analysis

103. Balancing Relevance Criteria through Multi-Objective Optimization.

Paper Link】 【Pages】:769-772

【Authors】: Joost van Doorn ; Daan Odijk ; Diederik M. Roijers ; Maarten de Rijke

【Abstract】: Offline evaluation of information retrieval systems typically focuses on a single effectiveness measure that models the utility for a typical user. Such a measure usually combines a behavior-based rank discount with a notion of document utility that captures the single relevance criterion of topicality. However, for individual users relevance criteria such as credibility, reputability or readability can strongly impact the utility. Also, for different information needs the utility can be a different mixture of these criteria. Because of the focus on single metrics, offline optimization of IR systems does not account for different preferences in balancing relevance criteria. We propose to mitigate this by viewing multiple relevance criteria as objectives and learning a set of rankers that provide different trade-offs w.r.t. these objectives. We model document utility within a gain-based evaluation framework as a weighted combination of relevance criteria. Using the learned set, we are able to make an informed decision based on the values of the rankers and a preference w.r.t. the relevance criteria. On a dataset annotated for readability and a web search dataset annotated for sub-topic relevance we demonstrate how trade-offs between these criteria can be made explicit. We show that there are different available trade-offs between relevance criteria.

【Keywords】: effectiveness measures; learning to rank; multi-objective optimization; optimistic linear support

104. Build Emotion Lexicon from the Mood of Crowd via Topic-Assisted Joint Non-negative Matrix Factorization.

Paper Link】 【Pages】:773-776

【Authors】: Kaisong Song ; Wei Gao ; Ling Chen ; Shi Feng ; Daling Wang ; Chengqi Zhang

【Abstract】: In the research of building emotion lexicons, we witness the exploitation of crowd-sourced affective annotation given by readers of online news articles. Such an approach ignores the relationship between topics and emotion expressions, which are often closely correlated. We build an emotion lexicon by developing a novel joint non-negative matrix factorization model which not only incorporates crowd-annotated emotion labels of articles but also generates the lexicon using the topic-specific matrices obtained from the factorization process. We evaluate our lexicon via emotion classification on both benchmark and in-house datasets. Results demonstrate the high quality of our lexicon.

【Keywords】: emotion classification; emotion lexicon; joint nmf

105. Burst Detection in Social Media Streams for Tracking Interest Profiles in Real Time.

Paper Link】 【Pages】:777-780

【Authors】: Cody Buntain ; Jimmy J. Lin

【Abstract】: This work presents RTTBurst, an end-to-end system for ingesting descriptions of user interest profiles and discovering new and relevant tweets based on those interest profiles using a simple model for identifying bursts in token usage. Our approach differs from standard retrieval-based techniques in that it primarily focuses on identifying noteworthy moments in the tweet stream, and "summarizes" those moments using selected tweets. We lay out the architecture of RTTBurst, our participation in and performance at the TREC 2015 Microblog track, and a method for combining and potentially improving existing TREC systems. Official results and post hoc experiments show that our simple targeted burst detection technique is competitive with existing systems. Furthermore, we demonstrate that our burst detection mechanism can be used to improve the performance of other systems for the same task.

【Keywords】: burst detection; real-time tracking; twitter

106. Cluster-based Joint Matrix Factorization Hashing for Cross-Modal Retrieval.

Paper Link】 【Pages】:781-784

【Authors】: Dimitrios Rafailidis ; Fabio Crestani

【Abstract】: Cross-modal retrieval has been an emerging topic over the last years, as modern applications have to efficiently search for multimedia documents with different modalities. In this study, we propose a cross-modal hashing method by following a cluster-based joint matrix factorization strategy. Our method first builds clusters for each modality separately and then generates a cross-modal cluster representation for each document. We formulate a joint matrix factorization process with the constraint that pushes the documents' representations of the different modalities and the cross-modal cluster representations into a common consensus matrix. In doing so, we capture the inter-modality, intra-modality and cluster-based similarities in a unified latent space. Finally, we present an efficient way to generate the hash codes using the maximum entropy principle and compute the binary codes for external queries. In our experiments with two publicly available data sets, we show that the proposed method outperforms state-of-the-art hashing methods for different cross-modal retrieval tasks.

【Keywords】: cross-modal retrieval; hashing; matrix factorization

107. Collaborative Ranking with Social Relationships for Top-N Recommendations.

Paper Link】 【Pages】:785-788

【Authors】: Dimitrios Rafailidis ; Fabio Crestani

【Abstract】: Recommendation systems have gained a lot of attention because of their importance for handling the unprecedentedly large amount of available content on the Web, such as movies, music, books, etc. Although Collaborative Ranking (CR) models can produce accurate recommendation lists, in practice several real-world problems decrease their ranking performance, such as the sparsity and cold start problems. Here, to account for the fact that the selections of social friends can leverage the recommendation accuracy, we propose SCR, a Social CR model. Our model learns personalized ranking functions collaboratively, using the notion of Social Reverse Height, that is, considering how well the relevant items of users and their social friends have been ranked at the top of the list. The reason that we focus on the top of the list is that users mainly see the top-N recommendations, and not the whole ranked list. In our experiments with a benchmark data set from Epinions, we show that our SCR model performs better than state-of-the-art CR models that either consider social relationships or focus on the ranking performance at the top of the list.

【Keywords】: collaborative ranking; recommendation systems; social relationships

108. Community-based Cyberreading for Information Understanding.

Paper Link】 【Pages】:789-792

【Authors】: Zhuoren Jiang ; Xiaozhong Liu ; Liangcai Gao ; Zhi Tang

【Abstract】: Although the content in scientific publications is increasingly challenging, it is necessary to investigate another important problem, that of scientific information understanding. For this proposed problem, we investigate novel methods to assist scholars (readers) to better understand scientific publications by enabling physical and virtual collaboration. For physical collaboration, an algorithm will group readers together based on their profiles and reading behavior, and will enable the cyberreading collaboration within an online reading group. For virtual collaboration, instead of pushing readers to communicate with others, we cluster readers based on their estimated information needs. For each cluster, a learning to rank model will be generated to recommend readers' communitized resources (i.e., videos, slides, and wikis) to help them understand the target publication.

【Keywords】: cyberreading; education; information understanding; user study

109. Computational Creativity Based Video Recommendation.

Paper Link】 【Pages】:793-796

【Authors】: Wei Lu ; Korris Fu-Lai Chung

【Abstract】: Computational creativity, as an emerging domain of application, emphasizes the use of big data to automatically design new knowledge. Based on the availability of complex multi-relational data, one aspect of computational creativity is to infer unexplored regions of feature space and novel learning paradigm, which is particularly useful for online recommendation. Tensor models offer effective approaches for complex multi-relational data learning and missing element completion. Targeting at constructing a recommender system that can compromise between accuracy and creativity for users, a deep Bayesian probabilistic tensor framework for tag and item recommending is adopted. Empirical results demonstrate the superiority of the proposed method and indicate that it can better capture latent patterns of interaction relationships and generate interesting recommendations based on creative tag combinations.

【Keywords】: bayesian tensor factorization; recommendation; serendipity

110. Controversy Detection in Wikipedia Using Collective Classification.

Paper Link】 【Pages】:797-800

【Authors】: Shiri Dori-Hacohen ; David D. Jensen ; James Allan

【Abstract】: Concerns over personalization in IR have sparked an interest in detection and analysis of controversial topics. Accurate detection would enable many beneficial applications, such as alerting search users to controversy. Wikipedia's broad coverage and rich metadata offer a valuable resource for this problem. We hypothesize that intensities of controversy among related pages are not independent; thus, we propose a stacked model which exploits the dependencies among related pages. Our approach improves classification of controversial web pages when compared to a model that examines each page in isolation, demonstrating that controversial topics exhibit homophily. Using notions of similarity to construct a subnetwork for collective classification, rather than using the default network present in the relational data, leads to improved classification with wider applications for semi-structured datasets, with the effects most pronounced when a small set of neighbors is used.

【Keywords】: collective classification; controversy detection; semi-structured data; similarity; stacked classification

111. Discovering Author Interest Evolution in Topic Modeling.

Paper Link】 【Pages】:801-804

【Authors】: Min Yang ; Jincheng Mei ; Fei Xu ; Wenting Tu ; Ziyu Lu

【Abstract】: Discovering the author's interest over time from documents has important applications in recommendation systems, authorship identification and opinion extraction. In this paper, we propose an interest drift model (IDM), which monitors the evolution of author interests in time-stamped documents. The model further uses the discovered author interest information to help finding better topics. Unlike traditional topic models, our model is sensitive to the ordering of words, thus it extracts more information from the semantic meaning of the context. The experiment results show that the IDM model learns better topics than state-of-the-art topic models.

【Keywords】: author topic model; dynamic author interests; dynamic author topic model; topic model

112. Distributional Random Oversampling for Imbalanced Text Classification.

Paper Link】 【Pages】:805-808

【Authors】: Alejandro Moreo ; Andrea Esuli ; Fabrizio Sebastiani

【Abstract】: The accuracy of many classification algorithms is known to suffer when the data are imbalanced (i.e., when the distribution of the examples across the classes is severely skewed). Many applications of binary text classification are of this type, with the positive examples of the class of interest far outnumbered by the negative examples. Oversampling (i.e., generating synthetic training examples of the minority class) is an often used strategy to counter this problem. We present a new oversampling method specifically designed for classifying data (such as text) for which the distributional hypothesis holds, according to which the meaning of a feature is somehow determined by its distribution in large corpora of data. Our Distributional Random Oversampling method generates new random minority-class synthetic documents by exploiting the distributional properties of the terms in the collection. We discuss results we have obtained on the Reuters-21578, OHSUMED-S, and RCV1-v2 datasets.

【Keywords】: distributional hypothesis; imbalanced text classification; oversampling

113. Doc2Sent2Vec: A Novel Two-Phase Approach for Learning Document Representation.

Paper Link】 【Pages】:809-812

【Authors】: Ganesh J ; Manish Gupta ; Vasudeva Varma

【Abstract】: Doc2Sent2Vec is an unsupervised approach to learn low-dimensional feature vector (or embedding) for a document. This embedding captures the semantics of the document and can be fed as input to machine learning algorithms to solve a myriad number of applications in the field of data mining and information retrieval. Some of these applications include document classification, retrieval, and ranking. The proposed approach is two-phased. In the first phase, the model learns a vector for each sentence in the document using a standard word-level language model. In the next phase, it learns the document representation from the sentence sequence using a novel sentence-level language model. Intuitively, the first phase captures the word-level coherence to learn sentence embeddings, while the second phase captures the sentence-level coherence to learn document embeddings. Compared to the state-of-the-art models that learn document vectors directly from the word sequences, we hypothesize that the proposed decoupled strategy of learning sentence embeddings followed by document embeddings helps the model learn accurate and rich document representations. We evaluate the learned document embeddings by considering two classification tasks: scientific article classification and Wikipedia page classification. Our model outperforms the current state-of-the-art models in the scientific article classification task by ~12.07% and the Wikipedia page classification task by ~6.93%, both in terms of F1 score. These results highlight the superior quality of document embeddings learned by the Doc2Sent2Vec approach.

【Keywords】: distributed representation; document embedding; document modeling; machine learning; sentence embedding; word embedding

114. Dynamically Integrating Item Exposure with Rating Prediction in Collaborative Filtering.

Paper Link】 【Pages】:813-816

【Authors】: Ting-Yi Shih ; Ting-Chang Hou ; Jian-De Jiang ; Yen-Chieh Lien ; Chia-Rui Lin ; Pu-Jen Cheng

【Abstract】: The paper proposes a novel approach to appropriately promote those items with few ratings in collaborative filtering. Different from previous works, we force the items with few ratings to be promoted to the users who would potentially be able to give ratings, and then leverage the gathered user preference to punish the promoted items with low quality intrinsically. By slightly sacrificing the benefit of recommending the best items in terms of user satisfaction, our approach seeks to provide all of the items with a chance to be visible equally. The results of the experiments conducted on MovieLens and Netflix data demonstrate its feasibility.

【Keywords】: collaborative filtering; item exposure; novelty

Paper Link】 【Pages】:817-820

【Authors】: Anat Hashavit ; Roy Levin ; Ido Guy ; Gilad Kutiel

【Abstract】: In recent years, studies about trend detection in online social media streams have begun to emerge. Since not all users are likely to always be interested in the same set of trends, some of the research also focused on personalizing the trends by using some predefined personalized context. In this paper, we take this problem further to a setting in which the user's context is not predefined, but rather determined as the user issues a query. This presents a new challenge since trends cannot be computed ahead of time using high latency algorithms. We present RT-Trend, an online trend detection algorithm that promptly finds relevant in-context trends as users issue search queries over a dataset of documents. We evaluate our approach using real data from an online social network by assessing its ability to predict actual activity increase of social network entities in the context of a search result. Since we implemented this feature into an existing tool with an active pool of users, we also report click data, which suggests positive feedback.

【Keywords】: analytics; tag cloud; trend cloud; trends; word cloud

116. Enhancing First Story Detection using Word Embeddings.

Paper Link】 【Pages】:821-824

【Authors】: Sean Moran ; Richard McCreadie ; Craig Macdonald ; Iadh Ounis

【Abstract】: In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via automatically mined word embeddings over a background tweet corpus. Through experimentation on a large data stream comprised of 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.

【Keywords】: document expansion; locality sensitive hashing; nearest neighbour search; paraphrase; streaming data; twitter

117. Examining the Coherence of the Top Ranked Tweet Topics.

Paper Link】 【Pages】:825-828

【Authors】: Anjie Fang ; Craig Macdonald ; Iadh Ounis ; Philip Habel

【Abstract】: Topic modelling approaches help scholars to examine the topics discussed in a corpus. Due to the popularity of Twitter, two distinct methods have been proposed to accommodate the brevity of tweets: the tweet pooling method and Twitter LDA. Both of these methods demonstrate a higher performance in producing more interpretable topics than the standard Latent Dirichlet Allocation (LDA) when applied on tweets. However, while various metrics have been proposed to estimate the coherence of the generated topics from tweets, the coherence of the top ranked topics, those that are most likely to be examined by users, has not been investigated. In addition, the effect of the number of generated topics K on the topic coherence scores has not been studied. In this paper, we conduct large-scale experiments using three topic modelling approaches over two Twitter datasets, and apply a state-of-the-art coherence metric to study the coherence of the top ranked topics and how K affects such coherence. Inspired by ranking metrics such as precision at n, we use coherence at n to assess the coherence of a topic model. To verify our results, we conduct a pairwise user study to obtain human preferences over topics. Our findings are threefold: we find evidence that Twitter LDA outperforms both LDA and the tweet pooling method because the top ranked topics it generates have more coherence; we demonstrate that a larger number of topics (K) helps to generate topics with more coherence; and finally, we show that coherence at n is more effective when evaluating the coherence of a topic model than the average coherence score.

【Keywords】: coherence metrics; lda; topic model; twitter; twitter lda
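
The abstract above compares "coherence at n" with the average coherence score but does not give a formula. As a minimal sketch, assuming coherence at n means the mean coherence of the n highest-scoring topics (this definition, the function name `coherence_at_n`, and the sample scores are illustrative assumptions, not taken from the paper):

```python
def coherence_at_n(scores, n):
    """Mean coherence of the n most coherent topics (assumed definition)."""
    if n <= 0 or not scores:
        raise ValueError("need n > 0 and at least one topic score")
    # Rank topics by coherence, highest first, and keep the top n.
    top = sorted(scores, reverse=True)[:n]
    return sum(top) / len(top)


def average_coherence(scores):
    """Plain average over all topics, the baseline the abstract mentions."""
    return sum(scores) / len(scores)


# Hypothetical per-topic coherence scores for a model with K = 4 topics.
topic_scores = [0.2, 0.8, 0.5, 0.1]
top2 = coherence_at_n(topic_scores, 2)
overall = average_coherence(topic_scores)
```

Under this assumption, a model with a few highly coherent topics and many poor ones scores well at small n even when its overall average is low, which matches the abstract's focus on the topics users are most likely to examine.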

Paper Link】 【Pages】:829-832

【Authors】: Jin Young Kim ; Jaime Teevan ; Nick Craswell

【Abstract】: Gathering evidence about whether a search result is relevant is a core concern in the evaluation and improvement of information retrieval systems. Two common sources of evidence for establishing relevance are judgements from trained assessors and logs of online user behavior. However, both are limited; it is hard for a trained assessor to know exactly what users want to find, and user behavior only provides an implicit and ambiguous signal. In this paper, we aim to address these limitations by collecting explicit feedback on web search results from users in situ as they search. When users return to the search result page via the browser back button after having clicked on a result, we ask them to provide a binary thumbs up or thumbs down judgment and text feedback. We collect in situ feedback from a large commercial search engine, and compare this feedback with the judgments provided by trained assessors. We find that in situ feedback differs significantly from traditional relevance judgments, and that it suggests a different interpretation of behavior signals, with the dwell time threshold between negative and positive in situ feedback being 87 seconds, longer than the more common heuristic of 30 seconds. Using text feedback from users, we discuss why user feedback may differ from editorial judgments.

【Keywords】: explicit user feedback; ir evaluation; web search

119. Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles.

Paper Link】 【Pages】:833-836

【Authors】: Claudio Lucchese ; Franco Maria Nardini ; Salvatore Orlando ; Raffaele Perego ; Nicola Tonellotto ; Rossano Venturini

【Abstract】: Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently deemed one of the best solutions to effectively rank query results to be returned by large scale Information Retrieval systems. This paper investigates the opportunities that the SIMD capabilities of modern CPUs offer for efficiently evaluating regression tree ensembles. We propose V-QuickScorer (vQS), which exploits SIMD extensions to vectorize the document scoring, i.e., to perform the ensemble traversal by evaluating multiple documents simultaneously. We provide a comprehensive evaluation of vQS against the state of the art on three publicly available datasets. Experiments show that vQS provides speed-ups up to a factor of 3.2x.

【Keywords】: document scoring; ensemble methods; learning to rank

120. Exploiting Semantic Coherence Features for Information Retrieval.

Paper Link】 【Pages】:837-840

【Authors】: Xinhui Tu ; Jimmy Xiangji Huang ; Jing Luo ; Tingting He

【Abstract】: Most of the existing information retrieval models assume that the terms of a text document are independent of each other. These retrieval models integrate three major variables to determine the degree of importance of a term for a document: within document term frequency, document length and the specificity of the term in the collection. Intuitively, the importance of a term for a document is not only dependent on the three aspects mentioned above, but also dependent on the degree of semantic coherence between the term and the document. In this paper, we propose a heuristic approach, in which the degree of semantic coherence of the query terms with a document is adopted to improve the information retrieval performance. Experimental results on standard TREC collections show the proposed models consistently outperform the state-of-the-art models.

【Keywords】: document ranking; retrieval model; term weighting

Paper Link】 【Pages】:841-844

【Authors】: Matthew Mitsui ; Chirag Shah ; Nicholas J. Belkin

【Abstract】: We present a method for extracting the self-reported intentions of users engaged in an information seeking episode. We recruited participants to conduct search sessions and subsequently asked them to self-report their intentions. A total of 27 users participated in a lab study, during which they worked on two search tasks. After each search session, participants indicated their intentions during that session while viewing a video replay. Results indicate that the set of search intentions provided to participants was sufficient to account for intentions in four journalism-related information seeking tasks: a copy editing task, interview preparation task, relationships task, and story pitch task. The results also suggest regular patterns in intentions that can be exploited for identification of task type as well as potential applications to personalization and recommendation during a search episode.

【Keywords】: information seeking episode; information seeking intentions; motivating task; search intentions; search session analysis

122. First Story Detection using Multiple Nearest Neighbors.

Paper Link】 【Pages】:845-848

【Authors】: Jeroen B. P. Vuurens ; Arjen P. de Vries

【Abstract】: First Story Detection (FSD) systems aim to identify those news articles that discuss an event that was not reported before. Recent work on FSD has focussed almost exclusively on efficiently detecting documents that are dissimilar from their nearest neighbor. We propose a novel FSD approach that is more effective, by adapting a recently proposed method for news summarization based on 3-nearest neighbor clustering. We show that this approach is more effective than a baseline that uses dissimilarity of an individual document from its nearest neighbor.

【Keywords】: first story detection

123. Health Monitoring on Social Media over Time.

Paper Link】 【Pages】:849-852

【Authors】: Sumit Sidana ; Shashwat Mishra ; Sihem Amer-Yahia ; Marianne Clausel ; Massih-Reza Amini

【Abstract】: Social media has become a major source for analyzing all aspects of daily life. Thanks to dedicated latent topic analysis methods such as the Ailment Topic Aspect Model (ATAM), public health can now be observed on Twitter. In this work, we are interested in monitoring people's health over time. Recently, Temporal-LDA (TM-LDA) was proposed for efficiently modeling general-purpose topic transitions over time. In this paper, we propose Temporal Ailment Topic Aspect (TM-ATAM), a new latent model dedicated to capturing transitions that involve health-related topics. TM-ATAM learns topic transition parameters by minimizing the prediction error on topic distributions between consecutive posts at different time and geographic granularities. Our experiments on an 8-month corpus of tweets show that it largely outperforms its predecessors.

【Keywords】: ailments; public health; social media; topic models

124. How Informative is a Term?: Dispersion as a measure of Term Specificity.

Paper Link】 【Pages】:853-856

【Authors】: Rodney McDonell ; Justin Zobel ; Bodo Billerbeck

【Abstract】: Similarity functions assign scores to documents in response to queries. These functions require as input statistics about the terms in the queries and documents, where the intention is that the statistics are estimates of the relative informativeness of the terms. Common measures of informativeness use the number of documents containing each term (the document frequency) as a key measure. We argue in this paper that the distribution of within-document frequencies across a collection is also pertinent to informativeness, a measure that has not been considered in prior work: the most informative words tend to be those whose frequency of occurrence has high variance. We propose use of relative standard deviation (RSD) as a measure of variability incorporating within-document frequencies, and show that RSD compares favourably with inverse document frequency (IDF), in both in-principle analysis and in practice in retrieval, with small but consistent gains.

【Keywords】: dispersion; relative standard deviation; rsd; term specificity
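
The relative standard deviation (RSD) named in the abstract above is the standard deviation divided by the mean. A minimal sketch follows, assuming the statistic is taken over a term's within-document counts across every document in the collection, including documents where the count is zero. That choice, the function name, and the toy documents are assumptions for illustration; the paper may define the population differently.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_standard_deviation(term, docs):
    """RSD = population std / mean of the term's per-document counts."""
    counts = [Counter(doc)[term] for doc in docs]
    mu = mean(counts)
    if mu == 0:
        # The term never occurs, so no dispersion can be measured.
        return 0.0
    return pstdev(counts) / mu


# Toy collection: each document is a list of tokens.
docs = [
    ["the", "cat", "sat", "the"],
    ["the", "dog", "the"],
    ["the", "cat", "cat", "cat", "the"],
]
rsd_the = relative_standard_deviation("the", docs)  # same count in every doc
rsd_cat = relative_standard_deviation("cat", docs)  # bursty across docs
```

Here "the" appears exactly twice in every document, so its counts have no variance, while "cat" is concentrated in one document and receives a higher RSD, in line with the abstract's claim that informative words tend to have high-variance frequencies.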

125. Identifying Careless Workers in Crowdsourcing Platforms: A Game Theory Approach.

Paper Link】 【Pages】:857-860

【Authors】: Yashar Moshfeghi ; Alvaro Francisco Huertas-Rosero ; Joemon M. Jose

【Abstract】: In this paper we introduce a game scenario for crowdsourcing (CS) using incentives as a bait for careless (gambler) workers, who respond to them in a characteristic way. We hypothesise that careless workers are risk-inclined and can be detected in the game scenario by their use of time, and test this hypothesis in two steps: first, we formulate and prove a theorem stating that a risk-inclined worker will react to competition with shorter Task Completion Time (TCT) than a risk-neutral or risk-averse worker. Second, we check if the game scenario introduces a link between TCT and performance, by performing a crowdsourced evaluation using 35 topics from the TREC-8 collection. Experimental evidence confirms our hypothesis, showing that TCT can be used as a powerful discrimination factor to detect careless workers. This is a valuable result in the quest for quality assurance in CS-based micro tasks such as relevance assessment.

【Keywords】: chicken game; crowdsourcing; game theory; relevance assessment

126. Impact of Review-Set Selection on Human Assessment for Text Classification.

Paper Link】 【Pages】:861-864

【Authors】: Adam Roegiest ; Gordon V. Cormack

【Abstract】: In a laboratory study, human assessors were significantly more likely to judge the same documents as relevant when they were presented for assessment within the context of documents selected using random or uncertainty sampling, as compared to relevance sampling. The effect is substantial and significant [0.54 vs. 0.42, p<0.0002] across a population of documents including both relevant and non-relevant documents, for several definitions of ground truth. This result is in accord with Smucker and Jethani's SIGIR 2010 finding that documents were more likely to be judged relevant when assessed within low-precision versus high-precision ranked lists. Our study supports the notion that relevance is malleable, and that one should take care in assuming any labeling to be ground truth, whether for training, tuning, or evaluating text classifiers.

【Keywords】: assessor error; ediscovery; electronic discovery; evaluation; recall; supervised learning; user study

127. Improving Automated Controversy Detection on the Web.

Paper Link】 【Pages】:865-868

【Authors】: Myungha Jang ; James Allan

【Abstract】: Automatically detecting controversy on the Web is a useful capability for a search engine to help users review web content with a more balanced and critical view. The current state-of-the-art approach is to find K-Nearest-Neighbors in Wikipedia to the document query, and to aggregate their controversy scores that are automatically computed from the Wikipedia edit-history features. In this paper, we discover two major weaknesses in the prior work and propose modifications. First, the single query generated from a document to find KNN Wikipages easily becomes ambiguous. Thus, we propose to generate multiple queries from smaller but more topically coherent paragraphs of the document. Second, the automatically computed controversy scores of Wikipedia articles that depend on "edit war" features have a drawback: without an edit history, there can be no edit wars. To infer more reliable controversy scores for articles with little edit history, we smooth the original score with the scores of neighbors that have a more established edit history. We show that the modified framework improves binary controversy classification by up to 5% on a publicly available dataset.

【Keywords】: binary classification; controversy detection; query generation

128. Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval.

Paper Link】 【Pages】:869-872

【Authors】: Qingyao Ai ; Liu Yang ; Jiafeng Guo ; W. Bruce Croft

【Abstract】: Incorporating topic level estimation into language models has been shown to be beneficial for information retrieval (IR) models such as cluster-based retrieval and LDA-based document representation. Neural embedding models, such as paragraph vector (PV) models, on the other hand have shown their effectiveness and efficiency in learning semantic representations of documents and words in multiple Natural Language Processing (NLP) tasks. However, their effectiveness in information retrieval is mostly unknown. In this paper, we study how to effectively use the PV model to improve ad-hoc retrieval. We propose three major improvements over the original PV model to adapt it for the IR scenario: (1) we use a document frequency-based rather than the corpus frequency-based negative sampling strategy so that the importance of frequent words will not be suppressed excessively; (2) we introduce regularization over the document representation to prevent the model overfitting short documents along with the learning iterations; and (3) we employ a joint learning objective which considers both the document-word and word-context associations to produce better word probability estimation. By incorporating this enhanced PV model into the language modeling framework, we show that it can significantly outperform the state-of-the-art topic enhanced language models.

【Keywords】: language model; paragraph vector; retrieval model

129. Improving Retrieval Quality Using Pseudo Relevance Feedback in Content-Based Image Retrieval.

Paper Link】 【Pages】:873-876

【Authors】: Dinesha Chathurani Nanayakkara Wasam Uluwitige ; Timothy Chappell ; Shlomo Geva ; Vinod Chandran

【Abstract】: The increased availability of image capturing devices has enabled collections of digital images to rapidly expand in both size and diversity. This has created a constantly growing need for efficient and effective image browsing, searching, and retrieval tools. Pseudo-relevance feedback (PRF) has proven to be an effective mechanism for improving retrieval accuracy. An original, simple yet effective rank-based PRF mechanism (RB-PRF) that takes into account the initial rank order of each image to improve retrieval accuracy is proposed. This RB-PRF mechanism innovates by making use of binary image signatures to improve retrieval precision by promoting images similar to highly ranked images and demoting images similar to lower ranked images. Empirical evaluations based on standard benchmarks, namely Wang, Oliva & Torralba, and Corel datasets demonstrate the effectiveness of the proposed RB-PRF mechanism in image retrieval.

【Keywords】: cbir; content based image retrieval; image signature; pseudo relevance feedback

130. Ingrams: A Neuropsychological Explanation For Why People Search.

Paper Link】 【Pages】:877-880

【Authors】: Peter Bailey ; Nick Craswell

【Abstract】: Why do people start a search? Why do they stop? Why do they do what they do in-between? Our goal in this paper is to provide a simple yet general explanation for these acts that has its basis in neuropsychology and observed user behavior. We coin the term "ingram", as an information counterpart to Richard Semon's "engram" or "memory trace". People search to create ingrams. People stop searching because they have created sufficient ingrams, or given up. We describe these acts through a pair of user models and use it to explain various user behaviors in search activity. Understanding people's search acts in terms of ingrams may help us predict or model the interaction of people's information needs, the queries they issue, and the information they consume. If we could observe certain decision-making acts within these activities, we might also gain new insight into the relationships between textual information and knowledge representation.

【Keywords】: ingram; mingram; preingram; query intent; search satisfaction.

131. Investment Recommendation using Investor Opinions in Social Media.

Paper Link】 【Pages】:881-884

【Authors】: Wenting Tu ; David Wai-Lok Cheung ; Nikos Mamoulis ; Min Yang ; Ziyu Lu

【Abstract】: Investor social media, such as StockTwits, are gaining increasing popularity. These sites allow users to post their investing opinions and suggestions in the form of microblogs. Given the growth of the posted data, a significant and challenging research problem is how to utilize the personal wisdom and different viewpoints in these opinions to help investment. Previous work aggregates sentiments related to stocks and generates buy or hold recommendations for stocks obtaining favorable votes while suggesting sell or short actions for stocks with negative votes. However, considering the fact that there always exist unreasonable or misleading posts, sentiment aggregation should be improved to be robust to noise. In this paper, we improve investment recommendation by modeling and using the quality of each investment opinion. To model the quality of an opinion, we use multiple categories of features generated from the author information, opinion content and the characteristics of stocks to which the opinion refers. Then, we discuss how to perform investment recommendation (including opinion recommendation and portfolio recommendation) with predicted qualities of investor opinions. Experimental results on real datasets demonstrate the effectiveness of our work in recommending high-quality opinions and generating profitable investment decisions.

【Keywords】: data mining; recommendation systems; stock prediction

132. "Is Sven Seven?": A Search Intent Module for Children.

Paper Link】 【Pages】:885-888

【Authors】: Nevena Dragovic ; Ion Madrazo Azpiazu ; Maria Soledad Pera

【Abstract】: The Internet is the biggest data-sharing platform, comprised of an immeasurable quantity of resources covering diverse topics appealing to users of all ages. Children shape tomorrow's society, so it is essential that this audience becomes agile with searching information. Although young users prefer well-known search engines, their lack of skill in formulating adequate queries and the fact that search tools were not designed explicitly with children in mind, can result in poor outcomes. The reasons for this include children's limited vocabulary, which makes it challenging to articulate information needs using short queries, or their tendency to create queries that are too long, which translates to few or irrelevant retrieved results. To enhance web search environments in response to children's behaviors and expectations, in this paper we discuss an initial effort to verify well-known issues, and identify yet to be explored ones, that affect children in formulating (natural language or keyword) queries. We also present a novel search intent module developed in response to these issues, which can seamlessly be integrated with existing search engines favored by children. The proposed module interprets a child's query and creates a shorter and more concise query to submit to a search engine, which can lead to a more successful search session. Initial experiments conducted using a sample of children queries validate the correctness of the proposed search intent module.

【Keywords】: children; query intent; search engines

133. Is This Your Final Answer?: Evaluating the Effect of Answers on Good Abandonment in Mobile Search.

Paper Link】 【Pages】:889-892

【Authors】: Kyle Williams ; Julia Kiseleva ; Aidan C. Crook ; Imed Zitouni ; Ahmed Hassan Awadallah ; Madian Khabsa

【Abstract】: Answers on mobile search result pages have become a common way to attempt to satisfy users without them needing to click on search results. Many different types of answers exist, such as weather, flight and currency answers. Understanding the effect that these different answer types have on mobile user behavior and how they contribute to satisfaction is important for search engine evaluation. We study these two aspects by analyzing the logs of a commercial search engine and through a user study. Our results show that user click, abandonment and engagement behavior differs depending on the answer types present on a page. Furthermore, we find that satisfaction rates differ in the presence of different answer types with simple answer types, such as time zone answers, leading to more satisfaction than more complex answers, such as news answers. Our findings have implications for the study and application of user satisfaction for search systems.

【Keywords】: good abandonment; mobile answers; satisfaction

134. Jointly Modeling Review Content and Aspect Ratings for Review Rating Prediction.

Paper Link】 【Pages】:893-896

【Authors】: Zhipeng Jin ; Qiudan Li ; Daniel Dajun Zeng ; YongCheng Zhan ; Ruoran Liu ; Lei Wang ; Hongyuan Ma

【Abstract】: Review rating prediction is of much importance for sentiment analysis and business intelligence. Existing methods work well when aspect-opinion pairs can be accurately extracted from review texts and aspect ratings are complete. The challenges of improving prediction accuracy are how to capture the semantics of review content and how to fill in the missing values of aspect ratings. In this paper, we propose a novel review rating prediction method, which improves the prediction accuracy by capturing deep semantics of review content and alleviating the missing-data problem of aspect ratings. The method first learns the latent vector representation of review content using skip-thought vectors, a state-of-the-art deep learning method. Then, the missing values of aspect ratings are filled in based on users' historical reviewing behaviors. Finally, a novel optimization framework is proposed to predict the review rating. Experimental results on two real-world datasets demonstrate the efficacy of the proposed method.

【Keywords】: aspect rating; data missing; review rating prediction; skip-thought vectors

135. Learning to Project and Binarise for Hashing Based Approximate Nearest Neighbour Search.

Paper Link】 【Pages】:897-900

【Authors】: Sean Moran

【Abstract】: In this paper we focus on improving the effectiveness of hashing-based approximate nearest neighbour search. Generating similarity preserving hashcodes for images has been shown to be an effective and efficient method for searching through large datasets. Hashcode generation generally involves two steps: bucketing the input feature space with a set of hyperplanes, followed by quantising the projection of the data-points onto the normal vectors to those hyperplanes. As a result, the makeup of the hashcodes depends on the positions of the data-points with respect to the hyperplanes in the feature space, allowing a degree of locality to be encoded into the hashcodes. In this paper we study the effect of learning both the hyperplanes and the thresholds as part of the same model. Most previous research either learns the hyperplanes assuming a fixed set of thresholds, or vice-versa. In our experiments over two standard image datasets we find statistically significant increases in retrieval effectiveness versus a host of state-of-the-art data-dependent and independent hashing models.

【Keywords】: image retrieval; locality sensitive hashing; nearest neighbour search
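
The two-step hashcode generation described in the abstract above (project onto hyperplane normals, then quantise against thresholds) can be sketched as follows. This is only an illustration of the generic pipeline: the hyperplanes here are random and the thresholds are supplied by hand, whereas the paper learns both jointly. All function names are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw one random Gaussian normal vector per hash bit (not learned)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hashcode(x, hyperplanes, thresholds):
    """Project x onto each normal vector, then binarise at the threshold."""
    bits = []
    for w, t in zip(hyperplanes, thresholds):
        projection = sum(wi * xi for wi, xi in zip(w, x))
        bits.append(1 if projection > t else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashcodes of equal length."""
    return sum(ai != bi for ai, bi in zip(a, b))


# Axis-aligned hyperplanes make the effect of the thresholds easy to follow.
planes = [[1.0, 0.0], [0.0, 1.0]]
point = [2.0, -1.0]
code_zero = hashcode(point, planes, [0.0, 0.0])      # thresholds at the origin
code_shifted = hashcode(point, planes, [3.0, -2.0])  # shifted thresholds
distance = hamming(code_zero, code_shifted)
```

Moving the thresholds flips both bits for the same point and the same hyperplanes, which is why the choice of thresholds interacts with the choice of hyperplanes.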

136. Linking Organizational Social Network Profiles.

Paper Link】 【Pages】:901-904

【Authors】: Jerome Cheng ; Kazunari Sugiyama ; Min-Yen Kan

【Abstract】: Many organizations possess social media accounts on different social networks, but these profiles are not always linked. End applications, users, as well as the organizations themselves, can benefit when the profiles are appropriately identified and linked. Most existing works on social network entity linking focus on linking individuals, and do not model features specific to organizational linking. We address this gap not only to link official social media accounts but also to identify and link associated affiliate accounts -- such as geographical divisions and brands -- which are important to distinguish. We instantiate our method for classifying profiles on social network services for Twitter and Facebook, which major organizations use. We classify profiles as to whether they belong to an organization or its affiliates. Our best classifier achieves an accuracy of 0.976 on average in both datasets, significantly improving on baselines that exploit the features used in state-of-the-art comparable user linkage strategies.

【Keywords】: organization entity profiling; organizational social profiles; record linkage; social networks

137. Load-Balancing in Distributed Selective Search.

Paper Link】 【Pages】:905-908

【Authors】: Yubin Kim ; Jamie Callan ; J. Shane Culpepper ; Alistair Moffat

【Abstract】: Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. Here we extend the study of selective search using a fine-grained simulation investigating: selective search efficiency in a parallel query processing environment; the difference in efficiency when term-based and sample-based resource selection algorithms are used; and the effect of two policies for assigning index shards to machines. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search. In particular, we show that selective search is capable of both higher throughput and lower latency in a parallel environment than is exhaustive search.

【Keywords】: distributed search; efficiency; load balancing; selective search

138. Multi-Rate Deep Learning for Temporal Recommendation.

Paper Link】 【Pages】:909-912

【Authors】: Yang Song ; Ali Mamdouh Elkahky ; Xiaodong He

【Abstract】: Modeling temporal behavior in recommendation systems is an important and challenging problem. Its challenges come from the fact that temporal modeling increases the cost of parameter estimation and inference, while requiring a large amount of data to reliably learn the model with the additional time dimensions. Therefore, it is often difficult to model temporal behavior in large-scale real-world recommendation systems. In this work, we propose a novel deep neural network based architecture that models the combination of long-term static and short-term temporal user preferences to improve the recommendation performance. To train the model efficiently for large-scale applications, we propose a novel pre-training method to reduce the number of free parameters significantly. The resulting model is applied to a real-world data set from a commercial News recommendation system. We compare to a set of established baselines and the experimental results show that our method outperforms the state-of-the-art significantly.

【Keywords】: deep neural networks; long short-term memory network; recommender systems; temporal recommender systems

139. Network-Aware Recommendations of Novel Tweets.

Paper Link】 【Pages】:913-916

【Authors】: Noor Aldeen Alawad ; Aris Anagnostopoulos ; Stefano Leonardi ; Ida Mele ; Fabrizio Silvestri

【Abstract】: With the rapid proliferation of microblogging services such as Twitter, a large number of tweets are published every day, often making users feel overwhelmed with information. Helping these users to discover potentially interesting tweets is an important task for such services. In this paper, we present a novel tweet-recommendation approach, which exploits network, content, and retweet analyses for making recommendations of tweets. The idea is to recommend tweets that are not visible to the user (i.e., they do not appear in the user timeline) because nobody in her social circles published or retweeted them. To do that, we create the user's ego-network up to depth two and apply the transitivity property of the friends-of-friends relationship to determine interesting recommendations, which are then ranked to best match the user's interests. Experimental results demonstrate that our approach improves on the state-of-the-art technique.

【Keywords】: content and network analysis; tweet recommendation

Paper Link】 【Pages】:917-920

【Authors】: Qing Zhang ; Houfeng Wang

【Abstract】: With a large amount of complex network data available, most existing recommendation models consider exploiting rich user social relations for better interest targeting. In these approaches, the underlying assumption is that similar users in social networks would prefer similar items. However, in practical scenarios, social links may not be formed by common interest. For example, one general-purpose social network might be used for various specific recommendation scenarios. Noisy social relations that lack interest relevance then hurt performance. Moreover, the sparsity of social networks makes the task more challenging, since both problems must be solved simultaneously to incorporate social information effectively. To address this challenge, we propose an adaptive embedding approach that solves both problems jointly for better recommendation in real-world settings. Experiments conducted on real-world datasets show that our approach outperforms current methods.

【Keywords】: collaborative filtering; embedding; recommendation; social personalized ranking

141. On a Topic Model for Sentences.

Paper Link】 【Pages】:921-924

【Authors】: Georgios Balikas ; Massih-Reza Amini ; Marianne Clausel

【Abstract】: Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text spans such as sentences, contains much information which is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA whose goal is to overcome this limitation by incorporating the structure of the text in the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA using both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks on different text collections.

【Keywords】: bayesian learning; representation learning; text classification; text mining; topic modeling; unsupervised learning

Paper Link】 【Pages】:925-928

【Authors】: Vitor Mangaravite ; Rodrygo L. T. Santos

【Abstract】: State-of-the-art expert search approaches rely on document-person associations to infer the expertise of a candidate person for a given query. Such associations have traditionally been modeled as boolean variables, indicating whether or not a candidate authored a document, and further normalized to penalize prolific authorships. In this paper, we address expert search in academia, where the authorship of a document can be determined with reasonable certainty. In contrast to traditional approaches, we propose to model associations as non-boolean variables, reflecting the probability that a document is informative of the expertise of a candidate. Moreover, we introduce an alternative normalization scheme that measures how discriminative a particular document-person association is in light of all associations involving either the document or the person. Through a large-scale user study with academic experts from several areas of knowledge, we demonstrate the suitability of the proposed association and normalization schemes to improve the effectiveness of a state-of-the-art expert search approach.

【Keywords】: academic search; expertise retrieval

Paper Link】 【Pages】:929-932

【Authors】: Helge Holzmann ; Wolfgang Nejdl ; Avishek Anand

【Abstract】: Web archives are large longitudinal collections that store webpages from the past, which might be missing on the current live Web. Consequently, temporal search over such collections is essential for finding prominent missing webpages and tasks like historical analysis. However, this has been challenging due to the lack of popularity information and proper ground truth to evaluate temporal retrieval models. In this paper we investigate the applicability of external longitudinal resources to identify important and popular websites in the past and analyze the social bookmarking service Delicious for this purpose. The timestamped bookmarks on Delicious provide explicit cues about popular time periods in the past along with relevant descriptors. These are valuable to identify important documents in the past for a given temporal query. Focusing purely on recall, we analyzed more than 12,000 queries and find that using Delicious yields average recall values from 46% up to 100%, when limiting ourselves to the best represented queries in the considered dataset. This constitutes an attractive and low-overhead approach for quick access into Web archives by not dealing with the actual contents.

【Keywords】: analysis; temporal search; web archives

144. On the Effectiveness of Contextualisation Techniques in Spoken Query Spoken Content Retrieval.

Paper Link】 【Pages】:933-936

【Authors】: David Nicolas Racca ; Gareth J. F. Jones

【Abstract】: In passage and XML retrieval, contextualisation techniques seek to improve the rank of a relevant element by considering information from its surrounding elements and its container document. Recent research has demonstrated that some of these techniques are also particularly effective in spoken content retrieval tasks (SCR). However, no previous research has directly compared contextualisation techniques in an SCR setting, nor has it studied their potential to provide robustness to speech recognition errors. In this paper, we evaluate different contextualisation techniques, including a recently proposed technique based on positional language models (PLM) on the task of retrieving relevant spoken passages in response to a spoken query. We study the benefits of these techniques when queries and documents are transcribed with increasingly higher error rates. Experimental results over the Japanese NTCIR SpokenQuery&Doc collection show that combining global and local context is beneficial for SCR and that models usually benefit from using larger amounts of context in highly noisy conditions.

【Keywords】: contextualisation; positional models; spoken content retrieval; spoken queries

145. Ordinal Text Quantification.

Paper Link】 【Pages】:937-940

【Authors】: Giovanni Da San Martino ; Wei Gao ; Fabrizio Sebastiani

【Abstract】: In recent years there has been a growing interest in text quantification, a supervised learning task where the goal is to accurately estimate, in an unlabelled set of items, the prevalence (or "relative frequency") of each class c in a predefined set C. Text quantification has several applications, and is a dominant concern in fields such as market research, the social sciences, political science, and epidemiology. In this paper we tackle, for the first time, the problem of ordinal text quantification, defined as the task of performing text quantification when a total order is defined on the set of classes; estimating the prevalence of "five stars" reviews in a set of reviews of a given product, and monitoring this prevalence across time, is an example application. We present OQT, a novel tree-based OQ algorithm, and discuss experimental results obtained on a dataset of tweets classified according to sentiment strength.

【Keywords】: ordinal quantification; quantification; sentiment analysis
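
The quantification task defined in the abstract above (estimating the prevalence of each class in an unlabelled set) can be illustrated with the naive "classify and count" baseline. This is a minimal sketch and not the OQT algorithm from the paper; the function name, the five-star class set, and the predicted labels are all hypothetical.

```python
from collections import Counter

def classify_and_count(predicted_labels, classes):
    """Estimate class prevalence as the relative frequency of predicted labels.

    Naive baseline only; it is NOT the tree-based OQT method of the paper.
    """
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts[c] / total for c in classes}

# Hypothetical ordered classes (star ratings) and classifier outputs.
classes = [1, 2, 3, 4, 5]
predicted = [5, 5, 4, 3, 5, 1, 4, 5]
prevalence = classify_and_count(predicted, classes)
print(prevalence[5])  # estimated share of "five stars" reviews
```

Monitoring this estimate over successive batches of reviews would give the prevalence-over-time view mentioned in the abstract.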

146. Pearson Rank: A Head-Weighted Gap-Sensitive Score-Based Correlation Coefficient.

Paper Link】 【Pages】:941-944

【Authors】: Ning Gao ; Mossaab Bagdouri ; Douglas W. Oard

【Abstract】: One way of evaluating the reusability of a test collection is to determine whether removing the unique contributions of some system would alter the preference order between that system and others. Rank correlation measures such as Kendall's tau are often used for this purpose. Rank correlation measures are appropriate for ordinal measures in which only preference order is important, but many evaluation measures produce system scores in which both the preference order and the magnitude of the score difference are important. Such measures are referred to as interval. Pearson's rho offers one way in which correlation can be computed over results from an interval measure such that smaller errors in the gap size are preferred. When seeking to improve over existing systems, we care the most about comparisons among the best systems. For that purpose we prefer head-weighted measures such as tau_AP, which is designed for ordinal data. No present head-weighted measure fully leverages the information present in interval effectiveness measures. This paper introduces such a measure, referred to as Pearson Rank.

【Keywords】: correlation coefficient; evaluation metric
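
The contrast the abstract draws between ordinal and interval correlation can be seen in a small example. The sketch below computes Kendall's tau (tau-a, without tie handling) and the Pearson correlation for hypothetical system scores; it does not implement Pearson Rank itself, whose definition is not given in the abstract, and all score values are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

def pearson_r(x, y):
    """Pearson correlation coefficient, sensitive to score gaps."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of four systems on the full collection and on two
# reduced collections. Both reductions keep the system order; the second
# distorts the gaps between systems.
full = [0.50, 0.40, 0.30, 0.20]
reduced_a = [0.49, 0.41, 0.29, 0.21]
reduced_b = [0.50, 0.21, 0.20, 0.19]

print(kendall_tau(full, reduced_a), kendall_tau(full, reduced_b))
print(pearson_r(full, reduced_a), pearson_r(full, reduced_b))
```

Kendall's tau cannot distinguish the two reduced collections because the order is preserved in both, while the Pearson correlation is lower for the one with distorted gaps.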

147. Polarized User and Topic Tracking in Twitter.

Paper Link】 【Pages】:945-948

【Authors】: Mauro Coletto ; Claudio Lucchese ; Salvatore Orlando ; Raffaele Perego

【Abstract】: Digital traces of conversations in micro-blogging platforms and OSNs provide information about user opinion with a high degree of resolution. These information sources can be exploited to understand and monitor collective behaviours. In this work, we focus on polarisation classes, i.e., those topics that require the user to side exclusively with one position. The proposed method provides an iterative classification of users and keywords: first, polarised users are identified, then polarised keywords are discovered by monitoring the activities of previously classified users. This method thus allows tracking users and topics over time. We report several experiments conducted on two Twitter datasets during political election time-frames. We measure the user classification accuracy on a golden set of users, and analyse the relevance of the extracted keywords for the ongoing political discussion.

【Keywords】: algorithm; classification; controversy; hashtags; polarization; polarized user; social networks; topic tracking; twitter; user

148. Post-Learning Optimization of Tree Ensembles for Efficient Ranking.

Paper Link】 【Pages】:949-952

【Authors】: Claudio Lucchese ; Franco Maria Nardini ; Salvatore Orlando ; Raffaele Perego ; Fabrizio Silvestri ; Salvatore Trani

【Abstract】: Learning to Rank (LtR) is the machine learning method of choice for producing high quality document ranking functions from a ground-truth of training examples. In practice, efficiency and effectiveness are intertwined concepts and trading off effectiveness for meeting efficiency constraints typically existing in large-scale systems is one of the most urgent issues. In this paper we propose a new framework, named CLEaVER, for optimizing machine-learned ranking models based on ensembles of regression trees. The goal is to improve efficiency at document scoring time without affecting quality. Since the cost of an ensemble is linear in its size, CLEaVER first removes a subset of the trees in the ensemble, and then fine-tunes the weights of the remaining trees according to any given quality measure. Experiments conducted on two publicly available LtR datasets show that CLEaVER is able to prune up to 80% of the trees and provides an efficiency speed-up up to 2.6x without affecting the effectiveness of the model.

【Keywords】: efficiency; learning to rank; pruning

149. Quit While Ahead: Evaluating Truncated Rankings.

Paper Link】 【Pages】:953-956

【Authors】: Fei Liu ; Alistair Moffat ; Timothy Baldwin ; Xiuzhen Zhang

【Abstract】: Many types of search tasks are answered through the computation of a ranked list of suggested answers. We re-examine the usual assumption that answer lists should be as long as possible, and suggest that when the number of matching items is potentially small -- perhaps even zero -- it may be more helpful to "quit while ahead", that is, to truncate the answer ranking earlier rather than later. To capture this effect, metrics are required which are attuned to the length of the ranking, and can handle cases in which there are no relevant documents. In this work we explore a generalized approach for representing truncated result sets, and propose modifications to a number of popular evaluation metrics.

【Keywords】: information retrieval evaluation; question answering; truncated ranking

150. Quote Recommendation in Dialogue using Deep Neural Network.

Paper Link】 【Pages】:957-960

【Authors】: Hanbit Lee ; Yeonchan Ahn ; Haejun Lee ; Seungdo Ha ; Sang-goo Lee

【Abstract】: Quotes, or quotations, are well known phrases or sentences that we use for various purposes such as emphasis, elaboration, and humor. In this paper, we introduce the task of recommending quotes that are suitable for a given dialogue context, and we present a deep learning recommender system which combines a recurrent neural network and a convolutional neural network in order to learn a semantic representation of each utterance and construct a sequence model for the dialogue thread. We collected a large set of Twitter dialogues with quote occurrences in order to evaluate the proposed recommender system. Experimental results show that our approach outperforms not only the other state-of-the-art algorithms in the quote recommendation task, but also other neural network based methods built for similar tasks.

【Keywords】: deep neural network; dialogue model; quote recommendation

151. Ranking Documents Through Stochastic Sampling on Bayesian Network-based Models: A Pilot Study.

Paper Link】 【Pages】:961-964

【Authors】: Xing Tan ; Jimmy Xiangji Huang ; Aijun An

【Abstract】: Using approximate inference techniques, we investigate in this paper the applicability of Bayesian Networks to the problem of ranking a large set of documents. The topology of the network is bipartite. Network parameters (conditional probability distributions) are determined through an adoption of the tf-idf weighting scheme. The rank of a document with respect to a given query is defined as the corresponding posterior probability, which is estimated by performing Rejection Sampling. Experimental results suggest that the performance of the model is at least comparable to baselines such as BM25. The framework of this model potentially offers novel ways of weighting documents. Integrating the model with other ranking algorithms, meanwhile, is expected to bring performance improvements in document ranking.

【Keywords】: bayesian networks; information retrieval; stochastic sampling

152. Ranking Health Web Pages with Relevance and Understandability.

Paper Link】 【Pages】:965-968

【Authors】: João R. M. Palotti ; Lorraine Goeuriot ; Guido Zuccon ; Allan Hanbury

【Abstract】: We propose a method that integrates relevance and understandability to rank health web documents. We use a learning to rank approach with standard retrieval features to determine topical relevance and additional features based on readability measures and medical lexical aspects to determine understandability. Our experiments measured the effectiveness of the learning to rank approach integrating understandability on a consumer health benchmark. The findings suggest that this approach promotes documents that are at the same time topically relevant and understandable.

【Keywords】: evaluation; health search; learning to rank; readability; understandability

Paper Link】 【Pages】:969-972

【Authors】: Yinglong Zhang ; Jacek Gwizdka

【Abstract】: In this paper, we present a cognitive-economic approach to examining the cost in information search. Unlike previous studies on economic models, we calculated the cost in information search based on participants' eye-tracking data as well as their behavioral data, such as query formulation, search task duration, SERP and web page visits. Using Principal Component Analysis (PCA), we explored a possible latent factor structure of variables representing the cost in information search. Our results indicated that the cost of information seeking could be associated with two distinct aspects of search, exploratory and validation processes.

【Keywords】: eye tracking; information seeking stopping behavior

154. Retrievability of Code Mixed Microblogs.

Paper Link】 【Pages】:973-976

【Authors】: Debasis Ganguly ; Ayan Bandyopadhyay ; Mandar Mitra ; Gareth J. F. Jones

【Abstract】: Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is a frequent trend among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval effectiveness due to the mixing of different vocabularies with different collection statistics within a single collection of documents. In this paper, we investigate indexing and retrieval strategies for a mixed collection of documents, comprising code-mixed and monolingual documents. In particular, we address three alternative modes of indexing, namely (a) a single index for the two sub-collections; (b) a separate index for each sub-collection; and (c) a clustered index with two individual sub-collection statistics coupled with the overall one. We make use of the expected retrievability scores of the two classes of documents to empirically show that indexing strategies (a) and (b) mostly retrieve the monolingual documents at top ranks with standard retrieval approaches. Our experiments show that, by contrast, the clustered index (c) is able to alleviate this problem by improving the retrievability of the code-mixed documents.

【Keywords】: code mixing; fusion; microblog retrieval; retrievability

155. Retweeting Behavior Prediction Based on One-Class Collaborative Filtering in Social Networks.

Paper Link】 【Pages】:977-980

【Authors】: Bo Jiang ; Jiguang Liang ; Ying Sha ; Rui Li ; Wei Liu ; Hongyuan Ma ; Lihong Wang

【Abstract】: Social behaviors such as retweets, comments or likes are valuable information for the analysis of human activities. We focus here on users' retweeting behavior, which has been considered a key mechanism of information diffusion in social networks. Since we can only observe which messages a user retweets, this is a typical one-class setting in which only positive examples or implicit feedback can be observed. However, few research works on retweeting prediction consider the one-class setting. In this paper, we analyze and study the fundamental factors that might affect the retweetability of a tweet, and then employ a one-class collaborative filtering method that quantitatively measures the user's personal preference and the social influence between users and messages to predict the user's retweeting behavior. Experimental results on a real-world dataset from a social network show that the proposed method is effective and can improve the performance of one-class collaborative filtering over baseline methods by leveraging weighted negative examples.

【Keywords】: one-class collaborative filtering; retweeting prediction; social influence; user preference; weighted negative

156. Sampling Strategies and Active Learning for Volume Estimation.

Paper Link】 【Pages】:981-984

【Authors】: Haotian Zhang ; Jimmy J. Lin ; Gordon V. Cormack ; Mark D. Smucker

【Abstract】: This paper tackles the challenge of accurately and efficiently estimating the number of relevant documents in a collection for a particular topic. One real-world application is estimating the volume of social media posts (e.g., tweets) pertaining to a topic, which is fundamental to tracking the popularity of politicians and brands, the potential sales of a product, etc. Our insight is to leverage active learning techniques to find all the "easy" documents, and then to use sampling techniques to infer the number of relevant documents in the residual collection. We propose a simple yet effective technique for determining this "switchover" point, which intuitively can be understood as the "knee" in an effort vs. recall gain curve, as well as alternative sampling strategies beyond the knee. We show on several TREC datasets and a collection of tweets that our best technique yields more accurate estimates (with the same effort) than several alternatives.

【Keywords】: high-recall retrieval; twitter
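
The two-stage idea in the abstract above (count the documents found by active learning, then sample the residual collection to infer the rest) can be sketched as follows. This is an illustration under assumptions, not the authors' technique: the switchover ("knee") detection is omitted, and the function name, its parameters, and the toy relevance rule are hypothetical.

```python
import random

def estimate_volume(found_relevant, residual, is_relevant, sample_size, seed=0):
    """Estimate the total number of relevant documents.

    found_relevant: relevant documents already found before the switchover.
    residual: documents not yet reviewed.
    is_relevant: hypothetical judging function applied to sampled documents.
    """
    rng = random.Random(seed)
    sample = rng.sample(residual, min(sample_size, len(residual)))
    if not sample:
        return float(found_relevant)
    # Scale the relevant fraction in the sample up to the whole residual.
    fraction = sum(1 for doc in sample if is_relevant(doc)) / len(sample)
    return found_relevant + fraction * len(residual)

# Toy data: 120 relevant documents already found; in the residual
# collection of 1000 documents, every tenth one is relevant.
residual = list(range(1000))
estimate = estimate_volume(120, residual, lambda d: d % 10 == 0, sample_size=200)
print(estimate)
```

A larger sample costs more judging effort but reduces the variance of the estimate for the residual collection.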

Paper Link】 【Pages】:985-988

【Authors】: François Mairesse ; Paul Raccuglia ; Shiv Vitaladevuni

【Abstract】: Voice search applications are typically evaluated by comparing the predicted query to a reference human transcript, regardless of the search results returned by the query. While we find that an exact transcript match is highly indicative of user satisfaction, a transcript which does not match the reference still produces satisfactory search results a significant fraction of the time. This paper therefore proposes an evaluation method that compares the search results of the speech recognition hypotheses with the search results produced by a human transcript. Compared with a strict sentence match, a human evaluation shows that search result overlap is a better predictor of (a) user satisfaction and (b) search result click-through. Finally, we propose a model predicting the Expected Search Satisfaction Rate (ESSR), conditioned on search overlap outcomes. On a held out set of 1036 voice search queries, our model predicted an ESSR within 0.9% (relative) of the ground truth satisfaction averaged over 3 human judges.

【Keywords】: automatic speech recognition; entity resolution; evaluation; search engines; voice search

158. Seeking Serendipity: A Living Lab Approach to Understanding Creative Retrieval in Broadcast Media Production.

Paper Link】 【Pages】:989-992

【Authors】: Sabrina Sauer ; Maarten de Rijke

【Abstract】: This paper presents a method to map user needs and integrate serendipitous search behaviors in search algorithm development: the living lab approach. This user-centered design approach involves technology users during technology development to catch unexpected insights and successfully innovate. This paper focuses on the preliminary findings of a living lab case study to answer the question how this methodology reveals fine-grained information about users' serendipitous search behaviors. The case study involves a specific user group, media professionals who work in broadcast television and use audiovisual archives to create audiovisual content, during the development of new search algorithms for a large audiovisual archive. Research insights are based on data gathered during one co-design workshop, and ten in-depth semi-structured interviews with media professionals. Findings stipulate that these users balance socio-technical constraints and affordances during creative retrieval to (1) find exactly what is sought; and (2) increase the possibility of serendipitous, unforeseen search results. We conclude that modeling these search processes in terms of improvising with constraints and affordances enables an effective articulation and channeling of user-technology interaction insights into new technology development. The paper suggests next steps in the living lab approach to further understand serendipitous search and creative retrieval processes.

【Keywords】: creative retrieval; living labs; serendipity

159. Selectively Personalizing Query Auto-Completion.

Paper Link】 【Pages】:993-996

【Authors】: Fei Cai ; Maarten de Rijke

【Abstract】: Query auto-completion (QAC) is being used by many of today's search engines. It helps searchers formulate queries by providing a list of query completions after entering an initial prefix of a query. To cater for a user's specific information needs, personalized QAC strategies use a searcher's search history and their profile. Is personalization consistently effective in different search contexts? We study the QAC problem by selectively personalizing the query completion list. Based on a lenient personalized QAC strategy that encodes the ranking signal as a trade-off between query popularity and search context, we propose a model for selectively personalizing query auto-completion (SP-QAC) to study this trade-off. We predict effective trade-offs based on a regression model, where the typed query prefix, clicked documents and preceding queries in the same session are used to weigh personalization in QAC. Experiments on the AOL query log show the SP-QAC model can significantly outperform a state-of-the-art personalized QAC approach.

【Keywords】: personalization; query auto completion; web search
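
The trade-off between query popularity and search context described in the abstract above can be illustrated with a simple linear interpolation. This is a sketch under assumptions: the scoring formula, the weight name `lam`, and the candidate values are hypothetical, and the regression model the paper uses to predict the weight is not reproduced here.

```python
def rank_completions(candidates, lam):
    """Rank query completions by lam * popularity + (1 - lam) * context score.

    candidates: list of (completion, popularity, context_similarity),
    with both scores assumed to be normalized to [0, 1].
    lam = 1.0 ignores the search context (no personalization).
    """
    ranked = sorted(
        candidates,
        key=lambda c: lam * c[1] + (1 - lam) * c[2],
        reverse=True,
    )
    return [completion for completion, _, _ in ranked]

# Hypothetical completions for the prefix "ja" in a session about cars.
candidates = [
    ("java", 0.9, 0.1),
    ("jaguar car", 0.5, 0.8),
    ("jazz", 0.6, 0.3),
]
print(rank_completions(candidates, lam=1.0))
print(rank_completions(candidates, lam=0.0))
```

Selective personalization, in this simplified view, amounts to choosing `lam` per prefix and session rather than fixing it globally.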

160. SG++: Word Representation with Sentiment and Negation for Twitter Sentiment Classification.

Paper Link】 【Pages】:997-1000

【Authors】: Qinmin Hu ; Yijun Pei ; Qin Chen ; Liang He

【Abstract】: Here we propose an advanced Skip-gram model that incorporates both word sentiment and negation information. In particular, there is a softmax layer for the word sentiment polarity upon the Skip-gram model. Then, two parallel embedding layers are set up in the same embedding space, one for the affirmative context and the other for the negated context, followed by their loss functions. We evaluate our proposed model on the 2013 and 2014 SemEval data sets. The experimental results show that the proposed approach achieves better performance and learns informative higher-dimensional word embeddings on large-scale data.

【Keywords】: negation; neural network; twitter sentiment classification; word representation

161. SGT Framework: Social, Geographical and Temporal Relevance for Recreational Queries in Web Search.

Paper Link】 【Pages】:1001-1004

【Authors】: Stewart Whiting ; Omar Alonso

【Abstract】: While location-based social networks (LBSNs) have become widely used for sharing and consuming location information, a large number of users turn to general web search engines for recreational activity ideas. In these cases, users typically express a query combining desired activity type, constraints and suitability, around an explicit location and time -- for example, "parks for kids in NYC in winter", or "cheap bars for bachelor party in san francisco". In this work we characterize such queries as recreational queries, and propose a relevance framework for ranking points of interest (POIs) to present in the web search recreational vertical using signals from query logs and LBSNs. The first part of this framework is a taxonomy of recreational intents, which we derive from those previously seen in query logs and other behavioral data. Based on the most popular recreational intents, we proceed to outline a new relevance model combining social, geographical and temporal information. We implement a prototype and conduct a preliminary user-study evaluation. Results show the proposed relevance model and bundles greatly improve user satisfaction for recreational queries.

【Keywords】: activity search; recreation; temporal

162. SimCC-AT: A Method to Compute Similarity of Scientific Papers with Automatic Parameter Tuning.

Paper Link】 【Pages】:1005-1008

【Authors】: Masoud Reyhani Hamedani ; Sang-Wook Kim

【Abstract】: In this paper, we propose SimCC-AT (similarity based on content and citations with automatic parameter tuning) to compute the similarity of scientific papers. As in SimCC, the state-of-the-art method, we exploit a notion of a contribution score in similarity computation. SimCC-AT utilizes an automatic weighting scheme based on SVMrank and thus requires only a smaller number of experiments for parameter tuning than SimCC. Furthermore, our experimental results with a real-world dataset show that the accuracy of SimCC-AT is dramatically higher than that of other existing methods and is comparable to that of SimCC.

【Keywords】: automatic weighting; citations; content; contribution score; similarity

163. Simple Dynamic Emission Strategies for Microblog Filtering.

Paper Link】 【Pages】:1009-1012

【Authors】: Luchen Tan ; Adam Roegiest ; Charles L. A. Clarke ; Jimmy J. Lin

【Abstract】: Push notifications from social media provide a method to keep up-to-date on topics of personal interest. To be effective, notifications must achieve a balance between pushing too much and pushing too little. Push too little and the user misses important updates; push too much and the user is overwhelmed by unwanted information. Using data from the TREC 2015 Microblog track, we explore simple dynamic emission strategies for microblog push notifications. The key to effective notifications lies in establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold setting strategies, including purely static thresholds, dynamic thresholds without user feedback, and dynamic thresholds with daily feedback. Our best technique takes advantage of daily feedback in a simple yet effective manner, achieving the best known result reported in the literature to date.

【Keywords】: relevance feedback; score thresholds; trec
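
A dynamic threshold with daily feedback, as mentioned in the abstract above, could take a form like the following. The update rule is an assumption made for illustration only; the abstract does not specify the strategies evaluated in the paper, and the function name, target precision, and step size are hypothetical.

```python
def update_threshold(threshold, pushed_feedback, target_precision=0.5, step=0.05):
    """Adjust the push threshold once per day from relevance feedback.

    pushed_feedback: list of booleans, one per notification pushed that day
    (True if the user judged it relevant). Hypothetical rule: raise the
    threshold when precision falls below the target, otherwise lower it.
    """
    if not pushed_feedback:
        # Nothing was pushed: relax the threshold to avoid pushing too little.
        return max(0.0, threshold - step)
    precision = sum(pushed_feedback) / len(pushed_feedback)
    if precision < target_precision:
        return min(1.0, threshold + step)
    return max(0.0, threshold - step)

# A tweet is pushed only if its relevance score reaches the current threshold.
threshold = 0.5
threshold = update_threshold(threshold, [False, False, True])  # mostly unwanted
print(threshold)
```

A static strategy corresponds to never calling the update, which makes the balance between pushing too much and too little depend entirely on the initial threshold.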

164. Subspace Clustering Based Tag Sharing for Inductive Tag Matrix Refinement with Complex Errors.

Paper Link】 【Pages】:1013-1016

【Authors】: Yuqing Hou ; Zhouchen Lin ; Jin-ge Yao

【Abstract】: Annotating images with tags is useful for indexing and retrieving images. However, many available annotation data include missing or inaccurate annotations. In this paper, we propose an image annotation framework which sequentially performs tag completion and refinement. We utilize the subspace property of data via sparse subspace clustering for tag completion. Then we propose a novel matrix completion model for tag refinement, integrating visual correlation, semantic correlation and the novelly studied property of complex errors. The proposed method outperforms the state-of-the-art approaches on multiple benchmark datasets even when they contain certain levels of annotation noise.

【Keywords】: complex errors; image annotation; matrix completion; subspace clustering

165. Temporal Query Intent Disambiguation using Time-Series Data.

Paper Link】 【Pages】:1017-1020

【Authors】: Yue Zhao ; Claudia Hauff

【Abstract】: Understanding temporal intents behind users' queries is essential to meet users' time-related information needs. In order to classify queries according to their temporal intent (e.g. Past or Future), we explore the usage of time-series data derived from Wikipedia page views as a feature source. While existing works leverage either proprietary search engine query logs or highly processed and aggregated data (such as Google Trends), we investigate the utility of a freely available data source for this purpose. Our experiments on the NTCIR-12 Temporalia-2 dataset show that Wikipedia pageview-based time-series data can significantly improve the disambiguation of temporal intents for specific types of queries, in particular those without temporal expressions present in the query string.

【Keywords】: disambiguation; temporal intents

166. To Blend or Not to Blend?: Perceptual Speed, Visual Memory and Aggregated Search.

Paper Link】 【Pages】:1021-1024

【Authors】: Lauren Turpin ; Diane Kelly ; Jaime Arguello

【Abstract】: While aggregated search interfaces that present vertical results to searchers are fairly common in today's search environments, little is known about how searchers' cognitive abilities impact how they use and evaluate these interfaces. This study evaluates the relationship between two cognitive abilities -- perceptual speed and visual memory -- and searchers' behaviors and interface preferences when using two aggregated search interfaces: one that blends vertical results into the search results (blended) and one that does not (non-blended). Cognitive tests were administered to sixteen participants who subsequently performed four search tasks using the two interfaces. Participants' search interactions were logged and after searching, they rated the usability, engagement and effectiveness of each interface, as well as made comparative evaluations. Results showed that participants with low perceptual speed spent significantly more time completing tasks when using the blended interface, while those with high perceptual speed spent roughly equivalent amounts of time completing tasks with the two interfaces. Those with low perceptual speed also rated both interfaces as significantly less usable along many measures, and were less satisfied with their searches. There were also main effects for interface: participants rated the non-blended interface significantly more usable than the blended interface.

【Keywords】: aggregated search interfaces; cognitive ability; search behavior

167. Topic Model based Privacy Protection in Personalized Web Search.

Paper Link】 【Pages】:1025-1028

【Authors】: Wasi Uddin Ahmad ; Md Masudur Rahman ; Hongning Wang

【Abstract】: Modern search engines utilize users' search history for personalization, which provides more effective, useful and relevant search results. However, it also carries the risk of compromising users' privacy by identifying their underlying intention from their logged search behaviors. To address this privacy issue, we propose a Topic-based Privacy Protection solution on the client side. In our solution, each user query is submitted with k additional cover queries, which act as a proxy to disguise the user's intent from a search engine. The set of cover queries is generated in a controlled way so that each query carries similar uncertainty to randomize a user's search history while still providing the necessary utility for the search engine to perform personalization. We use statistical topic models to infer topics from the original user query and generate cover queries of similar entropy but from unrelated topics. Extensive experiments are performed on the AOL search log, and the promising results demonstrate the effectiveness of our solution.

【Keywords】: personalized search; information retrieval; privacy

168. Topic Quality Metrics Based on Distributed Word Representations.

Paper Link】 【Pages】:1029-1032

【Authors】: Sergey I. Nikolenko

【Abstract】: Automated evaluation of topic quality remains an important unsolved problem in topic modeling and represents a major obstacle for development and evaluation of new topic models. Previous attempts at the problem have been formulated as variations on the coherence and/or mutual information of top words in a topic. In this work, we propose several new metrics for evaluating topic quality with the help of distributed word representations; our experiments suggest that the new metrics are a better match for human judgement, which is the gold standard in this case, than previously developed approaches.

【Keywords】: text mining; topic modeling; topic quality

169. Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance.

Paper Link】 【Pages】:1033-1036

【Authors】: Julián Urbano ; Mónica Marrero

【Abstract】: The Kendall τ and AP rank correlation coefficients have become mainstream in Information Retrieval research for comparing the rankings of systems produced by two different evaluation conditions, such as different effectiveness measures or pool depths. However, in this paper we focus on the expected rank correlation between the mean scores observed with a test collection and the true, unobservable means under the same conditions. In particular, we propose statistical estimators of τ and AP correlations following both parametric and non-parametric approaches, and with special emphasis on small topic sets. Through large scale simulation with TREC data, we study the error and bias of the estimators. In general, such estimates of expected correlation with the true ranking may accompany the results reported from an evaluation experiment, as an easy to understand figure of reliability. All the results in this paper are fully reproducible with data and code available online.

【Keywords】: average precision; correlation; estimation; evaluation; kendall; test collection
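
For readers unfamiliar with the two coefficients named in the abstract, the following sketch computes Kendall's τ (the tau-a variant, assuming no ties) and the AP rank correlation between an observed and a reference list of system scores. It illustrates the coefficients themselves, not the estimators the paper proposes. The score lists are made up for illustration.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a over the same systems scored twice (no ties assumed)."""
    n = len(scores_a)
    balance = 0  # concordant pairs minus discordant pairs
    for i, j in combinations(range(n), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        balance += (s > 0) - (s < 0)
    return balance / (n * (n - 1) / 2)


def tau_ap(observed, reference):
    """AP correlation of the observed ranking with respect to the reference.
    Swaps near the top of the observed ranking are penalised more heavily."""
    n = len(observed)
    order = sorted(range(n), key=lambda k: observed[k], reverse=True)
    total = 0.0
    for i in range(1, n):
        item = order[i]
        # Systems ranked above `item` that the reference also ranks above it.
        correct = sum(1 for k in order[:i] if reference[k] > reference[item])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


# Hypothetical mean scores for four systems.
reference = [0.30, 0.25, 0.20, 0.10]
top_swap = [0.25, 0.30, 0.20, 0.10]      # the two best systems swapped
bottom_swap = [0.30, 0.25, 0.10, 0.20]   # the two worst systems swapped
```

Both swaps change exactly one pair, so Kendall's τ treats them alike, while the AP correlation is lower for the swap at the top of the ranking.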

170. Tracking Sentiment by Time Series Analysis.

Paper Link】 【Pages】:1037-1040

【Authors】: Anastasia Giachanou ; Fabio Crestani

【Abstract】: In recent years social media have emerged as popular platforms for people to share their thoughts and opinions on all kinds of topics. Tracking opinion over time is a powerful tool that can be used for sentiment prediction or to detect the possible reasons for a sentiment change. Understanding topic and sentiment evolution allows enterprises or governments to capture negative sentiment and act promptly. In this study, we explore conventional time series analysis methods and their applicability to topic and sentiment trend analysis. We use data collected from Twitter that span nine months. Finally, we study the usability of outlier detection and different measures such as sentiment velocity and acceleration on the task of sentiment tracking.

【Keywords】: sentiment change; sentiment dynamics; time series analysis
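
One simple reading of sentiment velocity and acceleration is as the first and second differences of a sentiment time series. The sketch below uses that reading as an assumption; the paper may define the measures differently. The daily sentiment values are made up.

```python
def differences(series):
    """Difference between each value and its predecessor."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical average daily sentiment scores in [-1, 1].
daily_sentiment = [0.10, 0.15, 0.25, 0.20, -0.30, -0.25]

velocity = differences(daily_sentiment)   # change per day
acceleration = differences(velocity)      # change of the change per day

# Position of the sharpest day-to-day change, a crude outlier candidate.
sharpest = max(range(len(velocity)), key=lambda i: abs(velocity[i]))
```

Here `sharpest` points at the step from 0.20 to -0.30, the kind of abrupt shift that a sentiment tracking system would want to flag.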

171. Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder.

Paper Link】 【Pages】:1041-1044

【Authors】: Soroush Vosoughi ; Prashanth Vijayaraghavan ; Deb Roy

【Abstract】: We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.

【Keywords】: cnn; convolutional neural networks; embedding; encoder-decoder; lstm; tweet; tweet2vec; twitter

172. Two Sample T-tests for IR Evaluation: Student or Welch?

Paper Link】 【Pages】:1045-1048

【Authors】: Tetsuya Sakai

【Abstract】: There are two well-known versions of the t-test for comparing means from unpaired data: Student's t-test and Welch's t-test. While Welch's t-test does not assume homoscedasticity (i.e., equal variances), it involves approximations. A classical textbook recommendation would be to use Student's t-test if either the two sample sizes are similar or the two sample variances are similar, and to use Welch's t-test only when both of the above conditions are violated. However, a more recent recommendation seems to be to use Welch's t-test unconditionally. Using past data from both TREC and NTCIR, the present study demonstrates that the latter advice should not be followed blindly in the context of IR system evaluation. More specifically, our results suggest that if the sample sizes differ substantially and if the larger sample has a substantially larger variance, Welch's t-test may not be reliable.

【Keywords】: statistical significance; test collections; topics; variances
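
The difference between the two tests can be made concrete with a short sketch that computes the test statistic and degrees of freedom for each, using only the Python standard library. Student's test pools the two variances; Welch's test keeps them separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. The sample values are made up, and p-values are omitted because they require a t-distribution CDF.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Student's two-sample t statistic and degrees of freedom (pooled variance)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical per-topic scores for two systems (unpaired, equal sample sizes).
scores_x = [1.0, 2.0, 3.0, 4.0]
scores_y = [2.0, 4.0, 6.0, 8.0]
t_student, df_student = student_t(scores_x, scores_y)
t_welch, df_welch = welch_t(scores_x, scores_y)
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, which is consistent with the textbook advice quoted in the abstract. The statistics diverge once both the sample sizes and the variances differ.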

Paper Link】 【Pages】:1049-1052

【Authors】: Rishabh Mehrotra ; Prasanta Bhattacharya ; Emine Yilmaz

【Abstract】: While a major share of prior work has considered search sessions as the focal unit of analysis for seeking behavioral insights, search tasks are emerging as a competing perspective in this space. In the current work, we quantify user search task behavior for both single- as well as multi-task search sessions and relate it to tasks and topics. Specifically, we analyze user-disposition, topic and user-interest level heterogeneities that are prevalent in search task behavior. Our results show that while search multi-tasking is a common phenomenon among search engine users, the extent and choice of multi-tasking topics vary significantly across users. We find that not only do users have varying propensities to multi-task, they also search for distinct topics across single-task and multi-task sessions. To our knowledge, this is among the first studies to fully characterize online search tasks with a focus on user- and topic-level differences that are observable from search sessions.

【Keywords】: multitasking; search tasks; user behavior

174. Understanding Website Behavior based on User Agent.

Paper Link】 【Pages】:1053-1056

【Authors】: Kien Pham ; Aécio S. R. Santos ; Juliana Freire

【Abstract】: Web sites have adopted a variety of adversarial techniques to prevent web crawlers from retrieving their content. While it is possible to simulate user behavior using a browser to crawl such sites, this approach is not scalable. Therefore, understanding existing adversarial techniques is important to design crawling strategies that can adapt to retrieve the content as efficiently as possible. Ideally, a web crawler should detect the nature of the adversarial policies and select the most cost-effective means to defeat them. In this paper, we discuss the results of a large-scale study of web site behavior based on their responses to different user-agents. We issued over 9 million HTTP GET requests to 1.3 million unique web sites from DMOZ using six different user-agents and the TOR network as an anonymous proxy. We observed that web sites do change their responses depending on user-agents and IP addresses. This suggests that probing sites for these features can be an effective means to detect adversarial techniques.

【Keywords】: adversarial crawling; focused crawler; stealth crawling; user agent; web cloaking; web crawler detection

175. Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data.

Paper Link】 【Pages】:1057-1060

【Authors】: Anjie Fang ; Craig Macdonald ; Iadh Ounis ; Philip Habel

【Abstract】: Scholars often seek to understand topics discussed on Twitter using topic modelling approaches. Several coherence metrics have been proposed for evaluating the coherence of the topics generated by these approaches, including the pre-calculated Pointwise Mutual Information (PMI) of word pairs and the Latent Semantic Analysis (LSA) word representation vectors. As Twitter data contains abbreviations and a number of peculiarities (e.g. hashtags), it can be challenging to train effective PMI data or LSA word representation. Recently, Word Embedding (WE) has emerged as a particularly effective approach for capturing the similarity among words. Hence, in this paper, we propose new Word Embedding-based topic coherence metrics. To determine the usefulness of these new metrics, we compare them with the previous PMI/LSA-based metrics. We also conduct a large-scale crowdsourced user study to determine whether the new Word Embedding-based metrics better align with human preferences. Using two Twitter datasets, our results show that the WE-based metrics can capture the coherence of topics in tweets more robustly and efficiently than the PMI/LSA-based ones.

【Keywords】: coherence metrics; lda; topic models; twitter; twitter lda; word embeddings
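
A common way to turn word embeddings into a topic coherence score is to average the pairwise cosine similarity of a topic's top words. The sketch below shows that generic idea under stated assumptions. It is not necessarily the exact metric proposed in the paper, and the two-dimensional vectors are toy values standing in for trained embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_coherence(top_words, vectors):
    """Mean pairwise cosine similarity of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Toy embeddings (hypothetical values, not trained on any corpus).
vectors = {
    "goal": [0.90, 0.10],
    "match": [0.80, 0.20],
    "league": [0.85, 0.15],
    "tax": [0.10, 0.90],
}
coherent_topic = ["goal", "match", "league"]
mixed_topic = ["goal", "tax", "match"]
coherent_score = embedding_coherence(coherent_topic, vectors)
mixed_score = embedding_coherence(mixed_topic, vectors)
```

A topic whose top words lie close together in the embedding space receives a higher score than one that mixes unrelated words.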

176. Utilizing Focused Relevance Feedback.

Paper Link】 【Pages】:1061-1064

【Authors】: Elinor Brondwine ; Anna Shtok ; Oren Kurland

【Abstract】: We present a novel study of ad hoc retrieval methods utilizing document-level relevance feedback and/or focused relevance feedback; namely, passages marked as (non-)relevant. The first method uses a novel mixture model that integrates relevant and non-relevant information at the language model level. The second method fuses retrieval scores produced by using relevant and non-relevant information separately. Empirical exploration attests to the merits of our methods, and sheds light on the effectiveness of using and integrating relevance feedback for textual units of varying granularities.

【Keywords】: feedback; focused; relevance

177. What Makes a Query Temporally Sensitive?

Paper Link】 【Pages】:1065-1068

【Authors】: Craig Willis ; Garrick Sherman ; Miles Efron

【Abstract】: This work takes an in-depth look at the factors that affect manual classifications of 'temporally sensitive' information needs. We use qualitative and quantitative techniques to analyze 660 topics from the Text Retrieval Conference (TREC) previously used in the experimental evaluation of temporal retrieval models. Regression analysis is used to identify factors in previous manual classifications. We explore potential problems with the previous classifications, considering principles and guidelines for future work on temporal retrieval models.

【Keywords】: query classification; temporal information retrieval; temporal relevance

178. Which Information Sources are More Effective and Reliable in Video Search.

Paper Link】 【Pages】:1069-1072

【Authors】: Zhiyong Cheng ; Xuanchong Li ; Jialie Shen ; Alexander G. Hauptmann

【Abstract】: It is common that users are interested in finding video segments, which contain further information about the video contents in a segment of interest. To facilitate users to find and browse related video contents, video hyperlinking aims at constructing links among video segments with relevant information in a large video collection. In this study, we explore the effectiveness of various video features on the performance of video hyperlinking, including subtitle, metadata, content features (i.e., audio and visual), surrounding context, as well as the combinations of those features. Besides, we also test different search strategies over different types of queries, which are categorized according to their video contents. Comprehensive experimental studies have been conducted on the dataset of TRECVID 2015 video hyperlinking task. Results show that (1) text features play a crucial role in search performance, and the combination of audio and visual features cannot provide improvements; (2) the consideration of contexts cannot obtain better results; and (3) due to the lack of training examples, machine learning techniques cannot improve the performance.

【Keywords】: video hyperlinking; video search

179. Why do you Think this Query is Difficult?: A User Study on Human Query Prediction.

Paper Link】 【Pages】:1073-1076

【Authors】: Stefano Mizzaro ; Josiane Mothe

【Abstract】: Predicting whether a query will be difficult for a system is important for improving retrieval effectiveness through specific processing. There have been several attempts to predict difficulty, both automatically and manually, but without high accuracy at a pre-retrieval stage. In this paper, we focus instead on understanding why a query is perceived by humans as difficult. We ran two separate but related experiments in which we asked humans to provide both a query difficulty prediction and reasons to explain their prediction. Results show that: (i) reasons can be categorized into 4 classes; (ii) reasons can be framed into closed questions to be answered on a Likert scale; and (iii) some reasons correlate in a coherent way with the human predicted numerical difficulty. On the basis of these results it is possible to derive hints that help users when formulating their queries and keep them from relying on a wrong perception of difficulty.

【Keywords】: difficulty understanding; information retrieval; query difficulty

Demonstrations 21

180. A Platform for Streaming Push Notifications to Mobile Assessors.

Paper Link】 【Pages】:1077-1080

【Authors】: Adam Roegiest ; Luchen Tan ; Jimmy J. Lin ; Charles L. A. Clarke

【Abstract】: We present an assessment platform for gathering online relevance judgments for mobile push notifications that will be deployed in the newly-created TREC 2016 Real-Time Summarization (RTS) track. There is emerging interest in building systems that filter social media streams such as tweets to identify interesting and novel content in real time, putatively for delivery to users' mobile phones. In our evaluation design, all participants subscribe to the Twitter streaming API to identify relevant tweets with respect to a set of interest profiles. As the systems generate results, they are pushed in real time to our evaluation broker via a REST API. The broker then "routes" the tweets to assessors who have installed a custom app on their mobile phones. We detail the design of this platform and discuss a number of challenges that need to be tackled in this type of "Living Labs" setup. It is our goal that such an evaluation design will mitigate any issues that have arisen in traditional batch-style evaluations of this type of task.

【Keywords】: assessment platform; living labs; mobile apps; relevance judgments

181. A Visual Analytics Approach for What-If Analysis of Information Retrieval Systems.

Paper Link】 【Pages】:1081-1084

【Authors】: Marco Angelini ; Nicola Ferro ; Giuseppe Santucci ; Gianmaria Silvello

【Abstract】: We present the innovative visual analytics approach of the VATE system, which eases and makes more effective the experimental evaluation process by introducing the what-if analysis. The what-if analysis is aimed at estimating the possible effects of a modification to an IR system to select the most promising fixes before implementing them, thus saving a considerable amount of effort. VATE builds on an analytical framework which models the behavior of the systems in order to make estimations, and integrates this analytical framework into a visual part which, via proper interaction and animations, receives input and provides feedback to the user.

【Keywords】: VATE; failure analysis; information retrieval evaluation; visual analytics; what-if analysis

182. An Architecture for Privacy-Preserving and Replicable High-Recall Retrieval Experiments.

Paper Link】 【Pages】:1085-1088

【Authors】: Adam Roegiest ; Gordon V. Cormack

【Abstract】: We demonstrate the infrastructure used in the TREC 2015 Total Recall track to facilitate controlled simulation of "assessor in the loop" high-recall retrieval experimentation. The implementation and corresponding design decisions are presented for this platform. This includes the necessary considerations to ensure that experiments are privacy-preserving when using test collections that cannot be distributed. Furthermore, we describe the use of virtual machines as a means of system submission in order to promote replicable experiments while also ensuring the security of system developers and data providers.

【Keywords】: experimentation; high recall; privacy; rest; virtual machine

183. Analysing Temporal Evolution of Interlingual Wikipedia Article Pairs.

Paper Link】 【Pages】:1089-1092

【Authors】: Simon Gottschalk ; Elena Demidova

【Abstract】: Wikipedia articles representing an entity or a topic in different language editions evolve independently within the scope of the language-specific user communities. This can lead to different points of view reflected in the articles, as well as complementary and inconsistent information. An analysis of how the information is propagated across the Wikipedia language editions can provide important insights into the article evolution along the temporal and cultural dimensions and support quality control. To facilitate such analysis, we present MultiWiki -- a novel web-based user interface that provides an overview of the similarities and differences across the article pairs originating from different language editions on a timeline. MultiWiki enables users to observe the changes in the interlingual article similarity over time and to perform a detailed visual comparison of the article snapshots at a particular time point.

【Keywords】: interlingual text alignment; multilingual evolution; wikipedia

184. Cobwebs from the Past and Present: Extracting Large Social Networks using Internet Archive Data.

Paper Link】 【Pages】:1093-1096

【Authors】: Miroslav Shaltev ; Jan-Hendrik Zab ; Philipp Kemkes ; Stefan Siersdorfer ; Sergej Zerr

【Abstract】: Social graph construction from various sources has been of interest to researchers due to its application potential and the broad range of technical challenges involved. The World Wide Web provides a huge amount of continuously updated data and information on a wide range of topics created by a variety of content providers, and makes the study of extracted people networks and their temporal evolution valuable for social as well as computer scientists. In this paper we present SocGraph - an extraction and exploration system for social relations from the content of around 2 billion web pages collected by the Internet Archive over the 17 years time period between 1996 and 2013. We describe methods for constructing large social graphs from extracted relations and introduce an interface to study their temporal evolution.

【Keywords】: social graphs; temporal evolution

185. Context-Sensitive Auto-Completion for Searching with Entities and Categories.

Paper Link】 【Pages】:1097-1100

【Authors】: Andreas Schmidt ; Johannes Hoffart ; Dragan Milchevski ; Gerhard Weikum

【Abstract】: When searching in a document collection by keywords, good auto-completion suggestions can be derived from query logs and corpus statistics. On the other hand, when querying documents which have automatically been linked to entities and semantic categories, auto-completion has not been investigated much. We have developed a semantic auto-completion system, where suggestions for entities and categories are computed in real-time from the context of already entered entities or categories and from entity-level co-occurrence statistics for the underlying corpus. Given the huge size of the knowledge bases that underlie this setting, a challenge is to compute the best suggestions fast enough for interactive user experience. Our demonstration shows the effectiveness of our method, and its interactive usability.

【Keywords】: categories; context adaptive; corpus adaptive; entities; semantic-auto-completion

186. EAIMS: Emergency Analysis Identification and Management System.

Paper Link】 【Pages】:1101-1104

【Authors】: Richard McCreadie ; Craig Macdonald ; Iadh Ounis

【Abstract】: Social media has great potential as a means to enable civil protection and law enforcement agencies to more effectively tackle disasters and emergencies. However, there is currently a lack of tools that enable civil protection agencies to easily make use of social media. The Emergency Analysis Identification and Management System (EAIMS) is a prototype service that provides real-time detection of emergency events, related information finding and credibility analysis tools for use over social media during emergencies. This system exploits machine learning over data gathered from past emergencies and disasters to build effective models for identifying new events as they occur, tracking developments within those events and analyzing those developments for the purposes of enhancing the decision making processes of emergency response agencies.

【Keywords】: emergency management; social media

Paper Link】 【Pages】:1105-1108

【Authors】: Jaspreet Singh ; Wolfgang Nejdl ; Avishek Anand

【Abstract】: Archives are an important source of study for various scholars. Digitization and the web have made archives more accessible and led to the development of several time-aware exploratory search systems. However these systems have been designed for more general users rather than scholars. Scholars have more complex information needs in comparison to general users. They also require support for corpus creation during their exploration process. In this paper we present Expedition - a time-aware exploratory search system that addresses the requirements and information needs of scholars. Expedition possesses a suite of ad-hoc and diversity based retrieval models to address complex information needs; a newspaper-style user interface to allow for larger textual previews and comparisons; entity filters to more naturally refine a result list and an interactive annotated timeline which can be used to better identify periods of importance.

【Keywords】: exploratory search; historical search; history by diversity; retrieval models; scholars; time-aware exploration

188. iGlasses: A Novel Recommendation System for Best-fit Glasses.

Paper Link】 【Pages】:1109-1112

【Authors】: Xiaoling Gu ; Lidan Shou ; Pai Peng ; Ke Chen ; Sai Wu ; Gang Chen

【Abstract】: We demonstrate iGlasses, a novel recommendation system that accepts a frontal face photo as the input and returns the best-fit eyeglasses as the output. As conventional recommendation techniques such as collaborative filtering become inapplicable in the problem, we propose a new recommendation method which exploits the implicit matching rules between human faces and eyeglasses. We first define fine-grained attributes for human faces and frames of glasses respectively. Then, we develop a recommendation framework based on a probabilistic graphical model, which effectively captures the correlation among these fine-grained attributes. Ranking of the frames (glasses) is done by their similarity to the query facial attributes. Finally, we produce a synthesized image for the input face to demonstrate the visual effect when wearing the recommended glasses.

【Keywords】: eyeglasses recommendation; probabilistic graphical model

Paper Link】 【Pages】:1113-1116

【Authors】: Sean McKeown ; Martynas Buivys ; Leif Azzopardi

【Abstract】: Individuals living in highly networked societies publish a large amount of personal, and potentially sensitive, information online. Web investigators can exploit such information for a variety of purposes, such as in background vetting and fraud detection. However, such investigations require a large number of expensive man hours and human effort. This paper describes InfoScout, a search tool which is intended to reduce the time it takes to identify and gather subject centric information on the Web. InfoScout collects relevance feedback information from the investigator in order to re-rank search results, allowing the intended information to be discovered more quickly. Users may still direct their search as they see fit, issuing ad-hoc queries and filtering existing results by keywords. Design choices are informed by prior work and industry collaboration.

【Keywords】: entity search; open-source intelligence; people searching; professional search; web investigation

Paper Link】 【Pages】:1117-1120

【Authors】: Pranav Ramarao ; Suresh Iyengar ; Pushkar Chitnis ; Raghavendra Udupa ; Balasubramanyan Ashok

【Abstract】: Email remains the most important and widely used mode of online communication despite having its origins in the middle of the last century and being threatened by a variety of online communication innovations. While several studies have predicted the continuous growth of the volume of email communication, there has been little innovation in improving email search, an imperative part of the user experience. In this work, we present a lightweight email application, codenamed InLook, that intends to provide a productive search experience.

【Keywords】: email; productivity; ranking; search

191. Interacting with Financial Data using Natural Language.

Paper Link】 【Pages】:1121-1124

【Authors】: Vassilis Plachouras ; Charese Smiley ; Hiroko Bretz ; Ola Taylor ; Jochen L. Leidner ; Dezhao Song ; Frank Schilder

【Abstract】: Financial and economic data are typically available in the form of tables and comprise mostly of monetary amounts, numeric and other domain-specific fields. They can be very hard to search and they are often made available out of context, or in forms which cannot be integrated with systems where text is required, such as voice-enabled devices. This work presents a novel system that enables both experts in the finance domain and non-expert users to search financial data with both keyword and natural language queries. Our system answers the queries with an automatically generated textual description using Natural Language Generation (NLG). The answers are further enriched with derived information, not explicitly asked in the user query, to provide the context of the answer. The system is designed to be flexible in order to accommodate new use cases without significant development effort, thus allowing fast integration of new datasets.

【Keywords】: search

192. LONLIES: Estimating Property Values for Long Tail Entities.

Paper Link】 【Pages】:1125-1128

【Authors】: Mina H. Farid ; Ihab F. Ilyas ; Steven Euijong Whang ; Cong Yu

【Abstract】: Web search engines often retrieve answers for queries about popular entities from a growing knowledge base that is populated by a continuous information extraction process. However, less popular entities are not frequently mentioned on the web and are generally interesting to fewer users; these entities reside on the long tail of information. Traditional knowledge base construction techniques that rely on the high frequency of entity mentions to extract accurate facts about these mentions have little success with entities that have low textual support. We present Lonlies, a system for estimating property values of long tail entities by leveraging their relationships to head topics and entities. We demonstrate (1) how Lonlies builds communities of entities that are relevant to a long tail entity utilizing a text corpus and a knowledge base; (2) how Lonlies determines which communities to use in the estimation process; (3) how we aggregate estimates from community entities to produce final estimates, and (4) how users interact with Lonlies to provide feedback to improve the final estimation results.

【Keywords】: entity search; knowledge base construction; question answering

193. Personalised News and Blog Recommendations based on User Location, Facebook and Twitter User Profiling.

Paper Link】 【Pages】:1129-1132

【Authors】: Gabriella Kazai ; Iskander Yusof ; Daoud Clarke

【Abstract】: This demo presents a prototype mobile app that provides out-of-the-box personalised content recommendations to its users by leveraging and combining the user's location, their Facebook and/or Twitter feed and their in-app actions to automatically infer their interests. We build individual models for each user and each location. At retrieval time we construct the user's personalised feed by mixing different sources of content-based recommendations with content directly from their Facebook/Twitter feeds, locally trending articles and content propagated through their in-app social network. Both explicit and implicit feedback signals from the users' interactions with their recommendations are used to update their interests models and to learn their preferences over the different content sources.

【Keywords】: lumi news; mobile app; recommender system

Paper Link】 【Pages】:1133-1136

【Authors】: Alan Medlar ; Kalle Ilves ; Ping Wang ; Wray L. Buntine ; Dorota Glowacka

【Abstract】: Despite the growing importance of exploratory search, information retrieval (IR) systems tend to focus on lookup search. Lookup searches are well served by optimising the precision and recall of search results; however, for exploratory search this may be counterproductive if users are unable to formulate an appropriate search query. We present a system called PULP that supports exploratory search for scientific literature, though the system can be easily adapted to other types of literature. PULP uses reinforcement learning (RL) to avert the user from context traps resulting from poorly chosen search queries, trading off between exploration (presenting the user with diverse topics) and exploitation (moving towards more specific topics). Where other RL-based systems suffer from the "cold start" problem, requiring sufficient time to adjust to a user's information needs, PULP initially presents the user with an overview of the dataset using temporal topic models. Topic models are displayed in an interactive alluvial diagram, where topics are shown as ribbons that change thickness with a given topic's relative prevalence over time. Interactive, exploratory search sessions can be initiated by selecting topics as a starting point.

【Keywords】: bandit algorithms; exploratory search; exploratory search interfaces; scientific literature search; topic models

Paper Link】 【Pages】:1137-1140

【Authors】: Cheng Zhang ; Peng Zhang ; Jingfei Li ; Dawei Song

【Abstract】: Traditional information retrieval systems rank documents according to their relevance to users' input queries. State of the art commercial search engines (SEs) train ranking models and suggest query refinements by exploiting collective intelligence implicitly using global users' query logs. However, they do not provide an explicit channel for users to communicate with each other in the search process. By asking or discussing with other users on the fly, a user could find relevant information more conveniently and gain a better search experience. In this paper, we present a demo of novel Search Engine with a live Chat Channel (SECC). SECC can group users automatically based on their input queries and allow them to communicate with each other in real time through a chat interface.

【Keywords】: chat channel; collective intelligence; information retrieval; search engine

196. Simulating Interactive Information Retrieval: SimIIR: A Framework for the Simulation of Interaction.

Paper Link】 【Pages】:1141-1144

【Authors】: David Maxwell ; Leif Azzopardi

【Abstract】: Simulation provides a powerful and cost-effective approach to explore and evaluate how interactions between a searcher and system influence search behaviour and performance. With a growing interest in simulation and an increasing number of papers using such an approach, there is a need for a flexible framework for simulation. Thus, we present SimIIR, an open-source toolkit for building and conducting Interactive Information Retrieval (IIR) experiments. The framework consists of a number of high level components, including the simulation, the searcher and the system, all of which must be configured. The SimIIR framework provides a series of interchangeable components. Examples of these components include the querying strategies (how simulated queries are formulated) and stopping strategies (the depth to which a searcher will examine snippets and documents) that a simulated searcher will employ. We have implemented various existing strategies so that they can be used by other researchers to not only replicate and reproduce past experiments, but also create new experiments. This paper describes the SimIIR framework and the different components that can be configured and extended as required.

【Keywords】: continuation strategies; querying strategies; search behavior; simulation; stopping strategies; strategy; user modeling

197. The ComeWithMe System for Searching and Ranking Activity-Based Carpooling Rides.

Paper Link】 【Pages】:1145-1148

【Authors】: Vinicius Monteiro de Lira ; Chiara Renso ; Raffaele Perego ; Salvatore Rinzivillo ; Valéria Cesário Times

【Abstract】: ComeWithMe is an activity-oriented carpooling service that enlarges the candidate destinations of a ride request by considering alternative places where the desired activity can be performed. It is based on the observation that individuals often move towards a place to perform an activity, while the activity is often not strictly associated with a single place: one may go shopping or eating at many different locations. Activity-oriented carpooling hugely increases the number of rides matching a query, thus introducing requirements on system responsiveness and ranking effectiveness that are not common to traditional carpooling services. The demoed system implements the ComeWithMe service in almost its entirety, and includes the back-end and a user-friendly mobile application for smartphones aimed at achieving users' acceptance and usability.

【Keywords】: carpooling; ranking; rides search

Paper Link】 【Pages】:1149-1152

【Authors】: Ali Shemshadi ; Quan Z. Sheng ; Yongrui Qin

【Abstract】: The rapidly growing paradigm of the Internet of Things (IoT) requires new search engines that can crawl heterogeneous data sources and search in highly dynamic contexts. Existing search engines cannot meet these requirements, as they are designed for the traditional Web and human users only. This is at odds with the fact that things are emerging as major producers and consumers of information. Currently, there is very little work on searching the IoT, and a number of works claim that public IoT data is unavailable. However, this overlooks the fact that a majority of real-time web-based maps share data that is generated directly by things. To shed light on this line of research, in this paper, we first create a set of tools to capture IoT data from a set of given data sources. We then create two types of interfaces to provide real-time searching services on dynamic IoT data for both human and machine users.

【Keywords】: big data; internet of things; search engine; web of things

199. Tweetviz: Visualizing Tweets for Business Intelligence.

Paper Link】 【Pages】:1153-1156

【Authors】: Bas Sijtsma ; Pernilla Qvarfordt ; Francine Chen

【Abstract】: Social media offers potential opportunities for businesses to extract business intelligence. This paper presents Tweetviz, an interactive tool to help businesses extract actionable information from a large set of noisy Twitter messages. Tweetviz visualizes the tweet sentiment of business locations, identifies other business venues that Twitter users visit, and estimates some simple demographics of the Twitter users frequenting a business. A user study to evaluate the system's ability indicates that Tweetviz can provide an overview of a business's issues and sentiment as well as information aiding users in creating customer profiles.

【Keywords】: information visualization; location profiling; sentiment analysis; social media analytics

200. Where the Event Lies: Predicting Event Occurrence in Textual Documents.

Paper Link】 【Pages】:1157-1160

【Authors】: Andrea Ceroni ; Ujwal Gadiraju ; Jan Matschke ; Simon Wingert ; Marco Fisichella

【Abstract】: Manually inspecting text in a document collection to assess whether an event occurs in it is a cumbersome task. Although a manual inspection can allow one to identify and discard false events, it becomes infeasible with increasing numbers of automatically detected events. In this paper, we present a system to automate event validation, defined as the task of determining whether a given event occurs in a given document or corpus. In addition to supporting users seeking information that corroborates a given event, event validation can also boost the precision of automatically detected event sets by discarding false events and preserving the true ones. The system allows users to specify events, retrieves candidate web documents, and assesses whether events occur in them. The validation results are shown to the user, who can revise the decision of the system. The validation method relies on a supervised model to predict the occurrence of events in a non-annotated corpus. This system can also be used to build ground truths for event corpora.

【Keywords】: corpus construction; evaluation; event detection; event validation

Doctoral Consortium 15

201. A Novel Approach to Define and Model Contextual Features in Recommender Systems.

Paper Link】 【Pages】:1161

【Authors】: Parisa Lak

【Abstract】: Recommender Systems (RS) provide more accurate and more relevant recommendations using contextual feature(s). This accuracy improvement comes at the cost of computational expense. Therefore, finding and selecting the most relevant contextual features is an important problem. Moreover, modeling and incorporating the selected contextual features in RS algorithms has an impact on both accuracy and computational cost. We are conducting a series of studies to detect, define, select, model and incorporate the most relevant contextual features for RS algorithms. The feature detection, definition and selection approach involves the evaluation of features derived from implicit and explicit information. The selected features from this approach can be modeled and incorporated in any selected RS algorithm. In our recent work, we also propose a series of algorithms that incorporate multiple contextual features in the baseline matrix factorization (MF) algorithm. We use the selected contextual features to modify user biases and item biases in the baseline MF.

【Keywords】: collaborative filtering; context aware recommender systems; context awareness; contextual feature selection; item based recommender system; matrix factorization; recommender systems; user base recommender systems

202. A Study of Information Seeking Behavior Using Physical and Online Explorations.

Paper Link】 【Pages】:1163

【Authors】: Dongho Choi

【Abstract】: People have their own behavioral patterns, through which they determine how to seek and use information. People also exhibit established mobility patterns in their everyday lives. Meanwhile, modern technologies such as smartphones, wearable devices, and eye trackers have allowed researchers to collect personal, contextual, and cognitive information about users, and to create behavioral models from different perspectives. Considering the analogy between information exploration and geographical exploration, I want to identify the interconnections between these behaviors and predict individuals' search behavior using personal and contextual signals. The proposal uses a mixed-method approach that involves a four-week field study, a game study, and a lab study, collecting data from 40 participants through mobile devices, eye trackers, online logs, and weekly diaries.

【Keywords】: exploratory behavior; geographical exploration; information seeking behavior

203. Appearance-Based Retrieval of Mathematical Notation in Documents and Lecture Videos.

Paper Link】 【Pages】:1165

【Authors】: Kenny Davila

【Abstract】: Large data collections containing millions of math formulae in different formats are available online. Retrieving math expressions from these collections is challenging. Based on the notion that visually similar formulas are related, we propose a framework for appearance-based formula retrieval in two different modalities: symbolic for text documents and image-based for videos. We believe that we can achieve high quality formula retrieval results using the visual appearance of math notation without complex formula semantic analysis. We represent mathematical notation using different graph types to take advantage of the information available in each domain. For symbolic formula retrieval, math expressions in text formats like LaTeX are parsed to generate Symbol Layout Trees. For image-based formula retrieval, image processing techniques are used to create a graph-based image content representation. We store these graphs using an inverted index of pairs of primitives defined by the triplet (p, q, r), where p and q are the labels of two primitives connected in the graph by the path r. Retrieval is a two-stage process: candidate selection and reranking. The first stage uses pairs of primitives from the query graph to find matches in the inverted index. Each match is given an initial score using the Dice coefficient of matched pairs of primitives. The top-K candidates from the first stage are selected for re-ranking using a detailed similarity metric. Two steps are performed for each candidate: matching and scoring. The matching step is done by searching for the largest common substructure between query and candidate graphs. Matching is related to the problem of finding the maximum common subgraph isomorphism (MCS) between two graphs. In addition, we consider label unification for symbolic formula retrieval, and our wildcard query nodes can match entire subgraphs.
In the scoring step, multiple similarity criteria define a score vector used to sort candidates, either by lexicographic order or by a function of these scores. Different datasets and benchmarks will be required to evaluate each modality. For symbolic formula retrieval, we will use the most recent versions of the NTCIR MathIR Tasks benchmarks. To the best of our knowledge, there are no benchmarks for large scale image-based formula retrieval. However, the same collections used for symbolic formula retrieval could be adapted by rendering math expressions to images. In addition, we will use datasets of math lecture videos for image-based formula retrieval. Traditional graded-scales of relevance used for evaluation of retrieval systems have been shown to have inconsistency issues. We plan to use pairwise candidate comparisons during our evaluation phase. Some aggregation methods exist that generate relevance scores and ideal rankings using these pairwise candidate comparisons. The proposed framework can be adapted to work for other domains like chemistry or technical diagrams where visually similar elements are usually related.
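
The candidate-selection stage described in this abstract can be illustrated with a minimal sketch. It assumes that each formula is already reduced to a set of (p, q, r) triplets and that the Dice coefficient is taken over the query and candidate triplet sets; the function names, the path labels, and the example formulas are hypothetical and are not taken from the author's system.

```python
from collections import defaultdict

def dice(query_pairs, candidate_pairs):
    """Dice coefficient between two collections of (p, q, r) triplets."""
    q, c = set(query_pairs), set(candidate_pairs)
    if not q and not c:
        return 0.0
    return 2 * len(q & c) / (len(q) + len(c))

def build_index(formulas):
    """Inverted index: triplet -> ids of formulas containing it."""
    index = defaultdict(set)
    for formula_id, pairs in formulas.items():
        for triplet in pairs:
            index[triplet].add(formula_id)
    return index

def select_candidates(query_pairs, formulas, index, k):
    """First stage only: look up matches, score with Dice, keep the top k."""
    matched = set()
    for triplet in set(query_pairs):
        matched |= index.get(triplet, set())
    scored = [(dice(query_pairs, formulas[fid]), fid) for fid in matched]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]

# Hypothetical toy collection; path labels such as "right" are illustrative.
formulas = {
    "f1": [("x", "2", "above-right"), ("x", "+", "right")],
    "f2": [("x", "+", "right"), ("+", "y", "right")],
    "f3": [("a", "b", "right")],
}
index = build_index(formulas)
query = [("x", "2", "above-right"), ("x", "+", "right")]
print(select_candidates(query, formulas, index, k=2))
```

The re-ranking stage (largest common substructure matching, label unification, and wildcards) is not covered by this sketch.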

【Keywords】: content-based image retrieval; graph-based retrieval; mathematical information retrieval

204. Beyond Topical Relevance: Studying Understandability and Reliability in Consumer Health Search.

Paper Link】 【Pages】:1167

【Authors】: João R. M. Palotti

【Abstract】: Nowadays people rely on search engines to explore, understand and manage their health. A recent study from Pew Internet states that one in three adult American Internet users has used the Internet as a diagnosis tool. Retrieving incorrect or unclear health information poses high risks, as people may dismiss serious symptoms, use inappropriate treatments or escalate their health concerns about common symptomatology. A number of studies have shown that the average user experiences difficulty in understanding the content of a large portion of the results retrieved by current search engine technology. Other studies have examined how poor the quality of health information on the web can be. In the context of consumer (non-expert) health search, search engines should not only retrieve relevant information, but also promote information that is understandable by the user and that is reliable, trustworthy and verified. The focus of my Ph.D. is to go beyond topical relevance and study understandability and reliability as two important facets of relevance that must be incorporated into search systems to increase user satisfaction, especially in the context of consumer health search.

【Keywords】: health search; information retrieval; learning to rank; reliability; understandability

205. Enhancing Information Retrieval with Adapted Word Embedding.

Paper Link】 【Pages】:1169

【Authors】: Navid Rekabsaz

【Abstract】: Recent developments in word embedding provide a novel source of information for term-to-term similarity. A recurring question now is whether the provided term associations can be properly integrated into traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we propose addressing the question of combining the term-to-term similarity of word embedding with IR models. The retrieval models in the approach are enhanced by altering the basic components of document retrieval, i.e. term frequency (tf) and document frequency (df). In addition, we target the study of the meaning of the term relatedness of word embedding models and its applicability in IR. This research topic consists of, first, the exploration of reliable similarity thresholds of word embedding vectors to indicate "related terms" and, second, the identification of the linguistic types of term relatedness.

【Keywords】: ir models; related terms; similarity; word embeddings

206. Fairness in Information Retrieval.

Paper Link】 【Pages】:1171

【Authors】: Aldo Lipani

【Abstract】: The offline evaluation of Information Retrieval (IR) systems is performed through the use of test collections. A test collection, in its essence, is composed of a collection of documents, a set of topics, and a set of relevance assessments for each topic, derived from the collection of documents. Ideally, for each topic, all the documents of the test collection should be judged, but due to the dimensions of the collections of documents, and their exponential growth over the years, this practice soon became impractical. Therefore, early in IR history, this problem was addressed through the use of the pooling method. The pooling method consists of optimizing the relevance assessment process by pooling the documents retrieved by different search engines following a particular pooling strategy. The most common one consists of pooling the top d documents of each run. The pool is constructed from systems taking part in a challenge for which the collection was made, at a specific point in time, after which the collection is generally frozen in terms of relevance judgments. This method leads to a bias called pool bias, which is the effect that documents that were not selected in the pool created from the original runs will never be considered relevant. Thereby, this bias affects the evaluation of a system that has not been part of the pool, with any IR evaluation measure, making the comparison with pooled systems unfair. IR measures have evolved over the years and become more and more complex and difficult to interpret. Witnessing a need in industry for measures that 'make sense', I focus on the problems of the two fundamental IR evaluation measures, Precision at cut-off P@n and Recall at cut-off R@n. There are two reasons to consider such 'simple' metrics: first, they are cornerstones for many other developed metrics and, second, they are easy for all users to understand.
To the eyes of a practitioner, these two evaluation measures are interesting because they lead to more intuitive interpretations, such as how much time people spend reading useless documents (low precision), or how many relevant documents they are missing (low recall). But this last interpretation, because recall is inversely proportional to the number of relevant documents per topic, is very difficult to address if only a portion of the collection of documents is judged, as is done when using the pooling method. To tackle this problem, another kind of evaluation has been developed, based on measuring how much an IR system makes documents accessible. Accessibility measures can be seen as a complementary evaluation to recall because they provide information on whether some relevant documents are not retrieved due to an unfairness in accessibility. The main goal of this Ph.D. is to increase the stability and reusability of existing test collections when systems are evaluated in terms of precision, recall, and accessibility. The outcome will be: the development of a novel estimator to tackle the pool bias issue for P@n and R@n, a comprehensive analysis of the effect of the estimator on varying pooling strategies, and finally, to support the evaluation of recall, an analytic approach to the evaluation of accessibility measures.
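
A small sketch can make the pool bias described above concrete. It assumes depth-d pooling over ranked lists of document ids and the standard definitions of P@n and R@n; the document ids, the runs, and the "true" relevance set are hypothetical, and the estimator proposed in the thesis is not reproduced here.

```python
def build_pool(runs, d):
    """Depth-d pooling: union of the top d documents of each run."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:d])
    return pool

def precision_at(ranking, relevant, n):
    """P@n: fraction of the top n documents that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n

def recall_at(ranking, relevant, n):
    """R@n: fraction of all relevant documents found in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranking[:n] if doc in relevant) / len(relevant)

# Hypothetical pooled runs and ground truth for a single topic.
pooled_runs = [["d1", "d2", "d3"], ["d2", "d4", "d5"]]
true_relevant = {"d1", "d4", "d6"}

pool = build_pool(pooled_runs, d=2)
# Only pooled documents are judged; unjudged documents count as non-relevant.
judged_relevant = true_relevant & pool

# A run that did not contribute to the pool retrieves the unjudged "d6".
new_run = ["d6", "d1", "d7"]
print(precision_at(new_run, judged_relevant, 2), precision_at(new_run, true_relevant, 2))
print(recall_at(new_run, judged_relevant, 3), recall_at(new_run, true_relevant, 3))
```

In this toy case the unpooled run is scored lower on P@2 under the pooled judgments than under the full ground truth, because its top document was never judged.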

【Keywords】: accessibility; p@n; pool bias; pooling method; r@n

207. Going Beyond Relevance: Incorporating Effort in Information Retrieval.

Paper Link】 【Pages】:1173

【Authors】: Manisha Verma

【Abstract】: The primary focus of information retrieval (IR) systems has been to optimize for relevance. Existing approaches used to rank documents or evaluate IR systems do not account for "user effort". At present, relevance captures topical overlap between document and user query. This mechanism does not take into consideration either the time or the effort the end user needs to satisfy an information need. While a judge may spend time assessing a document, an end user may not thoroughly examine a document. We identified factors that are associated with effort for a single document and gathered judgments for the same. We also investigated the role of several features in predicting effort on a webpage. In the future, we shall investigate the role of effort on mobile and investigate an effort-based evaluation methodology that also takes into account the user's search task.

【Keywords】: effort; information retrieval; relevance

208. Measuring Interestingness of Political Documents.

Paper Link】 【Pages】:1175

【Authors】: Hosein Azarbonyad

【Abstract】: Political texts are pervasive on the Web, covering laws and policies in national and supranational jurisdictions. Access to this data is crucial for government transparency and accountability to the population. The main aim of our research is to develop a ranking method for political documents that captures the interesting content within them. Text interestingness is a measure for assessing the quality of documents from the users' perspective, reflecting their willingness to read a document. Different approaches have been proposed for measuring the interestingness of texts. In this research we focus on measuring the interestingness of political texts. As political data sources, we use publicly available parliamentary proceedings.

【Keywords】: political documents; text interestingness; topical diversity

Paper Link】 【Pages】:1177

【Authors】: Jiyun Luo

【Abstract】: Nowadays, searching for complicated information needs is becoming more and more common. These complicated needs usually require users to reformulate queries and conduct multiple retrievals in a search session. Many technologies have been developed to help session search. Rocchio, pseudo relevance feedback, etc. can help find relevant documents. xQuAD, RxQuAD, etc. can help the user to explore. However, none of these approaches alone works well in session search, because they do not treat a search session as a whole. They cannot answer questions like when to explore and when to exploit. In this work, we model session search as a Partially Observable Markov Decision Process (POMDP). We model users' implicit feedback, such as query reformulation and clickthrough data, in the POMDP framework. Further, we extend the forms of user feedback. We implement a new search interface which allows us to capture more explicit feedback from users, such as passage-level relevance judgments, irrelevance judgments, duplicate judgments, etc. We propose algorithms to effectively model these feedback signals in the POMDP framework and improve session search performance. Our algorithm is able to automatically balance users' needs for exploration and exploitation.

【Keywords】: dynamic search; session search; user feedback modeling

Paper Link】 【Pages】:1179

【Authors】: Mengdie Zhuang

【Abstract】: Typically, interactive information retrieval (IIR) system evaluations assess search processes and outcomes using a combination of two types of measures: 1. user perception (e.g. users' attitudes towards the search experience and outcome); 2. user behaviour (e.g. time and counts of various actions including mouse and keyboard clicks). In general, we assume that they are indicative of the search outcomes (e.g. performance, opinion). However, search is a dynamic process with changing outcomes. Therefore, neither measure alone provides a holistic way of evaluating search. On one hand, user behaviour measures are only descriptive of the outcome, and are not interpretive of the process. That is to say, they lack the rationale behind why those behaviours occurred. Another problem is that some mental activities may not be reflected in user behaviour [1]. The challenge with logfiles, which contain behaviour data, is the voluminous number of data points and the need to find a reliable approach to define groups or sets based on behavioural patterns. Not all users are alike, nor do they all take the same approach to search for the same things, as evidenced by the TREC, INEX and CLEF interactive tracks. On the other hand, user perception measures are acquired from samples so small that they do not scale to large participant populations, and are rarely measured constantly due to the laborious and time-consuming data collection methods (e.g. questionnaire, interview). Moreover, not enough emphasis is put on assessing the reliability of individual perception measures, and the wide usage of Likert-type scales limits the interpretation of answers. For a holistic understanding of the search process, we need both perception and behaviour measures. I speculate that user behaviour may predict user perception, and thus we should be able to analyse large-scale files for a greater understanding of the likely human responses.

【Keywords】: interactive information retrieval evaluation; search process; user behaviour

211. Retrievability: An Independent Evaluation Measure.

Paper Link】 【Pages】:1181

【Authors】: Colin Wilkie

【Abstract】: Information Retrieval systems have traditionally been evaluated in terms of efficiency and performance. These aspects of retrieval systems, whilst very important, do not cover a crucial aspect of the system: the access it provides to the documents of the collection. Retrievability, a document-centric evaluation measure introduced by Azzopardi and Vinay, provides an alternative approach to evaluation [1]. Retrievability is the ease with which a document can be retrieved using a retrieval system. The more queries that retrieve the document, and the higher up the document is returned, the more retrievable it is. It can thus be used to describe how difficult it is to find documents in the collection given a particular configuration of a retrieval system. Unlike typical performance evaluations, a retrievability analysis can be performed without recourse to relevance judgements, meaning there is no reliance on a test collection. This has major advantages when tuning a retrieval system's parameters, as the tuning can be performed on the live collection.
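
The intuition in this abstract (more retrieving queries and higher ranks mean a more retrievable document) can be sketched as a simple tally over query results. The sketch assumes uniform query weights, a rank cutoff, and an optional rank discount of 1/rank^beta; these choices, the function name, and the toy rankings are illustrative assumptions rather than the exact measure used in the cited work.

```python
from collections import Counter

def retrievability(results_per_query, cutoff, beta=0.0):
    """Score each document by how often, and how high, it is retrieved.

    results_per_query maps a query to its ranked list of document ids
    (assumed to contain no duplicates). With beta=0 every appearance within
    the cutoff counts as 1; with beta>0 an appearance at a given rank
    contributes 1 / rank**beta.
    """
    scores = Counter()
    for ranking in results_per_query.values():
        for rank, doc in enumerate(ranking[:cutoff], start=1):
            scores[doc] += 1.0 / rank ** beta
    return scores

# Hypothetical rankings produced by one system configuration.
results = {
    "q1": ["a", "b", "c"],
    "q2": ["a", "c", "d"],
    "q3": ["b", "a", "e"],
}
counts = retrievability(results, cutoff=2)
discounted = retrievability(results, cutoff=2, beta=1.0)
print(counts["a"], counts["b"], counts["d"])
print(discounted["a"], discounted["b"])
```

No relevance judgements appear anywhere in the computation, which is the property the abstract highlights.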

【Keywords】: bias; effectiveness; evaluation; retrievability

212. Significant Words Representations of Entities.

Paper Link】 【Pages】:1183

【Authors】: Mostafa Dehghani

【Abstract】: Transforming the data into a suitable representation is the first key step of data analysis, and the performance of any data-oriented method depends heavily on it. We study how we can best learn representations for textual entities that are: 1) precise, 2) robust against noisy terms, 3) transferable over time, and 4) interpretable by human inspection. Inspired by the early work of Luhn, we propose significant words language models of a set of documents that capture all, and only, the significant shared terms from them. We adjust the weights of common terms that are already well explained by the document collection as well as the weights of incidental rare terms that are only explained by specific documents, which eventually results in having only the significant terms left in the model.

【Keywords】: language model; significant words; swlm

213. Time-Quality Trade-offs in Search.

Paper Link】 【Pages】:1185

【Authors】: Ryan Burton

【Abstract】: In this paper, I propose a research agenda surrounding the notion of slow search, where retrieval speed may be traded for improvements in result quality. This time-quality trade-off leads to a number of implications in the areas of human-computer interaction and information retrieval algorithms, and I plan to explore this space along various dimensions, including adjustments in user behavior when exposed to new search paradigms, investigating the utility a user perceives when given the option to use slow search in addition to traditional search, and examining different notions of 'quality'. I have conducted preliminary studies to probe user behavior and attitudes towards a particular implementation of slow search, and how users' expected behaviors compare to their actual behaviors. I will provide an outline of these studies, and propose future work in this as well as related areas.

【Keywords】: interactive information retrieval; search behavior; slow search; user interfaces

214. Torii: Attribute-based Polarity Analysis with Big Datasets.

Paper Link】 【Pages】:1187

【Authors】: Fernando O. Gallego

【Abstract】: Polarity analysis has become a key aspect of market analysis. The number of companies that are interested in the general opinion of the crowd regarding the items that they sell is increasing every day. Attribute-based polarity analysis is a fine-grained approach that computes whether the opinion about an attribute of (a component of) an item is positive, negative, or neutral. The existing techniques have a number of problems, namely: they do not take into account the conditions expressed in the opinions (e.g., when they hold and when they do not), they do not generally use any contextual information (e.g., past user opinions on the same attribute), and they are not validated on big datasets (e.g., billions of messages). In this paper, we present Torii, which is an attribute-based polarity analysis technique that takes both conditions and contextual information into account; we also present our approach to validate it on big datasets.

【Keywords】: attribute conditions and contextual information; attribute-based polarity analysis; big datasets validation

215. User Interaction in Mobile Web Search.

Paper Link】 【Pages】:1189

【Authors】: Jaewon Kim

【Abstract】: From previous studies, we believe that search behaviour on touch-enabled mobile devices is different from the behaviour with desktop screens. In the proposed research, we intend to explore user interaction while searching with the aim of improving search experience on mobile devices.

【Keywords】: mobile device; scroll effect; user interaction; web search

Tutorials 12

216. Collaborative Information Seeking: Art and Science of Achieving 1+1>2 in IR.

Paper Link】 【Pages】:1191-1194

【Authors】: Chirag Shah

【Abstract】: Traditional IR techniques, systems, and methods that assume an individual searcher are often shown to be inadequate for addressing search problems that are multi-faceted and/or too complex or difficult for individuals. The next big leap in information seeking/retrieval could happen by considering social and collaborative aspects of search. In this half-day tutorial, this concept, along with some of the foundational works and latest developments in the field of collaborative information seeking (CIS) will be presented. Specifically, the course will introduce the student to theories, methodologies, and tools that focus on information retrieval/seeking in collaboration. The student will have an opportunity to learn about the social aspect of IR with a focus on collaborative search or CIS situations, systems, and evaluation techniques. The three hours will be divided as: (1) introduction to group-based IR models, approaches, and systems; (2) back-end of CIS systems with system-focused mediation and front-end with user-focused mediation; and (3) evaluation of CIS systems/approaches, prediction and recommendations with collaborative aspects of IR, and future directions. The attendees will be given a course-pack that will include a reference list, an annotated bibliography of seminal works in the field, and depictions of relevant models/frameworks.

【Keywords】: collaborative information seeking; collaborative ir; tutorial

217. Constructing and Mining Web-scale Knowledge Graphs.

Paper Link】 【Pages】:1195-1197

【Authors】: Evgeniy Gabrilovich ; Nicolas Usunier

【Abstract】: Recent years have witnessed a proliferation of large-scale knowledge graphs, from purely academic projects such as YAGO to major commercial projects such as Google's Knowledge Graph and Microsoft's Satori. Whereas there is a large body of research on mining homogeneous graphs, this new generation of information networks is highly heterogeneous, with thousands of entity and relation types and billions of instances of those types (graph vertices and edges). In this tutorial, we present the state of the art in constructing, mining, and growing knowledge graphs. The purpose of the tutorial is to equip newcomers to this exciting field with an understanding of the basic concepts, tools and methodologies, open research challenges, as well as pointers to available datasets and relevant literature. Knowledge graphs have become an enabling resource for a plethora of new knowledge-rich applications. Consequently, the tutorial will also discuss the role of knowledge bases in empowering a range of web applications, from web search to social networks to digital assistants. A publicly available knowledge base (Freebase) will be used throughout the tutorial to exemplify the different techniques.

【Keywords】: knowledge graphs

218. Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement.

Paper Link】 【Pages】:1199-1201

【Authors】: Thorsten Joachims ; Adith Swaminathan

【Abstract】: Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we get the same results as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting users to an online learning algorithm? This tutorial summarizes and unifies the emerging body of methods on counterfactual evaluation and learning. These counterfactual techniques provide a well-founded way to evaluate and optimize online metrics by exploiting logs of past user interactions. In particular, the tutorial unifies the causal inference, information retrieval, and machine learning view of this problem, providing the basis for future research in this emerging area of great potential impact. Supplementary material and resources are available online at http://www.cs.cornell.edu/~adith/CfactSIGIR2016.
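
As a rough illustration of evaluating a new policy from logs of past interactions, the sketch below uses inverse propensity scoring (IPS), one common counterfactual estimator; the abstract does not name a specific estimator, so this choice is an assumption. The log format, the policy function, and the toy data are hypothetical.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs is a list of (context, action, reward, logging_prob) tuples, where
    logging_prob is the probability with which the logging system chose the
    action. target_policy(context, action) returns the probability that the
    new policy would choose that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_policy(context, action) / logging_prob
    return total / len(logs)

# Hypothetical logs from a system that picked between "a" and "b" uniformly.
logs = [
    ("q1", "a", 1.0, 0.5),
    ("q1", "b", 0.0, 0.5),
    ("q2", "a", 0.0, 0.5),
    ("q2", "b", 1.0, 0.5),
]

def always_a(context, action):
    """Deterministic candidate policy that always shows "a"."""
    return 1.0 if action == "a" else 0.0

print(ips_estimate(logs, always_a))
```

The estimate relies on the logging system having given every action the new policy might take a non-zero probability; otherwise the reweighting is undefined for those actions.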

【Keywords】: batch learning from bandit feedback; causal inference; counterfactual estimation; learning to rank

219. Deep Learning for Information Retrieval.

Paper Link】 【Pages】:1203-1206

【Authors】: Hang Li ; Zhengdong Lu

【Abstract】: Recent years have seen significant progress in information retrieval and natural language processing, with deep learning technologies being successfully applied to almost all of their major tasks. The key to the success of deep learning is its capability of accurately learning distributed representations (vector representations or structured arrangements of them) of natural language expressions such as sentences, and effectively utilizing the representations in the tasks. This tutorial aims at summarizing and introducing the results of recent research on deep learning for information retrieval, in order to stimulate and foster more significant research and development work on the topic in the future. The tutorial mainly consists of three parts. In the first part, we introduce the fundamental techniques of deep learning for natural language processing and information retrieval, such as word embedding, recurrent neural networks, and convolutional neural networks. In the second part, we explain how deep learning, particularly representation learning techniques, can be utilized in fundamental NLP and IR problems, including matching, translation, classification, and structured prediction. In the third part, we describe in detail how deep learning can be used in specific application tasks. The tasks are search, question answering (from either documents, database, or knowledge base), and image retrieval.

【Keywords】: deep learning; image retrieval; information retrieval; question answering; search

220. From Design to Analysis: Conducting Controlled Laboratory Experiments with Users.

Paper Link】 【Pages】:1207-1210

【Authors】: Diane Kelly ; Anita Crescenzi

【Abstract】: This full-day tutorial provides general instruction about the design of controlled laboratory experiments that are conducted in order to better understand human information interaction and retrieval. Different data collection methods and procedures are described, with an emphasis on self-report measures and scales. This tutorial also introduces the use of statistical power analysis for sample size estimation and introduces and demonstrates two data analysis procedures, Multilevel Modeling and Structural Equation Modeling, that allow for examination of the whole set of variables present in interactive information retrieval (IIR) experiments, along with their various effect sizes. The goals of the tutorial are (1) to increase participants' understanding of the uses of controlled laboratory experiments with human participants; (2) to increase participants' understanding of the technical vocabulary and procedures associated with such experiments; and (3) to increase participants' confidence in conducting and evaluating IIR experiments. Ultimately, we hope our tutorial will increase research capacity and research quality in IR by providing instruction about best practices to those contemplating interactive IR experiments.

【Keywords】: data analysis; experimental design; interactive information retrieval; user studies

221. Instant Search: A Hands-on Tutorial.

Paper Link】 【Pages】:1211-1214

【Authors】: Ganesh Venkataraman ; Abhimanyu Lad ; Viet Ha-Thuc ; Dhruv Arya

【Abstract】: Instant search has become a common part of the search experience in most popular search engines and social networking websites. The goal is to provide instant feedback to the user in terms of query completions ("instant suggestions") or directly provide search results ("instant results") as the user is typing their query. The need for instant search has been further amplified by the proliferation of mobile devices and services like Siri and Google Now that aim to address the user's information need as quickly as possible. Examples of instant results include web queries like "weather san jose" (which directly provides the current temperature), social network queries like searching for someone's name on Facebook or LinkedIn (which directly provide the people matching the query). In each of these cases, instant search constitutes a superior user experience, as opposed to making the user complete their query before the system returns a list of results on the traditional search engine results page (SERP). We consider instant search experience to be a combination of instant results and instant suggestions, with the goal of satisfying the user's information need as quickly as possible with minimal effort on the part of the user. We first present the challenges involved in putting together an instant search solution at scale, followed by a survey of IR and NLP techniques that can be used to address them. We will also conduct a hands-on session aimed at putting together an end-to-end instant search system using open source tools and publicly available data sets. These tools include typeahead.js from Twitter for the frontend and Lucene/elasticsearch for the backend. We present techniques for prefix-based retrieval as well as injecting custom ranking functions into elasticsearch. For the search index, we will use the dataset made available by Stackoverflow. 
This tutorial is aimed at both researchers interested in knowing about retrieval techniques used for instant search as well as practitioners interested in deploying an instant search system at scale. The authors have worked extensively on building and scaling LinkedIn's instant search experience. To the best of our knowledge, this is the first tutorial that covers both theoretical and practical aspects of instant search.

【Keywords】: information retrieval; instant search; query understanding

222. Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial.

Paper Link】 【Pages】:1215-1218

【Authors】: Artem Grotov ; Maarten de Rijke

【Abstract】: During the past 10-15 years offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank for information retrieval have become apparent, there is increased attention for online learning to rank methods for information retrieval in the community. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front. Below we describe why we believe that the time is right for an intermediate-level tutorial on online learning to rank, the objectives of the proposed tutorial, its relevance, as well as more practical details, such as format, schedule and support materials.

【Keywords】: bandit algorithms; exploration vs. exploitation; online learning to rank

223. Question Answering with Knowledge Base, Web and Beyond.

Paper Link】 【Pages】:1219-1221

【Authors】: Wen-tau Yih ; Hao Ma

【Abstract】: In this tutorial, we give the audience a coherent overview of the research of question answering (QA). We first introduce a variety of QA problems proposed by pioneer researchers and briefly describe the early efforts. By contrasting with the current research trend in this domain, the audience can easily comprehend what technical problems remain challenging and what the main breakthroughs and opportunities are during the past half century. For the rest of the tutorial, we select three categories of the QA problems that have recently attracted a great deal of attention in the research community, and present the tasks with the latest technical survey. We conclude the tutorial by discussing the new opportunities and future directions of QA research.

【Keywords】: knowledge bases; question answering; web search

Paper Link】 【Pages】:1223-1226

【Authors】: Berkant Barla Cambazoglu ; Ricardo A. Baeza-Yates

【Abstract】: Commercial web search engines need to process thousands of queries every second and provide responses to user queries within a few hundred milliseconds. As a consequence of these tight performance constraints, search engines construct and maintain very large computing infrastructures for crawling the Web, indexing discovered pages, and processing user queries. The scalability and efficiency of these infrastructures require careful performance optimizations in every major component of the search engine. This tutorial aims to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. In particular, the tutorial provides an in-depth architectural overview of a web search engine, mainly focusing on the web crawling, indexing, and query processing components. The scalability and efficiency issues encountered in these components are presented at four different granularities: at the level of a single computer, a cluster of computers, a single data center, and a multi-center search engine. The tutorial also points out some open research problems and provides recommendations to researchers who are new to the field.

【Keywords】: crawling; efficiency; indexing; query processing; scalability; web search engines

Paper Link】 【Pages】:1227-1230

【Authors】: Leif Azzopardi

【Abstract】: Search is an inherently interactive, non-deterministic and user-dependent process. This means that there are many different possible sequences of interactions which could be taken (some ending in success and others ending in failure). Simulation provides a low cost, repeatable and reproducible way to explore a large range of different possibilities. This makes simulation very appealing, but it also requires care and consideration in developing, implementing and instantiating models of user behaviour for the purposes of experimentation. In this tutorial, we aim to provide researchers with an overview of simulation, detailing the various types of simulation, models of search behavior used to simulate interaction, along with an overview of the various models of querying, stopping, selecting and marking. Through the course of the tutorial we will describe various studies and how they have used simulation to explore different behaviours and aspects of the search process. The final section of the tutorial will be dedicated to "best practice" and how to build, ground and validate simulations. The tutorial will conclude with a demonstration of an open source simulation framework that can be used to develop various kinds of simulations.

【Keywords】: evaluation; information retrieval; performance; simulation

226. Succinct Data Structures in Information Retrieval: Theory and Practice.

Paper Link】 【Pages】:1231-1233

【Authors】: Simon Gog ; Rossano Venturini

【Abstract】: Succinct data structures are used today in many information retrieval applications, e.g., posting lists representation, language model representation, indexing (social) graphs, query auto-completion, document retrieval and indexing dictionary of strings, just to mention the most recent ones. These new kinds of data structures mimic the operations of their classical counterparts within a comparable time complexity but require much less space. With the availability of several libraries for basic succinct structures - like SDSL, Succinct, Facebook's Folly, and Sux - it is relatively easy to directly profit from advances in this field. In this tutorial we will introduce this field of research by presenting the most important succinct data structures to represent sets of integers, sets of points, trees, graphs and strings together with their most important applications to Information Retrieval problems. The introduction of the succinct data structures will be sustained with a practical session with programming handouts to solve. This will allow the attendees to directly experiment with implementations of these solutions on real datasets and understand the potential benefits they can bring to their own projects.

【Keywords】: data compression; indexing; succinct data structures

227. Temporal Information Retrieval.

Paper Link】 【Pages】:1235-1238

【Authors】: Nattiya Kanhabua ; Avishek Anand

【Abstract】: The study of temporal dynamics and its impact can be framed within the so-called temporal IR approaches, which explain how user behavior, document content and scale vary with time, and how we can use them in our favor in order to improve retrieval effectiveness. This half-day tutorial will outline research issues with respect to temporal dynamics, and provide a comprehensive overview of temporal IR approaches, essentially regarding processing dynamic content, temporal information extraction, temporal query analysis, and time-aware retrieval and ranking. The tutorial is structured into two sessions. During the first session, we will explain the general and wide aspects associated with temporal dynamics by focusing on the web domain, from content and structural changes to variations of user behavior and interactions. We will begin with temporal indexing and query processing. Next, we will explain current approaches to time-aware retrieval and ranking, which can be classified into different types based on two main notions of relevance with respect to time, namely, recency-based ranking, and time-dependent ranking. In the latter session, we will describe research issues centered on determining the temporal intent of queries, and time-aware query enhancement, e.g., temporal relevance feedback, and time-aware query reformulation. In addition, we present applications in related research areas, e.g., exploration, summarization, and clustering of search results, as well as future event retrieval and prediction. To this end, we conclude our tutorial and outline future directions. This tutorial targets graduate students, researchers and practitioners in the field of information retrieval. The goal is to provide an overview as well as an important context that enables further research on and practical applications within this area.

【Keywords】: adaptive crawling and caching; temporal indexing; temporal information extraction; temporal queries; time-aware ranking

Workshops 7

228. Third International Workshop on Gamification for Information Retrieval (GamifIR 2016).

Paper Link】 【Pages】:1239-1240

【Authors】: Michael Meder ; Frank Hopfgartner ; Gabriella Kazai ; Udo Kruschwitz

【Abstract】: Stronger engagement and greater participation are often crucial to reach a goal or to solve an issue, such as the emerging employee engagement crisis, insufficient knowledge sharing, and chronic procrastination. In many cases we need and search for tools to beat procrastination or to change people's habits. Gamification is the approach to learn from often fun, creative and engaging games. In principle, it is about understanding games and applying game design elements in non-gaming environments. This offers possibilities for wide area improvements, for example more accurate work, better retention rates and more cost effective solutions by relating motivations for participating as more intrinsic than conventional methods. In the context of Information Retrieval (IR) it is not hard to imagine that many tasks could benefit from gamification techniques. Besides several manual annotation tasks of data sets for IR research, user participation is important in order to gather implicit or even explicit feedback to feed the algorithms. Gamification, however, comes with its own challenges and its adoption in IR is still in its infancy. Given the enormous response to the first and second GamifIR workshops that were both co-located with ECIR, and the broad range of topics discussed, we now organized the third workshop at SIGIR 2016 to address a range of emerging challenges and opportunities.

【Keywords】: crowdsourcing; gamification; information retrieval; workshop

229. HIA 2016: The 2nd International Workshop on Heterogeneous Information Access at SIGIR 2016.

Paper Link】 【Pages】:1241

【Authors】: Ke Zhou ; Yiqun Liu ; Roger Jie Luo ; Joemon M. Jose

【Abstract】: Information access is becoming increasingly heterogeneous. Especially when the user's information need is for exploratory purposes, returning a set of diverse results from different resources could benefit the user. For example, when a user is planning a trip to China, retrieving and showing results from vertical search engines like travel, flight information, map and Q&A sites can satisfy the user's rich and diverse information need. This heterogeneous search paradigm is useful in many contexts and brings many new challenges.

【Keywords】: aggregated search; card recommendation; composite retrieval; federated search; heterogeneous

Paper Link】 【Pages】:1243

【Authors】: Steven Bedrick ; Lorraine Goeuriot ; Gareth J. F. Jones ; Anastasia Krithara ; Henning Müller ; George Paliouras

【Abstract】:

【Keywords】: medical information retrieval

231. Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval.

Paper Link】 【Pages】:1245-1246

【Authors】: Nick Craswell ; W. Bruce Croft ; Jiafeng Guo ; Bhaskar Mitra ; Maarten de Rijke

【Abstract】: In recent years, deep neural networks have yielded significant performance improvements on speech recognition and computer vision tasks, as well as led to exciting breakthroughs in novel application areas such as automatic voice translation, image captioning, and conversational agents. Despite demonstrating good performance on natural language processing (NLP) tasks (e.g., language modelling and machine translation), the performance of deep neural networks on information retrieval (IR) tasks has had relatively less scrutiny. Recent work in this area has mainly focused on word embeddings and neural models for short text similarity. The lack of many positive results in this area of information retrieval is partially due to the fact that IR tasks such as ranking are fundamentally different from NLP tasks, but also because the IR and neural network communities are only beginning to focus on the application of these techniques to core information retrieval problems. Given that deep learning has made such a big impact, first on speech processing and computer vision and now, increasingly, also on computational linguistics, it seems clear that deep learning will have a major impact on information retrieval and that this is an ideal time for a workshop in this area. Neu-IR (pronounced "new IR") will be a forum for new research relating to deep learning and other neural network based approaches to IR. The purpose is to provide an opportunity for people to present new work and early results, compare notes on neural network toolkits, share best practices, and discuss the main challenges facing this line of research.

【Keywords】: deep learning; information retrieval; neural networks

232. Privacy-Preserving IR 2016: Differential Privacy, Search, and Social Media.

Paper Link】 【Pages】:1247-1248

【Authors】: Grace Hui Yang ; Ian Soboroff ; Li Xiong ; Charles L. A. Clarke ; Simson L. Garfinkel

【Abstract】: Due to the lack of mature techniques in privacy-preserving information retrieval (IR), concerns about information privacy and security have become serious obstacles that prevent valuable user data from being used in IR research such as studies on query logs, social media, and medical record retrieval. In SIGIR 2014 and SIGIR 2015, we ran the privacy-preserving IR workshops exploring and understanding the privacy and security risks in information retrieval. This year, we continue the efforts of connecting the two disciplines of IR and privacy/security by organizing this workshop. We target three themes: differential privacy and IR dataset release, privacy in search and browsing, and privacy in social media. The workshop includes panels with researchers from both fields on these three themes, as well as invited industry speakers on real-world challenges. The goals of this workshop include (1) bringing together the two research fields, and (2) yielding fruitful collaborations.

【Keywords】: privacy-preserving information retrieval

Paper Link】 【Pages】:1249-1250

【Authors】: Jacek Gwizdka ; Preben Hansen ; Claudia Hauff ; Jiyin He ; Noriko Kando

【Abstract】: The "Search as Learning" (SAL) workshop is focused on an area within the information retrieval field that is only beginning to emerge: supporting users in their learning whilst interacting with information content.

【Keywords】: human information interaction; learning; search

234. SIGIR 2016 Workshop WebQA II: Web Question Answering Beyond Factoids.

Paper Link】 【Pages】:1251-1252

【Authors】: Alessandro Moschitti ; Lluís Màrquez ; Preslav Nakov ; Eugene Agichtein ; Charles L. A. Clarke ; Idan Szpektor

【Abstract】: Web search engines have made great progress at answering factoid queries. However, they are not well-tailored for managing more complex questions, especially when they require explanation and/or description. The WebQA workshop series aims at exploring diverse approaches to answering questions on the Web. This year, particular emphasis will be given to Community Question Answering (CQA), where comments by the users engaged in the forum communities can be used to answer new questions. Questions posted on the Web can be short and ambiguous (similarly to Web queries to a search engine). These issues make the WebQA task more challenging than traditional QA, and finding the most effective approaches for it remains an open problem. Unlike the more formal conference format, the aim of this workshop is to bring together researchers in diverse areas working on this problem, including those from NLP, IR, social media and recommender systems communities. This workshop is specifically designed for the SIGIR audience. However, due to its format, its goal, as compared to the main conference, is to conduct a more focused and open discussion, encouraging the presentation of work in progress and late-breaking initial results in Web Question Answering. Both academic and industrial participation will be solicited, including keynotes and invited speakers.

【Keywords】: answer aggregation from various sources; answer summarization from various sources; answering complex, multi-sentence questions; answering opinion questions, including sentiment analysis; evaluation of question answering systems; identifying question intent in web queries; inferring answers for web questions using knowledge bases or graphs; social and user generated question analysis; using collaboratively generated content for qa