Literature Review 2009 - 2010

Thursday, October 07, 2010

Diversity and Novelty

The first set of papers concern diversity and novelty in information retrieval. Search diversity is a new area of search that focuses on ensuring search results lists cover multiple interpretations of a search query within their highest ranked results.

Carbonell J. and Goldstein J., The use of MMR, diversity-based reranking for reordering documents and producing summaries, SIGIR, 1998 The classic that began (?) diversity research. Maximal marginal relevance (MMR) reduce query results redundancy and maintaining relevance in re-ranking and passage selection for summarization.

Clarke, C., et al., Novelty and Diversity in Information Retrieval Evaluation, ACM IR Conference, 2008, Framework for evaluation that rewards novelty (need to avoid redundancy) and diversity (need to resolve ambiguity).

Agrawal R., et al., Diversifying Search Results, WSDM, 2009 Diversifying search results by making explicit use of the topics the query or the documents may refer to.

Carterette B. and Chandar P., Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval, CIKM, 2009 Within a specific topic diversity is achieved by retrieving documents that cover different facets of that topic. Attempt to find a small set of documents that covers all the facets of a topic.

Wang J., Zhu J., Portfolio Theory of Information Retrieval, SIGIR 2009 Select top-n documents as a whole by balancing the overall relevance of the list against its risk (variance). [Note: is there ( if so, what is) the relation to bias variance decomposition?]

Kohlschutter, C. and Chirita, P. and Nejdl, W., Using Link Analysis to Identify Aspects in Faceted Web Search, SIGIR, 2006 Narrowing query with more keywords leads to a higher probability relevant documents filtered out. Address with faceted search. which divides results into orthogonal categories, works well w/annotated data.

Personalized Search

The following focus on personalized search including personalized clustering systems, personalized versions of PageRank, and various other forms of personalization. These are generally tangentially related to learning to rank.

Ferragina, P. and Gulli, A., A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering, WWW, 2005 Cluster snippets returned by search engine into hierarchy labeled to capture theme of the snippets.

Sia, K. and Zhu, S. and Chi, Y. and Hino, K. and Tseng, B., Capturing User Interests by Both Exploitation and Exploration, LNIAI, 2007 Build personal information manager by discovering potential interests through topic diversification, adapt to user interest drift.

Dou, Z. and Song, R. and Wen, J., A Large-scale Evaluation and Analysis of Personalized Search Strategies, WWW 2007 Personalized search has significant improvements on some search queries but little effect on others, it decrease accuracy in some situations. Click-based personalization consistently performs well, profile-based personalization is unstable, long and short-term contexts important to improving performance.

Micarelli, A. and Gasparetti, F. and Sciarrone, F. and Gauch, S., Personalized Search on the World Wide Web, The Adaptive Web, 2007 Overview of personalized search.

White, R. and Bailey, P. and Chen, L., Predicting User Interests from Contextual Information, SIGIR, 2009 How to use information from contextual sources in post-query navigation and general browsing to support search, user modeling, and to predict future behavior.

Sieg, A. and Mobasher, B. and Burke, R., Web Search Personalization with Ontological User Profiles, CIKM, 2007 Building context models for search personalization based on short-term query or context of current activity, semantic knowledge about the domain under investigation, user profile capturing long-term interests.

Carmel, D. et al., Personalized Social Search Based on the User’s Social Network, CIKM, 2009 Previous user actions don’t always provide a good indication of user needs and privacy concerns, benefits achievable through user interaction vary across queries, can be confusing to users if same query returns different results over time.

Stamou, S. and Ntoulas, A., Search Personalization through Query and Page Topical Analysis, Kluwer, 2008 Identifying interests of a user given no explicit feedback, retrieve results matching these interests.

Gleich, D. and Polito, M., Approximating Personalized PageRank with Minimal Use of Web Graph Data, Internet Mathematics, V.3, N.3, pp. 257-294 Computing personalized PageRank using personalized teleportation vector.

Other Topics in Information Retrieval

Kriewel S. and Fuhr N., Evaluation of an adaptive search suggestion system, ECIR, 2010 Develop system for situational search suggestions.

Bai, B., et al., Learning to Rank with (a Lot of) Word Features, Information Retrieval, 13:3, 2009 In LDA and even supervised LDA signal is at the the document level and is query independent, they learn a model for a query and target texts by considering features generated by all pairs of words between the two texts.

Chappelle O., Zhang Y., A Dynamic Bayesian Network Click Model for Web Search Ranking, WWW, 2009 Provide unbiased estimation of relevance from click logs, use predicted relevance as feature in a ranking function and as supplementary data to learn a ranking function/

Sanderson M., Ambiguous Queries: Test Collections Need More Sense, SIGIR, 2008 Study ambiguity in queries and words, study how test collections handle ambiguous queries.

H. Zeng, et al., Learning to Cluster Web Search Results, SIGIR, 2004 Cluster web search results and provide meaningful labels to clusters.

T. Zhou, et al., Solving the apparent diversity-accuracy dilemma of recommender systems, PNAS, 2010 Recommend items and take diversity and novelty into account.

V. Hollink and M. van Someren. Optimal link categorization for minimal retrieval effort, DIR, 2006 Categorizing search results has costs that can outweigh gains from categorization, a solution is to find a hierarchical structure minimizing retrieval time.

Qiu, F. and Cho, J., Automatic Identification of User Interest For Personalized Search, WWW, 2006 Queries are ambiguous, can’t determine user intention. This can be alleviated with personalized search but user’s don’t want to provide explicit feedback, a solution is automated personalization.

Wilson, M. and Schraefel, M. and White, R., Evaluating Advanced Search Interfaces using Established Information-Seeking Models, SIGIR, 2007 Richer search models developed to support users w/poorly defined goal or complex queries, unclear how design of rich system affects performance.

Hollink, V. and Tsikrika, T. and de Vries, A., Semantic vs term-based query modification analysis, DIR, 2010 Searchers often in iterative process of querying w/single information need, studying query modification patterns leads to insights into user behavior and improved interactive support, current methods don’t look at semantic relations.

Xiang, B. et al., Context-Aware Ranking in Web Search, SIGIR, 2010 Current ranking models insensitive to context and divorced from ranking model