Welcome and Opening Remarks

Keynote Speech
(Session Chair: Sihem Amer-Yahia)
Humane Data Mining
Rakesh Agrawal
(Microsoft Research, USA)

Poster Booster

Coffee Break and posters

Session 1


DB: Faceted Search, Web Query Results Presentation
(Session Chair: Jimeng Sun)

Dynamic Faceted Search for Discovery-Driven Analysis
Debabrata Dash, Jun Rao, Nimrod Megiddo, Anastasia Ailamaki, Guy Lohman
(Carnegie Mellon University and IBM Almaden Research Center, USA) (863)

Minimum Effort Driven Dynamic Faceted Search in Structured Databases
Senjuti Basu Roy, Haidong Wang, Gautam Das, Ullas Nambiar, Mukesh Mohania
(University of Texas at Arlington, USA) (756)

A Language for Manipulating Clustered Web Documents Results
Gloria Bordogna, Alessandro Campi, Giuseppe Psaila, Stefania Ronchi
(CNR IDPA and University of Bergamo, Italy) (570)

Integrating Web Query Results: Holistic Schema Matching
Shui-Lung Chuang, Kevin Chang
(University of Illinois at Urbana-Champaign, USA) (936)

IR: Web Search 1
(Session Chair: Susan Dumais)

How does Clickthrough Data Reflect Retrieval Quality?
Filip Radlinski, Madhu Kurup, Thorsten Joachims
(Cornell University, USA) (332)

Efficient and Effective Link Analysis with Precomputed SALSA Maps
Marc Najork, Nick Craswell
(Microsoft Research Cambridge, UK) (794)

Achieving both High Precision and High Recall in Near-Duplicate Detection
Lianen Huang, Lei Wang, Xiaoming Li
(Peking University, China) (63)

Are Clickthrough Data Adequate for Learning Web Search Rankings?
Zhicheng Dou, Ruihua Song, Xiaojie Yuan, Ji-Rong Wen
(Nankai University, Shanghai Jiao Tong University and Microsoft Research Asia, China) (396)

KM: Classification
(Session Chair: Iadh Ounis)

Error-Driven Generalist+Experts (EDGE): A Multi-stage Ensemble Framework for Text Categorization
Jian Huang, Omid Madani, C. Lee Giles
(Pennsylvania State University, USA) (167)

A Sparse Gaussian Processes Classification Framework for Fast Tag Suggestions
Yang Song, Lu Zhang, C. Lee Giles
(Pennsylvania State University, USA) (15)

Transfer Learning From Multiple Source Domains via Consensus Regularization
Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He
(Chinese Academy of Sciences, China) (441)

Classifying Networked Entities with Modularity Kernels
Dell Zhang, Robert Mao
(University of London, UK) (146)

Industry Research Track (Session Chair: Nazli Goharian)

Web-Scale Named Entity Recognition
Casey Whitelaw, Alex Kehlenbeck, Nemanja Petrovic, Lyle Ungar
(University of Pennsylvania, USA) (531)

Semi-Automated Logging of Contact Center Telephone Calls
Roy Byrd, Mary Neff, Wilfried Teiken, Youngja Park, Keh-Shin F. Cheng, Stephen Gates, Karthik Visweswariah
(IBM T.J. Watson Research Center, USA) (126)

MedSearch: A Specialized Search Engine for Medical Information
Gang Luo, Chunqiang Tang, Hao Yang, Xin Wei
(IBM T.J. Watson Research Center, USA) (65)

An Empirical Study of Required Dimensionality for Large-scale Latent Semantic Indexing Applications
Roger Bradford
(Agilex Technologies Inc., USA) (370)

Lunch Break and posters

Session 2


DB: Efficient Maintenance and Query Optimization
(Session Chair: Luke Huan)

Content-Based Filtering for Efficient Online Materialized View Maintenance
Gang Luo, Philip Yu
(IBM T.J. Watson Research Center, USA) (119)

A Step towards Incremental Maintenance of Composed Schema Mappings
Qian Gang
(Nanjing University of Finance & Economics, China) (977)

Modeling and Exploiting Query Interactions in Database Systems
Mumtaz Ahmad, Ashraf Aboulnaga, Shivnath Babu, Kamesh Munagala
(University of Waterloo, Canada and Duke University, USA) (892)

A Novel Optimization Approach to Efficiently Process Aggregate Similarity Queries in Metric Access Methods
Humberto Razente, Maria Camila Barioni, Agma Traina, Christos Faloutsos, Caetano Traina
(University of Sao Paulo at Sao Carlos, Brazil) (776)

IR: Social Search
(Session Chair: Neel Sundaresan)

Can All Tags Be Used for Search?
Kerstin Bischoff, Claudiu Firan, Wolfgang Nejdl, Raluca Paiu
(University of Hannover, Germany) (504)

Comparing Citation Contexts for Information Retrieval
Anna Ritchie, Simone Teufel, Stephen Robertson
(University of Cambridge, UK) (116)

Social Tags: Meanings and Suggestions
Fabian Suchanek, Milan Vojnovic, Dinan Gunawardena
(Max-Planck-Institute for Computer Science, Germany) (617)

Mining Social Networks Using Heat Diffusion Processes for Marketing Candidates Selection
Hao Ma, Haixuan Yang, Michael R. Lyu, Irwin King
(The Chinese University of Hong Kong, China) (894)

IR/KM: Machine Learning
(Session Chair: Rong Jin)

Exploiting Temporal Contexts in Text Classification
Leonardo Rocha, Fernando Mourao, Adriano Pereira, Marcos Goncalves, Wagner Meira
(Federal University of Minas Gerais, Brazil) (345)

Kernel Methods, Syntax and Semantics for Relational Text Categorization
Alessandro Moschitti
(University of Trento, Italy) (704)

BNS Feature Scaling: An Improved Representation over TF-IDF for SVM Text Classification
George Forman
(HP Labs, USA) (105)

Learning a Two-Stage SVM/CRF Sequence Classifier
Guilherme Hoefel, Charles Elkan
(University of California San Diego, USA) (349)

KM: Link and Graph Mining
(Session Chair: Christos Faloutsos)

Local Approximation of PageRank and Reverse PageRank
Li-Tal Mashiach, Ziv Bar-Yossef
(Technion, Israel) (30)

Link Privacy in Social Networks
Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar, Ying Xu
(Stanford University, USA) (883)

On Effective Presentation of Graph Patterns: A Structural Representative Approach
Chen Chen, Cindy Lin, Xifeng Yan, Jiawei Han
(University of Illinois at Urbana-Champaign, USA) (1084)

Characterizing and Predicting Community Members from Evolutionary and Heterogeneous Networks
Sourav S. Bhowmick, Qiankun Zhao, Xin Zheng, Kai Yi
(Nanyang Technological University, Singapore) (184)

KM: Information Filtering
(Session Chair: Lyle Ungar)

An Algorithm to Determine Peer-Reviewers
Marko Rodriguez, Johan Bollen
(Los Alamos National Laboratory, USA) (77)

Spam Characterization and Detection in Peer-to-Peer File-Sharing Systems
Dongmei Jia, Wai Gen Yee, Ophir Frieder
(Illinois Institute of Technology, USA) (941)

Predicting Web Spam with HTTP Session Information
Steve Webb, James Caverlee, Calton Pu
(Georgia Institute of Technology and Texas A&M University, USA) (777)

Inferring Semantic Query Relations from Collective User Behavior
Nish Parikh, Neel Sundaresan
(eBay Research Labs, USA) (703)

Coffee Break and posters

Session 3

DB: Stream Processing
(Session Chair: Ashraf Aboulnaga)

Anomaly-Free Incremental Output in Stream Processing
George Mihaila, Ioana Roxana Stanoi, Christian Lang
(IBM T.J. Watson Research Center and IBM Almaden Research Center, USA) (392)

SNIF TOOL: Sniffing for Patterns in Continuous Streams
Abhishek Mukherji, Elke Rundensteiner, David Brown, Venkatesh Raghavan
(Worcester Polytechnic Institute, USA) (773)

Real-Time New Event Detection for Video Streams
Gang Luo, Rong Yan, Philip Yu
(IBM T.J. Watson Research Center, USA) (64)

Linear Time Membership in a Class of Regular Expressions with Interleaving and Counting
Giorgio Ghelli, Dario Colazzo, Carlo Sartiani
(Universita di Pisa, Italy) (468)

IR: Theory
(Session Chair: Michael Lyu)

Generalized Inverse Document Frequency
Donald Metzler
(Yahoo! Research, USA) (131)

TinyLex: Static N-Gram Index Pruning with Perfect Recall
Derrick Coetzee
(Microsoft Research, USA) (840)

Revisiting the Relationship between Document Length and Relevance
David Losada, Leif Azzopardi, Mark Baillie
(University of Santiago de Compostela, Spain, University of Glasgow and University of Strathclyde, UK) (285)

Relating Dependent Indexes using Dempster-Shafer Theory
Lixin Shi, Jian-Yun Nie, Guihong Cao
(University of Montreal, Canada) (1094)

IR: Query Analysis
(Session Chair: Diane Kelly)

Improved Query Difficulty Prediction for the Web
Claudia Hauff, Vanessa Murdock, Ricardo Baeza-Yates
(University of Twente, The Netherlands and Yahoo! Research Barcelona, Spain) (61)

Understanding the Relationship between Searchers' Queries and Information Goals
Doug Downey, Dan Liebling, Susan Dumais
(University of Washington and Microsoft Research, USA) (878)

Active Relevance Feedback for Difficult Queries
Zuobing Xu, Ram Akella
(University of California Santa Cruz, USA) (204)

Query Suggestion Using Hitting Time
Qiaozhu Mei, Dengyong Zhou, Kenneth Church
(University of Illinois at Urbana-Champaign, USA) (921)

KM: Web Mining
(Session Chair: Jiang-Ming Yang)

Mining Term Association Patterns from Search Logs for Effective Query Reformulation
Xuanhui Wang, ChengXiang Zhai
(University of Illinois at Urbana-Champaign, USA) (178)

Non-Local Evidence for Expert Finding
Krisztian Balog, Maarten de Rijke
(University of Amsterdam, The Netherlands) (865)

Discovering Leaders from Community Actions
Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan
(University of British Columbia, Canada and Yahoo! Research Barcelona, Spain) (711)

Learning to Link with Wikipedia
David Milne, Ian H. Witten
(University of Waikato, New Zealand) (1046)



Privacy-Preserving Data Publishing for Horizontally Partitioned Databases
Pawel Jurczyk, Li Xiong
(Emory University, USA) (93)

CE2--Towards a Large Scale Hybrid Search Engine with Integrated Ranking Support
Haofen Wang, Thanh Tran
(Shanghai Jiao Tong University, China and University of Karlsruhe, Germany) (143)

Scaling up Duplicate Detection in Graph Data
Melanie Herschel, Felix Naumann
(Hasso-Plattner-Institut, Germany) (169)

ROAD: An Efficient Framework for Location Dependent Spatial Queries on Road Networks
Ken C. K. Lee, Wang-Chien Lee, Baihua Zheng
(Pennsylvania State University, USA and Singapore Management University, Singapore) (182)

View and Index Selection for Query-Performance Improvement: Quality-Centered Algorithms and Heuristics
Maxim Kormilitsin, Rada Chirkova, Yahya Fathi, Matthias Stallmann
(North Carolina State University, USA) (302)

SQL Extension for Exploring Multiple Tables
Sung Jin Kim, Junghoo (John) Cho
(University of California, Los Angeles, USA) (347)

PBFilter: Indexing Flash-Resident Data through Partitioned Summaries
Shaoyi Yin, Philippe Pucheral, Xiaofeng Meng
(INRIA, France and PRiSM & Renmin University of China, China) (486)


Transaction Reordering with Application to Synchronized Scans
Gang Luo, Jeffrey Naughton, Curt Ellmann, Michael Watzke
(IBM T.J. Watson Research Center, USA) (66)

Yizkor Books: A Voice for the Silent Past
Jason Soo, Rebecca Cathey, Ophir Frieder, Michlean Amir, Gideon Frieder
(Illinois Institute of Technology and BAE Systems, USA) (718)


An Approximate String Matching Approach for Handling Incorrectly Typed URLs
Mihai Stroe, Radu Berinde, Cosmin Negruseri, Dan Popovici
(Google Switzerland GmbH, Switzerland) (60)

Speed up Semantic Search in P2P Networks
Wang Qiang, Rui Li, Lei Chen, Jie Lian, Tamer Ozsu
(University of Waterloo, Canada and HKUST, China) (69)

A Note on Search Based Forecasting of Ad Volume in Contextual Advertising
Xuerui Wang, Andrei Broder, Marcus Fontoura, Vanja Josifovski
(University of Massachusetts, Amherst, Yahoo! Research, USA and PUC-Rio, Brazil) (103)

An Extension of PLSA for Document Clustering
Young-Min Kim, Jean-Francois Pessiot, Massih Amini, Patrick Gallinari
(Pierre and Marie Curie University, France) (221)

Online Spam-Blog Detection Through Blog Search
Linhong Zhu, Aixin Sun, Byron Choi
(Nanyang Technological University, Singapore) (245)

Nested Region Algebra Extended with Variables for Tag-Annotated Text Search
Katsuya Masuda, Junichi Tsujii
(University of Tokyo, Japan and University of Manchester, UK) (403)

Searching the Wikipedia with Contextual Information
Antti Ukkonen, Carlos Castillo, Debora Donato, Aristides Gionis
(Yahoo! Research Barcelona, Spain) (481)

Winnowing-Based Text Clustering
Javier Parapar, Alvaro Barreiro
(University of A Coruna, Spain) (496)

Using Sequence Classification for Filtering Web Pages
Binyamin Rosenfeld, Ronen Feldman, Lyle Ungar
(Hebrew University, Israel and University of Pennsylvania, USA) (533)

Passage Relevance Models for Genomics Search
Jay Urbain, Ophir Frieder, Nazli Goharian
(Illinois Institute of Technology, USA) (587)

Cross-Document Cross-Lingual Coreference Retrieval
Elif Aktolga, Marc-Allen Cartright, James Allan
(University of Massachusetts, Amherst, USA) (708)

Siphon++: A Hidden-Web Crawler for Keyword-Based Interfaces
Karane Vieira, Luciano Barbosa, Juliana Freire, Altigran Silva
(University of Utah, USA and Federal University of Amazonas, Brazil) (761)

Investigating External Corpus and Clickthrough Statistics for Query Expansion in the Legal Domain
Tonya Custis, Khalid Al-Kofahi
(Thomson Reuters, USA) (785)

Corpus Microsurgery: Criteria Optimization for Medical Cross-Language IR
Monica Rogati, Yiming Yang, Jaime Carbonell
(LinkedIn and Carnegie Mellon University, USA) (990)

Metadata Extraction and Indexing for Map Search in Web Documents
Tan Qingzhao, Prasenjit Mitra, C. Lee Giles
(Pennsylvania State University, USA) (1002)


An Integration Strategy for Mining Product Features and Opinions
Qingliang Miao, Qiudan Li, Ruwei Dai
(CAS, China) (205)

Overlapping Community Structure Detection in Networks
Nan Du, Bin Wu, Bai Wang
(Beijing University of Posts and Telecom, China) (218)

Coreference Resolution using Expressive Logic Models
Ki Chan, Wai Lam, Xiaofeng Yu
(The Chinese University of Hong Kong, China) (237)

A Method to Predict Social Annotations
Ming-Hung Hsu, Hsin-Hsi Chen
(National Taiwan University, Taiwan) (258)

Large Maximal Cliques Enumeration in Sparse Graphs
Natwar Modani, Kuntal Dey
(IBM India Research Lab, India) (263)

Summarization of Social Activity over Time: People, Actions and Concepts in Dynamic Networks
Yu-Ru Lin, Hari Sundaram, Aisling Kelliher
(Arizona State University, USA) (344)

Using Tag Semantic Network for Keyphrase Extraction in Blogs
Lizhen Qu, Iryna Gurevych, Christof Müller
(Technische Universitat Darmstadt, Germany) (461)

Handling Implicit Geographic Evidence for Geographic IR
Nuno Cardoso, Mario J. Gaspar da Silva, Diana Santos
(University of Lisbon, Portugal) (522)

Estimating Real-valued Characteristics of Criminals from their Recorded Crimes
Richard Bache, Fabio Crestani
(University of Strathclyde, UK and University of Lugano, Switzerland) (584)

Representative Entry Selection for Profiling Blogs
Jinfeng Zhuang, Steven C.H. Hoi, Aixin Sun, Rong Jin, Maxim Kormilitsin
(Nanyang Technological University, Singapore and Michigan State University, USA) (665)

Efficient Web Matrix Processing based on Dual Reordering
Hsu Chih-Ming, Ming-Syan Chen
(National Taiwan University, Taiwan) (808)

CoreEx: Heuristic Content Extraction from Online News Articles
Jyotika Prasad, Andreas Paepcke
(Stanford University, USA) (832)

A Novel Email Abstraction Scheme for Spam Detection
Chi-Yao Tseng, Ming-Syan Chen, Pin-Chieh Sung
(National Taiwan University, Taiwan) (801)

Tag-Based Filtering for Personalized Bookmark Recommendations
Pavan Kumar Vatturi, Werner Geyer, Casey Dugan, Michael Muller, Beth Brownholtz
(Oregon State University, USA) (811)

Closing the Loop in Webpage Understanding
Chunyu Yang, Yong Cao, Zaiqing Nie, Jie Zhou, Ji-Rong Wen
(Microsoft Research Asia, China) (1017)

Keynote Speech
(Session Chair: Alek Kolcz)
Markov Logic: A Unifying Language for Information and Knowledge Management
Pedro Domingos
(University of Washington, USA)

Poster Booster

Coffee Break and posters

Session 4


DB / Industry: XML Data Integration and XML Query Optimization
(Session Chair: Caetano Traina)

Rewriting of Visibly Pushdown Languages for XML Data Integration
Alex Thomo, Venkatesh Srinivasan
(University of Victoria, Canada) (693)

Some Rewrite Optimizations of XQuery Navigation in DB2
Jarek Gryz, Guangjun Xie, Qi Cheng, Calisto Zuzarte
(York University, Canada) (315)

Pruning Nested XQuery Queries
Billel Gueni, Talel Abdessalem, Bogdan Cautis, Emmanuel Waller
(LTCI - Telecom ParisTech and Universite de Paris-Sud, France) (231)

Heuristic Approaches for Checking Containment of Generalized Tree-Pattern Queries
Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis
(New Jersey Institute of Technology, USA) (979)

IR: Evaluation
(Session Chair: Ian Soboroff)

Retrievability: An Evaluation Measure for Higher Order Information Access Tasks
Leif Azzopardi, Vishwa Vinay
(University of Glasgow, UK and Microsoft Research Labs Cambridge, UK) (120)

Statistical Power in Retrieval Experimentation
William Webber, Alistair Moffat, Justin Zobel
(University of Melbourne, Australia) (946)

Comparing Metrics across TREC and NTCIR: The Robustness to System Bias
Tetsuya Sakai
(NewsWatch, Inc., Japan) (40)

How Evaluator Domain Expertise Affects Search Result Relevance Judgements
Kenneth Kinney, Scott Huffman, Juting Zhai
(Google, Inc., USA) (663)

KM: Statistical Techniques
(Session Chair: Omid Madani)

Clustered Subset Selection and its Applications on IT Service Metrics
Christos Boutsidis, Jimeng Sun, Nikos Anerousis
(IBM T.J. Watson Research and Carnegie Mellon University, USA) (805)

The Query-Flow Graph: Model and Applications
Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis, Sebastiano Vigna
(Yahoo! Research Barcelona, Spain) (701)

A Framework for Estimating Complex Probability Density Structures in Data Stream
Arnold Boedihardjo, Chang-Tien Lu, Chen Feng
(Virginia Tech, USA) (100)

Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles
Pinar Donmez, Jaime Carbonell
(Carnegie Mellon University, USA) (613)

Panel Discussion: E-Discovery (Chair: David A. Evans)

Why E-Discovery is a CIKM-Hard Problem
David A. Evans, President, CEO, and Chief Scientist
(JustSystems Evans Research, USA)

Panning for Gold in E-Discovery: What Every Information Scientist Should Know About the Way Lawyers Search for Electronic Evidence
Jason R. Baron, Director of Litigation
(National Archives and Records Administration, USA)

IR Perspectives on the E-Discovery Problem
Chris Buckley, President
(Sabir Research, USA)

Technical Case Studies from the E-Discovery Front Lines
Robert S. Bauer, Chief Technology Officer
(H5, USA)

Lunch Break and posters

Session 5


DB: Indexing and Physical Query Optimization
(Session Chair: Agma Traina)

Exploiting Pipeline Interruptions for Efficient Memory Allocation
Josep Aguilar-Saborit, Mohammad Jalali, Dave Sharpe, Victor Muntes-Mulero
(IBM, USA and Universitat Politecnica de Catalunya, Spain) (303)

A New Method for Indexing Genomes Using On-Disk Suffix Trees
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upton
(University of Victoria, Canada) (857)

Supporting Sub-Document Updates and Queries in an Inverted Index
Vuk Ercegovac, Vanja Josifovski, Ning Li, Mauricio Mediano, Eugene Shekita
(IBM Almaden Research Center and Yahoo! Research, USA) (163)

Modeling LSH for Performance Tuning
Wei Dong, Zhe Wang, William Josephson, Moses Charikar, Kai Li
(Princeton University, USA) (908)

IR: Web Search 2
(Session Chair: Thorsten Joachims)

Can Phrase Indexing Help to Process Non-Phrase Queries?
Mingjie Zhu, Shuming Shi
(University of Science and Technology of China and Microsoft Research Asia, China) (996)

Matching Task Profiles and User Needs in Personalized Web Search
Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum
(Max-Planck Institute for Informatics, Germany) (583)

Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs
Rosie Jones, Kristina Klinkner
(Yahoo! Research, USA) (838)

Learning Latent Semantic Relations from Clickthrough Data for Query Suggestion
Hao Ma, Haixuan Yang, Irwin King, Michael R. Lyu
(The Chinese University of Hong Kong, China) (912)

IR: Multilingual & Multimedia
(Session Chair: Jian-Yun Nie)

Simultaneous Multilingual Search for Translingual Information Retrieval
Kristen Parton, Kathleen McKeown, James Allan, Enrique Henestroza
(Columbia University and University of Massachusetts Amherst, USA) (919)

Translation Enhancement: A New Relevance Feedback Method for Cross-Language Information Retrieval
Daqing He, Dan Wu
(University of Pittsburgh, USA) (585)

High-Dimensional Descriptor Indexing for Large Multimedia Databases
Eduardo Valle, Matthieu Cord, Sylvie Philipp-Foliguet
(ETIS, France) (595)

On Low Dimensional Random Projections and Similarity Search
Yu-En Lu, Pietro Lio, Steven Hand
(University of Cambridge, UK) (681)

KM: Data Mining
(Session Chair: Raj Bhatnagar)

Fast Mining of Complex Time-Stamped Events
Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos
(Carnegie Mellon University, USA) (109)

Predicting Individual Disease Risk Based on Medical History
Darcy Davis, Nitesh Chawla, Nicholas Christakis, Nicholas Blumm, Laszlo Barabasi
(University of Notre Dame, USA) (917)

Identification of Gene Function Using Prediction by Partial Matching (PPM) Language Models
Malika Mahoui, W. Tehan, Arvind Kumar Thirumalaiswamy Sekhar, S. Chilukuri
(IUPUI, Bangor University and Dow AgroSciences, India) (735)

Fast Correlation Analysis on Time Series Datasets
Philon Nguyen, Nematollaah Shiri
(Concordia University, Canada) (1073)

KM: Semantic Techniques
(Session Chair: Marko Rodriguez)

Wildcards for Lightweight Information Integration in Virtual Desktops
Rodolfo Stecher, Claudia Niederee, Wolfgang Nejdl
(University of Hannover, Germany) (1060)

Finding Informative Commonalities in Concept Collections
Simona Colucci, Eugenio Di Sciascio, Francesco Donini, Eufemia Tinelli
(D.O.O.M. s.r.l., Italy) (685)

Association Thesaurus Construction Methods based on Link Co-occurrence Analysis for Wikipedia
Masahiro Ito, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
(Osaka University, Japan) (928)

Peer Production of Structured Knowledge - An Empirical Study of Ratings and Incentive Mechanisms
Christian Huetter, Conny Kuehne
(University of Karlsruhe, Germany) (714)

Coffee Break and posters

Session 6

DB: Security and Privacy
(Session Chair: Ioana Stanoi)

Efficient Techniques for Document Sanitization
Venkatesan Chakaravarthy, Himanshu Gupta, Prasan Roy, Mukesh Mohania
(IBM India Research Lab, India) (225)

Vanity Fair: Privacy in Querylog Bundles
Rosie Jones, Ravi Kumar, Bo Pang, Andrew Tomkins
(Yahoo! Research, USA) (851)

Dual Encryption for Query Integrity Assurance
Haixun Wang, Jian Yin, Chang-Shing Perng, Philip Yu
(IBM T.J. Watson Research Center, USA) (1044)

Records Retention in Relational Database Systems
Ahmed Ataullah, Frank Tompa, Ashraf Aboulnaga
(University of Waterloo, Canada) (443)

IR: Medley
(Session Chair: Jimmy Huang)

Joke Retrieval
Lisa Friedland, James Allan
(University of Massachusetts Amherst, USA) (952)

Ranked Feature Fusion Models for Ad Hoc Retrieval
Jeremy Pickens, Gene Golovchinsky
(FX Palo Alto Lab, Inc., USA) (108)

AdaSum: An Adaptive Model for Summarization
Zhang Jin
(Chinese Academy of Sciences, China) (368)

Modeling Hidden Topics on Document Manifold
Cai Deng
(University of Illinois at Urbana Champaign, USA) (812)

IR: Recommender Systems
(Session Chair: Irwin King)

Tapping on the Potential of Q&A Community by Recommending Answer Providers
Jinwen Guo, Shengliang Xu, Shenghua Bao, Yong Yu
(Shanghai Jiao Tong University, China) (133)

SoRec: Social Recommendation Using Probabilistic Matrix Factorization
Hao Ma, Haixuan Yang, Michael R. Lyu, Irwin King
(The Chinese University of Hong Kong, China) (885)

Probabilistic Polyadic Factorization and Its Application to Personalized Recommendation
Yun Chi, Shenghuo Zhu, Yihong Gong, Yi Zhang
(NEC Laboratories America and University of California Santa Cruz, USA) (983)

A Random Walk on the Red Carpet: Rating Movies with User Reviews and PageRank
Derry Wijaya, Stephane Bressan
(National University of Singapore, Singapore) (227)

KM: Feature Selection
(Session Chair: Monica Rogati)

REDUS: Finding Reducible Subspaces in High Dimensional Data
Xiang Zhang, Feng Pan, Wei Wang
(University of North Carolina at Chapel Hill, USA) (297)

Mining Influential Attributes that Capture Class and Group Contrast Behaviour
Elsa Loekito, James Bailey
(University of Melbourne, Australia) (427)

Real-Time Data Pre-Processing Technique for Efficient Feature Extraction in Large Scale Data
Ying Liu, Lucian Vlad Lita, Radu Stefan Niculescu, Kun Bai, Prasenjit Mitra, C. Lee Giles
(Penn State University & Siemens, USA) (671)

Structure Feature Selection for Graph Classification
Hongliang Fei, Jun Huan
(University of Kansas, USA) (325)

Panel Discussion 2: The Social (Open) Workspace (Chair: David A. Evans)

Automating Knowledge
Susan Feldman, Research Vice President
(Search and Digital Marketplace Technologies, IDC, USA)

How to Augment Social Cognition
Ed H. Chi, Area Manager and Senior Research Scientist
(Palo Alto Research Center (PARC), USA)

Trust and Design in Social Systems
Natasa Milic-Frayling, Principal Researcher and Director of Research Partnership Programme
(Microsoft Research Cambridge, UK)

Using Social Networks for Social Work
Igor Perisic, Director of Search
(LinkedIn, USA)



Efficient Processing of Probabilistic Spatio-Temporal Range Queries over Moving Objects
Bruce Chung, Wang-Chien Lee, Arbee Chen
(Pennsylvania State University, USA) (576)

Data Degradation: Making Private Data Less Sensitive Over Time
Nicolas Anciaux, Luc Bouganim, Harold J. W. van Heerde, Philippe Pucheral, Peter M. G. Apers
(INRIA, France and University of Twente, The Netherlands) (604)

A Light Weighted Damage Tracking Quarantine and Recovery Scheme for Mission-Critical Database Systems
Kun Bai
(Pennsylvania State University, USA) (673)

Query Optimization in XML-Based Information Integration
Dongfeng Chen, Rada Chirkova, Maxim Kormilitsin, Fereidoon Sadri, Timo Salo
(North Carolina State University and University of North Carolina Greensboro, USA) (679)

Estimating the Number of Answers with Guarantees for Structured Queries in P2P Databases
Marcel Karnstedt, Kai-Uwe Sattler, Michael Hass, Manfred Hauswirth, Brahmananda Sapkota, Roman Schmidt
(TU Ilmenau, Germany) (722)

Evaluating Partial Tree-Pattern Queries on XML Streams
Xiaoying Wu, Dimitri Theodoratos
(New Jersey Institute of Technology, USA) (790)

Characterization of TPC-H Queries for a Column-Oriented Database on a Dual-Core AMD Athlon Processor
Pranav Vaidya, Jaehwan John Lee
(Indiana University-Purdue University Indianapolis, USA) (796)


Natural Language Retrieval of Grocery Products
Petteri Nurmi, Eemil Lagerspetz, Wray Buntine, Patrik Floreen, Joonas Kukkonen, Peter Peltonen
(Helsinki Institute for Information Technology, Finland) (90)

Improve the Effectiveness of the Opinion Retrieval and Opinion Polarity Classification
Wei Zhang, Lifeng Jia, Clement Yu, Weiyi Meng
(University of Illinois at Chicago and SUNY Binghamton, USA) (95)

A Latent Variable Model for Query Expansion Using the Hidden Markov Model
Qiang Huang, Dawei Song
(The Open University, UK) (286)

A Survey of Pre-Retrieval Query Performance Predictors
Claudia Hauff, Djoerd Hiemstra, Franciska de Jong
(University of Twente, The Netherlands) (291)

Modeling Document Features for Expert Finding
Jianhan Zhu, Dawei Song, Stefan Rueger, Jimmy Huang
(University College London, The Open University, Imperial College London, UK and York University, Canada) (339)

Mining Named Entity Transliteration Equivalents from Comparable Corpora
Raghavendra Udupa, Saravanan K. Kumaran, A. Jagadeesh Jagarlamudi
(Microsoft Research, India) (435)

Estimating Retrieval Effectiveness using Rank Distributions
Vishwa Vinay, Ingemar Cox, Natasa Milic-Frayling
(Microsoft Research Labs, Cambridge, UK) (478)

Semi-supervised Ranking Aggregation
Shouchun Chen, Fei Wang, Yangqiu Song, Changshui Zhang
(Tsinghua University, China) (544)

Ranking in Folksonomy Systems: Can Context Help?
Fabian Abel, Nicola Henze, Daniel Krause
(Leibniz University, Germany) (548)

Evaluating Topic Models for Information Retrieval
Xing Yi, James Allan
(University of Massachusetts Amherst, USA) (560)

A Novel Statistical Chinese Language Model and Its Application in Pinyin-to-Character Conversion
Bo Lin, Jun Zhang
(NTU, Singapore) (716)

Integrating Clustering and Multi-Document Summarization to Improve Document Understanding
Dingding Wang, Shenghuo Zhu, Yun Chi, Tao Li
(Florida International University and NEC Laboratories America, USA) (826)

Answering General Time Sensitive Queries
Wisam Dakka, Luis Gravano, Panagiotis Ipeirotis
(Columbia University and New York University, USA) (837)

Search-based Query Suggestion
Jiang-Ming Yang, Rui Cai, Feng Jing, Shuo Wang, Lei Zhang, Wei-Ying Ma
(Microsoft Research Asia, China) (966)

Entity-Based Query Reformulation Using Wikipedia
Yang Xu, Fan Ding, Bin Wang
(Institute of Computing Technology, China) (1093)


Group-based Learning -- A Boosting Approach
Weijian Ni, Jun Xu, Hang Li, Yalou Huang
(Nankai University and Microsoft Research Asia, China) (23)

Collaborative Partitioning with Maximum User Satisfaction
Fred Annexstein, Svetlana Strunjas
(University of Cincinnati, USA) (96)

Efficient Frequent Pattern Mining over Data Streams
Syed Tanbeer, Chowdhury Ahmed, Byeong-Soo Jeong, Young-Koo Lee
(Kyung Hee University, Korea) (210)

GHOST: An Effective Graph-based Framework for Name Distinction
Xiaoming Fan, Jianyong Wang, Bing Lv, Lizhu Zhou, Wei Hu
(Tsinghua University, China) (219)

Deriving Non-Redundant Approximate Association Rules from Hierarchical Datasets
Gavin Shaw, Yue Xu, Shlomo Geva
(Queensland University of Technology, Australia) (238)

Pattern-based Semantic Class Discovery with Multi-Membership Support
Shuming Shi, Xiaokang Liu, Ji-Rong Wen
(Microsoft Research Asia, China) (259)

Detecting Significant Distinguishing Sets Among Bi-clusters
Faris Alqadah, Raj Bhatnagar
(University of Cincinnati, USA) (307)

Semi-Supervised Metric Learning by Maximizing Constraint Margin
Fei Wang, Shouchun Chen, Tao Li, Changshui Zhang
(Tsinghua University, China and Florida International University, USA) (431)

On Quantifying Changes in Temporally Evolving Dataset
Rohan Choudhary, Sameep Mehta, Amitabha Bagchi
(Indian Institute of Technology, India) (519)

Fast Spatial Co-location Mining Without Cliqueness Checking
Zhongshan Lin, SeungJin Lim
(Utah State University, USA) (566)

Decomposition of Terminology Graphs for Domain Knowledge Acquisition
Fidelia Ibekwe-SanJuan, Eric SanJuan, Michael Vogeley
(University of Lyon, France and Drexel University, USA) (630)

In the Development of a Spanish MetaMap
Francisco Carrero, Jose Carlos Cortizo, Jose Maria Gomez Hidalgo, Manuel de Buenaga
(Universidad Europea de Madrid, Spain) (635)

Scalable Complex Pattern Search in Sequential Data
Kaghazian Leila, Reza Sadri, Dennis McLeod
(University of Southern California, USA) (834)

Combining Concept Hierarchies and Statistical Topic Models
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers
(University of California, Irvine, USA) (967)

Keynote Speech
(Session Chair: Yi Zhang)
Unsolved Problems in Search (and How We Might Approach Them)
W. Bruce Croft
(University of Massachusetts Amherst, USA)

Poster Booster

Coffee Break and posters

Session 7


IR: Advertising & Filtering
(Session Chair: Tetsuya Sakai)

To Swing or not to Swing: Learning When (not) to Advertise
Andrei Broder, Massimiliano Ciaramita, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Vanessa Murdock, Vassilis Plachouras
(Yahoo! Research, USA, Pontificia Universidade Catolica do Rio de Janeiro, Brazil and Yahoo! Research Barcelona, Spain) (132)

Search Advertising using Web Relevance Feedback
Andrei Broder, Peter Ciccolo, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel
(Yahoo! Research, USA and Pontificia Universidade Catolica do Rio de Janeiro, Brazil) (900)

A Two-stage Text Mining Model for Information Filtering
Yuefeng Li, Xujuan Zhou, Peter Bruza, Yue Xu, Raymond Y.K. Lau
(Queensland University of Technology, Australia) (203)

Automatic Online News Topic Ranking Using Media Focus and User Attention Based on Aging Theory
Canhui Wang, Min Zhang, Liyun Ru, Shaoping Ma
(Tsinghua University, China) (437)

IR: Blog
(Session Chair: Belle Tseng)

Key Blog Distillation: Ranking Aggregates
Craig Macdonald, Iadh Ounis
(University of Glasgow, UK) (72)

Blog Site Search Using Resource Selection
Jangwon Seo, Bruce Croft
(University of Massachusetts Amherst, USA) (397)

An Effective Statistical Approach to Blog Post Opinion Retrieval
Ben He, Craig Macdonald, Jiyin He, Iadh Ounis
(University of Glasgow, UK and University of Amsterdam, The Netherlands) (266)

KM: Clustering
(Session Chair: Nitesh Chawla)

A Consensus Based Approach to Constrained Clustering of Software Requirements
Chuan Duan, Jane Cleland-Huang, Bamshad Mobasher
(DePaul University, USA) (833)

Data Weaving: Scaling Up the State of the Art in Data Clustering
Ron Bekkerman, Martin Scholz
(HP Labs, USA) (800)

EDSC: Efficient Density-Based Subspace Clustering
Ira Assent, Ralph Krieger, Emmanuel Muller, Thomas Seidl
(RWTH Aachen University, Germany) (464)

An Effective Algorithm for Mining 3-Clusters in Vertically
Faris Alqadah, Raj Bhatnagar
(University of Cincinnati, USA) (308)

Industry Day (1)

The Mountains or the Street Lamp: Search, Research, and Research Again
Christopher J. C. Burges
(Microsoft Research, USA)

Crowdsourcing for Relevance Evaluation
Daniel E. Rose
(USA)

Hadoop: Industrial-Strength Open Source for Data-Intensive Supercomputing
Doug Cutting
(Yahoo!, USA)

Lunch Break and posters

Session 8

IR: Enterprise Search
(Session Chair: David A. Evans)

Multi-Aspect Expertise Matching for Review Assignment
Maryam Karimzadehgan, ChengXiang Zhai, Geneva Belford
(University of Illinois at Urbana-Champaign, USA) (700)

Dr. Searcher and Mr. Browser: A Unified Hyperlink-Click Graph
Barbara Poblete, Carlos Castillo, Aristides Gionis
(University Pompeu Fabra and Yahoo! Research Barcelona, Spain) (541)

Modeling Multi-step Relevance Propagation for Expert Finding
Pavel Serdyukov, Henning Rode, Djoerd Hiemstra
(University of Twente, The Netherlands) (51)

Trada: Tree Based Ranking Function Adaptation
Keke Chen, Rongqing Lu, CK Wong, Gordon Sun, Larry Heck, Belle Tseng
(Yahoo!, USA) (705)

IR: Structured Documents
(Session Chair: Mario J. Gaspar da Silva)

Structural Relevance: A Common Basis for the Evaluation of Structured Document Retrieval
Sadek Ali, Mariano Consens, Gabriella Kazai, Mounia Lalmas
(University of Toronto, Canada, Microsoft Research Cambridge and Queen Mary University of London, UK) (684)

A Generative Retrieval Model for Structured Documents
Le Zhao, Jamie Callan
(Carnegie Mellon University, USA) (628)

A Densitometric Approach to Web Page Segmentation
Christian Kohlschutter, Wolfgang Nejdl
(University of Hannover, Germany) (549)

Using Structured Text for Large-Scale Attribute Extraction
Sujith Ravi, Marius Pasca
(University of Southern California and Google, Inc., USA) (107)

KM: Text Mining
(Session Chair: Alessandro Moschitti)

Identification of Class Specific Discourse Patterns
Anup Kumar Chalamalla, Sumit Negi, L. Venkata Subramaniam, Ganesh Ramakrishnan
(IBM India Research Lab and Indian Institute of Technology, India) (247)

Scalable Community Discovery on Textual Data with Relations
Huajing Li, Zaiqing Nie, Wang-Chien Lee, C. Lee Giles, Ji-Rong Wen
(Pennsylvania State University, USA and Microsoft Research Asia, China) (728)

Information Shared by Many Objects
Chong Long, Xiaoyan Zhu, Ming Li, Bin Ma
(Tsinghua University, China) (942)

Extremely Fast Text Feature Extraction for Classification and Indexing
George Forman, Evan Kirshenbaum
(HP Labs, USA) (98)

Industry Day (2)

Statistical Learning as the Ultimate Agile Development Tool
Peter Norvig
(Google, USA)

The Evolving Computational Advertising Landscape
Andrei Broder
(Yahoo! Research, USA)

Coffee Break and posters

Session 9

DB: Mobile and Distributed Data Management
(Session Chair: Talel Abdessalem)

Valid Scope Computation for Location-Dependent Spatial Query in Mobile Broadcast Environments
Ken C. K. Lee, Josh Schiffman, Baihua Zheng, Wang-Chien Lee
(Pennsylvania State University, USA and Singapore Management University, Singapore) (416)

Adaptive Distributed Indexing for Structured Peer-to-Peer Networks
Linh Nguyen, Wai Gen Yee, Ophir Frieder
(Illinois Institute of Technology, USA) (720)

PROQID: Partial Restarts of Queries in Distributed Databases
Jon Olav Hauglid, Kjetil Norvag
(Norwegian University of Science and Technology, Norway) (244)

IR: Question Answering
(Session Chair: Lucian Vlad Lita)

Answering Questions with Authority
Andrew Hickl
(Language Computer Corporation, USA) (124)

Cache-aware Load Balancing for Question Answering
David Dominguez-Sal, Mihai Surdeanu, Josep Aguilar-Saborit, Josep-LL. Larriba-Pey
(DAMA-UPC, Spain) (653)

A System for Finding Biological Entities that Satisfy Certain Conditions from Texts
Wei Zhou, Clement Yu, Weiyi Meng
(University of Illinois at Chicago and SUNY Binghamton, USA) (385)

KM: Information Extraction
(Session Chair: Eugene Agichtein)

Intra-Document Structural Frequency Features for Semi-Supervised Domain Adaptation
Andrew Arnold, William Cohen
(Carnegie Mellon University, USA) (176)

Academic Conference Homepage Understanding Using Constrained Hierarchical Conditional Random Fields
Xin Xin, Juanzi Li, Jie Tang, Qiong Luo
(Tsinghua University and HKUST, China) (398)

Identifying Table Boundaries in Digital Documents via Sparse Line Detection
Ying Liu, Prasenjit Mitra, C. Lee Giles
(Pennsylvania State University, USA) (625)

Industry Day (3)
(Session Chair: Andrew Tomkins)

Toward Next Generation Search: Business, Product, Science, Infrastructure, and Talent
William Chang
(China)

Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO
Ronny Kohavi
(Microsoft, USA)

The Secret History of Silicon Valley
Steven Gary Blank
(Stanford University, USA)

Closing Remarks



Energy-Efficient Skyline Query Processing and Maintenance in Sensor Networks
Weifa Liang, Baichen Chen, Jeffrey Xu
(Australian National University, Australia and Chinese University of Hong Kong, China) (896)

Table Summarization with the Help of Domain Lattices
K. Selcuk Candan, Huiping Cao, Yan Qi, Maria Luisa Sapino
(Arizona State University, USA) (907)

Protecting Location Privacy against Location-Dependent Attack in Mobile Services
Xiao Pan, Jianliang Xu, Xiaofeng Meng
(Renmin University of China and Hong Kong Baptist University, China) (1018)

Polyhedral Transformation for Indexed Rank Order Correlation Queries
Philon Nguyen, Nematollaah Shiri
(Concordia University, Canada) (1072)

Workload-Based Optimization of Integration Processes
Matthias Boehm, Dirk Habich, Wolfgang Lehner, Uwe Wloka
(Dresden University of Applied Sciences, Germany) (1077)


Re-Considering Collaborative Filtering Parameters in the Context of New Data
Adele Howe, Ryan Forbes
(Colorado State University, USA) (75)

Efficient Estimation of the Size of Text Deep Web Data Source
Jianguo Lu
(University of Windsor, Canada) (144)

A GeoReferencing Multistage Method for Locating Geographic Context in Web Search
Alvaro Zubizarreta, Pablo de la Fuente, Jose M. Cantera, Mario Arias, Jorge Cabrero, Guido Garcia, Cesar Llamas, Jesus Vegas
(Universidad de Valladolid, Spain) (242)

Suppressing Outliers in Pairwise Preference Ranking
Vitor Carvalho, Jonathan Elsas, William Cohen, Jaime Carbonell
(Carnegie Mellon University, USA) (256)

Incorporating Place Name Extents into Geo-IR Ranking
Hiroyuki Toda, Norihito Yasuda, Yumiko Matsuura, Ryoji Kataoka
(NTT Corporation, Japan) (280)

The Effect of Contextualization at Different Granularity Levels in Content-oriented XML Retrieval
Paavo Arvola, Jaana Kekalainen, Marko Junkkari
(University of Tampere, Finland) (341)

Using Current Browsing Context to Improve Search Relevance
Mandar Rahurkar, Silviu Cucerzan
(University of Illinois at Urbana-Champaign and Microsoft Research, USA) (590)

Using a Graph-based Ontological User Profile for Personalizing Search
Mariam Daoud, Lynda Tamine-Lechani, Mohand Boughanem
(IRIT, France) (721)

Measuring User Preference Changes in Digital Libraries
Yang Sun, Huajing Li, Isaac G. Councill, Wang-Chien Lee, C. Lee Giles
(Pennsylvania State University, USA) (729)

Utilization of Navigational Queries for Result Presentation and Caching in Search Engines
Rifat Ozcan, Ismail Altingovde, Ozgur Ulusoy
(Bilkent University, Turkey) (768)

ShopSmart: Making Recommendations based on Technical Specifications and User Reviews
Alexander Yates, James Joseph, Alexander Cohn, Nick Sillick, Ana-Maria Popescu
(Temple University, USA) (802)

Trust, Authority and Popularity in Social Information Retrieval
Gabriella Kazai, Natasa Milic-Frayling
(Microsoft Research, UK) (817)

A Spam Resistant Family of Concavo-Convex Ranks for Link Analysis
Sreangsu Acharyya, Joydeep Ghosh
(University of Texas at Austin, USA) (981)


Boosting Social Annotations Using Propagation
Shenghua Bao, Bohai Yang, Ben Fei, Shengliang Xu, Zhong Su, Yong Yu
(Shanghai Jiao Tong University, China) (148)

Effective Pattern Taxonomy Mining in Text Documents
Yuefeng Li, Sheng-Tang Wu, Xiaohui Tao
(Queensland University of Technology, Australia) (193)

Incorporating Topical Support Documents into a Small Training Set in Text Categorization
Kyung Soon Lee
(Chonbuk National University, Korea) (199)

Exploiting Context to Detect Sensitive Information in Call Center Conversations
Tanveer Faruquie, Sumit Negi, Anup Kumar Chalamalla, L. Venkata Subramaniam
(IBM India Research Laboratory, India) (250)

Multi-scale Characterization of Social Network Dynamics in the Blogosphere
Munmun De Choudhury, Hari Sundaram, Ajita John, Doree Duncan Seligmann
(Arizona State University, USA) (363)

Semi-supervised Text Categorization by Active Search
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu, Irwin King
(Chinese University of Hong Kong, China and Michigan State University, USA) (413)

Clustering Multi-way Data via Adaptive Subspace Iteration
Wei Peng, Tao Li, Bo Shao
(Florida International University, USA) (418)

A Coarse-grain Grid-based Subspace Clustering Method for Online Multi-dimensional Data Streams
Jae Woo Lee, Won Suk Lee
(Yonsei University, Korea) (471)

A Matrix-based Approach for Semi-supervised Document Co-clustering
Yanhua Chen, Lijun Wang, Ming Dong
(Wayne State University, USA) (772)

Categorizing Bloggers' Interests Based on Short Snippets of Blog Posts
Jiahui Liu, Larry Birnbaum, Bryan Pardo
(Northwestern University, USA) (902)

