Accessing Cultural Heritage at Scale

Wednesday, June 22 (2-5pm) - half-day workshop

Our focus is on challenges, opportunities, and current and emerging developments in information access via exploration and discovery in large-scale digital libraries and collections, particularly in the cultural heritage domain. We will consider the underlying technologies that enable this access, as well as interaction functionalities and user evaluations. Our goal is to identify the needs of providers and their users, assess the current state of the art, identify challenges, and prioritize areas of future research potential.

The workshop is focused on all aspects of supporting access, exploration and discovery within large-scale digital libraries, especially within cultural heritage. This fits with the JCDL conference theme of 'Big Libraries, Big Data, Big Innovation' to include information access issues and solutions in cultural heritage that focus on volume, variety and velocity of library content, and also variety (complexity, diversity) of users and uses. Contributions may include findings from completed empirical studies or work-in-progress, as well as position papers inviting discussions of emerging and future developments.

Organising Committee

  • Paul Clough (University of Sheffield, UK)
  • Paula Goodale (University of Sheffield, UK)
  • Maristella Agosti (University of Padua, Italy)
  • Séamus Lawless (Trinity College, Dublin, Ireland)

Physical Samples and Digital Libraries

Wednesday, June 22 (2-5pm) - Thursday, June 23, 2016 (9am-12pm)

This workshop will bring together the community of researchers, curators, and practitioners from both the earth sciences (and related sciences that collect and manage samples such as hydrology, archeology, etc.) and digital library scholarly communities who are interested in studying the issues involved in the management of samples, sample collections, and sample-based data in the field, in the lab, in repositories, in data systems and scientific publications. The intention is both to assemble the existing community as well as invite those with emerging interests in this area. A secondary goal is to focus the attention of the digital libraries community on the tremendous opportunities for research in this space and for collaborating with researchers in the Earth Sciences.

Research in the Earth Science disciplines depends on the availability of representative samples collected above, at, and beneath Earth's surface, on the moon and in space, or generated in experiments. These physical samples serve as fundamental references for generating new knowledge about the earth and the entire universe, contribute to a deeper understanding of the processes that created and shaped it, and help assess the availability of natural resources and measure the risk of natural hazards. Many samples have been collected at great cost and with substantial difficulty, are rare or unique, and are irreplaceable. The EarthCube Research Coordination Network (RCN) iSamplES (Internet of Samples in the Earth Sciences) aims to advance the use of innovative cyberinfrastructure to connect physical samples and sample collections across the Earth Sciences with digital data infrastructures to revolutionize their utility in the support of science.


  • Unmil Karadkar, School of Information, The University of Texas at Austin
  • Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University
  • Chris Lenhardt, Renaissance Computing Institute, University of North Carolina at Chapel Hill

Web Archiving and Digital Libraries (WADL)

Wednesday, June 22 (2-5pm) - Thursday, June 23, 2016 (9am-12pm)

This workshop will explore integration of Web archiving and digital libraries, so the complete life cycle involved is covered: creation/authoring, uploading/publishing in the Web (2.0), (focused) crawling, indexing, exploration (searching, browsing), archiving (of events), etc. It will include particular coverage of current topics of interest, like: big data, mobile web archiving, and systems (e.g., Memento, SiteStory, Hadoop processing).

Workshop Co-chairs:
  • Chair, Edward A. Fox, Professor and Director, Digital Library Research Laboratory, Virginia Tech
  • Co-chair, Zhiwu Xie, Associate Professor and Technology Development Librarian, Center for Digital Research and Scholarship, University Libraries, Virginia Tech
  • Co-chair, Martin Klein, UCLA
Program Committee:
  • Jefferson Bailey, Internet Archive
  • Mohamed Magdy Farig, Virginia Tech
  • Vinay Goel, Internet Archive
  • Gina Jones, Library of Congress
  • Frank McCown, Harding University
  • Michael Nelson, Old Dominion Univ.
  • Thomas Risse, L3S Research Center, Leibniz Universität Hannover
  • Nicholas Taylor, Stanford
  • Matthew Weber, Rutgers
  • Robert Wolven, Columbia

Workshop on Mining Scientific Publications (WOSP)

Wednesday, June 22 (2-5pm) - Thursday, June 23, 2016 (9am-12pm)

Digital libraries that store scientific publications are becoming increasingly central to the research process. They are not only used for traditional tasks, such as finding and storing research outputs, but also as a source for discovering new research trends or evaluating research excellence. With the current growth of scientific publications deposited in digital libraries, it is no longer sufficient to provide only access to content. To aid research, it is especially important to leverage the potential of text and data mining technologies to improve the process of how research is being done. 

This workshop aims to bring together people from different backgrounds who are interested in analysing and mining databases of scientific publications, developing systems that enable such analysis and mining of scientific databases (especially those who run databases of publications), and developing novel technologies that improve the way research is being done.

Organizing Committee:
  • Petr Knoth, Knowledge Media Institute, The Open University, UK
  • Drahomira Herrmannova, Knowledge Media Institute, The Open University, UK
  • Lucas Anastasiou, Knowledge Media Institute, The Open University, UK
  • Nancy Pontika, Knowledge Media Institute, The Open University, UK

Joint Workshop on Bibliometric-enhanced IR and NLP for Digital Libraries (BIRNDL)

Thursday, June 23, 2016 (9am-5pm)

After the success of the 1st NLPIR4DL workshop in 2009 in Singapore co-located with IJCNLP-ACL 2009 and the first two BIR workshops co-located with ECIR in 2014 and 2015, this joint workshop updates this theme to focus on scholarly publications and data. 

The workshop will investigate how natural language processing, information retrieval, scientometric and recommendation techniques can advance the state-of-the-art in scholarly document understanding, analysis and retrieval at scale. Researchers are in need of assistive technologies to track developments in an area, identify the approaches used to solve a research problem over time and summarize research trends. Digital libraries require semantic search, question-answering and automated recommendation and reviewing systems to manage and retrieve answers from scholarly databases. Full document text analysis can help to design semantic search, translation and summarization systems; citation and social network analyses can help digital libraries to visualize scientific trends, bibliometrics and relationships and influences of works and authors. All these approaches can be supplemented with the metadata supplied by digital libraries, inclusive of usage data, such as download counts.

This workshop will be relevant to scholars in the cross-disciplinary field of Computer Science and Digital Libraries, in particular in the research areas of Natural Language Processing and in Information Retrieval; it will also be important for all stakeholders in the publication pipeline: implementers, publishers and policymakers.

We will be running a shared task on scholarly paper processing as part of the workshop. The current shared task will be on automatic paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and the community summary (the collection of citation sentences, or ‘citances’). We also propose to group the citances by the facets of the text that they refer to. For details on the CL-SciSumm Shared Task, see the CL-SciSumm Shared Task at

  • Guillaume Cabanac, University of Toulouse, France
  • Muthu Kumar Chandrasekaran, School of Computing, National University of Singapore, Singapore
  • Ingo Frommholz, University of Bedfordshire in Luton, UK
  • Kokil Jaidka, Big Data Experience Lab, Adobe Research, India
  • Min-Yen Kan, School of Computing, National University of Singapore, Singapore
  • Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany
  • Dietmar Wolfram, School of Information Studies, University of Wisconsin-Milwaukee, USA