The 19th ACM Symposium on Document Engineering

September 23, 2019 to September 26, 2019
Berlin, Germany

Symposium

Event	Time
Workshops & tutorials	Sept. 23rd
Workshops & tutorials dinner	Sept. 23rd
Main program start	Sept. 24th, 9:00 am
Welcome reception	Sept. 24th
Conference dinner	Sept. 25th
Main program end	Sept. 26th, 1:00 pm

Program

The full program is available here:

Notes for Presenters: Long and Short papers

If your paper is listed in a session, it will be presented orally as a long or short presentation. Long presentations are scheduled for 25 min (aim for 20 min presentation, 5 min questions) and short presentations for 20 min (aim for 15 min presentation, 5 min questions).

Notes for Presenters: Posters

If your paper is listed in the poster session, it will be presented both as a lightning talk in the lightning talk session and as a poster in the poster session.

Your paper will be presented during the lightning talk session on Tuesday. Lightning talks are scheduled for approx. 2 min each. Please prepare 1-2 slides in PDF format and send them to doceng2019@fokus.fraunhofer.de by September 20th. These slides will be shown during your lightning talk.

Your paper will also be presented during the poster session on Tuesday. Please prepare a poster in A0 (or 3' x 4') and bring it with you to the symposium. Please hand it in at the registration desk. Posters will be posted on day 1 and will be taken down at the end of the symposium. Attendees are welcome to browse posters and chat with the authors during the breaks in addition to the designated poster session.

Tutorials & Competition

(morning) Tutorial: Introduction to XProc 3.0 (Erik Siegel) More details can be found here: Webpage
XProc 3.0 is an XML-based programming language for complex document processing. Documents flow through pipelines in which steps perform processing such as conversion, validation, splitting, merging and reporting. It's an almost perfect fit for the kind of processing necessary in document engineering. The tutorial will cover the applications, fundamentals, and syntax and semantics of the language.
(morning) Tutorial: SGML to the rescue - Using SGML with modern HTML (Marcus Reichardt) More details can be found here: Webpage
The tutorial will explore practical techniques for parsing and processing HTML 5 using SGML. There will be ample time for hands-on exercises to start a discussion about the approach and its limitations, and also about recent changes at W3C for the doceng community and others interested in formal web standards.
(afternoon) Tutorial: More than just digital paper - hidden features of the PDF format (Tamir Hassan, Klaas Posselt, Dietrich von Seggern & Thomas Zellmann) More details can be found here: Webpage
The aim of this tutorial is to introduce the audience to PDF's additional features, which have grown over the past few years, and to give practical examples of how they can benefit from generating and exchanging PDF files that go beyond digital representations of the printed page.
(afternoon) Competition: Extractive Text Summarization (Rafael Dueire Lins, Rafael Ferreira & Steven J. Simske) More details can be found here: Webpage
The competition focuses on the challenges of automatic extractive and semi-extractive text summarization.

Keynotes

Franziska Heine, Wikimedia Deutschland

Franziska Heine is the Head of Software & Development at Wikimedia Deutschland, the German chapter of the organisation behind Wikipedia. Wikidata is the biggest project being developed by the department. During the last six years, it’s become a central hub in the landscape of linked open data, containing more than 50 million items.

At DocEng 2019, she will give a keynote on:

Wikidata: The biggest Linked Open Data Commons in the world and how you can make use of it

Wikidata is the biggest freely accessible database in the world, with around 60 million entries. It contains knowledge about our world in a structured, machine-readable form, and describes and connects it. Equally important, the data is connected to its original sources, which makes it reliable and trustworthy.

The talk will trace the history of the Wikidata project from its early beginnings as a means to help the Wikipedias around the world to the future we are moving towards: a future in which organizations like libraries, archives and museums, but also research labs, universities and governmental institutions, are part of a network of independent Wikibase instances that are connected with each other, allowing their data to be freely accessible where possible, and enriching and cross-linking it where wanted.

We will look at projects that are exploring the possibilities we offer today but also at the challenges that we need to tackle in order to be successful.

Manfred Hauswirth, Fraunhofer FOKUS & TU Berlin

Prof. Dr. Manfred Hauswirth is managing director of the Fraunhofer Institute FOKUS and holds the chair of “Open Distributed Systems” at the Technische Universität Berlin. His research focuses on distributed information systems, the internet of things, data stream processing and linked data, semantics, and artificial intelligence. In these fields, he has garnered numerous international prizes for his projects and is an active member of many scientific and political committees for the development of digitalization. He is a Principal Investigator at the Weizenbaum Institute, the Einstein Center Digital Future (ECDF), the Berlin Big Data Center (BBDC) and the Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS).

At DocEng 2019, he will give a welcome address.

BoF Session

The "Birds of a Feather" (BoF) session has become a regular part of the Document Engineering Symposium. The purpose of the BoF session is to allow informal technical conversation among the attendees, on whatever topics are suggested that year. We ask only that the topics have some relevance to the theme of the conference.

You are invited to suggest one or more topics for the BoF session! Just email your ideas to Charles Nicholas (nicholas@umbc.edu) between now and when the conference starts. He will work through the suggestions and present a mix of topics at the plenary session. Participants will break up into groups over the lunch break for discussions, and may move from one group to another. Each group will select a person to record the main points raised and any conclusions reached. These will then be presented during the BoF reporting session. These BoF reports have, on occasion, stimulated lively and enjoyable discussion in the plenary session and beyond.

So, suggest some topics! Thanks!

Doctoral Students

Student researchers are invited to join our doctoral lunch on Monday to meet other students as well as senior, experienced researchers, and to discuss their research and receive advice, feedback and constructive criticism. If you plan to attend, please send a short email to doceng2019@fokus.fraunhofer.de.

Student researchers are also invited to present their research proposal as a 1-2 min presentation during the lightning talk session and as a poster during the poster session. If you plan to do so, please send a short email to doceng2019@fokus.fraunhofer.de.

Accepted Papers

Long Papers

Multi-Objective GP Strategies for Topical Search Integrating Wikipedia Concepts Cecilia Baggio, Rocio Cecchini, Ana Maguitman and Evangelos Milios
On the Expressive Power of Declarative Constructs in Interactive Document Scripts John Boyer
Digital Degree Certificates for Higher Education in Brazil Cristiane Dias Lepiane, Fernando Lauro Pereira, Giovani Pieri, Douglas Marcelino Bepler Martins, Jean Everson Martina and Mauro Luiz Rabelo
Using Knowledge Base Semantics in Context-Aware Entity Linking Cheikh Brahim El Vaigh, François Goasdoué, Guillaume Gravier and Pascale Sébillot
Searching Document Repositories using 3D Model Reconstruction Cristopher Flagg and Ophir Frieder
TRIVIR: A Visualization System to Support Document Retrieval with High Recall Amanda Gonçalves Dias, Evangelos Milios and Maria Cristina F. De Oliveira
Modeling Multimodal-Multiuser Interactions in Declarative Multimedia Languages Alan Livio Vasconcelos Guedes, Roberto Gerson Azevedo, Sérgio Colcher and Simone Diniz Junqueira Barbosa
Searching and Ranking Questionnaires: an Approach to Calculate Similarity Between Questionnaires Richard Henrique de Souza and Carina Dorneles
Text Localization in Scientific Figures using Fully Convolutional Neural Networks on Limited Training Data Morten Jessen, Falk Böschen and Ansgar Scherp
Predictable and Consistent Information Extraction Besat Kassaie and Frank Tompa
Prediction of Mathematical Expression Declarations based on Spatial, Semantic, and Syntactic Analysis Jason Lin, Xing Wang, Zelun Wang, Donald Beyette and Jyh-Charn Liu
The CNN-Corpus: A Large Textual Corpus for Single-Document Extractive Summarization Rafael Lins, Hilário Oliveira, Luciano Cabral, Jamilson Batista, Rafael Ferreira, Gabriel França, Rinaldo Lima and Steven Simske
Augmenting Music Sheets with Harmonic Fingerprints Matthias Miller, Alexandra Bonnici and Mennatallah El-Assady
An Effective Scheme for Generating An Overview Report over A Very Large Corpus of Documents Jingwen Wang, Hao Zhang, Cheng Zhang, Wenjing Yang, Liqun Shao and Jie Wang
PaperWork: Exploring the Potential of Electronic Paper on Office Work Elliott Wen, Gerald Weber and Jim Warren

Short Papers & Application Notes

An Exploratory Analysis of Precedent Relevance in the Brazilian Supreme Court Rulings Fernando Alberto Correia dos Santos Junior, José Luiz Nunes, Guilherme da Franca Couto Fernades de Almeida, Alexandre Augusto Abreu Almeida and Helio Cortes Vieira Lopes
Multi-Context Information for Word Representation Learning Swapnil Dewalkar and Maunendra Desarkar
Multi-Layered Edits for Meaningful Interpretation of Textual Differences Angelo Di Iorio, Gianmarco Spinaci and Fabio Vitali
Writer Characterization and Identification in Short Modern and Historical Documents: Reconsidering Paleographic Tables Shira Faigenbaum-Golovin, David Levin, Eli Piasetzky and Israel Finkelstein
Automatic Identification and Normalisation of Physical Measurements in Scientific Literature Luca Foppiano, Laurent Romary, Masashi Ishii and Mikiko Tanifuji
XLIndy: Interactive Recognition and Information Extraction in Spreadsheets Elvis Koci, Maik Thiele, Julius Gonsior, Oscar Romero and Wolfgang Lehner
Generating Digital Libraries of M.Sc. and Ph.D. Theses Rafael Lins, Paulo Espirito Santo and Gabriel França
A Cell-Detection-Based Table Structure Recognition Method Manabu Ohta, Ryoya Yamada, Teruhito Kanazawa and Atsuhiro Takasu
A Vision for User-Defined Semantic Markup Michael Piotrowski
Sentiment Classification Improvement Using Semantically Enriched Information Ricardo Brigato Scheicher, Roberta Akemi Sinoara, Jonas De Carvalho Felinto and Solange Oliveira Rezende
Enhanced Automated Policy Enforcement eXchange Framework (eAPEX) Ahmed Shatnawi and Ethan Munson
Enhanced Document Retrieval and Discovery Based on a Combination of Implicit and Explicit Document Relationships Ahmed Tayeh, Ngoc Tran and Beat Signer
An Algorithm for Extracting Shape Expression Schemas from Graphs Yutori Tsuboi and Nobutaka Suzuki
Globally Optimal Page Breaking with Column Balancing – a Case Study Marcin Woliński
Impact of In-domain Vector Representations on the Classification of Disease-related Tweets: Avian Influenza Case Study Samira Yousefinaghani, Rozita Dara and Shayan Sharif

Posters

Semi-Automatic LaTeX-Based Labeling of Mathematical Objects in PDF Documents: MOP Data Set Donald Beyette, Jason Lin, Zelun Wang and Jyh-Charn Liu
A Hybrid AI Tool to Extract Key Performance Indicators from Financial Reports for Benchmarking Eduardo Brito, Rafet Sifa, Christian Bauckhage, Rüdiger Loitz, Uwe Lohmeier and Christin Pünt
Combining Word Embeddings with Taxonomy Information for Multi-Label Document Classification Stefan Hirschmeier and Detlef Schoder
The CNN-Corpus in Spanish: a Large Corpus for Extractive Text Summarization in the Spanish Language Rafael Lins, Bruno Ávila, Hilário Oliveira, Luciano Cabral, Gabriel França, Rafael Ferreira, Rinaldo Lima and Steven Simske
Enhancing Document-Camera Images Ednardo Mariano, Rafael Lins and Jian Fan
The Next Millennium Document Format Svante Schubert
Towards Automated Auditing with Machine Learning Rafet Sifa, Anna Ladi and Rajkumar Ramamurthy
