The 20th ACM Symposium on Document Engineering

September 29, 2020 to October 1, 2020
San Jose, CA, USA


DocEng will be a virtual event on BlueJeans and Slack. No installation is required, apart from a Web browser. See for information on how to access the virtual events.

The dates and times of the individual paper presentations will be provided soon. Papers will be presented as pre-recorded videos, each followed by a live Q&A. Please see Presentation Guidelines for Authors below for more details.

All times shown are in Pacific Standard Time (PST).

Tuesday, September 29th

Time (PST) | Paper ID | Session | Authors
7:00am ‑ 7:15am | | Welcome |
7:15am ‑ 8:00am | | Keynote: OCR and Document Understanding | Cha Zang
8:00am ‑ 8:05am | | - break - |
Session 1: OCR and Image Processing
8:05am ‑ 8:20am | 1.1 | Direct Sampling of Multiview Line Drawings for Document Retrieval | Cristopher Flagg and Ophir Frieder
8:20am ‑ 8:35am | 1.2 | Cardinal Graph Convolution Framework for Document Information Extraction | Rinon Gal, Shai Ardazi and Roy Shilkrot
8:35am ‑ 8:45am | 1.3 | HTR-Flor++: A handwritten text recognition system based on a pipeline of optical and language models | Arthur Flor de Sousa Neto, Byron Leite Dantas Bezerra, Alejandro Héctor Toselli and Estanislau Baptista Lima
8:45am ‑ 8:55am | 1.4 | The Old Bailey and OCR: Benchmarking AWS, Azure, and GCP with 180K Page Images | William Ughetta and Brian Kernighan
8:55am ‑ 9:00am | | - break - |
Session 2: COVID Documents and Data
9:00am ‑ 9:10am | 2.1 | COVID-19 Kaggle Literature Organization | Nick Solovyev, Maksim Eren, Charles Nicholas, Edward Raff and Ben Johnson
9:10am ‑ 9:20am | 2.2 | COVIDSeer: Extending the CORD-19 Dataset | Shaurya Rohatgi, Zeba Karishma, Jason Chhay, Sai Raghav Reddy Keesara, Jian Wu, Cornelia Caragea and C. Lee Giles
9:20am ‑ 9:30am | | Open discussion |

Wednesday, September 30th

Time (PST) | Paper ID | Session | Authors
7:00am ‑ 7:30am | | Challenges and tutorial report | Rafael Dueire Lins, Steven Simske, Rafael Ferreira de Mello, Rodrigo Bernardino, Alexandra Bonnici and Kenneth Camilleri
Session 3: Text Processing
7:30am ‑ 7:45am | 3.1 | An Assessment of Sentence Simplification Methods in Extractive Text Summarization | Rafaella Vale, Rafael Lins and Rafael Ferreira Mello
7:45am ‑ 7:55am | 3.2 | Short Text Stream Clustering via Frequent Word Pairs and Reassignment of Outliers to Clusters | Md Rashadul Hasan Rakib, Norbert Zeh and Evangelos Milios
7:55am ‑ 8:05am | 3.3 | Improving query expansion strategies with word embeddings | Alfredo Silva and Marcelo Mendoza
8:05am ‑ 8:15am | 3.4 | Assessing Causality Structures learned from Digital Text Media | Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana G. Maguitman and Evangelos E. Milios
8:15am ‑ 8:20am | | - break - |
Session 4: Document Changes
8:20am ‑ 8:35am | 4.1 | Change Detection on JATS Academic Articles: An XML Diff Comparison Study | Milos Cuculovic, Frédéric Fondement, Maxime Devanne, Jonathan Weber and Michel Hassenforder
8:35am ‑ 8:45am | 4.2 | Interactive and Scalable visualization framework for Version-aware XML documents | Ahmed Shatnawi and Ethan Munson
8:45am ‑ 8:55am | 4.3 | A Framework for Extracted View Maintenance | Besat Kassaie and Frank Tompa
8:55am ‑ 9:00am | | - break - |
9:00am ‑ 9:15am | | ACM Town Hall |

Thursday, October 1st

Time (PST) | Paper ID | Session | Authors
7:00am ‑ 7:30am | | BoF report | Rafael Dueire Lins, Steven Simske, Rafael Ferreira de Mello, Rodrigo Bernardino, Alexandra Bonnici and Kenneth Camilleri
Session 5: Markup Languages and Standards
7:30am ‑ 7:40am | 5.1 | Automatic Generation of Electrical Plan Documents from Architectural Data | Melissa Cote, Alireza Rezvanifar and Alexandra Branzan Albu
7:40am ‑ 7:50am | 5.2 | Parsing a markup language that supports overlap and discontinuity | Ronald Haentjens Dekker, Bram Buitendijk and Elli Bleeker
7:50am ‑ 7:55am | | - break - |
Session 6: Document Analysis
7:55am ‑ 8:10am | 6.1 | PDF2LaTeX: A Deep Learning System to Convert Mathematical Documents from PDF to LaTeX | Zelun Wang and Jyh-Charn Liu
8:10am ‑ 8:25am | 6.2 | Order out of Chaos: Construction of Knowledge Models from PDF Textbooks | Isaac Alpizar Chacon and Sergey Sosnovsky
8:25am ‑ 8:35am | 6.3 | A Framework to Evaluate Webpage Segment Recognizers | Nicola Raffaele Di Matteo and James Blustein
8:35am ‑ 8:45am | 6.4 | ServiceMarq: Extracting Service Contributions from Call for Papers | Tian Shi, Abhinav Ramesh Kashyap and MinYen Kan
8:45am ‑ 9:00am | | Closing |

Presentation Guidelines for Authors

DocEng will hold three live sessions. To keep the schedule on time, we require authors to pre-record their presentations. Long papers will have 10 minutes for the recorded presentation, followed by 5 minutes of live Q&A. Short papers will have 7 minutes for the recorded presentation, followed by 3 minutes of live Q&A. We do not prescribe which tools should be used to record the presentations, but we recommend using a built-in screen recording tool to record over the slide presentation. We do not require a video stream of the author/presenter to appear in this video.

Videos should be submitted one week before the presentation date. Further upload instructions will be provided soon.

Please contact the organizers if you have any questions.