The 21st ACM Symposium on Document Engineering
August 24, 2021 to August 27, 2021
This year DocEng will be a virtual event using Zoom for the video conference and Slack for instant messaging/collaboration.
- Meeting ID: 947 0450 9684
- Passcode: doceng2021
- The Slack workspace doceng2021 is available at https://doceng2021.slack.com/
Please use the Channel Browser in Slack to view all the available channels, one for each paper or presentation. Feel free to join the channel for any paper that you want to discuss with others.
The video presentations for each paper will be made available in each paper's channel as the conference progresses.
Please contact the organizers via Slack on the #general-networking channel or by email at firstname.lastname@example.org
The full programme is available for download in PDF format: https://doceng.org/assets/doceng2021/ACM-DocEng2021-Program.pdf
The proceedings for DocEng2021 are available at: https://dl.acm.org/doi/proceedings/10.1145/3469096
All times below are IST (Irish Standard Time or UTC+1)
Tuesday, 24th August, 2021 - Tutorials
Session Chair: Rafael Dueire Lins (Universidade Federal de Pernambuco, Brazil)
|3:00 PM||Domain-specific Modelling in Document Engineering||Domain-Specific Modelling raises the level of abstraction beyond current programming languages by specifying the solution directly using problem domain concepts. The final artifacts needed, such as code, tests, reports, protocols, configuration files and documents, are then generated from these high-level specifications. This automation is possible because both the modelling language and the generators need to fit the requirements of a narrow domain, often inside only one company. This tutorial describes the Domain-Specific Modelling approach and demonstrates the practical benefits of applying it to modelling business / automation processes where documents are used. It describes the process of creating your modelling solution along with examples.||Verislav Djukic and Juha-Pekka Tolvanen|
|3:00 PM||Document Engineering Issues in Malware Analysis||We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of tutorials presented in previous years, with more information about newly-available tools. More info at: https://www.csee.umbc.edu/courses/undergraduate/CMSC491malware/docEng2021.html||Charles Nicholas, Robert Joyce, Steven Simske|
Wednesday, 25th August, 2021
|3:00 PM||Welcome note||Patrick Healy, Mihai Bilauca|
|3:15 PM||Keynote I: Searching Harsh Documents||Ophir Frieder|
|S1||Document Content Analysis||Chair: Besat Kassaie|
|4:35 PM||Efficient Clustering of Short Text Streams using Online-Offline Clustering||Md Rashadul Hasan Rakib, Norbert Zeh and Evangelos Milios|
|4:50 PM||Efficient Sparse Spherical k-Means for Document Clustering||Johannes Knittel, Steffen Koch and Thomas Ertl|
|5:00 PM||Small-step Pipelines Reduce the Complexity of XSLT/XPath Programs||Marcel Schaeben and Gioele Barabucci|
|5:10 PM||MTLV: A Library for Building Deep Multi-task Learning Architectures||Fatemeh Rahimi, Evangelos E. Milios and Stan Matwin|
|5:20 PM||ELSKE: Efficient Large-Scale Keyphrase Extraction||Johannes Knittel, Steffen Koch and Thomas Ertl|
|S2||Generation, Manipulation and Presentation||Chair: Steven Bagley|
|5:50 PM||Ordering Sentences and Paragraphs with Pre-trained Encoder-Decoder Transformers and Pointer Ensembles||Rémi Calizzano, Malte Ostendorff and Georg Rehm|
|6:05 PM||SlideGen: An Abstractive Section-Based Slide Generator for Scholarly Documents||Athar Sefid, Prasenjit Mitra and C. Lee Giles|
|6:15 PM||Engineering of An Artificial Intelligence Safety Data Sheet Document Processing System for Environmental, Health, and Safety Compliance||Kevin Fenton and Steven Simske|
|6:25 PM||The DocEng Book Series||Steve Simske and Nicki Dennis|
|6:35 PM||Birds of a Feather||Charles Nicholas|
|7:35 PM||End of Day|
Thursday, 26th August, 2021
|3:00 PM||Keynote II: 20 Years of Physical Document and Product Protection Using Digital Methods||Justin Picard|
|S3||Security and Sensitive Documents||Chair: Charles Nicholas|
|4:20 PM||A Novel Approach on the Joined De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm||Fabian Singhofer, Aygul Garifullina, Mathias Kern and Ansgar Scherp|
|4:35 PM||Pornographic Content Classification Using Deep-Learning||Andre Tabone, Kenneth Camilleri, Alexandra Bonnici, Stefania Cristina, Reuben Farrugia and Mark Borg|
|4:50 PM||Counterfeit Detection with QR Codes||Justin Picard, Paul Landry and Michael Bolay|
|5:00 PM||Trustworthiness of Spam Email Addresses Using Machine Learning||Francisco Jáñez-Martino, Rocio Alaiz-Rodríguez, Víctor González-Castro and Eduardo Fidalgo|
|S4||Applications and User Experiences||Chair: Dick Bulterman|
|5:30 PM||Recognizing Creative Visual Design: Multiscale Design Characteristics in Free-Form Web Curation Documents||Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser and Ruihong Huang|
|5:45 PM||Rescuing Historical Climate Observations to Support Hydrological Research: A Case Study of Solar Radiation Data||Odunayo Ogundepo, Naveela Sookoo, Gautam Bathla, Anthony Cavallin, Bhaleka Persaud, Kathy Szigeti, Philippe Van Cappellen and Jimmy Lin|
|5:55 PM||ALiBERT - Improved Automated List Inspection (ALI) with BERT||Rajkumar Ramamurthy, Maren Pielka, Robin Stenzel, Christian Bauckhage, Rafet Sifa, Tim Khameneh, Uli Warning, Bernd Kliem and Rüdiger Loitz|
|6:05 PM||A Large-Scale Exploration of Terms of Service Documents on the Web||Soundarya Nurani Sundareswara, Mukund Srinath, Shomir Wilson and C. Lee Giles|
|6:15 PM||Metadata-Driven Eye Tracking for Real-Time Applications||Yasith Jayawardana, Gavindya Jayawardena, Andrew Duchowski and Sampath Jayarathna|
|6:25 PM||ACM Town hall||Peter Brusilovsky|
|6:35 PM||Networking/Free Chat|
|7:35 PM||End of Day|
Friday, 27th August, 2021
|3:00 PM||DocEng 2022||Matthew Hardy, Curtis Wigington|
|S5||Systems for Visual Document Analysis||Chair: Tamir Hassan|
|3:10 PM||Table-structure Recognition Method Using Neural Networks for Implicit Ruled Line Estimation and Cell Estimation||Manabu Ohta, Ryoya Yamada, Teruhito Kanazawa and Atsuhiro Takasu|
|3:25 PM||Evaluating Deep Neural Networks for Image Document Enhancement||Lucas Kirsten, Ricardo Piccoli and Ricardo Ribani|
|3:35 PM||Towards Extraction of Theorems and Proofs in Scholarly Articles||Shrey Mishra, Lucas Pluvinage and Pierre Senellart|
|3:45 PM||A Comparative Study on Methods and Tools for Handwritten Mathematical Expression Recognition||Daniela Costa, Carlos Mello and Marcelo d'Amorim|
|3:55 PM||Short Break|
|4:00 PM||Text line extraction using deep learning and minimal sub seams||Adi Azran, Alon Schclar and Raid Saabni|
|4:10 PM||Direct Binarisation A Quality-and-Time Efficient Binarisation Strategy||Rafael Lins, Rodrigo B. Bernardino, Ricardo Barboza and Zanoni Lins|
|4:20 PM||Challenges in Chart Image Classification: A Comparative Study of Different Deep Learning Methods||Jennil Thiyam, Sanasam Ranbir Singh and Prabin K. Bora|
|S6||Collections, Systems and Management||Chair: Angelo Di Iorio|
|4:45 PM||On Minimizing Cost in Legal Document Review Workflows||Eugene Yang, David Lewis and Ophir Frieder|
|5:00 PM||Heuristic Stopping Rules For Technology-Assisted Review||Eugene Yang, David Lewis and Ophir Frieder|
|5:15 PM||Shock Wave: a Graph Layout Algorithm for Text Analyzing||Maxime Cauz, Julien Albert, Anne Wallemacq, Isabelle Linden and Bruno Dumas|
|5:25 PM||COVID-19 Multidimensional Kaggle Literature Organization||Maksim Eren, Nick Solovyev, Chris Hamer, Renee McDonald, Boian Alexandrov and Charles Nicholas|
|5:55 PM||Binarisation challenge summary||Steve Simske|
|6:05 PM||Birds of a Feather presentations||Charles Nicholas|
|6:15 PM||Best paper awards|
|6:25 PM||Closing remarks|
|6:35 PM||Networking/Free Chat|
|6:45 PM||End of Symposium|
Birds of a Feather
Charles Nicholas will serve as chair of this year's Birds of a Feather (BoaF) session. We invite you to go to the bof-general Slack channel or message Charles Nicholas on Slack with ideas or suggestions for discussion. The only constraint is that the topic must have some relationship to Document Engineering. These suggestions will be boiled down to a few specific topics, and shared with the participants a day or so before the conference. We'll use Zoom to set up meeting spaces for those who want to take part in one or more BoaF discussions. Each session will have the chance to give a brief (two minutes) presentation on the last day of the conference.
A message from Charles:
Greetings, DocEng 2021 Participants!
It is once again my privilege to serve as chair of this year's Birds of a Feather (BoaF) session. As such, I invite you to email me (Charles Nicholas email@example.com) with ideas or suggestions for discussion.
We already have two BoaF topics:
- How to get a book published in Document Engineering
- Tensor methods in Document Engineering
Other topics are welcome, up to the designated time, near the end of the first day of the conference, as shown in the conference program. Each group will have the chance to give a brief (two-minute) presentation on the last day of the conference.
So, put your thinking caps on! Let me hear from you!
Presentation Guidelines for Authors
DocEng'21 will hold four live sessions on each day of the conference, starting with the Tutorials on the 24th of August.
To keep the schedule on time, we require authors to pre-record their presentations. Long papers will have 10 minutes for their presentation recording with 5 minutes for a live Q&A. Short papers will have 7 minutes for their presentation recording with 3 minutes for a live Q&A.
There are several video conferencing tools available to easily record a presentation. With this approach, you can show your face via webcam and display your slides as you talk. You can use any meeting software as long as you get a good-quality recording and your final file is in MP4 format. Here are some links to instructions on recording a meeting on common platforms:
- Google Meet: Record a video meeting - Meet Help (https://support.google.com/meet/answer/9308681?hl=en)
- Zoom: Local Recording – Zoom Help Center (https://support.zoom.us/hc/en-us/articles/201362473-Local-Recording)
- Microsoft Teams: Record a meeting in Teams - Office Support (https://support.microsoft.com/en-us/office/record-a-meeting-in-teams-34dfbe7f-b07d-4a27-b4c6-de62f1348c24?ui=en-us&rs=en-us&ad=us)
- You can also use the two-step method covered here: Create Voice Over PowerPoint (https://support.microsoft.com/en-us/office/record-a-slide-show-with-narration-and-slide-timings-0b9502c6-5f6c-40ae-b1e7-e47d8741161c?ui=en-us&rs=en-us&ad=us) and convert to MP4 (https://nursing.vanderbilt.edu/knowledge-base/knowledgebase/how-to-save-voppt-to-mp4/)
So that the technical program committee can verify the videos, please upload your video using EasyChair no later than Friday, 13th August 2021.
- Duration: 10 minutes for long papers, 7 minutes for short papers
- File size: 250 MB max
- Video file format: MP4
- Dimensions: Minimum height 720 pixels, aspect ratio: 16:9
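Authors who want to sanity-check a recording before uploading could script the requirements above. The following is only an illustrative sketch, not an official validation tool: the `VideoInfo` class and `check_video` function are hypothetical names, the metadata is assumed to have been read from the file by some other means, 250 MB is assumed to mean 250,000,000 bytes, and the recording times from the guidelines (10 minutes for long papers, 7 minutes for short papers) are assumed to be upper limits.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the requirements listed above.
# Assumption: "250MB" is interpreted as 250 * 10^6 bytes.
MAX_FILE_SIZE_BYTES = 250 * 1000 * 1000
MIN_HEIGHT_PX = 720
# Assumption: the recording times in the guidelines are treated as upper limits.
MAX_DURATION_SECONDS = {"long": 10 * 60, "short": 7 * 60}


@dataclass
class VideoInfo:
    """Hypothetical container for metadata read from a recording."""
    filename: str
    size_bytes: int
    width: int
    height: int
    duration_seconds: float


def check_video(video: VideoInfo, paper_type: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not video.filename.lower().endswith(".mp4"):
        problems.append("file is not in MP4 format")
    if video.size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 250 MB")
    if video.height < MIN_HEIGHT_PX:
        problems.append("height is below 720 pixels")
    # 16:9 means width / height == 16 / 9; compare with integers to avoid rounding.
    if video.width * 9 != video.height * 16:
        problems.append("aspect ratio is not 16:9")
    if video.duration_seconds > MAX_DURATION_SECONDS[paper_type]:
        problems.append("recording is too long for a %s paper" % paper_type)
    return problems


if __name__ == "__main__":
    example = VideoInfo("talk.mp4", 120_000_000, 1280, 720, 590.0)
    for problem in check_video(example, "long"):
        print(problem)
```

The check only compares numbers that you supply; it does not open the video file itself, so the values still need to be read from your recording software or media player.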
Please contact firstname.lastname@example.org if you have any questions.