The 24th ACM Symposium on Document Engineering

August 20th, 2024 to August 23rd, 2024
San Jose, CA, USA

Note for presenters: Long papers are allocated 25 minutes for the presentation and 5 minutes for questions. Short papers are allocated 15 minutes for the presentation and 5 minutes for questions.

Program

DocEng2024 will be held at Adobe in San Jose, CA, USA.

Venue: Park Conference Room, 321 Park Ave, San Jose, CA 95113


Tuesday, August 20th

Tutorials - Park Conference Room, Adobe

08:00am: Breakfast, Registration, and Networking

Tutorial 1 — Document Engineering Issues in Malware Analysis
Chairs: Charles Nicholas, Robert Joyce, Steve Simske
09:00am: Tutorial 1, Session 1
10:30am: Coffee Break
11:00am: Tutorial 1, Session 2

12:30pm: Lunch

Tutorial 2 — Creating Accessible Documents from LaTeX Sources via Automated Workflows
Chairs: Ulrike Fischer, Joseph Wright, Frank Mittelbach
2:00pm: Tutorial 2, Session 1
3:30pm: Coffee Break
4:00pm: Tutorial 2, Session 2

5:30pm: End of Day

6:00pm-8:00pm: Welcome Reception — Venue: Adobe-Layers Café


Wednesday, August 21st

Main Program Day 1 — Park Conference Room, Adobe

08:00am: Breakfast, Registration, and Networking

09:00am: Welcome Message
Chairs: Matthew Hardy and Curtis Wigington

09:20am: Keynote 1 — Generative AI and knowledge work
Speaker: LN Renganarayana, Sr Director, Adobe Document Cloud AI

10:20am: Coffee Break

Session 1 — Scanning, Document Input, and Binarization
Chair: Ethan Munson
10:50am: Handheld Video Document Scanning: A Robust On-Device Model for Multi-Page Document Scanning.
11:20am: Which is the most suitable scanner resolution for documents? Detailing the answer given to the question raised by Professor George Nagy.
11:40am: ZigZag: A Robust Adaptive Approach to Non-Uniformly Illuminated Document Image Binarization.
12:00pm: Texture-based Document Binarization.

12:30pm: Lunch

Session 2 — Data Representation and Markup
Chair: Tamir Hassan
2:10pm: A Heuristic Algorithm for Mathematical Markup Encoding Based on the Relative Positions of Characters.
2:40pm: Graph Detective: A User Interface for Intuitive Graph Exploration Through Visualized Queries.
3:10pm: CatalogBank: A Structured and Interoperable Catalog Dataset with a Semi-Automatic Annotation Tool DocumentLabeler for Engineering System Design.

3:40pm: Coffee Break

Birds of a Feather
Chair: Charles Nicholas
4:10pm: Introduction and Invitation for Ideas.
4:30pm: Breakout Session.

5:10pm: End of Day

6:00pm-8:00pm: Conference Dinner - Venue: Scott’s Seafood


Thursday, August 22nd

Main Program Day 2 — Park Conference Room, Adobe

08:30am: Breakfast, Welcome, and Networking

09:30am: Keynote 2 — Teaching old docs new tricks
Speaker: Frank Mittelbach, LaTeX Project Lead

10:30am: Coffee Break

Session 3 — Artificial Intelligence
Chair: Steve Simske
11:00am: TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs.
11:20am: Post-OCR Correction with OpenAI’s GPT Models on Challenging English Prosody Texts.
11:40am: Detecting AI-Generated Texts in Cross-Domains.

12:00pm: Lunch

1:30pm: Document Engineering Future Discussion
Chairs: Steve Simske and Alexandra Bonnici

2:30pm: ACM SIGWEB Town Hall
Chair: Alexandra Bonnici

2:50pm: Binarization Competition Report
Chair: Rafael Lins

3:10pm: Coffee Break

Session 4 — Summarization
Chair: Alexandra Bonnici
3:30pm: Assessing Abstractive and Extractive Methods for Automatic News Summarization.
4:00pm: Assessing the Reliability and Validity of the Measures for Automatic Text Summarization.

4:30pm: End of Day

6:00pm-8:00pm: Social Event — Topgolf, 10 Topgolf Drive, San Jose, CA 95002


Friday, August 23rd

Main Program Day 3 — Park Conference Room, Adobe

08:00am: Breakfast, Welcome, and Networking

Session 5 — Security and PDF Applications
Chair: Steven Bagley
09:00am: An Efficient PDF Malware Detection Method Using Highly Compact Features.
09:20am: Automatically producing accessible and reusable PDFs with LaTeX.
09:40am: Birds of a Feather Recap.

10:15am: Coffee Break

Session 6 — Algorithms
Chair: Curtis Wigington
10:45am: LexBoost: Improving Lexical Document Retrieval with Nearest Neighbors.
11:15am: Similarity Problems in Paragraph Justification: an Extension to the Knuth-Plass Algorithm.

11:35am: DocEng 2024
Chair: Curtis Wigington

11:50am: DocEng Book Series
Chair: Steve Simske

12:05pm: Announcement of Best Paper Awards
Chair: Steven Bagley

12:20pm: Closing Remarks
Chairs: Steve Simske and Alexandra Bonnici

12:30pm: End of DocEng 2024