Fetch: Book as Database
- Problem: Proofing highly variable content page-by-page was inefficient and missed minor inconsistencies.
- Solution: I built a tool that allowed users to extract and proof all instances of any element across an entire book.
When working with thousands of books, it’s critical that common HTML patterns are flexible enough to handle varying lengths, content types, and responsive layout combinations without breaking. Traditional, linear proofing techniques won’t reliably find small inconsistencies in elements that appear only occasionally. But if we conceive of a book not as a stream but as a database, we can extract its content and reshuffle it into new proofing patterns.
Figure 1: […] `figure` element; on the right, the final implementation in Habitat, with additional options to search by text or regular expressions.
I built a working prototype of Fetch (Figure 1, on the left) so that editors could view every instance of a given element from across a book on one page. Searches used standard CSS selectors: `section .callout` returned only those callouts nested in sections, while a broader search like `h2` returned all level 2 headings. Because the Table of Contents file is stored as XML, it was relatively easy to parse and comb through an entire book.
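
The idea of walking the Table of Contents and querying each file can be sketched roughly as follows. This is a minimal illustration, not Fetch's actual code: the TOC layout (`<item src="…"/>`), the file names, the sample markup, and the helper names `select` and `fetch` are all assumptions made for the example, and the matcher only handles tag names, classes, and the descendant combinator.

```python
import xml.etree.ElementTree as ET

def parse_compound(part):
    """Split 'aside.callout.wide' into ('aside', {'callout', 'wide'}); tag may be ''."""
    tag, *classes = part.split(".")
    return tag, set(classes)

def matches(el, compound):
    """True if the element has the required tag (if any) and all required classes."""
    tag, classes = compound
    local_name = el.tag.rsplit("}", 1)[-1]  # ignore an XML namespace prefix, if present
    if tag and local_name != tag:
        return False
    return classes <= set(el.get("class", "").split())

def select(root, selector):
    """Return elements matching a descendant selector such as 'section .callout'."""
    chain = [parse_compound(p) for p in selector.split()]
    found = []
    if not chain:
        return found

    def walk(el, matched):
        # matched = number of leading selector parts already satisfied by ancestors
        if matched == len(chain) - 1 and matches(el, chain[-1]):
            found.append(el)
        # el itself may act as the next required ancestor for its children
        if matched < len(chain) - 1 and matches(el, chain[matched]):
            matched += 1
        for child in el:
            walk(child, matched)

    walk(root, 0)
    return found

def fetch(toc_xml, chapters, selector):
    """Collect (file, text) for every match, visiting files in TOC order."""
    results = []
    for item in ET.fromstring(toc_xml).iter("item"):
        src = item.get("src")
        root = ET.fromstring(chapters[src])
        for el in select(root, selector):
            results.append((src, "".join(el.itertext()).strip()))
    return results

# Hypothetical book: a TOC listing two files, held in memory for the example.
TOC = '<toc><item src="ch1.xhtml"/><item src="ch2.xhtml"/></toc>'
CHAPTERS = {
    "ch1.xhtml": '<body><section><h2>Intro</h2>'
                 '<div class="callout">Tip A</div></section></body>',
    "ch2.xhtml": '<body><div class="callout">Loose tip</div><section><h2>Setup</h2>'
                 '<aside class="callout wide">Tip B</aside></section></body>',
}

for src, text in fetch(TOC, CHAPTERS, "section .callout"):
    print(src, text)
```

Because every match comes back tagged with its source file, the results can be laid out on a single page for side-by-side proofing, with each item traceable to where it appears in the book.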
These results allowed for focused, horizontal proofing and helped unearth typos, design deviations, and weaknesses in the HTML and CSS. Fetch proved useful for authors and designers alike and was rolled into Habitat as a primary tool.