“Tags” sounds like HTML, doesn’t it? What’s that got to do with PDF?
PDF was invented to provide an electronic equivalent of paper. As such, it’s perfectly natural for PDF files to contain characters and images located at such-and-such coordinates on a page – and that’s it. No concept of paragraphs, sentences or even words. The fact that letters happen to line up on the page in such a way as to form meaningful words, lines and so on as a function of X and Y coordinates is purely in the eye of the reader. If they are using eyes to read, that is.
The organization of characters on the page into meaningful content is something most users, disabled and otherwise, take for granted. Computers, however, take nothing for granted. Words, lines, paragraphs (and so on) occur as a combination of basic assumptions and explicit semantic models.
From an accessibility point of view, the more that can be assumed, the better, and some forms of content include more inherent assumptions than others.
With HTML, for example, there’s an implicit assumption that text inside a <p> tag is in correct linear sequence, that words are delineated with spaces, sentences end with periods, and so on.
With PDF, you can’t assume anything of the sort. Characters on the page are just that: characters on a page. That’s what Add Tags has to start with.
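To make that concrete, here is a minimal Python sketch. The Glyph type and the sample coordinates are hypothetical, invented purely for illustration; this is not a real PDF library or Acrobat’s internal model. It shows that without tags a program has nothing but positioned characters, and that the order in which they were drawn need not be the order in which they are read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One character placed on a page (hypothetical model, not a PDF library type)."""
    char: str
    x: float  # horizontal position, in points
    y: float  # vertical position, in points


# The word "Tags", stored in an arbitrary drawing order.
page = [
    Glyph("g", 120.0, 700.0),
    Glyph("T", 100.0, 700.0),
    Glyph("s", 128.0, 700.0),
    Glyph("a", 110.0, 700.0),
]


def in_drawing_order(glyphs):
    """Concatenate characters exactly as they are stored."""
    return "".join(g.char for g in glyphs)


def left_to_right(glyphs):
    """Guess the reading order by sorting on the x coordinate."""
    return "".join(g.char for g in sorted(glyphs, key=lambda g: g.x))


print(in_drawing_order(page))  # gTsa
print(left_to_right(page))     # Tags
```

Even this tiny case needs a guess (sort by X) before a word appears, and that guess only holds for a single line of left-to-right text.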
The Add Tags Magic
If you’ve started to think about making PDF files accessible and/or Section 508 compliant, you’ve probably run into the “Add Tags” feature, currently available for PDFs only in Adobe Acrobat Professional.
It’s vital to understand the capabilities and limitations of this powerful feature in order to get good results from your attempts to improve accessibility. First things first: Add Tags assumes that your PDF isn’t tagged at all. If you want to run Add Tags on a file that already has tags, you’ll need to delete them first. Be sure to make a backup before you do!
Fundamentally, Add Tags does two things:
First, Add Tags examines your PDF file in an attempt to discern the correct organization of the content at the level of words and lines as well as the order in which they appear. To do this, Add Tags has to examine the position of every character and image on the page in context, then make a series of educated guesses. You can think of this step as “structuring”.
Second, Add Tags takes these words and lines and attempts to discern the “tags” that should be present to denote headings, paragraphs, tables, rows, columns, page-headers, line-items and so on.
But it’s still just guesswork. On many complex pages (and on many simple ones that have other problems), Add Tags gets it very, very wrong. Reading order can be out of whack, and Add Tags may see tables where there are none, or miss rows or columns.
The results have to be checked. And corrected. Period.
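To see why the “structuring” step is guesswork, consider the toy heuristic below, written in Python. It is a sketch under stated assumptions, not Acrobat’s actual algorithm: the function names, the word_gap threshold and the sample pages are all hypothetical. It groups characters into lines by their Y coordinate and into words by the horizontal gap between them, which works on a single column and goes wrong on two.

```python
from collections import defaultdict


def place(text, x0, y, advance=5.0):
    """Lay out a string one glyph every `advance` points.

    Spaces produce no glyph at all; they only leave a wider gap.
    """
    return [(c, x0 + i * advance, y) for i, c in enumerate(text) if c != " "]


def group_into_lines(glyphs, word_gap=6.0):
    """Guess lines and words from (char, x, y) tuples.

    Assumptions (illustrative only): glyphs sharing a rounded y form one
    line, and a horizontal gap wider than `word_gap` starts a new word.
    """
    rows = defaultdict(list)
    for char, x, y in glyphs:
        rows[round(y)].append((x, char))

    lines = []
    # In PDF coordinates y grows upward, so a larger y is higher on the page.
    for y in sorted(rows, reverse=True):
        row = sorted(rows[y])
        text = row[0][1]
        for (prev_x, _), (x, char) in zip(row, row[1:]):
            if x - prev_x > word_gap:
                text += " "
            text += char
        lines.append(text)
    return lines


# A single column: the guess works.
single = place("Add Tags", 72, 700) + place("guesses", 72, 688)
print(group_into_lines(single))   # ['Add Tags', 'guesses']

# Two columns side by side: the same guess reads straight across the gutter.
two_col = (
    place("one two", 72, 700) + place("three", 72, 688)   # left column
    + place("red", 300, 700) + place("blue", 300, 688)    # right column
)
print(group_into_lines(two_col))  # ['one two red', 'three blue']
```

A reader would expect “one two”, “three”, “red”, “blue”. The heuristic has no way of knowing that the wide gap is a column gutter rather than a space, which is exactly the kind of reading-order error that has to be checked and corrected by hand.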
Still the Tagger of Last Resort
Quite often, the producer of a PDF file is not the author. In other cases, the author may produce the PDF, but fail to use the correct software or settings to create the document structure and tags that are necessary for compliance with Section 508.
That’s really a shame. Acrobat’s Add Tags feature, for all its magic, should be your last resort. Why?
- As stated above, Add Tags can only make an informed guess as to the structure, reading-order and semantics of your document. On complex layouts, or documents created in certain applications, the results are often completely worthless.
- Add Tags cannot generate alternate text for images, which is a clear requirement for Section 508 compliance.
- Add Tags is a poor substitute for structuring the document correctly in the authoring application; it takes more time and costs more money.