As stated in the Introduction to ISO 19005-1, the primary purpose of PDF/A is to provide:
“… a mechanism for representing [PDF] electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing, or rendering the files.”
PDF/A achieves this by restricting the PDF features a conforming file may use. The point of these limitations is to ensure that the work of displaying PDF/A files remains as simple and unambiguous as possible. As such, the Standard is limited to file-format and reader requirements bearing on accurate rendering to screen and print. While permitting digital signatures, the first iteration of PDF/A does not address the topic of authenticity at all. The forthcoming update, ISO 19005-2, states explicitly that authenticity is simply out of scope for PDF/A.
ISO standards contain a lot of dry technical language about what shall or shall not be present in a conforming PDF/A file. The Standard's precise role in the document life-cycle, and the manner in which software should interact with PDF/A documents, go unstated.
That is as it should be. International Standards are not prescriptive in that sense; they don't offer specific guidance or best practices when it comes to writing software. Most of PDF/A is concerned with the format of the PDF itself; there are very few rules for so-called “conforming readers”. The juicy questions are left untouched:
- What are the uses of PDF/A documents?
- What should a user opening a PDF/A-flagged document understand about it?
- Does the Standard imply that PDF/A documents are or should be read-only?
- What’s the right behavior for PDF/A-aware software outside of the strict conformance requirements?
These are the questions I’ll try to answer here.
What is PDF/A for?
There are literally hundreds of different pieces of software that can create PDF files, and quality varies. PDF/A was intended to specify a subset of PDF that was as reliable as PDF could be – as close to electronic microfilm as possible.
The original impetus for an archival version of PDF came from the U.S. Federal Courts, quickly joined by the Library of Congress and the National Archives and Records Administration, all of whom were concerned about the cost of maintaining collections of non-standard PDFs.
Records managers and archivists obsess over reliability for good reason. There are tens of millions of bad, ‘odd’ and environment-dependent PDFs in the world, files that don’t meet the basic promise of PDF to look the same in all places and on all systems.
Since the entire point of PDF/A is to guarantee that documents render correctly on request, the obvious and pre-eminent use case for PDF/A is to assess a document’s rendering quality and longevity before it enters formal public or private records.
But what else might users legitimately want to do with files that have been successfully converted to PDF/A sometime in the past?
Parking a PDF in the corporate archive or submitting it to a court doesn’t necessarily mean the document has come to the end of its working life. An hour or a year or a decade later, an authorized user might need to…
- Forward the document to another organization, for whom the inbound document is “live”.
- Extract pages in order to share only a portion of the document.
- Add comments, or mark up for redaction prior to release.
- Add Bates numbers for litigation purposes.
- Add watermarks indicating dates, classification status or other information.
- Digitally sign the PDF to ensure authenticity from that point forward.
- Collate the file together with other PDFs.
- Replace, insert or delete placeholder, damaged, rotated or “problem” pages.
PDF/A permits all of these activities and more. PDF/A doesn’t mean out-of-circulation, it means “suited for archival purposes”. There’s a big difference.
Don’t Confuse Standards with Policies
Organization policy and procedure dictate user rights and responsibilities for managing and (when necessary) manipulating documents. These aren’t the sorts of questions one defers to a choice of file-format.
Editing (or marking up) a PDF/A file may be perfectly reasonable, desirable and authorized as a function of circumstance. Whether or not a change in a document can or should lead to a change in a WORM archive is up to the user, their employer, the permissions architecture and the nature of the storage media. It’s out of scope for PDF/A, as 19005-2 makes clear.
In the real world, records will be updated and files will be stamped. Many non-Adobe applications will edit a PDF/A file but leave the PDF/A flag in place. There’s nothing in ISO 19005 to suggest these are impermissible processes (although leaving the PDF/A flag in place after an edit and without re-validation is really bad form). Since encryption is prohibited and digital signatures are optional, you can’t trust that an unsigned PDF/A-flagged document still conforms to PDF/A without first re-running PDF/A validation.
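The “PDF/A flag” is a claim recorded in the document’s XMP metadata, in the pdfaid:part and pdfaid:conformance properties. The following is a minimal sketch of how software might detect that claim, assuming the XMP packet is stored uncompressed in the file; the function name pdfa_claim is hypothetical. It is a naive byte scan rather than a real XMP parser, and detecting the flag says nothing about whether the file actually conforms; that still takes a validator.

```python
import re
from typing import Optional, Tuple

# Match either the element form <pdfaid:part>1</pdfaid:part>
# or the attribute form pdfaid:part="1" found in XMP packets.
_PART = re.compile(rb'pdfaid:part(?:>|\s*=\s*["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|\s*=\s*["\'])\s*([A-Za-z])')


def pdfa_claim(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (part, conformance) if the file *claims* PDF/A, else None.

    Hypothetical helper: a claim is not conformance, and this naive scan
    assumes the XMP metadata is not compressed inside the PDF.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode().upper() if conf else "?"
    return int(part.group(1)), level
```

An application could use the result only to say the file “claims” PDF/A, reserving “conforms” for files that pass validation.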
The “philosophy” of PDF/A as expressed in Acrobat 9 is pretty simple, but does not follow my prescription for correct behavior. Adobe’s implementation assumes that the default policy for PDF/A files is “read-only” – except for certain features.
Adobe Acrobat 9 offers “PDF/A Mode” as the default when a PDF/A-flagged file is opened. The interface does not offer advice on how to disable PDF/A Mode (it’s in Preferences > Documents).
While in PDF/A Mode…
- The Document Message Bar indicates “You are now viewing this document in PDF/A Mode.” No information on turning this feature off is offered.
- External hyperlinks are disabled.
- Page-extraction is disabled (no extracting of a specific page-range for use elsewhere).
- Most editing functions are disabled, including page-level editing (no insertion, replacement, deletion or re-sorting of pages).
- It’s not possible to add security to the PDF file.
- PDF/A prohibits encryption; nonetheless, users see a dialog when attempting Document > Reduce File Size.
- Notwithstanding this dialog, PDF/A status cannot be managed from the Document Properties dialog (although that would be a good idea).
- Users can’t add form-fields “…due to security settings”.
- Tags may not be added (ie, the document cannot be made accessible).
- Linearization information is ignored (ie, Fast Web View is disabled).
But some editing is ok…
- Users may add backgrounds and watermarks as well as headers, footers and Bates numbers. Some of these changes may contain arbitrary content, and could completely alter or replace the PDF page.
- Preflight functions are available, many of which can damage or change the PDF.
- Of course, you can still add a digital signature.
- Acrobat plugins (such as Appligent’s Redax) can still operate, even if the result is an edited (ie, redacted) PDF file.
- Other Adobe applications (Photoshop, for example) will open PDF/A files without Acrobat’s restrictions.
Third Party Software
- At this time, most third-party desktop PDF software developers prefer not to acknowledge PDF/A, ignoring both file-format and reader requirements and failing even to present the PDF/A flag. See my article on free PDF viewers.
I’ve made the case that automatically turning off editing features when you see the PDF/A flag is an unwelcome appropriation of user-intent. Now I’ll offer some thoughts outlining the correct behavior for software encountering a PDF/A file.
I’ll stipulate two things before we start:
- I’m discussing software with PDF editing capabilities, not just readers.
- It’s presumed that because the user is in possession of an unencrypted, unsigned PDF, the user has the authority (legally, materially) to edit it.
Five General Principles for PDF/A Implementations
When developing software to manage or manipulate PDF files, software developers should bear in mind certain guidelines when encountering documents with the PDF/A flag set.
First, it should be possible to remove the PDF/A flag. Users should be able to decide if the file is to be represented as PDF/A or not.
Second, if there is to be any option of selection between viewer (or editor) behaviors when encountering a PDF/A file, that option should be readily accessible, not buried deep within Preferences. See the Conclusion for my proposed options.
Third, if a user chooses an action that would make it impossible to re-validate for PDF/A, the application should warn of the consequences to PDF/A status in specific terms. Example: “You cannot use audio-clip annotations in a PDF/A document. Please choose another annotation type, or turn off PDF/A for this document.”
Fourth, an altered document should lose its PDF/A flag when saved unless it is (re)validated.
Fifth, give the user accurate information; don’t get in their way any more than is strictly (and I mean strictly) necessary.
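The first, third and fourth principles translate naturally into an action-time check and a save-time check. This is a minimal sketch under stated assumptions: OpenDocument, pdfa_warning and keep_pdfa_flag_on_save are hypothetical names, not any product’s API, and PDFA_INCOMPATIBLE is an illustrative, non-exhaustive sample of features PDF/A-1 disallows rather than the Standard’s full list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, non-exhaustive sample; a real editor would derive this
# from its PDF/A validator rather than a hard-coded set.
PDFA_INCOMPATIBLE = {"sound_annotation", "movie_annotation", "javascript"}


@dataclass
class OpenDocument:
    # Hypothetical in-memory model of a document open in a PDF editor.
    claims_pdfa: bool           # PDF/A flag present when the file was opened
    edited: bool = False        # any change since the file was opened
    revalidated: bool = False   # passed PDF/A validation after the last edit


def pdfa_warning(action: str) -> Optional[str]:
    """Principle 3: warn in specific terms before an action breaks PDF/A."""
    if action in PDFA_INCOMPATIBLE:
        return (f"You cannot use {action.replace('_', ' ')} in a PDF/A document. "
                "Choose another option, or turn off PDF/A for this document.")
    return None


def keep_pdfa_flag_on_save(doc: OpenDocument, user_removed_flag: bool = False) -> bool:
    """Decide whether the PDF/A flag is written back when the file is saved."""
    if not doc.claims_pdfa or user_removed_flag:
        return False  # Principle 1: the user may remove the flag
    if doc.edited and not doc.revalidated:
        return False  # Principle 4: altered and not revalidated, so drop it
    return True
```

Note that nothing here prevents editing; the sketch only keeps the flag honest, in keeping with the fifth principle.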
You can’t stop someone from editing a PDF/A file; don’t pretend otherwise. If your requirement includes security or authenticity, your options range from physical access control to the tamper detection of digital signatures to providing a read-only option in controlled environments. PDF/A prohibits the use of encryption on the PDF/A file itself, but that doesn’t in any way stop you from delivering a PDF/A file inside an encrypted PDF Portfolio, for example.
The key thing is to understand and respect the limits of ISO 19005. It’s not a policy prescription, it’s a set of file-format and reader requirements. Other developers might change the file but leave the PDF/A flag alone, for example, and your implementation needs to be able to accommodate that.
What should archivists implementing PDF/A understand?
Fundamentally, PDF/A sets technical specifications for rendering of PDF files. The Standard makes no statement regarding authenticity; digital signatures are optional, not required.
If you need to establish authenticity, you could consider storing the PDF’s message digest in a document management system. Other options include signing PDF files as you receive them, or establishing procedures controlling physical access to your PDF/A files. If you need to release documents with a useful statement of integrity, you can consider a digital signature on release to establish that documents were valid as they originated from your archive.
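A minimal sketch of the message-digest approach, assuming Python’s standard hashlib module and SHA-256 (the choice of algorithm is an assumption; none is specified here). The function names are hypothetical. Record the digest when the file enters the archive, store it in the document management system, and recompute it on retrieval:

```python
import hashlib
from pathlib import Path


def pdf_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of the file's bytes, read in chunks so large PDFs need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unchanged_since_ingest(path: Path, recorded_digest: str) -> bool:
    """True only if the file is byte-for-byte identical to the recorded version."""
    return pdf_digest(path) == recorded_digest
```

Any change, including an incremental save that leaves the pages looking identical, alters the digest. The digest detects change but does not identify who made it, and the stored digests must themselves be protected from tampering.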
If a PDF/A file is changed in any way, it should not be considered PDF/A any longer (regardless of the messages in one’s user-interface) until and unless it is revalidated.
What should professionals opening a PDF/A file understand?
PDF/A means that the file will appear the same way on any platform, with any PDF/A conforming viewing software, more-or-less forever.
PDF/A does not mean that a chain-of-evidence or other mechanism in any way assures the document’s authenticity. An unsigned PDF/A file may be changed by any user, at any time, with no warning offered to the next user to open the file.
What should a recipient of a PDF/A file understand?
If you see a message stating that you are looking at a PDF/A file, you should know that you’re dealing with a file that will render correctly. It will look exactly the same 25 years hence, even using a computer you’ve not yet imagined. Perhaps you assumed that all PDF files were that reproducible? Sorry, you were mistaken.
PDF/A establishes confidence that a file appears the same as it did the last time it was changed. Yes, absolutely, a PDF/A file can be changed. PDF/A isn’t about ensuring a file can’t be changed – it’s about ensuring that the pages can be viewed over the long term.
Conclusion
Unlike an ISO standard, I get to offer explicit suggestions to software developers!
PDF/A-aware software should offer three basic options to users who open a PDF file with the PDF/A flag set.
Advisory. The default mode when opening a PDF/A file. This Mode will appeal to general users.
When opening a PDF/A file, the user would be advised (perhaps via a Document Message Bar, or similar) that the file “claims to comply” with PDF/A. The application should not identify the file as PDF/A-conforming until it’s been validated.
If capable, the application should offer validation (with the appropriate warning regarding how long it might take). Editing is permitted; however, changes that violate PDF/A should generate a warning of consequences for future PDF/A conformance.
Without validation, any edit should cause the PDF/A flag to be removed when the file is saved.
In Advisory mode, the viewer may use linearization information, if available in the PDF. After all, this is just “advisory” mode, and Fast Web View is useful for high-performance delivery of large documents.
Strict PDF/A Mode. As Advisory mode, but the application treats all the “should” and “should not” reader-requirement statements as “shall” and “shall not”. As such, it does not use linearization dictionary information, nor does it allow links. This Mode may appeal to archivists.
Strict PDF/A & Read Only Mode. As Strict PDF/A Mode, but the application is read-only in all respects. This Mode may appeal to archivists operating a policy in which read-only software is part of an overall strategy for maintaining document integrity.
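The three proposed modes can be summarized as a small policy table. This is a minimal sketch under stated assumptions: PdfaMode, ViewerPolicy, policy_for and status_message are hypothetical names, and the table encodes only the behaviors described above (editing, links, linearization, and Advisory mode’s “claims to comply” wording).

```python
from dataclasses import dataclass
from enum import Enum


class PdfaMode(Enum):
    ADVISORY = "advisory"                  # default; aimed at general users
    STRICT = "strict"                      # reader "should" treated as "shall"
    STRICT_READ_ONLY = "strict-read-only"  # strict, and no editing at all


@dataclass(frozen=True)
class ViewerPolicy:
    editing_allowed: bool
    links_enabled: bool
    use_linearization: bool  # i.e. Fast Web View


def policy_for(mode: PdfaMode) -> ViewerPolicy:
    """Map each proposed mode to the viewer behaviors it implies."""
    if mode is PdfaMode.ADVISORY:
        return ViewerPolicy(editing_allowed=True, links_enabled=True, use_linearization=True)
    if mode is PdfaMode.STRICT:
        return ViewerPolicy(editing_allowed=True, links_enabled=False, use_linearization=False)
    return ViewerPolicy(editing_allowed=False, links_enabled=False, use_linearization=False)


def status_message(validated: bool) -> str:
    # A flag is only a claim until a validator has confirmed it.
    if validated:
        return "This file conforms to PDF/A."
    return "This file claims to comply with PDF/A; it has not been validated."
```

Keeping the mode a single, visible setting (rather than a buried preference) also follows the second principle.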
More information about PDF/A
PDF/A on Wikipedia
AIIM’s PDF/A Committee [Note: Duff Johnson is a member of the US Technical Advisory Group (TAG) for TC 171 WG 5 (PDF/A)]
The PDF/A Competence Center: www.pdfa.org
Buy: 19005-1 from ISO
By Duff Johnson