Tools Affect Content

November 25, 2003

I keep returning to the concept that software provides an environment for creativity, and that these tools create a context for the content we create within them. What we publish is shaped by the way we publish it.

The canonical example is Microsoft PowerPoint, which is (often wrongly) accused of crapifying anything that's entered into the slideshow program. But I've been thinking a bit about Adobe Acrobat, especially since very little of the content published in Acrobat's PDF format is actually created with the tool. I'd suspect the overwhelming majority of PDFs are created in Word or a similar tool and then output in the format. But there seem to be conventions that have developed among documents created for PDF output, even if different tools were used to author them.

The PDFs which prompted this most recent rumination are, for the most part, excellent. Mark Hurst's Uncle Mark 2004 Gift Guide & Almanac, Mark Pilgrim's Atom API presentation, the Pew Internet Project's Internet consumer report, and the Ad Age Marketing 50 are all well-written, well-designed documents. That fits, since choosing PDF over valid XHTML only makes sense if exact positioning and detailed formatting are the most important concerns for the presentation of a document.

But none of these documents use any features exclusive to PDF that would preclude them being published as HTML. Indeed, features such as the affiliate links in Uncle Mark's guide and the URLs referenced in Mark Pilgrim's API presentation would work better in a browser context than they do in the Acrobat reader. So why PDF? Especially given that the harsh criticisms levelled at the format are largely true. (Granted, these problems are due to Adobe's shoddy implementation of the reader experience while web browsing, but the difference is largely academic since the format can't be separated from the reader used to view 99% of the documents in the format.)

It seems that the PDF format signifies something now, and it's something more than just user inconvenience. In addition to requiring the user to shift mental modes ("I'm seeing something designed as a PDF now, this must be serious information..."), the requirement that a document either be downloaded or viewed in a context that's radically different from standard web pages seems like a subtle assertion of authority by a document's creator. The decision to switch from standard HTML to PDF isn't arbitrary, but it isn't based on technical requirements either. It's based on the value that an author wants to assign to the work, and it benefits from the still-prevalent, though rapidly fading, consensus that print work is somehow more inherently valuable and authoritative than web pages and other online content.

This is evidenced in several ways. Documents which are offered up for a fee are frequently in PDF format, though for unprotected documents there's no reason the content can't be presented as HTML. And even password-protected PDF documents rarely make use of any of the advanced features which theoretically distinguish PDF from HTML. If the goal is to preserve formatting fidelity for the user while providing a good user experience, Macromedia's FlashPaper offers a much more pleasant in-browser experience that doesn't require the document viewer to take over the entire window and chrome from the standard browser toolbar. So the PDF decision is entirely about communicating intent.

The implications of the space which PDF has marked out in the content arena are very interesting. Without support for anything like HTML's IFRAME tag, it's not easy to insert dynamic content into a PDF unless the document is generated at the time it's downloaded, instead of being a static file that can be emailed around or passed along. Since Overture's context-sensitive text ads and Google's AdWords program have redefined so much of the financial model for content that's provided online, this presents a significant impediment to PDF's ongoing dominance of the market for profitable text content online. And FlashPaper's ability to exist within the larger framework of an HTML document means that publishers interested in augmenting a print-like publication format with text ads may well choose to use the newer format, especially given that the Flash plugin has a broader reach than the Acrobat reader and is less resource-intensive.

Going forward, it will be key for the PDF format to embrace users' new expectation of simple publishing functionality, which has become a baseline with the proliferation of weblogs and other lightweight publishing systems. But the most interesting thing to watch will be whether PDFs remain the format that authors use to communicate seriousness of purpose or professionalism.

In the weblog realm, we already see perception of content influenced by the tools used to publish it, and with the ability to generate well-formatted, device-independent documents through simple tools publishing standard CSS and XHTML finally reaching maturity, it's possible we'll see a return to traditional markup and the relatively rich experience of contemporary web browsers being chosen as the preferred medium for publishing "serious" information.

3 TrackBacks

Dash on docs: Anil Dash has an essay on the different significances of various types of documentation formats on the web... the main focus is on the decision to deploy in either HTML or PDF, but he also goes into...

Anil Dash has posted a short essay about how our choice of the tools we use to present information shapes the reader's expectations. Why exactly do so many people like receiving information as a PDF file instead of a web page? It seems that the PDF form...

Adobe's Portable Document Format is so advanced it makes you wonder why anyone bothers with primitive HTML. It's a completely vector-based layout, display and resolution independent. You sacrifice almost nothing compared to traditional book and magazi...

19 Comments

Actually, I designed the slides in PowerPoint, because the conference planners said they would only accept materials in PowerPoint or PDF, and I took them at their word. Turns out it was just boilerplate that everyone ignores, and all the cool people use something like SlideML to create their slides and publish them as XHTML. Which is ironic, since I spent most of my time copying and pasting XHTML markup from my draft into PowerPoint and then fixing PowerPoint's braindead defaults. And then I discovered that PowerPoint's HTML export is so bad that I just couldn't bring myself to foist it upon the world. PDF export (via OS X's print dialog) was the only vaguely acceptable solution. I'll know better next time. Never trust a format you can't edit in Emacs or vi.

To me the limitations of FlashPaper are far greater than the benefits (http://www.macromedia.com/software/contribute/productinfo/flashpaper/#item04)

I constantly copy and paste content and reuse content in reviews, summaries, weblogs, documentation, recommendations, etc. FlashPaper does not allow this or any other information reuse techniques. It is hell to print also. It is a step into the analog past for all those luddites who only want their words read and mistyped when quoted.

I am no fan of PDF, I find them horrible for reading text and really only good for printing out the document (okay it does have word searching and I can most often copy text out), but it is horrible for screen reading.

Part of the problem is the electronic medium is being used as if it were paper. Information designers and information producers have not learned to adapt to the electronic medium. I find this over and over at work (equally in the public and private sectors): documents are developed as if for print output, but never printed. The information is placed on the Web, but in horrible formats that are painful for users to consume.

PDFs succeed (despite the bloated reader and expensive creation tools) because they easily map to a mental model for the user. This is a "document." Different from a Word file in that the creator doesn't intend for you to edit the file. Different from a web page because it's not intended to produce dynamic content. (And you can argue best practices until you're blue in the face, but users know from experience that URLs decay faster than old bananas, and thus don't trust them to last very long.) And it's impossible to overestimate the reinforcing power of email. PDFs get emailed around. And around. And around. Like "documents."

(Speaking of which -- have you tried lately to send someone a snapshot of a web page? Especially a web page the user doesn't have access to? You could screenshot it (and stitch any scrolling content together); you could "send page by email," but that's unreliable at best; or you could print to a PDF and send them the doc. It's guaranteed to work cross-platform (unlike Microsoft's .mht archive format), it's archivable, forward-able, comment-able, copy-and-paste-able, DRM-able, even.)

I too am kind of surprised how little of PDF functionality is used despite the popularity. Even though the PDF writer software isn't an editing environment and is more of a one-stop-shop for adding PDF tricks to a document created elsewhere, I'm surprised at how few times a URL in a PDF is clickable, when creating a clickable URL is so easy to do.

With the hyperlinking possibilities in PDF, you can give talks with it. I gave a presentation back in '98 using PDF (whenever 3.0 first came out) with a small "next" in the lower right of each slide that jumped to the next slide. I built my slides in PageMaker at the time, so I had full control over the look, though it was a lot of work given that a print software package isn't totally geared towards presentations.

I've seen even fewer ingenious PDF uses in business. I know the Adaptive Path folks use it to mock up wireframes, complete with working forms that link one mockup page to another, and they can test interfaces quickly in PDF. Are there other businesses pushing the boundaries of PDF?

I look forward to seeing FlashPaper in action. I'm just as annoyed as anyone else when I hit a PDF link unexpectedly and have to wait for my multi-gigahertz computer to lock up for a minute while the PDF browser plugin loads, but the lack of copy/paste and adequate printing tools will certainly hold back its potential.

Michael nails it--PDFs are popular because of their print mechanics. A designer (often on a Mac) knows he can deliver a document that looks identical on all machines (often PCs) and, more importantly, will print the same way wherever it ends up. The ready-to-print nature of the PDF makes it a quick solution for materials that need to be distributed "just so." Most of the items mentioned in the article are too heavily designed to be shared in, say, Word, which can morph the layout, or HTML, which changes the user experience from reading to interaction. Sometimes a readable, printable document is appropriate.

At the website openDemocracy.net we make PDF versions of all our HTML articles available to subscribers. Not only are we trying to signal "this article is important, worth keeping, worth printing", but it has also proved to be a handy way of driving up subscriptions to the site. Some articles are long, complex, and much easier to read offline. And what looks good on the Web, doesn't always look so hot on paper. Who wants to look at a nav bar in the margin while they are reading? Another thing we've started to offer is PDFs of a whole week's worth of articles on the site. We call it a compendium. Might also be compensation for those who wish they could get us in the post and on paper like "real" magazines. We're pretty serious about our content, so you must be onto something...

I'd agree that PDF works best for documents that you want to print...and I agree as well with just about all the "harsh criticisms" that Anil linked to above.

Solana's comment reminded me of Open Letters magazine (now, alas, defunct), which used to put together a very nice weekly .PDF of all of the week's content. (They're still available online, if you want to check out some really good writing.)

Give it time. It's taken a long, long time to get away from the print-centric view we started with (and we've made precious little progress). In the meantime, however, there are a few things to keep in mind:
1) We definitely need the permanence that PDFs offer. Get your permalinks set up, and get 'em set up for good.
2) Good, clean printability is a huge thing. Fortunately, we have print style sheets for that. Use them!

One of the important premises with PDF is the (already mentioned) document metaphor. A PDF document (we rarely talk about a "file") is self-contained ... just as one would expect from a document.

The PDF document ensures integrity of contents, which is IMHO the most important feature. This is achieved by putting all the needed resources into the document. The printability is simply a consequence of this ... although Adobe originally stressed the "appearance" factor much more, probably because that was easier to explain.

I am always surprised by the references to the Nielsen articles. However, that mentioned article is not the first one, as Nielsen published another one two years ago. At that time, his conclusion was that anything newer than four years is not usable at all for public use... Nielsen's material has been extensively discussed on PlanetPDF. You can also use their search engine with the keyword "Nielsen" and get the whole set of articles and discussions. One must also note that Nielsen bases his remarks on very good bad examples, and his allegations are mainly based on a lack of knowledge.

It is indeed true that not every PDF out there makes use of the technology at its best. It is also true that there are many PDFs out there which are too fat and too clumsy. The reason for this is mainly in a lack of knowledge of the format. The primary tool to create PDFs, Acrobat Distiller, is a very powerful tool with a lot of possibilities to control the result. And, as there are many "dials" to turn, chances for sub-optimal results are high.

The above-mentioned document metaphor is the main argument for choosing PDF over ?ML. HTML does not know the document concept, particularly when it comes to integrity. There is no way for the end user to have the assurance that an HTML page is exactly what it is intended to be, and the same is also valid for the issuer of the page, that the end user gets exactly what is intended. A document provides the means to validate its authenticity.

PDFs do suck, but as others mentioned, they keep everything together. How many times have you googled to some old HTML page and none of the image links are valid? If it was a PDF, they would have to be there. The same goes for PowerPoint... as much as I hate it, it keeps everything together, no matter how many images or graphs you have embedded in your slideshow. A new document format with the flexibility of HTML but without the distributed embedded content is needed. .pdf + .swf are close but no cigar.

The interesting thing is that later versions of the PDF file format are capable of tremendous underlying markup complexity. Did you know that PDFs can contain XML?

So let's say you the author know how to make valid Web sites and have made an intelligent choice to use PDF. (I do both.) My issue is how difficult it is to learn how to manipulate the myriad advanced features of PDF. As an example, I am one of the very few people who knows how to make a tagged accessible PDF (also based on XML), and there's almost no documentation whatsoever about it. It becomes a kind of oral history.

It seems, then, there is no "middle class" when it comes to PDF creation. Either you're one of the po' folk who exports tacky MS Word documents to PDF because it's a single-click process... or you're a nouveau riche super-user who has somehow hacked and reverse-engineered the PDF-creation mechanism to produce something advanced. Who's in the middle? Nobody.

And advanced PDFs? Jeremy Tankard's typeface-ordering form, which I now cannot find on his site, was quite something.

seems anil wants to get some commission for promoting FlashPaper. but without copy and paste it's rather useless.

To Joe Clark's comment about a "middle class" PDF creation tool.

The question is, of course, what one can understand with "middle class". If it is about pricing, there are a few products which are able to create viable PDF, such as the JAWS PDFCreator from Global Graphics.

If it is for a "limited set of contents/capabilities", it is highly advisable to reconsider. The intended use of the PDF might play some role, but as soon as publishing becomes an issue, there is no way around Distiller (unless we talk about InDesign, Adobe Illustrator or Adobe Photoshop in their newest incarnations, which have their own quality PDF generation functionality).

Dammit, Franz caught me! My nefarious plan to get rich by promoting a Flash-based document format is revealed!

One very good use for PDF is to "revive" old books. A good example would be the works of Mises. Somehow it's good to have the look of the book plus the ability to navigate the pages. My guess is that it would be a slightly more difficult proposition to do it in HTML. Check it out at: http://www.mises.org/scholar.asp

An interesting article but it seems to have a "Format X" vs. "Format Y" approach. I think the intent of the information and the needs of the client drive the format. PDF, HTML, and SWF all have their strengths and weaknesses (even if we did not hear much about SWF and HTML limitations). I think the real answer is that content authors will need to grow more savvy in the tools available and their proper use.

As a PDF creator I found a number of arguable statements:
- "But none of these documents use any features exclusive to PDF that would preclude them being published as HTML."
These documents use embedded fonts, can be authored in any application and converted to PDF, and the layout of many of the pages could not be reproduced with CSS2.
- "Especially given that the harsh criticisms levelled at the format are largely true."
These criticisms are targeted mostly at how the author chose to implement PDF not limitations of PDF.
- "the document viewer to take over the entire window and chrome from the standard browser toolbar."
PDFs can open full-screen in browsers to eliminate viewing the UI, and can even be embedded in a web page to be part of the HTML content.
- "since very little of the content published in Acrobat's PDF format is actually created with the tool."
Acrobat does not have a FILE->NEW command. It is not an authoring application. The intent is that authors can use whatever content authoring tools they like and still produce PDFs that can be reliably rendered and printed across many platforms.

The debate goes on...

My favourite PDF creator: LaTeX ( I frequently use Windows, so WinEdt is the preferred editor) - documents are rendered beautifully as pdf (or ps or anything you choose), and it's so easy to edit the documents on any platform (including Linux for which there are excellent LaTeX editors)
