
Sunday, 13 July 2025

What Is PDF? History, Structure, and Why It Still Matters in 2025

Posted by

Huma Raisha

At Moainex Taskspace, we deal with PDFs daily. Whether we are converting, encrypting, or viewing them, understanding the inner workings of the format is core to how we design secure and efficient features.

Introduction


Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents — including text formatting and images — in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it.

PDF has its roots in “The Camelot Project” initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008.

PDF Encryption and Digital Signatures

A PDF file may be encrypted for security — in which case, a password is required to view or edit its contents. PDF 2.0 defines 256-bit AES encryption as the standard. The PDF Reference also outlines methods for third parties to implement their own encryption systems.
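As a rough illustration of how a tool might notice encryption before asking for a password, here is a minimal sketch. It relies on the fact that an encrypted PDF carries an /Encrypt entry in its trailer dictionary. The function name `looks_encrypted` is hypothetical, and the byte scan is only a heuristic: it does not parse the file, so the same bytes appearing inside unrelated data would produce a false positive.

```python
import re

# /Encrypt is followed by an indirect reference ("12 0 R") or an inline dictionary ("<<").
_ENCRYPT_KEY = re.compile(rb"/Encrypt\s*(\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic check: does the file appear to declare an encryption dictionary?"""
    return _ENCRYPT_KEY.search(pdf_bytes) is not None


if __name__ == "__main__":
    sample = b"trailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\n"
    print(looks_encrypted(sample))
```

A production reader would locate and parse the actual trailer instead of scanning raw bytes.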

PDF files can also be digitally signed to provide secure authentication. Complete details on implementing digital signatures in PDFs are provided in ISO 32000-2.

History and Evolution of PDF

The development of PDF began in 1991 when John Warnock wrote a paper for a project then code-named Camelot, in which he proposed the creation of a simplified version of PostScript called Interchange PostScript (IPS). Unlike traditional PostScript, which was tightly focused on rendering print jobs to output devices, IPS would be optimized for displaying pages to any screen and on any platform.

Adobe Systems made the PDF specification available free of charge in 1993. In the early years, PDF was popular mainly in desktop publishing workflows, and competed with several other formats, including DjVu, Envoy, Common Ground Digital Paper, Farallon Replica, and even Adobe's own PostScript format.

PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extensions for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, and their specification is published only on Adobe's website. Many of them are not supported by popular third-party implementations of PDF.

ISO published version 2.0 of PDF, ISO 32000-2, in 2017 (available for purchase), replacing the free specification provided by Adobe. In December 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, with clarifications, corrections, and critical updates to normative references. PDF 2.0 does not include any proprietary technologies as normative references. In April 2023, the PDF Association made ISO 32000-2 available for download free of charge.

How PDF Files Work (Technical Details)

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

Typeset text stored as content streams (i.e., not encoded in plain text)
Vector graphics for illustrations and designs that consist of shapes and lines
Raster graphics for photographs and other types of images
Other multimedia objects

In later PDF revisions, a PDF document can also support links (inside the document or to web pages), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded content that can be handled using plug-ins.

PDF’s Core Technologies Explained

PDF combines three technologies:

An equivalent subset of the PostScript page description programming language (in declarative form) for generating the layout and graphics
A font-embedding/replacement system to allow fonts to travel with the documents
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate

PostScript and Its Role in PDF

PostScript is a page description language run in an interpreter to generate an image. It can handle graphics and has standard features of programming languages such as branching and looping. PDF is a subset of PostScript, simplified to remove such control flow features, while graphics commands remain.

PostScript was originally designed for a drastically different use case: transmission of one-way linear print jobs, in which the PostScript interpreter would collect a series of commands until it encountered the showpage command, then execute all the commands to render a page as a raster image to a printing device. PostScript was not intended for long-term storage and real-time interactive rendering of electronic documents to computer monitors. So there was no need to support anything other than consecutive rendering of pages.

If there was an error in the final printed output, the user would correct it at the application level and send a new print job in the form of an entirely new PostScript file. Thus, any given page in a PostScript file could be accurately rendered only as the cumulative result of executing all preceding commands to draw all previous pages—any of which could affect subsequent pages—plus the commands to draw that particular page. There was no easy way to bypass that process to skip around to different pages.

PDF enforces the rule that the code for any particular page cannot affect any other pages. That rule is strongly recommended for PostScript too but has to be implemented explicitly (see, e.g., the Document Structuring Conventions), as PostScript is a full programming language. All data required for rendering is included within the file itself, improving portability.

The disadvantages of this approach are:

A loss of flexibility and limitation to a single use case
A (sometimes much) larger file size

PDF File Structure and Format

A PDF file is organized using ASCII characters, except for certain elements that may have binary content. The file starts with a header containing a magic number (as a readable string) and the version of the format (e.g., %PDF-1.7). The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, of which there are nine types:

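Boolean values (true or false)
Real numbers
Integers
Strings (enclosed in parentheses or hexadecimal in angle brackets)
Names (starting with a forward slash)
Arrays (ordered collections of objects enclosed in square brackets)
Dictionaries (collections of objects indexed by names, enclosed in double angle brackets)
Streams (usually containing large amounts of optionally compressed binary data)
The null object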
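As a small illustration of the header described above, the following sketch reads the version from the `%PDF-x.y` marker. The function name `pdf_version` is hypothetical, and searching the first 1024 bytes (rather than requiring the marker at byte 0) is an assumption modeled on how lenient readers commonly behave.

```python
import re

_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the %PDF-x.y header near the start of the file."""
    match = _HEADER.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF-x.y header found")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(pdf_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))
```

Note that the header version is only a starting point: a document catalog may declare a later version than the header does.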

The cross-reference table, located near the end of the file, gives the byte offset of each indirect object from the start of the file. This allows efficient random access and small updates without rewriting the whole file (incremental update).
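To make the byte-offset idea concrete, here is a minimal sketch that builds a tiny PDF in memory and then reads its classic cross-reference table back. The helper names `build_minimal_pdf` and `read_xref_offsets` are hypothetical, and the reader assumes a single xref subsection with no cross-reference streams or incremental updates, so it is not a general parser.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF, recording each object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # where "N 0 obj" begins
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


def read_xref_offsets(data: bytes) -> dict[int, int]:
    """Map object number -> byte offset for in-use entries of a classic xref table."""
    marker = data.rfind(b"startxref")  # the last one wins
    if marker < 0:
        raise ValueError("startxref not found")
    start = int(data[marker + len(b"startxref"):].split()[0])
    lines = data[start:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("classic xref table expected")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":  # 'n' = in use, 'f' = free
            offsets[first + i] = int(offset)
    return offsets


if __name__ == "__main__":
    pdf = build_minimal_pdf()
    for number, offset in read_xref_offsets(pdf).items():
        print(number, offset, pdf[offset:offset + 7])
```

Because every object can be reached by seeking to its offset, a reader never has to scan the whole file, and an editor can append new objects plus a new table at the end without rewriting what came before.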

PDF 1.5 introduced cross-reference streams in binary format. A hybrid-reference PDF may include both traditional and binary formats for compatibility.

PDF Imaging Model (Graphics Rendering)

PDF graphics use a device-independent Cartesian coordinate system. It supports transformations (scale, rotate, skew) using a transformation matrix.
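The transformation matrix can be sketched in a few lines. PDF writes a matrix as six numbers `[a b c d e f]` and maps a point with x' = a·x + c·y + e and y' = b·x + d·y + f. The helper names below (`translate`, `scale`, `rotate`, `then`, `apply`) are illustrative choices, not part of any PDF library.

```python
import math

Matrix = tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def translate(tx: float, ty: float) -> Matrix:
    return (1, 0, 0, 1, tx, ty)


def scale(sx: float, sy: float) -> Matrix:
    return (sx, 0, 0, sy, 0, 0)


def rotate(theta: float) -> Matrix:
    """Counterclockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s, -s, c, 0, 0)


def then(m1: Matrix, m2: Matrix) -> Matrix:
    """Matrix that applies m1 first, then m2 (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    m = then(scale(2, 2), translate(10, 5))
    print(apply(m, 1, 1))
```

Order matters: scaling and then translating gives a different result from translating and then scaling, which is why content streams that concatenate matrices have to be read in sequence.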

The graphics state (25 properties as of PDF 2.0) includes:

Current transformation matrix (CTM)
Clipping path
Color space
Transparency
Black point compensation

Vector Graphics in PDF

PDF uses paths (lines, Bézier curves, text outlines) that can be stroked, filled, clipped, or used for patterns.

Raster Images (Image XObjects)

Raster images in PDF, called Image XObjects, consist of:

A dictionary of image properties
A stream of raw data (often compressed)

Filters that can be applied to the stream data include:

ASCII85Decode
ASCIIHexDecode
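Both filters listed above turn ASCII text back into binary data, and both can be sketched with the Python standard library. The function names are hypothetical. The sketch assumes the input is the raw filtered stream body, where `>` ends hexadecimal data and `~>` ends ASCII85 data.

```python
import base64


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, ignorable whitespace, '>' as end marker."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85Decode data, dropping the '~>' end-of-data marker if present."""
    body = data.strip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)  # whitespace inside the body is ignored by default


if __name__ == "__main__":
    print(ascii_hex_decode(b"48 65 6C6C 6F>"))
    print(ascii85_decode(base64.a85encode(b"Hello, PDF") + b"~>"))
```

These two filters only change the representation. Compression is done by other filters, which can be chained with them on the same stream.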

How Text Is Stored in PDFs

Text is drawn using text elements in content streams. Fonts can be:

Embedded (e.g., TrueType, OpenType, Type 1, Type 3)
Unembedded (system fonts)
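To show what a text element looks like inside a content stream, here is a minimal sketch that emits one. `BT` and `ET` delimit the text object, `Tf` selects a font and size, `Td` positions the text, and `Tj` shows a string. The function names are hypothetical, and `F1` is an assumed resource name that would have to be defined in the page's font resources. The sketch also assumes simple Latin text that the font's encoding can represent directly.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a literal (parenthesized) PDF string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_run(text: str, font: str = "F1", size: int = 12, x: int = 72, y: int = 720) -> str:
    """Return content-stream operators that draw one line of text."""
    return "\n".join([
        "BT",                                # begin text object
        f"/{font} {size} Tf",                # select font resource and size
        f"{x} {y} Td",                       # move to the start position
        f"({escape_pdf_string(text)}) Tj",   # show the string
        "ET",                                # end text object
    ])


if __name__ == "__main__":
    print(text_run("Hello (PDF)"))
```

The string holds character codes, not necessarily readable text, which is one reason text extraction depends on the font's encoding information.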

Tagged PDF and Accessibility

Logical Structure and Accessibility

Tagged PDFs (ISO 32000 clause 14.8) allow reliable text extraction and screen reader compatibility. PDF/UA (ISO 14289) is the compliance standard for accessible PDF, and PDF/A (ISO 19005) is the standard for long-term archiving.

Layers in PDF (Optional Content Groups)

Optional content groups, commonly called layers, were introduced in PDF 1.5. The viewer can toggle them on and off, which is useful for CAD drawings, multi-language documents, and maps.

PDF Metadata and Document Info

PDF supports two types of metadata:

Document Information Dictionary (deprecated in PDF 2.0)
Metadata streams using the Extensible Metadata Platform (XMP), itself an ISO standard (ISO 16684)

PDF 2.0 allows metadata to be attached to any object (e.g., fonts, images, catalog).

Embedding Multimedia in PDFs

Rich Media PDFs can contain:

Audio, video
Buttons and product previews (e.g., digital catalogs)
Embedded or linked media

PDF Forms: AcroForms vs XFA

PDF supports two types of interactive forms:

AcroForms (introduced in PDF 1.2): text boxes, buttons, JavaScript, and so on
XFA forms (introduced in PDF 1.5, deprecated in PDF 2.0)

PDF Security Issues and Vulnerabilities

PDF security issues have surfaced repeatedly over the years, including several attacks that researchers disclosed between 2019 and 2021:

Exfiltration of plaintext
Shadow attacks (misusing flexibility in the spec)
Malware (e.g., Peachy worm in 2001)
Hidden scripts in embedded PDFs
Browser-triggered PDF exploits
Multiple vulnerabilities in Adobe Reader and others
Arbitrary code execution via malicious attachments

PDF Viewing and Editing Software

Creation & Viewing:

Built-in PDF printing on macOS, Linux, and Windows, plus generators such as pdfTeX, Ghostscript, and DocBook tools
LibreOffice, Microsoft Word (2007 SP2 and later), Scribus, WordPerfect

Editing:

Adobe Acrobat
Foxit Reader (Windows, macOS, Linux)
PDFedit (free software under the GNU GPL)
Web-based annotation tools

Native Display Model in macOS

The Quartz graphics layer in macOS is based on the PDF imaging model (sometimes called Display PDF). The Preview app and Safari 2.0 and later render PDFs natively.

Mac OS X 10.0 through 10.3 saved screenshots as PDF; later versions save them as PNG.

PDF in Printing: RIPs, Prepress, and Workflow Tools

PDF is the standard for print workflows.

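Raster image processors (RIPs) that handle PDF natively include Adobe PDF Print Engine, Jaws, and Harlequin
First RIP to interpret PDF natively: Jaws (1993), with the same capability added to Harlequin in 1997
First prepress workflow system based on PDF: Agfa Apogee (1997)
PDF was widely accepted as the standard print job format at the 2006 Open Source Development Labs Printing Summit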

Annotations and Markups in PDF

PDF annotations allow users to:

Highlight text
Add comments
Draw shapes
Use stamps or attachments

As we continue to build tools that handle sensitive PDF files, this deep understanding of the format helps us protect your data while delivering lightning-fast performance. Learn more about how we handle encryption and rendering here.