Sunday, 13 July 2025
What Is PDF? History, Structure, and Why It Still Matters in 2025

This blog post covers:
- Introduction
- PDF Encryption and Digital Signatures
- History and Evolution of PDF
- How PDF Files Work (Technical Details)
- PDF’s Core Technologies Explained
- PostScript and Its Role in PDF
- PDF File Structure and Format
- PDF Imaging Model (Graphics Rendering)
- Vector Graphics in PDF
- Raster Images (Image XObjects)
- How Text Is Stored in PDFs
- Tagged PDF and Accessibility
- Layers in PDF (Optional Content Groups)
- PDF Metadata and Document Info
- Embedding Multimedia in PDFs
- PDF Forms: AcroForms vs XFA
- PDF Security Issues and Vulnerabilities
- PDF Viewing and Editing Software
- Native Display Model in macOS
- PDF in Printing: RIPs, Prepress, and Workflow Tools
- Annotations and Markups in PDF
At Moainex Taskspace, we deal with PDFs daily. Whether we are converting, encrypting, or viewing them, understanding the inner workings of the format is core to how we design secure and efficient features.
Introduction

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents — including text formatting and images — in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it.
PDF has its roots in “The Camelot Project” initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008.
PDF Encryption and Digital Signatures
A PDF file may be encrypted for security — in which case, a password is required to view or edit its contents. PDF 2.0 defines 256-bit AES encryption as the standard. The PDF Reference also outlines methods for third parties to implement their own encryption systems.
PDF files can also be digitally signed to provide secure authentication. Complete details on implementing digital signatures in PDFs are provided in ISO 32000-2.
History and Evolution of PDF
The development of PDF began in 1991 when John Warnock wrote a paper for a project then code-named Camelot, in which he proposed the creation of a simplified version of PostScript called Interchange PostScript (IPS). Unlike traditional PostScript, which was tightly focused on rendering print jobs to output devices, IPS would be optimized for displaying pages to any screen and on any platform.
Adobe Systems made the PDF specification available free of charge in 1993. In the early years, PDF was popular mainly in desktop publishing workflows, and competed with several other formats, including DjVu, Envoy, Common Ground Digital Paper, Farallon Replica, and even Adobe's own PostScript format.
PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extensions for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, and their specification is published only on Adobe's website. Many of them are not supported by popular third-party implementations of PDF.
ISO published version 2.0 of PDF, ISO 32000-2, in 2017 (available for purchase), replacing the free specification provided by Adobe. In December 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, with clarifications, corrections, and critical updates to normative references. PDF 2.0 does not include any proprietary technologies as normative references. In April 2023, the PDF Association made ISO 32000-2 available for download free of charge.
How PDF Files Work (Technical Details)
A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:
- Typeset text stored as content streams (i.e., not encoded in plain text)
- Vector graphics for illustrations and designs that consist of shapes and lines
- Raster graphics for photographs and other types of images
- Other multimedia objects
In later PDF revisions, a PDF document can also support links (inside the document or to web pages), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded content that can be handled using plug-ins.
PDF’s Core Technologies Explained
PDF combines three technologies:
- An equivalent subset of the PostScript page description programming language (in declarative form) for generating the layout and graphics
- A font-embedding/replacement system to allow fonts to travel with the documents
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate
PostScript and Its Role in PDF
PostScript is a page description language run in an interpreter to generate an image. It can handle graphics and has standard features of programming languages such as branching and looping. PDF is a subset of PostScript, simplified to remove such control flow features, while graphics commands remain.
PostScript was originally designed for a drastically different use case: transmission of one-way linear print jobs, in which the PostScript interpreter would collect a series of commands until it encountered the showpage command, then execute all the commands to render a page as a raster image to a printing device. PostScript was not intended for long-term storage and real-time interactive rendering of electronic documents to computer monitors. So there was no need to support anything other than consecutive rendering of pages.
If there was an error in the final printed output, the user would correct it at the application level and send a new print job in the form of an entirely new PostScript file. Thus, any given page in a PostScript file could be accurately rendered only as the cumulative result of executing all preceding commands to draw all previous pages—any of which could affect subsequent pages—plus the commands to draw that particular page. There was no easy way to bypass that process to skip around to different pages.
PDF enforces the rule that the code for any particular page cannot affect any other pages. That rule is strongly recommended for PostScript too but has to be implemented explicitly (see, e.g., the Document Structuring Conventions), as PostScript is a full programming language. All data required for rendering is included within the file itself, improving portability.
Compared with PostScript, the disadvantages of this self-contained, declarative approach are:
- A loss of flexibility and limitation to a single use case
- A (sometimes much) larger file size
PDF File Structure and Format
A PDF file is organized using ASCII characters, except for certain elements that may have binary content. The file starts with a header containing a magic number (as a readable string) and the version of the format (e.g., %PDF-1.7). The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, of which there are nine types:
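- Boolean values (true or false)
- Real numbers
- Integers
- Strings (enclosed in parentheses or hexadecimal in angle brackets)
- Names (starting with a forward slash)
- Arrays (ordered collections of objects enclosed in square brackets)
- Dictionaries (collections of objects indexed by names, enclosed in double angle brackets)
- Streams (usually containing large amounts of optionally compressed binary data, preceded by a dictionary)
- The null object
Objects may be direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number and are defined between the obj and endobj keywords.
An index table, also called the cross-reference table, is located near the end of the file and gives the byte offset of each indirect object from the start of the file. This allows efficient random access to the objects in the file, and it also allows small changes to be made without rewriting the whole file (incremental update).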
PDF 1.5 introduced cross-reference streams in binary format. A hybrid-reference PDF may include both traditional and binary formats for compatibility.
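To make the header, indirect objects, and classic cross-reference table more concrete, here is a minimal Python sketch. It assembles a tiny one-page PDF in memory, then reads the version and the byte offsets back out. The function names (`build_minimal_pdf`, `pdf_version`, `read_xref`) are our own illustrative choices, and the reader only handles a single classic table, not cross-reference streams or incremental updates.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF, recording each object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.7\n")  # header: magic number plus version
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # offset of "N 0 obj" from the start of the file
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


def pdf_version(pdf: bytes) -> str:
    """Read the version from the %PDF-x.y header."""
    return re.match(rb"%PDF-(\d\.\d)", pdf).group(1).decode("ascii")


def read_xref(pdf: bytes) -> dict:
    """Return {object number: byte offset} from a single classic xref table."""
    # The last 'startxref' keyword is followed by the offset of the table.
    tail = pdf[pdf.rfind(b"startxref"):]
    start = int(tail.split()[1])
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("not a classic cross-reference table")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # 'n' marks an in-use object, 'f' a free one
            table[first + i] = int(offset)
    return table


pdf = build_minimal_pdf()
offsets = read_xref(pdf)
for number, offset in offsets.items():
    # Jumping straight to the recorded offset lands on the object definition.
    assert pdf[offset:].startswith(b"%d 0 obj" % number)
```

Because the table stores absolute byte offsets, a reader can jump directly to any object without parsing the rest of the file, which is what makes random access cheap.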
PDF Imaging Model (Graphics Rendering)
PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. Coordinates can be scaled, rotated, or skewed using a transformation matrix.
The graphics state (25 properties as of PDF 2.0) includes:
- Current transformation matrix (CTM)
- Clipping path
- Color space
- Transparency
- Black point compensation
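The current transformation matrix can be sketched in a few lines of Python. PDF writes a matrix as six numbers `[a b c d e f]`, and a point maps to `x' = a*x + c*y + e`, `y' = b*x + d*y + f`. The helper names below (`translate`, `scale`, `rotate`, `skew`, `concat`, `apply`) are hypothetical and only illustrate the arithmetic.

```python
import math

# A PDF matrix is six numbers in the order a b c d e f.


def translate(tx, ty):
    return (1, 0, 0, 1, tx, ty)


def scale(sx, sy):
    return (sx, 0, 0, sy, 0, 0)


def rotate(degrees):
    """Counterclockwise rotation about the origin."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 0, 0)


def skew(alpha_degrees, beta_degrees):
    """Skew the x axis by alpha and the y axis by beta."""
    return (1, math.tan(math.radians(alpha_degrees)),
            math.tan(math.radians(beta_degrees)), 1, 0, 0)


def concat(first, then):
    """Combine two matrices: apply `first`, then `then`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = then
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(matrix, x, y):
    """Map a point through the matrix."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Scale by (2, 3), then move by (10, 20): the point (1, 1) ends up at (12, 23).
ctm = concat(scale(2, 3), translate(10, 20))
print(apply(ctm, 1, 1))
```

Order matters when combining matrices: scaling and then translating gives a different result from translating and then scaling.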
Vector Graphics in PDF
PDF uses paths (lines, Bézier curves, text outlines) that can be stroked, filled, clipped, or used for patterns.
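As a rough illustration, the sketch below builds the text of a path using the standard path operators (`m` move to, `l` line to, `c` Bézier curve, `h` close path, `S` stroke, `f` fill, `W n` clip). The `PathBuilder` class is a hypothetical helper for this post, not part of any PDF library.

```python
class PathBuilder:
    """Collects PDF path operators; operands come before the operator."""

    def __init__(self):
        self.ops = []

    def move_to(self, x, y):
        self.ops.append(f"{x} {y} m")
        return self

    def line_to(self, x, y):
        self.ops.append(f"{x} {y} l")
        return self

    def curve_to(self, x1, y1, x2, y2, x3, y3):
        # Cubic Bézier: two control points, then the end point.
        self.ops.append(f"{x1} {y1} {x2} {y2} {x3} {y3} c")
        return self

    def close(self):
        self.ops.append("h")
        return self

    def stroke(self):
        self.ops.append("S")
        return self

    def fill(self):
        self.ops.append("f")
        return self

    def clip(self):
        # W marks the path as a clipping path; n ends it without painting.
        self.ops.append("W n")
        return self

    def build(self):
        return "\n".join(self.ops)


triangle = (
    PathBuilder()
    .move_to(50, 50)
    .line_to(150, 50)
    .line_to(100, 140)
    .close()
    .stroke()
    .build()
)
print(triangle)
```

The same path could end in `fill()` or `clip()` instead of `stroke()`: the geometry is described once, and the final operator decides what is done with it.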
Raster Images (Image XObjects)
Raster images in PDF, called Image XObjects, are represented by:
- A dictionary of image properties
- A stream of raw data (often compressed)
Filters used to encode or compress the stream data include:
- ASCII85Decode
- ASCIIHexDecode
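The two filters above are simple enough to sketch with the Python standard library. This is a minimal illustration that assumes well-formed input: `ascii_hex_decode`, `ascii85_decode`, and `decode_stream` are names we made up, and a real reader would also need the compression filters and proper error handling.

```python
import base64


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends it."""
    digits = b"".join(data.split())        # drop all whitespace
    digits = digits.split(b">", 1)[0]      # '>' is the end-of-data marker
    if len(digits) % 2:
        digits += b"0"                     # an odd final digit is padded with 0
    return bytes.fromhex(digits.decode("ascii"))


def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85Decode data: base-85 text, '~>' ends it."""
    body = data.split(b"~>", 1)[0]         # '~>' is the end-of-data marker
    return base64.a85decode(body)          # skips whitespace by default


FILTERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply filters in the order they are listed for the stream."""
    for name in filters:
        data = FILTERS[name](data)
    return data


print(ascii_hex_decode(b"48 65 6C6C 6F>"))
print(ascii85_decode(base64.a85encode(b"Hello, PDF") + b"~>"))
```

Both filters turn binary data into plain ASCII rather than compressing it, so the encoded form is larger than the original bytes.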
How Text Is Stored in PDFs
Text is drawn using text elements in content streams. Fonts can be:
- Embedded (e.g., TrueType, OpenType, Type 1, Type 3)
- Unembedded (system fonts)
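Inside a content stream, a text element looks roughly like the output of the sketch below: `BT` and `ET` open and close the text object, `Tf` selects a font and size, `Td` positions the text, and `Tj` shows a string. The font name `/F1` is an assumed resource name that would have to be mapped to a real font in the page's resources, and the helper functions are hypothetical. The sketch only covers simple Latin text in a literal string.

```python
def pdf_literal_string(text: str) -> str:
    """Wrap text in parentheses, escaping backslashes and parentheses."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    return f"({escaped})"


def text_object(text: str, font: str = "/F1", size: int = 12,
                x: int = 72, y: int = 720) -> str:
    """Build a minimal text object for a page content stream."""
    return "\n".join([
        "BT",                                 # begin text object
        f"{font} {size} Tf",                  # select font resource and size
        f"{x} {y} Td",                        # move to the start position
        f"{pdf_literal_string(text)} Tj",     # show the string
        "ET",                                 # end text object
    ])


print(text_object("Hello (PDF)"))
```

The string holds character codes that the selected font interprets, not plain text, which is one reason extracting text from an untagged PDF can be unreliable.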
Tagged PDF and Accessibility
Logical Structure and Accessibility
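Tagged PDF (ISO 32000-1, clause 14.8) adds logical structure to a document, which allows reliable text extraction and compatibility with screen readers. Related ISO standards include PDF/A for archiving and PDF/UA for universal accessibility; PDF/UA requires tagged PDF.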
Layers in PDF (Optional Content Groups)
Layers, formally called optional content groups (OCGs), were introduced in PDF 1.5. The person viewing the document can show or hide them, which is useful for CAD drawings, multi-language documents, and maps.
PDF Metadata and Document Info
PDF supports two types of metadata:
- Document Information Dictionary (deprecated in PDF 2.0)
- Metadata Streams using XMP (ISO standard)
PDF 2.0 allows metadata to be attached to any object (e.g., fonts, images, catalog).
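An XMP metadata stream is RDF/XML, so the standard library can read it once the stream has been extracted. The sketch below parses a small hand-written packet. The sample values ("Sample report", "Hypothetical Writer 1.0") and the `read_xmp` function are made up for illustration, and the packet leaves out the `xpacket` wrapper that usually surrounds XMP in a file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF, and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hand-written sample packet; the values are invented for this example.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Sample report</rdf:li>
        </rdf:Alt>
      </dc:title>
      <xmp:CreatorTool>Hypothetical Writer 1.0</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet: str) -> dict:
    """Pull the title and creator tool out of an XMP packet, if present."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    tool = root.find(".//xmp:CreatorTool", NS)
    return {
        "title": title.text if title is not None else None,
        "creator_tool": tool.text if tool is not None else None,
    }


print(read_xmp(SAMPLE_XMP))
```

Because XMP is ordinary XML, tools that know nothing about the rest of the PDF structure can still read it.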
Embedding Multimedia in PDFs
Rich Media PDFs can contain:
- Audio, video
- Buttons and product previews (e.g., digital catalogs)
- Embedded or linked media
PDF Forms: AcroForms vs XFA
PDF supports two types of interactive forms:
- AcroForms (introduced in PDF 1.2): text boxes, buttons, JavaScript actions, and similar fields
- XFA Forms (introduced in PDF 1.5, deprecated in PDF 2.0)
PDF Security Issues and Vulnerabilities
Researchers have reported a range of PDF security issues over the years, several of them between 2019 and 2021:
- Exfiltration of plaintext
- Shadow attacks (misusing flexibility in the spec)
- Malware (e.g., Peachy worm in 2001)
- Hidden scripts in embedded PDFs
- Browser-triggered PDF exploits
- Multiple vulnerabilities in Adobe Reader and others
- Arbitrary code execution via malicious attachments
PDF Viewing and Editing Software
Creation & Viewing:
- Print-to-PDF support in macOS, Linux, and Windows, plus generators such as pdfTeX, Ghostscript, and DocBook tools
- LibreOffice, Word (2007 SP2+), Scribus, WordPerfect
Editing:
- Adobe Acrobat
- Foxit Reader (Windows, macOS, Linux)
- PDFedit (free software under the GNU GPL)
- Web-based annotation tools
Native Display Model in macOS
The Quartz graphics layer in macOS is based on the PDF imaging model (sometimes called Display PDF). The Preview app and Safari 2.0 and later render PDFs natively.
In Mac OS X 10.0 through 10.3, screenshots were saved as PDF; later versions save them as PNG.
PDF in Printing: RIPs, Prepress, and Workflow Tools
PDF is the standard for print workflows.
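- PDF-capable raster image processors (RIPs) include Adobe PDF Print Engine, Jaws, and Harlequin RIP
- First shipping prepress RIP to interpret PDF natively: Jaws (1993); Harlequin RIP gained the same capability in 1997
- First prepress workflow system based on PDF: Agfa Apogee (1997)
- PDF was widely accepted as the standard print job format at the Open Source Development Labs Printing Summit in 2006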
Annotations and Markups in PDF
PDF annotations allow users to:
- Highlight text
- Add comments
- Draw shapes
- Use stamps or attachments
As we continue to build tools that handle sensitive PDF files, this deep understanding of the format helps us protect your data while delivering lightning-fast performance. Learn more about how we handle encryption and rendering here.