
Sunday, 13 July 2025

What Is PDF? History, Structure, and Why It Still Matters in 2025

Posted by

Huma Raisha


At Moainex Taskspace, we deal with PDFs daily. Whether we are converting, encrypting, or viewing them, understanding the inner workings of the format is core to how we design secure and efficient features.

Introduction


Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents — including text formatting and images — in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it.

PDF has its roots in “The Camelot Project” initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008.

PDF Encryption and Digital Signatures

A PDF file may be encrypted for security, in which case a password is required to view or edit its contents. PDF 2.0 defines 256-bit AES as the standard encryption method, and the PDF Reference also outlines how third parties can implement their own encryption systems.
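When an owner password restricts what readers may do, the allowed operations are stored as bit flags in the /P entry of the encryption dictionary, written as a signed 32-bit integer with bit 1 as the lowest-order bit. The sketch below decodes the flags defined for security handler revision 3 and later. The dictionary and function names are our own, and the code only reads the flags; it does not decrypt anything.

```python
# Permission bits of the standard security handler (bit 1 = lowest-order bit).
PERMISSION_BITS = {
    "print": 3,
    "modify_contents": 4,
    "copy_or_extract": 5,
    "annotate_and_fill_forms": 6,
    "fill_existing_forms": 9,
    "extract_for_accessibility": 10,
    "assemble_document": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Decode the signed 32-bit /P value into named permission flags."""
    unsigned = p & 0xFFFFFFFF  # reinterpret the signed integer as 32 unsigned bits
    return {name: bool(unsigned & (1 << (bit - 1))) for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    print(decode_permissions(-4))     # every permission bit set
    print(decode_permissions(-3900))  # only printing allowed
```

Keep in mind that these flags are honored by the viewing software; once a file has been opened with the user password, the encryption itself does not enforce them.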

PDF files can also be digitally signed to provide secure authentication. Complete details on implementing digital signatures in PDFs are provided in ISO 32000-2.

History and Evolution of PDF

The development of PDF began in 1991 when John Warnock wrote a paper for a project then code-named Camelot, in which he proposed the creation of a simplified version of PostScript called Interchange PostScript (IPS). Unlike traditional PostScript, which was tightly focused on rendering print jobs to output devices, IPS would be optimized for displaying pages to any screen and on any platform.

Adobe Systems made the PDF specification available free of charge in 1993. In the early years, PDF was popular mainly in desktop publishing workflows, and competed with several other formats, including DjVu, Envoy, Common Ground Digital Paper, Farallon Replica, and even Adobe's own PostScript format.

PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extensions for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, and their specification is published only on Adobe's website. Many of them are not supported by popular third-party implementations of PDF.

ISO published version 2.0 of PDF, ISO 32000-2, in 2017 (available for purchase), replacing the free specification provided by Adobe. In December 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, with clarifications, corrections, and critical updates to normative references. PDF 2.0 does not include any proprietary technologies as normative references. In April 2023, the PDF Association made ISO 32000-2 available for download free of charge.

How PDF Files Work (Technical Details)

A PDF file is often a combination of vector graphics, text, and bitmap graphics.

In later PDF revisions, a PDF document can also support links (inside the document or to web pages), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded content that can be handled using plug-ins.

PDF’s Core Technologies Explained

PostScript and Its Role in PDF

PostScript is a page description language run in an interpreter to generate an image. It can handle graphics and has standard features of programming languages, such as branching and looping. PDF is largely based on PostScript but simplified to remove such flow-control features, while graphics commands (such as lineto) remain.
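To make the relationship concrete, here is a toy translator for straight-line PostScript path code. It maps a handful of PostScript operators to their PDF content-stream equivalents (moveto to m, lineto to l, and so on) and rejects everything else, including procedures and loops, which a PDF content stream cannot express. It is a hypothetical illustration, not a real converter: it assumes whitespace-separated tokens and ignores features such as dictionaries and relative operators (rlineto), which real converters expand into absolute coordinates.

```python
# Straight-line PostScript operators and their PDF content-stream equivalents.
PS_TO_PDF = {
    "moveto": "m",
    "lineto": "l",
    "curveto": "c",
    "closepath": "h",
    "stroke": "S",
    "fill": "f",
    "gsave": "q",
    "grestore": "Q",
    "setlinewidth": "w",
}


def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def ps_to_pdf(ps_code: str) -> str:
    """Translate simple, loop-free PostScript path code into PDF operators."""
    out = []
    for token in ps_code.split():
        if is_number(token):
            out.append(token)            # operands pass through unchanged
        elif token == "newpath":
            continue                     # a PDF path starts implicitly with m
        elif token in PS_TO_PDF:
            out.append(PS_TO_PDF[token])
        else:
            # Procedures ({ ... }), loops (repeat, for) and conditionals (if)
            # have no equivalent in a PDF content stream.
            raise ValueError(f"unsupported PostScript token: {token!r}")
    return " ".join(out)


if __name__ == "__main__":
    print(ps_to_pdf("newpath 100 100 moveto 200 200 lineto stroke"))
```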

PostScript was originally designed for a drastically different use case: transmission of one-way linear print jobs, in which the PostScript interpreter would collect a series of commands until it encountered the showpage command, then execute all the commands to render a page as a raster image to a printing device. PostScript was not intended for long-term storage and real-time interactive rendering of electronic documents to computer monitors. So there was no need to support anything other than consecutive rendering of pages.

If there was an error in the final printed output, the user would correct it at the application level and send a new print job in the form of an entirely new PostScript file. Thus, any given page in a PostScript file could be accurately rendered only as the cumulative result of executing all preceding commands to draw all previous pages—any of which could affect subsequent pages—plus the commands to draw that particular page. There was no easy way to bypass that process to skip around to different pages.

PDF enforces the rule that the code for any particular page cannot affect any other pages. That rule is strongly recommended for PostScript too but has to be implemented explicitly (see, e.g., the Document Structuring Conventions), as PostScript is a full programming language. All data required for rendering is included within the file itself, improving portability.

Its disadvantages are a loss of flexibility and a limitation to a single use case.

PDF File Structure and Format

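A PDF file is organized using ASCII characters, except for certain elements that may have binary content. The file starts with a header containing a magic number (as a readable string) and the version of the format (e.g., %PDF-1.7). The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, of which there are nine types: null, Boolean values, integers, real numbers, strings, names, arrays, dictionaries, and streams. Objects can be direct (written inline) or indirect (numbered, so that other objects can refer to them with a reference such as 4 0 R).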
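The nine object types (null, Boolean, integer, real, string, name, array, dictionary, stream) map naturally onto values in most programming languages. The following sketch serializes Python values into PDF syntax. The Name and Stream wrapper classes are our own assumptions, since Python has no built-in name or stream type, and the sketch does not escape special characters inside names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """A PDF name object, written as /Value (special characters are not escaped here)."""
    value: str


@dataclass
class Stream:
    """A PDF stream: a dictionary followed by raw data (kept as text for simplicity)."""
    data: str
    extra: dict = field(default_factory=dict)


def pdf_string(text: str) -> str:
    # Literal strings are wrapped in parentheses; backslash and parentheses must be escaped.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def serialize(obj) -> str:
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # check bool before int, because bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # PDF reals do not allow exponent notation, so use fixed-point formatting.
        return format(obj, "f").rstrip("0").rstrip(".")
    if isinstance(obj, str):
        return pdf_string(obj)
    if isinstance(obj, Name):
        return f"/{obj.value}"
    if isinstance(obj, list):
        return "[" + " ".join(serialize(item) for item in obj) + "]"
    if isinstance(obj, dict):
        entries = " ".join(f"/{key} {serialize(value)}" for key, value in obj.items())
        return f"<< {entries} >>"
    if isinstance(obj, Stream):
        header = serialize({**obj.extra, "Length": len(obj.data.encode("latin-1"))})
        return f"{header}\nstream\n{obj.data}\nendstream"
    raise TypeError(f"no PDF equivalent for {type(obj).__name__}")


if __name__ == "__main__":
    print(serialize({"Type": Name("Page"), "Count": 1}))
```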

The cross-reference table, located near the end of the file, gives the byte offset of each indirect object from the start of the file. This allows efficient random access and small updates without rewriting the whole file (incremental update).
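To see the cross-reference table at work, the sketch below writes a minimal one-page PDF, recording each object's byte offset as it goes, and then reads an object back by looking up its offset instead of scanning the file. It assumes a single xref subsection, no compression, and no incremental updates; the function names are hypothetical, and a production parser must handle far more (xref streams, multiple revisions, damaged files).

```python
def escape_text(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(text: str) -> bytes:
    """Write a one-page PDF with a classic cross-reference table."""
    content = f"BT /F1 24 Tf 72 720 Td ({escape_text(text)}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")  # header plus a binary-marker comment
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" from the start of the file
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def read_object(pdf: bytes, number: int) -> bytes:
    """Jump straight to object `number` using startxref and the xref table."""
    tail = pdf.rfind(b"startxref")
    xref_pos = int(pdf[tail + len(b"startxref"):].split()[0])
    line_end = pdf.index(b"\n", xref_pos)        # end of the "xref" keyword line
    header_end = pdf.index(b"\n", line_end + 1)  # end of the "first count" line
    first, count = map(int, pdf[line_end + 1:header_end].split())
    if not first <= number < first + count:
        raise KeyError(number)
    entry_start = header_end + 1 + 20 * (number - first)
    offset, _generation, kind = pdf[entry_start:entry_start + 20].split()
    if kind != b"n":
        raise KeyError(number)  # free entry: the object does not exist
    start = int(offset)
    end = pdf.index(b"endobj", start) + len(b"endobj")
    return pdf[start:end]


if __name__ == "__main__":
    pdf = build_pdf("Hello, PDF")
    print(read_object(pdf, 4).decode("latin-1"))
```

Because every offset is known, a reader can open page 500 of a large document without parsing the 499 pages before it, which is exactly the random access that PostScript lacked.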

PDF 1.5 introduced cross-reference streams in binary format. A hybrid-reference PDF may include both traditional and binary formats for compatibility.
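In a cross-reference stream, each entry is a fixed-width binary record whose field widths come from the stream's /W array, and the data is usually Flate-compressed. The first field is the entry type: 0 for a free object, 1 for an object stored at a byte offset, and 2 for an object stored inside an object stream. The sketch below decodes such records with Python's zlib module. It assumes there is no /DecodeParms predictor (real files often use a PNG predictor, which must be undone first) and a single /Index range; the function names are our own.

```python
import zlib


def read_field(row: bytes, start: int, width: int, default: int) -> int:
    # A zero-width field is absent from the record and takes its default value.
    return int.from_bytes(row[start:start + width], "big") if width else default


def decode_xref_stream(compressed: bytes, widths: list[int]) -> list[tuple[int, int, int]]:
    """Decode FlateDecode-compressed cross-reference stream data using the /W widths."""
    raw = zlib.decompress(compressed)
    w1, w2, w3 = widths
    size = w1 + w2 + w3
    if len(raw) % size:
        raise ValueError("data length is not a multiple of the entry size")
    entries = []
    for i in range(0, len(raw), size):
        row = raw[i:i + size]
        kind = read_field(row, 0, w1, 1)  # the type defaults to 1 when its width is 0
        # Type 0: next free object number; type 1: byte offset; type 2: object stream number.
        field2 = read_field(row, w1, w2, 0)
        # Type 0 and 1: generation number; type 2: index within the object stream.
        field3 = read_field(row, w1 + w2, w3, 0)
        entries.append((kind, field2, field3))
    return entries


if __name__ == "__main__":
    raw = bytes([0, 0, 0, 255, 1, 0, 15, 0, 2, 0, 5, 3])
    for entry in decode_xref_stream(zlib.compress(raw), [1, 2, 1]):
        print(entry)
```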

PDF Imaging Model (Graphics Rendering)

PDF graphics use a device-independent Cartesian coordinate system. Transformations such as scaling, rotation, skewing, and translation are expressed with a transformation matrix.
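A PDF transformation matrix is written as six numbers [a b c d e f], which map a point (x, y) to (a·x + c·y + e, b·x + d·y + f). The helpers below build, apply, and compose such matrices; the helper names are our own. The composition order follows the PDF convention that the cm operator premultiplies the current transformation matrix (new CTM = M × CTM), so the last cm in a content stream is applied to user-space coordinates first.

```python
import math

Matrix = tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)

IDENTITY: Matrix = (1, 0, 0, 1, 0, 0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return m1 x m2: the transform that applies m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def translate(tx: float, ty: float) -> Matrix:
    return (1, 0, 0, 1, tx, ty)


def scale(sx: float, sy: float) -> Matrix:
    return (sx, 0, 0, sy, 0, 0)


def rotate(degrees: float) -> Matrix:
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0, 0)


def skew(alpha_deg: float, beta_deg: float) -> Matrix:
    # Skews the x axis by alpha and the y axis by beta.
    return (1, math.tan(math.radians(alpha_deg)), math.tan(math.radians(beta_deg)), 1, 0, 0)


if __name__ == "__main__":
    # Content stream "1 0 0 1 100 100 cm 2 0 0 2 0 0 cm": each cm premultiplies the CTM.
    ctm = IDENTITY
    for m in (translate(100, 100), scale(2, 2)):
        ctm = multiply(m, ctm)
    print(apply(ctm, 10, 10))  # (120, 120): scaled to (20, 20), then translated
```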

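The graphics state (25 properties as of PDF 2.0) includes parameters such as the current transformation matrix, the clipping path, the color space and colors, the line width, the line cap and join styles, the dash pattern, and the blend mode.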
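Content streams save and restore the graphics state with the q and Q operators, which push and pop a stack, so changes made between them do not leak into later drawing. The toy interpreter below tracks just two parameters, the line width (w) and the nonstroking RGB color (rg), starting from PDF's defaults of a 1.0 line width and black. It is a simplified illustration with hypothetical names, not a complete content-stream parser.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GraphicsState:
    line_width: float = 1.0
    fill_rgb: tuple[float, float, float] = (0.0, 0.0, 0.0)


def run(content: str) -> GraphicsState:
    """Interpret q, Q, w and rg; return the graphics state at the end of the stream."""
    state = GraphicsState()
    stack: list[GraphicsState] = []
    operands: list[float] = []
    for token in content.split():
        if token == "q":
            stack.append(state)  # save the current state (immutable, so no copy needed)
        elif token == "Q":
            state = stack.pop()  # restore the most recently saved state
        elif token == "w":
            state = replace(state, line_width=operands[-1])
        elif token == "rg":
            state = replace(state, fill_rgb=tuple(operands[-3:]))
        else:
            operands.append(float(token))  # anything else is treated as a numeric operand
            continue
        operands.clear()  # an operator consumes its operands
    return state


if __name__ == "__main__":
    print(run("2 w q 5 w 1 0 0 rg Q"))  # the width 5 and red fill are discarded by Q
```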

Vector Graphics in PDF

PDF uses paths (made of lines and Bézier curves, or built from text outlines) that can be stroked, filled, or used for clipping. Strokes and fills can use any color set in the graphics state, including patterns.
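The c operator appends a cubic Bézier curve from the current point P0 through control points P1 and P2 to the end point P3. The sketch below emits such a path as content-stream operators and evaluates the curve with the standard Bernstein form B(t) = (1-t)³P0 + 3(1-t)²tP1 + 3(1-t)t²P2 + t³P3; the helper names are hypothetical.

```python
Point = tuple[float, float]


def curve_path(p0: Point, p1: Point, p2: Point, p3: Point) -> str:
    """Content-stream operators that stroke one cubic Bézier segment."""
    return f"{p0[0]} {p0[1]} m {p1[0]} {p1[1]} {p2[0]} {p2[1]} {p3[0]} {p3[1]} c S"


def bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate the curve at parameter t in [0, 1]."""
    u = 1 - t
    weights = (u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3)
    points = (p0, p1, p2, p3)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


if __name__ == "__main__":
    ctrl = ((0, 0), (0, 1), (1, 1), (1, 0))
    print(curve_path(*ctrl))
    print(bezier_point(*ctrl, 0.5))  # midpoint of the arch
```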

Raster Images (Image XObjects)

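Raster images, called Image XObjects, are stored as a dictionary describing the image (width, height, color space, bits per component) plus a stream containing the pixel data.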
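An Image XObject pairs a dictionary describing the image with a stream of pixel data; for uncompressed 8-bit RGB, the pixels are stored row by row, three bytes per pixel. The sketch below builds such an object and the content-stream operators that paint it. The helper names and the /Im1 resource name are hypothetical. Images are painted into the unit square of user space, so the cm operator before Do sets the image's size and position on the page.

```python
def image_xobject(width: int, height: int, rgb: bytes) -> bytes:
    """Image XObject: a dictionary describing the image plus a stream of raw pixel data."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected 3 bytes (R, G, B) per pixel")
    header = (
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Length {len(rgb)} >>"
    )
    return header.encode("ascii") + b"\nstream\n" + rgb + b"\nendstream"


def paint_image(name: str, x: float, y: float, w: float, h: float) -> str:
    # Scale the unit square to w x h points, move it to (x, y), then paint the image.
    return f"q {w} 0 0 {h} {x} {y} cm /{name} Do Q"


if __name__ == "__main__":
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])  # 2 x 2 image
    print(image_xobject(2, 2, pixels)[:90])
    print(paint_image("Im1", 72, 72, 144, 144))
```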

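Streams, including image data, can be compressed or encoded with filters, which include FlateDecode (zlib/deflate), DCTDecode (JPEG), JPXDecode (JPEG 2000), JBIG2Decode, CCITTFaxDecode, LZWDecode, RunLengthDecode, ASCIIHexDecode, and ASCII85Decode.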
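RunLengthDecode, one of the standard PDF filters, is simple enough to implement in a few lines, which makes it a good illustration of how a filter works. A length byte from 0 to 127 means "copy the next length + 1 bytes literally", a value from 129 to 255 means "repeat the next byte 257 - length times", and 128 marks the end of the data. The decoder below is our own sketch, written for illustration.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode data encoded with the PDF RunLengthDecode filter."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:                 # end-of-data marker
            break
        if length < 128:                  # literal run of length + 1 bytes
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:                             # repeat the following byte 257 - length times
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


if __name__ == "__main__":
    print(run_length_decode(bytes([2, 65, 66, 67, 253, 68, 128])))
```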

How Text Is Stored in PDFs

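Text is drawn by text objects (delimited by the BT and ET operators) in content streams. Fonts can be embedded in the file or, for the standard 14 fonts, referenced by name; supported font types include Type 1, TrueType, OpenType, Type 3, and composite (CID-keyed) fonts.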
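Inside a text object, Tf selects a font and size, Td positions the text, Tj shows a string, and TL sets the leading (line spacing) that T* uses to move to the next line. The sketch below builds a multi-line text object. The /F1 font resource name is an assumption and needs a matching entry in the page's /Resources, and the escaping covers only backslashes and parentheses in plain ASCII text.

```python
def escape(text: str) -> str:
    # Backslash and parentheses are special inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_object(lines: list[str], font: str = "F1", size: float = 12,
                x: float = 72, y: float = 720, leading: float = 14) -> str:
    """Build a BT ... ET text object that shows one string per line."""
    parts = [f"BT /{font} {size} Tf {leading} TL {x} {y} Td"]
    for index, line in enumerate(lines):
        if index:
            parts.append("T*")  # move to the start of the next line
        parts.append(f"({escape(line)}) Tj")
    parts.append("ET")
    return " ".join(parts)


if __name__ == "__main__":
    print(text_object(["Hello", "(PDF)"]))
```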

Tagged PDF and Accessibility

Logical Structure and Accessibility

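Tagged PDF (ISO 32000, clause 14.8) adds a logical structure tree that enables reliable text extraction and screen reader support. Related compliance standards build on it: PDF/UA (ISO 14289) for universal accessibility, and PDF/A (ISO 19005) for long-term archiving, whose "a" (accessible) conformance levels require tagging.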
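Tagging works by wrapping page content in marked-content sequences that carry a marked-content identifier (MCID), which structure elements then reference. The sketch below produces the content-stream side for a paragraph and a matching structure element dictionary. The object numbers and the /F1 font name are hypothetical, and a complete tagged PDF also needs a /StructTreeRoot with a parent tree, a /MarkInfo entry in the catalog, and /StructParents on the page.

```python
def escape(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def tagged_paragraph(text: str, mcid: int, x: float = 72, y: float = 700) -> str:
    """Content-stream fragment: text wrapped in a /P marked-content sequence."""
    return (f"/P <</MCID {mcid}>> BDC "
            f"BT /F1 12 Tf {x} {y} Td ({escape(text)}) Tj ET "
            f"EMC")


def struct_element(mcid: int, parent_obj: int, page_obj: int) -> str:
    """Structure element of type P (/S) whose content (/K) is the sequence with this MCID.

    The /P key (not to be confused with the /P structure type) points to the parent element.
    """
    return (f"<< /Type /StructElem /S /P /P {parent_obj} 0 R "
            f"/Pg {page_obj} 0 R /K {mcid} >>")


if __name__ == "__main__":
    print(tagged_paragraph("Hello", 0))
    print(struct_element(0, parent_obj=10, page_obj=3))
```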

Layers in PDF (Optional Content Groups)

Optional content groups, commonly called layers, were introduced in PDF 1.5. A viewer can show or hide each layer, which is useful for CAD drawings, multi-language documents, and maps.
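Each layer is an optional content group (OCG) dictionary; content belonging to it is wrapped in an /OC marked-content sequence, and the catalog's /OCProperties entry lists the groups and their default visibility. The sketch below generates these pieces for a hypothetical two-language document. The object numbers, the /oc1 resource name, and the helper names are assumptions.

```python
def escape(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def refs(numbers: list[int]) -> str:
    return " ".join(f"{n} 0 R" for n in numbers)


def ocg(name: str) -> str:
    """An optional content group (layer) dictionary."""
    return f"<< /Type /OCG /Name ({escape(name)}) >>"


def oc_properties(visible: list[int], hidden: list[int]) -> str:
    """Catalog /OCProperties entry: all groups plus the default on/off configuration."""
    return (f"<< /OCGs [{refs(visible + hidden)}] "
            f"/D << /ON [{refs(visible)}] /OFF [{refs(hidden)}] "
            f"/Order [{refs(visible + hidden)}] >> >>")


def layer_content(resource_name: str, content: str) -> str:
    # resource_name must map to the OCG in the page's /Resources /Properties dictionary.
    return f"/OC /{resource_name} BDC {content} EMC"


if __name__ == "__main__":
    print(ocg("English"), ocg("Deutsch"))  # suppose these are objects 7 and 8
    print(oc_properties(visible=[7], hidden=[8]))
    print(layer_content("oc1", "BT /F1 12 Tf 72 700 Td (Hello) Tj ET"))
```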

PDF Metadata and Document Info

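PDF supports two types of metadata: the document information dictionary, referenced from the trailer's /Info entry (with keys such as Title, Author, and CreationDate), and XMP metadata streams, introduced in PDF 1.4.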

PDF 2.0 allows metadata to be attached to any object (e.g., fonts, images, catalog).
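Dates in the document information dictionary (such as /CreationDate and /ModDate) use the PDF date format D:YYYYMMDDHHmmSSOHH'mm', where O is +, -, or Z and every field after the year is optional. The sketch below formats and parses these strings. The function names are our own; the parser accepts the trailing apostrophe as optional because files in the wild vary, and treating a date without an offset as UTC is our assumption, since the time zone is unknown in that case.

```python
import re
from datetime import datetime, timedelta, timezone

_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def to_pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as D:YYYYMMDDHHmmSS+HH'mm'."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    sign = "+" if offset >= timedelta(0) else "-"
    minutes = abs(int(offset.total_seconds())) // 60
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


def parse_pdf_date(s: str) -> datetime:
    m = _PDF_DATE.match(s)
    if not m:
        raise ValueError(f"not a PDF date: {s!r}")
    year, month, day, hour, minute, second, sign, tzh, tzm = m.groups()
    tz = timezone.utc  # assumption: no offset (or Z) is treated as UTC
    if sign in ("+", "-"):
        delta = timedelta(hours=int(tzh or 0), minutes=int(tzm or 0))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(int(year), int(month or 1), int(day or 1), int(hour or 0),
                    int(minute or 0), int(second or 0), tzinfo=tz)


if __name__ == "__main__":
    stamp = to_pdf_date(datetime(2025, 7, 13, 9, 30, tzinfo=timezone(timedelta(hours=5))))
    print(stamp, parse_pdf_date(stamp).isoformat())
```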

Embedding Multimedia in PDFs

Rich Media PDFs can contain interactive content such as embedded video, audio, and 3D models.

PDF Forms: AcroForms vs XFA

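PDF supports two types of forms: AcroForms, the original interactive form technology introduced in PDF 1.2, and Adobe XML Forms Architecture (XFA) forms, which were introduced in PDF 1.5 and deprecated in PDF 2.0.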
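An AcroForm text field can be written as a single dictionary that is both the field and its on-page widget annotation, referenced from the catalog's /AcroForm /Fields array and from the page's /Annots array. The sketch below builds one. The field name, object numbers, and helper names are hypothetical, and a real file also needs a default appearance (/DA) and either appearance streams or /NeedAppearances true so that viewers can draw the value.

```python
def escape(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_field(name: str, value: str, rect: tuple[float, float, float, float],
               page_obj: int, required: bool = False) -> str:
    """Merged field/widget dictionary for a text field (/FT /Tx)."""
    flags = 2 if required else 0  # /Ff bit 2 marks the field as required
    x1, y1, x2, y2 = rect
    return (f"<< /Type /Annot /Subtype /Widget /FT /Tx /T ({escape(name)}) "
            f"/V ({escape(value)}) /Rect [{x1} {y1} {x2} {y2}] /P {page_obj} 0 R "
            f"/Ff {flags} >>")


def acroform(field_objs: list[int]) -> str:
    """Catalog /AcroForm entry listing the top-level fields."""
    refs = " ".join(f"{n} 0 R" for n in field_objs)
    return f"<< /Fields [{refs}] /NeedAppearances true >>"


if __name__ == "__main__":
    print(text_field("name", "Ada", (72, 700, 272, 720), page_obj=3, required=True))
    print(acroform([12]))
```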

PDF Security Issues and Vulnerabilities

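In 2019–2021, researchers revealed several PDF security issues, including techniques for spoofing digital signatures, attacks that exfiltrate the plaintext of encrypted documents (PDFex), and "shadow attacks" that change what a signed document displays without invalidating its signature.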
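Some published attacks on signed PDFs rely on content appended through an incremental update. A basic sanity check is to confirm that a signature's /ByteRange (the two byte spans the signature covers, i.e. everything except the signature value itself) ends exactly at the end of the file, and to count how many revisions (startxref markers) the file contains. The sketch below is a hypothetical heuristic, not a substitute for full signature validation: legitimate files can also contain later incremental updates, such as additional signatures.

```python
def count_revisions(pdf: bytes) -> int:
    """Each incremental update appends a new startxref; count them as a rough revision count."""
    return pdf.count(b"startxref")


def byte_range_covers_file(pdf: bytes, byte_range: list[int]) -> bool:
    """Check that /ByteRange [start1 len1 start2 len2] spans the whole file except the gap."""
    start1, length1, start2, length2 = byte_range
    return (
        start1 == 0
        and start1 + length1 <= start2     # the gap holds the /Contents signature value
        and start2 + length2 == len(pdf)   # nothing was appended after signing
    )


if __name__ == "__main__":
    signed = b"A" * 100
    print(byte_range_covers_file(signed, [0, 40, 60, 40]))
    print(byte_range_covers_file(signed + b"appended", [0, 40, 60, 40]))
```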

PDF Viewing and Editing Software

Creation & Viewing:

Editing:

Native Display Model in macOS

Quartz, the macOS graphics layer, is based on the PDF imaging model (sometimes called Display PDF). The Preview app and Safari 2.0 and later render PDFs natively.

In early macOS versions (10.0 to 10.3), screenshots were saved as PDF; later versions switched to PNG.

PDF in Printing: RIPs, Prepress, and Workflow Tools

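PDF is the de facto standard for print workflows: raster image processors (RIPs) convert PDF pages into raster images for the output device, and the PDF/X family of standards defines print-ready files for prepress exchange.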
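In prepress, the page boxes matter as much as the content: the /MediaBox is the full sheet, the /TrimBox is the finished page after cutting, and the /BleedBox holds artwork that extends past the trim. The sketch below computes the bleed margin on each side in millimetres. The function name is our own, and boxes are assumed to be given as (llx, lly, urx, ury) in points, where 1 point is 1/72 inch.

```python
PT_TO_MM = 25.4 / 72  # 1 point = 1/72 inch, 1 inch = 25.4 mm

Box = tuple[float, float, float, float]  # (llx, lly, urx, ury) in points


def bleed_margins_mm(trim: Box, bleed: Box) -> dict[str, float]:
    """Distance from the TrimBox to the BleedBox on each side, in millimetres."""
    if not (bleed[0] <= trim[0] and bleed[1] <= trim[1]
            and trim[2] <= bleed[2] and trim[3] <= bleed[3]):
        raise ValueError("the TrimBox must lie inside the BleedBox")
    return {
        "left": (trim[0] - bleed[0]) * PT_TO_MM,
        "bottom": (trim[1] - bleed[1]) * PT_TO_MM,
        "right": (bleed[2] - trim[2]) * PT_TO_MM,
        "top": (bleed[3] - trim[3]) * PT_TO_MM,
    }


if __name__ == "__main__":
    # A 9 pt bleed on every side corresponds to 3.175 mm.
    print(bleed_margins_mm(trim=(9, 9, 621, 801), bleed=(0, 0, 630, 810)))
```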

Annotations and Markups in PDF

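PDF annotations allow users to add comments and sticky notes; highlight, underline, or strike out text; draw freehand ink markup; and apply stamps.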
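A comment such as a sticky note is stored as an annotation dictionary referenced from the page's /Annots array. The sketch below builds a /Text annotation with the /Comment icon. The helper name and sample values are hypothetical, and the 24-point icon size is an arbitrary choice.

```python
def escape(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def sticky_note(x: float, y: float, author: str, comment: str,
                color: tuple[float, float, float] = (1, 1, 0)) -> str:
    """A /Text (sticky note) annotation with its lower-left corner at (x, y)."""
    r, g, b = color
    return (f"<< /Type /Annot /Subtype /Text /Rect [{x} {y} {x + 24} {y + 24}] "
            f"/T ({escape(author)}) /Contents ({escape(comment)}) "
            f"/Name /Comment /C [{r} {g} {b}] /Open false >>")


if __name__ == "__main__":
    print(sticky_note(100, 500, "Reviewer", "Check the figure (p. 3)"))
```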

As we continue to build tools that handle sensitive PDF files, this deep understanding of the format helps us protect your data while delivering lightning-fast performance.


Copyright 2025 Free My Task. All rights reserved.