Methodology
How we test, measure, and document research. Last reviewed: April 2026.
Overview
Every research asset published under /research follows the methodology documented on this page. The goal is reproducibility: a reader who downloads the published dataset and follows the steps below should arrive at the same numbers, modulo hardware differences.
Methodology decisions are made before any test is run, to avoid outcome-biased measurement. When a methodology is updated (for example, switching from PSNR to SSIM as the primary visual-quality metric), the change is dated and documented in the relevant research asset.
Source provenance and version pinning
Utilavo's current research catalog is built from published authoritative standards rather than original measurements. The rules for sourcing and version pinning are:
- Each cited standard is recorded with its full revision identifier (for example, "NIST SP 800-131A Rev 2", not just "NIST 800-131A").
- The accessed-on date is captured at publication and shown in every citation row.
- When a standard is superseded, the affected research asset is updated within 14 days and the previous citation is preserved with a "superseded by" note.
- Publicly hosted PDFs of cited standards are linked directly; paywalled standards link to the publishing body's catalog page.
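The sourcing rules above can be sketched as a small citation record. This is a minimal illustration, not Utilavo's actual schema: the class, field, and function names are assumptions, and only the 14-day window and the "superseded by" note come from the rules themselves.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

# Update window stated in the sourcing rules.
SUPERSEDE_UPDATE_WINDOW = timedelta(days=14)


@dataclass(frozen=True)
class Citation:
    """One citation row. Field names are hypothetical."""
    revision_id: str                     # full identifier, e.g. "NIST SP 800-131A Rev 2"
    accessed_on: date                    # captured at publication
    url: str                             # direct PDF link, or the publisher's catalog page
    superseded_by: Optional[str] = None  # filled in once a newer revision replaces this one


def mark_superseded(old: Citation, new_revision_id: str) -> Citation:
    """Return a copy of the old citation with a "superseded by" note.

    The original record is left unchanged so the previous citation is preserved.
    """
    return replace(old, superseded_by=new_revision_id)


def update_deadline(superseded_on: date) -> date:
    """Latest date by which the affected research asset should be updated."""
    return superseded_on + SUPERSEDE_UPDATE_WINDOW
```

Making the record immutable (`frozen=True`) mirrors the rule that a superseded citation is kept rather than overwritten: marking it produces a new record and leaves the old one intact.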
If we add measurement-based research in the future (for example, a PDF compression benchmark with a reproducible test environment), the methodology section will be expanded with hardware specs, library versions, and dataset construction rules. None of the assets published today rely on internal measurements.
Cipher security assessment
Cipher classifications are not original research; they reflect the current published guidance from authoritative bodies. The primary sources are NIST SP 800-131A Rev 2 (Transitioning the Use of Cryptographic Algorithms and Key Lengths) and NIST SP 800-57 Part 1 Rev 5 (Recommendation for Key Management).
A cipher is labeled "deprecated" when the cited NIST publication lists it as disallowed for federal use, "legacy-use" when it is permitted only for decrypting existing data, and "approved" otherwise. The deprecation year column gives the publication year of the NIST document that disallowed the algorithm.
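The labeling rule can be written as a three-way mapping. The sketch below assumes a hypothetical `NistStatus` enum for how the cited publication treats an algorithm; the enum and its member names are illustrative and are not NIST terminology or part of any published dataset.

```python
from enum import Enum


class NistStatus(Enum):
    """Hypothetical encoding of an algorithm's status in the cited NIST publication."""
    DISALLOWED = "disallowed"               # disallowed for federal use
    DECRYPT_EXISTING_ONLY = "decrypt-only"  # permitted only for decrypting existing data
    OTHER = "other"                         # any other status


def label_for(status: NistStatus) -> str:
    """Map a NIST status to the label shown in the cipher table."""
    if status is NistStatus.DISALLOWED:
        return "deprecated"
    if status is NistStatus.DECRYPT_EXISTING_ONLY:
        return "legacy-use"
    return "approved"
```

Because "approved" is the fallthrough case, any status that is neither disallowed nor decrypt-only receives it, which matches the "approved otherwise" wording of the rule.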
PDF/A archival classification
The PDF/A conformance matrix is sourced from the ISO 19005 family of standards (Parts 1–4). Each conformance level (1a, 1b, 2a, 2b, 2u, 3a, 3b, 3u, and 4) is mapped to its publishing standard and its requirements for fonts, color spaces, transparency, and embedded files.
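As a rough illustration of the level-to-standard mapping, the leading digit of a conformance level identifies the part of ISO 19005 that defines it (PDF/A-1 is defined in ISO 19005-1, PDF/A-2 in ISO 19005-2, and so on). The function name below is an assumption, and the sketch covers only the publishing standard, not the font, color space, transparency, or embedded-file requirements.

```python
# Conformance levels as listed in the matrix.
LEVELS = ["1a", "1b", "2a", "2b", "2u", "3a", "3b", "3u", "4"]


def publishing_standard(level: str) -> str:
    """Return the ISO 19005 part that defines a PDF/A conformance level."""
    if level not in LEVELS:
        raise ValueError(f"unknown PDF/A conformance level: {level!r}")
    # The first character of the level is the part number.
    return f"ISO 19005-{level[0]}"
```

For example, `publishing_standard("2u")` returns `"ISO 19005-2"`.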
Use cases are drawn from the standard's introductory annex and from the U.S. Library of Congress Sustainability of Digital Formats catalog. The matrix is updated when ISO publishes amendments or when a new conformance level is introduced.
Data integrity
For any research asset that includes measurements, the raw data is published as a downloadable CSV linked from the article. Once published, raw values are not modified. If a re-test is performed (for example, after a library version change), the new dataset is published as a new study with its own date, not as an overwrite of the prior one.
If an error is discovered in a published dataset (a measurement that was logged incorrectly, an encoder configured wrongly), the affected rows are flagged in the CSV and a note is added to the article rather than silently editing historical data.
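One way to flag rows without touching historical values is to append a separate column and leave every original cell as it was. The sketch below is an assumption about how that could look: the `row_id` and `flag` column names and the `flag_rows` helper are hypothetical, and it assumes the CSV does not already have a `flag` column.

```python
import csv
import io


def flag_rows(csv_text: str, bad_row_ids: set[str], note: str,
              id_column: str = "row_id") -> str:
    """Return a copy of the CSV with an added 'flag' column.

    Original values are copied through unchanged; only the new column
    records which rows are known to be erroneous and why.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["flag"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Mark affected rows; leave the flag empty for all others.
        row["flag"] = note if row[id_column] in bad_row_ids else ""
        writer.writerow(row)
    return out.getvalue()
```

A call such as `flag_rows(text, {"2"}, "logged incorrectly")` leaves the value in row 2 as originally published and adds the note beside it, so readers can still see what was first recorded.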
Methodology questions
Methodology suggestions, replication results, or measurement corrections: [email protected]. See also our editorial standards.