OWASP A08:2021 Software and Data Integrity Failures Explained

TL;DR

Software and Data Integrity Failures is a new category in the OWASP Top 10 (2021), created in direct response to the SolarWinds-era wave of supply chain attacks. It absorbs the older Insecure Deserialization entry (A08:2017) and broadens it with build pipeline tampering, unverified auto-updates, untrusted plugins, and dependency confusion. The unifying theme: code, configuration, or data is loaded or applied without anyone verifying that it has not been tampered with. Detection blends source-code analysis for deserialization sinks with composition analysis for the dependency tree and signed-build verification at the platform layer. Prevention starts with signed packages, pinned dependencies, SBOMs, and a SLSA-aligned build pipeline.

The 2020 SolarWinds compromise was the moment software supply chain security stopped being an academic conversation. Attackers did not exploit a vulnerability in the published binary — they tampered with the build pipeline that produced it. A trusted vendor's signed update became the delivery mechanism into thousands of downstream organizations, including federal agencies. When OWASP rebuilt the Top 10 a year later, the lesson was unmistakable: the threat model had to expand beyond the running application to cover everything that produces it. A08:2021 Software and Data Integrity Failures is that expansion, formally encoded.

The category is genuinely new in 2021 in title and framing, but it is not new in substance. It absorbs the previous A08:2017 Insecure Deserialization entry — Java readObject gadget chains and Python pickle abuse have been in scope for years — and adds the supply chain integrity issues that SolarWinds, CodeCov, and event-stream made impossible to ignore. The unifying property is that a system trusts code, configuration, or data without verifying that it has not been tampered with. Whether the untrusted artifact is a deserialized object, a CI/CD plugin, an auto-update binary, or a polyfill loaded from a CDN, the defect shape is the same.

What A08 Covers

A08 covers two intertwined classes of weakness. The software integrity half deals with code and infrastructure: build pipeline tampering, unsigned package installs, auto-updates pulled from servers without integrity checks, untrusted CI/CD plugins, container images pulled by mutable tag, and dependency confusion attacks where a private package name is squatted on a public registry. The shared characteristic is that executable code or build inputs reach the running system without anyone verifying the chain of custody back to a trusted publisher.

The data integrity half deals with serialized state: untrusted data being deserialized by a runtime that interprets it as instructions for object reconstruction. Java's ObjectInputStream.readObject(), Python's pickle.loads(), PHP's unserialize(), .NET's BinaryFormatter, and Ruby's Marshal.load all share the same flaw class: the deserialization step instantiates types and invokes their callbacks based on what is in the byte stream, so a crafted stream can trigger arbitrary execution through a chain of class side effects known as a gadget chain. Cookies, message queue payloads, session blobs, and inter-service RPC are the typical entry points.

What ties the two halves together is integrity verification — or its absence. A signed package that the installer verifies before extracting closes the supply chain side. A safe data-interchange format like JSON with strict schema validation closes the deserialization side. Both depend on the same discipline: do not act on untrusted bytes, whether those bytes are a binary update or a serialized object, without first verifying their origin or constraining what they are allowed to express.

Common Patterns

Native deserialization of untrusted data. Java's readObject reading from a session cookie, Python's pickle.loads reading from a message queue, PHP's unserialize on a request parameter. The exploit primitive is a gadget chain — a sequence of classes already on the classpath whose callbacks combine to execute attacker-controlled commands during the deserialization itself. The Apache Commons Collections gadget chain published in 2015 turned this into a remote code execution primitive against any Java application that accepted serialized objects from untrusted sources, and similar chains exist for almost every major runtime.

Untrusted CI/CD plugins. A pipeline that pulls a third-party Action, Orb, or marketplace plugin by mutable reference rather than pinned commit hash. Anyone who compromises the plugin upstream can push code that runs inside the trusted build environment, with the build's secrets and the right to publish the resulting artifact under the project's identity. The blast radius is enormous because the plugin runs with the pipeline's full privilege.

Missing package signature verification. A package manager configured to skip signature checks, an internal mirror that strips signatures, or a private registry that publishes unsigned tarballs. Without signature verification at install time, a compromised registry or a man-in-the-middle on the download path can swap any package for a malicious one and the consumer has no way to detect it.

Auto-update without integrity check. Desktop applications, IoT firmware, and even some server agents that periodically poll an update server, download a new binary, and execute it without verifying a signature. A compromised update server — or any attacker who can intercept the download — can push backdoored binaries to every customer. SolarWinds was a sophisticated variant of this pattern at the build pipeline level.

Browser polyfills and CDN-loaded scripts. A site that includes a script tag pointing at a third-party CDN gives that CDN full execution capability inside the page on every load. If the CDN owner is bought out, the domain expires, or the operator is compromised, the consequence is silent malicious code injection across every site that uses the script. The 2024 polyfill.io incident was the textbook case.

Dependency confusion. An organization uses a private package named, say, @acme/internal-utils. An attacker publishes a package with the same name on the public npm or PyPI registry with a higher version number. A misconfigured installer, looking at both registries, picks the higher-versioned public package — which now runs malicious code in the build. Alex Birsan's 2021 research demonstrated this against more than 35 major companies, including Apple, Microsoft, and Tesla.

Container images pulled by mutable tag. A deployment that references node:18 rather than node:18@sha256:... trusts whoever controls the upstream tag to never push a malicious image under the same label. A registry compromise becomes a deployment compromise the next time the image is pulled.

Real-World Incidents

SolarWinds (2020). The defining incident for this category. Attackers, later attributed to a nation-state group, compromised the build environment for SolarWinds' Orion network monitoring product and inserted a backdoor known as SUNBURST into a routine signed software update. Roughly 18,000 organizations downloaded the trojanized update, and the attackers selectively activated the implant against high-value targets, including multiple US federal agencies and major enterprises. The build pipeline itself, not the source repository, was the point of compromise — making conventional source review and binary scanning blind to the attack.

CodeCov bash uploader (2021). Attackers gained access to CodeCov's build process and modified the popular bash uploader script that thousands of CI pipelines fetched and executed on every build. The modification quietly exfiltrated environment variables — including CI secrets, cloud credentials, and source code — from any pipeline that ran the compromised script during the roughly two-month window before discovery. The attack vector was a tampered script, fetched live and executed without integrity verification, exactly the A08 pattern.

event-stream npm package (2018). A widely used JavaScript library called event-stream, with around two million weekly downloads, was handed off by its original maintainer to a contributor who had offered to help. That contributor published a release that added a new dependency, flatmap-stream, which contained obfuscated code targeting users of the Copay cryptocurrency wallet — one of the few applications that included event-stream as a transitive dependency. The attack demonstrated how trust transfer in the open-source maintainer model can become a supply chain attack vector all on its own.

polyfill.io (2024). The polyfill.io domain, which served JavaScript polyfills to hundreds of thousands of websites including major brands, was acquired by a new operator in early 2024. Within months, security researchers observed the CDN serving malicious code to mobile visitors of sites that referenced it. Sites that had been trusting the CDN by simple script tag for years suddenly delivered drive-by malware to their users. The incident kicked off an industry-wide effort to remove third-party CDN dependencies and verify any remaining ones with subresource integrity hashes.

Apache Commons Collections (2015 onward). The publication of practical Java deserialization gadget chains in commons-collections turned what had been a theoretical concern into a steady stream of remote code execution exploits across enterprise Java applications. WebSphere, JBoss, Jenkins, and many others shipped fixes specifically because their endpoints accepted serialized Java objects from untrusted sources and the gadget chains made exploitation trivial. The lesson was permanent: native binary deserialization is unsafe by design when the input is not authenticated.

Relevant CWE Mappings

A08:2021 aggregates ten underlying CWEs in the official OWASP mapping. The six below are the ones most teams will encounter in scanner output and finding tickets, and they cover both halves of the category — software integrity and data integrity.

CWE	Title	Where It Shows Up
CWE-345	Insufficient Verification of Data Authenticity	Umbrella CWE for failing to verify the origin or integrity of any data the system acts on
CWE-353	Missing Support for Integrity Check	Protocols and file formats that lack any built-in integrity verification field
CWE-494	Download of Code Without Integrity Check	Auto-update downloads, plugin installers, and CDN-loaded scripts without checksum or signature verification
CWE-502	Deserialization of Untrusted Data	Java `readObject`, Python `pickle.loads`, PHP `unserialize`, .NET `BinaryFormatter` on attacker-reachable input
CWE-829	Inclusion of Functionality from Untrusted Control Sphere	Dynamically loading remote scripts, plugins, or modules from sources outside the application's control
CWE-915	Improperly Controlled Modification of Dynamically-Determined Object Attributes	Mass-assignment-style flaws where an object's attributes are set from untrusted input without an allow-list

CWE-502 is the workhorse of this category in scanner output — almost every deserialization finding maps here. CWE-494 is the supply chain side of the same shape, applied to executable downloads rather than serialized objects. CWE-829 is the umbrella for browser polyfills, CDN scripts, and remote plugin loading. CWE-345 is the parent that the others descend from when no more specific entry fits.

Detection: SAST, SCA, SBOM, and Admission Control

A08 spans both source-level patterns and build-and-deploy-time controls, so detection is necessarily layered. No single tool catches both the deserialization sink in code and the unsigned image admitted at deploy. A serious program runs each detection at the layer where it is cheapest to operate.

SAST is the strongest layer for the data integrity half. A taint-aware static analyzer flags deserialization sinks reading from untrusted sources directly: ObjectInputStream.readObject on a request body, pickle.loads on a queue message, unserialize on a cookie. It also catches missing signature verification on update downloads — a network fetch followed by a write-and-execute with no intervening checksum or signature call. Hardcoded download URLs without integrity hashes show up the same way. GraphNode SAST uses interprocedural taint propagation to trace these patterns across method boundaries, which is what catches the multi-hop cases where the deserialized data passes through a deserializer wrapper before reaching the dangerous call.

SCA covers the supply chain half. Walking the full transitive dependency tree and matching every component-version pair against vulnerability advisories catches dependencies pulled from compromised sources, packages with known malicious versions, and the dependency drift that leaves projects on a vulnerable release months after a fix is published. GraphNode SCA generates the SBOM that downstream consumers need and tracks vulnerable component-version pairs continuously, including the transitive depth where most supply chain risk actually lives.

Signed-image verification at admission closes the deploy-time gap. Kubernetes admission controllers (Sigstore policy-controller, Kyverno, OPA Gatekeeper) can reject any container image that does not carry a valid signature from an approved key. Pinning images by digest rather than tag and validating the signature at admission converts a registry compromise from a deploy-time disaster into a denied admission decision. The pattern works equally well for serverless function packages and OS package installs that support signature pinning.

SBOM as visibility primitive. A Software Bill of Materials does not by itself prevent supply chain attacks, but it is the prerequisite for responding to them. When the next Log4Shell-class disclosure lands, the question "which of our deployments includes the vulnerable component-version pair?" can only be answered with confidence if an SBOM was generated at build time and stored alongside each release. SCA and SBOM together are the visibility layer; admission control and signing are the enforcement layer.

SLSA and Signed Builds

The most direct industry response to the SolarWinds-class threat is the SLSA framework — Supply-chain Levels for Software Artifacts — originated at Google and now stewarded by the OpenSSF. SLSA does not scan code or block deploys by itself. It is a specification for the integrity of the build process and the provenance attestations that prove a given artifact came from a specified source through a specified builder. SLSA Level 2, achievable on GitHub Actions or any other hosted build platform that signs provenance automatically, is the most pragmatic starting point and closes the largest single class of supply chain risk relative to effort.

Pairing SLSA with Sigstore — the OpenSSF project that provides keyless signing through OIDC identity attestation, with transparency-log backing in Rekor — gives every release a signed provenance document that downstream consumers can verify before installing or deploying. Cosign for container images, slsa-verifier for arbitrary artifacts, and the in-toto attestation framework for the underlying envelope format are the open implementations that turn the spec into an actual control. None of this prevents a determined attacker from compromising the source itself, but it raises the bar dramatically against the build-pipeline tampering pattern that A08 exists to address.

Prevention

Avoid native serialization on untrusted boundaries. Java's ObjectInputStream, Python's pickle, PHP's unserialize, and .NET's BinaryFormatter should not run on any data that crosses a trust boundary. If the goal is data interchange, use JSON, Protocol Buffers, or another format that does not invoke object construction during parse. If serialized state must traverse an untrusted channel, sign it with HMAC and verify the signature before deserializing. Microsoft has formally deprecated BinaryFormatter for exactly this reason.

Sign packages and verify on install. Use Sigstore cosign for container images, Maven Central's GPG signatures for Java artifacts, npm's package signatures and provenance attestations, and similar mechanisms wherever they exist. The verification step at install time is what closes the loop; signing without enforcing verification provides no security benefit.

Generate and ship an SBOM with every release. SPDX or CycloneDX format, attached to the release, listing every direct and transitive dependency with its version. This is increasingly a procurement requirement for federal contractors under NIST SP 800-218 (SSDF) and Executive Order 14028, and it is the precondition for responding quickly to the next Log4Shell-scale disclosure.

Pin dependencies and use lockfiles. A lockfile that records exact versions and integrity hashes is the difference between reproducible builds and rolling exposure. package-lock.json, poetry.lock, Cargo.lock, go.sum, and Pipfile.lock all serve this purpose and should be committed to source control. For container deployments, pin images by SHA256 digest, never by mutable tag.

Use trusted base images and registries. Pull from official, verifiable upstream registries or from an internal mirror that has its own signature verification. Scan base images on ingest. Maintain a curated allow-list for production images and reject anything outside it at admission.

Target SLSA Level 2 or higher for build provenance. Move builds onto a hosted platform that automatically generates signed provenance, store the attestations alongside the artifacts, and verify them before deployment. The combination of provenance, signing, and admission control is what turns the SolarWinds attack pattern into a denied deploy decision rather than a year-long undetected breach.

Defend against dependency confusion. Configure package managers to scope private packages explicitly to the internal registry. Reserve your organization's namespace on public registries even if you do not publish there. Many incidents would not have happened if the squatted name had simply been claimed defensively.

Where GraphNode Fits

GraphNode SAST addresses the source-code half of A08 with deep interprocedural data flow analysis. Deserialization sinks reading from untrusted sources — ObjectInputStream.readObject, pickle.loads, unserialize, BinaryFormatter.Deserialize — are exactly the pattern the engine is built to surface. It also flags missing signature verification on update downloads, hardcoded download URLs without integrity checks, and dynamic class loading from request-controlled paths. Coverage spans 13+ languages with 780+ rules, including the deserialization and supply chain integrity patterns relevant to this category.

GraphNode SCA addresses the supply chain half. Walking the full transitive dependency tree, matching every component-version pair against vulnerability advisories, and producing an SBOM for each release covers the visibility primitives that A08 prevention explicitly requires. For background on how the supply chain shape connects to known incidents, see Log4Shell and the supply chain lesson and what is SLSA.

The honest position is that GraphNode covers two of the three layers this category needs. The third layer — signed builds, provenance attestations, and admission-time verification — is the SLSA-and-Sigstore stack on the build platform, which sits outside static analysis but pairs naturally with the SAST and SCA outputs. For the broader category map, see the OWASP Top 10 hub.

Frequently Asked Questions

What are software and data integrity failures?

Software and data integrity failures are cases where code, infrastructure, or serialized data is updated, loaded, or processed without verifying that it has not been tampered with. The category covers two sides: the software integrity side (build pipeline tampering, unsigned package installs, auto-updates without signature checks, untrusted CI/CD plugins, dependency confusion, browser polyfills from untrusted CDNs) and the data integrity side (deserialization of untrusted data into native object types, which historically was a category of its own). It is OWASP A08:2021, a new category in the 2021 edition created in response to the SolarWinds incident and the broader wave of supply chain attacks.

Is insecure deserialization the same as A08?

Insecure deserialization is one half of A08:2021, not the whole category. In the 2017 edition of the OWASP Top 10, insecure deserialization was its own entry (A08:2017). In the 2021 reorganization, OWASP folded it into the broader Software and Data Integrity Failures category along with build pipeline integrity, unsigned updates, and untrusted plugin loading. The unifying property is the failure to verify integrity before acting on data — whether that data is a serialized object reaching readObject or a binary update reaching the auto-installer. Deserialization findings still map cleanly to CWE-502 within the A08 umbrella.

What is a supply chain attack?

A supply chain attack is one in which the attacker compromises a trusted upstream component — a build pipeline, a software publisher, an open-source maintainer, a CI/CD plugin, or a package registry — to reach the downstream consumers of that component. SolarWinds in 2020 (compromised build pipeline pushing a backdoor through a signed update), CodeCov in 2021 (tampered bash uploader script exfiltrating CI secrets), event-stream in 2018 (npm package handed off to a malicious maintainer), and polyfill.io in 2024 (CDN compromised after acquisition) are canonical examples. The defining characteristic is that the victim's defenses against direct attack are bypassed because the malicious code arrives through a trusted channel.

How does SLSA help with A08?

SLSA — Supply-chain Levels for Software Artifacts — is a framework, originated at Google and stewarded by OpenSSF, for hardening the build pipeline and producing signed provenance attestations that downstream consumers can verify. SLSA does not scan code or block deploys by itself. It defines what good build integrity looks like in measurable terms, with Levels 1 through 3 covering progressively stronger guarantees about how an artifact was built. Pairing SLSA with Sigstore for keyless signing and admission-time verification with cosign or slsa-verifier converts the SolarWinds-class threat from an undetected breach into a denied deploy decision. SLSA Level 2 is the pragmatic starting point for most organizations.

Can SAST detect A08 software and data integrity failures?

SAST detects the data integrity half of A08 directly. A taint-aware static analyzer flags deserialization sinks reading from untrusted sources — Java ObjectInputStream.readObject, Python pickle.loads, PHP unserialize, .NET BinaryFormatter.Deserialize — and traces the input across method boundaries to confirm reachability. It also catches missing signature verification on update downloads, hardcoded download URLs without integrity hashes, and dynamic class loading from request-controlled paths. For the supply chain half — vulnerable transitive dependencies, dependency confusion exposure, missing SBOMs — SCA is the right layer. Build provenance and signed-image admission control sit outside both SAST and SCA and require platform-level tooling such as SLSA-conformant build platforms and Sigstore.