SCA Scanning Explained: A Complete Guide to Software Composition Analysis
Somewhere between 70 and 90 percent of a modern application is not code your team wrote. It is open-source libraries, frameworks, and utility packages that get pulled into the build by a single line in a manifest file. The bug your team writes is one risk; the bug shipped in the dependency you imported five years ago and never thought about again is another. Software Composition Analysis (SCA) is the tooling category that exists to make the second risk visible — to give you an honest, current inventory of every component in your application and a list of every known vulnerability and license obligation attached to it.
This guide walks through what an SCA scan actually does under the hood, why transitive dependencies are the part most teams underestimate, where vulnerability data comes from and why the source matters, what a useful scan report looks like, and where SCA fits alongside SAST in a complete application security program. It is vendor-neutral on purpose: the goal is to help you evaluate any SCA tool — including the one you already own — with a clearer set of expectations.
What an SCA Scan Actually Does
At a mechanical level, an SCA scan does three things. First, it generates a complete dependency tree for the project being scanned. That means parsing the package manager manifest (package.json, pom.xml, requirements.txt, go.mod, Gemfile, .csproj, build.gradle, Cargo.toml, composer.json, and so on) plus the corresponding lockfile (package-lock.json, yarn.lock, pnpm-lock.yaml, poetry.lock) to enumerate not just the libraries you explicitly declared, but also every library those libraries pulled in, recursively, all the way down.
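For npm projects, the lockfile already contains the fully resolved tree, so the enumeration step can be surprisingly small. A minimal sketch, assuming an npm lockfile in v2/v3 format (the lockfile content and package names below are hypothetical):

```python
import json

def enumerate_packages(lockfile_text):
    """Flatten an npm package-lock.json (v2/v3) into (name, version) pairs.

    The 'packages' map keys every installed package by its node_modules
    path, so transitive dependencies appear without any recursion.
    """
    lock = json.loads(lockfile_text)
    inventory = set()
    for path, meta in lock.get("packages", {}).items():
        if not path:  # the empty key "" is the root project itself
            continue
        name = path.split("node_modules/")[-1]
        inventory.add((name, meta.get("version", "unknown")))
    return sorted(inventory)

# Hypothetical lockfile: one direct dependency; minimist is transitive.
lock_json = """{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo-app"},
    "node_modules/mkdirp": {"version": "0.5.1"},
    "node_modules/mkdirp/node_modules/minimist": {"version": "0.0.8"}
  }
}"""
print(enumerate_packages(lock_json))
# [('minimist', '0.0.8'), ('mkdirp', '0.5.1')]
```

Note that the transitive package surfaces without any tree traversal: modern npm lockfiles flatten the resolved graph for exactly this reason.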
Second, the scanner takes that inventory and, for each component-version pair, queries one or more vulnerability databases. The query is typically a Package URL (purl) lookup or a coordinate match (group:artifact:version for Maven, name@version for npm) against advisory records that map vulnerable version ranges to specific CVEs.
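Both coordinate styles normalize into a Package URL, and OSV's public /v1/query endpoint accepts exactly this kind of component-version pair. A hedged sketch: `to_purl` skips the percent-encoding and namespace handling a spec-complete purl builder needs, and the payload shape follows OSV's documented query schema:

```python
def to_purl(ecosystem, name, version):
    """Package URL for a component-version pair, e.g. pkg:npm/lodash@4.17.15.

    Simplified: a real purl builder percent-encodes components and
    splits Maven group/artifact into namespace and name.
    """
    return f"pkg:{ecosystem}/{name}@{version}"

def osv_query_payload(ecosystem, name, version):
    """Request body for a POST to OSV's /v1/query endpoint."""
    return {"version": version,
            "package": {"name": name, "ecosystem": ecosystem}}

# npm coordinates (name@version) and Maven coordinates
# (group:artifact:version) normalize into the same purl scheme.
print(to_purl("npm", "lodash", "4.17.15"))
print(to_purl("maven", "org.apache.logging.log4j/log4j-core", "2.14.1"))
print(osv_query_payload("npm", "lodash", "4.17.15"))
```

The response to such a query is a list of advisory records whose affected-version ranges the scanner then matches against the resolved version.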
Third, the scanner aggregates the matches and presents them with the metadata a developer or security engineer needs to act: severity, CVSS vector, the fixed version (if one exists), whether a known exploit is available, the dependency path from your project root down to the vulnerable component, and ideally the license terms attached to each component as well. The good ones also generate a Software Bill of Materials (SBOM) in CycloneDX or SPDX format, so the inventory is portable and auditable outside the tool that produced it.
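An SBOM in CycloneDX form is, at its core, just that inventory serialized with stable identifiers. A minimal sketch of the JSON shape; real generators also emit metadata, hashes, and license fields:

```python
import json

def minimal_cyclonedx(components):
    """Skeleton CycloneDX 1.5 JSON BOM from (name, version, purl) tuples.

    Only the required top-level fields plus a components list --
    a production SBOM carries far more detail per component.
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "library", "name": n, "version": v, "purl": p}
            for n, v, p in components
        ],
    }

bom = minimal_cyclonedx([("lodash", "4.17.15", "pkg:npm/lodash@4.17.15")])
print(json.dumps(bom, indent=2))
```

Because the format is standardized, the same document can feed a different vendor's monitoring service, an auditor's tooling, or a customer's procurement process.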
What an SCA scan does not do is execute your code. It is a metadata operation: it reads what is declared, looks up what is known, and reports the intersection. That is both its strength (fast, deterministic, runs without a full build environment in many cases) and its limitation (it cannot tell you whether a vulnerable function is actually invoked at runtime — that is the job of reachability analysis, which we cover below).
Direct vs Transitive Dependencies
The most important distinction in SCA is between direct and transitive dependencies. Your manifest — package.json, pom.xml, requirements.txt — lists direct dependencies: the libraries you explicitly chose to import. The dependency tree underneath those is transitive: every library that your direct dependencies depend on, and every library those depend on, and so on. A typical Node.js project with 30 declared dependencies routinely resolves to 800 or more transitive packages. A Java project with 20 direct dependencies often drags in 200+ JARs.
The Log4Shell incident in December 2021 was the canonical demonstration of why transitive depth matters. CVE-2021-44228 affected Log4j 2, but most of the affected applications never declared Log4j directly. They declared Spring Boot, or Apache Solr, or Elasticsearch, or some enterprise Java framework that pulled in Log4j four or five layers down the tree. Teams that only audited their direct dependencies missed the vulnerability entirely. The teams that found and patched it quickly were the ones whose SCA tooling already enumerated and monitored the full transitive graph.
A useful SCA tool reports the dependency path for every finding: which direct dependency pulled the vulnerable package in, and through what intermediate hops. That path is what tells you whether the fix is "bump our direct dependency to a version with a patched transitive" or "force a transitive override in the lockfile." For a deeper look at why this matters, see our companion piece on transitive dependencies as an attack surface.
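Recovering that path is a graph walk over the resolved tree. A sketch using a hypothetical dependency graph (the edges below are illustrative, not the real Spring Boot tree):

```python
def dependency_path(edges, root, target):
    """Depth-first search for one path root -> target through dependency edges."""
    def walk(node, seen):
        if node == target:
            return [node]
        for child in edges.get(node, []):
            if child in seen:
                continue  # dependency graphs can contain cycles
            sub = walk(child, seen | {child})
            if sub:
                return [node] + sub
        return None
    return walk(root, {root})

# Hypothetical tree: the app never declares log4j-core directly.
edges = {
    "my-app": ["some-framework-starter"],
    "some-framework-starter": ["some-logging-starter"],
    "some-logging-starter": ["log4j-core"],
}
print(" -> ".join(dependency_path(edges, "my-app", "log4j-core")))
# my-app -> some-framework-starter -> some-logging-starter -> log4j-core
```

That printed chain is precisely the "dependency path" column a good report carries: it names the direct dependency to bump and every hop in between.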
Where Vulnerability Data Actually Comes From
The accuracy and freshness of an SCA scan is bounded by the vulnerability data feeding it. There is no single canonical source; serious tools aggregate several. The most commonly used feeds are:
- NVD (National Vulnerability Database): The US government's CVE database, maintained by NIST. Comprehensive in scope but historically slow — a CVE may sit "Awaiting Analysis" for days or weeks before its CPE coordinates and CVSS score are published, which means tools relying solely on NVD can lag the disclosure timeline.
- GitHub Advisory Database (GHSA): GitHub's curated advisory feed, built from community reports and GitHub Security Lab research, which often publishes ecosystem-specific advisories (npm, Maven, pip, RubyGems, Go, NuGet, Composer, Rust) faster than NVD, with package coordinates already attached.
- OSV (Open Source Vulnerabilities): Google's open schema and aggregator that pulls from GHSA, PyPI, RustSec, Go, and other ecosystem feeds into a single queryable format with precise version ranges.
- Vendor-specific advisories: RustSec for Rust, Snyk's research database, Sonatype OSS Index, the npm Security Advisories feed, and various individual project security mailing lists.
- Reserved or pre-disclosure CVE IDs: A vulnerability may have a CVE ID assigned before the technical details are public. Some advisories enter ecosystem feeds before NVD picks them up.
The latency between disclosure and database entry is the part most buyers underestimate. Two SCA scanners pointed at the same project on the same day can report different findings purely because one's data feed has ingested an advisory the other has not. When evaluating a tool, ask explicitly which feeds it consumes, how often it refreshes them, and what its median lag is between a public CVE and the matching record being scannable.
What a Good SCA Scan Report Contains
A scan report is only as useful as the metadata it carries. At minimum, every finding should include the following columns:
| Field | Why It Matters |
|---|---|
| Component + version | Identifies the exact package and resolved version (not just a range). |
| CVE / advisory ID | Cross-reference for tracking, ticketing, and external lookups. |
| CVSS score + vector | Standard severity baseline; vector explains attack complexity and impact. |
| Fixed version | The lowest version that resolves the issue — the actionable remediation target. |
| Exploit availability | Whether public exploit code or PoC exists (e.g. CISA KEV listing, ExploitDB entry). |
| Dependency path | Direct dependency that pulled the component in, and the chain to the finding. |
| License | SPDX identifier so license risk is reportable in the same view as security risk. |
Reports that omit dependency path or fixed version force engineers to do remediation research themselves, which is where SCA programs lose momentum.
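Concretely, a finding that carries every column above might look like this (the values are representative of the Log4Shell advisory, not output from any particular tool, and the dependency path is hypothetical):

```python
# One fully-populated finding record; field names are illustrative.
finding = {
    "component": "log4j-core",
    "version": "2.14.1",
    "advisory": "CVE-2021-44228",
    "cvss_score": 10.0,
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H",
    "fixed_version": "2.17.1",
    "exploit_available": True,  # listed in CISA KEV
    "dependency_path": ["my-app", "some-framework", "log4j-core"],
    "license": "Apache-2.0",
}
print(f"{finding['advisory']}: fix by upgrading to {finding['fixed_version']}")
```

With all of these present, the record is directly ticketable: component, severity, remediation target, and the path that explains who owns the fix.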
Reachability Analysis: Optional but Valuable
A vulnerable function existing inside a library you depend on is not the same as your application actually calling that function. Reachability analysis tries to close that gap. The technique builds a call graph of your application code, walks the edges into the dependency code, and asks whether any path from your entry points reaches the vulnerable symbol. If no such path exists, the vulnerability is present in the artifact but not exploitable through your application — which lets a security team deprioritize it relative to vulnerabilities that are actually reachable.
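Once the call graph exists, the reachability question itself is an ordinary graph search; building an accurate graph is the hard part. A sketch over a hypothetical, already-extracted call graph:

```python
from collections import deque

def is_reachable(call_graph, entry_points, vulnerable_symbol):
    """BFS over a call graph: does any path from an entry point reach
    the vulnerable function? Static sketch only -- edges created by
    dynamic dispatch, reflection, or eval are invisible to a graph
    like this, which is exactly the false-negative risk discussed below.
    """
    queue, seen = deque(entry_points), set(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == vulnerable_symbol:
            return True
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

# Hypothetical graph: one handler calls into the vulnerable code, one does not.
call_graph = {
    "app.handler": ["lib.parse"],
    "lib.parse": ["lib.vulnerable_deserialize"],
    "app.helper": ["lib.format"],
}
print(is_reachable(call_graph, ["app.handler"], "lib.vulnerable_deserialize"))  # True
print(is_reachable(call_graph, ["app.helper"], "lib.vulnerable_deserialize"))   # False
```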
Done well, reachability is one of the highest-value features in modern SCA. A typical Node.js or Java project carries hundreds of CVEs against transitive packages, but the unreachable subset is often 60-80 percent of the total. Filtering noise of that magnitude is the difference between a backlog engineering can actually burn down and a perpetual list nobody touches.
Done poorly, reachability is dangerous. Call-graph construction is undecidable in the general case for languages with reflection, dynamic dispatch, eval, plugin loading, or runtime classloading — and almost every real-world stack uses several of these. A reachability engine that confidently marks a finding "not reachable" when the call actually happens through a Spring proxy or a Python decorator is silently feeding false negatives into your security program. Treat reachability as a prioritization signal, not a suppression mechanism. Findings flagged unreachable still belong in the inventory; they just sink below the reachable ones.
License Compliance: The Other Half of SCA
Vulnerability detection gets the headlines, but license compliance is often what gets SCA budget approved in the first place. Open-source licenses are not all interchangeable. Permissive licenses (MIT, Apache 2.0, BSD) require attribution but otherwise let you ship freely. Copyleft licenses are different: GPL v2 and v3 require that distributed derivative works be released under the same license, which makes the GPL generally incompatible with closed-source commercial distribution.
A handful of licenses pose specific commercial risk: AGPL extends copyleft over networked services (use it inside a SaaS product and you may owe source disclosure), SSPL (introduced by MongoDB and later adopted by Elastic) carries similar service-side obligations, and Commons Clause and other "source-available" hybrid licenses restrict selling the software. SCA tools surface every component's SPDX license identifier and let policy authors flag which licenses require legal review before a build is allowed to ship. The result is that license violations get caught at the pull request, not by an external auditor during M&A due diligence.
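The policy layer itself is simple once every component carries an SPDX identifier. A sketch in which the allow/review/deny buckets are purely illustrative, not legal advice:

```python
# Illustrative policy buckets keyed by SPDX identifier; a real policy
# comes from counsel, and unknown licenses default to human review.
LICENSE_POLICY = {
    "MIT": "allow", "Apache-2.0": "allow", "BSD-3-Clause": "allow",
    "GPL-3.0-only": "review", "AGPL-3.0-only": "deny", "SSPL-1.0": "deny",
}

def license_violations(components):
    """Return (name, spdx) pairs that are not cleanly allowed."""
    return [(name, spdx) for name, spdx in components
            if LICENSE_POLICY.get(spdx, "review") != "allow"]

# Hypothetical inventory: one permissive package, one SSPL component.
print(license_violations([("left-pad", "MIT"), ("some-db-client", "SSPL-1.0")]))
# [('some-db-client', 'SSPL-1.0')]
```

Defaulting unrecognized identifiers to "review" rather than "allow" is the design choice that keeps novel or custom licenses from slipping through silently.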
Where to Run SCA in the Lifecycle
SCA can run almost anywhere in the development lifecycle. Each location trades off feedback speed against signal completeness:
- Pre-commit / IDE: Fastest feedback, but only useful for catching obviously vulnerable packages a developer is about to add. Full transitive resolution at this stage is usually too slow for the inner loop.
- CI build: The standard placement. Run on every push to a feature branch, fail the build on policy violations (e.g. new critical CVEs introduced by the change). This is where most SCA tools deliver their core value.
- Pull request decoration: The best developer experience. The scan runs in CI, but findings are posted as inline PR comments and a summary check. The reviewer sees "this PR introduces 2 new criticals via spring-boot 2.7.0 -> 3.1.4 upgrade" without leaving the PR.
- Continuous monitoring of shipped releases: The most under-used placement. A new CVE disclosed today might affect a library version you shipped three months ago in a release that hasn't been rebuilt since. Continuous monitoring re-evaluates already-built artifacts against the latest advisory feed and notifies you of new exposures without requiring a fresh scan run.
- Registry / artifact gate: Block vulnerable artifacts from being published or promoted between environments. This is the strictest placement and the hardest to roll out — it requires that earlier-stage feedback already be reliable enough that nothing surprises engineering at the gate.
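The CI placement described above usually reduces to a small gate script over the scanner's JSON output. A sketch, assuming a hypothetical findings format with `severity`, `id`, and `component` fields:

```python
def gate(findings, fail_at="critical"):
    """Return a CI exit code: 1 if any finding meets the severity threshold.

    `findings` is assumed to be parsed scanner output; only the
    'severity', 'id', and 'component' fields are read here.
    """
    order = ["low", "medium", "high", "critical"]
    threshold = order.index(fail_at)
    blocking = [f for f in findings if order.index(f["severity"]) >= threshold]
    for f in blocking:
        print(f"BLOCKING: {f['id']} in {f['component']} ({f['severity']})")
    return 1 if blocking else 0

findings = [
    {"id": "CVE-2021-44228", "component": "log4j-core", "severity": "critical"},
    {"id": "CVE-2020-8203", "component": "lodash", "severity": "high"},
]
exit_code = gate(findings)  # a CI job would pass this to sys.exit()
print("exit code:", exit_code)
```

The threshold parameter is where policy lives: failing only on new criticals at first, then ratcheting down to high once the backlog is under control, is a common rollout path.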
Most mature programs run SCA in at least two of these places: CI for new findings, and continuous monitoring for vulnerabilities disclosed against already-shipped code.
SCA + SAST Together: Different Bug Populations
SCA and SAST are sometimes treated as alternatives. They are not. They cover non-overlapping bug populations. SAST analyzes code your team wrote and finds vulnerabilities your developers introduced — injection flaws, broken authentication, insecure cryptography, business-logic mistakes. SCA analyzes code your team imported and finds vulnerabilities someone else's developers introduced and disclosed publicly. If 80 percent of your application is third-party, then SCA covers 80 percent of the codebase by line count and SAST covers the 20 percent your team is actually paid to write. You need both.
The strongest argument for buying SAST and SCA from the same vendor is workflow consolidation: one set of integrations, one policy engine, one queue of findings to triage, one reporting surface. GraphNode's platform pairs GraphNode SCA with GraphNode SAST in a single engine for exactly this reason. Whether you choose a unified platform or best-of-breed tools, the rule stands: run neither alone. Supply-chain security is also worth understanding more broadly — see our overview of SLSA and supply-chain integrity.
Frequently Asked Questions
What is the difference between SCA and SAST?
SAST (Static Application Security Testing) analyzes the source code your team wrote, looking for vulnerability patterns like injection, insecure crypto, or broken authentication. SCA (Software Composition Analysis) analyzes the open-source components your application depends on, matching them against known CVE databases. They cover different bug populations — your code vs imported code — and a complete program needs both.
Is SCA the same as dependency scanning?
Dependency scanning is a subset of SCA. The terms are often used interchangeably, but a full SCA tool covers more than just CVE lookup against direct dependencies. It enumerates the complete transitive tree, performs license compliance checks, generates an SBOM, and ideally adds reachability analysis and continuous monitoring. "Dependency scanning" usually refers to the basic CVE-matching layer alone.
What does an SCA scan find that npm audit doesn't?
npm audit draws on a single advisory feed and only covers JavaScript packages. A commercial SCA tool aggregates multiple feeds (NVD, GHSA, OSV, vendor research), supports every major package ecosystem, generates SBOMs in standard formats, surfaces license risk alongside security risk, supports policy-as-code for fail-the-build rules, and typically adds reachability analysis. npm audit is a useful free baseline; it is not a replacement for an enterprise SCA program.
How accurate is reachability analysis?
Reachability accuracy depends heavily on the language and the codebase. Languages with simple, statically resolvable call graphs (Go, Rust, modern Java without reflection-heavy frameworks) yield high-confidence results. Languages with dynamic dispatch, reflection, decorators, or runtime metaprogramming (Python, Ruby, JavaScript with dynamic imports, Spring with proxies) carry meaningful false-negative risk. Treat reachability as a prioritization signal — useful for ranking the backlog — not as a suppression mechanism that lets you ignore findings entirely.
What's the most important feature in an SCA tool?
Vulnerability database freshness and breadth, by a wide margin. Every other feature — reachability, auto-remediation PRs, policy engines, SBOM export — is built on top of the underlying advisory data. If the data feed lags by a week or misses ecosystem-specific advisories, every downstream feature inherits that blind spot. When evaluating tools, ask which feeds they aggregate, how often they refresh, and request a sample report against a known recent CVE to verify the lag.