E-Invoice Archiving API: Adding Compliant Retention
Jun 4, 2026
Thomas Hepp
Content
Why "Saving the XML to S3" Fails the Audit
Anatomy of a Compliant Archiving API
Step-by-Step: Integrating the Archive via API
Client-Side Hashing vs. Server-Side Sealing
Operational Concerns: Retention, Redundancy, and Sovereignty
Build vs. Buy, in One Paragraph
Why an API Layer Survives the Next Mandate
Conclusion

Why "Saving the XML to S3" Fails the Audit
A platform team integrates e-invoicing, ships it, and moves on. Then the first tax audit arrives and someone asks a question nobody designed for: prove this invoice has not changed since the day you received it. The XML sitting in an S3 bucket cannot answer that. It has no integrity proof, no tamper-evident history, and no way to demonstrate that an administrator did not quietly edit it last quarter. That gap, between storing a file and being able to prove its state, is the difference between a weekend feature and a multi-month archiving subsystem.
This article is about avoiding that subsystem. A compliant e-invoice archiving API lets your ERP or SaaS platform hand off long-term retention with a single decoupled call, instead of building a certified archive in-house. We will walk through the actual mechanics: authentication, the seal request, the verify endpoint, multi-tenant isolation, and the asynchronous patterns that keep all of it fast at volume.
First, why the bucket fails. A compliant archive must prove immutability and carry a tamper-evident audit trail for the full statutory retention window, which is why a database blob does not qualify as an archive. The full set of properties (original-format preservation, integrity, and a complete audit trail) is covered in our guide to e-invoice archiving requirements. For an integration engineer, the practical consequence is simple: retention is its own concern, and coupling it into invoice generation turns every tax-law change into a core-code rewrite. The cleaner path is an e-invoice archiving API that delivers those guarantees behind a stable interface, without your team owning the compliance liability.
Anatomy of a Compliant Archiving API
A well-designed archiving API is built on one principle: clean separation of concerns. Invoice generation and long-term retention are different problems with different lifecycles, and the REST boundary keeps them apart. Your platform produces the invoice, in XRechnung, ZUGFeRD, or Factur-X, and hands it to the archive through a single endpoint. What happens after that is the archive's responsibility, not yours.
Cryptographic Sealing: The Technical Foundation
At the core of the API is the seal operation, which works in two layers.
SHA-256 hashing generates a 256-bit fingerprint of the document at the moment of ingestion. The hash functions recommended by NIST in SP 800-107 are deterministic and collision-resistant: the same bytes always produce the same hash, finding two different documents with the same hash is computationally infeasible, and changing a single character produces a completely different one. That fingerprint is what later proves the document is unaltered.
AES-256 encryption protects the stored document itself. Even if the underlying storage is breached, the content stays inaccessible to anyone without the keys, including infrastructure administrators. The hash proves integrity; the encryption protects confidentiality. Together they are what separates a seal from plain at-rest encryption.
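The hashing layer is easy to try locally. The following is a minimal sketch using Python's standard `hashlib`; the `fingerprint` helper and the sample invoice bytes are illustrative assumptions, not part of any provider's API.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of the document as 64 hex characters."""
    return hashlib.sha256(document).hexdigest()


# Illustrative sample documents that differ in a single character.
original = b"<Invoice><ID>INV-2025-00847</ID><Total>100.00</Total></Invoice>"
edited = b"<Invoice><ID>INV-2025-00847</ID><Total>900.00</Total></Invoice>"

# Deterministic: the same bytes always yield the same fingerprint.
assert fingerprint(original) == fingerprint(original)

# A one-character edit yields a different fingerprint.
assert fingerprint(original) != fingerprint(edited)

# 256 bits are rendered as 64 hexadecimal characters.
assert len(fingerprint(original)) == 64
```

Hash the exact bytes you received, not a parsed and re-serialized copy: re-serializing XML can change whitespace or attribute order, which changes the hash even though the invoice content is the same.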
The seal also returns a cryptographic receipt, which simply records that this hash existed at this timestamp. When blockchain anchoring is enabled, the receipt includes a public-ledger transaction ID. The receipt is what you hand an auditor. Why a blockchain anchor is more durable than a conventional e-signature, and why tamper-evidence has to hold even against your own operators, is a separate topic covered in tamper-proof vs secure storage. For integration purposes you only need to know the receipt is an opaque artifact you store and replay.
Metadata That Makes Retrieval Deterministic
A compliant archive never stores documents in isolation. Every sealed invoice carries structured metadata: taxpayer IDs, invoice number, issue date, counterparty reference, and the receipt from the sealing operation. This is what turns retrieval during an audit into a deterministic query rather than a manual file hunt. Index these fields on your side too, so a request from a tax inspector resolves in seconds.
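One way to keep those fields queryable on your side is a small indexed table. This is a sketch only, using SQLite from the Python standard library; the table name, column names, and helper functions are hypothetical and simply mirror the metadata fields named above.

```python
import sqlite3

# Hypothetical local index; columns mirror the metadata fields named above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sealed_invoices (
    seal_id         TEXT PRIMARY KEY,
    tenant_id       TEXT NOT NULL,
    invoice_number  TEXT NOT NULL,
    issue_date      TEXT NOT NULL,
    supplier_vat_id TEXT NOT NULL,
    sha256_hash     TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_invoice_number
    ON sealed_invoices (tenant_id, invoice_number);
CREATE INDEX IF NOT EXISTS idx_supplier_date
    ON sealed_invoices (tenant_id, supplier_vat_id, issue_date);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local index and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_seal(conn: sqlite3.Connection, *, seal_id: str, tenant_id: str,
                invoice_number: str, issue_date: str, supplier_vat_id: str,
                sha256_hash: str) -> None:
    """Store one receipt; re-recording the same seal_id is a no-op."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO sealed_invoices VALUES (?, ?, ?, ?, ?, ?)",
            (seal_id, tenant_id, invoice_number, issue_date,
             supplier_vat_id, sha256_hash),
        )


def find_seals(conn: sqlite3.Connection, tenant_id: str,
               invoice_number: str) -> list[tuple[str, str]]:
    """Return (seal_id, sha256_hash) pairs for one tenant's invoice number."""
    return conn.execute(
        "SELECT seal_id, sha256_hash FROM sealed_invoices "
        "WHERE tenant_id = ? AND invoice_number = ?",
        (tenant_id, invoice_number),
    ).fetchall()
```

Every query is filtered by `tenant_id` first, so an audit lookup for one customer never touches another customer's rows.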
Asynchronous Processing for High-Volume Environments
Enterprise ERP systems routinely push thousands of invoices a day. A synchronous seal call, where the request blocks until the seal is confirmed, adds latency you cannot afford at that scale. A properly designed ISO/IEC 27001-aligned archiving API seals asynchronously: the invoice is queued, your application gets an immediate 202 Accepted with a seal_id, and the final receipt arrives by webhook once the seal is confirmed.
Three details separate a robust integration from a fragile one here:
- Idempotency on retries. Send a client-generated Idempotency-Key with every seal request. If a network timeout makes you retry, the API recognizes the key and returns the original seal_id instead of sealing the same invoice twice. Without it, a flaky connection quietly creates duplicate archived records.
- Webhook signature verification. The seal-confirmation webhook should carry an HMAC signature over the payload. Verify it before trusting the callback, otherwise a spoofed POST could mark an unsealed invoice as sealed. Treat webhook delivery as at-least-once and make your handler idempotent on seal_id.
- Bulk batching. For backfills or end-of-month runs, a bulk-seal endpoint that accepts an array of documents amortizes round-trip cost and lets you stay inside rate limits instead of hammering the single-seal route.
This is the part of the integration worth getting right, because a silent failure in the queue means an invoice that looks archived but cannot be proven later.
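The webhook side of that advice can be sketched in a few lines. The exact signature scheme is provider-specific; this example assumes a hex-encoded HMAC-SHA256 over the raw request body with a shared secret, and the secret value, function names, and in-memory store are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a shared secret issued by the archive provider.
WEBHOOK_SECRET = b"replace-with-shared-secret"

# Stand-in for a durable table keyed by seal_id.
_confirmed: dict[str, dict] = {}


def signature_is_valid(raw_body: bytes, signature_hex: str,
                       secret: bytes = WEBHOOK_SECRET) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_seal_webhook(raw_body: bytes, signature_hex: str) -> str:
    """Process one delivery; safe to call repeatedly for the same seal_id."""
    if not signature_is_valid(raw_body, signature_hex):
        return "rejected"  # never trust an unsigned or mis-signed callback
    event = json.loads(raw_body)
    seal_id = event["seal_id"]
    if seal_id in _confirmed:
        return "duplicate"  # at-least-once delivery: ignore the repeat
    _confirmed[seal_id] = event
    return "stored"
```

Verify the signature over the raw bytes before parsing the JSON. Parsing first and re-serializing would change the bytes and make a valid signature fail.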
Step-by-Step: Integrating the Archive via API
The following walkthrough reflects the standard implementation pattern ERP and SaaS vendors use when embedding the archive.
Step 1: Authentication and Header Setup
Every call is authenticated. The two standard mechanisms are OAuth 2.0 bearer tokens (RFC 6749 for the framework, RFC 6750 for bearer-token usage) and static API keys. Use OAuth 2.0 for multi-tenant deployments and reserve static keys for simple server-to-server jobs.
In a multi-tenant ERP, OAuth 2.0 is not a preference, it is a prerequisite. Each tenant gets its own token scope, so a request authenticated for Customer A cannot read or modify Customer B's records. In shared-infrastructure environments that scoping is also what keeps you on the right side of GDPR.
Standard required headers for every archiving request:
Authorization: Bearer {token}
Content-Type: application/json
X-Tenant-ID: {tenant_identifier}
Idempotency-Key: {client_generated_uuid}
X-Correlation-ID: {unique_request_id}
The X-Correlation-ID links each call to a specific business transaction, which makes log analysis during an audit deterministic. The Idempotency-Key is what protects you on retries, as described above.
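A retry only benefits from the Idempotency-Key if it carries the same key as the first attempt. One way to guarantee that is to derive the key from stable business identifiers rather than generating a random value per attempt. The sketch below assumes a key derived from tenant and invoice number via UUIDv5; the namespace URL and function names are hypothetical.

```python
import uuid
from typing import Optional

# Hypothetical namespace for this integration's idempotency keys.
IDEMPOTENCY_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://erp.example/seal-requests"
)


def idempotency_key(tenant_id: str, invoice_number: str) -> str:
    """Same tenant and invoice always map to the same key, so retries match."""
    return str(uuid.uuid5(IDEMPOTENCY_NAMESPACE, f"{tenant_id}:{invoice_number}"))


def build_headers(token: str, tenant_id: str, invoice_number: str,
                  correlation_id: Optional[str] = None) -> dict[str, str]:
    """Assemble the required headers for one seal request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Tenant-ID": tenant_id,
        "Idempotency-Key": idempotency_key(tenant_id, invoice_number),
        # Pass the business transaction's ID if you have one; otherwise
        # a fresh random ID is generated for this call.
        "X-Correlation-ID": correlation_id or str(uuid.uuid4()),
    }
```

If the same invoice can legitimately be sealed more than once (for example, a corrected version), include a version or content hash in the derived key so the second seal is not treated as a replay.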
Step 2: The Seal Request
The seal request is the core operation. The payload carries the document (base64-encoded XML or PDF), its metadata, and the requested retention period.
{
  "document": "{base64_encoded_content}",
  "document_type": "xrechnung",
  "metadata": {
    "invoice_number": "INV-2025-00847",
    "issue_date": "2025-06-15",
    "supplier_vat_id": "DE123456789",
    "retention_years": 10
  }
}
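Building that body from raw invoice bytes might look like the sketch below. The field names follow the example payload above; the `build_seal_payload` helper itself is a hypothetical name.

```python
import base64
import json


def build_seal_payload(document: bytes, document_type: str,
                       invoice_number: str, issue_date: str,
                       supplier_vat_id: str, retention_years: int) -> str:
    """Serialize one seal request body as a JSON string."""
    payload = {
        # Base64 keeps binary PDF or XML bytes intact inside JSON.
        "document": base64.b64encode(document).decode("ascii"),
        "document_type": document_type,
        "metadata": {
            "invoice_number": invoice_number,
            "issue_date": issue_date,
            "supplier_vat_id": supplier_vat_id,
            "retention_years": retention_years,
        },
    }
    return json.dumps(payload)


body = build_seal_payload(
    b"<Invoice/>", "xrechnung", "INV-2025-00847",
    "2025-06-15", "DE123456789", 10,
)
```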
The response returns the cryptographic receipt: the SHA-256 hash, the seal timestamp, the seal status, and, when anchoring is enabled, the ledger transaction ID.
{
  "seal_id": "seal_9f3a2b...",
  "sha256_hash": "e3b0c44298fc...",
  "sealed_at": "2025-06-15T14:32:01Z",
  "blockchain_tx": "0x4a7f...",
  "status": "sealed"
}
Store this receipt in your own database. It is the evidence artifact you present to auditors. Plan for the status codes too: 202 while a seal is queued, 200 once confirmed, 409 when an idempotency key replays an existing seal (the API returns the original seal_id rather than sealing twice), and 422 when the document fails format validation before it ever reaches the seal queue.
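Those status codes map naturally onto a small dispatch function. The sketch below covers the four codes listed above; treating 429 and 5xx responses as retryable is an added assumption based on common HTTP practice, not something the API contract described here confirms, and the enum names are hypothetical.

```python
from enum import Enum


class SealOutcome(Enum):
    PENDING = "pending"                # queued, wait for the webhook
    SEALED = "sealed"                  # confirmed, store the receipt
    ALREADY_SEALED = "already_sealed"  # replayed idempotency key
    REJECTED = "rejected"              # failed format validation
    RETRY = "retry"                    # transient, retry with the same key


def classify_seal_response(status_code: int) -> SealOutcome:
    """Translate an HTTP status code into the next action for the caller."""
    if status_code == 202:
        return SealOutcome.PENDING
    if status_code == 200:
        return SealOutcome.SEALED
    if status_code == 409:
        return SealOutcome.ALREADY_SEALED
    if status_code == 422:
        return SealOutcome.REJECTED
    # Assumption: rate limiting and server errors are worth retrying.
    if status_code == 429 or 500 <= status_code < 600:
        return SealOutcome.RETRY
    raise ValueError(f"unexpected status code: {status_code}")
```

A 422 should go to a human or a validation queue, not back into the retry loop: resending the same malformed document will fail the same way.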
Step 3: The Verify Endpoint
Verification is the mirror of sealing. When an auditor asks for proof that an invoice is unchanged, your system retrieves the stored document, recomputes its SHA-256 hash, and calls the verify endpoint with the current hash and the original seal ID.
GET /v1/seals/{seal_id}/verify?current_hash={sha256_of_current_document}
The response is binary: the hashes match (intact) or they do not (tampered). This turns audit verification into a programmatic check instead of manual document inspection, and it produces an integrity result you can defend in front of a regulator.
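The client side of verification is just a re-hash plus a URL. The sketch below follows the route shown above; the host name is a placeholder and the helper names are hypothetical. It also includes a local pre-check against the hash from your stored receipt, which is useful as a fast sanity test but does not replace the call to the archive.

```python
import hashlib
import hmac
from urllib.parse import quote, urlencode

BASE_URL = "https://archive.example"  # placeholder host, not a real endpoint


def build_verify_url(seal_id: str, document: bytes) -> str:
    """Recompute the document's SHA-256 and build the verify request URL."""
    current_hash = hashlib.sha256(document).hexdigest()
    query = urlencode({"current_hash": current_hash})
    return f"{BASE_URL}/v1/seals/{quote(seal_id, safe='')}/verify?{query}"


def matches_receipt(document: bytes, receipt_hash: str) -> bool:
    """Local pre-check: does the document still match the stored receipt?"""
    current = hashlib.sha256(document).hexdigest()
    return hmac.compare_digest(current.encode(), receipt_hash.lower().encode())
```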
Step 4: Multi-Tenant Isolation
For a vendor serving hundreds or thousands of end-customers, tenant isolation is the most architecturally critical piece. Each tenant's archive is logically, and ideally physically, separated. The X-Tenant-ID header routes every operation to the correct isolated space, and retention policies, access controls, and audit logs are all scoped per tenant. One integration at the vendor level then covers the entire customer base with no cross-tenant leakage.
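On the vendor side, one guard worth considering is a check that the tenant a token was issued for matches the X-Tenant-ID about to be sent, so a routing bug cannot address another customer's archive. This is a sketch under assumptions: the `tenant_id` claim name, the exception, and the in-memory store are hypothetical and stand in for whatever your token format and database provide.

```python
from typing import Optional


class TenantMismatch(Exception):
    """Raised when a request's tenant header disagrees with its token."""


def authorize_tenant(token_claims: dict, header_tenant_id: str) -> str:
    """Return the tenant ID only if the token and header agree."""
    token_tenant = token_claims.get("tenant_id")  # hypothetical claim name
    if not token_tenant or token_tenant != header_tenant_id:
        raise TenantMismatch(
            f"token tenant {token_tenant!r} != header {header_tenant_id!r}"
        )
    return token_tenant


class TenantScopedReceipts:
    """Receipt store in which every read and write names its tenant."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, dict[str, dict]] = {}

    def put(self, tenant_id: str, seal_id: str, receipt: dict) -> None:
        self._by_tenant.setdefault(tenant_id, {})[seal_id] = receipt

    def get(self, tenant_id: str, seal_id: str) -> Optional[dict]:
        # A seal_id from another tenant is simply not found.
        return self._by_tenant.get(tenant_id, {}).get(seal_id)
```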
That single-integration, unlimited-tenant model is exactly what makes white-label archiving for software vendors commercially workable.
Client-Side Hashing vs. Server-Side Sealing
There are two integration shapes, and the right one depends on who needs to hold the document.
The most privacy-preserving pattern sends only the document's hash to the archiving API, never the document itself. The hash is enough to anchor proof of existence, and the actual invoice stays inside your infrastructure, encrypted under your own keys. This satisfies the data-minimization principle in GDPR Article 5 while still producing a legally valid integrity proof. It also means a breach at the archiving provider exposes nothing but hashes.
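A hash-only request could be assembled as sketched below. The document is read in chunks so large PDFs never need to fit in memory, and only the digest leaves your infrastructure. The payload field names are assumptions modeled on the receipt format shown earlier; the actual hash-only contract may differ.

```python
import hashlib
import io
import json
from typing import BinaryIO


def hash_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Compute SHA-256 over a file-like object without loading it whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_hash_only_payload(stream: BinaryIO, invoice_number: str) -> str:
    """Build a request body that carries the hash but never the document."""
    return json.dumps({
        "sha256_hash": hash_stream(stream),  # hypothetical field name
        "metadata": {"invoice_number": invoice_number},
    })


# Usage with an in-memory stand-in for an open invoice file.
payload = build_hash_only_payload(io.BytesIO(b"<Invoice/>"), "INV-2025-00847")
```

With this pattern the proof is only as good as your own copy: if you lose the original bytes, the anchored hash can no longer be matched to anything, so your storage still needs its own redundancy.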
The server-side pattern is the right call when the archive must also store the document for compliant retrieval, for example where the law requires the archive itself to surrender the original on demand. Here AES-256 encryption keeps the stored content inaccessible to anyone without the keys, including the provider's own administrators. Pick the pattern per regulatory requirement, not by default, and the API supports both behind the same seal/verify contract.
Operational Concerns: Retention, Redundancy, and Sovereignty
Sealing is not the whole job. The archive also has to keep the document available and deletable on the right schedule.
Statutory retention periods run for years and vary by country, so the archive must guarantee availability across that whole window, including disaster-recovery scenarios. Geographically redundant storage, automated integrity checks, and documented recovery procedures are the baseline. At the end of the window, automated deletion policies enforce GDPR's storage-limitation principle (Article 5(1)(e)) without manual intervention, so your platform never holds personal data past its legal justification.
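The deletion schedule itself is simple date arithmetic, as the sketch below shows. When the clock starts is jurisdiction-specific: some rules count from the end of the calendar year in which the invoice was issued, so that is exposed as a flag here. The function names and the flag are illustrative assumptions; confirm the actual rule for each country before relying on it.

```python
from datetime import date


def retention_end(issue_date: date, retention_years: int,
                  from_year_end: bool = False) -> date:
    """Return the last day on which the invoice must still be retained."""
    target_year = issue_date.year + retention_years
    if from_year_end:
        # Clock starts when the calendar year of issue closes.
        return date(target_year, 12, 31)
    try:
        return issue_date.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return issue_date.replace(year=target_year, day=28)


def deletion_due(issue_date: date, retention_years: int, today: date,
                 from_year_end: bool = False) -> bool:
    """True once the retention window has fully elapsed."""
    return today > retention_end(issue_date, retention_years, from_year_end)
```

For the example invoice issued on 2025-06-15 with `retention_years` set to 10, the simple rule ends the window on 2035-06-15 and the year-end rule on 2035-12-31.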
Data residency is the other operational lever. Swiss-based infrastructure carries specific advantages for European data, since the Swiss Federal Act on Data Protection provides strong guarantees and the jurisdiction is stable for long-term retention. That matters for buyers in regulated sectors. For an ERP vendor selling into healthcare, finance, or public administration, the residency of the archive is frequently a hard procurement requirement rather than a nice-to-have.
Build vs. Buy, in One Paragraph
The economics here are not subtle, so I will state them once. Building and certifying your own archive is a multi-year, multi-million-euro effort spanning engineering, documented audit trails, and third-party software audits, plus permanent maintenance as the rules shift. Embedding a pre-certified API is a two-to-four week integration that ships under your own brand. The payoff has three parts, and each has its own deep dive: the full cost comparison lives in build vs buy a compliant archive; the branded multi-tenant productization is covered in white-label archiving for vendors; and how to turn the module into recurring revenue is the subject of a compliance revenue strategy for EDI and accounting platforms. The short version: a specialist carries the certification liability, and you ship a billable feature in weeks. For a fuller treatment, the whitepaper on digital archiving as a strategic advantage is worth a read before you commit either way.
A pre-certified archive also absorbs the regional frameworks so your team does not have to master them. It covers Germany's GoBD and Switzerland's GeBueV without your engineers becoming experts in either, which is the practical reason the build-vs-buy line lands where it does.
Why an API Layer Survives the Next Mandate
The regulatory direction is one-way: more jurisdictions, more mandates, more real-time reporting. The EU's ViDA reform is the clearest signal, introducing continuous transaction controls that will reshape how invoicing platforms report to tax authorities toward the end of the decade.
The integration takeaway is the only part you need to act on: an API layer absorbs new mandates so your core application stays stable. A platform with a rigid, proprietary storage subsystem faces a costly rewrite every time a new country or format enters scope. When the archiving provider ships the compliance change behind the same seal/verify contract, your integration does not move. That decoupling is the durable engineering reason to put retention behind an API rather than inside your codebase, and it is the design behind OriginVault's compliant invoice archiving infrastructure for ERP partners.
Conclusion
The unique work of integrating an e-invoice archiving API is small and well-bounded: authenticate, seal, verify, scope by tenant, and handle the asynchronous queue with idempotent retries and signed webhooks. Get those right and you have a provable archive without owning a single line of certification logic.
That is the whole trade. Instead of a multi-month build and an open-ended compliance liability, you make a handful of REST calls and hand an auditor a cryptographic receipt on demand. Explore how OriginVault's white-label invoice archiving infrastructure can become the retention backbone of your ERP or SaaS platform, sealed, verifiable, and deployable under your own brand.
Thomas Hepp
Co-Founder
Thomas Hepp is the founder of OriginStamp and creator of the OriginStamp timestamp, which has set the standard for tamper-proof blockchain timestamps since 2013. As one of the earliest innovators in the field, he combines deep technical expertise with a pragmatic focus on solving real business problems, and is a recognized voice in blockchain security, AI analytics, and data-driven decision support. His work has earned multiple international awards, including a top Best Project recognition from ETH Zurich and the Swiss Confederation. He publishes regularly on blockchain, AI, and digital innovation.





