EU-Only Infrastructure | Zero Data Retention | DORA Art. 28-30 | GDPR Art. 9 | KNF Audit Rights | No US CLOUD Act | Art. 104 Banking Law
Everything you need
AthenaVault processes documents through three layers: parse, anonymize and verify. Each is available independently or as part of a connected pipeline.
Document Parser
Structure extraction
PDF, DOCX and scanned images are converted to structured text that preserves article and section hierarchy. A vision LLM handles tables and charts.
PDF | DOCX | OCR
3-layer PII removal
Regex, NLP and a contextual LLM work in cascade. PESEL numbers, names and addresses are detected and replaced with structured placeholders.
PII | RODO | PDF
Epistemological Q&A
Ask questions of your regulatory documents. Every answer is cited by article, paragraph and document. Every query is logged for audit.
DORA | GDPR | RODO
For technical teams, building on the same infrastructure
Visual pipeline builder
Build machine learning pipelines by dragging blocks onto a canvas. An AI assistant generates a pipeline from a natural language description.
Pipeline | AutoML
Custom AI assistants
Connect custom AI assistants to your document corpus. Configure permissions, define behavior, deploy for your teams. Full audit trail included.
Assistants | Document Q&A
Verifact
Verifact runs one to five epistemic reasoning cycles before responding. Every statement is classified not by probability but by evidentiary status: directly cited, inferred from context, contradiction found, or gap identified. Your auditor will love it.
// Verifact · evidence map
// query · Is the claim covered under §12?
Coverage confirmed under §12 ust. 1 · DIRECTLY CITED
Deductible reduces payout to 46 300 PLN · INFERRED FROM CONTEXT
Invoice conflicts with claimed drying costs · CONTRADICTION FOUND
No proof of timely notification · GAP IDENTIFIED
epistemic cycles · ●●●○○ · 3/5
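The evidence map above can be modeled as a small data structure. The sketch below is illustrative only and is not Verifact's implementation: the class and function names are hypothetical, the four status labels are the ones shown in the map, and it assumes each label applies to the statement listed just before it. The review policy in `needs_review` is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EvidenceStatus(Enum):
    """Evidentiary labels as they appear in the evidence map."""
    DIRECTLY_CITED = "DIRECTLY CITED"
    INFERRED_FROM_CONTEXT = "INFERRED FROM CONTEXT"
    CONTRADICTION_FOUND = "CONTRADICTION FOUND"
    GAP_IDENTIFIED = "GAP IDENTIFIED"


@dataclass(frozen=True)
class Statement:
    text: str
    status: EvidenceStatus


def needs_review(statements: List[Statement]) -> List[Statement]:
    """Return statements flagged as contradictions or gaps (hypothetical policy)."""
    flagged = {EvidenceStatus.CONTRADICTION_FOUND, EvidenceStatus.GAP_IDENTIFIED}
    return [s for s in statements if s.status in flagged]


# Statements copied from the example map; the pairing with labels is assumed.
evidence_map = [
    Statement("Coverage confirmed under §12 ust. 1", EvidenceStatus.DIRECTLY_CITED),
    Statement("Deductible reduces payout to 46 300 PLN", EvidenceStatus.INFERRED_FROM_CONTEXT),
    Statement("Invoice conflicts with claimed drying costs", EvidenceStatus.CONTRADICTION_FOUND),
    Statement("No proof of timely notification", EvidenceStatus.GAP_IDENTIFIED),
]

for statement in needs_review(evidence_map):
    print(f"{statement.status.value}: {statement.text}")
```

Classifying by a closed set of labels rather than a score lets an auditor filter on status directly, which is the point the description makes about evidentiary status versus probability.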
// Anonymizer · 9 PII removed · RODO compliant
Anonymizer
Layer W1 (regex) catches PESEL and NIP numbers in milliseconds. Layer W2, a Polish NLP model, finds names and addresses. Layer W3, a contextual LLM, handles what the first two layers missed.
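A cascade like this can be sketched as a list of text-to-text functions applied in order. The code below is a minimal illustration under stated assumptions, not AthenaVault's code: all function names and placeholder formats are hypothetical, the regex layer validates candidates with the standard PESEL and NIP checksums, and a simple name list stands in for the NLP layer. The contextual LLM layer is omitted; it would be one more callable in the list.

```python
import re
from typing import Callable, List

Layer = Callable[[str], str]

PESEL_RE = re.compile(r"\b\d{11}\b")
NIP_RE = re.compile(r"\b\d{10}\b")


def pesel_is_valid(number: str) -> bool:
    """Check the PESEL control digit (weights 1,3,7,9 repeated)."""
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    total = sum(w * int(d) for w, d in zip(weights, number))
    return (10 - total % 10) % 10 == int(number[10])


def nip_is_valid(number: str) -> bool:
    """Check the NIP control digit (weighted sum modulo 11)."""
    weights = (6, 5, 7, 2, 3, 4, 5, 6, 7)
    check = sum(w * int(d) for w, d in zip(weights, number)) % 11
    return check == int(number[9])  # a remainder of 10 never matches a digit


def regex_layer(text: str) -> str:
    """First layer: replace checksum-valid PESEL and NIP numbers."""
    text = PESEL_RE.sub(
        lambda m: "[PESEL]" if pesel_is_valid(m.group()) else m.group(), text)
    return NIP_RE.sub(
        lambda m: "[NIP]" if nip_is_valid(m.group()) else m.group(), text)


def gazetteer_layer(names: List[str]) -> Layer:
    """Stand-in for the NLP layer: replaces names from a fixed list."""
    if not names:
        return lambda text: text
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return lambda text: pattern.sub("[NAME]", text)


def anonymize(text: str, layers: List[Layer]) -> str:
    """Run each layer on the output of the previous one."""
    for layer in layers:
        text = layer(text)
    return text


layers = [regex_layer, gazetteer_layer(["Jan Kowalski"])]
sample = "Jan Kowalski, PESEL 44051401359, NIP 1234563218, ref 12345678901"
print(anonymize(sample, layers))
```

Running the cheap, deterministic layer first means later layers only see what it could not resolve. The checksum test keeps arbitrary 11-digit reference numbers from being masked as PESEL.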
The Challenge
Banks, insurers, and law firms operate under legal confidentiality obligations that public cloud AI cannot satisfy. AthenaVault was designed specifically to close that gap.
Public LLMs
ChatGPT, Copilot, and Gemini process your prompts in the United States. Every document you upload leaves your regulatory jurisdiction and becomes subject to the US CLOUD Act, with no audit rights for your regulator.
AthenaVault
AthenaVault runs entirely within EU infrastructure. Inference is ephemeral, processed in RAM, never written to storage. Your documents stay inside your regulatory perimeter, always.
Architecture
AthenaVault
AthenaVault is built on one architectural principle that makes it deployable regulatory AI infrastructure for regulated industries: every inference runs in ephemeral RAM. No prompt, no output, no document is ever written to storage by the processing layer. When the session ends, nothing remains.
Platform
Three components. Each works standalone or as part of a fully integrated pipeline. Everything runs inside your EU perimeter, in ephemeral memory, leaving no trace on the infrastructure.
01 · DOCUMENT PARSER
Structure extraction from any format
Transforms any regulated document into structured, queryable data. Scanned PDFs, complex tables, multi-page contracts, embedded charts. Every section understood, every element placed, regardless of format or quality.
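As a rough illustration of what preserving document hierarchy can mean once text has been extracted, the sketch below groups numbered clauses under their article headings. It is a simplified example under stated assumptions, not the parser itself: the heading patterns, class names and sample lines are invented for illustration, and real documents need far more robust handling.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumed conventions: headings look like "Article 7 ..." or "Art. 7 ...",
# clauses look like "7.3 text".
ARTICLE_RE = re.compile(r"^(?:Article|Art\.)\s+(\d+)\b")
CLAUSE_RE = re.compile(r"^(\d+\.\d+)\s+(.*)")


@dataclass
class Clause:
    number: str
    text: str


@dataclass
class Article:
    number: str
    title: str
    clauses: List[Clause] = field(default_factory=list)


def parse_articles(lines: Iterable[str]) -> List[Article]:
    """Build a two-level article/clause tree from plain text lines."""
    articles: List[Article] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        heading = ARTICLE_RE.match(line)
        if heading:
            articles.append(Article(number=heading.group(1), title=line))
            continue
        clause = CLAUSE_RE.match(line)
        if clause and articles:
            articles[-1].clauses.append(Clause(clause.group(1), clause.group(2)))
        elif articles and articles[-1].clauses:
            # Continuation line: append it to the most recent clause.
            articles[-1].clauses[-1].text += " " + line
        # Any other line (e.g. text before the first article) is ignored here.
    return articles


sample_lines = [
    "Article 7 Repayment",
    "7.1 The borrower repays in monthly instalments.",
    "7.3 Early repayment is permitted",
    "after 6 months from the drawdown date.",
    "",
    "Article 8 Fees",
    "8.1 A prepayment fee may apply.",
]

for article in parse_articles(sample_lines):
    print(article.title, [c.number for c in article.clauses])
```

Keeping clause numbers as explicit fields is what later allows an answer to be cited by article and paragraph rather than by page alone.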
02 · ANONYMIZER
Three-layer PII removal before inference
Every document passes three anonymization layers before reaching any model. Personally identifiable information, account numbers and sensitive identifiers are stripped. Your data is analyzed, not exposed.
03 · VERIFACT
Verifact turns your document corpus into a queryable, cited intelligence layer. Every answer references the exact source passage. Every query leaves a permanent audit trail. No hallucinations. No paraphrasing. No liability.
// query
What are the early repayment conditions under Article 7.3 of the credit agreement signed on 12 March 2024?
// response
Pursuant to Article 7.3 of the credit agreement dated 12 March 2024, early repayment is permitted after 6 months from the drawdown date, subject to a prepayment fee of 1.5% of the outstanding principal. No fee applies after the 24th month.
// source
KR-2024-1234.pdf · Art. 7.3 · p. 14 · logged 14:23:01
query logged · audit trail active · DORA Art. 28-ready
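The source line in the example above pairs a citation with a log timestamp. One way to represent such a record is sketched below. This is a hypothetical structure, not Verifact's actual log format: the class names, field names and JSON layout are assumptions, and the log date used in the example is arbitrary.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Citation:
    document: str
    article: str
    page: int


@dataclass(frozen=True)
class AuditEntry:
    query: str
    citation: Citation
    logged_at: str  # ISO 8601 timestamp in UTC


def log_query(query: str, citation: Citation,
              now: Optional[datetime] = None) -> AuditEntry:
    """Create an immutable audit entry; `now` is injectable for testing."""
    now = now or datetime.now(timezone.utc)
    return AuditEntry(query=query, citation=citation, logged_at=now.isoformat())


def render_source_line(entry: AuditEntry) -> str:
    """Format an entry in the style of the source line shown above."""
    c = entry.citation
    time_part = datetime.fromisoformat(entry.logged_at).strftime("%H:%M:%S")
    return f"{c.document} · Art. {c.article} · p. {c.page} · logged {time_part}"


def to_json(entry: AuditEntry) -> str:
    """Serialize an entry for an append-only log (layout is an assumption)."""
    return json.dumps(asdict(entry), ensure_ascii=False)


entry = log_query(
    "What are the early repayment conditions under Article 7.3?",
    Citation(document="KR-2024-1234.pdf", article="7.3", page=14),
    now=datetime(2025, 1, 1, 14, 23, 1, tzinfo=timezone.utc),  # arbitrary date
)
print(render_source_line(entry))
print(to_json(entry))
```

Frozen dataclasses keep an entry from being modified after creation, which fits the idea of an audit trail, although real tamper resistance would need to be enforced at the storage layer.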
Applications
Purpose-built domain applications for regulated industries. Each runs on AthenaVault's regulatory infrastructure: EU-only, zero data retention, full audit trail. Deploy directly into your compliance perimeter.
[AML-01] COMING SOON
Transaction monitoring for AMLD6
Suspicious transaction pattern detection calibrated to your institution. Trained on your transaction history, not industry averages. Flags what matters, ignores what doesn't.
AMLD6 | Real-Time | Banking
[FRD-01] COMING SOON
Anomaly detection on transaction streams
Real-time anomaly detection on payment flows. Learns your institution's normal patterns to minimize false positives on legitimate cross-border activity.
Transaction Monitoring | Anomaly Detection | Real-Time
[INS-01] COMING SOON
Actuarial risk scoring on your claims portfolio
Risk scoring for insurance products trained on your historical claims data. Reflects your actual portfolio, not industry benchmarks.
Risk Scoring | Actuarial | Claims
For technical teams
For teams that need custom workflows, domain-specific logic, or proprietary models. Build directly on AthenaVault regulatory infrastructure, without adding any compliance overhead.
[MLL-01] COMING SOON
Visual ML pipeline builder
Design, test and deploy machine learning pipelines with a visual drag-and-drop interface. Build custom models on your own data without infrastructure complexity. Your logic, our compliant infrastructure.
Custom Pipelines | Visual Builder | Any Data Type
[LLM-01] COMING SOON
Build and deploy custom AI assistants
Connect custom AI assistants to your document corpus. Configure permissions, define behavior, deploy for your teams. Full audit trail and EU-only inference included.
Custom Assistants | Workflow Automation | Document Q&A
Our Models
Four purpose-built models trained on tens of thousands of Polish regulated documents: contracts, rulings, policies, filings. Open-weight foundations, EU-only inference, commercial licenses.
01 / 04
Embedding and reranking
Embedding and reranking models fine-tuned on tens of thousands of query-document pairs from Polish regulated documents. Best-in-class semantic matching accuracy on our internal benchmarks.
SFT · DPO | PL · EN · DE | EU Self-Hosted | Commercial License
02 / 04
Sparse lexical retrieval
Precision lexical model for the exact identifiers in regulated documents: contract numbers, article references, clause identifiers, dates. Anchors retrieval to exact regulatory terminology and document structure.
Lexical Precision | Legal Terminology | EU Self-Hosted | Commercial License
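To show why exact identifiers matter for lexical retrieval, here is a toy exact-match scorer. It is a deliberately simple stand-in, not the model described above: the token patterns, function names and sample documents are assumptions made for illustration. The idea is that identifiers such as a contract number or an article reference are kept as single tokens, so a near miss scores zero.

```python
import re
from collections import Counter
from typing import List, Tuple

# Identifier patterns are tried before the generic word pattern.
TOKEN_RE = re.compile(
    r"[A-Z]{2,}-\d{4}-\d+"       # contract numbers such as KR-2024-1234
    r"|Art\.\s*\d+(?:\.\d+)*"    # article references such as Art. 7.3
    r"|\d{4}-\d{2}-\d{2}"        # ISO dates
    r"|\w+"                      # ordinary words
)


def tokenize(text: str) -> List[str]:
    """Split text into tokens, keeping identifiers whole and normalized."""
    return [re.sub(r"\s+", "", token).lower() for token in TOKEN_RE.findall(text)]


def overlap_score(query: str, document: str) -> int:
    """Count query tokens that also occur in the document."""
    query_counts = Counter(tokenize(query))
    doc_counts = Counter(tokenize(document))
    return sum(min(count, doc_counts[token]) for token, count in query_counts.items())


def rank(query: str, documents: List[str]) -> List[Tuple[int, str]]:
    """Return (score, document) pairs, best match first."""
    scored = [(overlap_score(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


documents = [
    "Article 73 of contract KR-2024-1235",
    "Prepayment fee under Art. 7.3 of KR-2024-1234",
]
for score, doc in rank("Art. 7.3 KR-2024-1234", documents):
    print(score, doc)
```

A dense embedding model may treat KR-2024-1234 and KR-2024-1235 as nearly identical. An exact-match signal of this kind is one common way to keep them apart.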
03 / 04
Vision-language model for document OCR
Structural extraction from regulated documents: scanned PDFs, complex tables, multi-column layouts, forms. Understands document geometry and extracts structured meaning.
Structural Extraction | Tables · Forms | EU Self-Hosted | Commercial License
04 / 04
Reasoning model fine-tuned on Polish regulation
Reasoning model fine-tuned on Polish regulated documents: banking contracts, insurance policies, court rulings, KNF guidelines. Trained using SFT and DPO on curated regulatory corpora.
SFT · DPO | PL · EN · DE · FR · ES · IT | EU Serverless | Commercial License
Performance
All benchmarks are run on Polish regulated document corpora: banking contracts, insurance policies, legal briefs. Real-world accuracy, not lab conditions.
98%
Document accuracy (F1) on our Polish regulatory corpus
<850ms
End-to-end latency from query to a sourced answer
99%
OCR accuracy on scanned regulatory documents
6 lang
Guaranteed quality in PL, EN, DE, FR, ES, IT
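For readers unfamiliar with the F1 metric quoted above, it is the harmonic mean of precision and recall. The helper below is a generic sketch of that standard definition. The counts in the example are made up for illustration and say nothing about how the figures above were measured.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing was found correctly."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct extractions, 2 spurious, 2 missed.
print(round(f1_score(8, 2, 2), 3))
```

Because F1 penalizes both spurious and missed items, it is a stricter summary than raw accuracy when most of a document contains nothing to extract.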
Regulatory Compliance
✓ DORA
ICT contract register, documented jurisdictional risk analysis, exit strategy per Art. 28(8), KNF audit rights: all included in every enterprise contract as standard.
✓ GDPR / RODO
Health data, legal data and sensitive financial records are all processed under GDPR Art. 9 safeguards. Data processing agreement under Art. 28 with an EU legal entity. Per-client encryption at rest.
✓ Banking
Zero Data Retention means no confidential banking information persists outside your perimeter. Art. 104 Banking Law and banking secrecy annexes are included.
✓ Legal
Zero data retention by the processing infrastructure makes AthenaVault compatible with attorney-client privilege and legal professional secrecy under Polish law.
✓ Audit
Contractual audit chain: your regulator → you → xdivision → infrastructure. KNF audit rights guaranteed. Documented sub-processor chain with EU legal entities at every level.
✓ AI Act
AthenaVault operates with complete AI system documentation as required for high-risk AI deployments: technical documentation, conformity assessment, human oversight controls.
Built for Your Industry
sector 01
Credit documentation analysis, KYC/AML document review, regulatory filings, internal procedure Q&A, under Art. 104 and DORA.
DORA | Art. 104 | KNF | Rek. D
sector 02
Claims documentation analysis, policy review, underwriting documents, DORA compliance for insurers, under GDPR Art. 9 for health data.
DORA | GDPR Art. 9 | KNF | PIU
sector 03
M&A due diligence, contract analysis, case law research, client Q&A, with full attorney-client privilege protection and AI Act documentation.
Attorney Privilege | RODO | AI Act
Founding Partner Program
Fixed-scope engagement on your real documents. Deliverable: performance report on your corpus, compliance documentation package, and a clear decision point for rollout.
Founding partner pricing, a 6-week fixed scope, EU-sovereign from day one. From xdivision in Warsaw.