The Problem
Karatage's deal knowledge is trapped in a large OneDrive folder structure. Thousands of documents — term sheets, financial statements, board resolutions, DD reports, NDAs — sit in company-grouped folders. Getting an answer means digging through folders manually.
What’s Missing
- ✗ No single source of truth across deals
- ✗ No way to ask questions without manual search
- ✗ No structured view of deal pipelines
- ✗ No audit trail of how information was derived
What We Want
- ✓ System watches OneDrive automatically
- ✓ Reads & understands every document
- ✓ Builds a queryable knowledge base
- ✓ Answers questions via WhatsApp
Solution Overview
Karatage Brain is purpose-built for PE/VC deal operations. It watches your existing OneDrive workflow, reads every document, builds structured state, and lets the team query it naturally via WhatsApp or a web console.
flowchart LR
OD["OneDrive\n(Source of Truth)"]
Mirror["Local Mirror"]
Pipeline["Pipeline\n6 Stages"]
DB["PostgreSQL\n+ pgvector\n(The Brain)"]
WA["WhatsApp"]
Web["Web Console"]
CLI["CLI"]
OD -->|"Microsoft Graph\nDelta Sync"| Mirror
Mirror -->|"6-stage pipeline"| Pipeline
Pipeline --> DB
DB --> WA
DB --> Web
DB --> CLI
style OD fill:#e0effe,stroke:#0073c5,color:#064e84
style DB fill:#f0fdf4,stroke:#16a34a,color:#166534
style Pipeline fill:#fef9c3,stroke:#ca8a04,color:#854d0e
Core Principle
Watch-and-surface, operator-decides. The system reads documents and proposes structured state. It never silently fabricates data. When it’s uncertain, it surfaces a proposal for a human to resolve.
Architecture
graph TB
subgraph office["Office (Private Network)"]
subgraph app["Application Server"]
subgraph docker["Docker Compose"]
pg["PostgreSQL 16\n+ pgvector\n+ pg_trgm"]
parser["Parser\nSidecar"]
api["API\n(FastAPI)"]
worker["Worker\n(Procrastinate)"]
web["Web UI\n(Next.js)"]
waha["WAHA\n(WhatsApp)"]
end
mirror["OneDrive Mirror"]
cli["brain CLI"]
end
ai["AI Server\n(vLLM / Ollama)\nLocal GPU inference"]
worker --> ai
api --> ai
end
onedrive["OneDrive (M365)"]
entra["Entra IDP"]
cf["Cloudflare Access"]
users["Team"]
onedrive -->|"Graph API\nDelta Queries"| mirror
mirror --> worker
cli --> pg
api --> pg
worker --> pg
worker --> parser
api --> parser
web --> api
waha --> api
users --> cf
cf --> entra
cf --> web
style pg fill:#d1fae5,stroke:#059669
style ai fill:#fef9c3,stroke:#ca8a04
style docker fill:#f0f9ff,stroke:#0284c7
style app fill:#fafafa,stroke:#d4d4d4
style office fill:#f0fdf4,stroke:#16a34a
Service Responsibilities
PostgreSQL
All state. Typed schema + pgvector embeddings + full-text search. Single database, no external deps.
Parser Sidecar
Stateless. Converts bytes → text. PDF, Office, images. Vision LLM for OCR on scanned docs.
API (FastAPI)
HTTP API. Serves web UI, WhatsApp webhook, RAG queries. Enqueues async jobs.
Worker
Procrastinate async jobs. Pipeline stages, OneDrive sync, embedding backfills.
Web (Next.js)
Operator console. Dashboard, entity browsers, pipeline queues, ask-the-brain.
WAHA
Self-hosted WhatsApp gateway. One dedicated number. Bridges WhatsApp ↔ HTTP.
AI Server
Dedicated GPU hardware on the private network. Serves all LLM inference (classify, extract, embed, OCR, RAG). OpenAI-compatible API.
The Ingestion Pipeline
Six stages transform unstructured documents into typed, queryable state. Each stage is independently cached and re-runnable.
flowchart TD
mirror["OneDrive Mirror"] --> discover
discover["DISCOVER\nSHA-256 hash = identity\nNew hash? Create row.\nKnown hash? Skip."]
parse["PARSE\nParser sidecar: PDF, DOCX,\nXLSX, images to text chunks\nwith page locators"]
classify["CLASSIFY\nLLM scores document against\nall known types.\nThreshold: min 0.80, margin min 0.20"]
adjudicate["ADJUDICATE\nPair-specific LLM call\nto disambiguate"]
extract["EXTRACT\nPer-type structured extraction\nLLM to Pydantic model to JSON\nwith per-field confidence"]
resolve["RESOLVE\nMap names/IDs to canonical\ndatabase entities.\nAmbiguous = proposal"]
apply["APPLY\nStratified fixpoint loop.\nPreconditions + effects.\nLoop until convergence."]
unknown["UNKNOWN\nOperator triage queue"]
discover --> parse --> classify
classify -->|"Confident"| extract
classify -->|"Below threshold"| unknown
classify -->|"Margin too narrow"| adjudicate
adjudicate -->|"Resolved"| extract
adjudicate -->|"Still unclear"| unknown
extract --> resolve --> apply
style discover fill:#e0effe,stroke:#0073c5
style parse fill:#e0effe,stroke:#0073c5
style classify fill:#fef9c3,stroke:#ca8a04
style adjudicate fill:#fef3c7,stroke:#d97706
style extract fill:#d1fae5,stroke:#059669
style resolve fill:#d1fae5,stroke:#059669
style apply fill:#d1fae5,stroke:#059669
style unknown fill:#fee2e2,stroke:#dc2626
Stage Caching (Make-Style Invalidation)
Each stage writes its output to cache columns on the documents row and re-runs only when its version key changes.
| Stage | Cached Columns | Version Key | Invalidation Trigger |
|---|---|---|---|
| Parse | parsed_at, parser_version, parse_output | PARSER_VERSION constant | Bump the constant |
| Classify | classified_at, classifier_version, document_type | SHA-256 of all classifier profiles | Edit any profile |
| Extract | extracted_at, extractor_version, extraction_result | SHA-256 of per-type profile | Edit the type’s profile |
| Apply | application_status | (none — runs every eligible doc) | Every fixpoint pass |
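The version keys in the table can be sketched as follows. This is a minimal illustration, assuming profiles are available as a name-to-text mapping; `profiles_version` and `needs_rerun` are hypothetical names, not the project's actual functions.

```python
import hashlib
from typing import Optional


def profiles_version(profiles: dict[str, str]) -> str:
    """SHA-256 over profile names and texts, in a stable (sorted) order."""
    digest = hashlib.sha256()
    for name in sorted(profiles):
        for part in (name, profiles[name]):
            digest.update(part.encode("utf-8"))
            digest.update(b"\0")  # separator so adjacent fields cannot run together
    return digest.hexdigest()


def needs_rerun(cached_version: Optional[str], current_version: str) -> bool:
    """A stage re-runs when it has never run or its version key has changed."""
    return cached_version != current_version


# Hypothetical profile texts, for illustration only.
profiles = {"term_sheet": "Signals: term sheet", "nda": "Signals: confidentiality"}
v1 = profiles_version(profiles)
profiles["nda"] += ", non-disclosure"  # an operator edits one profile
v2 = profiles_version(profiles)
```

For the Extract stage, the same function would be called with a single-entry mapping, so editing one type's profile invalidates only that type.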
Document Identity: SHA-256 Content Hash
Every document is identified by the SHA-256 hash of its file bytes — not its filename or path. A renamed file is the same document. A file copied to a different folder is the same document. Weekly full syncs skip all unchanged files. Deduplication is automatic.
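A minimal sketch of hash-based identity, assuming the set of known hashes fits in memory (in the real system it would live in PostgreSQL). `content_id` and `discover` are illustrative names, and the file names are borrowed from the design-decision example later in this document.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def discover(paths, known: set[str]) -> list[tuple[str, Path]]:
    """Return (hash, path) for files whose content has not been seen before."""
    new_docs = []
    for path in paths:
        doc_id = content_id(path)
        if doc_id in known:
            continue  # same bytes under another name or folder: skip
        known.add(doc_id)
        new_docs.append((doc_id, path))
    return new_docs


# Two differently named copies of the same bytes yield one document.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "Term Sheet v2 FINAL.pdf"
    b = Path(tmp) / "Term Sheet v2 FINAL (2).pdf"
    a.write_bytes(b"%PDF-1.7 same bytes")
    b.write_bytes(b"%PDF-1.7 same bytes")
    known: set[str] = set()
    found = discover([a, b], known)
```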
OneDrive Integration
sequenceDiagram
participant Op as Operator
participant CLI as brain CLI
participant Graph as Microsoft Graph API
participant Mirror as Local Mirror
participant DB as PostgreSQL
participant Pipeline as Pipeline
Op->>CLI: brain sync
CLI->>DB: Load delta token
DB-->>CLI: token (or null for first sync)
alt First sync (no token)
CLI->>Graph: GET /delta (full tree)
Graph-->>CLI: All files + deltaLink
else Incremental sync
CLI->>Graph: GET {deltaLink}
Graph-->>CLI: Changed files + new deltaLink
end
loop Each changed file
CLI->>Graph: Download file
Graph-->>CLI: File bytes
CLI->>Mirror: Write to mirror/
end
CLI->>DB: Store new delta token
CLI->>Pipeline: Trigger ingestion on changed files
Pipeline->>DB: discover, parse, classify, ...
Why Manual Trigger?
The OneDrive corpus changes slowly — a few documents per week, not per minute. Continuous polling would be complexity without payoff.
- ✓ Operator runs brain sync from CLI or WhatsApp
- ✓ Delta queries ensure only changed files download
- ✓ Content hash catches anything delta missed
- ✓ Can upgrade to webhook-driven sync later
Resilience
- 🛡️ Crash recovery: Delta token persists. Next run picks up where it left off.
- 🛡️ Download failure: File skipped, retried next cycle.
- 🛡️ Token expiry (~90 days): Falls back to full sync. Content hash prevents wasted reprocessing.
- 🛡️ OneDrive is source of truth: Local mirror is read-only derivative.
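The delta loop from the sequence diagram can be sketched as below. Microsoft Graph delta responses are paged: each page carries a `value` list and either an `@odata.nextLink` (more pages) or an `@odata.deltaLink` (the token to store for the next sync). The `fetch_page` callable is an assumption standing in for an authenticated HTTP GET; `collect_delta` is a hypothetical name.

```python
from typing import Callable, Optional


def collect_delta(
    fetch_page: Callable[[str], dict],
    start_url: str,
) -> tuple[list[dict], Optional[str]]:
    """Follow nextLink pages until a deltaLink appears; return (items, deltaLink)."""
    items: list[dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")  # None on the last page ends the loop
    return items, delta_link


# Fake two-page response, for illustration only (no network access).
fake_pages = {
    "start": {"value": [{"id": "1", "name": "a.pdf"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"id": "2", "name": "b.docx"}], "@odata.deltaLink": "delta-token-url"},
}
changed, next_delta = collect_delta(fake_pages.__getitem__, "start")
```

Storing `next_delta` only after all downloads succeed would keep crash recovery simple: a failed run repeats the same delta window, and the content hash makes the repeat cheap.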
Entity Model
The typed schema captures the PE/VC deal domain. Core entities are linked by relationship tables that model the real-world connections.
erDiagram
FUND ||--o{ INVESTMENT : "invests via"
DEAL ||--o{ INVESTMENT : "facilitates"
COMPANY ||--o{ INVESTMENT : "receives"
FUND {
uuid id PK
string name
int vintage_year
numeric fund_size
string status
}
DEAL ||--o{ DEAL_PARTY : "involves"
DEAL ||--o{ DEAL_MILESTONE : "tracks"
DEAL ||--o{ CONTRACT : "has"
DEAL ||--o{ DD_ITEM : "requires"
DEAL }o--|| COMPANY : "targets"
DEAL {
uuid id PK
string name
string deal_type
string status
numeric deal_value
date signed_on
}
COMPANY ||--o{ FINANCIAL_DATA : "reports"
COMPANY ||--o{ DEAL_PARTY : "participates"
CONTACT ||--o{ DEAL_PARTY : "participates"
CONTACT }o--o| COMPANY : "works at"
COMPANY {
uuid id PK
string name
string registration_number
string country
string industry
}
CONTACT {
uuid id PK
string full_name
string email
string phone
string role_type
}
Canonical Identity Challenge
Unlike regulatory compliance (where entities have government-issued IDs), the PE/VC domain has weaker identifiers:
| Entity | Strong ID | Fallback | Resolution Strategy |
|---|---|---|---|
| Company | Registration number | Name + country | Operator confirms via proposal |
| Contact | Email address | Name + company affiliation | Operator confirms via proposal |
| Deal | — | Operator-assigned slug | Folder name as strong hint |
| Fund | — | Operator-assigned name | Exact match only |
The system never guesses identity. When it can’t match with confidence, it creates a proposal for the operator to resolve. This prevents the worst outcome: silently linking the wrong entities.
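A minimal sketch of the never-guess rule for companies, assuming an in-memory list of known companies. The `Company` and `Resolution` types, the `resolve_company` function, and the sample registration number are all hypothetical. Only a unique strong-ID match links automatically; the name + country fallback produces candidates for the operator and never a silent link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Company:
    id: str
    name: str
    country: str
    registration_number: Optional[str] = None


@dataclass(frozen=True)
class Resolution:
    status: str                       # "matched" or "proposal"
    company_id: Optional[str]         # set only when status == "matched"
    candidates: tuple[str, ...] = ()  # candidate IDs shown to the operator


def resolve_company(known, name, country, registration_number=None) -> Resolution:
    """Match on the strong ID; otherwise raise a proposal with fallback candidates."""
    if registration_number:
        hits = [c for c in known if c.registration_number == registration_number]
        if len(hits) == 1:
            return Resolution("matched", hits[0].id)
    key = (name.strip().casefold(), country.strip().casefold())
    candidates = tuple(
        c.id for c in known
        if (c.name.strip().casefold(), c.country.strip().casefold()) == key
    )
    return Resolution("proposal", None, candidates)


known = [Company("c1", "Acme Corp", "ZA", "2019/123456/07")]
r1 = resolve_company(known, "ACME Corporation", "ZA", "2019/123456/07")
r2 = resolve_company(known, "acme corp", "za")
r3 = resolve_company(known, "Other Ltd", "ZA")
```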
Discovery-First Schema Design
This entity model is hypothesised. Phase 0 includes a deliberate discovery step: explore the actual OneDrive corpus, classify sample documents, identify what entities and document types exist, and design the schema from evidence — not speculation.
Classification & Extraction
Classification
The classifier determines what kind of document each file is. It uses markdown profiles — one per document type — assembled into a single LLM prompt.
# term_sheet.md
What it is: Binding or non-binding offer
Signals: “term sheet”, “indicative offer”, purchase price, conditions precedent
Distinguish from: LOI, SPA, MOU
Decision Gate
Classified
Score ≥ 0.80 AND gap to 2nd ≥ 0.20
Adjudicator
Margin too narrow → pair-specific disambiguation
Unknown
Below threshold → operator triage queue
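The gate can be sketched as a small pure function. The thresholds come from the text above; the order of the checks (threshold first, then margin) and the name `decision_gate` are assumptions.

```python
MIN_SCORE = 0.80   # top score must reach this
MIN_MARGIN = 0.20  # gap between first and second place must reach this


def decision_gate(scores: dict[str, float]) -> tuple[str, tuple[str, ...]]:
    """Return (route, types): 'classified', 'adjudicate', or 'unknown'."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < MIN_SCORE:
        return "unknown", ()  # below threshold: operator triage queue
    top_type, top_score = ranked[0]
    if len(ranked) == 1 or top_score - ranked[1][1] >= MIN_MARGIN:
        return "classified", (top_type,)
    # Confident enough, but the runner-up is too close: ask the pair adjudicator.
    return "adjudicate", (top_type, ranked[1][0])


close_call = decision_gate({"bank_confirmation": 0.82, "proof_of_address": 0.75})
clear_win = decision_gate({"term_sheet": 0.93, "loi": 0.41})
too_weak = decision_gate({"nda": 0.55, "cv": 0.10})
```

Gaps that sit exactly on the 0.20 boundary are sensitive to floating-point rounding, so a production gate might compare with a small tolerance.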
▶ The Adjudicator — How It Resolves Close Calls
When the classifier can’t decide between two types (both score high, but the gap is too narrow), a specialised adjudicator fires for that specific pair.
flowchart LR
C["Classifier Output\nbank_confirmation: 0.82\nproof_of_address: 0.75\nGap: 0.07 < 0.20"]
A["Adjudicator\nbank_vs_proof_of_address\n\n'Does this confirm banking\ndetails or merely an address?'"]
R["Resolved:\nbank_confirmation (0.91)"]
C -->|"Margin fail"| A -->|"Sharp question\nsharp answer"| R
style A fill:#fef3c7,stroke:#d97706
style R fill:#d1fae5,stroke:#059669
Why not just make the classifier better? The classifier sees ~30 types simultaneously. Adjudicators see exactly two. Sharper question → sharper answer. Cheaper and more reliable than one mega-prompt handling every pairwise confusion.
Four Document-Type Shapes
Not every document needs full extraction. Four levels of processing:
- Marketing brochure, duplicate cover letter. Not worth processing.
- NDA (link to company), CV (link to contact). Shows on entity profiles. 30 min to add.
- DD report (findings stored as JSON, not typed rows). Visible + searchable, no schema change.
- Term sheet → creates Deal + Contract + Milestones. Full pipeline. ~1 day to add.
The Stratified Fixpoint
The Problem: Out-of-Order Documents
Documents arrive in unpredictable order. A board resolution approving a deal might be ingested before the term sheet that creates the deal record. Naïve approaches (ordered processing, retry queues) add complexity and don’t converge.
The Solution
Borrowed from Datalog evaluation. Each applier declares preconditions and effects. The apply stage runs in a loop until convergence.
flowchart TD
subgraph R1["Round 1"]
ts1["Term Sheet\nPrecondition: none\nCreates Deal 'Acme'\nCreates Company 'Acme Corp'"]
br1["Board Resolution\nPrecondition: Deal exists\nDeal doesn't exist yet\nStatus: PENDING"]
end
subgraph R2["Round 2"]
br2["Board Resolution\nPrecondition: Deal exists\nDeal now exists!\nCreates Milestone 'board_approval'"]
end
subgraph R3["Round 3"]
conv["No changes = CONVERGED"]
end
R1 --> R2 --> R3
style ts1 fill:#d1fae5,stroke:#059669
style br1 fill:#fef3c7,stroke:#d97706
style br2 fill:#d1fae5,stroke:#059669
style conv fill:#e0effe,stroke:#0073c5
Self-Healing
New documents that satisfy blocked preconditions automatically unlock blocked docs in the next pass. No manual intervention.
Convergence
Max N passes (default 10). Each pass only adds state (monotonic). No changes = converged. Still pending after convergence = blocked.
Status Machine
extracted → pending → applied | blocked | rejected
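A minimal sketch of the apply loop, assuming state is a set of facts and each document carries a precondition over that set plus the facts it adds. `Doc` and `apply_fixpoint` are illustrative names. The example reproduces the Acme scenario above, with the board resolution listed before the term sheet, plus one document whose deal never arrives.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    name: str
    precondition: Callable[[set], bool]  # may this doc be applied yet?
    effects: frozenset                   # facts added to state on apply
    status: str = "pending"


def apply_fixpoint(docs: list, state: set, max_passes: int = 10) -> int:
    """Apply eligible docs until a pass changes nothing; return passes used."""
    passes = 0
    for _ in range(max_passes):
        passes += 1
        changed = False
        for doc in docs:
            # Blocked docs stay eligible, so later arrivals can unlock them.
            if doc.status in ("pending", "blocked") and doc.precondition(state):
                state |= doc.effects  # monotonic: facts are only ever added
                doc.status = "applied"
                changed = True
        if not changed:
            break  # converged
    for doc in docs:
        if doc.status == "pending":
            doc.status = "blocked"
    return passes


deal = ("deal", "Acme")
docs = [
    Doc("board_resolution", lambda s: deal in s,
        frozenset({("milestone", "board_approval")})),
    Doc("term_sheet", lambda s: True,
        frozenset({deal, ("company", "Acme Corp")})),
    Doc("orphan_resolution", lambda s: ("deal", "Phoenix") in s,
        frozenset({("milestone", "phoenix_approval")})),
]
state: set = set()
passes = apply_fixpoint(docs, state)
```

Pass 1 applies the term sheet, pass 2 applies the board resolution, and pass 3 changes nothing, which is the convergence signal.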
RAG — Answering Questions
sequenceDiagram
participant U as User (WhatsApp/Web)
participant P as Phase 1: PLANNER
participant D as Phase 2: DISPATCH
participant S as Phase 3: SYNTHESIZER
U->>P: "What's the status of the Acme deal?"
Note over P: LLM call #1 (JSON mode)
P->>P: Maps to tool: deal_status
P->>P: Params: {kind: "deal", name: "Acme"}
P->>D: {intent, tool, params}
Note over D: Deterministic code (no LLM)
D->>D: Resolve "Acme" → Deal ID
D->>D: Run deal_status tool → structured data
D->>D: Retrieve top-k chunks (vector + lexical)
D->>S: tool_results + retrieved_chunks
Note over S: LLM call #2 (plain text)
S->>S: Write grounded answer
S->>S: Cite only provided documents
S->>U: "The Acme acquisition is in due diligence.\nTerm sheet signed 2026-03-15 for R50M.\n[doc:a1b2c3]"
Why Three Phases?
| Approach | Problem |
|---|---|
| Single LLM call | Can’t do structured lookups. Hallucinates data. No verifiable citations. |
| LLM + function calling | Model picks wrong tools, retries burn tokens, latency spikes. |
| Planner → Code → Synthesizer | Each phase is constrained. Planner can’t access data. Dispatch is deterministic. Synthesizer only writes from provided facts. |
Available Tools
- deal_status: Stage, value, key dates
- deal_parties: Companies & contacts with roles
- deal_timeline: Milestones: done, pending, overdue
- company_profile: Registration, industry, deals
- fund_portfolio: All investments in a fund
- financials_for: Revenue, EBITDA, valuations
- dd_status: DD progress, findings, risk
- contracts_for: Agreements: type, status, dates
- whats_outstanding: Pending items, overdue milestones
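The dispatch phase can be sketched as a plain lookup from the planner's JSON into a fixed tool registry, with no LLM involved. This is an illustration under assumptions: the in-memory `DEALS` table, `resolve_deal`, and the return shape are hypothetical, and only one of the tools is shown. The sample values are taken from the Acme example in the sequence diagram.

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the deals table.
DEALS = {
    "deal-1": {"name": "Acme", "stage": "due_diligence",
               "value": "R50M", "signed_on": "2026-03-15"},
}


def resolve_deal(name: str) -> Optional[str]:
    """Resolve a deal name to an ID; anything but one exact match is unresolved."""
    wanted = name.strip().casefold()
    matches = [deal_id for deal_id, d in DEALS.items()
               if d["name"].casefold() == wanted]
    return matches[0] if len(matches) == 1 else None


def deal_status(deal_id: str) -> dict:
    deal = DEALS[deal_id]
    return {"stage": deal["stage"], "value": deal["value"],
            "signed_on": deal["signed_on"]}


# The planner can only name tools listed here; anything else is refused.
TOOLS: dict[str, Callable[[str], dict]] = {"deal_status": deal_status}


def dispatch(plan: dict) -> dict:
    """Run the planned tool deterministically and return structured results."""
    tool = TOOLS.get(plan.get("tool", ""))
    if tool is None:
        return {"ok": False, "error": "unknown tool"}
    deal_id = resolve_deal(plan.get("params", {}).get("name", ""))
    if deal_id is None:
        return {"ok": False, "error": "deal not resolved"}
    return {"ok": True, "deal_id": deal_id, "result": tool(deal_id)}


plan = {"intent": "status", "tool": "deal_status",
        "params": {"kind": "deal", "name": "Acme"}}
outcome = dispatch(plan)
```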
▶ Hybrid Retrieval: Vector + Lexical (RRF)
Retrieval combines two search modes via Reciprocal Rank Fusion — neither vector nor keyword search alone is sufficient.
flowchart LR
Q["Query: 'Acme financials'"]
V["Vector Search\nEmbed query,\ncosine similarity\nover chunks"]
L["Lexical Search\nFull-text match\non document_type + path"]
RRF["Reciprocal Rank\nFusion (RRF)\nscore = sum 1/(K+rank)\nK = 60"]
R["Top-k chunks\nwith doc IDs\n+ page locators"]
Q --> V & L
V & L --> RRF --> R
style RRF fill:#e0effe,stroke:#0073c5
Entity-scoped retrieval: When the planner resolves a specific entity, search is scoped to documents linked to that entity via document_subjects. Prevents leakage between deals.
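Reciprocal Rank Fusion with K = 60, as named in the diagram, can be sketched in a few lines. The function name `rrf`, the chunk IDs, and the alphabetical tie-break are assumptions made for the example.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


vector_hits = ["c3", "c1", "c7"]   # hypothetical chunk IDs from vector search
lexical_hits = ["c1", "c9", "c3"]  # hypothetical chunk IDs from lexical search
fused = rrf([vector_hits, lexical_hits])
```

Chunks found by both searches (`c1`, `c3`) outrank chunks found by only one, which is the property that makes the fusion useful.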
WhatsApp Integration
sequenceDiagram
participant WA as WhatsApp User
participant WAHA as WAHA Gateway
participant API as Brain API
participant RAG as RAG Agent
participant DB as PostgreSQL
WA->>WAHA: Send message
WAHA->>API: POST /whatsapp/webhook (HMAC-SHA512)
API->>API: Verify HMAC signature
API->>DB: Resolve sender → known contact
API->>DB: Check wa_operators allowlist
alt Authorized operator
API->>RAG: Process question
RAG->>DB: Plan → Dispatch → Retrieve
RAG-->>API: Grounded answer + citations
API->>WAHA: Send reply
WAHA->>WA: Deliver answer
else Unknown sender
API->>DB: Store message (no answer)
end
Self-Hosted
WAHA runs locally. No Meta Business API approval. Data stays on our hardware. One dedicated phone number.
Group Intelligence
In group chats, bot only responds when @mentioned. DMs always answered (if authorized).
Media Ingestion
Documents sent via WhatsApp (PDFs, photos of contracts) are ingested into the pipeline.
User Experience
Three surfaces for interacting with the Brain: a desktop web console for operators, a mobile-responsive view for on-the-go access, and WhatsApp for instant Q&A with rich deal summaries.
Desktop — Operator Dashboard
Desktop — Deal Profile
Mobile & WhatsApp
Mobile-responsive dashboard
WhatsApp Q&A with rich deal summaries
Example Interactions
Questions via WhatsApp
“What deals are in due diligence?”
Lists all active DD deals with target companies, values, and days in DD.
“Send me the Phoenix deal summary”
Returns a formatted deal card with status, milestones, parties, and outstanding items.
“What’s outstanding across all deals?”
Aggregates overdue milestones, pending DD items, and expiring contracts across the portfolio.
“Who is the legal advisor on Greenfield?”
Returns the contact and firm with a link to the engagement letter.
Actions via WhatsApp
“Sync”
Triggers a OneDrive sync. Reports back with how many new files were found and processed.
[sends a PDF via WhatsApp]
Document is ingested immediately. Classified, extracted, and linked to the relevant deal.
“What’s blocked?”
Lists documents stuck in the pipeline with reasons (missing entity, ambiguous classification).
“Weekly digest”
Summary of new documents ingested, deals that changed status, and items needing attention.
Auth & Access Control
flowchart LR
U["Team Members"]
CF["Cloudflare Access"]
E["Entra IDP\n(Azure AD)"]
JWT["JWT Token"]
API["Brain API"]
WA["WhatsApp"]
HMAC["HMAC-SHA512\nVerification"]
U -->|"Web / API"| CF
CF --> E
E --> JWT
JWT --> API
WA -->|"Webhook"| HMAC
HMAC --> API
style CF fill:#f0f9ff,stroke:#0284c7
style E fill:#e0effe,stroke:#0073c5
Web & API
Cloudflare Access with Entra IDP (Azure AD). Zero-trust. No VPN. Existing Microsoft identity. Group-based access control.
WhatsApp
Separate path. HMAC-verified webhooks. Phone number allowlist (wa_operators). No Cloudflare in this path.
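The two checks on the WhatsApp path can be sketched with the standard library. This is a minimal illustration, assuming the gateway sends a hex-encoded HMAC-SHA512 of the raw request body; the secret, the phone numbers, and the function names are hypothetical.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA512 over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha512).hexdigest()
    received = signature_hex.strip().lower()
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def is_authorized(sender: str, wa_operators: set[str]) -> bool:
    """Only senders on the allowlist get answers; others are stored, not answered."""
    return sender in wa_operators


secret = b"hypothetical-shared-secret"
body = b'{"from": "10000000001", "body": "Sync"}'
good_signature = hmac.new(secret, body, hashlib.sha512).hexdigest()

signature_ok = verify_webhook(secret, body, good_signature)
tampered_ok = verify_webhook(secret, body + b" ", good_signature)
operator_ok = is_authorized("10000000001", {"10000000001"})
```

The signature must be computed over the exact bytes received, before any JSON parsing, since re-serialising the payload can change whitespace and break the comparison.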
Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Backend | Python 3.12 / FastAPI | Async-first. Strong LLM ecosystem. Production-grade. |
| Database | PostgreSQL 16 + pgvector | One database for everything: schema, vectors, FTS, job queue. |
| Job Queue | Procrastinate | Postgres-native. No Redis/RabbitMQ. Transactional guarantees. |
| Frontend | Next.js 15 / React / Tailwind | Server components. Radix UI. Fast iteration. |
| WhatsApp | WAHA (self-hosted) | Free core. No Meta approval. Self-hosted = data stays local. |
| LLM | Self-hosted (vLLM / Ollama) | All inference on local hardware. No data leaves the network. Gemma / Llama class models for classify/extract. Local embedding model for vectors. |
| OneDrive | Microsoft Graph SDK | Official SDK. Delta queries. Client credentials flow. |
| Auth | Cloudflare Access + Entra | Zero-trust. Existing Microsoft identity. |
| Deploy | Docker Compose / self-hosted server | Simple. Office server (Mac mini or similar). Data never leaves the building. No cloud infra. |
▶ Why Not…?
| Alternative | Why Not |
|---|---|
| Cloud (AWS/GCP) | Overkill. Single team. Office server sufficient. Data stays local. |
| Pinecone / Weaviate | pgvector is fine for <100K chunks. One less service. |
| Celery / Redis | Procrastinate uses Postgres. One less service. |
| LangChain | Too abstract. Direct LLM calls are simpler and debuggable. |
| Cloud LLM APIs (OpenAI, etc.) | Deal documents are confidential. Local inference = zero data exfiltration risk. No per-token cost at scale. |
| Fine-tuned models | Prompt-based is sufficient. Profiles editable by operators. |
Local AI Inference
All AI inference runs on a dedicated server inside the office. No document text, no extracted data, and no queries ever leave the private network.
flowchart LR
subgraph office["Office Private Network"]
app["Brain\n(Application Server)"]
ai["AI Server\n(GPU)"]
app -->|"classify / extract\n/ embed / OCR"| ai
ai -->|"structured output"| app
end
internet["Public Internet"]
app -.->|"OneDrive sync only\n(file download)"| internet
style office fill:#f0fdf4,stroke:#16a34a
style ai fill:#fef9c3,stroke:#ca8a04
style internet fill:#fee2e2,stroke:#dc2626
Zero Data Exfiltration
Deal documents are confidential. With local inference, document text is never sent to a third-party API. The only outbound traffic is OneDrive file downloads and (optionally) model weight updates.
AI Server Roles
| Role | Model Class | Serving | Notes |
|---|---|---|---|
| Classifier | Gemma 3 27B / Llama 3.1 8B | vLLM or Ollama | Scores documents against type profiles. ~30 types per prompt. |
| Extractor | Gemma 3 27B / Llama 3.1 70B | vLLM or Ollama | Per-type structured extraction. JSON mode output. |
| Embedder | BGE / Nomic Embed / E5 | Sentence Transformers | 768-dim vectors for document chunks. Batch processing. |
| OCR | Gemma 4 / Llama Vision | vLLM | Vision model for scanned documents and images. |
| RAG Synthesizer | Gemma 3 27B / Llama 3.1 70B | vLLM or Ollama | Generates grounded answers from retrieved context. |
Hardware Options
M2 Ultra (up to 192GB unified memory) or M4 Max (up to 128GB). Runs 70B-class models via MLX or Ollama. Silent. Low power. Fits on a shelf.
Unified memory = no GPU VRAM bottleneck. Entire model in memory.
RTX 4090 (24GB) or A6000 (48GB). vLLM with CUDA. Higher throughput for batch workloads. Standard MLOps tooling.
Requires active cooling. Higher power draw. Better batch throughput.
Privacy
No API calls. No data leaving the network. Full control over model versions and behavior.
Cost
One-time hardware investment. No per-token charges. Thousands of documents processed for the cost of electricity.
Flexibility
Swap models freely. Test new releases same day. No vendor lock-in. OpenAI-compatible API (vLLM/Ollama) means the application code doesn’t change.
Key Design Decisions
▶ 1. Content-Hash Document Identity
Decision: SHA-256 of file bytes is the unique identifier, not filename or path.
Why: Files get renamed, moved, duplicated. “Term Sheet v2 FINAL (2).pdf” is the same document as “Term Sheet v2 FINAL.pdf” if the bytes are identical. Hash identity = zero wasted reprocessing on renames, automatic deduplication.
▶ 2. Watch-and-Surface, Operator-Decides
Decision: The system proposes, never acts unilaterally on uncertain data.
Why: In PE/VC, linking the wrong entity to a deal has real consequences. Three operator queues:
- Blocked — preconditions not met (self-healing when resolved)
- Proposals — uncertain entity resolutions needing human judgement
- Errors — pipeline failures needing technical attention
▶ 3. Markdown Classifier Profiles (Not Code)
Decision: Document type signals described in markdown files, not Python code.
Why: Operators can edit profiles without touching code. Adding a new document type is a 30-minute task. Edits auto-trigger reclassification. Fastest path from “we found a new doc type” to “the system handles it.”
▶ 4. Stratified Fixpoint (Not Retry Queues)
Decision: One mechanism (fixpoint loop) handles all temporal dependencies.
Why: Retry queues are ad-hoc — per-type retry logic, dead-letter queues, manual reprocessing. The fixpoint is one mechanism: out-of-order docs, missing references, cascading creation. Self-healing. Convergence guaranteed.
▶ 5. Three-Phase RAG (Not Single-Call)
Decision: Separate planning, data retrieval, and answer synthesis into three phases.
Why: Single-call RAG hallucinates. It invents data that sounds right but isn’t in the documents. Three phases: planner can’t access data, dispatch is deterministic code, synthesizer only writes from provided facts. Citations are verifiable.
▶ 6. OneDrive as Source of Truth
Decision: OneDrive is canonical. Brain is a read-only derivative.
Why: The team already works in OneDrive. Asking them to upload to a separate system = friction that kills adoption. Brain watches their existing workflow — the “ghost in the machine” principle.
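▶ 7. Office-Hosted, Not Exposed to the Internet
Decision: Self-hosted on a server inside the office, with no service exposed directly to the public internet. Outbound traffic is limited to OneDrive sync, the WhatsApp gateway, and optional model weight updates; the web console sits behind Cloudflare Access.
Why: Deal documents are sensitive. The server (Mac mini or similar) lives inside the office on the private network, so documents and extracted data stay on office hardware. Docker Compose keeps operations simple. Architecture ports to any Docker host if scale demands change.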
▶ 8. Local AI Inference (Not Cloud APIs)
Decision: All LLM inference runs on a dedicated server inside the office. No document text sent to external APIs.
Why: Deal documents are confidential. Cloud LLM APIs mean every document, every query, every extracted fact transits a third party’s infrastructure. Local inference eliminates that risk entirely. One-time hardware cost replaces ongoing per-token charges. Models are swappable without code changes (OpenAI-compatible serving via vLLM/Ollama).
Implementation Phases
Foundation + Discovery
Weeks 1–2: Stand up infrastructure. Explore the corpus. Design entity schema from evidence.
Core Pipeline
Weeks 3–5: Ingest the full corpus through the pipeline.
RAG + WhatsApp
Weeks 6–7: The team can ask the brain questions via WhatsApp.
🎯 First value delivery: team asks questions via WhatsApp and gets grounded, cited answers about any deal.
Operator Console
Week 8+: Web UI for reviewing pipeline output and browsing entities.
Risk & Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| OneDrive API issues | Sync breaks | Delta tokens resilient. Full-sync fallback. Content hash = cheap reprocessing. |
| LLM classification accuracy | Wrong types → wrong extraction | Threshold gate. Operator triage queue. Adjudicators for common confusions. |
| Entity resolution ambiguity | Wrong entities linked | Never-guess principle. Proposals queue. Canonical IDs where available. |
| Sensitive data exposure | Deal data leaked | Office-hosted server on private network. All inference local. Cloudflare + Entra auth. HMAC webhooks. |
| Server failure | System down | Docker volumes are only state. Backup + restore on any Docker host. pg_dump. |
| Corpus too large | First ingest takes days | 8–24 parallel docs. Content hash skip. Process incrementally by folder. |
| Schema wrong | Rework needed | Phase 0 discovery = schema from evidence, not speculation. Migrations evolve. |