Subprocessor Register — Context Pilot Trust Center

1. LLM Inference Providers

At least one LLM provider must be configured to enable AI inference. The operator selects a provider and supplies their own API key. Context Pilot supports the providers listed below; only configured providers receive data.

Provider	Endpoint	Data Categories Transmitted	Opt-In Mechanism
Anthropic api.anthropic.com	Direct API	System prompt, conversation messages, code context from open files, tool definitions	`ANTHROPIC_API_KEY`
Anthropic (OAuth) api.claude.ai	Claude Code OAuth	System prompt, conversation messages, code context from open files, tool definitions	Claude Code CLI credentials
OpenAI api.openai.com	Direct API	System prompt, conversation messages, code context from open files, tool definitions	`OPENAI_API_KEY`
Google AI generativelanguage.googleapis.com	Generative Language API	System prompt, conversation messages, code context from open files, tool definitions	`GOOGLE_AI_API_KEY`
Mistral AI api.mistral.ai	Direct API	System prompt, conversation messages, code context from open files, tool definitions	`MISTRAL_API_KEY`
Groq api.groq.com	Direct API	Short prompts for auxiliary inference (soul journal)	`GROQ_API_KEY`
MiniMax api.minimaxi.com	Direct API	System prompt, conversation messages, code context from open files, tool definitions	`MINIMAX_API_KEY`
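The opt-in column above implies that a provider can only receive data once its environment variable is set. The following is a minimal sketch of that idea, not Context Pilot's actual implementation: the variable names come from the table, while the `PROVIDER_ENV_VARS` mapping, the `configured_providers` function, and the "non-empty value means configured" rule are assumptions made for illustration. The Anthropic OAuth route is omitted because it relies on Claude Code CLI credentials rather than an environment variable.

```python
import os

# Variable names are taken from the table above; the mapping and the
# detection rule are illustrative assumptions, not the product's code.
PROVIDER_ENV_VARS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Google AI": "GOOGLE_AI_API_KEY",
    "Mistral AI": "MISTRAL_API_KEY",
    "Groq": "GROQ_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable holds a non-empty value."""
    env = os.environ if env is None else env
    return sorted(
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    )


if __name__ == "__main__":
    # An empty value is treated as "not configured" in this sketch.
    print(configured_providers({"OPENAI_API_KEY": "sk-example", "GROQ_API_KEY": ""}))
```

An operator could run a check like this before a session to confirm that no unexpected provider key is present in the environment.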

2. Auxiliary Services

The following services provide optional capabilities. Each requires explicit operator configuration. None are active by default.

Service	Endpoint	Purpose	Data Categories Transmitted	Opt-In Mechanism
Voyage AI api.voyageai.com	REST API	Text embedding generation for hybrid semantic search (voyage-code-3 model)	Code and text chunks from indexed project files	`VOYAGE_API_KEY`
Brave Search api.search.brave.com	REST API	Web search (result snippets and deep content extraction)	Search query strings	`BRAVE_API_KEY`
Firecrawl api.firecrawl.dev	REST API	Web page scraping, crawling, and structured content extraction	Target URLs for scraping	`FIRECRAWL_API_KEY`
Datalab (Surya) www.datalab.to	REST API	OCR and document-to-text conversion	Document files (PDF, images) submitted for text extraction	`DATALAB_API_KEY`
GitHub api.github.com	REST API (via `gh` CLI)	Repository operations (issues, pull requests, releases)	Git operations scoped to operator's authenticated repositories	`GITHUB_TOKEN`

3. Local Services

The following services run locally on the operator's workstation. No data leaves the machine through these services.

Service	Binding	Purpose	External Traffic
Meilisearch 127.0.0.1 (localhost)	Loopback only	Full-text and semantic project search indexing	None
Console Server Unix domain socket	Unix socket (filesystem)	Child process lifecycle management (build commands, shells)	None
Orchestrator 127.0.0.1:7878	Loopback only	Multi-agent fleet management, SSE streaming, REST API	None
SQLite (Entities) Embedded library	In-process	Structured entity and domain knowledge storage	None
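The "loopback only" and "Unix socket" bindings listed above can be checked mechanically. The sketch below uses Python's standard `ipaddress` module to classify a binding string as local-only. The function name `is_local_only`, the accepted string formats, and the convention that a Unix socket is given as an absolute path are assumptions for illustration; this is not how Context Pilot itself verifies its bindings.

```python
import ipaddress


def is_local_only(binding: str) -> bool:
    """Return True if the binding is a Unix socket path or a loopback address."""
    # Assumption: a Unix domain socket is written as an absolute filesystem path.
    if binding.startswith("/"):
        return True
    # Strip a single ":port" suffix (IPv4 or hostname); leave bare IPv6 alone.
    host = binding.rsplit(":", 1)[0] if binding.count(":") == 1 else binding
    # Assumption: "localhost" resolves to a loopback address on this machine.
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Not an IP literal (for example a public hostname): not local-only.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1:7878", "0.0.0.0:7878", "/tmp/example.sock"):
        print(candidate, is_local_only(candidate))
```

A binding such as `0.0.0.0` listens on every interface, which is why it is rejected here even though it is commonly seen in local development setups.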

4. Subprocessor Compliance Matrix

Detailed compliance posture for each external subprocessor, based on independent research conducted June 2026. Operators subject to regulatory requirements should verify all claims directly with each provider.

Provider	Trust Center	SOC 2	ISO 27001	DPA	API Data Retention	Training on API Data	Data Location	Risk
Anthropic	trust.anthropic.com	Type II	27001	Auto-included	7 days (ZDR avail.)	No	US (EU via Bedrock/Vertex)	Low
OpenAI	trust.openai.com	Type 2	27001	Public DPA	30 days (ZDR avail.)	No	US (EU avail. Enterprise)	Low
Google AI	Cloud Compliance	Type 2	27001	Cloud DPA	55 days (paid API)	No (paid)	Global (EU via Vertex)	Low
Mistral AI	trust.mistral.ai	Type II	Aligned	Public DPA	30 days	No	EU + US (since Feb 2025)	Medium
Groq	trust.groq.com	Type II	None	Public DPA	Transient (no storage)	N/A	US only	Low
MiniMax	None	None	None	Not public	Not documented	Unclear	China	High
Brave Search	API Security	Type II	Aligned	Public DPA	90 days (query logs)	N/A	US	Low
Firecrawl	Enterprise	Type II	None	Available (not public)	Transient (ZDR default)	N/A	US (self-host avail.)	Low
Voyage AI	None	None	None	Not public	Not documented	Default YES	US	High
Datalab	SOC 2 badge (no portal)	Type II	None	Custom (VPC tier)	Transient (OCR)	Unclear	US (self-host avail.)	Medium
GitHub	trust.github.com	Type 2	27001	Enterprise Agreement	Per service (90d audit logs)	No (gh CLI)	US (EU preview)	Low
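Operators who want to apply the matrix in procurement tooling could hold it as structured data and filter by risk rating. The sketch below transcribes a subset of the rows above; the `Subprocessor` record, the `RISK_ORDER` ranking, and the `at_or_above` helper are hypothetical names introduced for illustration, and the values should be re-verified with each provider as noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subprocessor:
    name: str
    soc2: str           # value from the "SOC 2" column
    data_location: str  # value from the "Data Location" column
    risk: str           # "Low", "Medium", or "High"


# Subset of rows transcribed from the matrix above.
REGISTER = [
    Subprocessor("Anthropic", "Type II", "US (EU via Bedrock/Vertex)", "Low"),
    Subprocessor("Mistral AI", "Type II", "EU + US (since Feb 2025)", "Medium"),
    Subprocessor("Groq", "Type II", "US only", "Low"),
    Subprocessor("MiniMax", "None", "China", "High"),
    Subprocessor("Voyage AI", "None", "US", "High"),
]

# Assumed ordering of the three ratings used in the matrix.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def at_or_above(register, threshold):
    """Return names of subprocessors rated at or above the given risk level."""
    floor = RISK_ORDER[threshold]
    return [s.name for s in register if RISK_ORDER[s.risk] >= floor]


if __name__ == "__main__":
    print(at_or_above(REGISTER, "Medium"))
```

With the rows above, filtering at "Medium" yields Mistral AI, MiniMax, and Voyage AI, which are the entries an operator handling regulated data would likely review first.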

5. Detailed Compliance Notes

Material compliance observations for each subprocessor, including certifications, jurisdictional considerations, and known risks. Research conducted June 2026.

Anthropic (Low Risk)

SOC 2 Type II + ISO 27001:2022 + ISO/IEC 42001:2023 (AI Management System). API data retention reduced to 7 days (Sept 2025). Zero Data Retention available for Enterprise. DPA with SCCs auto-included in Commercial Terms. HIPAA BAA available. EU data residency for Team/Enterprise (Aug 2025). FedRAMP in progress. API data never used for training.

OpenAI (Low Risk)

SOC 2 Type 2 + ISO 27001/27017/27018/27701 + CSA STAR + PCI-DSS. Public DPA (effective Dec 2025) and public subprocessor list (updated Feb 2026). 30-day default retention; ZDR approval-gated. EU data residency available for Enterprise/Edu/API Projects (Nov 2025). HIPAA BAA available. AES-256 at rest. EU-US DPF certified.

Google AI — Gemini (Low Risk)

Context Pilot uses the paid Gemini API tier (not free AI Studio). Inherits Google Cloud certifications: SOC 1/2/3, ISO 27001/27017/27018/27701, ISO 42001 (AI Management), FedRAMP High, PCI-DSS v4.0. 55-day retention for abuse monitoring only. Section 17 "Training Restriction" contractually prohibits training. Cloud DPA auto-incorporated for EEA. Note: Free AI Studio tier has no DPA, is prohibited for EEA/UK users, and uses data for training — operators must not use the free tier.

Mistral AI (Medium Risk)

French company (Paris), directly subject to GDPR. SOC 2 Type I+II. ISO 27001/27701 framework alignment (not explicitly certified). Public DPA with SCCs. 30-day API retention, no training. Risks: (1) US processing added Feb 2025 creates CLOUD Act exposure despite EU headquarters; (2) CNIL complaint (Feb 2025) regarding free-tier opt-out difficulty — decision pending. Self-deployment option eliminates both risks.

Groq (Low Risk)

Inference platform (LPU hardware) running third-party open-source models — not a model developer. SOC 2 Type II. Data transient by design (not stored beyond request lifecycle). Formal DPA with SCCs covering GDPR/CCPA/PDPL. US-only (GCP). HIPAA explicitly not available. Training question is N/A since Groq does not train models.

MiniMax (High Risk)

Chinese company (Shanghai/Beijing) subject to PIPL, Cybersecurity Law, and Data Security Law. No trust center, no SOC 2, no ISO 27001, no public DPA, no public subprocessor list. Data retention and training policies not clearly documented. Key risks: Chinese jurisdiction enables government data access under domestic law; no Western-standard certifications; limited public transparency; PIPL cross-border transfer restrictions. Operators handling regulated data should avoid this provider or evaluate jurisdictional exposure carefully.

Brave Search (Low Risk)

SOC 2 Type II (Oct 2025). External pentests (HackerOne). ISO 27001/27701 framework aligned but not yet certified. Public DPA with SCCs — note: DPA does not cover Search Query Data (Brave's legal position: API queries are machine-to-machine, not personal data). 90-day query log retention; ZDR available for Enterprise. Independent 40B+ page search index. TEE for AI features.

Firecrawl (Low Risk)

SOC 2 Type II. Fundamentally transient processing: web pages processed in memory and immediately deleted — zero data retention by default. DPA available but not publicly linked. Full self-host option for air-gapped deployments. 99.9% SLA on Enterprise tier. Context Pilot sends only target URLs; scraped content returned and discarded by Firecrawl.

Voyage AI (High Risk)

No trust center, no SOC 2, no ISO 27001, no public DPA. Critical: Privacy policy grants Voyage AI a "worldwide, irrevocable, perpetual, royalty-free license" to use customer content for training by default. Opt-out requires explicit request. No public subprocessor list. Limited compliance maturity. Mitigating factor: Context Pilot sends only code chunks and log entries for embedding generation (semantic search indexing), not conversation content or prompts. Operators concerned about code exposure should consider disabling the Voyage AI embedder (VOYAGE_API_KEY is optional — keyword search works without it).

Datalab / Surya (Medium Risk)

SOC 2 Type II for Managed Cloud tier. Open-source models (Surya 56.6k stars, Marker 11.1k stars on GitHub) — fully auditable code. Three deployment tiers: Managed Cloud (SOC 2, pay-as-you-go), VPC (AWS/GCP/Azure, custom BAA/DPA), and On-prem/Air-gapped (zero internet). Trusted by Anthropic, Siemens Healthineers, Stanford, MIT. Context Pilot sends documents temporarily for OCR processing. Self-host option eliminates external data transfer entirely.

GitHub (Low Risk)

Microsoft subsidiary. SOC 1/2 Type 2, SOC 3, ISO 27001, CSA CAIQ, FedRAMP authorized. DPA included in Enterprise Customer Agreement; Microsoft DPA apparatus inherited. EU data residency in preview. Context Pilot uses the gh CLI for repository operations (issues, PRs) — not GitHub Copilot. Private repos not used for training without consent. EU-US DPF certified (via Microsoft).

6. Subprocessor Change Log

Material changes to the subprocessor register are documented below. Enterprise customers evaluating Context Pilot for procurement may reference this log to track third-party dependency changes across versions.

Date	Change Type	Service	Details
June 2026	Added	MiniMax	Added as optional LLM provider. Note: Chinese jurisdiction — see Compliance Matrix for risk assessment
June 2026	Enhanced	All providers	Independent compliance research conducted: trust centers, certifications, DPA availability, data retention, training policies documented per subprocessor
June 2026	Added	Groq	Added as optional LLM provider for auxiliary inference workloads
May 2026	Added	Voyage AI	Added for text embedding generation (hybrid search). Optional; keyword search works without it
May 2026	Added	Datalab (Surya)	Added for OCR and document conversion. Optional service
April 2026	Initial	All others	Initial subprocessor register published with Anthropic, OpenAI, Google AI, Mistral, Brave, Firecrawl, GitHub, Meilisearch

7. Important Notice

Operator Responsibility

The operator is solely responsible for evaluating the data handling practices of their selected third-party providers. Context Pilot facilitates connections to these services but does not act as an intermediary, does not negotiate data processing terms on behalf of operators, and does not have access to data transmitted between the operator's workstation and the provider's endpoint.

Operators subject to regulatory requirements (GDPR, HIPAA, CCPA, etc.) should independently verify that their chosen providers offer appropriate data processing agreements, data residency options, and contractual safeguards for their jurisdiction and use case.

8. Vendor Risk Assessment Criteria

The following criteria are evaluated when considering new third-party service integrations for Context Pilot. This framework ensures that any addition to the subprocessor register meets minimum security and privacy standards.

Criterion	Requirement	Weight
API-key authentication	Provider must support API key authentication (no OAuth flows that require Context Pilot to hold session tokens on behalf of users)	Mandatory
TLS encryption	Provider must serve all API endpoints over HTTPS with TLS 1.2 or higher	Mandatory
Published privacy policy	Provider must publish a clear privacy policy describing data handling, retention, and processing purposes	Mandatory
Opt-in activation	Integration must require explicit operator configuration (API key provisioning). No provider may be active by default.	Mandatory
Data minimization	Only data strictly necessary for the service's function is transmitted (e.g., search queries for search, code context for inference)	Mandatory
No training on API data	Provider should not use API-submitted data for model training without explicit opt-in from the operator	Strong preference
DPA availability	Provider should offer a Data Processing Agreement for enterprise customers	Preferred
SOC 2 / ISO 27001	Provider should hold recognized security certifications or demonstrate equivalent controls	Preferred
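The weighting in the table above can be read as a simple gate: a candidate that fails any mandatory criterion is ineligible, while unmet preferences are recorded but do not block inclusion. The sketch below illustrates that reading. The criterion keys, the `assess` function, and the result shape are hypothetical, and the interpretation that preferences never block inclusion is an assumption rather than a documented rule.

```python
# Criterion keys are shorthand for the rows of the table above.
MANDATORY = (
    "api_key_auth",
    "tls_1_2_plus",
    "published_privacy_policy",
    "opt_in_activation",
    "data_minimization",
)
STRONG_PREFERENCE = ("no_training_on_api_data",)
PREFERRED = ("dpa_available", "soc2_or_iso27001")


def assess(candidate: dict) -> dict:
    """Evaluate a candidate integration against the weighted criteria.

    A criterion missing from `candidate` is treated as not met.
    """
    missing = [c for c in MANDATORY if not candidate.get(c, False)]
    unmet = [
        c for c in STRONG_PREFERENCE + PREFERRED if not candidate.get(c, False)
    ]
    return {
        "eligible": not missing,       # every mandatory criterion is met
        "missing_mandatory": missing,  # blocks inclusion in this sketch
        "unmet_preferences": unmet,    # recorded, but does not block
    }


if __name__ == "__main__":
    # Hypothetical candidate that meets every mandatory criterion only.
    candidate = dict.fromkeys(MANDATORY, True)
    print(assess(candidate))
```

Under this reading, a provider with no certifications and no public DPA can still be eligible, which is why the compliance matrix carries a separate risk rating.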

9. Data Flow Summary

The following diagram summarizes the data flow architecture. All arrows represent operator-initiated, operator-configured connections. No connection exists that is not explicitly listed below.

Operator's Workstation (all processes local, no inbound connections)

TUI Agent: Rust binary
Orchestrator: :7878 (loopback)
Web Frontend: React / Vite
Meilisearch: localhost only
Console Server: Unix socket
SQLite (Entities): embedded

TLS 1.2+ · Outbound only · Operator-initiated

Configured API Providers (operator's API keys; only configured providers receive data)

LLM Inference