Subprocessor Register
This register is a complete inventory of the third-party services that Context Pilot can communicate with. The only mandatory third-party dependency is a single LLM provider. All other integrations are opt-in and require explicit operator configuration.
1. LLM Inference Providers
At least one LLM provider must be configured for AI inference. The operator selects the provider and supplies their own credentials: an API key or, for Claude Code OAuth, the Claude Code CLI credentials. Context Pilot supports the following providers. Only the configured provider receives data.
| Provider | Endpoint | Data Categories Transmitted | Opt-In Mechanism |
|---|---|---|---|
| Anthropic (api.anthropic.com) | Direct API | System prompt, conversation messages, code context from open files, tool definitions | ANTHROPIC_API_KEY |
| Anthropic (api.claude.ai) | Claude Code OAuth | System prompt, conversation messages, code context from open files, tool definitions | Claude Code CLI credentials |
| OpenAI (api.openai.com) | Direct API | System prompt, conversation messages, code context from open files, tool definitions | OPENAI_API_KEY |
| Google AI (generativelanguage.googleapis.com) | Generative Language API | System prompt, conversation messages, code context from open files, tool definitions | GOOGLE_AI_API_KEY |
| Mistral AI (api.mistral.ai) | Direct API | System prompt, conversation messages, code context from open files, tool definitions | MISTRAL_API_KEY |
| Groq (api.groq.com) | Direct API | Short prompts for auxiliary inference (soul journal) | GROQ_API_KEY |
| MiniMax (api.minimaxi.com) | Direct API | System prompt, conversation messages, code context from open files, tool definitions | MINIMAX_API_KEY |
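
To illustrate the opt-in model, the following minimal Python sketch reports which LLM providers would be eligible to receive data in a given environment. It is not Context Pilot's actual implementation: the function name `configured_providers` and the dictionary `PROVIDER_ENV_VARS` are hypothetical, the provider labels follow the Compliance Matrix in section 4, and only the variable names are taken from the Opt-In Mechanism column. Claude Code OAuth is left out because it depends on CLI credentials rather than an environment variable.

```python
import os

# Environment variable per provider, copied from the Opt-In Mechanism column.
# Hypothetical mapping for illustration; Claude Code OAuth is omitted because
# it relies on CLI credentials, not an environment variable.
PROVIDER_ENV_VARS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Google AI": "GOOGLE_AI_API_KEY",
    "Mistral AI": "MISTRAL_API_KEY",
    "Groq": "GROQ_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]
```

An operator could run a check like `configured_providers()` before a session to confirm that only the intended provider has a key present.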
2. Auxiliary Services
The following services provide optional capabilities. Each requires explicit operator configuration. None are active by default.
| Service | Endpoint | Purpose | Data Categories Transmitted | Opt-In Mechanism |
|---|---|---|---|---|
| Voyage AI (api.voyageai.com) | REST API | Text embedding generation for hybrid semantic search (voyage-code-3 model) | Code and text chunks from indexed project files | VOYAGE_API_KEY |
| Brave Search (api.search.brave.com) | REST API | Web search (result snippets and deep content extraction) | Search query strings | BRAVE_API_KEY |
| Firecrawl (api.firecrawl.dev) | REST API | Web page scraping, crawling, and structured content extraction | Target URLs for scraping | FIRECRAWL_API_KEY |
| Datalab (www.datalab.to) | REST API | OCR and document-to-text conversion | Document files (PDF, images) submitted for text extraction | DATALAB_API_KEY |
| GitHub (api.github.com) | REST API (via gh CLI) | Repository operations (issues, pull requests, releases) | Git operations scoped to operator's authenticated repositories | GITHUB_TOKEN |
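
Operators who restrict outbound traffic may find it useful to treat the endpoints in sections 1 and 2 as an allowlist. The sketch below is an illustrative assumption, not part of Context Pilot: `REGISTERED_HOSTS` and `is_registered_endpoint` are hypothetical names, and the hostnames are copied from this register. It accepts only HTTPS URLs, in line with the TLS criterion in section 8.

```python
from urllib.parse import urlsplit

# External hostnames listed in sections 1 and 2 of this register.
# Hypothetical constant for illustration only.
REGISTERED_HOSTS = frozenset({
    "api.anthropic.com",
    "api.claude.ai",
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "api.mistral.ai",
    "api.groq.com",
    "api.minimaxi.com",
    "api.voyageai.com",
    "api.search.brave.com",
    "api.firecrawl.dev",
    "www.datalab.to",
    "api.github.com",
})


def is_registered_endpoint(url):
    """True if the URL uses HTTPS and its host exactly matches a registered hostname."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in REGISTERED_HOSTS
```

Matching on the parsed hostname rather than on a substring avoids accepting a URL that merely contains a registered name in its path.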
3. Local Services
The following services run locally on the operator's workstation. No data leaves the machine through these services.
| Service | Binding | Purpose | External Traffic |
|---|---|---|---|
| Meilisearch (127.0.0.1, localhost) | Loopback only | Full-text and semantic project search indexing | None |
| Console Server (Unix domain socket) | Unix socket (filesystem) | Child process lifecycle management (build commands, shells) | None |
| Orchestrator (127.0.0.1:7878) | Loopback only | Multi-agent fleet management, SSE streaming, REST API | None |
| Embedded library | In-process | Structured entity and domain knowledge storage | None |
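
The "Loopback only" claim can be spot-checked against whatever bind address a local service reports. The following is a minimal sketch under stated assumptions: `is_loopback_binding` is a hypothetical helper, it handles only `host` or `host:port` strings with IPv4 addresses or the literal name `localhost`, and it says nothing about how Context Pilot itself configures its listeners.

```python
import ipaddress


def is_loopback_binding(binding):
    """Return True if an IPv4 'host' or 'host:port' binding is a loopback address."""
    # Drop an optional ':port' suffix (IPv4 and hostnames only).
    host = binding.rsplit(":", 1)[0] if ":" in binding else binding
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address, so it cannot be confirmed as loopback.
        return False
```

For example, `is_loopback_binding("127.0.0.1:7878")` is true, whereas a wildcard binding such as `0.0.0.0:7878` is not, because it would accept connections from other machines.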
4. Subprocessor Compliance Matrix
Detailed compliance posture for each external subprocessor, based on independent research conducted June 2026. Operators subject to regulatory requirements should verify all claims directly with each provider.
| Provider | Trust Center | SOC 2 | ISO 27001 | DPA | API Data Retention | Training on API Data | Data Location | Risk |
|---|---|---|---|---|---|---|---|---|
| Anthropic | trust.anthropic.com | Type II | 27001 | Auto-included | 7 days (ZDR avail.) | No | US (EU via Bedrock/Vertex) | Low |
| OpenAI | trust.openai.com | Type 2 | 27001 | Public DPA | 30 days (ZDR avail.) | No | US (EU avail. Enterprise) | Low |
| Google AI | Cloud Compliance | Type 2 | 27001 | Cloud DPA | 55 days (paid API) | No (paid) | Global (EU via Vertex) | Low |
| Mistral AI | trust.mistral.ai | Type II | Aligned | Public DPA | 30 days | No | EU + US (since Feb 2025) | Medium |
| Groq | trust.groq.com | Type II | None | Public DPA | Transient (no storage) | N/A | US only | Low |
| MiniMax | None | None | None | Not public | Not documented | Unclear | China | High |
| Brave Search | API Security | Type II | Aligned | Public DPA | 90 days (query logs) | N/A | US | Low |
| Firecrawl | Enterprise | Type II | None | Available (not public) | Transient (ZDR default) | N/A | US (self-host avail.) | Low |
| Voyage AI | None | None | None | Not public | Not documented | Default YES | US | High |
| Datalab | SOC 2 badge (no portal) | Type II | None | Custom (VPC tier) | Transient (OCR) | Unclear | US (self-host avail.) | Medium |
| GitHub | trust.github.com | Type 2 | 27001 | Enterprise Agreement | Per service (90d audit logs) | No (gh CLI) | US (EU preview) | Low |
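
For procurement reviews it can help to query the Risk column programmatically. The sketch below simply transcribes the ratings from the matrix above into a dictionary. `RISK_RATINGS`, `RISK_ORDER`, and `providers_at_or_above` are hypothetical names chosen for illustration, and the ratings should be re-checked against the matrix whenever it changes.

```python
# Ordering of the three risk levels used in the matrix.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}

# Risk column transcribed from the Subprocessor Compliance Matrix.
RISK_RATINGS = {
    "Anthropic": "Low",
    "OpenAI": "Low",
    "Google AI": "Low",
    "Mistral AI": "Medium",
    "Groq": "Low",
    "MiniMax": "High",
    "Brave Search": "Low",
    "Firecrawl": "Low",
    "Voyage AI": "High",
    "Datalab": "Medium",
    "GitHub": "Low",
}


def providers_at_or_above(level):
    """List providers whose rating is at least the given level, in matrix order."""
    threshold = RISK_ORDER[level]
    return [
        name
        for name, risk in RISK_RATINGS.items()
        if RISK_ORDER[risk] >= threshold
    ]
```

Calling `providers_at_or_above("High")` returns the two providers rated High in the matrix, MiniMax and Voyage AI.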
5. Detailed Compliance Notes
Material compliance observations for each subprocessor, including certifications, jurisdictional considerations, and known risks. Research conducted June 2026.
Anthropic (Low Risk)
SOC 2 Type II + ISO 27001:2022 + ISO/IEC 42001:2023 (AI Management System). API data retention reduced to 7 days (Sept 2025). Zero Data Retention available for Enterprise. DPA with SCCs auto-included in Commercial Terms. HIPAA BAA available. EU data residency for Team/Enterprise (Aug 2025). FedRAMP in progress. API data never used for training.
OpenAI (Low Risk)
SOC 2 Type 2 + ISO 27001/27017/27018/27701 + CSA STAR + PCI-DSS. Public DPA (effective Dec 2025) and public subprocessor list (updated Feb 2026). 30-day default retention; ZDR approval-gated. EU data residency available for Enterprise/Edu/API Projects (Nov 2025). HIPAA BAA available. AES-256 at rest. EU-US DPF certified.
Google AI — Gemini (Low Risk)
Context Pilot uses the paid Gemini API tier (not free AI Studio). Inherits Google Cloud certifications: SOC 1/2/3, ISO 27001/27017/27018/27701, ISO 42001 (AI Management), FedRAMP High, PCI-DSS v4.0. 55-day retention for abuse monitoring only. Section 17 "Training Restriction" contractually prohibits training. Cloud DPA auto-incorporated for EEA. Note: Free AI Studio tier has no DPA, is prohibited for EEA/UK users, and uses data for training — operators must not use the free tier.
Mistral AI (Medium Risk)
French company (Paris), directly subject to GDPR. SOC 2 Type I+II. ISO 27001/27701 framework alignment (not explicitly certified). Public DPA with SCCs. 30-day API retention, no training. Risks: (1) US processing added Feb 2025 creates CLOUD Act exposure despite EU headquarters; (2) CNIL complaint (Feb 2025) regarding free-tier opt-out difficulty — decision pending. Self-deployment option eliminates both risks.
Groq (Low Risk)
Inference platform (LPU hardware) running third-party open-source models — not a model developer. SOC 2 Type II. Data transient by design (not stored beyond request lifecycle). Formal DPA with SCCs covering GDPR/CCPA/PDPL. US-only (GCP). HIPAA explicitly not available. Training question is N/A since Groq does not train models.
MiniMax (High Risk)
Chinese company (Shanghai/Beijing) subject to PIPL, Cybersecurity Law, and Data Security Law. No trust center, no SOC 2, no ISO 27001, no public DPA, no public subprocessor list. Data retention and training policies not clearly documented. Key risks: Chinese jurisdiction enables government data access under domestic law; no Western-standard certifications; limited public transparency; PIPL cross-border transfer restrictions. Operators handling regulated data should avoid this provider or evaluate jurisdictional exposure carefully.
Brave Search (Low Risk)
SOC 2 Type II (Oct 2025). External pentests (HackerOne). ISO 27001/27701 framework aligned but not yet certified. Public DPA with SCCs — note: DPA does not cover Search Query Data (Brave's legal position: API queries are machine-to-machine, not personal data). 90-day query log retention; ZDR available for Enterprise. Independent 40B+ page search index. TEE for AI features.
Firecrawl (Low Risk)
SOC 2 Type II. Fundamentally transient processing: web pages processed in memory and immediately deleted — zero data retention by default. DPA available but not publicly linked. Full self-host option for air-gapped deployments. 99.9% SLA on Enterprise tier. Context Pilot sends only target URLs; scraped content returned and discarded by Firecrawl.
Voyage AI (High Risk)
No trust center, no SOC 2, no ISO 27001, no public DPA. Critical: Privacy policy grants Voyage AI a "worldwide, irrevocable, perpetual, royalty-free license" to use customer content for training by default. Opt-out requires explicit request. No public subprocessor list. Limited compliance maturity. Mitigating factor: Context Pilot sends only code chunks and log entries for embedding generation (semantic search indexing), not conversation content or prompts. Operators concerned about code exposure should consider disabling the Voyage AI embedder (VOYAGE_API_KEY is optional — keyword search works without it).
Datalab / Surya (Medium Risk)
SOC 2 Type II for Managed Cloud tier. Open-source models (Surya 56.6k stars, Marker 11.1k stars on GitHub) — fully auditable code. Three deployment tiers: Managed Cloud (SOC 2, pay-as-you-go), VPC (AWS/GCP/Azure, custom BAA/DPA), and On-prem/Air-gapped (zero internet). Trusted by Anthropic, Siemens Healthineers, Stanford, MIT. Context Pilot sends documents temporarily for OCR processing. Self-host option eliminates external data transfer entirely.
GitHub (Low Risk)
Microsoft subsidiary. SOC 1/2 Type 2, SOC 3, ISO 27001, CSA CAIQ, FedRAMP authorized. DPA included in Enterprise Customer Agreement; Microsoft DPA apparatus inherited. EU data residency in preview. Context Pilot uses the gh CLI for repository operations (issues, PRs) — not GitHub Copilot. Private repos not used for training without consent. EU-US DPF certified (via Microsoft).
6. Subprocessor Change Log
Material changes to the subprocessor register are documented below. Enterprise customers evaluating Context Pilot for procurement may reference this log to track third-party dependency changes across versions.
| Date | Change Type | Service | Details |
|---|---|---|---|
| June 2026 | Added | MiniMax | Added as optional LLM provider. Note: Chinese jurisdiction — see Compliance Matrix for risk assessment |
| June 2026 | Enhanced | All providers | Independent compliance research conducted: trust centers, certifications, DPA availability, data retention, training policies documented per subprocessor |
| June 2026 | Added | Groq | Added as optional LLM provider for auxiliary inference workloads |
| May 2026 | Added | Voyage AI | Added for text embedding generation (hybrid search). Optional; keyword search works without it |
| May 2026 | Added | Datalab (Surya) | Added for OCR and document conversion. Optional service |
| April 2026 | Initial | All others | Initial subprocessor register published with Anthropic, OpenAI, Google AI, Mistral, Brave, Firecrawl, GitHub, Meilisearch |
7. Important Notice
Operator Responsibility
The operator is solely responsible for evaluating the data handling practices of their selected third-party providers. Context Pilot facilitates connections to these services but does not act as an intermediary, does not negotiate data processing terms on behalf of operators, and does not have access to data transmitted between the operator's workstation and the provider's endpoint.
Operators subject to regulatory requirements (GDPR, HIPAA, CCPA, etc.) should independently verify that their chosen providers offer appropriate data processing agreements, data residency options, and contractual safeguards for their jurisdiction and use case.
8. Vendor Risk Assessment Criteria
The following criteria are evaluated when considering new third-party service integrations for Context Pilot. This framework ensures that any addition to the subprocessor register meets minimum security and privacy standards.
| Criterion | Requirement | Weight |
|---|---|---|
| API-key authentication | Provider must support API key authentication (no OAuth flows that require Context Pilot to hold session tokens on behalf of users) | Mandatory |
| TLS encryption | Provider must serve all API endpoints over HTTPS with TLS 1.2 or higher | Mandatory |
| Published privacy policy | Provider must publish a clear privacy policy describing data handling, retention, and processing purposes | Mandatory |
| Opt-in activation | Integration must require explicit operator configuration (API key provisioning). No provider may be active by default. | Mandatory |
| Data minimization | Only data strictly necessary for the service's function is transmitted (e.g., search queries for search, code context for inference) | Mandatory |
| No training on API data | Provider should not use API-submitted data for model training without explicit opt-in from the operator | Strong preference |
| DPA availability | Provider should offer a Data Processing Agreement for enterprise customers | Preferred |
| SOC 2 / ISO 27001 | Provider should hold recognized security certifications or demonstrate equivalent controls | Preferred |
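
The criteria above can be read as a simple gate: every Mandatory item must hold, while the Strong preference and Preferred items are reported but do not block an integration. The following sketch shows one way to express that logic. The criterion keys and the `assess` function are hypothetical and are not taken from Context Pilot's code base.

```python
# Criteria from the table above, keyed by short hypothetical identifiers.
MANDATORY = (
    "api_key_auth",
    "tls_1_2_or_higher",
    "published_privacy_policy",
    "opt_in_activation",
    "data_minimization",
)
PREFERENCES = (
    "no_training_on_api_data",  # Strong preference
    "dpa_available",            # Preferred
    "soc2_or_iso27001",         # Preferred
)


def assess(candidate):
    """Evaluate a candidate given a mapping of criterion key to True/False.

    Criteria missing from the mapping are treated as unmet.
    """
    missing = [c for c in MANDATORY if not candidate.get(c, False)]
    unmet = [c for c in PREFERENCES if not candidate.get(c, False)]
    return {
        "eligible": not missing,
        "missing_mandatory": missing,
        "unmet_preferences": unmet,
    }
```

A candidate that satisfies all five mandatory criteria but lacks a DPA would come back as eligible, with `dpa_available` listed under `unmet_preferences` for the reviewer to weigh.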
9. Data Flow Summary
The following diagram summarizes the data flow architecture. All arrows represent operator-initiated, operator-configured connections. No connection exists that is not explicitly listed below.