It starts with the product
Every technical decision we make flows from our product vision.
Quest is a personal AI mentor that tells you what to do, and what not to do.
Unlike first-generation AI products that sit idle waiting for prompts, Quest is an ambient AI agent: it holds your full context, observes continuously, and acts proactively.
Not an assistant that waits for instructions, but a mentor in your corner who tells you what’s actually worth doing.
To deliver on this vision, Quest must be:
- Deeply personal: It must truly know the user, which is why Memory is a key part of the product.
- Ambient: It must continuously capture and update user context in the background. That means it must be multi-modal, accepting any content type as context (text, images, audio, video), and multi-platform, capturing context from anywhere (iOS, macOS, wearables, browser).
- Proactive: It must continuously observe how user context evolves and decide when to step in.
- Fast and natural: It must feel instantly available, like a constant, trusted presence.
- Extensible: Quest’s architecture must make it trivial to add new app integrations as sources of user context.
- Scalable with models: Like any other AI product, it must improve as model capabilities improve. We are betting on model scaling laws.
These principles drive the technical bets we’re making today.
Current architecture
Our architecture has three main components: a per-user Sandbox that stores user data and runs our agent, a central Server that orchestrates sandboxes and handles iMessage integration, and an iOS client that surfaces Quests to the user.
Sandbox
Each user gets an isolated compute environment: their own filesystem, their own agent process, their own persistent state.
Think of it as a personal computer in the cloud, dedicated to a single user.
Why sandboxing is non-negotiable:
- Performance: User data lives in the local filesystem, enabling fast agentic search (see Memory).
- Model scalability: Agents operate in a contained environment where we can safely expose powerful tools (raw SQL queries, code execution, filesystem access) without risk of cross-user contamination.
- Experimentation: Sandboxes are trivially easy to snapshot and replicate. We can spin up identical copies of a user’s environment with different model tuning for evals.
- Privacy: True isolation by design, not by policy.
The sandbox runs a webserver that exposes artifacts (Quests) and the memory system via a REST API, and provides a WebSocket interface for real-time chat with the mentor agent.
Sandboxes shut down after ~15s of network inactivity, except when certain background processes are active, and resume instantly on any incoming request.
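As a rough illustration, a stripped-down version of that webserver might look like the sketch below, built on Bun’s built-in server. The /quests route and payload shapes are assumptions for the example, not our actual API surface; /chat is the chat endpoint referenced again in the Server section below.

```ts
import { serve } from "bun";

// Minimal sketch of the sandbox webserver: REST for artifacts, WebSocket for chat.
// The /quests route and payload shapes are illustrative, not our actual API.
serve({
  port: 3000,
  fetch(req, server) {
    const url = new URL(req.url);
    // Upgrade /chat requests to a WebSocket for real-time conversations with the mentor agent.
    if (url.pathname === "/chat" && server.upgrade(req)) {
      return; // upgrade handled; no HTTP response needed
    }
    // REST endpoint surfacing generated artifacts (Quests) to clients.
    if (url.pathname === "/quests") {
      return Response.json({ quests: [] }); // would read Quests from the workspace
    }
    return new Response("Not found", { status: 404 });
  },
  websocket: {
    message(ws, message) {
      // In the real sandbox this forwards the message to the mentor agent and streams back its reply.
      ws.send(JSON.stringify({ role: "mentor", text: `received: ${message}` }));
    },
  },
});
```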
Server
The server is deliberately thin: it just orchestrates, as most compute happens in sandboxes.
It is mainly responsible for:
- Sandbox lifecycle: Creates sandboxes on user signup, authenticates iOS clients, and hands out credentials so clients can interact directly with their sandbox.
- Integration gateway: Authenticates to third-party services and passes credentials to sandboxes. The server never stores integration credentials; it’s just a passthrough.
- iMessage bridge: Today, all mentor conversations happen via iMessage through Linq. Rather than registering Linq webhooks for every sandbox, the server receives messages and proxies them to the appropriate sandbox over WebSocket (a minimal sketch follows this list). By design, though, clients could also chat directly with the mentor via the same /chat API.
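A minimal sketch of that proxying logic, assuming a hypothetical resolveSandboxUrl() helper and a simplified webhook payload (the real bridge also handles auth, retries, and streaming):

```ts
// Hypothetical shape of the bridge: an incoming Linq message is proxied to the
// user's sandbox over its /chat WebSocket, and the mentor's first reply is
// returned so it can be sent back as an iMessage. All names are illustrative.
type LinqMessage = { userPhone: string; text: string };

async function proxyToSandbox(
  msg: LinqMessage,
  resolveSandboxUrl: (phone: string) => string, // assumed lookup: phone number -> sandbox URL
): Promise<string> {
  const ws = new WebSocket(`${resolveSandboxUrl(msg.userPhone)}/chat`);
  return new Promise((resolve, reject) => {
    ws.onopen = () => ws.send(JSON.stringify({ text: msg.text }));
    ws.onmessage = (event) => {
      // First reply from the mentor agent is relayed back to Linq.
      resolve(String(event.data));
      ws.close();
    };
    ws.onerror = () => reject(new Error("sandbox connection failed"));
  });
}
```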
iOS client
Today, the client is minimal: it handles authentication (Google sign-in) and surfaces artifacts from the mentor (daily Quests, long-term Quests).
It aims to be much more powerful in the future via Morphing UI: dynamically generated interfaces tailored to each Quest.
A fitness Quest gets a workout tracker, a writing Quest gets a word count dashboard.
The mentor doesn’t just tell you what to do, it gives you the right tool to do it.
Tech stack
We are early adopters of the latest tech, not because it’s fancy, but because it matches our product requirements and gives us an unfair advantage over more established products and bigger companies.
- Blaxel sandboxes: Two main reasons we chose Blaxel over more popular alternatives (such as E2B or Daytona):
  - Persistent sandboxes: Without them, you’d need to back up and re-upload (or mount) user data on startup. That works, but downloading has a slow kickoff time, and mounting doesn’t guarantee fast operations on user data (grep and SQLite queries are much slower on a FUSE-mounted partition).
  - Super fast startup: 25ms vs 200ms for E2B.
- TypeScript: Better performance than Python, plus fewer languages to maintain once we build the webapp, and thus better for coding agents working across our monorepo.
- Bun with bun:sqlite and native WebSocket (plus Zod and Drizzle): Fast startup time is a key metric for us, and Bun starts much faster than Node (10ms vs 45ms). It may not sound like much, but latency adds up across sandbox start, webserver init, agent response, the Linq roundtrip, and WebSocket overhead. bun:sqlite and the native WebSocket implementation are also significantly faster than the alternatives.
- Claude Agent SDK: This is the backend of Claude Code and the best starting point for building an agent with tool use and skills. Claude currently produces the most natural conversational output, which matters for a mentor product. We may remove this dependency eventually, but it’s serving us well for now.
- SwiftUI: Might be controversial, but with coding agents, time to proficiency in a new language is low, and native feel outperforms cross-platform alternatives. We’ll likely still use React Native (or similar) for AI-generated components.
Mentor agent
As mentioned above, we use the Claude Agent SDK as the foundation.
While the agent and its system prompt (which we customize) matter, what really makes the mentor are the tools and skills.
Tools are MCP servers, local and remote.
In our view, the more powerful and flexible the tools, the more powerful the agent.
This is notably why we give the agent the ability to write its own scripts as well as direct SQL access to memory (read-only) and workspace (read-write for generated artifacts like Quests).
And models are better at raw SQL than custom APIs! This is obviously where sandboxes are critical.
We already integrate several tools and plan to add many more: social media scraping, location retrieval…
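To make the tool side concrete, here is a minimal sketch of what a read-only memory query tool could look like, using bun:sqlite. The tool name, input schema, and guardrail are assumptions for illustration, not our exact implementation.

```ts
import { Database } from "bun:sqlite";
import { z } from "zod";

// Open the per-user memory database read-only so the agent can explore it
// freely without being able to mutate Records or Insights.
const memory = new Database("memory.sqlite", { readonly: true });

// Illustrative input schema for a hypothetical query_memory tool.
const QueryMemoryInput = z.object({
  sql: z.string().describe("A single read-only SELECT statement"),
});

// Handler an MCP server could expose to the agent. The SELECT check is a cheap
// guardrail on top of the read-only connection.
export async function queryMemory(input: unknown) {
  const { sql } = QueryMemoryInput.parse(input);
  if (!/^\s*select\b/i.test(sql)) {
    throw new Error("Only SELECT statements are allowed on memory.sqlite");
  }
  const rows = memory.query(sql).all();
  return { rows, count: rows.length };
}
```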
Skills provide instructions on how to use tools for specific tasks: building user profiles, generating Quest plans, suggesting daily Quests…
They bridge tools to user value.
Memory
Every LLM call should be a freshly computed projection of a durable state (here, our user data).
You can have access to all the user data and still produce poor output without properly managing your context window.
And this isn’t just a model intelligence problem.
The problem is that every token added to the context window competes for the model’s attention.
So beyond the model and the user data, what really makes the Memory is the harness.
The Claude Code team has demonstrated that agentic search over the filesystem is an effective answer to this problem and drastically outperforms standard RAG pipelines built on vector DBs or knowledge graphs (like mem0).
The only tradeoff is latency due to the many tool calls, but we bet this will improve with model capabilities.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great.
The two methods that seem to scale arbitrarily in this way are search and learning.
The Bitter Lesson
This is why our Memory is built around search tools, optimized for speed and flexibility.
Structure
Our Memory is currently structured around five layers:
- Session: Today, this is essentially a Claude Code session: messages, tool calls, automatic compaction when nearing context window limits. We automatically create or resume sessions based on incoming message context. At the end of a session, the conversation is stored as a Record and used to extract Insights (see below).
- PROFILE.md: Information about the user that should always be in the context window: personality, values, key identity facts, interests, skills, constraints…
- Insights: Stable, short statements about the user that are useful to the mentor (e.g., “Likes cycling”). Stored in memory.sqlite, organized into categories, and linked to the Records they were extracted from. Insights are proactively injected into user messages via semantic and keyword search, and are also retrievable by the agent at any time.
- Records: Unmodified user data (emails, calendar events, etc.) stored as Markdown in memory.sqlite. Metadata enables filtering. Records are never proactively added to the context window but are always retrievable by the agent.
- Unstructured data: User data that doesn’t benefit from structured storage. Stored in the filesystem (typically as Markdown files) and searchable via standard tools (Grep, Glob, Read).
Why SQLite and not just files?
Grep alone isn’t enough.
We need metadata filtering, and SQLite is significantly faster on large corpora, especially after indexing.
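As a rough example of what that buys us, the hypothetical query below combines FTS5 keyword search with metadata filters, something a flat-file grep can’t express cheaply. The table and column names are illustrative, not our actual schema.

```ts
import { Database } from "bun:sqlite";

const db = new Database("memory.sqlite", { readonly: true });

// Assumed schema for the example:
//   records(id, source, created_at, content)
//   records_fts: an FTS5 virtual table indexing records.content
const rows = db
  .query(
    `SELECT records.id, records.source, records.created_at
       FROM records
       JOIN records_fts ON records_fts.rowid = records.id
      WHERE records_fts MATCH ?          -- keyword search over content
        AND records.source = 'calendar'  -- metadata filter: only calendar events
        AND records.created_at >= ?      -- metadata filter: recency
      ORDER BY records_fts.rank
      LIMIT 20`,
  )
  .all("cycling", "2025-01-01");

console.log(rows);
```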
Pipeline
Records and Insights are built using the following pipeline: user data is first cleaned, then Insights are extracted and chunks (for search retrieval) are computed in parallel.
Both Insights and chunks are indexed using FTS5 for keyword search and sqlite-vec for semantic search via embeddings.
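The useful structural property is that insight extraction and chunking don’t depend on each other, so they run concurrently once the data is cleaned. The sketch below shows that shape; every function name is a placeholder, and the indexing steps stand in for the FTS5 and sqlite-vec writes.

```ts
// Placeholder signatures so the sketch type-checks; the real steps call a
// smaller/faster model and write to memory.sqlite.
declare function cleanRecord(raw: string): Promise<string>;
declare function extractInsights(record: string): Promise<string[]>;
declare function chunkForRetrieval(record: string): Promise<string[]>;
declare function indexInsights(insights: string[]): Promise<void>; // FTS5 + embeddings
declare function indexChunks(chunks: string[]): Promise<void>;     // FTS5 + embeddings

// Ingestion pipeline for one piece of incoming user data.
export async function ingest(raw: string): Promise<void> {
  const record = await cleanRecord(raw);

  // Insights and chunks are independent, so compute them in parallel.
  const [insights, chunks] = await Promise.all([
    extractInsights(record),   // short, stable statements about the user
    chunkForRetrieval(record), // passages sized for search retrieval
  ]);

  await Promise.all([indexInsights(insights), indexChunks(chunks)]);
}
```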
Why a separate pipeline instead of doing everything in the main agent?
This isn’t a fixed choice, but today we do it to:
- Use different models: Faster and cheaper than the core agent model (Opus/Sonnet 4.5).
- Parallelize with the main agent: The core agent could spawn sub-agents, but it can’t (in its current form) keep answering the user at the same time.
- Avoid interfering with the main agent’s context window.
What comes next
Beyond refining our memory design and scaling integrations, there are many other technical challenges we need to figure out.
Morphing UI
Adaptive, user-specific interfaces dynamically shaped by each user’s Quests.
Morphing UI is what unlocks our horizontal approach, enabling us to scale across domains and compete with vertically specialized coaching and mentoring products.
We still have a lot to discover here, but imagine leveraging user sandboxes as backends to generate dedicated apps for each user.
Evals
Building evals for a deeply personal product is hard, as judging output quality is extremely subjective.
One intuition: we should leverage sandbox replication and run evals with trusted testers.
By creating daily sandbox recordings (daily snapshots + monitoring all external inputs like web search results, new emails, calendar events…), we could replay the same day with different agent configurations and let trusted testers provide feedback.
We also have a number of product-level KPIs: number of Quests achieved vs rejected, number of interactions with the mentor, etc.
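One way to make the replay idea concrete is to treat a day as a sandbox snapshot plus an append-only log of external inputs. The types below are a speculative sketch of that, not something we have built.

```ts
// Speculative sketch: everything needed to replay one user-day against a
// different agent configuration. All names and fields are hypothetical.
interface ExternalInput {
  at: string;                                                   // ISO timestamp
  kind: "email" | "calendar_event" | "web_search" | "message";  // input source
  payload: unknown;                                             // raw data as it arrived
}

interface DailyRecording {
  userId: string;
  date: string;            // e.g. "2025-06-01"
  snapshotId: string;      // sandbox snapshot taken at the start of the day
  inputs: ExternalInput[]; // replayed in order during an eval run
}
```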
But also…
- Infrastructure: Updates (shipping to millions of sandboxes, not just a single server) and security. We’re handling sensitive user data, which means zero leakage is critical.
- Multi-modality: Today we only use text as context input. We need to support images, audio, and video, extracting insights from user photos, voice memos, meetings, and more.
- Multi-platform: We need Quest clients beyond just iOS to capture context from anywhere. This could include a macOS app, a webapp, and even a Chrome extension.
- Always-on AI wearables: This is one of the key platforms we need to build on. Tech giants (OpenAI, Meta) and startups (Friend, Plaud) are all racing to build hardware that captures real-world context. We want to be one of the first teams to leverage their APIs and build on top of their SDKs.
- LLM sycophancy: We need to build mentors that think with the user, not just agree with them. A true mentor should challenge, question, and expand perspectives rather than echo statistically likely responses.
We want to be the best applied-AI team in the consumer space, always finding how the latest AI research applies to building products.