Executive summary

Free, world-class tutoring for every African learner — engineered to scale.

Luma is a free, curriculum-aligned AI tutor delivered where learners already are: on WhatsApp, in their own language. This document specifies Luma v2 — the enterprise platform that adds memory, a real model of each learner, curriculum grounding, and the compute strategy to serve millions at a cost that keeps it free.

250k+

learners reached

8M+

tutoring interactions

countries

UN / TIME

featured impact

What changed since the last revision

NVIDIA Inception membership is confirmed. This is not a "nice to have" — it is the hinge of the economic model. Confirmed access to GPU credits, developer systems, model-serving software (NIM/NeMo), African-language speech (Riva), and edge kits (Jetson) moves self-hosted inference and offline delivery from "later" to the core plan. The whole Compute & inference strategy and unit-economics sections below are new and drive the roadmap.

The thesis in three sentences

1) Demand is proven: v1 reached 250k+ learners and 8M+ interactions as an MVP.
2) To be genuinely transformative, Luma must remember learners, model what they know, and ground answers in the real curriculum, none of which v1 does yet.
3) To stay free at national scale, marginal cost per learner must trend toward zero, which we achieve by running our own models on NVIDIA-accelerated infrastructure instead of paying per-token to a frontier API for every turn.

The mission

Free education, powered by AI. Equity of access is the product.

The moat

A curriculum-native learner model + human-reviewed content + in-language voice — expensive to copy, compounding with use.

The unlock

NVIDIA compute turns "free tutoring for millions" from a cost problem into an engineering one.

v1 → v2 at a glance

Capability	v1 (live today)	v2 (this document)
Memory of the learner	Last ~20 messages, as one text blob	Durable sessions + three-tier long-term memory
Knows what a learner has mastered	Nothing tracked	Multi-dimensional mastery model (source of truth)
Grounding in curriculum	Ungrounded (retrieval is a stub)	Retrieval over an approved, cited curriculum store
Scale ceiling	Effectively ~1 instance as configured	Designed for 1M+ concurrent learners
Inference & cost	Frontier API only; no cost tracking	Self-hosted-first model router; per-turn cost accounting
Reach	WhatsApp text, online only	+ voice (in-language) + offline "school-in-a-box"

How to read this: use the Plain-English / Technical switch (top-right). Plain-English is for everyone — investors, partners, product owners. Technical reveals stack tables, data models, scale numbers and hardware detail for architects and the dev team.

Product & requirements · PRD

What we're building, for whom, and why

One tutor, several surfaces. The core experience — personalised, curriculum-aligned tutoring — is shared; the surfaces differ by audience and go-to-market.

Product offerings

Live

Luma Tutor

The core WhatsApp tutor any learner uses free, in their language.

In build

Luma School

Classroom/school deployments with teacher & parent visibility.

Planned

Luma International

Curriculum-swappable tutor beyond South Africa (Lesotho piloted).

Planned

Luma Skills

Beyond-school skills & adult learning pathways.

Planned

White-label / Luma-powered

The engine behind partners' branded products — a revenue line that funds the free tier.

Planned

Own platform (beyond WhatsApp)

Richer web/app + offline surface for deeper learning.

Who we serve

Learner (primary)

Often on a shared or low-end Android phone with a tight data budget. Prefers voice notes. Home language is frequently isiZulu, isiXhosa, Sesotho, Afrikaans or another language rather than English. Needs: patient, in-language help that meets them at their level and remembers them.

Teacher

Stretched across large classes. Needs: visibility of where a class struggles; trustworthy, curriculum-aligned help that extends their reach without adding admin.

Parent / guardian

May have limited time or schooling. Needs: a simple trusted signal that learning is happening — and consent & control over their child's data.

School / partner / sponsor

Deploys or funds Luma at scale. Needs: outcomes evidence, safe data handling, and a credible cost story.

Capability requirements

Area	Requirement	Status
Learning	Model each learner's mastery per curriculum idea; choose the pedagogically correct next move	build
Memory	Remember learners across sessions; resume where they left off	build
Grounding	Answer from approved, cited curriculum; no invented facts	build
Voice	Understand & reply to voice notes in African languages	planned
Safety	Child-appropriate behaviour; human escalation for sensitive cases	build
Cost	Per-turn cost tracked & governed; free at the point of use	build
Reach	Work on low-end devices, low data, and offline in classrooms	planned

Goals, non-goals & success metrics

Goals

Personalised, remembered, curriculum-grounded tutoring.
Free at the point of use, at national scale.
POPIA-safe handling of children's data.
Measurable learning outcomes, not just engagement.

Non-goals (v2)

Replacing teachers — Luma extends them.
A social network or general chatbot.
Monetising the learner. Revenue = schools, partners, white-label.

Mastery gain

learning per active learner — the north star

WAU · retention

habit & reach

Grounded accuracy

answers correct & on-curriculum

Cost / turn

sustainability of "free"

Business model & sustainability

How "free" is funded

The WhatsApp tutor is free for everyone, always — that is the mission. The business sits around the free tier: a premium web app, brand-funded learning, and B2B. Crucially, most revenue depends on being able to prove learning outcomes.

The core principle

Free WhatsApp drives reach and impact; the premium web experience and partners drive revenue and sustainability. Without proof of learning, Luma is "a WhatsApp homework chatbot". With proof, it's "a mastery-based learning system that proves outcomes, reachable anywhere via WhatsApp" — a different, defensible market.

Offerings

Luma School

Core K–12 — free on WhatsApp, premium on the web app.

Luma Skills

Practical skills beyond school.

White-Label

Partner-branded use of Luma's engine — platform fee + revenue share.

Data & Insights

Aggregated, consented learning insights for institutions / ministries.

Brand-Funded Learning

Age-appropriate, curriculum-aware brand sponsorship.

Corporate Training

Per-employee / enterprise upskilling.

Revenue streams

Stream	Model	Status
Brand-Funded Learning	CPM ($3–$15) / per-message / sponsorship packages ($1k–$10k/mo); ~$0 cost per impression (pre-written, in the free window)	DHL agreed in principle — first to build
Luma Tutor (consumer premium)	Subscription ~R50–R100/mo — web app with memory, mastery tracking, dashboards, practice	Planned
White-Label	Platform fee + revenue share	Planned
Corporate Training	Per-employee / enterprise licence	Planned
Data & Insights	Subscription / per-report	Planned
Certification & credentialing	Per-certificate	Later

Why the unit economics matter here

Brand-funded and free-tier reach only work if the cost of a free turn trends toward zero — exactly what the self-hosted NVIDIA inference strategy delivers. Cheap reach funds the mission; proven outcomes unlock the paid streams.

Source: luma-docs monetization-strategy.md, product-offerings.md, monetization-and-offerings.md (2026-04-23).

Impact, metrics & market

Proving learning — and where Luma wins

Engagement is easy; learning is the point. Luma's north-star is proven mastery gain, measured by the learner model and a monitoring-and-evaluation (MEL) framework — the same proof that unlocks the business.

The metric framework

Learning

mastery gain per learner (north-star); per-KC progress

Reach

weekly active learners, retention, countries

Trust

grounded-answer accuracy; teacher/parent satisfaction

Sustainability

cost per turn; revenue per stream

Targets and baselines are set per pilot; the MEL framework + pilot studies generate the evidence (pre/post assessment, mastery trajectories) that both proves impact and makes the paid streams sellable.

Theory of change

Inputs (free in-language tutor · CAPS curriculum · cheap NVIDIA compute) → activities (personalised, grounded tutoring at scale) → outputs (learners tutored · mastery tracked) → outcomes (improved understanding · exam readiness · confidence) → impact (equity of access; better life outcomes). Aligned to UN SDG-4 (quality education).

Market & competition

The reference competitor, Khan Academy's Khanmigo, costs $4/month for learners and parents (free for US teachers), but individual subscriptions are US-billing only, so an individual outside the US cannot buy the AI tutor. Khan's content is free globally; its AI tutor is not reachable by an African learner.

Where Luma wins

Free at the point of use, not $4/mo.
On WhatsApp, where learners already are — no app, low data.
In-language (African languages) and CAPS-native.
A real mastery model + misconceptions, not just Q&A.
Offline reach and POPIA data residency.

The category

Global AI tutors (Khanmigo, Duolingo, Google/LearnLM), homework-help chatbots, and local edtech — mostly paid, English-first, app-based and Western-curriculum. Luma's wedge is free + WhatsApp + African-curriculum + proven mastery, a segment the incumbents don't serve.

Khanmigo pricing/availability: khanmigo.ai/pricing (2026). Impact framing: luma-docs 12-learning-outcomes/ (MEL framework, pilot studies).

System architecture · SPEC

How the platform is put together

v2 is a greenfield platform on .NET 10 + Microsoft Agent Framework (production GA, April 2026), built beside the live product, sharing one source of truth for teaching, on the event-driven AWS backbone the team already runs.

In plain terms: Luma works like a well-run tutoring company. The learner model is the head of teaching — it decides what each learner does next. The tutor agents are the teachers who explain it well. Memory is the student record. Grounding is the approved textbook. A reliable message system makes sure nothing is lost between a learner asking and Luma replying.

Every learner turn flows through one workflow that consults the learner model (what to teach), grounding (approved material) and memory (who this learner is) — and renders via a model router that prefers self-hosted inference.

The tutoring turn, step by step

Steps 2 & 3 run in parallel. Step 4 (the learner model) decides the move; step 5 (the agent) only renders it. The turn is checkpointed so a crash resumes without re-spending tokens.
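The shape of that turn can be sketched in a few lines. This is an illustration only, written in Python for brevity (the real build targets .NET and the Agent Framework). The step functions, the knowledge-component id and the in-memory checkpoint dictionary are all hypothetical stand-ins; the sketch assumes the two parallel steps are memory loading and grounding retrieval.

```python
import asyncio

# Hypothetical in-memory stand-in for the external checkpoint store.
CHECKPOINTS: dict = {}
# Records each (expensive) render call so a resumed turn can be inspected.
RENDER_LOG: list = []


async def load_memory(learner_id: str) -> dict:
    return {"summary": f"earlier sessions for {learner_id}"}


async def retrieve_grounding(text: str) -> list:
    return [f"approved passage about: {text}"]


def decide_move(memory: dict, passages: list) -> dict:
    # Placeholder for the learner model; it alone picks the move.
    return {"move": "Teach", "kc": "fractions.add", "rung": "Prompt"}


async def render(move: dict, passages: list) -> str:
    RENDER_LOG.append(move["kc"])  # stands in for a model call that costs tokens
    return f"[{move['move']}/{move['rung']}] {passages[0]}"


async def run_turn(turn_id: str, learner_id: str, text: str) -> str:
    state = CHECKPOINTS.setdefault(turn_id, {})
    if "ctx" not in state:
        # Memory and grounding are independent, so they run concurrently.
        memory, passages = await asyncio.gather(
            load_memory(learner_id), retrieve_grounding(text)
        )
        state["ctx"] = {"memory": memory, "passages": passages}
    if "move" not in state:
        state["move"] = decide_move(state["ctx"]["memory"], state["ctx"]["passages"])
    if "reply" not in state:
        # The agent only renders the move that was already decided.
        state["reply"] = await render(state["move"], state["ctx"]["passages"])
    return state["reply"]
```

Calling `run_turn` twice with the same `turn_id` returns the stored reply without a second render call, which is the property the checkpoint is meant to provide.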

The stack

Concern	Technology	Role
Core spine	.NET 10 + Microsoft Agent Framework (GA Apr 2026)	Agents + graph workflows; C#-first, enterprise-grade
Pedagogy	.NET learner-model (existing domain → runtime)	Curriculum graph, mastery, misconceptions, routing
ML / content	Python	Authoring pipeline, embeddings, retrieval, voice, evolution
Event backbone	Amazon SQS / SNS (+ EventBridge)	Choreography; idempotent; buffers load spikes
Hot state	PostgreSQL (Aurora)	Accounts, mastery projections, sessions
Sessions + checkpoints	DynamoDB	Externalized agent/workflow state → stateless compute
Working-memory cache	Redis (ElastiCache)	Per-session context; reliable streaming
Per-user data lake	S3 (prefix-per-user + Access Points)	Transcripts, work, media, event archive
Vector retrieval	S3 Vectors (cold) + pgvector/OpenSearch (hot)	Curriculum + per-user embeddings
Inference	Self-hosted NIM/vLLM · DGX Cloud · frontier (Bedrock/Anthropic/OpenAI)	Model router across tiers — see Compute
Region	AWS af-south-1 (Cape Town)	Data residency / POPIA

Design rule: agents render, the learner model decides, events connect, state lives outside compute. One source of truth; no duplicate memory systems.

Why greenfield-parallel, not rewrite-in-place?

The live service carries 250k+ learners and the brand's credibility. The risk in a rewrite is not effort; it is regressing a service families depend on. Building v2 beside v1 lets us build the correct architecture with zero pressure on the live loop, and A/B test it before moving anyone. See Roadmap.

The learner model · the brain

What each learner knows — and what to do next

Luma's differentiator and single source of truth for teaching. Not a chatbot prompt: a structured, curriculum-native model of every learner's knowledge, built and tested in .NET today (domain layer), with the runtime being the core v2 build.

Curriculum graph

The curriculum as knowledge components (single teachable ideas) linked by prerequisites — so Luma knows what must come before what.

Multi-dimensional mastery

Per idea, three scores 0–1: can the learner do it (procedural), understand it (conceptual), and apply it (application)?

Misconceptions, first-class

Specific wrong ideas (e.g. "squaring means ×2") are tracked as their own state and directly corrected — different from simply not knowing yet.

Reach-back pacing

If an earlier building block is missing, Luma steps back to fill the gap, then returns the learner to grade level.
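Reach-back over a prerequisite graph can be sketched as a short recursive walk. This is a minimal illustration, assuming the graph is acyclic and stored as a mapping from each knowledge component to its direct prerequisites; the function name and the component ids are hypothetical.

```python
def first_gap(kc: str, prereqs: dict, mastered: set) -> str:
    """Return the deepest unmastered prerequisite of kc, or kc itself if none."""
    for p in prereqs.get(kc, []):
        if p not in mastered:
            # Step back further: the gap may itself rest on a missing block.
            return first_gap(p, prereqs, mastered)
    return kc


# Hypothetical three-step chain of knowledge components.
PREREQS = {
    "fractions.add": ["fractions.equivalent"],
    "fractions.equivalent": ["multiplication.facts"],
}
```

With only `multiplication.facts` mastered, a learner aiming at `fractions.add` is sent to `fractions.equivalent` first; once that is mastered the same call returns `fractions.add` again, which is the "then returns to grade level" half of the behaviour.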

The key design decision

The learner model — not the AI agent — chooses the teaching move each turn. The agent's job is to say it well. This keeps Luma pedagogically sound and consistent, rather than leaving teaching to an unpredictable language model.

The coherence router

Each turn, the router reads the mastery vector, active misconceptions and curriculum position, and returns one move + a target knowledge component + a feedback "rung":

Moves: Teach · Practice · Advance · ReachBack · AddressMisconception
Rungs: Prompt · Hint · WorkedSolution
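A minimal sketch of what such a router could return is shown here. The move and rung names come from the text; everything else is an assumption made for illustration: the precedence (misconception first, then reach-back), the 0.4 cut-off between teaching and practice, the single collapsed mastery number, and the rule that the rung climbs with failed attempts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Move(Enum):
    TEACH = "Teach"
    PRACTICE = "Practice"
    ADVANCE = "Advance"
    REACH_BACK = "ReachBack"
    ADDRESS_MISCONCEPTION = "AddressMisconception"


class Rung(Enum):
    PROMPT = "Prompt"
    HINT = "Hint"
    WORKED_SOLUTION = "WorkedSolution"


@dataclass
class TurnInput:
    target_kc: str
    mastery: float                               # collapsed to one score for brevity
    active_misconception: Optional[str] = None
    missing_prerequisite: Optional[str] = None
    failed_attempts: int = 0


RUNG_LADDER = [Rung.PROMPT, Rung.HINT, Rung.WORKED_SOLUTION]


def next_move(t: TurnInput) -> tuple:
    # Assumed rule: more failed attempts means more support, capped at the top rung.
    rung = RUNG_LADDER[min(t.failed_attempts, len(RUNG_LADDER) - 1)]
    if t.active_misconception:
        return Move.ADDRESS_MISCONCEPTION, t.target_kc, rung
    if t.missing_prerequisite:
        return Move.REACH_BACK, t.missing_prerequisite, rung
    if t.mastery >= 0.85:
        return Move.ADVANCE, t.target_kc, Rung.PROMPT
    if t.mastery < 0.4:  # hypothetical cut-off
        return Move.TEACH, t.target_kc, rung
    return Move.PRACTICE, t.target_kc, rung
```

The point of the shape is that the function is deterministic and testable: the same learner state always yields the same move, which the agent then only has to phrase.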

Mechanic	Detail
Mastery update	Trust-weighted exponential moving average (α≈0.15). In-tutor evidence trusted at 1.0; softer sources (e.g. guardian confirmation) far lower (~0.4).
Mastery threshold	An idea counts mastered only when every relevant dimension clears its threshold (default 0.85). Zero-weighted dimensions are ignored.
Misconception	Modelled as a wrong-rule with a strength 0–1; detected by specific diagnostic items; cleared by remediation + subsequent correct evidence.
Reach-back	The ATP (Annual Teaching Plan) sets the expected position; if a prerequisite is unmastered, the active position points back to it, then resumes at the expected position.
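The mastery update and threshold rules in the table can be written out as a short sketch. The constants (α = 0.15, threshold 0.85, trust 1.0 and 0.4) come from the table; how trust enters the update is an assumption here, namely that it scales the smoothing factor. The function and dimension key names are illustrative.

```python
ALPHA = 0.15       # smoothing factor from the table
THRESHOLD = 0.85   # default mastery threshold from the table


def update_mastery(current: float, evidence: float, trust: float,
                   alpha: float = ALPHA) -> float:
    """Trust-weighted EMA: m' = m + alpha * trust * (evidence - m).

    Assumption: trust multiplies alpha, so softer sources move the score less.
    """
    return current + alpha * trust * (evidence - current)


def is_mastered(scores: dict, weights: dict,
                threshold: float = THRESHOLD) -> bool:
    """True only if every dimension with a non-zero weight clears the threshold."""
    relevant = [dim for dim, w in weights.items() if w > 0]
    # With no relevant dimensions there is nothing to certify as mastered.
    return bool(relevant) and all(scores.get(dim, 0.0) >= threshold for dim in relevant)
```

For a learner at 0.5, one correct in-tutor answer (evidence 1.0, trust 1.0) moves the score to 0.575, while the same evidence from a guardian confirmation (trust 0.4) moves it only to 0.53.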

Status & honest gaps

Built & tested in .NET today: the domain — curriculum graph, mastery thresholds, misconceptions, reach-back, POPIA consent (106 tests). The v2 build: per-learner state storage, the router engine, and the HTTP API the tutor calls each turn. Migrated v1 learners begin with a low-confidence mastery estimate (v1 never recorded mastery) that converges with live evidence.

Conversation & memory

Remembering the learner

A good tutor remembers you. v2 gives Luma real memory in three layers, so each conversation builds on the last. (v1 today keeps only the last ~20 messages, flattened into one string, with no session concept.)

Working memory

The current conversation — cached in Redis, sent to the tutor as a clean, role-tagged history plus a rolling summary, not a jumble of raw messages.

Episodic memory

Short summaries of past sessions ("last week: struggled with fractions, ended positive") so Luma resumes intelligently.

Profile & mastery

Durable facts — grade, language, goals, tone — plus a pointer into the mastery model.

Two learner-facing wins

Transparency: a learner or parent can ask "what do you remember about me?" and Luma answers, which builds trust and supports privacy.
Proactive re-engagement: "Yesterday we were on fractions. Keep going?" brings learners back.

Engineering note: conversation state is an externalized, serializable session in DynamoDB, cached in Redis, so compute stays stateless and scales horizontally. Memory is injected into the tutor agent via context providers, not hand-built prompt strings. The confusion/engagement signals v1 already computes — and currently discards into an off-by-default debug loop — become episodic-memory inputs in v2. Inbound messages are idempotency-keyed to end v1's duplicate-reply behaviour.
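Idempotency-keyed intake can be illustrated with a small sketch. The class name is hypothetical and a plain dictionary stands in for the durable store; in a deployment the check-and-store step would need to be atomic (for example a conditional write in the session store), which this in-memory version does not attempt.

```python
from typing import Callable


class InboundDeduper:
    """Returns the stored reply when the same inbound message id arrives again."""

    def __init__(self) -> None:
        # Hypothetical stand-in for a durable store keyed by message id.
        self._replies: dict = {}

    def handle(self, message_id: str, text: str,
               respond: Callable[[str], str]) -> str:
        if message_id in self._replies:
            # Redelivery: reuse the earlier reply instead of generating a new one.
            return self._replies[message_id]
        reply = respond(text)
        self._replies[message_id] = reply
        return reply
```

A redelivered webhook with the same message id therefore produces one model call and one reply, rather than two.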

Accessibility & inclusion

Designing for every learner — including neurodivergent children

A tutor that adapts to the individual is inclusive by nature. Luma can accommodate learners with ADHD, autism and other neurodivergent profiles — not as a clinical or diagnostic tool, but through learner-led pacing, multimodal explanation, and a calm, encouraging style. The accommodations that help neurodivergent learners tend to help everyone.

In plain terms: some children learn best in short bursts, some need to hear it, some need to see it, and some just need extra patience and encouragement. Because Luma already adapts to each learner, it can lean into those needs — shorter steps, voice, pictures and short videos, gentle check-ins, and never making a child feel bad for getting something wrong.

How Luma accommodates — a little goes a long way

Small steps, one idea at a time

Break concepts into short, single-focus turns — a natural fit for WhatsApp and for shorter attention spans. No walls of text.

Learner-led pace, no rushing

The learner model already paces to mastery and reaches back to fill gaps — a learner can slow down or repeat, without judgment.

Choose the modality

Text, voice, diagrams, and short personalised videos — deliver a concept in the form that lands for that learner.

Focus-friendly & predictable

A calm, low-distraction interface, a clear "what's next", consistent routines, and optional gentle check-in reminders.

Encouraging, low-shame

Never shames a wrong answer; celebrates progress; eases the anxiety that so often surrounds learning for struggling students.

Remembered preferences

"Shorter answers", "more visuals", "extra encouragement" save to the learner's profile and apply every turn — set by a learner, parent or teacher.

Implementation: model these as accommodation settings on the learner profile (response length, modality preference, pacing, reminder cadence). They flow into the tutor agent through the same context providers as memory, and bias the coherence router toward smaller steps and more practice. This follows Universal Design for Learning — multiple means of engagement, representation, and expression — so the same levers serve every learner.
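One possible shape for those settings is sketched here. The four fields mirror the ones named above; the allowed values, the instruction wording and the bias keys are all hypothetical and would be set with educators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accommodations:
    response_length: str = "standard"   # assumed values: "short" | "standard"
    modality: str = "text"              # assumed values: "text" | "voice" | "visual"
    pacing: str = "standard"            # assumed values: "standard" | "slower"
    reminder_cadence_days: int = 0      # 0 means no reminders


def tutor_context(acc: Accommodations) -> list:
    """Instruction lines a context provider could hand to the tutor agent."""
    lines = []
    if acc.response_length == "short":
        lines.append("Keep each reply to one idea and at most three sentences.")
    if acc.modality != "text":
        lines.append(f"Prefer {acc.modality} explanations where available.")
    return lines


def router_bias(acc: Accommodations) -> dict:
    """Hints for the coherence router: smaller steps and more practice when pacing is slower."""
    slower = acc.pacing == "slower"
    return {"step_size": "small" if slower else "normal", "extra_practice": slower}
```

Keeping the settings as plain profile data means a learner, parent or teacher can change them without touching prompts or router code.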

Handled with care: this is supportive design, not diagnosis or therapy. Luma complements — never replaces — teachers, specialists and caregivers, and is best co-designed with them and with neurodivergent learners themselves. The platform follows accessibility standards (e.g. WCAG). We deliberately keep claims modest: a little thoughtful accommodation, applied consistently, is what helps.

Platform UI · beyond WhatsApp

The own platform — a full learning workspace

WhatsApp is the wedge; the own platform is the depth. For what a chat thread can't do well — guided-learning books, practice sets, a progress dashboard, note-taking, visualisations — Luma reuses the open-source DeepTutor project as a head start, with Luma's learner model as the brain.

In plain terms: WhatsApp is perfect for "help me right now." A web/app platform adds what a chat can't show well — a dashboard of what you've mastered, interactive "living books" that teach a topic, practice quizzes, and somewhere to keep notes. Rather than build all of that from scratch, Luma starts from a mature open-source learning app (DeepTutor) and puts its own brain behind it.

Why reuse DeepTutor

Mature, and moving fast

20k+ GitHub stars; latest release v1.2.3 (Apr 2026); a recent agent-native rewrite (~200k lines). A living project, not an abandoned demo.

A whole workspace, already built

Chat, Deep Solve, Quiz, Deep Research, Visualize, Math Animator, an interactive Book Engine ("living books"), a multi-doc Co-Writer, a Knowledge Hub, and per-user memory.

Modern, hostable stack

Next.js 16 / React 19 front end, FastAPI backend, Docker images, full OpenAPI — the same kind of stack Luma already operates.

Apache-2.0 licence

Permissive — fork, rebrand and ship commercially with attribution. No copyleft trap.

Multi-user built in

Turn on auth and it becomes multi-tenant: per-user isolated workspaces, admin-curated knowledge bases and skills, an audit trail.

Saves roughly a year

Reusing this is about a year of front-end and learning-tooling the team doesn't have to build — freeing it to focus on the learner model and scale.

The reuse model — one brain, DeepTutor as the surface

The key tension to resolve

DeepTutor ships its own memory, RAG and teaching logic. Luma's learner model is the single source of truth. So Luma inverts the brain: keep DeepTutor's interface and study tools, but subordinate its pedagogy to the learner model, feed it Luma's curriculum, and store data in Luma's POPIA-compliant systems. Running both brains side by side would create two competing sources of truth — the exact trap to avoid.

DeepTutor provides the interface and rich study tools; Luma's learner model, grounding and memory stay the system of record behind an integration adapter. The CLI authors curriculum content headlessly.

How DeepTutor's parts map to Luma

DeepTutor	Role in Luma	Integration
Web app (Next.js 16)	The Luma platform UI shell	Fork + rebrand (theming, Skills/Souls)
Capabilities (Deep Solve, Deep Research, Visualize, Quiz, Book)	Study tools for web and WhatsApp	Called as tools via `/plugins/capabilities/{name}/execute-stream` or `run --format json`
Book Engine ("living books")	Luma "Guided Learning"	Fed from the canonical curriculum; compiled headlessly via the Book API / CLI
Knowledge Bases + RAG	Grounding content surface	Populated from Luma's canonical curriculum store
Memory (L2 / L3 API)	Subordinate to Luma memory + learner model	Fed / overridden via the memory API; learner model stays source of truth
Turn decision	Comes from Luma	An adapter/capability calls Luma's `/next-move` each turn
Local `data/` storage	Replaced	Redirected to Luma's POPIA stores (Postgres / S3, in-region)
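The "turn decision comes from Luma" row can be sketched as a thin adapter. Only the `/next-move` path is taken from the table; the request and response fields, the function names and the injected `post` transport are hypothetical, and nothing here reflects DeepTutor's actual plugin interface.

```python
from typing import Callable

REQUIRED_FIELDS = {"move", "kc", "rung"}  # assumed response shape


def fetch_next_move(post: Callable[[str, dict], dict],
                    learner_id: str, message: str) -> dict:
    """Ask Luma's learner model for the move; `post` is an injected HTTP transport."""
    decision = post("/next-move", {"learner_id": learner_id, "message": message})
    missing = REQUIRED_FIELDS - set(decision)
    if missing:
        raise ValueError(f"next-move response missing fields: {sorted(missing)}")
    return decision


def surface_instructions(decision: dict) -> str:
    """Turn the decision into a constraint for whichever surface renders the reply."""
    return (
        f"Perform the move '{decision['move']}' on knowledge component "
        f"'{decision['kc']}' at feedback rung '{decision['rung']}'. "
        "Do not choose a different teaching move."
    )
```

Failing loudly on an incomplete decision is a deliberate choice in this sketch: it stops the surface from silently falling back to its own pedagogy.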

Where the DeepTutor CLI & Server API fit

The part that wasn't obvious: the CLI and its serve API are an integration and authoring boundary — not a learner-facing surface. They give Luma three concrete things:

1 · Engine for a custom front end

The unified turn WebSocket (/api/v1/ws) streams the same events the UI uses, and full OpenAPI generates typed clients — so Luma's own UI or backend can drive the engine directly.

2 · Capabilities as tools

Deep Solve, Deep Research, Visualize, Math Animator and Quiz can be invoked programmatically — so the WhatsApp tutor gains them too, not just the web platform.

3 · Headless content authoring

In the content pipeline, kb create/add and book compile Luma's approved curriculum into knowledge bases and Guided-Learning books automatically (dev-agents can drive it via its SKILL.md handover).

Fit by product surface

Strong fit

Luma School

DeepTutor's multi-user mode maps almost natively: a school is a tenant, the teacher is the admin (curating knowledge bases, skills and models), students are invited users with isolated workspaces. This is DeepTutor's design centre — class/school scale — and the fastest path to a real platform.

Reuse UI, Luma brain

Consumer platform (millions)

DeepTutor's per-user file/SQLite storage is built for classes, not millions of anonymous individuals. Here Luma reuses the interface and capabilities but keeps the learner model, memory and POPIA data on its own scalable core.

Registration & onboarding — capture school information

Design opportunity: because schooling is compulsory for school-age children in South Africa (broadly ages 7–15, up to Grade 9), onboarding can ask for school information at registration. That's more than a form field — it links a learner to Luma School and a teacher, strengthens safeguarding and guardian-consent flows, and enriches the learner model with school and grade context from the very first turn.

Links to School

School + class maps a learner into the right Luma School tenant and teacher dashboard — turning individual users into a connected classroom.

Safeguarding & consent

Knowing a child's school supports guardian consent, age-appropriate handling, and a route to a trusted adult.

Immediate personalisation

Grade, curriculum and school context sharpen the learner model from turn one, before any mastery evidence exists.

Collect proportionately under POPIA (data minimisation), with guardian consent, treating school details as personal information; make it easy but not a hard barrier to a free service. Compulsory-schooling specifics and any reporting duties should be confirmed with counsel — this is a product/design consideration, not legal advice.

Platform surfaces (information architecture)

Tutor

chat, in-language, voice — the same brain as WhatsApp

Guided Learning

interactive "living books" from the curriculum

Practice

mastery-targeted quizzes & question bank

Progress

mastery dashboard — learner, parent, teacher

Notes

Co-Writer & notebooks

Knowledge

curriculum & uploaded material

Visualize

diagrams, charts, math animations

Account

school, consent, language, guardian links

Risks & how they're held

Fast upstream drift (weekly releases): keep Luma customisations at the edges — auth/SSO, a memory adapter, curriculum sync, theming — and avoid deep core edits so upstream can be rebased.
Two brains: the memory + /next-move inversion must be deliberate, or DeepTutor's pedagogy competes with the learner model.
Tenancy scale: school-scale is native; the consumer platform needs Luma's stores behind DeepTutor's UI.
Data residency / POPIA: DeepTutor defaults to local data/ — redirect to in-region Luma stores.
Licence: Apache-2.0 — fork/rebrand/commercial use is fine with attribution. ✓

Phased approach

Fork & rebrand

Stand up DeepTutor as "Luma", themed, fed one real curriculum slice as a KB + Guided-Learning book. Internal only.

Invert the brain

Wire the memory + /next-move adapter; expose DeepTutor capabilities as tools to the WhatsApp tutor too.

Luma School pilot

Multi-user, per-school, curated by teachers, with school-linked onboarding — DeepTutor's native strength.

Consumer platform

DeepTutor UI on Luma's scalable stores; converge with the WhatsApp learner base.

Recommendation

Reuse DeepTutor as the platform's interface and study-tooling layer, and as a capability engine the WhatsApp tutor can also call — but keep Luma's learner model as the one brain and Luma's POPIA stores as the one system of record. Start with Luma School (where DeepTutor's multi-user design fits natively), use school-linked onboarding, and treat the CLI / Server API as the integration + content-authoring boundary, never the source of truth.

Sources (verified 2026-07-02): DeepTutor README (release v1.2.3, 2026-04-24; 20k+ stars; Next.js 16 / FastAPI; Apache-2.0), docs.deeptutor.info Server API (/api/v1/ws, OpenAPI, capability / memory / book endpoints) and Multi-User Deployment (JWT auth, per-user file/SQLite workspaces, admin grants, single-process caveat); github.com/HKUDS/DeepTutor.

Content & curriculum

The material Luma teaches from

Grounding is only as good as the content behind it. Luma has real curriculum material today — but in three incompatible shapes. v2 unifies them into one canonical, cited store that both the learner model and grounding read from.

CAPS knowledge (SA)

Official CAPS curriculum PDFs, parsed and indexed for retrieval (Maths, Physical Sciences, Business Studies live). Great for grounding, but these are documents, not structured skills.

Lesotho learning-outcome graph

Hand-verified structured outcomes (e.g. 40 Grade-8 Maths outcomes) — the shape the learner model wants. Piloted, quality-checked.

Legacy curriculum export

An earlier scraped course/unit/component format — mostly placeholder content.

The moat is authoring, not scraping

Turning raw curriculum into assessable knowledge components — with correct answers, misconception diagnostics and prerequisite links — is genuine R&D and the defensible asset. v2 builds an LLM-assisted authoring pipeline with human review: the model proposes drafts; educators approve them into the canonical store. Nothing reaches a learner unreviewed.

Target: one canonical curriculum store, made up of the knowledge-component graph in Postgres plus a pgvector/S3 Vectors chunk table, so grounding and pedagogy share one source. The three legacy representations get a retirement path. Content tools already in hand: an OCR/parse pipeline for scanned material and a self-hosted media/notebook generator for study assets. Retrieval is KC-scoped, so answers stay on-topic and cite their source.
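KC-scoped retrieval can be illustrated with a small in-memory sketch: filter chunks to the target knowledge component first, then rank by similarity, and return each passage with its source. The `Chunk` fields, the two-dimensional embeddings and the source labels are hypothetical; a real store would do the same filter in the vector index query.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    kc_id: str
    source: str        # citation shown with the answer
    text: str
    embedding: list


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks: list, kc_id: str, query_vec: list, k: int = 2) -> list:
    """Return up to k (text, source) pairs, drawn only from the given KC."""
    scoped = [c for c in chunks if c.kc_id == kc_id]
    ranked = sorted(scoped, key=lambda c: cosine(c.embedding, query_vec), reverse=True)
    return [(c.text, c.source) for c in ranked[:k]]
```

Scoping before ranking means a chunk from another topic can never be returned, even if its embedding happens to sit closest to the query.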

Data & privacy · POPIA

Handling children's data responsibly

Luma serves minors, so privacy is a first-order design constraint. Data lives in South Africa (AWS Cape Town), is encrypted, governed by consent, and — with edge delivery — can stay on-device entirely.

Per-user data — done at scale

"A storage bucket per user" is the instinct; at a million users it's an anti-pattern (S3 allows 10,000 buckets by default, up to 1,000,000 by request, with per-bucket fees past 2,000). Instead each learner gets an isolated prefix in shared storage, with per-prefix access control, tiered by how the data is used:

Structured, low-latency state stays hot; the per-user "data lake" and vector recall sit in cheaper tiers.
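Prefix-per-learner isolation can be sketched as a key convention plus an access policy scoped to that prefix. The bucket name, the `learners/` key layout and the function names are hypothetical; the policy structure follows the standard IAM JSON shape, with object actions limited to the prefix and listing limited through the `s3:prefix` condition key.

```python
def learner_prefix(learner_id: str) -> str:
    # Hypothetical key layout: everything for one learner lives under one prefix.
    return f"learners/{learner_id}/"


def learner_policy(bucket: str, learner_id: str) -> dict:
    """Build a policy document that only reaches one learner's prefix."""
    prefix = learner_prefix(learner_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is restricted to keys under this learner's prefix.
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }
```

The same prefix is what a right-to-be-forgotten request would delete, so the layout and the deletion path stay aligned.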

Privacy commitments

Consent & guardianship

Guardian relationships and consent modelled explicitly, with an append-only audit trail.

Residency & encryption

Data stays in-region; personal information encrypted; access isolated per learner.

Right to be forgotten

A learner's data deletes cleanly — storage prefix dropped, hot records purged.

Sovereignty via the edge

Offline school-boxes can keep a child's data on the device — it never leaves the classroom.

Compute & inference strategy · powered by NVIDIA

How we serve a free tutor to millions

Luma's cost is dominated by AI inference, not servers. "Free forever" therefore depends on driving cost-per-turn toward zero by running our own models on efficient NVIDIA infrastructure — reserving paid frontier models for only the hardest turns. NVIDIA Inception (confirmed) provides the credits, hardware and software to do exactly this.

The model router — a three-tier strategy

The Agent Framework's connectors (Bedrock, Anthropic, OpenAI, Ollama/NIM) make the tier a configuration choice — swapping in self-hosted models needs no change to product code. Quality is guarded by evaluation sets so the cheap model never degrades learning.
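A tiered router with per-turn cost accounting can be sketched as follows. The tier names follow the stack table (self-hosted, DGX Cloud, frontier); the difficulty score, the ceilings and the cost figures are placeholders in abstract units chosen for illustration, not measured prices or the actual routing policy.

```python
from dataclasses import dataclass, field

# (tier name, highest difficulty it handles, cost per turn in abstract units).
# All numbers are illustrative placeholders.
TIERS = [
    ("self_hosted", 0.6, 1.0),
    ("dgx_cloud", 0.85, 10.0),
    ("frontier", 1.0, 100.0),
]


@dataclass
class CostLedger:
    turns: int = 0
    total: float = 0.0
    by_tier: dict = field(default_factory=dict)

    def record(self, tier: str, cost: float) -> None:
        self.turns += 1
        self.total += cost
        self.by_tier[tier] = self.by_tier.get(tier, 0.0) + cost

    @property
    def cost_per_turn(self) -> float:
        return self.total / self.turns if self.turns else 0.0


def route(difficulty: float, ledger: CostLedger) -> str:
    """Pick the cheapest tier whose ceiling covers the turn, and record its cost."""
    for name, ceiling, cost in TIERS:
        if difficulty <= ceiling:
            ledger.record(name, cost)
            return name
    raise ValueError("difficulty must be between 0 and 1")
```

Because the tiers are data rather than code, changing a ceiling or adding a tier is a configuration change, and the ledger gives the cost-per-turn figure the product metrics call for.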

What NVIDIA Inception provides, mapped to how we use it

Track	NVIDIA assets	How Luma uses it
Developer systems	2–4× DGX Spark (128 GB, ~$4–4.7k); RTX PRO 6000 Blackwell (96 GB, ~$8–9k)	Local fine-tuning & eval of 8B–70B models — no metered cloud during R&D
Production inference	DGX Cloud discount + CSP credits (AWS Activate ≤$100k, Nebius ≤$150k); NIM microservices	Serve the tutor elastically as the platform scales; own inference past break-even
Training / evolution	B200 / GB200 credits; NeMo	Fine-tune on real learner-event data; evolve the mastery algorithms once a corpus exists
Speech	Riva + Parakeet/Canary	African-language speech-to-text & text-to-speech — voice-note tutoring, low-literacy access
Generative media	NVIDIA generative image/video on owned GPUs	Personalised short explainer videos per learner; cached & safety-reviewed
Edge	Jetson AGX Thor (128 GB, 40–130 W); Orin Nano Super (~$249)	Offline "school-in-a-box" running a quantised tutor with no internet
Enablement	DLI training credits; solutions architect; AI Enterprise licence	Upskill the team; NVIDIA help standing up self-hosted inference

Voice-first, in-language

Learners already send voice notes. With Riva speech models, Intake transcribes a voice note (speech-to-text) and Render can reply with spoken audio (text-to-speech) in the learner's language. This is a major accessibility unlock for low-literacy learners and younger children, and it aligns directly with NVIDIA's "sovereign/edge AI" narrative.

Generative media — personalised video

With owned NVIDIA GPUs, Luma can generate short, custom explainer videos tailored to an individual learner — their worked example, at their level, in their language — for concepts that simply land better in motion. It reuses the same inference fabric the model router already manages, and complements DeepTutor's Math Animator and interactive "living-book" visuals.

Used deliberately: video generation is compute-heavy, so Luma generates it for high-value concepts, caches and reuses clips across learners where the example is shared, and routes every child-facing clip through safety and quality review. Personalised video is also a strong accessibility lever (see Accessibility & inclusion).

Edge & offline — the "school-in-a-box"

Where connectivity is poor, a Jetson device runs a quantised tutor + a slice of the learner model + the relevant curriculum entirely offline in a classroom. It serves turns locally, stores progress on-device (POPIA-clean — data never leaves), and syncs mastery + events back to the cloud when a connection returns.

Edge hardware — a two-tier device model

With hardware funding, Luma can build and ship its own classroom devices. Two complementary tiers keep unit cost low while extending all the way to full offline AI tutoring:

~$80

Tier 1 · Raspberry Pi 5 — content & hotspot node

An RPi 5 acts as a local Wi-Fi hotspot + web server, hosting courses, Guided-Learning books, past papers and the tutor's web UI. Learners connect their own phones with no internet and no data cost. Cheap enough to place in many classrooms.

$249–$3.5k

Tier 2 · Jetson — on-device AI brain

A Jetson Orin Nano or AGX Thor runs the quantised tutor model locally for full offline AI tutoring — pairing with the RPi 5 (Pi serves content + hotspot, Jetson serves the AI) or standing alone.

The combined "school-in-a-box" = RPi 5 (hotspot + hosted content) + Jetson (local inference). Courses and content are pushed to the Pi; the Jetson serves the tutor; progress stays on-device (POPIA-clean) and syncs to the cloud when a connection returns. The Pi tier alone (no Jetson) still delivers hosted courses + the tutor UI offline, with AI turns served on the next sync — a very low-cost rollout option.
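The "store on-device, sync when a connection returns" behaviour described above is commonly built as a local outbox. The sketch below is one possible shape, using SQLite from the Python standard library; the table layout, the function names and the `upload` callback are hypothetical and are not taken from the Luma design.

```python
import json
import sqlite3


def open_outbox(path=":memory:"):
    """Open (or create) the on-device event store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "synced INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def record_event(db, event):
    """Persist a mastery or progress event locally; works with no network."""
    db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    db.commit()


def sync(db, upload):
    """Send unsynced events in order; stop at the first failure.

    `upload` is a caller-supplied function returning True on success.
    Returns the number of events sent in this attempt.
    """
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE synced = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        if not upload(json.loads(payload)):
            break  # still offline or cloud error: retry on the next sync
        db.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
        db.commit()
        sent += 1
    return sent
```

This gives at-least-once delivery: if the device loses power between a successful upload and the local update, the event is sent again, so the cloud side would need to deduplicate.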

Honest caveats on self-hosting

Ops burden: running model-serving (NIM/vLLM) and GPU fleets adds real operational work vs a pure API. Budgeted as an infrastructure workstream.
Quality guardrails: the router must be gated by evaluation sets so a cheaper self-hosted model never quietly worsens learning outcomes.
Framework note: Microsoft Agent Framework's turnkey durable hosting favours Azure; on AWS we implement our own state/checkpoint stores and use SQS. This is deliberate extra work, and it keeps the whole platform on one cloud.

Cost & unit economics

Why "free" is sustainable

The single question investors and partners ask about free AI education is "how do you afford it?" The answer is unit economics: measure every turn's cost, and drive the marginal cost of a turn toward zero by self-hosting.

~$0.0016

v1 cost per message today (frontier API, 2 calls/msg)

~$343k/mo

projected API-only floor at 1M active learners

→ near-zero

marginal cost target via self-hosted inference

Illustrative. Frontier-API cost per turn barely improves with volume; self-hosted inference amortises fixed GPU capacity across more turns, so cost per turn falls as Luma grows — the opposite of the usual scaling worry.
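The arithmetic behind the three figures above can be worked through in a few lines. The per-message cost and the monthly projection come from this section; the monthly GPU cost is a placeholder chosen only to show the shape of the curve and is not a Luma figure. The sketch also ignores any per-message marginal cost of self-hosting (power, bandwidth), which is a simplification.

```python
API_COST_PER_MESSAGE = 0.0016     # from the text: v1 frontier API, ~2 calls/message
API_ONLY_MONTHLY_AT_1M = 343_000  # from the text: API-only floor at 1M active learners

# Message volume implied by the two figures above.
implied_messages_per_month = API_ONLY_MONTHLY_AT_1M / API_COST_PER_MESSAGE
implied_messages_per_learner = implied_messages_per_month / 1_000_000

# Placeholder only: an assumed fixed monthly cost for owned GPU capacity.
HYPOTHETICAL_GPU_MONTHLY = 40_000


def self_hosted_cost_per_message(fixed_monthly_usd, messages):
    """Fixed capacity amortised over volume: falls as volume grows."""
    return fixed_monthly_usd / messages


def break_even_messages(fixed_monthly_usd):
    """Monthly volume above which self-hosting beats the per-message API price."""
    return fixed_monthly_usd / API_COST_PER_MESSAGE


if __name__ == "__main__":
    print(f"implied messages/month: {implied_messages_per_month:,.0f}")
    print(f"implied messages/learner/month: {implied_messages_per_learner:.1f}")
    print(f"break-even volume: {break_even_messages(HYPOTHETICAL_GPU_MONTHLY):,.0f}")
    for volume in (10e6, 50e6, implied_messages_per_month):
        cost = self_hosted_cost_per_message(HYPOTHETICAL_GPU_MONTHLY, volume)
        print(f"{volume:>13,.0f} msgs -> ${cost:.5f} per message")
```

The stated figures imply roughly 214 million messages a month at 1M active learners. With the placeholder GPU cost, self-hosting is more expensive per message than the API at low volume and cheaper beyond the break-even point, which is the behaviour the paragraph above describes.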

How credits fund the transition

AWS Activate (≤$100k) and Nebius (≤$150k) partner credits, plus DGX Cloud discounts, fund production inference while we stand up owned capacity — so the shift to self-hosting is paid for by the programme, not by burning runway.

Governance built in

v2 tracks cost per turn from day one (v1 tracks none), caches repeated work, and enforces per-turn cost caps in the router. A live cost dashboard sits beside the outcomes dashboard.

Figures: v1 per-message cost and the ~$343k/mo 1M-user projection are from the team's internal scalability & cost assessment (frontier model, ~2 uncached calls/message); the launch baseline is ~$1.60 per 1,000 messages with a $2.00 red line. NVIDIA credit ceilings are programme maxima, tier-dependent. The cost curve is illustrative, not a forecast.
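One way the per-turn cap and the red line could be enforced is a small cost ledger consulted by the router. This is a sketch under stated assumptions: the class, its method names and the per-turn cap value are hypothetical; only the $2.00 per 1,000 messages red line and the ~$1.60 baseline come from the figures above.

```python
from dataclasses import dataclass, field


@dataclass
class CostLedger:
    cap_per_turn_usd: float             # hypothetical per-turn ceiling
    red_line_per_1000_usd: float = 2.00  # red line stated in the text
    costs: list = field(default_factory=list)

    def allow(self, estimated_cost_usd):
        """Router check before a call: refuse turns priced above the cap."""
        return estimated_cost_usd <= self.cap_per_turn_usd

    def record(self, actual_cost_usd):
        """Log the measured cost of a completed turn."""
        self.costs.append(actual_cost_usd)

    def per_1000(self):
        """Average cost per 1,000 turns over everything recorded so far."""
        if not self.costs:
            return 0.0
        return sum(self.costs) / len(self.costs) * 1000

    def over_red_line(self):
        """True when the running average breaches the red line."""
        return self.per_1000() > self.red_line_per_1000_usd
```

A router would call `allow` with an estimate before choosing a tier and `record` with the measured cost afterwards; the `per_1000` value is the kind of number a live cost dashboard would plot.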

Non-functional requirements

The qualities the architecture must guarantee

The hard part of a free AI tutor is the millionth learner, at a cost that keeps it free, without ever failing a child mid-lesson.

Scale — target 1M+ concurrent

Stateless compute behind queues; databases partitioned; no single-instance ceilings. (v1 today is effectively capped near one instance by in-memory defaults, ~25 concurrent generations, and three databases on one small instance.)

Reliability — no silent failures

Every turn checkpointed & resumable; inbound idempotency so duplicates can't double-reply; dead-letter queues catch the rest. (Directly fixes v1's duplicate-reply amplifier and its single-worker outage mode.)

Performance

The teaching decision returns in well under a second (learner-model p99 ≤ ~150 ms, excluding the LLM), so most of the learner's wait is the reply streaming.

Cost

Per-turn cost measured & governed; self-hosted-first routing (see Compute).

Safety

Child-appropriate behaviour, human-in-the-loop escalation, answers grounded in approved material.

Observability

Dashboards for cost, latency, reliability and — the point of it all — learning outcomes.
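The inbound idempotency and dead-letter behaviour listed under Reliability above could look roughly like the sketch below. It keeps the seen-message set in memory purely for illustration; a production version would need a durable store with an atomic "insert if absent" operation. The class and method names are hypothetical.

```python
class InboundDeduper:
    """Process each inbound message at most once; park failures for review."""

    def __init__(self):
        self._seen = set()      # illustration only: would be a durable store
        self.dead_letters = []  # failed messages kept for inspection or replay

    def handle(self, message_id, body, process):
        """Run `process(body)` unless this message_id was already handled."""
        if message_id in self._seen:
            return "duplicate"  # redelivery: do not reply twice
        self._seen.add(message_id)
        try:
            process(body)
            return "processed"
        except Exception as exc:
            # Nothing fails silently: the message and its error are recorded.
            self.dead_letters.append((message_id, body, repr(exc)))
            return "dead-lettered"
```

Keying on the channel's own message identifier means a webhook redelivery produces no second reply, which is the failure mode the requirement calls out in v1.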

Infrastructure & topology

Where it runs

v2 runs on AWS in Cape Town, adds a GPU inference tier (owned + credited NVIDIA), and extends to the classroom edge — all reproducible as code.

Cloud (now)

Containerised services on AWS af-south-1, auto-scaling behind queues; managed Postgres/Aurora, Redis, S3, DynamoDB, SQS/SNS.

GPU inference tier

Self-hosted model serving (NIM/vLLM) on owned RTX PRO 6000 Server nodes + DGX Cloud burst on credits.

Edge (reach)

Jetson "school-in-a-box" for offline classrooms; syncs when connectivity returns.

Reproducibility: v1's live AWS was built by hand in the console (Terraform archived, unexecuted). v2 adopts infrastructure-as-code (Terraform/Pulumi) from the start, so environments are reproducible and drift-free. Deployment is via CI/CD with staged rollout and documented rollback. WhatsApp sending moves to multiple numbers to clear the single-number rate ceiling.

NVIDIA Inception — the strategic infrastructure partner

Confirmed membership provides the credits, discounted developer & server GPUs, model-serving/speech software, and edge kits that make owned inference and offline reach affordable. In short: it converts "serve millions for free" from a funding problem into an engineering one.

Delivery roadmap

From proven product to national platform

v2 is built beside the live product and converges only after it out-performs on a pilot. Every phase boundary is a safe stopping point. NVIDIA assets are layered in as each phase needs them.

Phase A

Prove the loop (internal)

v2 core: tutor workflow, learner-model runtime, grounding over a small slice of real, human-authored curriculum; self-hosted inference on dev hardware. Exit: one test learner completes a grounded, remembered, multi-session journey.

Phase B

Pilot cohort (new learners)

Bounded new-learner/school pilot on v2, full cost + outcome instrumentation, voice enabled. Exit: v2 meets go/no-go thresholds on accuracy, latency, cost and learning gain.

Phase C

Shadow existing learners

Silently run live v1 turns through v2 and compare — zero learner impact; self-hosted models carry the majority of turns. Exit: v2 ≥ v1 on agreed metrics.

Phase D

Migrate, converge & reach the edge

Move learners in waves with rollback windows; retire v1; deploy offline school-boxes; begin mastery-evolution on real data. Data migrates on rails via the event backbone.
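The shadow phase (Phase C) can be pictured as a thin wrapper around the live handler: v1 always answers the learner, v2 runs on the same input, and only the comparison is logged. The sketch below assumes a caller-supplied `score` function and a simple mean comparison; the real "agreed metrics" are not defined in this document, and all names here are hypothetical.

```python
def shadow_turn(message, v1_handler, v2_handler, score, log):
    """Serve the learner from v1; run v2 silently and log both scores."""
    v1_reply = v1_handler(message)
    try:
        v2_reply = v2_handler(message)
        log.append({"message": message, "v1": score(v1_reply), "v2": score(v2_reply)})
    except Exception as exc:
        # A v2 failure must never reach the learner; record it instead.
        log.append({"message": message, "v2_error": repr(exc)})
    return v1_reply  # the learner only ever sees v1's answer


def v2_at_least_v1(log):
    """Illustrative exit check: mean v2 score >= mean v1 score.

    Turns where v2 errored are excluded from the means here; a real gate
    would probably track the error rate as its own metric.
    """
    compared = [r for r in log if "v2" in r]
    if not compared:
        return False
    mean_v1 = sum(r["v1"] for r in compared) / len(compared)
    mean_v2 = sum(r["v2"] for r in compared) / len(compared)
    return mean_v2 >= mean_v1
```

Because `shadow_turn` returns only the v1 reply and swallows v2 exceptions, the comparison carries no learner impact, matching the phase's stated constraint.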

How the build parallelises (workstreams)

Work splits along clean seams so many engineers — and AI dev-agents under senior review — build at once without colliding, provided interfaces are frozen first and enforced by automated contract tests.

Workstream	Owns
Core & turn workflow	The tutor workflow and agents
Learner-model runtime	Mastery state, coherence router, API
Memory & sessions	Three-tier memory + context providers
Grounding & curriculum	Canonical curriculum store + retrieval
Content authoring	LLM-assisted, human-reviewed pipeline
Inference & cost	Model router, NIM/vLLM serving, cost accounting
Infrastructure	Stores, event wiring, IaC, GPU tier, edge
Channels	WhatsApp/web/voice adapters + cohort routing
Data & migration	v1→v2 ETL + parity harness
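To make the "frozen interfaces enforced by automated contract tests" idea above concrete, here is a minimal sketch of a contract check that could run as a merge gate. The field names in `TURN_CONTRACT_V0` are invented for illustration; the actual v0 contracts (turn, memory, retrieval) are not reproduced in this section.

```python
# Hypothetical frozen contract for a turn request: field name -> required type.
TURN_CONTRACT_V0 = {
    "learner_id": str,
    "text": str,
    "channel": str,
}


def contract_violations(payload, contract):
    """Return a list of human-readable problems; an empty list means it conforms."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in payload:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(payload[field_name], expected_type):
            problems.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}"
            )
    return problems
```

A workstream's test suite would assert that every payload it emits yields no violations, so a change that breaks the shared interface fails the build before it can collide with another team's work.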

Planning & decisions

What to decide, in what order, and who owns it

The working surface for planning: the open decisions, a rough timeline and critical path, the first sprint, and the roles to hire. Owners marked TBD are for the team to assign.

Decisions register

Decision	Recommendation	Status · owner
Core stack / learner-model runtime	.NET 10 + MAF; learner model = runtime source of truth	Decided
Rebuild approach	Greenfield parallel platform; converge later	Decided
Self-hosted model family	Standardise on one open-weight family (8B–70B)	Open · TBD
Second WhatsApp number for v2	Second number for the pilot (isolation + headroom)	Open · TBD
Reuse vs rebuild v1 account-api	Reuse accounts/onboarding as shared services	Open · TBD
Pilot surface	WhatsApp-first for the pilot cohort	Open · TBD
Edge (RPi 5 / Jetson) timing	Phase D, once cloud is proven	Open · TBD
DeepTutor reuse depth	School-first; reuse UI + capabilities, invert the brain	Open · TBD
Doc hosting	Internal portal + separate public/investor brief	In progress

Timeline & critical path (rough)

Critical path: freeze contracts → infra + stores → core turn loop → real curriculum slice → pilot cohort → shadow → migrate. Indicative windows (to set with the team):

Phase	Window	Milestone
A — core loop	Q3 2026	One learner: grounded, remembered, multi-session journey
B — pilot cohort	Q4 2026	Bounded pilot on v2 meets go/no-go
C — shadow	Q1 2027	v2 ≥ v1 on agreed metrics
D — migrate + edge	2027	Waves migrated; offline boxes; evolution

Sprint 0 — the first 2–4 weeks

De-risk & freeze

Verify prod topology — is v1 silently on in-memory defaults?
Freeze the v0 contracts (turn, memory, retrieval) + contract tests.
Add per-turn cost tracking + inbound idempotency to v1 (quick wins).

Stand up the v2 skeleton

WS-INFRA: DynamoDB stores + SQS wiring + IaC.
Learner-model service skeleton + the MAF PoC turn loop compiling against GA.
Seed one real curriculum slice (Gr 7 Term 1 Maths).

Workstreams & owners

Workstream	Owner	First task
Core & turn workflow	TBD	PoC turn loop vs MAF GA
Learner-model runtime	TBD	State projections + router API
Memory & sessions	TBD	Session model + context providers
Grounding & curriculum	TBD	Canonical store + real retrieval
Content authoring	TBD (+ educators)	Author Gr 7 Term 1 slice
Inference & cost	TBD	Model router + cost table
Infrastructure & edge	TBD	DynamoDB stores + IaC
Data & migration	TBD	Dual-run/shadow harness

Hiring (with funding)

Likely early roles, in rough order: Senior .NET / MAF engineer; ML / RAG & inference engineer; Content & curriculum lead + educators; DevOps / infra; Product / PM; Safety & QA / evals. Plus senior reviewers to gate the AI dev-agent fleet.

Suggested planning-session agenda

Confirm the open decisions above (owner + the call).
Assign the 8 workstream owners.
Agree Sprint-0 scope and who does what this week.
Pick the pilot cohort (which school / wave) and success metrics.
Hiring: which 2–3 roles first.

Risks & mitigations

What could go wrong, and how we've designed against it

Risk	Mitigation
Regressing the live 250k-user service in a rewrite	Greenfield-parallel; live product untouched until v2 out-performs on a pilot; wave migration with rollback
Cost blows up before it's governed	Per-turn cost tracking + model router from day one; self-hosting the easy majority of turns; credits fund the transition
A cheaper self-hosted model quietly worsens teaching	Router gated by evaluation sets; frontier fallback on low confidence; quality monitored as a first-class metric
Self-hosting / GPU ops burden	NIM microservices + NVIDIA solutions architect + DLI upskilling; owned nodes only past cloud break-even
Confident but wrong answers	Grounding in approved, cited curriculum; accuracy eval sets gate release
Two "sources of truth" drift apart	One learner model owns pedagogy; one curriculum store; no duplicate memory systems
Children's data mishandled	In-region, encrypted, consent-audited, per-learner isolation, clean deletion, on-device edge option
Edge / offline complexity	Edge is a Phase-D deliverable on proven cloud foundations; sync designed for eventual consistency
Many parallel builders diverge	Freeze interfaces first; contract tests as the merge gate; senior review in the loop
Over-reliance on programme credits	Credits accelerate, not sustain; owned-inference break-even and white-label revenue underpin the model

Team & ways of working

How this gets built

Luma builds with a reviewed, AI-assisted engineering process: many development agents working in parallel, with senior engineers reviewing and approving every change.

Contracts first

Interfaces between components are agreed and frozen before parallel work starts — what lets a large, partly-automated team converge instead of thrash.

Tests as the gate

Every change ships with tests; contract tests and parity checks against the existing learner-model act as automated merge gates.

Senior humans in the loop

Product owners, architects and team leads review this documentation and approve direction; senior engineers review code before it lands.

Living documentation

This portal is the shared reference for technical and non-technical stakeholders, kept current as decisions are made.

Open decisions still needing a human call

A second WhatsApp number for v2, or route by cohort on the existing number.
Which open-weight model family to standardise on for self-hosting (drives GPU sizing).
How much of v1's account services to reuse vs rebuild.
Whether the pilot cohort is WhatsApp-first or launches on the new web/voice surface.
Edge timeline — commit Jetson school-boxes in Phase C or hold to Phase D.

Glossary & sources

Plain-language glossary

Agent — an AI unit that reads input, uses tools, and writes a response. In Luma, agents explain; they don't decide the lesson plan.

Workflow — a defined series of steps (one tutoring turn) that runs reliably and can resume if interrupted.

Learner model — Luma's structured picture of what a learner knows; the source of truth for teaching.

Knowledge component — one small teachable idea (e.g. "squaring a number"), linked to its prerequisites.

Mastery vector — three scores per idea: can do it, understands it, can apply it.

Misconception — a specific wrong idea Luma detects and corrects, distinct from not knowing yet.

Grounding — pulling approved, cited curriculum into an answer so it's correct, not invented.

Model router — the switch that sends easy turns to a cheap self-hosted model and hard turns to a premium one.

NIM / NeMo / Riva — NVIDIA software for serving models (NIM), fine-tuning (NeMo) and speech (Riva).

Jetson — low-power NVIDIA hardware that runs AI locally — the offline "school-in-a-box".

POPIA — South Africa's data-protection law; governs handling of learners' (especially children's) data.

CAPS — South Africa's national school curriculum, which Luma is aligned to.

Source documents

This portal synthesises the following working documents (companion artifacts in this workspace):

v1 system review & build plan — survey of the existing platform, scale/cost ceilings.
v2 architecture, interface contracts, and migration/convergence plan (greenfield .NET/MAF).
NVIDIA Inception hardware & programme note (compute strategy inputs).
Microsoft Agent Framework 1.0 (GA Apr 2026) documentation; AWS S3 / S3 Vectors documentation.

Traction figures (250k+ learners, 8M+ interactions, 86 countries, UN/TIME), cost figures, and the NVIDIA relationship are as stated by the Luma team / internal assessments. Cost curve is illustrative. Hardware prices are indicative 2026 figures and move with supply.

Luma v2: Enterprise Platform Documentation. Prepared for product, architecture and delivery review, and for investor/partner readability. Status legend: Live · In build · Planned (kept current as the build proceeds).