Free, world-class tutoring for every African learner — engineered to scale.
Luma is a free, curriculum-aligned AI tutor delivered where learners already are: on WhatsApp, in their own language. This document specifies Luma v2 — the enterprise platform that adds memory, a real model of each learner, curriculum grounding, and the compute strategy to serve millions at a cost that keeps it free.
250k+ learners reached
8M+ tutoring interactions
86 countries
UN / TIME featured impact
What changed since the last revision
NVIDIA Inception membership is confirmed. This is not a "nice to have" — it is the hinge of the economic model. Confirmed access to GPU credits, developer systems, model-serving software (NIM/NeMo), African-language speech (Riva), and edge kits (Jetson) moves self-hosted inference and offline delivery from "later" to the core plan. The whole Compute & inference strategy and unit-economics sections below are new and drive the roadmap.
The thesis in three sentences
1) Demand is proven — v1 reached 250k+ learners and 8M+ interactions as an MVP. 2) To be genuinely transformative it must remember learners, model what they know, and ground answers in the real curriculum — none of which v1 does yet. 3) To stay free at national scale, marginal cost per learner must trend toward zero, which we achieve by running our own models on NVIDIA-accelerated infrastructure instead of paying per-token to a frontier API for every turn.
The mission
Free education, powered by AI. Equity of access is the product.
The moat
A curriculum-native learner model + human-reviewed content + in-language voice — expensive to copy, compounding with use.
The unlock
NVIDIA compute turns "free tutoring for millions" from a cost problem into an engineering one.
v1 → v2 at a glance
Capability | v1 (live today) | v2 (this document)
Memory of the learner | Last ~20 messages, as one text blob | Durable sessions + three-tier long-term memory
Knows what a learner has mastered | Nothing tracked | Multi-dimensional mastery model (source of truth)
Grounding in curriculum | Ungrounded (retrieval is a stub) | Retrieval over an approved, cited curriculum store
Scale ceiling | Effectively ~1 instance as configured | Designed for 1M+ concurrent learners
Inference & cost | Frontier API only; no cost tracking | Self-hosted-first model router; per-turn cost accounting
Reach | WhatsApp text, online only | + voice (in-language) + offline "school-in-a-box"
How to read this: use the Plain-English / Technical switch (top-right). Plain-English is for everyone — investors, partners, product owners. Technical reveals stack tables, data models, scale numbers and hardware detail for architects and the dev team.
Product & requirements · PRD
What we're building, for whom, and why
One tutor, several surfaces. The core experience — personalised, curriculum-aligned tutoring — is shared; the surfaces differ by audience and go-to-market.
Product offerings
[Live] Luma Tutor: The core WhatsApp tutor any learner uses free, in their language.
[In build] Luma School: Classroom/school deployments with teacher & parent visibility.
[Planned] Luma International: Curriculum-swappable tutor beyond South Africa (Lesotho piloted).
[Planned] Luma Skills: Beyond-school skills & adult learning pathways.
[Planned] White-label / Luma-powered: The engine behind partners' branded products, a revenue line that funds the free tier.
[Planned] Own platform (beyond WhatsApp): Richer web/app + offline surface for deeper learning.
Who we serve
Learner (primary)
Often on a shared or low-end Android phone with a tight data budget; prefers voice notes; home language is frequently isiZulu, isiXhosa, Sesotho, Afrikaans or another language rather than English. Needs: patient, in-language help that meets them at their level and remembers them.
Teacher
Stretched across large classes. Needs: visibility of where a class struggles; trustworthy, curriculum-aligned help that extends their reach without adding admin.
Parent / guardian
May have limited time or schooling. Needs: a simple trusted signal that learning is happening — and consent & control over their child's data.
School / partner / sponsor
Deploys or funds Luma at scale. Needs: outcomes evidence, safe data handling, and a credible cost story.
Capability requirements
Area | Requirement | Status
Learning | Model each learner's mastery per curriculum idea; choose the pedagogically correct next move | build
Memory | Remember learners across sessions; resume where they left off | build
Grounding | Answer from approved, cited curriculum; no invented facts | build
Voice | Understand & reply to voice notes in African languages | planned
Safety | Child-appropriate behaviour; human escalation for sensitive cases | build
Cost | Per-turn cost tracked & governed; free at the point of use | build
Reach | Work on low-end devices, low data, and offline in classrooms |
Measurable learning outcomes, not just engagement.
Non-goals (v2)
Replacing teachers — Luma extends them.
A social network or general chatbot.
Monetising the learner. Revenue = schools, partners, white-label.
Mastery gain: learning per active learner, the north star
WAU · retention: habit & reach
Grounded accuracy: answers correct & on-curriculum
Cost / turn: sustainability of "free"
Business model & sustainability
How "free" is funded
The WhatsApp tutor is free for everyone, always — that is the mission. The business sits around the free tier: a premium web app, brand-funded learning, and B2B. Crucially, most revenue depends on being able to prove learning outcomes.
The core principle
Free WhatsApp drives reach and impact; the premium web experience and partners drive revenue and sustainability. Without proof of learning, Luma is "a WhatsApp homework chatbot". With proof, it's "a mastery-based learning system that proves outcomes, reachable anywhere via WhatsApp" — a different, defensible market.
Offerings
Luma School
Core K–12 — free on WhatsApp, premium on the web app.
Luma Skills
Practical skills beyond school.
White-Label
Partner-branded use of Luma's engine — platform fee + revenue share.
Data & Insights
Aggregated, consented learning insights for institutions / ministries.
CPM ($3–$15) / per-message / sponsorship packages ($1k–$10k/mo); ~$0 cost per impression (pre-written, in the free window)
DHL agreed in principle — first to build
Luma Tutor (consumer premium) | Subscription ~R50–R100/mo: web app with memory, mastery tracking, dashboards, practice | Planned
White-Label | Platform fee + revenue share | Planned
Corporate Training | Per-employee / enterprise licence | Planned
Data & Insights | Subscription / per-report | Planned
Certification & credentialing | Per-certificate | Later
Why the unit economics matter here
Brand-funded and free-tier reach only work if the cost of a free turn trends toward zero — exactly what the self-hosted NVIDIA inference strategy delivers. Cheap reach funds the mission; proven outcomes unlock the paid streams.
Engagement is easy; learning is the point. Luma's north-star is proven mastery gain, measured by the learner model and a monitoring-and-evaluation (MEL) framework — the same proof that unlocks the business.
The metric framework
Learning: mastery gain per learner (north star); progress per knowledge component (KC)
Targets and baselines are set per pilot; the MEL framework + pilot studies generate the evidence (pre/post assessment, mastery trajectories) that both proves impact and makes the paid streams sellable.
Theory of change
Inputs (free in-language tutor · CAPS curriculum · cheap NVIDIA compute) → activities (personalised, grounded tutoring at scale) → outputs (learners tutored · mastery tracked) → outcomes (improved understanding · exam readiness · confidence) → impact (equity of access; better life outcomes). Aligned to UN SDG-4 (quality education).
Market & competition
The reference competitor, Khan Academy's Khanmigo, is $4/month for learners/parents (free for US teachers) — but individual subscriptions are US-billing only; the AI tutor is not purchasable for individuals outside the US. Khan's content is free globally; its AI tutor is not reachable by an African learner.
Where Luma wins
Free at the point of use, not $4/mo.
On WhatsApp, where learners already are — no app, low data.
In-language (African languages) and CAPS-native.
A real mastery model + misconceptions, not just Q&A.
Offline reach and POPIA data residency.
The category
Global AI tutors (Khanmigo, Duolingo, Google/LearnLM), homework-help chatbots, and local edtech — mostly paid, English-first, app-based and Western-curriculum. Luma's wedge is free + WhatsApp + African-curriculum + proven mastery, a segment the incumbents don't serve.
v2 is a greenfield platform on .NET 10 + Microsoft Agent Framework (production GA, April 2026), built beside the live product, sharing one source of truth for teaching, on the event-driven AWS backbone the team already runs.
In plain terms: Luma works like a well-run tutoring company. The learner model is the head of teaching — it decides what each learner does next. The tutor agents are the teachers who explain it well. Memory is the student record. Grounding is the approved textbook. A reliable message system makes sure nothing is lost between a learner asking and Luma replying.
Every learner turn flows through one workflow that consults the learner model (what to teach), grounding (approved material) and memory (who this learner is) — and renders via a model router that prefers self-hosted inference.
The tutoring turn, step by step
Steps 2 & 3 run in parallel. Step 4 (the learner model) decides the move; step 5 (the agent) only renders it. The turn is checkpointed so a crash resumes without re-spending tokens.
Design rule: agents render, the learner model decides, events connect, state lives outside compute. One source of truth; no duplicate memory systems.
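The turn flow described above can be sketched roughly as follows. This is a minimal illustration, not the v2 implementation: the function names, the in-memory checkpoint dictionary, and the guess that the two parallel steps are the memory and grounding lookups are all assumptions made for the example. It shows the three properties the text names: independent lookups run concurrently, the learner model (here a stub) picks the move while the agent only renders it, and each step is checkpointed so a resumed turn does not pay for rendering twice.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical in-memory checkpoint store. The document keeps this state
# outside compute; a dict stands in for that external store here.
CHECKPOINTS: dict[str, dict] = {}
RENDER_CALLS = 0  # counts how often the expensive render step actually runs


@dataclass
class Move:
    kind: str       # illustrative values such as "practice" or "remediate"
    target_kc: str  # knowledge component the move targets


async def load_memory(learner_id: str) -> dict:
    return {"learner_id": learner_id, "summary": "struggled with fractions"}


async def load_grounding(text: str) -> list[str]:
    return ["approved curriculum excerpt for: " + text]


def decide_move(memory: dict, grounding: list[str]) -> Move:
    # Stand-in for the learner model: it alone chooses the teaching move.
    return Move(kind="practice", target_kc="fractions.add")


async def render(move: Move, grounding: list[str]) -> str:
    # Stand-in for the tutor agent / LLM call: it only phrases the move.
    global RENDER_CALLS
    RENDER_CALLS += 1
    return f"Let's {move.kind} {move.target_kc} using {len(grounding)} source(s)."


async def run_turn(turn_id: str, learner_id: str, text: str) -> str:
    state = CHECKPOINTS.setdefault(turn_id, {})
    if "context" not in state:
        # Memory and grounding are independent, so fetch them concurrently.
        state["context"] = await asyncio.gather(
            load_memory(learner_id), load_grounding(text)
        )
    memory, grounding = state["context"]
    if "move" not in state:
        state["move"] = decide_move(memory, grounding)
    if "reply" not in state:
        state["reply"] = await render(state["move"], grounding)
    return state["reply"]


first = asyncio.run(run_turn("turn-1", "learner-42", "adding fractions"))
again = asyncio.run(run_turn("turn-1", "learner-42", "adding fractions"))
```

Running the same turn id twice returns the stored reply, and the render counter stays at one, which is the behaviour the checkpoint is meant to guarantee.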
Why greenfield-parallel, not rewrite-in-place?
The live service carries 250k+ learners and the brand's credibility. The risk in a rewrite is not effort — it's regressing a service families depend on. Building v2 beside v1 lets us build the correct architecture with zero pressure on the live loop, and A/B it before moving anyone. See Roadmap.
The learner model · the brain
What each learner knows — and what to do next
Luma's differentiator and the single source of truth for teaching. It is not a chatbot prompt but a structured, curriculum-native model of every learner's knowledge. The domain layer is built and tested in .NET today; the runtime is the core v2 build.
Curriculum graph
The curriculum as knowledge components (single teachable ideas) linked by prerequisites — so Luma knows what must come before what.
Multi-dimensional mastery
Per idea, three scores 0–1: can the learner do it (procedural), understand it (conceptual), and apply it (application)?
Misconceptions, first-class
Specific wrong ideas (e.g. "squaring means ×2") are tracked as their own state and directly corrected — different from simply not knowing yet.
Reach-back pacing
If an earlier building block is missing, Luma steps back to fill the gap, then returns the learner to grade level.
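As a rough illustration of the four ideas above, the sketch below models a tiny prerequisite graph with three mastery scores per knowledge component and a reach-back lookup. The class names, the example component ids and the depth-first search are assumptions for illustration; the 0.85 threshold is the default quoted elsewhere in this document, and for simplicity all three dimensions are treated as relevant.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.85  # default mastery threshold quoted in this document


@dataclass
class KnowledgeComponent:
    kc_id: str
    prerequisites: list[str] = field(default_factory=list)


@dataclass
class Mastery:
    procedural: float = 0.0   # can the learner do it?
    conceptual: float = 0.0   # do they understand it?
    application: float = 0.0  # can they apply it?

    def is_mastered(self) -> bool:
        # Simplification: every dimension must clear the threshold.
        return min(self.procedural, self.conceptual, self.application) >= THRESHOLD


def next_target(grade_level_kc: str,
                graph: dict[str, KnowledgeComponent],
                mastery: dict[str, Mastery]) -> str:
    """Return the earliest unmastered prerequisite, else the grade-level KC."""
    seen: set[str] = set()

    def first_gap(kc_id: str):
        if kc_id in seen:  # guards against accidental cycles in the graph
            return None
        seen.add(kc_id)
        for pre in graph[kc_id].prerequisites:
            gap = first_gap(pre)
            if gap is not None:
                return gap
        learned = mastery.get(kc_id, Mastery()).is_mastered()
        return None if learned else kc_id

    return first_gap(grade_level_kc) or grade_level_kc


# Hypothetical three-node slice of a curriculum graph.
graph = {
    "whole.multiply": KnowledgeComponent("whole.multiply"),
    "fractions.equivalent": KnowledgeComponent("fractions.equivalent", ["whole.multiply"]),
    "fractions.add": KnowledgeComponent("fractions.add", ["fractions.equivalent"]),
}
mastery = {"whole.multiply": Mastery(0.9, 0.9, 0.9)}

before = next_target("fractions.add", graph, mastery)
mastery["fractions.equivalent"] = Mastery(0.9, 0.88, 0.86)
after = next_target("fractions.add", graph, mastery)
```

With the gap in equivalent fractions, the lookup steps back to that component; once it is mastered, the learner returns to the grade-level target.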
The key design decision
The learner model — not the AI agent — chooses the teaching move each turn. The agent's job is to say it well. This keeps Luma pedagogically sound and consistent, rather than leaving teaching to an unpredictable language model.
The coherence router
Each turn, the router reads the mastery vector, active misconceptions and curriculum position, and returns one move + a target knowledge component + a feedback "rung":
Trust-weighted exponential moving average (α≈0.15). In-tutor evidence trusted at 1.0; softer sources (e.g. guardian confirmation) far lower (~0.4).
Mastery threshold
An idea counts mastered only when every relevant dimension clears its threshold (default 0.85). Zero-weighted dimensions are ignored.
Misconception
Modelled as a wrong-rule with a strength 0–1; detected by specific diagnostic items; cleared by remediation + subsequent correct evidence.
Reach-back
ATP sets the expected position; if a prerequisite is unmastered the active position points back, then resumes.
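A small sketch of the update and threshold rules may help make the numbers concrete. The text gives the smoothing factor, the trust levels and the threshold, but not the exact formula, so the form used here (the step size is alpha multiplied by trust) is an assumption, as are the function and dimension names.

```python
ALPHA = 0.15      # smoothing factor quoted in the text
THRESHOLD = 0.85  # default mastery threshold quoted in the text


def update_score(current: float, evidence: float, trust: float,
                 alpha: float = ALPHA) -> float:
    """Move the score toward the evidence; low-trust sources move it less.

    Assumed form: new = current + (alpha * trust) * (evidence - current).
    """
    return current + alpha * trust * (evidence - current)


def is_mastered(scores: dict[str, float], weights: dict[str, float],
                threshold: float = THRESHOLD) -> bool:
    """Every dimension with a non-zero weight must clear the threshold."""
    relevant = [dim for dim, weight in weights.items() if weight > 0]
    return bool(relevant) and all(scores.get(dim, 0.0) >= threshold for dim in relevant)


# A correct answer (evidence 1.0) seen in the tutor versus confirmed by a guardian.
in_tutor = update_score(0.5, 1.0, trust=1.0)
guardian = update_score(0.5, 1.0, trust=0.4)

scores = {"procedural": 0.9, "conceptual": 0.9, "application": 0.2}
ignoring_application = is_mastered(
    scores, {"procedural": 1.0, "conceptual": 1.0, "application": 0.0})
requiring_application = is_mastered(
    scores, {"procedural": 1.0, "conceptual": 1.0, "application": 1.0})
```

Under this assumed formula the same correct answer moves the score from 0.5 to about 0.575 when observed in the tutor and to about 0.53 when reported by a guardian, so softer evidence needs more repetitions to reach the threshold.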
Status & honest gaps
Built & tested in .NET today: the domain — curriculum graph, mastery thresholds, misconceptions, reach-back, POPIA consent (106 tests). The v2 build: per-learner state storage, the router engine, and the HTTP API the tutor calls each turn. Migrated v1 learners begin with a low-confidence mastery estimate (v1 never recorded mastery) that converges with live evidence.
Conversation & memory
Remembering the learner
A good tutor remembers you. v2 gives Luma real memory in three layers, so each conversation builds on the last. (v1 today keeps only the last ~20 messages, flattened into one string, with no session concept.)
Working memory
The current conversation — cached in Redis, sent to the tutor as a clean, role-tagged history plus a rolling summary, not a jumble of raw messages.
Episodic memory
Short summaries of past sessions ("last week: struggled with fractions, ended positive") so Luma resumes intelligently.
Profile & mastery
Durable facts — grade, language, goals, tone — plus a pointer into the mastery model.
Two learner-facing wins
Transparency: a learner or parent can ask "what do you remember about me?" and Luma answers — trust + privacy. Proactive re-engagement: "Yesterday we were on fractions — keep going?" brings learners back.
Engineering note: conversation state is an externalized, serializable session in DynamoDB, cached in Redis, so compute stays stateless and scales horizontally. Memory is injected into the tutor agent via context providers, not hand-built prompt strings. The confusion/engagement signals v1 already computes — and currently discards into an off-by-default debug loop — become episodic-memory inputs in v2. Inbound messages are idempotency-keyed to end v1's duplicate-reply behaviour.
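The idempotency idea in the engineering note can be sketched as below. This is an in-memory illustration only: the class name, the message ids and the dictionary store are assumptions, and a production version would need a shared store with an atomic "insert if absent" operation (for example a conditional write) so that two workers cannot both process the same delivery.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InboundDeduplicator:
    """Remembers the reply produced for each inbound message id."""
    replies: dict[str, str] = field(default_factory=dict)

    def handle(self, message_id: str, text: str,
               tutor: Callable[[str], str]) -> str:
        if message_id in self.replies:
            # Duplicate delivery: return the stored reply, do not re-run the tutor.
            return self.replies[message_id]
        reply = tutor(text)
        self.replies[message_id] = reply
        return reply


calls: list[str] = []


def tutor(text: str) -> str:
    calls.append(text)  # track how often the expensive path runs
    return f"reply to: {text}"


dedup = InboundDeduplicator()
r1 = dedup.handle("msg-001", "help with fractions", tutor)
r2 = dedup.handle("msg-001", "help with fractions", tutor)  # redelivered webhook
r3 = dedup.handle("msg-002", "thanks", tutor)
```

The redelivered message gets the same reply back and the tutor runs only twice for three deliveries, which is the behaviour that ends duplicate replies.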
Accessibility & inclusion
Designing for every learner — including neurodivergent children
A tutor that adapts to the individual is inclusive by nature. Luma can accommodate learners with ADHD, autism and other neurodivergent profiles — not as a clinical or diagnostic tool, but through learner-led pacing, multimodal explanation, and a calm, encouraging style. The accommodations that help neurodivergent learners tend to help everyone.
In plain terms: some children learn best in short bursts, some need to hear it, some need to see it, and some just need extra patience and encouragement. Because Luma already adapts to each learner, it can lean into those needs — shorter steps, voice, pictures and short videos, gentle check-ins, and never making a child feel bad for getting something wrong.
How Luma accommodates — a little goes a long way
Small steps, one idea at a time
Break concepts into short, single-focus turns — a natural fit for WhatsApp and for shorter attention spans. No walls of text.
Learner-led pace, no rushing
The learner model already paces to mastery and reaches back to fill gaps — a learner can slow down or repeat, without judgment.
Choose the modality
Text, voice, diagrams, and short personalised videos — deliver a concept in the form that lands for that learner.
Focus-friendly & predictable
A calm, low-distraction interface, a clear "what's next", consistent routines, and optional gentle check-in reminders.
Encouraging, low-shame
Never shames a wrong answer; celebrates progress; eases the anxiety that so often surrounds learning for struggling students.
Remembered preferences
"Shorter answers", "more visuals", "extra encouragement" save to the learner's profile and apply every turn — set by a learner, parent or teacher.
Implementation: model these as accommodation settings on the learner profile (response length, modality preference, pacing, reminder cadence). They flow into the tutor agent through the same context providers as memory, and bias the coherence router toward smaller steps and more practice. This follows Universal Design for Learning — multiple means of engagement, representation, and expression — so the same levers serve every learner.
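One possible shape for these settings is sketched below. The field names and allowed values are hypothetical, chosen only to mirror the four settings listed above, and the two helper functions are illustrative stand-ins for a context provider and for the router bias.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Accommodations:
    response_length: str = "standard"     # "short" or "standard"
    modality: str = "text"                # "text", "voice", "visual", "video"
    pacing: str = "standard"              # "standard" or "gentle"
    reminder_hours: Optional[int] = None  # None means no check-in reminders


def context_lines(acc: Accommodations) -> list[str]:
    """Instructions a context provider could pass to the tutor agent."""
    lines = []
    if acc.response_length == "short":
        lines.append("Keep replies short: one idea per message.")
    if acc.modality != "text":
        lines.append(f"Prefer {acc.modality} explanations where available.")
    if acc.pacing == "gentle":
        lines.append("Go slowly and offer extra encouragement.")
    return lines


def practice_items(acc: Accommodations, base: int = 3) -> int:
    """Gentle pacing biases toward more practice before moving on."""
    return base + 2 if acc.pacing == "gentle" else base


default = Accommodations()
adapted = Accommodations(response_length="short", modality="voice", pacing="gentle")
```

Because the settings are plain profile data, the same object can be set by a learner, parent or teacher and read on every turn without any special-case logic in the tutor.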
Handled with care: this is supportive design, not diagnosis or therapy. Luma complements — never replaces — teachers, specialists and caregivers, and is best co-designed with them and with neurodivergent learners themselves. The platform follows accessibility standards (e.g. WCAG). We deliberately keep claims modest: a little thoughtful accommodation, applied consistently, is what helps.
Platform UI · beyond WhatsApp
The own platform — a full learning workspace
WhatsApp is the wedge; the own platform is the depth. For what a chat thread can't do well — guided-learning books, practice sets, a progress dashboard, note-taking, visualisations — Luma reuses the open-source DeepTutor project as a head start, with Luma's learner model as the brain.
In plain terms: WhatsApp is perfect for "help me right now." A web/app platform adds what a chat can't show well — a dashboard of what you've mastered, interactive "living books" that teach a topic, practice quizzes, and somewhere to keep notes. Rather than build all of that from scratch, Luma starts from a mature open-source learning app (DeepTutor) and puts its own brain behind it.
Why reuse DeepTutor
Mature, and moving fast
20k+ GitHub stars; latest release v1.2.3 (Apr 2026); a recent agent-native rewrite (~200k lines). A living project, not an abandoned demo.
A whole workspace, already built
Chat, Deep Solve, Quiz, Deep Research, Visualize, Math Animator, an interactive Book Engine ("living books"), a multi-doc Co-Writer, a Knowledge Hub, and per-user memory.
Modern, hostable stack
Next.js 16 / React 19 front end, FastAPI backend, Docker images, full OpenAPI — the same kind of stack Luma already operates.
Apache-2.0 licence
Permissive — fork, rebrand and ship commercially with attribution. No copyleft trap.
Multi-user built in
Turn on auth and it becomes multi-tenant: per-user isolated workspaces, admin-curated knowledge bases and skills, an audit trail.
Saves roughly a year
Reusing this is about a year of front-end and learning-tooling the team doesn't have to build — freeing it to focus on the learner model and scale.
The reuse model — one brain, DeepTutor as the surface
The key tension to resolve
DeepTutor ships its own memory, RAG and teaching logic. Luma's learner model is the single source of truth. So Luma inverts the brain: keep DeepTutor's interface and study tools, but subordinate its pedagogy to the learner model, feed it Luma's curriculum, and store data in Luma's POPIA-compliant systems. Running both brains side by side would create two competing sources of truth — the exact trap to avoid.
DeepTutor provides the interface and rich study tools; Luma's learner model, grounding and memory stay the system of record behind an integration adapter. The CLI authors curriculum content headlessly.
How DeepTutor's parts map to Luma
DeepTutor | Role in Luma | Integration
Web app (Next.js 16) | The Luma platform UI shell | Fork + rebrand (theming, Skills/Souls)
Capabilities (Deep Solve, Deep Research, Visualize, Quiz, Book) | Study tools for web and WhatsApp | Called as tools via /plugins/capabilities/{name}/execute-stream or run --format json
Book Engine ("living books") | Luma "Guided Learning" | Fed from the canonical curriculum; compiled headlessly via the Book API / CLI
Knowledge Bases + RAG | Grounding content surface | Populated from Luma's canonical curriculum store
Memory (L2 / L3 API) | Subordinate to Luma memory + learner model | Fed / overridden via the memory API; learner model stays source of truth
Turn decision | Comes from Luma | An adapter/capability calls Luma's /next-move each turn
Local data/ storage | Replaced | Redirected to Luma's POPIA stores (Postgres / S3, in-region)
Where the DeepTutor CLI & Server API fit
The part that wasn't obvious: the CLI and its serve API are an integration and authoring boundary — not a learner-facing surface. They give Luma three concrete things:
1 · Engine for a custom front end
The unified turn WebSocket (/api/v1/ws) streams the same events the UI uses, and full OpenAPI generates typed clients — so Luma's own UI or backend can drive the engine directly.
2 · Capabilities as tools
Deep Solve, Deep Research, Visualize, Math Animator and Quiz can be invoked programmatically — so the WhatsApp tutor gains them too, not just the web platform.
3 · Headless content authoring
In the content pipeline, kb create/add and book compile Luma's approved curriculum into knowledge bases and Guided-Learning books automatically (dev-agents can drive it via its SKILL.md handover).
Fit by product surface
Strong fit
Luma School
DeepTutor's multi-user mode maps almost natively: a school is a tenant, the teacher is the admin (curating knowledge bases, skills and models), students are invited users with isolated workspaces. This is DeepTutor's design centre — class/school scale — and the fastest path to a real platform.
Reuse UI, Luma brain
Consumer platform (millions)
DeepTutor's per-user file/SQLite storage is built for classes, not millions of anonymous individuals. Here Luma reuses the interface and capabilities but keeps the learner model, memory and POPIA data on its own scalable core.
Registration & onboarding — capture school information
Design opportunity: because schooling is compulsory for school-age children in South Africa (broadly ages 7–15, up to Grade 9), onboarding can ask for school information at registration. That's more than a form field — it links a learner to Luma School and a teacher, strengthens safeguarding and guardian-consent flows, and enriches the learner model with school and grade context from the very first turn.
Links to School
School + class maps a learner into the right Luma School tenant and teacher dashboard — turning individual users into a connected classroom.
Safeguarding & consent
Knowing a child's school supports guardian consent, age-appropriate handling, and a route to a trusted adult.
Immediate personalisation
Grade, curriculum and school context sharpen the learner model from turn one, before any mastery evidence exists.
Collect proportionately under POPIA (data minimisation), with guardian consent, treating school details as personal information; make it easy but not a hard barrier to a free service. Compulsory-schooling specifics and any reporting duties should be confirmed with counsel — this is a product/design consideration, not legal advice.
Platform surfaces (information architecture)
Tutor: chat, in-language, voice (the same brain as WhatsApp)
Guided Learning: interactive "living books" from the curriculum
Practice: mastery-targeted quizzes & question bank
Progress: mastery dashboard for learner, parent, teacher
Notes: Co-Writer & notebooks
Knowledge: curriculum & uploaded material
Visualize: diagrams, charts, math animations
Account: school, consent, language, guardian links
Risks & how they're held
Fast upstream drift (weekly releases): keep Luma customisations at the edges — auth/SSO, a memory adapter, curriculum sync, theming — and avoid deep core edits so upstream can be rebased.
Two brains: the memory + /next-move inversion must be deliberate, or DeepTutor's pedagogy competes with the learner model.
Tenancy scale: school-scale is native; the consumer platform needs Luma's stores behind DeepTutor's UI.
Data residency / POPIA: DeepTutor defaults to local data/ — redirect to in-region Luma stores.
Licence: Apache-2.0 — fork/rebrand/commercial use is fine with attribution. ✓
Phased approach
1. Fork & rebrand: Stand up DeepTutor as "Luma", themed, fed one real curriculum slice as a KB + Guided-Learning book. Internal only.
2. Invert the brain: Wire the memory + /next-move adapter; expose DeepTutor capabilities as tools to the WhatsApp tutor too.
3. Luma School pilot: Multi-user, per-school, curated by teachers, with school-linked onboarding (DeepTutor's native strength).
4. Consumer platform: DeepTutor UI on Luma's scalable stores; converge with the WhatsApp learner base.
Recommendation
Reuse DeepTutor as the platform's interface and study-tooling layer, and as a capability engine the WhatsApp tutor can also call — but keep Luma's learner model as the one brain and Luma's POPIA stores as the one system of record. Start with Luma School (where DeepTutor's multi-user design fits natively), use school-linked onboarding, and treat the CLI / Server API as the integration + content-authoring boundary, never the source of truth.
Sources (verified 2026-07-02): DeepTutor README (release v1.2.3, 2026-04-24; 20k+ stars; Next.js 16 / FastAPI; Apache-2.0), docs.deeptutor.info Server API (/api/v1/ws, OpenAPI, capability / memory / book endpoints) and Multi-User Deployment (JWT auth, per-user file/SQLite workspaces, admin grants, single-process caveat); github.com/HKUDS/DeepTutor.
Content & curriculum
The material Luma teaches from
Grounding is only as good as the content behind it. Luma has real curriculum material today — but in three incompatible shapes. v2 unifies them into one canonical, cited store that both the learner model and grounding read from.
CAPS knowledge (SA)
Official CAPS curriculum PDFs, parsed and indexed for retrieval (Maths, Physical Sciences, Business Studies live). Great for grounding, but these are documents, not structured skills.
Lesotho learning-outcome graph
Hand-verified structured outcomes (e.g. 40 Grade-8 Maths outcomes) — the shape the learner model wants. Piloted, quality-checked.
Legacy curriculum export
An earlier scraped course/unit/component format — mostly placeholder content.
The moat is authoring, not scraping
Turning raw curriculum into assessable knowledge components — with correct answers, misconception diagnostics and prerequisite links — is genuine R&D and the defensible asset. v2 builds an LLM-assisted authoring pipeline with human review: the model proposes drafts; educators approve them into the canonical store. Nothing reaches a learner unreviewed.
Target: one canonical curriculum store, made up of the knowledge-component graph in Postgres plus a pgvector/S3 Vectors chunk table, so grounding and pedagogy share one source. The three legacy representations get a retirement path. Content tools already in hand: an OCR/parse pipeline for scanned material and a self-hosted media/notebook generator for study assets. Retrieval is scoped to the knowledge component (KC), so answers stay on-topic and cite their source.
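To illustrate what scoping retrieval to a knowledge component means, here is a minimal in-memory sketch. The chunk fields, the two-dimensional toy embeddings and the source labels are made up for the example; in the target design the similarity search would run in the vector store rather than in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    kc_id: str              # knowledge component this chunk belongs to
    source: str             # citation shown with the answer (placeholder labels here)
    text: str
    embedding: list[float]  # toy 2-d vectors; real embeddings are much larger


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], kc_id: str,
             chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Filter to the target KC first, then rank by similarity."""
    scoped = [c for c in chunks if c.kc_id == kc_id]
    ranked = sorted(scoped, key=lambda c: cosine(query_embedding, c.embedding),
                    reverse=True)
    return ranked[:top_k]


chunks = [
    Chunk("fractions.add", "doc-A p.1", "Adding fractions with like denominators", [1.0, 0.0]),
    Chunk("fractions.add", "doc-A p.2", "Word problems with fractions", [0.0, 1.0]),
    Chunk("decimals.add", "doc-B p.7", "Adding decimals", [1.0, 0.0]),
]

hits = retrieve([0.9, 0.1], "fractions.add", chunks, top_k=1)
citations = [c.source for c in hits]
```

The decimals chunk is just as similar to the query as the best fractions chunk, but it is never returned because the filter runs before the ranking. That is the property that keeps answers on-topic.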
Data & privacy · POPIA
Handling children's data responsibly
Luma serves minors, so privacy is a first-order design constraint. Data lives in South Africa (AWS Cape Town), is encrypted, governed by consent, and — with edge delivery — can stay on-device entirely.
Per-user data — done at scale
"A storage bucket per user" is the instinct; at a million users it's an anti-pattern (S3 allows 10,000 buckets by default, up to 1,000,000 by request, with per-bucket fees past 2,000). Instead each learner gets an isolated prefix in shared storage, with per-prefix access control, tiered by how the data is used:
Structured, low-latency state stays hot; the per-user "data lake" and vector recall sit in cheaper tiers.
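The prefix-per-learner layout can be sketched as follows. The key scheme, including the two-character hash shard and the tier names, is an assumption for illustration, and a plain dictionary stands in for object storage. The example also shows why dropping a prefix is a clean way to delete one learner's data.

```python
import hashlib


def learner_prefix(learner_id: str) -> str:
    """Isolated key prefix for one learner inside shared storage."""
    # Hypothetical scheme: a short hash shard spreads learners across the keyspace.
    shard = hashlib.sha256(learner_id.encode()).hexdigest()[:2]
    # The trailing slash stops "learner-1" from matching "learner-10".
    return f"learners/{shard}/{learner_id}/"


def object_key(learner_id: str, tier: str, name: str) -> str:
    return f"{learner_prefix(learner_id)}{tier}/{name}"


def forget_learner(store: dict[str, bytes], learner_id: str) -> int:
    """Delete everything under the learner's prefix; return how many objects went."""
    prefix = learner_prefix(learner_id)
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


# A dict stands in for the shared bucket.
store = {
    object_key("learner-1", "lake", "session-001.json"): b"{}",
    object_key("learner-1", "vectors", "episodes.bin"): b"",
    object_key("learner-10", "lake", "session-001.json"): b"{}",
}
removed = forget_learner(store, "learner-1")
```

Per-prefix access policies and lifecycle rules can then be attached to the same prefix, so isolation, tiering and deletion all hang off one naming convention.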
Privacy commitments
Consent & guardianship
Guardian relationships and consent modelled explicitly, with an append-only audit trail.
Residency & encryption
Data stays in-region; personal information encrypted; access isolated per learner.
Right to be forgotten
A learner's data deletes cleanly — storage prefix dropped, hot records purged.
Sovereignty via the edge
Offline school-boxes can keep a child's data on the device — it never leaves the classroom.
Compute & inference strategy · powered by NVIDIA
How we serve a free tutor to millions
Luma's cost is dominated by AI inference, not servers. "Free forever" therefore depends on driving cost-per-turn toward zero by running our own models on efficient NVIDIA infrastructure — reserving paid frontier models for only the hardest turns. NVIDIA Inception (confirmed) provides the credits, hardware and software to do exactly this.
The model router — a three-tier strategy
The Agent Framework's connectors (Bedrock, Anthropic, OpenAI, Ollama/NIM) make the tier a configuration choice — swapping in self-hosted models needs no change to product code. Quality is guarded by evaluation sets so the cheap model never degrades learning.
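A router of this kind might look roughly like the sketch below. The tier names, difficulty limits and the two self-hosted cost figures are hypothetical placeholders; only the frontier figure reuses the v1 per-message cost quoted in this document. The sketch shows the policy the text describes: prefer the cheapest tier trusted with the turn, stay under a per-turn cost cap, and keep the tier list as data so swapping models is a configuration change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_turn_usd: float  # placeholder figures except the frontier tier
    max_difficulty: float     # hardest turn (0 to 1) this tier is trusted with


# Ordered cheapest first. In practice the limits would come from evaluation sets.
TIERS = [
    Tier("self-hosted-small", 0.0001, 0.5),
    Tier("self-hosted-large", 0.0004, 0.8),
    Tier("frontier-api", 0.0016, 1.0),
]


def route(difficulty: float, cost_cap_usd: float,
          tiers: list[Tier] = TIERS) -> Tier:
    """Pick the cheapest affordable tier trusted with this difficulty."""
    affordable = [t for t in tiers if t.cost_per_turn_usd <= cost_cap_usd]
    if not affordable:
        raise ValueError("cost cap is below the cheapest tier")
    for tier in affordable:
        if difficulty <= tier.max_difficulty:
            return tier
    # Nothing affordable is rated for this turn: use the strongest affordable tier.
    return affordable[-1]


easy = route(difficulty=0.3, cost_cap_usd=0.002)
hard = route(difficulty=0.9, cost_cap_usd=0.002)
hard_but_capped = route(difficulty=0.9, cost_cap_usd=0.0005)
```

How turn difficulty is estimated is left open here; the point is that the cost cap and the quality gate are both enforced in one place, outside product code.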
What NVIDIA Inception provides, mapped to how we use it
Personalised short explainer videos per learner; cached & safety-reviewed
Edge | Jetson AGX Thor (128 GB, 40–130 W); Orin Nano Super (~$249) | Offline "school-in-a-box" running a quantised tutor with no internet
Enablement | DLI training credits; solutions architect; AI Enterprise licence | Upskill the team; NVIDIA help standing up self-hosted inference
Voice-first, in-language
Learners already send voice notes. With Riva speech models, Intake transcribes a voice note (speech-to-text) and Render can reply with spoken audio (text-to-speech) in the learner's language. This is a major accessibility unlock for low-literacy learners and younger children, and it aligns directly with NVIDIA's "sovereign/edge AI" narrative.
Generative media — personalised video
With owned NVIDIA GPUs, Luma can generate short, custom explainer videos tailored to an individual learner — their worked example, at their level, in their language — for concepts that simply land better in motion. It reuses the same inference fabric the model router already manages, and complements DeepTutor's Math Animator and interactive "living-book" visuals.
Used deliberately: video generation is compute-heavy, so Luma generates it for high-value concepts, caches and reuses clips across learners where the example is shared, and routes every child-facing clip through safety and quality review. Personalised video is also a strong accessibility lever (see Accessibility & inclusion).
Edge & offline — the "school-in-a-box"
Where connectivity is poor, a Jetson device runs a quantised tutor + a slice of the learner model + the relevant curriculum entirely offline in a classroom. It serves turns locally, stores progress on-device (POPIA-clean — data never leaves), and syncs mastery + events back to the cloud when a connection returns.
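The store-locally, sync-later behaviour can be pictured as an outbox on the device. The sketch below is an assumption about one reasonable shape for it: the class name, event fields and upload callback are hypothetical, and real sync would also need authentication and conflict handling.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EdgeOutbox:
    """Queue of mastery events held on the device until the cloud is reachable."""
    pending: deque = field(default_factory=deque)
    next_seq: int = 1

    def record(self, learner_id: str, kc_id: str, score: float) -> None:
        self.pending.append(
            {"seq": self.next_seq, "learner_id": learner_id,
             "kc_id": kc_id, "score": score}
        )
        self.next_seq += 1

    def sync(self, upload: Callable[[dict], bool]) -> int:
        """Send events oldest first; stop at the first failure and keep the rest."""
        sent = 0
        while self.pending:
            if not upload(self.pending[0]):
                break
            self.pending.popleft()
            sent += 1
        return sent


box = EdgeOutbox()
box.record("learner-7", "fractions.add", 0.6)
box.record("learner-7", "fractions.add", 0.7)

offline_sent = box.sync(lambda event: False)  # no connection: nothing leaves

received: list[int] = []


def cloud_upload(event: dict) -> bool:
    received.append(event["seq"])
    return True


online_sent = box.sync(cloud_upload)  # connection is back
```

An event is removed only after the upload reports success, so a dropped connection mid-sync loses nothing, and the sequence numbers let the cloud side apply events in order and ignore any it has already seen.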
Edge hardware — a two-tier device model
With hardware funding, Luma can build and ship its own classroom devices. Two complementary tiers keep unit cost low while extending all the way to full offline AI tutoring:
Tier 1 · Raspberry Pi 5 (~$80): content & hotspot node
An RPi 5 acts as a local Wi-Fi hotspot + web server, hosting courses, Guided-Learning books, past papers and the tutor's web UI. Learners connect their own phones with no internet and no data cost. Cheap enough to place in many classrooms.
Tier 2 · Jetson ($249–$3.5k): on-device AI brain
A Jetson Orin Nano or AGX Thor runs the quantised tutor model locally for full offline AI tutoring, pairing with the RPi 5 (Pi serves content + hotspot, Jetson serves the AI) or standing alone.
The combined "school-in-a-box" = RPi 5 (hotspot + hosted content) + Jetson (local inference). Courses and content are pushed to the Pi; the Jetson serves the tutor; progress stays on-device (POPIA-clean) and syncs to the cloud when a connection returns. The Pi tier alone (no Jetson) still delivers hosted courses + the tutor UI offline, with AI turns served on the next sync — a very low-cost rollout option.
Honest caveats on self-hosting
Ops burden: running model-serving (NIM/vLLM) and GPU fleets adds real operational work vs a pure API. Budgeted as an infrastructure workstream.
Quality guardrails: the router must be gated by evaluation sets so a cheaper self-hosted model never quietly worsens learning outcomes.
Framework note: Microsoft Agent Framework's turnkey durable hosting favours Azure; on AWS we implement our own state/checkpoint stores + use SQS — deliberate work, and it keeps us cloud-consistent.
Cost & unit economics
Why "free" is sustainable
The single question investors and partners ask about free AI education is "how do you afford it?" The answer is unit economics: measure every turn's cost, and drive the marginal cost of a turn toward zero by self-hosting.
~$0.0016
v1 cost per message today (frontier API, 2 calls/msg)
~$343k/mo
projected API-only floor at 1M active learners
→ near-zero
marginal cost target via self-hosted inference
Illustrative. Frontier-API cost per turn barely improves with volume; self-hosted inference amortises fixed GPU capacity across more turns, so cost per turn falls as Luma grows — the opposite of the usual scaling worry.
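The arithmetic behind these figures can be checked in a few lines. Dividing the ~$343k/mo projection by the ~$0.0016 per-message cost implies roughly 214 million messages a month, or about 214 per learner at 1M active learners. The self-hosted side of the sketch uses a hypothetical fixed monthly cost (`HYPOTHETICAL_FIXED_USD`, not a Luma figure) purely to show the shape of the curve: per-turn cost falls as volume rises, while the API price per turn stays flat.

```python
API_COST_PER_MESSAGE = 0.0016   # USD, v1 figure quoted above
API_FLOOR_AT_1M = 343_000       # USD per month, projection quoted above

# Monthly message volume implied by the two quoted figures.
implied_messages = API_FLOOR_AT_1M / API_COST_PER_MESSAGE

# ASSUMPTION: illustrative fixed GPU + ops cost per month, not a Luma figure.
HYPOTHETICAL_FIXED_USD = 50_000


def api_cost_per_turn(turns_per_month: float) -> float:
    """Per-token pricing: cost per turn does not change with volume."""
    return API_COST_PER_MESSAGE


def self_hosted_cost_per_turn(turns_per_month: float,
                              fixed_monthly_usd: float = HYPOTHETICAL_FIXED_USD) -> float:
    """Fixed capacity amortised over volume (ignores adding GPUs as load grows)."""
    return fixed_monthly_usd / turns_per_month


def break_even_turns(fixed_monthly_usd: float = HYPOTHETICAL_FIXED_USD) -> float:
    """Monthly volume above which self-hosting is cheaper per turn than the API."""
    return fixed_monthly_usd / API_COST_PER_MESSAGE
```

The simplification to note is that real capacity is stepwise: past a certain load more GPUs are needed, so the true curve is a series of falling segments, not one smooth line.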
How credits fund the transition
AWS Activate (≤$100k) and Nebius (≤$150k) partner credits, plus DGX Cloud discounts, fund production inference while we stand up owned capacity — so the shift to self-hosting is paid for by the programme, not by burning runway.
Governance built in
v2 tracks cost per turn from day one (v1 tracks none), caches repeated work, and enforces per-turn cost caps in the router. A live cost dashboard sits beside the outcomes dashboard.
Figures: v1 per-message cost and the ~$343k/mo 1M-user projection are from the team's internal scalability & cost assessment (frontier model, ~2 uncached calls/message); the launch baseline is ~$1.60 per 1,000 messages with a $2.00 red line. NVIDIA credit ceilings are programme maxima, tier-dependent. The cost curve is illustrative, not a forecast.
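The governance rules above (measure every turn, cache repeated work, cap per-turn cost, watch the red line) might look like the following in miniature. `CostLedger` and its methods are hypothetical; only the $1.60 baseline and $2.00 red line per 1,000 messages come from the figures above, and exact-match prompt caching is a deliberate simplification.

```python
from dataclasses import dataclass, field

BASELINE_PER_1K = 1.60   # USD per 1,000 messages, launch baseline quoted above
RED_LINE_PER_1K = 2.00   # USD per 1,000 messages, red line quoted above


@dataclass
class CostLedger:
    """Hypothetical per-turn cost tracker with an exact-match response cache."""
    per_turn_cap_usd: float
    costs: list = field(default_factory=list)
    cache: dict = field(default_factory=dict)

    def run_turn(self, prompt: str, estimate_usd: float, generate) -> str:
        if prompt in self.cache:
            self.costs.append(0.0)  # repeated work is served from the cache
            return self.cache[prompt]
        if estimate_usd > self.per_turn_cap_usd:
            raise ValueError("turn exceeds the per-turn cost cap")
        reply = generate(prompt)
        self.costs.append(estimate_usd)
        self.cache[prompt] = reply
        return reply

    def cost_per_1k(self) -> float:
        """Average cost per 1,000 turns recorded so far."""
        if not self.costs:
            return 0.0
        return 1000 * sum(self.costs) / len(self.costs)

    def over_red_line(self) -> bool:
        return self.cost_per_1k() > RED_LINE_PER_1K
```

A dashboard could read `cost_per_1k()` directly and compare it with both the baseline and the red line.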
Non-functional requirements
The qualities the architecture must guarantee
The hard part of a free AI tutor is the millionth learner, at a cost that keeps it free, without ever failing a child mid-lesson.
Scale — target 1M+ concurrent
Stateless compute behind queues; databases partitioned; no single-instance ceilings. (v1 today is effectively capped near one instance by in-memory defaults, ~25 concurrent generations, and three databases on one small instance.)
Reliability — no silent failures
Every turn checkpointed & resumable; inbound idempotency so duplicates can't double-reply; dead-letter queues catch the rest. (Directly fixes v1's duplicate-reply amplifier and its single-worker outage mode.)
Performance
The teaching decision returns in well under a second (learner-model p99 ≤ ~150 ms, excluding the LLM), so most of the learner's wait is the reply streaming.
Cost
Per-turn cost measured & governed; self-hosted-first routing (see Compute).
Safety
Child-appropriate behaviour, human-in-the-loop escalation, answers grounded in approved material.
Observability
Dashboards for cost, latency, reliability and — the point of it all — learning outcomes.
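The reliability requirement above (inbound idempotency plus dead-letter queues) can be sketched as follows. This is an in-memory illustration under assumptions: `InboundProcessor` is a hypothetical name, the retry limit of 3 is arbitrary, and a production version would keep the settled-ID set in a durable store and not in process memory.

```python
from collections import deque


class InboundProcessor:
    """Hypothetical consumer: dedupes by message ID, dead-letters repeated failures."""

    def __init__(self, handler, max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.settled = set()         # IDs already replied to or dead-lettered
        self.attempts = {}           # failure count per message ID
        self.dead_letters = deque()  # messages parked for human inspection

    def receive(self, message_id: str, body: str) -> str:
        if message_id in self.settled:
            return "skipped"  # duplicate delivery: never reply twice
        try:
            self.handler(body)
        except Exception:
            count = self.attempts.get(message_id, 0) + 1
            self.attempts[message_id] = count
            if count >= self.max_attempts:
                self.dead_letters.append((message_id, body))
                self.settled.add(message_id)  # stop reprocessing a poisoned message
                return "dead-lettered"
            return "retry"
        self.settled.add(message_id)
        return "ok"
```

A duplicate webhook delivery with the same `message_id` returns `"skipped"` without calling the handler again, which is the property that prevents a double reply.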
Infrastructure & topology
Where it runs
v2 runs on AWS in Cape Town, adds a GPU inference tier (owned + credited NVIDIA), and extends to the classroom edge — all reproducible as code.
Self-hosted model serving (NIM/vLLM) on owned RTX PRO 6000 Server nodes + DGX Cloud burst on credits.
Edge (reach)
Jetson "school-in-a-box" for offline classrooms; syncs when connectivity returns.
Reproducibility: v1's live AWS environment was built by hand in the console (the Terraform was archived and never executed). v2 adopts infrastructure-as-code (Terraform/Pulumi) from the start, so environments are reproducible and drift-free. Deployment is via CI/CD with staged rollout and documented rollback. WhatsApp sending moves to multiple numbers to clear the single-number rate ceiling.
NVIDIA Inception — the strategic infrastructure partner
Confirmed membership provides the credits, discounted developer & server GPUs, model-serving/speech software, and edge kits that make owned inference and offline reach affordable. In short: it converts "serve millions for free" from a funding problem into an engineering one.
Delivery roadmap
From proven product to national platform
v2 is built beside the live product and converges only after it out-performs on a pilot. Every phase boundary is a safe stopping point. NVIDIA assets are layered in as each phase needs them.
Phase A
Prove the loop (internal)
v2 core: tutor workflow, learner-model runtime, grounding over a small slice of real, human-authored curriculum; self-hosted inference on dev hardware. Exit: one test learner completes a grounded, remembered, multi-session journey.
Phase B
Pilot cohort (new learners)
Bounded new-learner/school pilot on v2, full cost + outcome instrumentation, voice enabled. Exit: v2 meets go/no-go thresholds on accuracy, latency, cost and learning gain.
Phase C
Shadow existing learners
Silently run live v1 turns through v2 and compare — zero learner impact; self-hosted models carry the majority of turns. Exit: v2 ≥ v1 on agreed metrics.
Phase D
Migrate, converge & reach the edge
Move learners in waves with rollback windows; retire v1; deploy offline school-boxes; begin mastery-evolution on real data. Data migrates on rails via the event backbone.
How the build parallelises (workstreams)
Work splits along clean seams so many engineers — and AI dev-agents under senior review — build at once without colliding, provided interfaces are frozen first and enforced by automated contract tests.
Workstream | Owns
Core & turn workflow | The tutor workflow and agents
Learner-model runtime | Mastery state, coherence router, API
Memory & sessions | Three-tier memory + context providers
Grounding & curriculum | Canonical curriculum store + retrieval
Content authoring | LLM-assisted, human-reviewed pipeline
Inference & cost | Model router, NIM/vLLM serving, cost accounting
Infrastructure | Stores, event wiring, IaC, GPU tier, edge
Channels | WhatsApp/web/voice adapters + cohort routing
Data & migration | v1→v2 ETL + parity harness
Planning & decisions
What to decide, in what order, and who owns it
The working surface for planning: the open decisions, a rough timeline and critical path, the first sprint, and the roles to hire. Owners marked TBD are for the team to assign.
Decisions register
Decision | Recommendation | Status · owner
Core stack / learner-model runtime | .NET 10 + MAF; learner model = runtime source of truth | Decided
Rebuild approach | Greenfield parallel platform; converge later | Decided
Self-hosted model family | Standardise on one open-weight family (8B–70B) | Open · TBD
Second WhatsApp number for v2 | Second number for the pilot (isolation + headroom) | Open · TBD
Reuse vs rebuild v1 account-api | Reuse accounts/onboarding as shared services | Open · TBD
Pilot surface | WhatsApp-first for the pilot cohort | Open · TBD
Edge (RPi 5 / Jetson) timing | Phase D, once cloud is proven | Open · TBD
DeepTutor reuse depth | School-first; reuse UI + capabilities, invert the brain | Open · TBD
Doc hosting | Internal portal + separate public/investor brief | In progress
Timeline & critical path (rough)
Critical path: freeze contracts → infra + stores → core turn loop → real curriculum slice → pilot cohort → shadow → migrate. Indicative windows (to set with the team):
Phase | Window | Milestone
A: core loop | Q3 2026 | One learner: grounded, remembered, multi-session journey
B: pilot cohort | Q4 2026 | Bounded pilot on v2 meets go/no-go
C: shadow | Q1 2027 | v2 ≥ v1 on agreed metrics
D: migrate + edge | 2027 | Waves migrated; offline boxes; evolution
Sprint 0 — the first 2–4 weeks
De-risk & freeze
Verify prod topology — is v1 silently on in-memory defaults?
Freeze the v0 contracts (turn, memory, retrieval) + contract tests.
Edge is a Phase-D deliverable on proven cloud foundations; sync designed for eventual consistency
Many parallel builders diverge
Freeze interfaces first; contract tests as the merge gate; senior review in the loop
Over-reliance on programme credits
Credits accelerate, not sustain; owned-inference break-even and white-label revenue underpin the model
Team & ways of working
How this gets built
Luma builds with a reviewed, AI-assisted engineering process: many development agents working in parallel, with senior engineers reviewing and approving every change.
Contracts first
Interfaces between components are agreed and frozen before parallel work starts — what lets a large, partly-automated team converge instead of thrash.
Tests as the gate
Every change ships with tests; contract tests and parity checks against the existing learner-model act as automated merge gates.
Senior humans in the loop
Product owners, architects and team leads review this documentation and approve direction; senior engineers review code before it lands.
Living documentation
This portal is the shared reference for technical and non-technical stakeholders, kept current as decisions are made.
Open decisions still needing a human call
A second WhatsApp number for v2, or route by cohort on the existing number.
Which open-weight model family to standardise on for self-hosting (drives GPU sizing).
How much of v1's account services to reuse vs rebuild.
Whether the pilot cohort is WhatsApp-first or launches on the new web/voice surface.
Edge timeline — commit Jetson school-boxes in Phase C or hold to Phase D.
Glossary & sources
Plain-language glossary
Agent — an AI unit that reads input, uses tools, and writes a response. In Luma, agents explain; they don't decide the lesson plan.
Workflow — a defined series of steps (one tutoring turn) that runs reliably and can resume if interrupted.
Learner model — Luma's structured picture of what a learner knows; the source of truth for teaching.
Knowledge component — one small teachable idea (e.g. "squaring a number"), linked to its prerequisites.
Mastery vector — three scores per idea: can do it, understands it, can apply it.
Misconception — a specific wrong idea Luma detects and corrects, distinct from not knowing yet.
Grounding — pulling approved, cited curriculum into an answer so it's correct, not invented.
Model router — the switch that sends easy turns to a cheap self-hosted model and hard turns to a premium one.
NIM / NeMo / Riva — NVIDIA software for serving models (NIM), fine-tuning (NeMo) and speech (Riva).
Jetson — low-power NVIDIA hardware that runs AI locally — the offline "school-in-a-box".
POPIA — South Africa's data-protection law; governs handling of learners' (especially children's) data.
CAPS — South Africa's national school curriculum, which Luma is aligned to.
Source documents
This portal synthesises the following working documents (companion artifacts in this workspace):
v1 system review & build plan — survey of the existing platform, scale/cost ceilings.
v2 architecture, interface contracts, and migration/convergence plan (greenfield .NET/MAF).
Traction figures (250k+ learners, 8M+ interactions, 86 countries, UN/TIME), cost figures, and the NVIDIA relationship are as stated by the Luma team / internal assessments. Cost curve is illustrative. Hardware prices are indicative 2026 figures and move with supply.
Luma v2: Enterprise Platform Documentation. Prepared for product, architecture and delivery review, and for investor/partner readability. Status legend: Live · In build · Planned. Kept current as the build proceeds.
Deep dive
The tutoring turn loop
One MAF workflow runs every turn. Steps 2 & 3 run in parallel; step 4 (the learner model) decides the move, step 5 (the agent) only renders it.
Microsoft Agent Framework
Agents render; graph workflows orchestrate the turn — typed, checkpointed and resumable (a crash doesn't re-spend tokens).
Stateless compute
The AgentSession + workflow checkpoints live in DynamoDB, so any worker can pick up any turn — horizontal scale.
One brain
Pedagogy is decided at step 4 by the learner model; the agent never overrides it.
Event backbone
SQS/SNS between services; every inbound message is idempotency-keyed so a duplicate can't double-reply.
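The shape of the turn loop described above (two independent steps in parallel, then a decision, then rendering) can be illustrated in a language-neutral way. The sketch below is Python with `asyncio`, not the Microsoft Agent Framework API. What steps 2 and 3 actually do is not stated in this section, so `load_memory` and `retrieve_curriculum` are assumed stand-ins, and the move names are hypothetical.

```python
import asyncio


async def load_memory(learner_id: str) -> dict:
    """ASSUMED step 2: fetch what is remembered about the learner (stubbed)."""
    await asyncio.sleep(0)
    return {"learner": learner_id, "last_topic": "fractions"}


async def retrieve_curriculum(message: str) -> list:
    """ASSUMED step 3: fetch grounded curriculum passages (stubbed)."""
    await asyncio.sleep(0)
    return [f"passage for: {message}"]


def decide_move(memory: dict, passages: list) -> str:
    """Step 4: the learner model picks the teaching move (stubbed rule)."""
    return "explain" if passages else "ask_clarifying_question"


def render(move: str, passages: list) -> str:
    """Step 5: the agent renders the chosen move and never changes it."""
    return f"[{move}] " + " ".join(passages)


async def run_turn(learner_id: str, message: str) -> str:
    # Steps 2 and 3 do not depend on each other, so they run concurrently.
    memory, passages = await asyncio.gather(
        load_memory(learner_id), retrieve_curriculum(message)
    )
    move = decide_move(memory, passages)
    return render(move, passages)
```

Keeping `decide_move` a plain function separate from `render` mirrors the "one brain" rule: the rendering step receives the move as an input and has no way to override it.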
Curriculum and learner state live in six Postgres schemas, composed deterministically.
POPIA from the schema up
PII lives only on the guardian record, encrypted (AES-256-GCM); the learner carries a display name only; right-to-erasure is a tested one-command operation.
Per-user, at scale
Not a bucket per user — an isolated S3 prefix per learner + tiered vectors (hot pgvector, cold S3 Vectors).
Residency
In-region (AWS af-south-1); offline edge boxes can keep a child's data on the device.
Deterministic & auditable
Same CAPS input + parser version ⇒ identical IDs and projections — fully replayable from the event log.
Source: Learner Model Canonical Spec v1.1 §2–§7; v2 data design (this bundle).
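The "same CAPS input + parser version ⇒ identical IDs" property can be obtained by deriving IDs from content, with no random or time-based input. The following is a minimal sketch assuming a SHA-256 hash over the source reference, parser version and a path within the document. The `kc_` prefix, the 16-character truncation and the argument names are hypothetical and are not taken from the canonical spec.

```python
import hashlib
import json


def kc_id(caps_source: str, parser_version: str, kc_path: str) -> str:
    """Derive a stable knowledge-component ID from its inputs alone."""
    # A JSON array gives an unambiguous encoding of the three fields.
    key = json.dumps([caps_source, parser_version, kc_path], separators=(",", ":"))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return "kc_" + digest[:16]
```

Re-running ingestion over the same document with the same parser version reproduces every ID, while bumping the parser version yields a new, distinguishable set.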
Deep dive
Platform UI & DeepTutor
Reuse the open-source DeepTutor project as the platform surface, but keep Luma's learner model as the one brain.
Luma School — native fit
DeepTutor's multi-user mode maps almost 1:1: school = tenant, teacher = admin, students = invited users with isolated workspaces.
Consumer platform
Reuse the interface + capabilities, but keep the learner model, memory and POPIA data on Luma's scalable core (DeepTutor storage is class-scale).
Invert the brain
DeepTutor's own memory/RAG is subordinated; the learner model stays the source of truth (no two brains).
Apache-2.0
Fork, rebrand and ship commercially with attribution — ~a year of frontend saved.
Curriculum is ingested from official CAPS material, never invented (Invariant 1).
Authoring is the moat
Assessable items with graduated hints + dimension_signal — LLM-drafted at authoring time, human-reviewed. Runtime delivery is free and auditable.
One canonical store
The KC graph in Postgres + pgvector/S3 Vectors for grounding — retiring the three legacy curriculum formats.
Durable across revisions
Conflict detection re-points overlays when CAPS moves a target (like a git rebase) — CAPS-as-Spine stays stable.
Grounded answers
Retrieval is KC-scoped, so answers stay on-topic and cite their CAPS source.
Source: Learner Model Canonical Spec v1.1 §3 (CAPS-as-Spine ingestion).
Deep dive
Accessibility & inclusion
Universal Design for Learning: the same levers that help neurodivergent learners help everyone.
Accommodation settings
Response length, modality, pacing and reminder cadence save to the learner profile and flow into the tutor + router — set by learner, parent or teacher.
Handled with care
Supportive design, not diagnosis or therapy — it complements teachers, specialists and caregivers, and follows accessibility standards (WCAG).
Source: this bundle's Accessibility & inclusion section; Universal Design for Learning.
Deep dive
The learner model
A per-child, multi-dimensional model of what a learner knows: the system of record that decides what to teach next. Faithful to the canonical spec (v1.1).
Illustrative example: this learner can do the procedure (0.72) but doesn't yet understand it (0.41) or apply it (0.18), so drilling alone would miss the point.
The five moves the router can choose
How mastery updates (the maths, simplified)
Each answer nudges only the dimensions the item tests, weighted by how much we trust the source and how well the difficulty matches the learner. The update uses α ≈ 0.15.
A knowledge component is mastered only when every relevant dimension ≥ its threshold (default 0.85; a procedural-only skill authors {0.85, 0, 0}). Cross-dimension movement happens only through the item's dimension_signal: a pure drill {1,0,0} moves procedural and nothing else.
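The exact update formula is not reproduced in this section, so the following is one plausible form offered only as a sketch: each dimension moves toward the outcome by α, scaled by the item's dimension_signal, the source trust and a difficulty-match factor. The function names, the moving-average form and the `difficulty_match` argument are assumptions. The value α ≈ 0.15, the trust weights, the halving for suspected gaming and the thresholds come from the text.

```python
ALPHA = 0.15  # value quoted in the text; its exact role in the spec's formula is assumed

# Order of dimensions: (procedural, conceptual, application)


def update_mastery(mastery, dimension_signal, correct,
                   trust=1.0, difficulty_match=1.0, gaming_suspected=False):
    """ASSUMED form: move each tested dimension toward the outcome (1 or 0)."""
    weight = trust * difficulty_match * (0.5 if gaming_suspected else 1.0)
    target = 1.0 if correct else 0.0
    return tuple(
        m + ALPHA * signal * weight * (target - m)
        for m, signal in zip(mastery, dimension_signal)
    )


def is_mastered(mastery, thresholds=(0.85, 0.85, 0.85)):
    """Mastered only when every dimension meets its threshold."""
    return all(m >= t for m, t in zip(mastery, thresholds))
```

Under this assumed form, a correct pure drill with signal (1, 0, 0) on a learner at (0.72, 0.41, 0.18) moves procedural by 0.15 × (1 − 0.72) = 0.042 and leaves the other two dimensions untouched. The same answer confirmed by a parent (trust 0.4) moves it by 0.0168. The rule that a parent's confirmation can never pass the model's own observation is not modelled here.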
Evidence & trust
In-tutor answers count fully (trust 1.0); a parent's confirmation nudges gently (0.4) and can never fast-forward past the model's own observation. Suspected gaming (rushing, answer-fishing, hint-exhaustion) halves the weight — the event is still logged honestly.
Misconceptions
Tracked as their own strength 0–1. A matching diagnostic answer raises it; a correct answer lowers it faster. Crossing 0.6 flags it active (the tutor addresses it); falling below 0.2 marks it remediated.
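The two thresholds form a hysteresis band: a misconception becomes active at 0.6 and is only marked remediated once it drops below 0.2, so it does not flicker between states. A minimal sketch follows, assuming fixed step sizes. `RAISE_STEP` and `LOWER_STEP` are hypothetical values chosen only so that a correct answer lowers the strength faster than a diagnostic match raises it; the class and status names are likewise illustrative.

```python
from dataclasses import dataclass

ACTIVE_AT = 0.6          # from the text
REMEDIATED_BELOW = 0.2   # from the text
RAISE_STEP = 0.25        # ASSUMPTION: illustrative step size
LOWER_STEP = 0.35        # ASSUMPTION: larger than RAISE_STEP ("lowers it faster")


@dataclass
class Misconception:
    strength: float = 0.0
    status: str = "inactive"

    def observe(self, matched_diagnostic: bool) -> None:
        """Update strength from one answer, then re-evaluate the status."""
        if matched_diagnostic:
            self.strength = min(1.0, self.strength + RAISE_STEP)
        else:
            self.strength = max(0.0, self.strength - LOWER_STEP)
        if self.strength >= ACTIVE_AT:
            self.status = "active"
        elif self.strength < REMEDIATED_BELOW and self.status == "active":
            self.status = "remediated"
        # Between the two thresholds the status is left unchanged.
```

With these step sizes, three diagnostic matches activate the misconception, one correct answer leaves it active (strength near 0.4), and a second correct answer remediates it.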
Reach-back pacing
The learner has an expected position (from the term plan) and an active one. If a prerequisite is missing, the active position points back to fill it, then resumes — a learner can "move on with the class" while a gap is quietly repaired underneath.
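Reach-back can be read as a search over the prerequisite graph: start from the expected position and walk back to the deepest prerequisite that is not yet mastered. The sketch below assumes the graph is a plain dictionary from a knowledge component to its prerequisites, and `active_position` is a hypothetical name. How the real router orders competing gaps is not specified here.

```python
def active_position(expected_kc, prerequisites, mastered):
    """Return the KC to teach now: the deepest unmastered prerequisite, else expected_kc."""

    def first_gap(kc, seen):
        for pre in prerequisites.get(kc, []):
            if pre in seen:
                continue  # already examined via another path
            seen.add(pre)
            deeper = first_gap(pre, seen)
            if deeper is not None:
                return deeper  # repair the earliest missing idea first
            if pre not in mastered:
                return pre
        return None

    gap = first_gap(expected_kc, set())
    return gap if gap is not None else expected_kc
```

Once the gap is mastered, calling the function again with the updated `mastered` set returns the next gap or, when none remain, the expected position, which is the "fill it, then resume" behaviour described above.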
Curriculum: ingested, not authored
The knowledge-component graph, prerequisites and named misconceptions are derived deterministically from official CAPS material — Luma never paraphrases or invents curriculum. Human authoring survives only as reviewed overlays and assessment questions.
The eight non-negotiable invariants
Ingest CAPS; never author it — every derived row traces to a CAPS source + version.
Deterministic end to end — same input + version ⇒ identical graph, IDs and projections.
Mastery is a 3-D vector {procedural, conceptual, application}, never a scalar.
Misconceptions are first-class — actively-applied wrong rules, not just "low mastery".
Evidence is multi-source and trust-weighted (in-tutor 1.0, parent 0.4).
Multilingual on day one — content carries language; the learner carries current + home language.
POPIA from the schema up — PII only on the guardian record, encrypted; one-command erasure.
Runtime-agnostic system of record — owns curriculum, mastery, misconceptions and routing; the tutor calls it over HTTP.
How it's built (engineering)
.NET 10, Clean Architecture. Learner state is held as event-sourced projections (proj.LearnerKCMastery, proj.LearnerMisconceptionState) over an append-only log (session.initialized, ai.answered, misconception.detected, reach_back.initiated, coherence.measured…). Curriculum composes as caps_derived → overlay → app. In v2 this runs behind Luma's turn loop; the spec's SpacetimeDB live-cache and Apache-AGE graph are deferred in favour of Postgres, keeping the pedagogy identical.
Source: DigitalRoot Learner Model — Canonical Specification v1.1 (2026-06-02, amended 2026-06-10). Mastery values shown are illustrative.
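An event-sourced projection, as described under "How it's built", is a fold over the append-only log: state is never edited in place and can be rebuilt at any time by replaying events. The sketch below is in Python for brevity (the runtime itself is .NET) and is an assumption-laden illustration. The event type names come from the text, but the field names (`learner_id`, `misconception_id`) and the shape of the projection are hypothetical.

```python
from collections import defaultdict


def project(events):
    """Fold an append-only event log into a per-learner projection."""
    state = defaultdict(lambda: {"answers": 0, "misconceptions": set()})
    for event in events:
        learner = state[event["learner_id"]]
        if event["type"] == "ai.answered":
            learner["answers"] += 1
        elif event["type"] == "misconception.detected":
            learner["misconceptions"].add(event["misconception_id"])
        # Other event types only register the learner here; a fuller projection
        # would handle them too.
    return dict(state)
```

Because `project` depends only on its input, replaying the same log always yields the same projection, which is what makes the state auditable and fully replayable.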