NestleAI
AI-Powered Parenting Platform
AI-powered parenting platform focused on intelligent family support systems and AI-assisted user experiences. Built production mobile applications and backend infrastructure supporting AI-powered workflows, notifications, analytics, user tracking, and cloud-based content systems.
System Focus
- Multi-tenant SaaS architecture
- Real-time AI inference pipeline
- Mobile-first offline-capable design
- Event-driven microservices
Security
- JWT + refresh token rotation
- End-to-end encryption for sensitive data
- Rate limiting & abuse prevention
- HIPAA-aware data handling
- Row-level security in PostgreSQL
Executive Summary
NestleAI is an AI-powered parenting companion mobile application. The backend system serves as the intelligence layer for the entire product — powering real-time AI conversations, adaptive sleep and schedule recommendations, video memory generation, co-parent coordination, emergency support mode, and a rich multimodal content platform.
The backend is a single-process Node.js/TypeScript monolith deployed via containerized infrastructure, intentionally chosen for early-stage velocity while maintaining enough modularity to extract microservices where needed. It handles over 30 distinct API feature domains, 15+ background worker queues, and a deeply layered AI pipeline that routes each user interaction through safety guardrails, personality selection, context enrichment, and model selection before producing a response.
The system is designed around three core principles:
- Safety First. Any surface that touches infant care must pass safety validation before a response reaches the user. A centralized safety doctrine validator, age-gated guardrails, and medical image detection were built before any feature work.
- Graceful Degradation. Every AI-dependent feature has a fallback. Circuit breakers protect TTS and chat AI. Workers retry with exponential backoff. The app continues functioning when any third-party dependency fails.
- Privacy as Infrastructure. Telemetry is categorized into three tiers with global kill switches. User content never leaks into analytics events. Privacy enforcement happens at the middleware layer, not the application layer.
High-Level Architecture
NestleAI's backend is a modular Express.js monolith running on Node.js with TypeScript, deployed as a single container that also hosts all background workers in-process.

The architecture is organized in strict layers: Route Handlers → Middleware Chain → Service Layer → Repository Layer → PostgreSQL. Redis backs the BullMQ worker queue system and all circuit breaker state. External services — OpenAI, Cloudflare R2, Clerk, RevenueCat, PostHog, Sentry, Expo Push, and YouTube Data API — are all isolated behind service wrappers with graceful fallback behavior.
External service integrations:
| Service | Purpose |
|---|---|
| OpenAI API | Chat, Vision, TTS, Sleep Whisperer, Schedule Generation |
| Cloudflare R2 | Object storage — images, videos, documents, TTS audio |
| Clerk | Authentication and user management |
| RevenueCat | Subscription billing |
| PostHog | Product analytics |
| Sentry | Error tracking |
| Expo Push | Mobile push notifications |
| YouTube Data API | Content syndication |
| Puppeteer/Chromium | PDF export rendering |
Technology Stack
| Category | Technology | Rationale |
|---|---|---|
| Runtime | Node.js + TypeScript | Type safety across the entire stack |
| Framework | Express.js | Battle-tested, minimal, composable middleware |
| Database | PostgreSQL via pg (raw SQL) | Full control, no ORM abstraction overhead |
| Queue System | BullMQ + Redis (ioredis) | Reliable job processing with visibility |
| AI Provider | OpenAI (GPT-4o, GPT-4o-mini, GPT-4.5) | Best-in-class multimodal capabilities |
| Object Storage | Cloudflare R2 | S3-compatible, zero egress cost |
| Auth | Clerk (@clerk/backend) | Managed auth with webhook support |
| Billing | RevenueCat | Cross-platform subscription management |
| Analytics | PostHog | Self-hostable product analytics |
| Error Tracking | Sentry | Full exception capture with context |
| Push Notifications | Expo Push | Cross-platform iOS/Android |
| Schema Validation | Zod v4 | Runtime type validation at boundaries |
| PDF Export | Puppeteer (Chromium) | Headless rendering for schedule exports |
| Testing | Vitest | Fast ESM-compatible test runner |
AI Personality Engine
The Personality Engine is the single most impactful piece of novel engineering in the backend. It is the brain that shapes every AI response the user receives.
NestleAI ships 10 distinct personality modes, spanning free and premium tiers — from a calming Nurturing Therapist to a detail-obsessed Pediatric Expert to a Sleep Whisperer with its own dedicated model and timeout configuration.
Each personality has versioned prompts stored in ai_prompt_versions, with a draft/published workflow managed through the Admin Prompt Studio. Prompt engineers iterate on personality voice without touching code or triggering a deployment.
Personality Resolution Pipeline:
- Resolve personalityId from user settings
- Detect sentiment from the message (stressed / worried / cheerful / neutral)
- Determine context domain (sleep / feeding / emergency / general)
- Fetch the published versioned prompt from the database (falls back to hardcoded defaults)
- Assemble full system prompt: personality voice + response budget rules + formatting rules + context addons + safety guardrails + baby context + time-of-day modifiers
- Select model based on tier and whether the message contains an image
- Call OpenAI with the assembled prompt and the message history window
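The resolution steps above can be sketched as a small pure function. This is an illustration under stated assumptions, not the production code: the keyword heuristics in `detectSentiment` and `detectDomain`, the `PromptTable` type, and the prompt fragments are all hypothetical, and the database lookup is modeled as an in-memory record.

```typescript
type Sentiment = "stressed" | "worried" | "cheerful" | "neutral";
type ContextDomain = "sleep" | "feeding" | "emergency" | "general";

// Hypothetical keyword heuristics standing in for the real detectors.
function detectSentiment(message: string): Sentiment {
  const text = message.toLowerCase();
  if (/exhausted|overwhelmed|can't cope/.test(text)) return "stressed";
  if (/worried|scared|anxious/.test(text)) return "worried";
  if (/yay|so happy|great news/.test(text)) return "cheerful";
  return "neutral";
}

function detectDomain(message: string): ContextDomain {
  const text = message.toLowerCase();
  if (/emergency|not breathing/.test(text)) return "emergency";
  if (/nap|sleep|bedtime|waking/.test(text)) return "sleep";
  if (/feed|bottle|formula|nursing/.test(text)) return "feeding";
  return "general";
}

interface PromptRequest {
  personalityId: string;
  message: string;
  babyContext: string;
}

type PromptTable = Record<string, string | undefined>;

function assembleSystemPrompt(
  request: PromptRequest,
  publishedPrompts: PromptTable, // stand-in for rows in ai_prompt_versions
  defaultPrompts: PromptTable // stand-in for the hardcoded fallbacks
): string {
  // Published prompt first, then the hardcoded default, then a generic voice.
  const voice =
    publishedPrompts[request.personalityId] ??
    defaultPrompts[request.personalityId] ??
    "You are a supportive parenting companion.";
  return [
    voice,
    `User sentiment: ${detectSentiment(request.message)}`,
    `Context domain: ${detectDomain(request.message)}`,
    "Safety: never give medical diagnoses or treatment instructions.",
    `Baby context: ${request.babyContext}`,
  ].join("\n");
}
```

Keeping the assembly step free of I/O makes it straightforward to unit-test each prompt fragment in isolation.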
Response Budget Rules:
- Default: 2 short paragraphs, max 90 words
- If stressed or worried: 1–2 paragraphs, max 70 words
- If late_night: 1 paragraph, max 55 words
- Explicit requests for steps/plans → cap at 160 words
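A minimal sketch of how these budget rules might be resolved. The source does not say which rule wins when several apply, so the precedence here is an assumption: an explicit request for steps overrides everything, then the tightest applicable cap is used. The `BudgetSignals` shape and the function name are hypothetical.

```typescript
interface ResponseBudget {
  maxParagraphs: number | null; // null when the rule specifies only a word cap
  maxWords: number;
}

interface BudgetSignals {
  sentiment: "stressed" | "worried" | "cheerful" | "neutral";
  lateNight: boolean;
  wantsSteps: boolean; // the user explicitly asked for steps or a plan
}

function resolveBudget(signals: BudgetSignals): ResponseBudget {
  // Assumption: explicit step/plan requests take priority over shorter budgets.
  if (signals.wantsSteps) return { maxParagraphs: null, maxWords: 160 };
  // Assumption: late night is checked first because it is the tightest cap.
  if (signals.lateNight) return { maxParagraphs: 1, maxWords: 55 };
  if (signals.sentiment === "stressed" || signals.sentiment === "worried") {
    return { maxParagraphs: 2, maxWords: 70 };
  }
  return { maxParagraphs: 2, maxWords: 90 };
}

// Renders the budget as an instruction line for the system prompt.
function budgetInstruction(budget: ResponseBudget): string {
  const paragraphs =
    budget.maxParagraphs === null ? "" : `at most ${budget.maxParagraphs} short paragraph(s), `;
  return `Respond in ${paragraphs}max ${budget.maxWords} words.`;
}
```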
Model Selection:
| Tier | Has Image | Model |
|---|---|---|
| Free | No | gpt-4o-mini |
| Free | Yes | gpt-4o-mini (vision) |
| Premium | No | Configurable (env-driven) |
| Premium | Yes | gpt-4o (full vision) |
| Sleep Whisperer | — | Dedicated OPENAI_SLEEP_MODEL |
All model configs are environment-variable-driven — temperature, max tokens, and model name can all be tuned without a deployment.
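The selection table can be expressed as a short function. This is a sketch under assumptions: `OPENAI_SLEEP_MODEL` is named in the table, but `OPENAI_PREMIUM_MODEL`, the `sleep_whisperer` personality id, and the fallback values used when a variable is unset are hypothetical.

```typescript
type Tier = "free" | "premium";

interface ModelEnv {
  OPENAI_PREMIUM_MODEL?: string; // hypothetical variable name
  OPENAI_SLEEP_MODEL?: string; // named in the table above
}

interface ModelRequest {
  tier: Tier;
  hasImage: boolean;
  personalityId: string;
}

function selectModel(request: ModelRequest, env: ModelEnv): string {
  // Sleep Whisperer uses its own dedicated model regardless of tier.
  if (request.personalityId === "sleep_whisperer") {
    return env.OPENAI_SLEEP_MODEL ?? "gpt-4o-mini"; // fallback is an assumption
  }
  // Free tier uses gpt-4o-mini with or without an image.
  if (request.tier === "free") return "gpt-4o-mini";
  // Premium with an image uses full gpt-4o vision.
  if (request.hasImage) return "gpt-4o";
  // Premium text-only is configurable through the environment.
  return env.OPENAI_PREMIUM_MODEL ?? "gpt-4o"; // fallback is an assumption
}
```

Passing the environment in as an argument, rather than reading it inside the function, keeps the routing logic testable without touching process state.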
AI Chat Pipeline
The chat pipeline is stateful, context-aware, and protected by multiple independent safety layers. Every generated response passes through safety validation before it reaches the user.

Context Enrichment (assembled in parallel on every message):
| Source | Service |
|---|---|
| Sleep context | buildSleepContext() + formatEnhancedSleepContextForPrompt() |
| Feeding context | buildFeedingContext() + formatEnhancedFeedingContextForPrompt() |
| Journey context | formatJourneyContextForPrompt() |
| Emotion context | buildEmotionContext() + formatEmotionContextForPrompt() |
| Development context | buildDevelopmentContext() + formatDevelopmentContextForPrompt() |
| Rhythm/EASY doctrine | formatRhythmDoctrineForPrompt() |
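Parallel assembly of the sources in the table can be sketched with `Promise.all`. The `ContextSource` interface and the section formatting are hypothetical, and skipping a failed source (instead of failing the whole request) is an assumption consistent with the graceful-degradation principle, not confirmed behavior.

```typescript
interface ContextSource {
  label: string;
  load: (babyId: string) => Promise<string>;
}

async function assembleContext(babyId: string, sources: ContextSource[]): Promise<string> {
  // Start every loader at once; total latency is roughly the slowest source.
  const sections = await Promise.all(
    sources.map(async (source) => {
      try {
        return `## ${source.label}\n${await source.load(babyId)}`;
      } catch {
        // Assumption: a failing source is dropped so the chat can still respond.
        return "";
      }
    })
  );
  return sections.filter((section) => section.length > 0).join("\n\n");
}
```

Because each loader catches its own error, one slow or broken context source cannot reject the combined promise.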
Safety Guards (pre- and post-generation):
- Medical Image Detection: intercepts vision calls before processing to block inappropriate diagnostic image requests
- Sleep Guardrails: post-generation validator that checks for unsafe sleep surface advice, stomach-sleeping recommendations, loose bedding, or bed-sharing as a primary recommendation
- Safety Doctrine Validator: the central safety-validator.service.ts runs against all responses and is domain-aware (sleep / feeding / general / illness)
- Feeding Guardrails: analogous to the sleep guardrails, intercepts dangerous feeding advice (honey before 12 months, choking hazards, etc.)
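A post-generation, domain-aware validator along these lines might look as follows. The rule ids, regular expressions, and fallback text are hypothetical examples, not the production rule set. Applying every rule when the domain is "general" is also an assumption.

```typescript
type SafetyDomain = "sleep" | "feeding" | "illness" | "general";

interface SafetyRule {
  id: string;
  domains: SafetyDomain[];
  pattern: RegExp;
  // Optional age gate: the rule applies only when this returns true.
  appliesAtAge?: (ageMonths: number) => boolean;
}

// Hypothetical rules for illustration only.
const SAFETY_RULES: SafetyRule[] = [
  { id: "stomach-sleeping", domains: ["sleep"], pattern: /sleep(ing)? on (his|her|their) (stomach|tummy)/i },
  { id: "loose-bedding", domains: ["sleep"], pattern: /(add|use) (a )?(loose )?(blanket|pillow|bumper)/i },
  {
    id: "honey-under-12-months",
    domains: ["feeding", "illness"],
    pattern: /\bhoney\b/i,
    appliesAtAge: (ageMonths) => ageMonths < 12,
  },
];

interface ValidationResult {
  safe: boolean;
  violations: string[];
}

function validateResponse(text: string, domain: SafetyDomain, ageMonths: number): ValidationResult {
  const violations = SAFETY_RULES
    // Assumption: a "general" response is checked against every rule.
    .filter((rule) => domain === "general" || rule.domains.includes(domain))
    .filter((rule) => rule.appliesAtAge === undefined || rule.appliesAtAge(ageMonths))
    .filter((rule) => rule.pattern.test(text))
    .map((rule) => rule.id);
  return { safe: violations.length === 0, violations };
}

const SAFE_FALLBACK = "I can't advise on that. Please check with your pediatrician.";

// Returns the model text only when it passes validation.
function guardResponse(text: string, domain: SafetyDomain, ageMonths: number): string {
  return validateResponse(text, domain, ageMonths).safe ? text : SAFE_FALLBACK;
}
```

Regex rules like these only catch phrasing they anticipate, which is why the write-up pairs them with semantic checks and pre-generation detection.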
Circuit Breaker (Redis-backed):
- Opens at 50% failure rate over a rolling 1-minute window (min 10 requests)
- 5-minute cooldown before half-open retry
- State persisted in Redis — survives server restarts, preventing thundering herd on recovery
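The breaker parameters above can be sketched as follows. This is not the production implementation: `BreakerStore` is a hypothetical interface, the in-memory store stands in for Redis so the example stays self-contained, and the half-open handling (a single probe result closes or re-opens the breaker) is an assumption.

```typescript
interface BreakerState {
  openedAt: number | null; // epoch ms when the breaker opened, null when closed
  samples: { at: number; ok: boolean }[]; // outcomes inside the rolling window
}

// Hypothetical storage interface; the real system persists this state in Redis.
interface BreakerStore {
  load(): BreakerState;
  save(state: BreakerState): void;
}

class MemoryBreakerStore implements BreakerStore {
  private state: BreakerState = { openedAt: null, samples: [] };
  load(): BreakerState { return this.state; }
  save(state: BreakerState): void { this.state = state; }
}

const WINDOW_MS = 60_000; // rolling 1-minute window
const COOLDOWN_MS = 5 * 60_000; // 5-minute cooldown before half-open
const MIN_REQUESTS = 10;
const FAILURE_THRESHOLD = 0.5;

class CircuitBreaker {
  constructor(private store: BreakerStore, private now: () => number = Date.now) {}

  status(): "closed" | "open" | "half-open" {
    const { openedAt } = this.store.load();
    if (openedAt === null) return "closed";
    return this.now() - openedAt >= COOLDOWN_MS ? "half-open" : "open";
  }

  record(ok: boolean): void {
    const state = this.store.load();
    const t = this.now();
    if (state.openedAt !== null) {
      // Results are ignored while open; after the cooldown the next result is the probe.
      if (t - state.openedAt >= COOLDOWN_MS) {
        this.store.save(ok ? { openedAt: null, samples: [] } : { openedAt: t, samples: [] });
      }
      return;
    }
    const samples = [...state.samples, { at: t, ok }].filter((s) => t - s.at < WINDOW_MS);
    const failures = samples.filter((s) => !s.ok).length;
    const shouldOpen =
      samples.length >= MIN_REQUESTS && failures / samples.length >= FAILURE_THRESHOLD;
    this.store.save({ openedAt: shouldOpen ? t : null, samples: shouldOpen ? [] : samples });
  }
}
```

Because all state flows through `BreakerStore`, swapping the in-memory store for a Redis-backed one would not change the breaker logic.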
Security Architecture
Security is enforced at every layer — from network ingress to database queries.

Network & Transport Layer:
- Helmet.js sets security-relevant HTTP headers (CSP, HSTS, X-Frame-Options)
- enforceHttps middleware redirects all HTTP traffic in production
- CORS restricted via CORS_ORIGIN env variable
Authentication Layer:
- Clerk JWT validated on every request via JWKS endpoint
- No session tokens in the application — Clerk owns the auth token lifecycle
- Admin routes protected by a separate session mechanism with a minimum 64-character session secret
Application Layer:
- Zod v4 validates all request inputs at the controller boundary before business logic executes
- All SQL uses parameterized inputs — no string interpolation, zero injection risk at the repository layer
- Baby data access controlled through baby_memberships with roles: OWNER / COPARENT / CAREGIVER / VIEW_ONLY
- Per-user rate limits on AI endpoints (Free: 60/hour, Premium: 1,000/hour)
- Brute-force protection middleware on auth-sensitive endpoints
Telemetry Tier Model:
All analytics events are classified into three tiers with user-controllable toggles for Tiers B and C:
| Tier | What it tracks | User control |
|---|---|---|
| A — Essential | Account creation, subscription changes, security events | Always sent |
| B — Diagnostic | API error rates, worker failures, performance signals | Respects send_crash_reports |
| C — Product | Feature usage, funnel progression, content engagement | Respects share_anonymous_analytics |
Global kill switches (TELEMETRY_ENABLED, PRODUCT_ANALYTICS_ENABLED, CRASH_REPORTING_ENABLED) allow instant compliance response without a deployment.
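A sketch of the gating decision implied by the tier table and kill switches. The switch and setting names come from the text, but how they combine is an assumption: here `TELEMETRY_ENABLED` is treated as a master switch that also gates Tier A, `CRASH_REPORTING_ENABLED` gates Tier B, and `PRODUCT_ANALYTICS_ENABLED` gates Tier C.

```typescript
type TelemetryTier = "A" | "B" | "C";

interface TelemetrySwitches {
  TELEMETRY_ENABLED: boolean;
  PRODUCT_ANALYTICS_ENABLED: boolean;
  CRASH_REPORTING_ENABLED: boolean;
}

interface UserPrivacySettings {
  send_crash_reports: boolean;
  share_anonymous_analytics: boolean;
}

function shouldSendEvent(
  tier: TelemetryTier,
  switches: TelemetrySwitches,
  user: UserPrivacySettings
): boolean {
  // Assumption: the master switch silences every tier, including Tier A.
  if (!switches.TELEMETRY_ENABLED) return false;
  // Tier A (essential) has no user toggle.
  if (tier === "A") return true;
  // Tier B (diagnostic) needs both the global switch and the user's consent.
  if (tier === "B") return switches.CRASH_REPORTING_ENABLED && user.send_crash_reports;
  // Tier C (product) needs both the global switch and the user's consent.
  return switches.PRODUCT_ANALYTICS_ENABLED && user.share_anonymous_analytics;
}
```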
Background Jobs & Worker Queue Architecture
All asynchronous work runs through BullMQ backed by Redis. Auto-retry is set to 3 attempts with exponential backoff (starting at 5 seconds). Failed jobs are retained for 7 days; completed jobs for 24 hours.
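The retry and retention policy can be written down as job options. This sketch assumes BullMQ's documented option names (`attempts`, `backoff`, `removeOnComplete`, `removeOnFail`, with `age` in seconds) and mirrors them in a local type so the example runs without the library. The doubling schedule in `retryDelayMs` is an assumption about how the exponential backoff grows, and how the project actually wires these options is not shown in the source.

```typescript
// Local type mirroring the subset of job options used here.
interface RetryJobOptions {
  attempts: number;
  backoff: { type: "exponential"; delay: number }; // delay in ms
  removeOnComplete: { age: number }; // seconds to keep completed jobs
  removeOnFail: { age: number }; // seconds to keep failed jobs
}

const defaultJobOptions: RetryJobOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 5_000 },
  removeOnComplete: { age: 24 * 60 * 60 }, // 24 hours
  removeOnFail: { age: 7 * 24 * 60 * 60 }, // 7 days
};

// Delay before retry N (1-based), doubling from the base delay.
function retryDelayMs(retryNumber: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, retryNumber - 1);
}

// With 3 attempts there are 2 retries after the first failure.
function retrySchedule(options: RetryJobOptions): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < options.attempts; retry++) {
    delays.push(retryDelayMs(retry, options.backoff.delay));
  }
  return delays;
}
```

Under these assumptions a job that keeps failing waits 5 seconds, then 10 seconds, and is then marked failed and retained for 7 days.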

The system runs 25+ distinct workers spanning every major domain. A representative subset:
| Worker | Trigger | Interval |
|---|---|---|
| scheduleGenerationWorker | BullMQ (API) | On demand |
| scheduleDailyHydrationWorker | BullMQ (nightly cron) | 24 hours |
| schedulePdfExportWorker | BullMQ | On demand |
| sleepAlertsWorker | setInterval | Every 2 min |
| sleepPlanAlertsWorker | setInterval | Every 5 min |
| sleepLearningScheduler | setInterval (daily 3 AM) | 24 hours |
| appointmentRemindersWorker | setInterval | Every 15 min |
| documentScanWorker | BullMQ | On demand |
| emergencyFollowupWorker | setInterval | Every 5 min |
| ttsCleanupScheduler | setInterval (daily 3 AM) | 24 hours |
| youtubeSyncWorker | setInterval (daily 3 AM) | 24 hours |
| coParentContextWorker | BullMQ (always-on) | Event-driven |
| analysisWorker (video) | BullMQ | On demand |
| storyboardWorker (video) | BullMQ | On demand |
| renderWorker (video) | BullMQ | On demand |
Sleep Intelligence Engine
The Sleep Intelligence Engine transforms raw session logs into a full analytics platform with AI-driven predictions and proactive alerts. It is one of the most sophisticated subsystems in the backend.
Capabilities:
| Capability | Description |
|---|---|
| Daily Metrics | Total sleep, nap count, night wakings, sleep debt, wake windows |
| Weekly Metrics | Trends, consistency scores, week-over-week comparisons |
| Regression Detection | 4-month, 8-month, separation anxiety, illness regressions, and more |
| Predictions | Next nap time, expected bedtime, overtired alert threshold |
| Crisis Detection | Sleep debt accumulation, night waking spikes, split-night episodes |
| AI Context Builder | Packages all sleep metrics into a structured context blob for Sleep Whisperer |
Crisis Thresholds:
| Metric | Moderate | Severe |
|---|---|---|
| Sleep Debt | 60 min | 120 min |
| Night Wakings (above age baseline) | 5 | 8 |
| Max Wake Window | 4 hours | 6 hours |
| Short Nap Cluster | 3+ naps under 45 min | — |
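The threshold table maps naturally onto a small classifier. This is a sketch: the field names are hypothetical, thresholds are treated as inclusive (a value equal to the threshold triggers that level), and taking the worst per-metric level as the overall severity is an assumption.

```typescript
type Severity = "none" | "moderate" | "severe";

interface SleepSignals {
  sleepDebtMin: number;
  nightWakingsAboveBaseline: number;
  maxWakeWindowHours: number;
  napsUnder45Min: number;
}

interface Threshold {
  moderate: number;
  severe: number;
}

const CRISIS_THRESHOLDS = {
  sleepDebtMin: { moderate: 60, severe: 120 },
  nightWakingsAboveBaseline: { moderate: 5, severe: 8 },
  maxWakeWindowHours: { moderate: 4, severe: 6 },
};

const SEVERITY_RANK: Record<Severity, number> = { none: 0, moderate: 1, severe: 2 };

// Assumption: thresholds are inclusive.
function grade(value: number, threshold: Threshold): Severity {
  if (value >= threshold.severe) return "severe";
  if (value >= threshold.moderate) return "moderate";
  return "none";
}

function classifyCrisis(signals: SleepSignals): { overall: Severity; byMetric: Record<string, Severity> } {
  const byMetric: Record<string, Severity> = {
    sleepDebt: grade(signals.sleepDebtMin, CRISIS_THRESHOLDS.sleepDebtMin),
    nightWakings: grade(signals.nightWakingsAboveBaseline, CRISIS_THRESHOLDS.nightWakingsAboveBaseline),
    wakeWindow: grade(signals.maxWakeWindowHours, CRISIS_THRESHOLDS.maxWakeWindowHours),
    // The table defines only a moderate level for short-nap clusters.
    shortNaps: signals.napsUnder45Min >= 3 ? "moderate" : "none",
  };
  let overall: Severity = "none";
  for (const key of Object.keys(byMetric)) {
    if (SEVERITY_RANK[byMetric[key]] > SEVERITY_RANK[overall]) overall = byMetric[key];
  }
  return { overall, byMetric };
}
```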
A nightly learning loop (sleep-learning.service.ts) analyzes historical patterns across all active babies to refine individual predictions — the foundation for a future personalized ML layer.
Emergency Mode
Emergency Mode is the most safety-critical feature in the product. It is activated when a caregiver is in genuine distress and needs immediate support.
Non-negotiable design principles (hardcoded):
- Never provide medical advice, diagnosis, or treatment instructions
- Always encourage contacting emergency services for urgent situations (911, Poison Control)
- Detect and flag potentially dangerous requests before AI response generation
- Maintain a calm, supportive, non-alarming tone at all times
- Focus exclusively on emotional regulation and practical next steps
The Emergency AI uses a completely separate OpenAI client instance isolated from the main chat client. If OpenAI is unavailable entirely, a curated library of pre-written fallback responses is always served. Safety flag detection runs before the AI call — flagged requests get immediate pre-written guidance without waiting for the model.
A followup worker runs every 5 minutes, monitors active sessions, and sends proactive check-in push notifications. Sessions auto-expire after 60 minutes (configurable). Emergency session data is retained for 90 days; the retention period is configurable to meet compliance requirements.
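The ordering described above (flag detection first, then the isolated model client, then the pre-written library) can be sketched as a three-step function. The flag patterns, guidance strings, and the `EmergencyModel` callback type are hypothetical; the callback stands in for the separate OpenAI client.

```typescript
// Hypothetical flag patterns and pre-written guidance, for illustration only.
const EMERGENCY_FLAGS: { pattern: RegExp; guidance: string }[] = [
  {
    pattern: /not breathing|unresponsive|turning blue/i,
    guidance: "Please call 911 right now. I will stay here with you.",
  },
  {
    pattern: /swallowed|ingested|poison/i,
    guidance: "Please call Poison Control or 911 right away.",
  },
];

const EMERGENCY_FALLBACK =
  "I'm here with you. Take one slow breath. If this feels urgent, call 911.";

// Stand-in for the isolated emergency model client.
type EmergencyModel = (message: string) => Promise<string>;

interface EmergencyReply {
  source: "flag" | "model" | "fallback";
  text: string;
}

async function respondInEmergency(message: string, callModel: EmergencyModel): Promise<EmergencyReply> {
  // 1. Flag detection runs before any model call, so flagged requests never wait.
  for (const flag of EMERGENCY_FLAGS) {
    if (flag.pattern.test(message)) return { source: "flag", text: flag.guidance };
  }
  // 2. Otherwise try the model.
  try {
    return { source: "model", text: await callModel(message) };
  } catch {
    // 3. Provider unavailable: serve a pre-written response, never an error.
    return { source: "fallback", text: EMERGENCY_FALLBACK };
  }
}
```

The function never rejects, which is the property the section cares about: every code path ends in a response the caregiver can read.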
Key Design Decisions
Monolith First, Modular Always
Starting with a single-process Express monolith — instead of microservices — was a deliberate early-stage decision. Worker processes collocated with the API server eliminate distributed system coordination overhead while module boundaries (controllers → services → repositories) keep future extraction tractable.
BullMQ workers are logically independent. Each could be extracted into a separate process or container with minimal refactoring, meaning the monolith is a deployment decision, not an architectural trap.
Raw SQL Over ORM
Direct pg pool queries with explicit mapping functions, rather than Prisma, Drizzle, or TypeORM. Full control over query construction, explicit EXPLAIN ANALYZE tuning, and zero abstraction overhead for complex joins. All queries use parameterized inputs — no string interpolation, eliminating SQL injection risk at the repository layer.
Versioned Prompts in Database
AI personality prompts live in ai_prompt_versions with a draft/published workflow managed through the Admin Prompt Studio, not in code. Prompt engineering iteration (hours/days) runs far faster than the deployment cycle. Non-engineers can iterate on personality voice. Version history enables rollback without a deployment. A/B testing is future-proofed by an environment column.
Centralized Safety Doctrine
A single safety-validator.service.ts is the audit point for all AI response validation. Domain-specific validators (sleep, feeding, emergency, general) compose it. Safety logic is never duplicated. A single place to update policy means every feature that generates content automatically inherits any new rule.
Redis-Backed Circuit Breakers for Cost-Critical Services
TTS and AI chat circuit breakers persist state to Redis. In-memory circuit breaker state is lost on server restart, creating a thundering herd against a degraded provider on recovery. Redis persistence means the breaker stays open across restarts until the provider actually recovers. Without this, a brief OpenAI TTS outage could multiply costs by orders of magnitude through automatic retries.
Engineering Challenges
AI Safety for Infant Care
The boundary between "helpful parenting advice" and "medical advice" is contextual — you cannot write a static rule set that covers every case. The resolution was layered defenses: pre-generation keyword detection, post-generation regex/semantic checks, age-aware guardrails, a fallback response library, and emergency mode completely isolated from the regular AI client with hardcoded system prompt constraints.
Chat Context Assembly at Scale
A meaningful AI chat response needs the baby's current sleep state, recent feeding patterns, active journeys, emotional context, parenting style, and more. All context sources are fetched in parallel (Promise.all). Behavior summaries are pre-computed every 6 hours so the AI does not need raw event data at request time. Chat history is passed as a truncated window rather than the full thread.
TTS Cost Control
Without a Redis-backed circuit breaker, a brief OpenAI TTS outage could result in hundreds of retry attempts hitting a degraded API, massively multiplying costs. The Redis-persistence decision paid for itself immediately. Per-account rate limits (10 requests/minute, 100 requests/day), a 4,000 character cap, and a daily cleanup job removing stale audio files from R2 complete the cost control surface.
Emergency Mode Reliability Guarantee
Emergency Mode was designed as if the main AI pipeline didn't exist. The dependency inversion forced clear thinking about which parts of the system must be resilient to failure. A parent in crisis at 3 AM cannot receive an error screen. The emergency system has a pre-written fallback response library, a separate OpenAI client, and an independent followup worker — none of which depend on the main chat stack.
Schedule Hydration at Scale
Every active baby needs their schedule pre-hydrated every night. The fan-out pattern — one coordinator triggering one BullMQ job per baby — makes each hydration independent and retryable. Failure for one baby doesn't affect others. The scheduleReconciliationWorker catches any drift between template and daily instance.
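The fan-out pattern can be illustrated with an in-memory queue standing in for BullMQ. Everything here is a sketch: `InMemoryQueue`, the job shape, and the deterministic job id (used so that re-running the coordinator does not enqueue duplicates) are assumptions, not details from the source.

```typescript
interface HydrationJob {
  jobId: string;
  babyId: string;
  date: string;
}

// Minimal stand-in for a job queue; the real system enqueues BullMQ jobs.
class InMemoryQueue {
  private jobs = new Map<string, HydrationJob>();

  add(job: HydrationJob): boolean {
    if (this.jobs.has(job.jobId)) return false; // duplicate enqueue is a no-op
    this.jobs.set(job.jobId, job);
    return true;
  }

  drain(): HydrationJob[] {
    const all = Array.from(this.jobs.values());
    this.jobs.clear();
    return all;
  }
}

// Coordinator: one job per baby, so each hydration is independent and retryable.
function fanOutHydration(babyIds: string[], date: string, queue: InMemoryQueue): number {
  let enqueued = 0;
  for (const babyId of babyIds) {
    // Assumption: a deterministic id makes the coordinator idempotent per day.
    if (queue.add({ jobId: `hydrate-${babyId}-${date}`, babyId, date })) enqueued++;
  }
  return enqueued;
}

// Worker loop: a failure for one baby is recorded and does not stop the others.
function processAll(
  queue: InMemoryQueue,
  hydrate: (job: HydrationJob) => void
): { ok: string[]; failed: string[] } {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const job of queue.drain()) {
    try {
      hydrate(job);
      ok.push(job.babyId);
    } catch {
      failed.push(job.babyId);
    }
  }
  return { ok, failed };
}
```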
Lessons Learned
Invest in the safety layer before feature work. For any product in a high-stakes domain (healthcare, childcare, mental health), safety infrastructure is not a feature — it is the foundation. Every subsequent feature that generates AI content hooks into existing safety infrastructure rather than inventing its own.
Prompt engineering is a product discipline. The most impactful changes to user experience quality came not from model changes but from prompt refinements. The response budget rules — 90 words by default, 55 at night, 70 when stressed — transformed user perception of response quality by eliminating verbosity that caused users to stop reading.
Circuit breakers need Redis from day one for cost-critical services. In-memory is fine for rate limiting. It is not fine for services where failure causes financial harm.
Feature flags as infrastructure. Adding FEATURE_* env-validated flags from the beginning allowed merging incomplete features without impacting users. The discipline of never shipping untested features without a flag gate dramatically improved stability.
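One way env-validated flags of this kind might be parsed is sketched below. The strict "true"/"false" validation, the function names, and the example flag `FEATURE_VIDEO_MEMORIES` are hypothetical; the source only states that `FEATURE_*` flags are validated from the environment.

```typescript
type FeatureFlags = Record<string, boolean>;

// Collects every FEATURE_* variable and rejects values other than "true"/"false".
function parseFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  const flags: FeatureFlags = {};
  for (const key of Object.keys(env)) {
    if (!key.startsWith("FEATURE_")) continue;
    const raw = env[key];
    if (raw !== "true" && raw !== "false") {
      // Failing at startup surfaces typos before they silently disable a feature.
      throw new Error(`${key} must be "true" or "false", got ${JSON.stringify(raw)}`);
    }
    flags[key] = raw === "true";
  }
  return flags;
}

// Unknown flags are treated as disabled, so unfinished features stay dark by default.
function isEnabled(flags: FeatureFlags, name: string): boolean {
  return flags[name] === true;
}
```

A call such as `isEnabled(flags, "FEATURE_VIDEO_MEMORIES")` at the route or service boundary is then enough to gate an incomplete feature.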
Graceful degradation must be designed in, not bolted on. The pattern of "try AI, fall back to a safe default" appears in emergency mode, TTS, sleep guardrails, and the chat circuit breaker. In every case, the fallback path was a first-class engineering task. For a product parents rely on at 3 AM with a sick infant, "error screen" is not an acceptable outcome.
Admin tooling is an engineering multiplier. Every bug that could be debugged in the admin panel was a bug that did not need a developer. Every prompt change a product manager made through the Prompt Studio was a deployment a developer did not have to run.
What We Would Do Differently
Distributed rate limiting from day one. The current in-memory rate limiter is explicitly marked as "not cluster-safe." Redis-backed rate limiting (sliding window with sorted sets) should be in place before any horizontal scaling attempt.
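The sliding-window idea can be sketched in memory, with comments marking the Redis sorted-set command each step would correspond to (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`). This is only the algorithm, not a cluster-safe implementation: in Redis the three steps would need to run atomically (for example in a transaction or script), and the class name and key format are hypothetical.

```typescript
class SlidingWindowLimiter {
  // Per-key request timestamps, oldest first (a sorted set scored by time in Redis).
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // ZREMRANGEBYSCORE equivalent: drop entries that fell out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    // ZCARD equivalent: count what remains.
    const allowed = recent.length < this.limit;
    // ZADD equivalent: record this request only when it is admitted.
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

For the Free tier limit quoted earlier, this would be constructed as `new SlidingWindowLimiter(60, 60 * 60 * 1000)`.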
Structured logging. A structured logging library (Pino or Winston) with consistent log levels, request correlation IDs, and structured fields would make production debugging significantly faster. Sentry captures exceptions, but non-error observability is sparse.
Cron as a service. The setInterval crons in index.ts are vulnerable to double-execution in a multi-instance deployment and have no visibility. A dedicated cron service (pg-boss, Inngest, or a serverless scheduler) would be safer and more observable.
Request/response contracts. The API surface spans 30+ domains with no machine-readable contract. Generating an OpenAPI spec from Zod schemas would enable automatic SDK generation, contract testing, and documentation without additional manual work.
Context caching for AI calls. Per-request context assembly makes multiple database reads on every AI chat message. A short-TTL per-baby context cache would reduce database load significantly at scale without meaningfully affecting response quality.
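A short-TTL per-baby cache could be as small as the sketch below. The class, the synchronous loader (the real context assembly is asynchronous), and any specific TTL value are assumptions made to keep the example compact.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  // Returns the cached value while it is fresh, otherwise reloads and stores it.
  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after a write (for example a new sleep log) so the next read is fresh.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

Explicit invalidation on writes keeps the TTL as a safety net, so a parent who just logged a feed does not get advice based on stale context.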
Event sourcing for the timeline. The timeline reconstructs a chronological feed from multiple source tables. An event log pattern — all domain events written to a unified timeline_events table — would make timeline queries a single indexed read instead of a fan-out across N tables.