Joshua Damon
Flagship | Active | 2024

NestleAI

AI-Powered Parenting Platform

AI-powered parenting platform focused on intelligent family support systems and AI-assisted user experiences. Built production mobile applications and backend infrastructure supporting AI-powered workflows, notifications, analytics, user tracking, and cloud-based content systems.

Tech Stack

React Native, NestJS, PostgreSQL, Redis, OpenAI GPT-4o, WebSockets, S3, Docker

System Focus

  • Multi-tenant SaaS architecture
  • Real-time AI inference pipeline
  • Mobile-first offline-capable design
  • Event-driven microservices

Security

  • JWT + refresh token rotation
  • End-to-end encryption for sensitive data
  • Rate limiting & abuse prevention
  • HIPAA-aware data handling
  • Row-level security in PostgreSQL

Performance

<180 ms P95 Latency
99.97% Uptime
<1.2 s AI Response
Engineering Case Study
Internal Document

Executive Summary

NestleAI is an AI-powered parenting companion mobile application. The backend system serves as the intelligence layer for the entire product — powering real-time AI conversations, adaptive sleep and schedule recommendations, video memory generation, co-parent coordination, emergency support mode, and a rich multimodal content platform.

The backend is a single-process Node.js/TypeScript monolith deployed via containerized infrastructure, intentionally chosen for early-stage velocity while maintaining enough modularity to extract microservices where needed. It handles over 30 distinct API feature domains, 15+ background worker queues, and a deeply layered AI pipeline that routes each user interaction through safety guardrails, personality selection, context enrichment, and model selection before producing a response.

The system is designed around three core principles:

  • Safety First. Any surface that touches infant care must pass safety validation before a response reaches the user. A centralized safety doctrine validator, age-gated guardrails, and medical image detection were built before any feature work.
  • Graceful Degradation. Every AI-dependent feature has a fallback. Circuit breakers protect TTS and chat AI. Workers retry with exponential backoff. The app continues functioning when any third-party dependency fails.
  • Privacy as Infrastructure. Telemetry is categorized into three tiers with global kill switches. User content never leaks into analytics events. Privacy enforcement happens at the middleware layer, not the application layer.

High-Level Architecture

NestleAI's backend is a modular Express.js monolith running on Node.js with TypeScript, deployed as a single container that also hosts all background workers in-process.

High-level system architecture diagram showing the full request path from mobile app through Express, middleware, services, repositories, PostgreSQL, Redis, BullMQ workers, and all external services

The architecture is organized in strict layers: Route Handlers → Middleware Chain → Service Layer → Repository Layer → PostgreSQL. Redis backs the BullMQ worker queue system and all circuit breaker state. External services — OpenAI, Cloudflare R2, Clerk, RevenueCat, PostHog, Sentry, Expo Push, and YouTube Data API — are all isolated behind service wrappers with graceful fallback behavior.

External service integrations:

Service | Purpose
OpenAI API | Chat, Vision, TTS, Sleep Whisperer, Schedule Generation
Cloudflare R2 | Object storage: images, videos, documents, TTS audio
Clerk | Authentication and user management
RevenueCat | Subscription billing
PostHog | Product analytics
Sentry | Error tracking
Expo Push | Mobile push notifications
YouTube Data API | Content syndication
Puppeteer/Chromium | PDF export rendering

Technology Stack

Category | Technology | Rationale
Runtime | Node.js + TypeScript | Type safety across the entire stack
Framework | Express.js | Battle-tested, minimal, composable middleware
Database | PostgreSQL via pg (raw SQL) | Full control, no ORM abstraction overhead
Queue System | BullMQ + Redis (ioredis) | Reliable job processing with visibility
AI Provider | OpenAI (GPT-4o, GPT-4o-mini, GPT-4.5) | Best-in-class multimodal capabilities
Object Storage | Cloudflare R2 | S3-compatible, zero egress cost
Auth | Clerk (@clerk/backend) | Managed auth with webhook support
Billing | RevenueCat | Cross-platform subscription management
Analytics | PostHog | Self-hostable product analytics
Error Tracking | Sentry | Full exception capture with context
Push Notifications | Expo Push | Cross-platform iOS/Android
Schema Validation | Zod v4 | Runtime type validation at boundaries
PDF Export | Puppeteer (Chromium) | Headless rendering for schedule exports
Testing | Vitest | Fast ESM-compatible test runner

AI Personality Engine

The Personality Engine is the single most impactful piece of novel engineering in the backend. It is the brain that shapes every AI response the user receives.

NestleAI ships 10 distinct personality modes, spanning free and premium tiers — from a calming Nurturing Therapist to a detail-obsessed Pediatric Expert to a Sleep Whisperer with its own dedicated model and timeout configuration.

Each personality has versioned prompts stored in ai_prompt_versions, with a draft/published workflow managed through the Admin Prompt Studio. Prompt engineers iterate on personality voice without touching code or triggering a deployment.

Personality Resolution Pipeline:

  1. Resolve personalityId from user settings
  2. Detect sentiment from the message (stressed / worried / cheerful / neutral)
  3. Determine context domain (sleep / feeding / emergency / general)
  4. Fetch the published versioned prompt from the database (falls back to hardcoded defaults)
  5. Assemble full system prompt: personality voice + response budget rules + formatting rules + context addons + safety guardrails + baby context + time-of-day modifiers
  6. Select model based on tier and whether the message contains an image
  7. Call OpenAI with the assembled prompt and the message history window

Response Budget Rules:

  • Default: 2 short paragraphs, max 90 words
  • If stressed or worried: 1–2 paragraphs, max 70 words
  • If late_night: 1 paragraph, max 55 words
  • Explicit requests for steps/plans → cap at 160 words
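
A minimal sketch of how these budget rules could be resolved in code. The type names, the function name, and in particular the precedence between overlapping rules (an explicit request for steps wins, then late night, then sentiment) are assumptions for illustration; only the word limits come from the rules above.

```typescript
// Hypothetical types; the word limits mirror the Response Budget Rules above.
type Sentiment = "stressed" | "worried" | "cheerful" | "neutral";

interface BudgetInput {
  sentiment: Sentiment;
  isLateNight: boolean;
  wantsSteps: boolean; // the user explicitly asked for steps or a plan
}

interface ResponseBudget {
  maxWords: number;
  paragraphs: string;
}

function resolveResponseBudget(input: BudgetInput): ResponseBudget {
  // Assumed precedence: explicit step requests, then late night, then sentiment.
  if (input.wantsSteps) {
    return { maxWords: 160, paragraphs: "as needed" };
  }
  if (input.isLateNight) {
    return { maxWords: 55, paragraphs: "1" };
  }
  if (input.sentiment === "stressed" || input.sentiment === "worried") {
    return { maxWords: 70, paragraphs: "1-2" };
  }
  return { maxWords: 90, paragraphs: "2" };
}
```

The resolved budget would then be rendered into the system prompt as a plain instruction, so the limit is enforced by prompt wording rather than by truncating model output.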

Model Selection:

Tier | Has Image | Model
Free | No | gpt-4o-mini
Free | Yes | gpt-4o-mini (vision)
Premium | No | Configurable (env-driven)
Premium | Yes | gpt-4o (full vision)
Sleep Whisperer | | Dedicated OPENAI_SLEEP_MODEL

All model configs are environment-variable-driven — temperature, max tokens, and model name can all be tuned without a deployment.
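
The selection table can be expressed as a small pure function. This is a sketch under assumptions: OPENAI_PREMIUM_MODEL is a hypothetical variable name (the document only names OPENAI_SLEEP_MODEL), the fallback defaults are invented, and giving Sleep Whisperer precedence over the tier rules is a guess at the intended behavior. The environment is passed in as a plain record so the function stays testable.

```typescript
type Tier = "free" | "premium";
type Env = Record<string, string | undefined>;

interface ModelRequest {
  tier: Tier;
  hasImage: boolean;
  isSleepWhisperer: boolean;
}

function selectModel(req: ModelRequest, env: Env): string {
  // Assumption: the Sleep Whisperer personality always uses its dedicated model.
  if (req.isSleepWhisperer) {
    return env["OPENAI_SLEEP_MODEL"] ?? "gpt-4o-mini"; // default is illustrative
  }
  // Free tier uses gpt-4o-mini with or without an image.
  if (req.tier === "free") {
    return "gpt-4o-mini";
  }
  // Premium with an image uses full gpt-4o vision.
  if (req.hasImage) {
    return "gpt-4o";
  }
  // Premium text-only is env-driven; the variable name here is hypothetical.
  return env["OPENAI_PREMIUM_MODEL"] ?? "gpt-4o";
}
```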


AI Chat Pipeline

The chat pipeline is stateful, context-aware, and protected by multiple independent safety layers. Every generated response passes safety validation before it reaches the user.

AI chat pipeline diagram showing the full path from user message through circuit breaker, sentiment detection, personality resolution, context enrichment, model selection, OpenAI API call, sleep guardrail validation, TTS, and delivery

Context Enrichment (assembled in parallel on every message):

Source | Service
Sleep context | buildSleepContext() + formatEnhancedSleepContextForPrompt()
Feeding context | buildFeedingContext() + formatEnhancedFeedingContextForPrompt()
Journey context | formatJourneyContextForPrompt()
Emotion context | buildEmotionContext() + formatEmotionContextForPrompt()
Development context | buildDevelopmentContext() + formatDevelopmentContextForPrompt()
Rhythm/EASY doctrine | formatRhythmDoctrineForPrompt()
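
Parallel assembly of these sources might look like the sketch below. The assembleContext function and the ContextBuilder type are hypothetical; each builder stands in for one build/format pair from the table. Skipping a source that fails, rather than failing the whole message, is an assumption in the spirit of the graceful degradation principle, not confirmed behavior.

```typescript
// Each builder fetches and formats one context section for a baby.
type ContextBuilder = (babyId: string) => Promise<string>;

async function assembleContext(
  babyId: string,
  builders: Record<string, ContextBuilder>,
): Promise<string> {
  // Start every builder at once; Promise.all preserves the input order.
  const sections = await Promise.all(
    Object.values(builders).map(async (build) => {
      try {
        return await build(babyId);
      } catch {
        // Assumption: a failed source is dropped instead of failing the request.
        return "";
      }
    }),
  );
  // Join the non-empty sections into one prompt fragment.
  return sections.filter((section) => section.length > 0).join("\n\n");
}
```

Total latency is then bounded by the slowest source rather than the sum of all of them.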

Safety Guards (pre- and post-generation):

  1. Medical Image Detection: Intercepts vision calls before processing to block inappropriate diagnostic image requests
  2. Sleep Guardrails: Post-generation validator checks for unsafe sleep surface advice, stomach-sleeping recommendations, loose bedding, or bed-sharing as a primary recommendation
  3. Safety Doctrine Validator: Central safety-validator.service.ts runs against all responses, domain-aware (sleep / feeding / general / illness)
  4. Feeding Guardrails: Analogous to the sleep guardrails, intercepts dangerous feeding advice (honey before 12 months, choking hazards, etc.)
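
To illustrate the post-generation step, here is a minimal sketch of a sleep guardrail check. The rule ids, the regular expressions, and the function names are all hypothetical and far simpler than a production rule set would need to be.

```typescript
interface GuardrailResult {
  safe: boolean;
  violations: string[];
}

// Illustrative patterns only; the real rule set is not shown in this document.
const SLEEP_RULES: Array<{ id: string; pattern: RegExp }> = [
  { id: "stomach_sleeping", pattern: /\bsleep(ing)? on (his|her|their|the) (stomach|tummy)\b/i },
  { id: "loose_bedding", pattern: /\b(add|use|tuck in) (a |an |some )?(loose )?(blanket|pillow|bumper)s?\b/i },
  { id: "unsafe_surface", pattern: /\bsleep(ing)? on (a|the) (couch|sofa|armchair)\b/i },
];

function validateSleepAdvice(response: string): GuardrailResult {
  // Collect the id of every rule whose pattern appears in the response.
  const violations = SLEEP_RULES
    .filter((rule) => rule.pattern.test(response))
    .map((rule) => rule.id);
  return { safe: violations.length === 0, violations };
}

function applySleepGuardrail(response: string, fallback: string): string {
  // Unsafe responses are replaced wholesale by a pre-written safe fallback.
  return validateSleepAdvice(response).safe ? response : fallback;
}
```

A pattern check like this also flags negated sentences ("never let her sleep on her stomach"), which is one reason regex rules alone are not sufficient and need a semantic layer next to them.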

Circuit Breaker (Redis-backed):

  • Opens at 50% failure rate over a rolling 1-minute window (min 10 requests)
  • 5-minute cooldown before half-open retry
  • State persisted in Redis — survives server restarts, preventing thundering herd on recovery
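
The breaker logic described above can be sketched as follows. This version keeps its state in memory with an injected clock so the thresholds are easy to follow; in the system described here the same state would be read from and written to Redis. The class and field names are assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerConfig {
  failureThreshold: number; // 0.5 means open at a 50% failure rate
  windowMs: number;         // rolling window length
  minRequests: number;      // minimum samples before the breaker may open
  cooldownMs: number;       // time spent open before a half-open probe
}

class CircuitBreaker {
  private samples: Array<{ at: number; ok: boolean }> = [];
  private openedAt: number | null = null;
  private readonly cfg: BreakerConfig;

  constructor(cfg: BreakerConfig) {
    this.cfg = cfg;
  }

  state(now: number): BreakerState {
    if (this.openedAt === null) return "closed";
    return now - this.openedAt >= this.cfg.cooldownMs ? "half-open" : "open";
  }

  allowRequest(now: number): boolean {
    return this.state(now) !== "open";
  }

  record(ok: boolean, now: number): void {
    const current = this.state(now);
    if (current === "open") return; // nothing should be recorded while open
    if (current === "half-open") {
      // The probe decides: success closes the breaker, failure re-opens it.
      this.samples = [];
      this.openedAt = ok ? null : now;
      return;
    }
    // Closed: keep only samples inside the rolling window, then add this one.
    this.samples = this.samples.filter((s) => now - s.at < this.cfg.windowMs);
    this.samples.push({ at: now, ok });
    const failures = this.samples.filter((s) => !s.ok).length;
    const rate = failures / this.samples.length;
    if (this.samples.length >= this.cfg.minRequests && rate >= this.cfg.failureThreshold) {
      this.openedAt = now;
    }
  }
}
```

With the values listed above, the configuration would be a 0.5 threshold, a 60,000 ms window, 10 minimum requests, and a 300,000 ms cooldown.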

Security Architecture

Security is enforced at every layer — from network ingress to database queries.

Security and privacy architecture diagram showing the three-tier middleware chain, telemetry tier model, and data isolation boundaries

Network & Transport Layer:

  • Helmet.js sets security-relevant HTTP headers (CSP, HSTS, X-Frame-Options)
  • enforceHttps middleware redirects all HTTP traffic in production
  • CORS restricted via CORS_ORIGIN env variable

Authentication Layer:

  • Clerk JWT validated on every request via JWKS endpoint
  • No session tokens in the application — Clerk owns the auth token lifecycle
  • Admin routes protected by a separate session mechanism with a minimum 64-character session secret

Application Layer:

  • Zod v4 validates all request inputs at the controller boundary before business logic executes
  • All SQL uses parameterized inputs with no string interpolation, which removes SQL injection risk at the repository layer
  • Baby data access controlled through baby_memberships with roles: OWNER / COPARENT / CAREGIVER / VIEW_ONLY
  • Per-user rate limits on AI endpoints (Free: 60/hour, Premium: 1,000/hour)
  • Brute-force protection middleware on auth-sensitive endpoints
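
As an illustration of the membership-based access check, the sketch below maps roles to permitted actions. The four role names come from the list above; the action names and the permission matrix are hypothetical, since the document does not say what each role may do.

```typescript
type Role = "OWNER" | "COPARENT" | "CAREGIVER" | "VIEW_ONLY";
type Action = "view" | "log_event" | "edit_baby" | "manage_members";

// Hypothetical permission matrix, chosen only to make the example concrete.
const ALLOWED_ROLES: Record<Action, Role[]> = {
  view: ["OWNER", "COPARENT", "CAREGIVER", "VIEW_ONLY"],
  log_event: ["OWNER", "COPARENT", "CAREGIVER"],
  edit_baby: ["OWNER", "COPARENT"],
  manage_members: ["OWNER"],
};

// One row of the baby_memberships table, reduced to the fields used here.
interface Membership {
  userId: string;
  babyId: string;
  role: Role;
}

function canAccess(
  memberships: Membership[],
  userId: string,
  babyId: string,
  action: Action,
): boolean {
  const membership = memberships.find((m) => m.userId === userId && m.babyId === babyId);
  // No membership row means no access at all.
  return membership !== undefined && ALLOWED_ROLES[action].includes(membership.role);
}
```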

Telemetry Tier Model:

All analytics events are classified into three tiers with user-controllable toggles for Tiers B and C:

Tier | What it tracks | User control
A (Essential) | Account creation, subscription changes, security events | Always sent
B (Diagnostic) | API error rates, worker failures, performance signals | Respects send_crash_reports
C (Product) | Feature usage, funnel progression, content engagement | Respects share_anonymous_analytics

Global kill switches (TELEMETRY_ENABLED, PRODUCT_ANALYTICS_ENABLED, CRASH_REPORTING_ENABLED) allow instant compliance response without a deployment.
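
A sketch of how a middleware-level gate could combine the tiers, the user toggles, and the kill switches. The function name and the interfaces are hypothetical. How the switches interact is also an assumption: here TELEMETRY_ENABLED is treated as a master switch that silences every tier, including Tier A.

```typescript
type TelemetryTier = "A" | "B" | "C";

interface UserTelemetryPrefs {
  send_crash_reports: boolean;
  share_anonymous_analytics: boolean;
}

interface KillSwitches {
  TELEMETRY_ENABLED: boolean;
  PRODUCT_ANALYTICS_ENABLED: boolean;
  CRASH_REPORTING_ENABLED: boolean;
}

function shouldSendEvent(
  tier: TelemetryTier,
  prefs: UserTelemetryPrefs,
  switches: KillSwitches,
): boolean {
  // Assumption: the master switch overrides everything, including Tier A.
  if (!switches.TELEMETRY_ENABLED) return false;
  switch (tier) {
    case "A":
      return true; // essential events have no user toggle
    case "B":
      return switches.CRASH_REPORTING_ENABLED && prefs.send_crash_reports;
    case "C":
      return switches.PRODUCT_ANALYTICS_ENABLED && prefs.share_anonymous_analytics;
  }
}
```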


Background Jobs & Worker Queue Architecture

All asynchronous work runs through BullMQ backed by Redis. Auto-retry is set to 3 attempts with exponential backoff (starting at 5 seconds). Failed jobs are retained for 7 days; completed jobs for 24 hours.
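
The retry and retention policy above maps onto job options roughly as sketched below. The object mirrors the shape of BullMQ's job options (attempts, backoff, removeOnComplete, removeOnFail) but is written as a plain object so the example has no dependencies. The helper reproduces what BullMQ's built-in exponential strategy does as far as we understand it, doubling the base delay on each retry; treat the exact formula as an assumption.

```typescript
// Plain object in the shape BullMQ expects for default job options.
const defaultJobOptions = {
  attempts: 3, // total tries, including the first one
  backoff: { type: "exponential", delay: 5000 }, // base delay in milliseconds
  removeOnComplete: { age: 24 * 60 * 60 }, // seconds: keep completed jobs 24 hours
  removeOnFail: { age: 7 * 24 * 60 * 60 }, // seconds: keep failed jobs 7 days
};

// Delay before each retry, assuming the delay doubles on every attempt.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * Math.pow(2, retry - 1));
  }
  return delays;
}

const delays = retryDelaysMs(defaultJobOptions.attempts, defaultJobOptions.backoff.delay);
```

Under these assumptions a job that keeps failing is retried after 5 seconds and again after 10 seconds before it is marked failed.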

Worker queue architecture diagram showing the full BullMQ queue topology — API layer enqueuing jobs into Redis queues, worker processes consuming them, and external services being called

The system runs 25+ distinct workers spanning every major domain, including:

Worker | Trigger | Interval
scheduleGenerationWorker | BullMQ (API) | On demand
scheduleDailyHydrationWorker | BullMQ (nightly cron) | 24 hours
schedulePdfExportWorker | BullMQ | On demand
sleepAlertsWorker | setInterval | Every 2 min
sleepPlanAlertsWorker | setInterval | Every 5 min
sleepLearningScheduler | setInterval (daily 3 AM) | 24 hours
appointmentRemindersWorker | setInterval | Every 15 min
documentScanWorker | BullMQ | On demand
emergencyFollowupWorker | setInterval | Every 5 min
ttsCleanupScheduler | setInterval (daily 3 AM) | 24 hours
youtubeSyncWorker | setInterval (daily 3 AM) | 24 hours
coParentContextWorker | BullMQ (always-on) | Event-driven
analysisWorker (video) | BullMQ | On demand
storyboardWorker (video) | BullMQ | On demand
renderWorker (video) | BullMQ | On demand

Sleep Intelligence Engine

The Sleep Intelligence Engine transforms raw session logs into a full analytics platform with AI-driven predictions and proactive alerts. It is one of the most sophisticated subsystems in the backend.

Capabilities:

Capability | Description
Daily Metrics | Total sleep, nap count, night wakings, sleep debt, wake windows
Weekly Metrics | Trends, consistency scores, week-over-week comparisons
Regression Detection | 4-month, 8-month, separation anxiety, illness regressions, and more
Predictions | Next nap time, expected bedtime, overtired alert threshold
Crisis Detection | Sleep debt accumulation, night waking spikes, split-night episodes
AI Context Builder | Packages all sleep metrics into a structured context blob for Sleep Whisperer

Crisis Thresholds:

Metric | Moderate | Severe
Sleep Debt | 60 min | 120 min
Night Wakings (above age baseline) | 5 | 8
Max Wake Window | 4 hours | 6 hours
Short Nap Cluster | 3+ naps under 45 min |
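
A sketch of how the thresholds could be applied to one day of signals. The field and function names are hypothetical. Two choices are assumptions: thresholds are treated as inclusive, and a short nap cluster is graded as moderate because the table gives it no severe level.

```typescript
type Severity = "none" | "moderate" | "severe";

interface DailySleepSignals {
  sleepDebtMin: number;
  nightWakingsAboveBaseline: number;
  maxWakeWindowHours: number;
  napsUnder45Min: number;
}

const SEVERITY_RANK: Record<Severity, number> = { none: 0, moderate: 1, severe: 2 };

// Grade one metric against its moderate and severe thresholds (inclusive).
function grade(value: number, moderate: number, severe: number): Severity {
  if (value >= severe) return "severe";
  if (value >= moderate) return "moderate";
  return "none";
}

// The overall level is the worst grade across all metrics.
function classifyCrisis(s: DailySleepSignals): Severity {
  const grades: Severity[] = [
    grade(s.sleepDebtMin, 60, 120),
    grade(s.nightWakingsAboveBaseline, 5, 8),
    grade(s.maxWakeWindowHours, 4, 6),
    s.napsUnder45Min >= 3 ? "moderate" : "none", // assumed to cap at moderate
  ];
  return grades.reduce(
    (worst, g) => (SEVERITY_RANK[g] > SEVERITY_RANK[worst] ? g : worst),
    "none" as Severity,
  );
}
```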

A nightly learning loop (sleep-learning.service.ts) analyzes historical patterns across all active babies to refine individual predictions — the foundation for a future personalized ML layer.


Emergency Mode

Emergency Mode is the most safety-critical feature in the product. It is activated when a caregiver is in genuine distress and needs immediate support.

Non-negotiable design principles (hardcoded):

  • Never provide medical advice, diagnosis, or treatment instructions
  • Always encourage contacting emergency services for urgent situations (911, Poison Control)
  • Detect and flag potentially dangerous requests before AI response generation
  • Maintain a calm, supportive, non-alarming tone at all times
  • Focus exclusively on emotional regulation and practical next steps

The Emergency AI uses a completely separate OpenAI client instance isolated from the main chat client. If OpenAI is entirely unavailable, a response from a curated library of pre-written fallbacks is served instead. Safety flag detection runs before the AI call, so flagged requests get immediate pre-written guidance without waiting for the model.
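
The ordering described here (flag check first, then the isolated AI client, then the fallback library) can be sketched as below. Everything concrete in it is a placeholder: the flag patterns, the response copy, and the function names are hypothetical, and callAi stands in for the dedicated emergency OpenAI client.

```typescript
type EmergencySource = "flagged" | "ai" | "fallback";

interface EmergencyReply {
  text: string;
  source: EmergencySource;
}

// Stands in for the isolated emergency OpenAI client.
type EmergencyAi = (message: string) => Promise<string>;

// Placeholder patterns and copy; the real library is curated separately.
const FLAG_PATTERNS: RegExp[] = [/\bnot breathing\b/i, /\bunresponsive\b/i, /\bswallowed\b/i];
const FLAGGED_GUIDANCE = "Please call 911 now. For a possible poisoning, contact Poison Control.";
const FALLBACK_REPLY = "I'm here with you. If you are worried about your baby's safety, call 911.";

async function respondToEmergency(message: string, callAi: EmergencyAi): Promise<EmergencyReply> {
  // 1. Safety flags run before any model call and short-circuit it.
  if (FLAG_PATTERNS.some((pattern) => pattern.test(message))) {
    return { text: FLAGGED_GUIDANCE, source: "flagged" };
  }
  // 2. Otherwise try the isolated AI client.
  try {
    return { text: await callAi(message), source: "ai" };
  } catch {
    // 3. Any failure falls back to pre-written copy, never an error screen.
    return { text: FALLBACK_REPLY, source: "fallback" };
  }
}
```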

A followup worker (every 5 minutes) monitors active sessions and sends proactive check-in push notifications. Sessions auto-expire after 60 minutes (configurable). Emergency session data is retained for 90 days; the retention period is configurable to meet compliance requirements.


Key Design Decisions

Monolith First, Modular Always

Starting with a single-process Express monolith instead of microservices was a deliberate early-stage decision. Workers colocated with the API server eliminate distributed system coordination overhead, while module boundaries (controllers → services → repositories) keep future extraction tractable.

BullMQ workers are logically independent. Each could be extracted into a separate process or container with minimal refactoring, meaning the monolith is a deployment decision, not an architectural trap.

Raw SQL Over ORM

The repository layer uses direct pg pool queries with explicit mapping functions rather than Prisma, Drizzle, or TypeORM. This gives full control over query construction, explicit EXPLAIN ANALYZE tuning, and zero abstraction overhead for complex joins. All queries use parameterized inputs with no string interpolation, eliminating SQL injection risk at the repository layer.
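
A repository function in this style might look like the sketch below. The Queryable interface is a minimal stand-in for the pg pool's query(text, values) call; the sleep_sessions table, its columns, and the function names are hypothetical.

```typescript
// Minimal stand-in for the part of the pg pool used here.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

interface SleepSession {
  id: string;
  babyId: string;
  startedAt: Date;
  endedAt: Date | null;
}

function toDate(value: unknown): Date {
  return value instanceof Date ? value : new Date(String(value));
}

// Explicit mapping from a snake_case row to the domain type.
function mapSleepSession(row: Record<string, unknown>): SleepSession {
  return {
    id: String(row["id"]),
    babyId: String(row["baby_id"]),
    startedAt: toDate(row["started_at"]),
    endedAt: row["ended_at"] == null ? null : toDate(row["ended_at"]),
  };
}

// Values travel separately from the SQL text via $1 and $2 placeholders.
const FIND_SESSIONS_SQL =
  "SELECT id, baby_id, started_at, ended_at FROM sleep_sessions " +
  "WHERE baby_id = $1 AND started_at >= $2 ORDER BY started_at DESC";

async function findSleepSessionsSince(
  db: Queryable,
  babyId: string,
  since: Date,
): Promise<SleepSession[]> {
  const result = await db.query(FIND_SESSIONS_SQL, [babyId, since]);
  return result.rows.map(mapSleepSession);
}
```

Because the SQL text is a constant, user-supplied values can only ever reach the database as bound parameters.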

Versioned Prompts in Database

AI personality prompts live in ai_prompt_versions, not in code, with a draft/published workflow managed through the Admin Prompt Studio. Prompt engineering iterates in hours or days, far faster than the deployment cycle. Non-engineers can iterate on personality voice. Version history enables rollback without a deployment. An environment column leaves room for future A/B testing.
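
Resolving the prompt for a personality could work roughly as sketched below. The row shape is an assumption (only the table name and the environment column are mentioned in this document), and the rows are passed in as an array so the selection logic is visible without a database.

```typescript
// Assumed shape of a row in ai_prompt_versions.
interface PromptVersion {
  personalityId: string;
  version: number;
  status: "draft" | "published";
  environment: string;
  body: string;
}

function resolvePrompt(
  rows: PromptVersion[],
  personalityId: string,
  environment: string,
  hardcodedDefaults: Record<string, string>,
): string {
  // Consider only published versions for this personality and environment.
  const published = rows
    .filter(
      (row) =>
        row.personalityId === personalityId &&
        row.status === "published" &&
        row.environment === environment,
    )
    .sort((a, b) => b.version - a.version);
  const latest = published[0];
  if (latest !== undefined) return latest.body;
  // Fall back to the hardcoded default when nothing is published.
  return hardcodedDefaults[personalityId] ?? "";
}
```

Under this model a rollback is just a status change on the newer row, since drafts are ignored and the highest remaining published version wins.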

Centralized Safety Doctrine

A single safety-validator.service.ts is the audit point for all AI response validation. Domain-specific validators (sleep, feeding, emergency, general) compose it. Safety logic is never duplicated. A single place to update policy means every feature that generates content automatically inherits any new rule.

Redis-Backed Circuit Breakers for Cost-Critical Services

TTS and AI chat circuit breakers persist state to Redis. In-memory circuit breaker state is lost on server restart, creating a thundering herd against a degraded provider on recovery. Redis persistence means the breaker stays open across restarts until the provider actually recovers. Without this, a brief OpenAI TTS outage could multiply costs by orders of magnitude through automatic retries.


Engineering Challenges

AI Safety for Infant Care

The boundary between "helpful parenting advice" and "medical advice" is contextual — you cannot write a static rule set that covers every case. The resolution was layered defenses: pre-generation keyword detection, post-generation regex/semantic checks, age-aware guardrails, a fallback response library, and emergency mode completely isolated from the regular AI client with hardcoded system prompt constraints.

Chat Context Assembly at Scale

A meaningful AI chat response needs the baby's current sleep state, recent feeding patterns, active journeys, emotional context, parenting style, and more. All context sources are fetched in parallel (Promise.all). Behavior summaries are pre-computed every 6 hours so the AI pipeline doesn't need raw event data at request time. Chat history is passed as a truncated window rather than the full thread.

TTS Cost Control

Without a Redis-backed circuit breaker, a brief OpenAI TTS outage could result in hundreds of retry attempts hitting a degraded API, massively multiplying costs. The Redis-persistence decision paid for itself immediately. Per-account rate limits (10 requests/minute, 100 requests/day), a 4,000 character cap, and a daily cleanup job removing stale audio files from R2 complete the cost control surface.

Emergency Mode Reliability Guarantee

Emergency Mode was designed as if the main AI pipeline didn't exist. The dependency inversion forced clear thinking about which parts of the system must be resilient to failure. A parent in crisis at 3 AM cannot receive an error screen. The emergency system has a pre-written fallback response library, a separate OpenAI client, and an independent followup worker — none of which depend on the main chat stack.

Schedule Hydration at Scale

Every active baby needs their schedule pre-hydrated every night. The fan-out pattern — one coordinator triggering one BullMQ job per baby — makes each hydration independent and retryable. Failure for one baby doesn't affect others. The scheduleReconciliationWorker catches any drift between template and daily instance.
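
The fan-out can be sketched with a minimal queue interface shaped like BullMQ's add(name, data, opts). The job name, the payload, and the job id format are hypothetical. Using a deterministic jobId per baby and date is an assumption about how duplicates might be avoided if the coordinator runs twice; BullMQ, as far as we know, ignores an add whose jobId already exists.

```typescript
interface HydrationJobData {
  babyId: string;
  date: string; // the day being hydrated, e.g. "2024-06-01"
}

// Minimal stand-in for the queue's add(name, data, opts) method.
interface JobQueue {
  add(name: string, data: HydrationJobData, opts: { jobId: string }): Promise<unknown>;
}

async function fanOutHydration(
  queue: JobQueue,
  babyIds: string[],
  date: string,
): Promise<number> {
  // One independent, retryable job per baby.
  await Promise.all(
    babyIds.map((babyId) =>
      queue.add(
        "hydrate-schedule",
        { babyId, date },
        { jobId: `hydrate_${babyId}_${date}` }, // deterministic id, hypothetical format
      ),
    ),
  );
  return babyIds.length;
}
```

Each job then succeeds, fails, and retries on its own, so one baby's bad data cannot block the rest of the nightly run.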


Lessons Learned

Invest in the safety layer before feature work. For any product in a high-stakes domain (healthcare, childcare, mental health), safety infrastructure is not a feature — it is the foundation. Every subsequent feature that generates AI content hooks into existing safety infrastructure rather than inventing its own.

Prompt engineering is a product discipline. The most impactful changes to user experience quality came not from model changes but from prompt refinements. The response budget rules — 90 words by default, 55 at night, 70 when stressed — transformed user perception of response quality by eliminating verbosity that caused users to stop reading.

Circuit breakers need Redis from day one for cost-critical services. In-memory is fine for rate limiting. It is not fine for services where failure causes financial harm.

Feature flags as infrastructure. Adding FEATURE_* env-validated flags from the beginning allowed merging incomplete features without impacting users. The discipline of keeping every untested feature behind a flag gate dramatically improved stability.
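
A minimal sketch of env-validated flags. The parsing rules (only the literal strings "true" and "false" are accepted, anything else fails at startup) and the flag name FEATURE_VIDEO_MEMORIES are assumptions; the document only states that FEATURE_* flags are validated from the environment.

```typescript
type Env = Record<string, string | undefined>;

function parseFeatureFlags(env: Env): Record<string, boolean> {
  const flags: Record<string, boolean> = {};
  for (const [key, raw] of Object.entries(env)) {
    if (!key.startsWith("FEATURE_")) continue;
    // Fail fast on typos such as "ture" instead of silently disabling a feature.
    if (raw !== "true" && raw !== "false") {
      throw new Error(`Invalid value for ${key}: expected "true" or "false"`);
    }
    flags[key] = raw === "true";
  }
  return flags;
}

// Unknown flags are treated as disabled.
function isEnabled(flags: Record<string, boolean>, name: string): boolean {
  return flags[name] === true;
}
```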

Graceful degradation must be designed in, not bolted on. The pattern of "try AI, fall back to a safe default" appears in emergency mode, TTS, sleep guardrails, and the chat circuit breaker. In every case, the fallback path was a first-class engineering task. For a product parents rely on at 3 AM with a sick infant, "error screen" is not an acceptable outcome.

Admin tooling is an engineering multiplier. Every bug that could be debugged in the admin panel was a bug that didn't need a developer. Every prompt change made by a product manager through the Prompt Studio was a change that required neither a developer nor a deployment.


What We Would Do Differently

Distributed rate limiting from day one. The current in-memory rate limiter is explicitly marked as "not cluster-safe." Redis-backed rate limiting (sliding window with sorted sets) should be in place before any horizontal scaling attempt.
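
The sliding-window algorithm is small enough to sketch. This in-memory version mirrors the three steps a Redis sorted-set implementation would take per request (ZREMRANGEBYSCORE to drop old hits, ZCARD to count, ZADD to record); it is not itself cluster-safe and only illustrates the logic. The class name is hypothetical.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that fell out of the window (ZREMRANGEBYSCORE).
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    // Count what remains (ZCARD) and record this hit if allowed (ZADD).
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

In Redis the three steps would need to run atomically, for example in a MULTI block or a Lua script, so that concurrent instances cannot both pass the count check.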

Structured logging. A structured logging library (Pino or Winston) with consistent log levels, request correlation IDs, and structured fields would make production debugging significantly faster. Sentry captures exceptions, but non-error observability is sparse.

Cron as a service. The setInterval crons in index.ts are vulnerable to double-execution in a multi-instance deployment and have no visibility. A dedicated cron service (pg-boss, Inngest, or a serverless scheduler) would be safer and more observable.

Request/response contracts. The API surface spans 30+ domains with no machine-readable contract. Generating an OpenAPI spec from Zod schemas would enable automatic SDK generation, contract testing, and documentation without additional manual work.

Context caching for AI calls. Per-request context assembly makes multiple database reads on every AI chat message. A short-TTL per-baby context cache would reduce database load significantly at scale without meaningfully affecting response quality.
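
A short-TTL cache of this kind could be as simple as the sketch below. The class name, the per-baby key, and the 30-second TTL in the usage note are assumptions; the clock is injected so expiry is easy to reason about.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (now >= hit.expiresAt) {
      // Expired entries are evicted lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

The chat handler would check the cache with the baby's id before assembling context and store the assembled result on a miss. With a TTL of around 30 seconds, a burst of messages in one conversation would share a single set of database reads.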

Event sourcing for the timeline. The timeline reconstructs a chronological feed from multiple source tables. An event log pattern — all domain events written to a unified timeline_events table — would make timeline queries a single indexed read instead of a fan-out across N tables.