AI Systems Engineer (LLM Performance, Cost & Reliability) | Audit → Recommend → Implement
Overview
Jules is a mobile AI-powered style and dating photo coach. We analyze outfit photos and dating profile images, score them, and give actionable feedback using LLMs and vision models.
The product is live and thoughtfully architected.
What we need now is systems-level optimization.
We’re looking for a senior engineer to audit, optimize, and harden our LLM infrastructure — reducing latency and cost while improving reliability and consistency — without changing product flows or UX.
This is not a greenfield build.
This is not prompt polishing.
This is a real production system that needs to scale.
What You’ll Do
Phase 1: Audit
Audit all LLM usage across the system:
FitCheck (vision)
PicReview (vision)
Comparison modes
Conversational chat
Analyze:
Latency bottlenecks (user-perceived and backend)
Cost per request / feature / user
Model usage vs actual requirements
Prompt size, retries, determinism, and waste
Review existing cost instrumentation and update pricing assumptions
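For a sense of the granularity we expect from cost analysis, a per-request cost calculation can be sketched roughly as below. The `costUsd` helper and the per-token prices are illustrative assumptions only; the audit should replace them with the provider's current published rates and our actual usage data.

```typescript
// Hypothetical per-request cost calculator. The prices below are
// illustrative assumptions, NOT current OpenAI list prices -- the audit
// should pin them to the provider's published rates.
type Usage = { promptTokens: number; completionTokens: number };

// USD per 1M tokens, keyed by model name (assumed values).
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4.1-mini": { input: 0.4, output: 1.6 },
};

function costUsd(model: string, usage: Usage): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing entry for model: ${model}`);
  return (
    (usage.promptTokens / 1_000_000) * p.input +
    (usage.completionTokens / 1_000_000) * p.output
  );
}
```

Aggregating this per feature and per user is what turns a raw API bill into the ranked opportunity list we want from Phase 1.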
Deliverable:
A written audit outlining:
Current performance & cost profile
Clear problem areas
Ranked list of optimization opportunities with estimated impact
Phase 2: Optimize & Implement
Implement agreed optimizations directly in the codebase, which may include:
Multi-model routing (cheap → expensive fallback)
Vision + text model rationalization
Caching (hash-based, context-based, or result reuse)
Async coordination improvements (queues, batching, retries)
Prompt minimization and structural refactors (not stylistic rewrites)
More accurate cost tracking and reporting
Ensure output stability and scoring consistency are preserved
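To illustrate the cheap-first routing item above: the shape we have in mind is roughly the sketch below. `ModelCall`, `routeWithFallback`, and the quality gate are placeholders, not our actual router; the real implementation would plug into the existing prompt routing system and its structured-JSON validation.

```typescript
// Illustrative cheap-first routing with expensive fallback. The model
// callers and the "null means unusable" quality gate are hypothetical
// stand-ins for the production router's validation rules.
type ModelCall = (prompt: string) => Promise<string | null>;

async function routeWithFallback(
  prompt: string,
  cheap: ModelCall,
  expensive: ModelCall,
): Promise<string> {
  try {
    const answer = await cheap(prompt);
    // Escalate only when the cheap model returns unusable output
    // (e.g. malformed JSON for a structured-output endpoint).
    if (answer !== null) return answer;
  } catch {
    // Fall through to the expensive model on transport or model errors.
  }
  const fallback = await expensive(prompt);
  if (fallback === null) throw new Error("Both models failed");
  return fallback;
}
```

The interesting engineering work is in the quality gate: deciding cheaply and deterministically when an answer is good enough to skip the expensive model.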
Deliverable:
Merged code changes
Before/after latency and cost comparison
Clear documentation of decisions and tradeoffs
What You Will Not Do
To be explicit:
❌ Redesign product flows, UX, or scoring logic
❌ Rewrite Jules’ persona or tone
❌ “Improve” the product by adding features
❌ Push unnecessary infra churn before instrumentation
❌ Suggest fine-tuning as a first solution
Your job is to make the engine faster, cheaper, and more reliable, not change the car.
Technical Environment (You’ll Be Working Inside This)
Frontend: React Native (Expo, TypeScript)
Backend: Node.js + Express
Database: MongoDB
AI: OpenAI (GPT-4o for vision, GPT-4.1-mini for chat)
Infra: Cloudinary (images), Firebase Auth, Segment, Sentry
Architecture: Async API calls, structured JSON outputs, prompt routing system
Full architecture documentation will be provided on engagement start.
What We’re Looking For
Required
Deep experience optimizing production LLM systems
Strong intuition for cost vs latency vs quality tradeoffs
Hands-on backend engineering skills (Node.js)
Experience with:
model routing
async systems
caching strategies
deterministic LLM outputs
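By "deterministic LLM outputs" we mean familiarity with the usual variance-reduction levers, sketched below as an assumed request-options fragment. The exact option names depend on the provider, and providers generally offer only best-effort reproducibility, not strict determinism.

```typescript
// Assumed request options commonly used to reduce output variance.
// Option names are illustrative; exact determinism is not guaranteed
// by inference providers even with these settings.
const deterministicOptions = {
  temperature: 0, // remove sampling randomness
  top_p: 1,
  seed: 42, // best-effort reproducibility where supported
  response_format: { type: "json_object" as const }, // stable structure
};
```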
Nice to Have
Vision model experience
Experience evaluating multiple inference providers
Prior startup or zero-to-scale experience
Engagement Details
Type: Short-term contract
Length: TBD
Scope: Audit → Recommend → Implement
Potential extension: Yes, based on results
Timezone: Flexible, but with working-hours overlap with Pacific Time
Apply to this job