[Remote] AI Safety Research Intern (PhD)
Note: The job is a remote job and is open to candidates in USA. Centific is focused on advancing AI safety and responsible AI development. As a Ph.D. Research Intern, you will conduct high-impact experiments and contribute to the security guarantees of AI systems through innovative research and practical implementations.
Responsibilities
- Advance AI Safety: Design, implement, and evaluate attack and defense strategies for LLM jailbreaks (prompt injection, obfuscation, narrative red teaming)
- Evaluate AI Behavior: Analyze and simulate human-AI interaction patterns to uncover behavioral vulnerabilities, social engineering risks, and over-defensive vs. permissive response tradeoffs
- Agentic AI Security: Prototype workflows for multi-agent safety (e.g., agent self-checks, regulatory compliance, defense chains) that span perception, reasoning, and action
- Benchmark & Harden LLMs: Create reproducible evaluation protocols/KPIs for safety, over-defensiveness, adversarial resilience, and defense effectiveness across diverse models (including latest benchmarks and real-world exploit scenarios)
- Deploy and Monitor: Package research into robust, monitorable AI services using modern stacks (Kubernetes, Docker, Ray, FastAPI); integrate safety telemetry, anomaly detection, and continuous red-teaming
- Jailbreaking Analysis: Systematically red-team advanced LLMs (GPT-4o, GPT-5, LLaMA, Mistral, Gemma, etc.), uncovering novel exploits and defense gaps
- Multi-turn Obfuscation Defense: Implement context-aware, multi-turn attack detection and guardrail mechanisms, including countermeasures for obfuscated prompts (e.g., StringJoin, narrative exploits)
- Agent Self-Regulation: Develop agentic architectures for autonomous self-check and self-correct, minimizing risk in complex, multi-agent environments
- Human-Centered Safety: Study human behavior models in adversarial contexts—how users probe, trick, or manipulate LLMs, and how defenses can adapt without excessive over-defensiveness
Skills
- Ph.D. student in CS/EE/ML/Security (or related); actively publishing in AI Safety, NLP robustness, or adversarial ML (ACL, NeurIPS, BlackHat, IEEE S&P, etc.)
- Strong Python and PyTorch/JAX skills; comfort with toolkits for language models, benchmarking, and simulation
- Demonstrated research in at least one of: LLM jailbreak attacks/defense, agentic AI safety, human-AI interaction vulnerabilities
- Proven ability to go from concept → code → experiment → result, with rigorous tracking and ablation studies
- Experience in adversarial prompt engineering, jailbreak detection (narrative, obfuscated, sequential attacks)
- Prior work on multi-agent architectures or robust defense strategies for LLMs
- Familiarity with red-teaming, synthetic behavioral data, and regulatory safety standards
- Scalable training and deployment: Ray, distributed evaluation, CI/telemetry for defense protocols
- Public code artifacts (GitHub) and first-author publications or strong open-source impact
Benefits
- Comprehensive healthcare, dental, and vision coverage
- 401k plan
- Paid time off (PTO)
- And more!
Company Overview
Company H1B Sponsorship
Apply To This Job