AI Safety Research Intern-2

Remote, USA Full-time Posted 2025-11-24

Centific is a frontier AI data foundry that empowers clients with safe, scalable AI deployment. The AI Safety Research Intern will focus on advancing AI safety, designing and evaluating attack and defense strategies for LLM jailbreaks, and contributing to the platform's security guarantees through high-impact experiments.


Responsibilities

  • Advance AI Safety: Design, implement, and evaluate attack and defense strategies for LLM jailbreaks (prompt injection, obfuscation, narrative red teaming)
  • Evaluate AI Behavior: Analyze and simulate human-AI interaction patterns to uncover behavioral vulnerabilities, social engineering risks, and over-defensive vs. permissive response tradeoffs
  • Agentic AI Security: Prototype workflows for multi-agent safety (e.g., agent self-checks, regulatory compliance, defense chains) that span perception, reasoning, and action
  • Benchmark & Harden LLMs: Create reproducible evaluation protocols/KPIs for safety, over-defensiveness, adversarial resilience, and defense effectiveness across diverse models (including the latest benchmarks and real-world exploit scenarios)
  • Deploy and Monitor: Package research into robust, monitorable AI services using modern stacks (Kubernetes, Docker, Ray, FastAPI); integrate safety telemetry, anomaly detection, and continuous red-teaming
  • Jailbreaking Analysis: Systematically red-team advanced LLMs (GPT-4o, GPT-5, LLaMA, Mistral, Gemma, etc.), uncovering novel exploits and defense gaps
  • Multi-turn Obfuscation Defense: Implement context-aware, multi-turn attack detection and guardrail mechanisms, including countermeasures for obfuscated prompts (e.g., StringJoin, narrative exploits)
  • Agent Self-Regulation: Develop agentic architectures for autonomous self-checking and self-correction, minimizing risk in complex, multi-agent environments
  • Human-Centered Safety: Study human behavior models in adversarial contexts: how users probe, trick, or manipulate LLMs, and how defenses can adapt without becoming over-defensive

Skills

  • Ph.D. student in CS/EE/ML/Security (or a related field), actively publishing in AI Safety, NLP robustness, or adversarial ML (ACL, NeurIPS, Black Hat, IEEE S&P, etc.)
  • Strong Python and PyTorch/JAX skills; comfort with toolkits for language models, benchmarking, and simulation
  • Demonstrated research in at least one of: LLM jailbreak attacks/defense, agentic AI safety, human-AI interaction vulnerabilities
  • Proven ability to go from concept → code → experiment → result, with rigorous tracking and ablation studies
  • Experience in adversarial prompt engineering and jailbreak detection (narrative, obfuscated, and sequential attacks)
  • Prior work on multi-agent architectures or robust defense strategies for LLMs
  • Familiarity with red-teaming, synthetic behavioral data, and regulatory safety standards
  • Scalable training and deployment: Ray, distributed evaluation, CI/telemetry for defense protocols
  • Public code artifacts (GitHub) and first-author publications or strong open-source impact

Company Overview

  • Zero-distance innovation for GenAI creators and industries. Expertly engineering platforms and curating multimodal, multilingual data, we empower the ‘Magnificent Seven’ and enterprise clients with safe, scalable AI deployment. We are a team of over 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. Centific was founded in 2020 and is headquartered in Redmond, Washington, USA, with a workforce of 5,001-10,000 employees. Its website is https://www.centific.com.

  • Company H-1B Sponsorship

  • Centific has a track record of offering H-1B sponsorships, with 10 in 2025, 22 in 2024, and 14 in 2023. Please note that this does not guarantee sponsorship for this specific role.


    Similar Jobs

    Seasonal Sales Associate-282 Southeast Richmond...

    Remote, USA Full-time

    Experienced Customer Support Remote Representative – Delivering Magical Experiences to arenaflex Enthusiasts from the Comfort of Your Own Home

    Remote, USA Full-time

    Remote Care Manager - RN 3 Locations

    Remote, USA Full-time

    Experienced Customer Service Representative – Pet Industry Expert (Remote in Florida)

    Remote, USA Full-time

    Intelligence Analyst – RFI Triage (Remote, East...

    Remote, USA Full-time

    Business Development Director, Commercial Enter...

    Remote, USA Full-time

    Data Entry Remote Jobs-JetBlue Airline At Home ...

    Remote, USA Full-time

    [Hiring] Temporary Team Lead @TTEC

    Remote, USA Full-time

    Senior Data Scientist - Revenue Intelligence

    Remote, USA Full-time

    Delivery Director - US-Based

    Remote, USA Full-time

    Remote Customer Service Rep - Starts at 19 per Hour – Amazon Store

    Remote, USA Full-time

    Customer Service Representative – Remote (OK Residents Only)

    Remote, USA Full-time

    Part time, Remote Data Entry Clerk - Work From Home at Yexgo Denver, CO

    Remote, USA Full-time

    Senior Payroll Associate

    Remote, USA Full-time

    [PART_TIME Remote] Part-Time Bookkeeper (Atlanta Preferred)

    Remote, USA Full-time

    Experienced Full Stack Customer Service Chat Assistant – Remote Work Opportunity with blithequark

    Remote, USA Full-time

    Project Manager - B2B Automation Strategy (Part-Time Jobs)

    Remote, USA Full-time

    Retail Associate – Amazon Store

    Remote, USA Full-time

    Experienced Live Chat Customer Support Specialist – Remote Work Opportunity with blithequark, Earn $25-$35/hr

    Remote, USA Full-time

    Ciox Health – Customer Service Client Support Manager – USA

    Remote, USA Full-time