About the Role:
We are hiring a Full-Stack Developer with a strong back-end focus to help us build a high-impact platform for automated adversarial testing, vulnerability detection, and model benchmarking of generative AI systems.
This platform empowers subject matter experts and enterprise clients to test and evaluate large language models (LLMs) across a wide range of data types and task taxonomies, without the need for manual evaluation. Your work will directly contribute to improving the safety, robustness, and alignment of modern AI systems deployed in production environments.
Responsibilities:
- Develop and maintain full-stack features with a strong focus on back-end development
- Build scalable batch pipelines that automate LLM testing and integrate third-party evaluators
- Process and transform multi-modal data via ETL workflows and store results in MySQL and Elasticsearch
- Create and manage stored procedures, job schedulers, and retry mechanisms for API pipelines
- Design REST APIs to support front-end dashboards, filters, and benchmarking tools
- Collaborate closely with front-end developers, QA engineers, DevOps, and product leads in an agile environment
- Ensure systems are performant, fault-tolerant, and secure
Platform Capabilities:
- Supported Data Types: Image, Video, Sensor (LiDAR), Audio, Speech, Document, Code
- Task Taxonomies: Summarization, Image Evaluation, Image Reasoning, Q&A, Question Understanding, Entity Relation Classification, Text-to-Code, Logic & Semantics, Question Rewriting, Translation
- Feedback Types: DPO (Direct Preference Optimization), Simple RLHF, Complex RLHF, Nominal Feedback
- Techniques Tested: Payload Smuggling, Prompt Injection, Persuasion and Manipulation, Conversational Coercion, Hypotheticals, Roleplaying, One-/Few-shot Learning
Tech Stack:
- Node.js, TypeScript, React
- MySQL (including stored procedures), Elasticsearch
- REST APIs, OAuth 2.0, JWT
- Docker, GitHub Actions, Kubernetes (optional)
- Job orchestration tools (cron, node-cron, BullMQ, or similar)
Requirements:
- 3–5+ years of full-stack development experience, with a strong back-end orientation
- Proficiency in Node.js and TypeScript; working experience with React
- Strong experience integrating and orchestrating REST APIs at scale
- Experience building ETL workflows and handling multi-modal data
- Solid database development skills in MySQL and Elasticsearch
- Familiarity with OAuth 2.0, JWT, and secure API development
- Comfortable working in a remote team with a 7:30 AM EST start time
Nice to Have:
- Experience with LLM APIs, AI/ML workflows, or evaluation techniques (e.g., DPO, RLHF)
- Familiarity with adversarial testing methods such as prompt injection and roleplaying
- Experience with CI/CD, Docker, Kubernetes, or distributed system architecture
- Background in AI safety or model evaluation frameworks