Additional Information
Fully remote | Complete engagement job
Founded in Palo Alto by Dr. Andrew Ng and Israel Niezen, Factored helps U.S. companies build and scale world-class AI, ML, and Data teams, powered by the top 1% of LATAM talent, with a defining purpose: to empower brilliant humans, unleash their potential, and amplify their impact in the world.
At Factored, you'll be part of a community that values learning, ownership, and authenticity, where your growth is personal and your ideas matter. We're transparent, curious, and collaborative. We strive for excellence, celebrate diversity, encourage curiosity, and build an environment where you can truly thrive.
We're seeking a Senior DevOps Engineer with 5+ years of hands-on experience designing, building, and operating reliable infrastructure and deployment pipelines in production environments. In this role, you'll help tackle complex infrastructure challenges and ensure systems are secure, stable, and scalable.
You'll play an important role in supporting AI and GenAI workloads, building and operating deployment pipelines, and contributing to the evolution of a shared GenAI application platform used across multiple teams. This is a hands-on, high-ownership role with meaningful impact, working closely with engineers to bring AI-enabled applications into production responsibly and reliably.
Functional Responsibilities
Design, deploy, and manage cloud infrastructure on AWS, optimized for AI and ML workloads with high computational demands.
Build, maintain, and optimize CI/CD pipelines tailored for AI/ML and GenAI applications.
Automate model training, testing, deployment, and monitoring workflows.
Ensure scalability, reliability, and high availability of AI-powered applications in production.
Implement monitoring and observability systems to track model performance, data drift, logs, and system uptime.
Deploy and operate applications using containerized and orchestrated environments (Docker, Kubernetes).
Apply Infrastructure as Code practices using tools such as Terraform or Ansible.
Contribute hands-on to complex application and web development projects, delivering high-quality, production-grade code when needed.
Help build and scale a central GenAI application platform, enabling teams to share data, code, and best practices.
Collaborate closely with data scientists, ML engineers, and backend developers to ensure secure and smooth deployment of AI services.
Apply an SRE mindset to improve system resilience, operational excellence, and long-term maintainability.
Build trust and collaboration through strong communication and active engagement with cross-functional partners.