Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services.
Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS/Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA.
Staff Platform Engineer, AI/ML Infrastructure at Pfizer
Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments.
Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals.
Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics.
Build platform capabilities for GenAI workloads, including model availability monitoring.
Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance.
Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls.
Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments.
Mentor engineers, review architecture and infrastructure designs, and influence platform engineering practices across teams.
Requirements
Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience.
7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles.
Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services.
Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms.
Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling.
Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication.
Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools.
Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements.
Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams.
Experience supporting AI/ML or generative AI platforms, including LLM gateways, model routing, prompt observability, token metering, or model failover.
Experience operating platforms in regulated enterprise environments, ideally healthcare, pharmaceutical, finance, or life sciences.
Experience with multi-account, multi-region AWS architectures and enterprise governance patterns.
Experience with cost optimization, autoscaling strategies, capacity planning, and cloud budget monitoring.
Experience with load testing and performance validation using tools such as Locust or comparable frameworks.
Strong Python or scripting skills for platform automation, operational tooling, and CI/CD extensions.
Ability to communicate complex technical decisions clearly to engineering, security, operations, and leadership audiences.
Technical Environment
This role works across a modern AI platform ecosystem including: Cl
Benefits
Health insurance
Additional Information
Staff Platform Engineer, AI/ML Infrastructure
Department: AI Software & Operations
Role Summary
The Staff Platform Engineer, AI/ML Infrastructure will provide technical leadership for the cloud platforms, deployment systems, and operational foundations that power enterprise-scale generative AI applications.
This role will define and evolve the infrastructure architecture for AI/ML platforms running across AWS, Kubernetes, serverless, and containerized environments. The engineer will lead platform standards for reliability, scalability, observability, CI/CD, security, and developer enablement, while partnering closely with software engineering, AI engineering, security, and operations teams.
The ideal candidate combines deep hands-on cloud engineering experience with staff-level technical influence. They are comfortable designing infrastructure patterns, writing infrastructure-as-code, improving delivery pipelines, mentoring engineers, and making architectural decisions that raise the operational maturity of AI platforms across multiple teams.