Lead AI/ML Platform Engineer
About the role
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create a best-in-class customer experience in an innovative, collaborative environment.
To save you time when applying: Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.
Who we're looking for
The Toyota Financial Services Enterprise Platforms team is looking for a passionate and highly motivated Lead AI/ML Platform Engineer. The primary responsibility of this role is to design, build, and implement scalable platform solutions that power enterprise AI/ML and GenAI capabilities across the organization. You will help enable secure, production-ready MLOps and LLMOps infrastructure that supports model training, inference, orchestration, and retrieval-augmented generation. The Lead AI/ML Platform Engineer will support the Enterprise Platforms team's objective of delivering reliable, secure, and high-performing AI platform capabilities that drive business value at scale.
Responsibilities
- Design and implement cloud-native infrastructure that enables enterprise AI/ML and GenAI workloads in production
- Build and evolve MLOps and LLMOps platform capabilities, including model training, versioning, deployment, monitoring, and rollback
- Create GPU-accelerated compute environments that improve model performance while balancing scalability and cost efficiency
- Standardize infrastructure patterns for vector databases, model registries, and orchestration frameworks
- Develop reusable approaches for model serving, inference scaling, prompt management, and latency optimization
- Design secure, multi-tenant environments with strong access controls, auditability, and usage governance for AI models
- Partner closely with engineering, platform, and data teams to ensure smooth data flow, strong observability, and operational resiliency
- Own technical direction for AI infrastructure services and integrations in collaboration with the architecture team
- Lead design reviews, establish engineering standards, and help guide critical technical decisions
- Mentor engineers, provide thoughtful feedback, and support growth through coaching and development planning
- Stay current on emerging GenAI, distributed systems, and infrastructure trends to bring fresh ideas and better solutions to the team
What you bring
- 10+ years of experience in software engineering, with a focus on cloud infrastructure or cloud platform engineering
- 3+ years of experience building cloud infrastructure that supports AI/ML workloads such as training, tuning, and inference
- Deep hands-on experience with AWS and infrastructure-as-code tools such as Terraform, CDK, or CloudFormation
- Experience with Kubernetes, containerization, and CI/CD pipelines in a production environment
- Strong understanding of GPU infrastructure, serverless compute, and scalable microservice patterns
- Familiarity with model hosting, inference scaling, and observability tools such as Datadog, CloudWatch, or Prometheus
- Practical experience using Git/GitHub and CI/CD tooling such as GitHub Actions or Jenkins
Added bonus if you have
- Experience with AWS AI/ML services such as SageMaker or Bedrock
- Familiarity with LLMOps tooling and GenAI infrastructure such as LangChain or RAG pipelines
- Experience working with vector databases, model registries, or orchestration tools such as MLflow, Airflow, or Ray
- Knowledge of prompt management, token usage optimization, and model performance tuning
- AWS Solutions Architect Professional or Machine Learning certification
What we'll bring
During your interview process, we'll share details about our industry-leading benefits and career development opportunities designed to support your growth and well-being:
- A collaborative work environment built on teamwork, flexibility, and respect
- Professional growth programs inc