Lead AI/ML Platform Engineer

Toyota · Plano, TX

Full-time · On-site · 1w ago

Airflow · AWS · CI/CD · CloudFormation · Datadog · Git

About the role

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us.

An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.

To save time applying, Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.

Who we're looking for

Toyota Financial Services Enterprise Platforms team is looking for a passionate and highly motivated Lead AI/ML Platform Engineer. The primary responsibility of this role is to design, build, and implement scalable platform solutions that power enterprise AI/ML and GenAI capabilities across the organization. You will help enable secure, production-ready MLOps and LLMOps infrastructure that supports model training, inference, orchestration, and retrieval-augmented generation. The Lead AI/ML Platform Engineer will support the Enterprise Platforms team's objective to deliver reliable, secure, and high-performing AI platform capabilities that drive business value at scale.

Responsibilities

Design and implement cloud-native infrastructure that enables enterprise AI/ML and GenAI workloads in production
Build and evolve MLOps and LLMOps platform capabilities, including model training, versioning, deployment, monitoring, and rollback
Create GPU-accelerated compute environments that improve model performance while balancing scalability and cost efficiency
Standardize infrastructure patterns for vector databases, model registries, and orchestration frameworks
Develop reusable approaches for model serving, inference scaling, prompt management, and latency optimization
Design secure, multi-tenant environments with strong access controls, auditability, and usage governance for AI models
Partner closely with engineering, platform, and data teams to ensure smooth data flow, strong observability, and operational resiliency
Own technical direction for AI infrastructure services and integrations in collaboration with the architecture team
Lead design reviews, establish engineering standards, and help guide critical technical decisions
Mentor engineers, provide thoughtful feedback, and support growth through coaching and development planning
Stay current on emerging GenAI, distributed systems, and infrastructure trends to bring fresh ideas and better solutions to the team

What you bring

10+ years of experience in software engineering, with a focus on cloud infrastructure or cloud platform engineering
3+ years of experience building cloud infrastructure that supports AI/ML workloads such as training, tuning, and inference
Deep hands-on experience with AWS and infrastructure-as-code tools such as Terraform, CDK, or CloudFormation
Experience with Kubernetes, containerization, and CI/CD pipelines in a production environment
Strong understanding of GPU infrastructure, serverless compute, and scalable microservice patterns
Familiarity with model hosting, inference scaling, and observability tools such as Datadog, CloudWatch, or Prometheus
Practical experience using Git/GitHub and CI/CD tooling such as GitHub Actions or Jenkins

Added bonus if you have

Experience with AWS AI/ML services such as SageMaker or Bedrock
Familiarity with LLMOps tooling and GenAI infrastructure such as LangChain or RAG pipelines
Experience working with vector databases, model registries, or orchestration tools such as MLflow, Airflow, or Ray
Knowledge of prompt management, token usage optimization, and model performance tuning
AWS Solutions Architect Professional or Machine Learning certification

What we'll bring

During your interview process, we'll share details about our industry-leading benefits and career development opportunities designed to support your growth and well-being:
A collaborative work environment built on teamwork, flexibility, and respect
Professional growth programs inc

Benefits

Vision insurance
Performance bonus
