The MLE role overlaps with many disciplines, such as Ops, Modeling, and Data Engineering. In this role, you'll be expected to perform many ML engineering activities, including one or more of the following:
Design, build, and/or deliver ML models and components that solve real-world business problems, while working in collaboration with the Product and Data Science teams
Inform your ML infrastructure decisions using your understanding of ML modeling techniques and issues, including choice of model, data, and feature selection, model training, hyperparameter tuning, dimensionality, bias/variance, and validation
Solve complex problems by writing and testing application code, developing and validating ML models, and automating tests and deployment
Collaborate as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art big data and ML applications
Retrain, maintain, and monitor models in production
Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale
Construct optimized data pipelines to feed ML models
Leverage continuous integration and continuous deployment best practices, including test automation and monitoring, to ensure successful deployment of ML models and application code
Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI
Use programming languages like Python, Scala, or Java
Requirements
Bachelor's Degree
At least 2 years of experience designing and building data-intensive solutions using distributed computing (Internship experience does not apply)
At least 2 years of experience programming with Python, Scala, or Java
At least 1 year of Machine Learning experience with an industry-recognized ML framework (scikit-learn, PyTorch, Dask, Spark, or TensorFlow)
Experience developing and deploying ML solutions in a public cloud such as AWS, Azure, or Google Cloud Platform
1+ years of experience working with large code bases in a team environment
1+ years of experience with distributed file systems or multi-node database paradigms
Contributed to open source ML software
1+ years of experience building production-ready data pipelines that feed ML models
Experience leveraging interactive AI tooling to accelerate productivity, utilizing capabilities beyond basic code completion
McLean, VA: $135,600 - $154,800 for Machine Learning Engineer
New York, NY: $148,000 - $168,900 for Machine Learning Engineer
Candidates hired to work in other locations will be subject to the pay range associated with that location, and the actual annualized salary amount offered to any candidate at the time of hire will be reflected solely in the candidate's offer letter.
This role is also eligible to earn performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI). Incentives could be d
Benefits
Performance bonus
Additional Information
Machine Learning Engineer (AI Foundations)
As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile team dedicated to productionizing machine learning applications and systems at scale. You'll participate in the detailed technical design, development, and implementation of machine learning applications using existing and emerging technology platforms. You'll focus on machine learning architectural design, develop and review model and application code, and ensure high availability and performance of our machine learning applications. You'll have the opportunity to continuously learn and apply the latest innovations and best practices in machine learning engineering.
Capital One is accelerating the adoption of state-of-the-art AI research to create simpler, safer banking experiences for over 100 million customers. The AI Foundations team spearheads this mission by developing advanced LLMs and autonomous agentic systems capable of complex reasoning and real-world problem solving. Their comprehensive research framework prioritizes foundational model architecture, operational efficiency, and responsible AI practices to ensure all systems are trustworthy and scalable.