Machine Learning Engineer, AI Solutions Hub (AISH)
About the role
Job Summary
This is a full-time, one-year term appointment with the possibility of renewal. The position is in-person at Northeastern's Roux Institute in Portland, Maine. The Machine Learning Engineer (MLE) at the AI Solutions Hub (AISH), the delivery arm of Northeastern University's Experiential AI Institute, will support the development, deployment, and maintenance of machine learning systems in collaboration with other AISH employees. This role is intended for early-career engineers who want to build strong foundations in MLOps, cloud-based ML systems, and production-oriented AI development. The MLE will contribute to ML pipelines, deployment workflows, and infrastructure components while learning best practices for scalable, reliable, and responsible AI systems.

Education & Experience
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a closely related field.
- 0-2 years of experience in software engineering, machine learning engineering, or applied ML projects.
- Experience may include industry work, internships, co-ops, academic research, or applied project work.
- Exposure to cloud platforms and ML deployment concepts, tools, and services is required.
- Industry experience is preferred.

Knowledge, Skills, and Abilities

ML Engineering Foundations
- Strong programming skills in Python; comfort with software engineering practices including code review, testing, version control, and documentation.
- Working knowledge of ML workflows with emphasis on the deployment side: model packaging, serving, validation, and inference rather than research or experimentation.
- Familiarity with classical ML techniques and practical exposure to modern AI including deep learning, generative AI, and large language models.

LLM & Agentic AI Systems
- Understanding of how LLMs are deployed, served, and integrated into applications (e.g., API-based inference, model hosting via vLLM, TGI, or similar serving frameworks).
- Familiarity with agentic AI patterns: tool use, multi-step reasoning, orchestration frameworks (e.g., LangGraph, CrewAI, or similar), and structured output from LLMs.
- Awareness of prompt engineering for production systems: not just conversational prompting but structured prompting for reliable, parseable outputs in automated pipelines.
- Exposure to AI-assisted development workflows and coding agents as productivity tools.

Cloud Engineering
- Experience with at least one cloud platform (AWS, Azure, or GCP), including core compute, storage, and networking services.
- Familiarity with containerization (Docker) and container orchestration (Kubernetes).
- Awareness of infrastructure-as-code concepts (e.g., Terraform, CloudFormation) is preferred.

MLOps and Deployment
- Build and maintain ML deployment pipelines, including model packaging, registry management, and promotion workflows.
- Support batch and real-time inference workflows using appropriate serving frameworks (e.g., FastAPI, TorchServe, Triton, vLLM).
- Contribute to model validation, A/B testing infrastructure, and data and model versioning practices (e.g., DVC, MLflow, Weights & Biases).

Observability & Production Reliability
- Help implement logging, monitoring, and alerting for deployed ML services (e.g., model latency, prediction drift, error rates).
- Contribute to structured approaches for debugging production model issues, understanding the difference between infrastructure failures and model degradation.
- Awareness of cost monitoring and resource optimization for GPU and cloud-based ML workloads.

DevOps and Automation
- Responsible for CI/CD pipelines for ML applications, including automated testing of model artifacts and data validation.
- Contribute to reproducible environment setup and configuration management.
- Learn and apply best practices for reliability, scalability, and cost-awareness.

Security and Responsible Engineering
- Follow established security and access control practices for ML workflows.
- Assist with implementing data privacy and governance requirements.
- Responsible for secure handling of credentials, model artifacts, and sensitive data.
- Awareness of LLM-specific security concerns: prompt injection, data leakage, and output guardrails.

Collaboration and Communication
- Ability to clearly communicate technical decisions and tradeoffs to both technical and non-technical audiences, with guidance.
- Collaborate effectively with cross-functional teams including data scientists, engineers, project managers, and faculty experts.
- Willingness to participate in client meetings in a supporting role.

Preferred Experience
- Exposure to Kubernetes, GPU-based workloads, or distributed training/inference concepts.
- Familiarity with Git-based workflows and Agile development practices.
- Coursework or projects involving NLP, computer vision, or large language models.
- Experience with API design for ML services (REST/gRPC).
- Familiarity with vector databases, retrieval-augment