Senior Platform Engineer

Crop.photo · India

Full-time · Remote · Today

Python · AWS · Kubernetes · Terraform · CI/CD

About the role

Senior Platform Engineer / Senior DevOps Engineer
Location: Remote | Full-Time

About Evolphin

Evolphin is building the next generation of AI-powered media workflows for enterprise media teams managing large image and video libraries at scale, including environments with tens of millions of video assets and extremely large metadata and embedding footprints. Its platform adds a conversational AI layer for extracting intelligence from media, enabling powerful search, conversational discovery, and automation of media workflows through AskAI.

Crop.photo extends that capability into e-commerce and retail, enabling smart cropping, image transformation, and image and video generation at scale for PDP and eCommerce catalog workflows. Together, Evolphin and Crop.photo create a connected visual AI ecosystem where media can move across systems and be searched, understood, transformed, and prepared for downstream use at scale.

The Role

We are seeking a Senior Platform Engineer who can own infrastructure architecture, reliability, scalability, and platform operations across our cloud environments. This is not a traditional "pipeline management" role. We are looking for someone who can make architectural decisions, evaluate tradeoffs, and build platforms that support large-scale SaaS applications and AI workloads. You will work closely with Engineering, Product, and AI teams to design systems that are secure, resilient, scalable, and cost-efficient.

Key Responsibilities

Platform Architecture & System Design
- Design and evolve cloud-native platform architecture supporting multi-tenant SaaS applications
- Define infrastructure standards, deployment patterns, and platform best practices
- Lead architecture reviews and evaluate technical tradeoffs across reliability, performance, security, and cost
- Design highly available and fault-tolerant systems across multiple environments

AWS Infrastructure
- Architect and manage large-scale AWS environments
- Design networking architectures including VPCs, subnets, security groups, routing, load balancing, and connectivity patterns
- Build secure deployment architectures aligned with security and compliance requirements
- Implement disaster recovery, backup, and business continuity strategies

Kubernetes & Platform Operations
- Design and operate production Kubernetes environments
- Build scalable container orchestration strategies
- Optimize cluster performance, networking, autoscaling, and workload scheduling
- Improve developer experience through platform automation and self-service tooling

AI & GPU Infrastructure
- Support AI and ML workloads running on AWS
- Design infrastructure for model training and inference workloads
- Manage GPU provisioning, utilization, scaling, and cost optimization
- Collaborate with AI teams to improve deployment and operational efficiency

Reliability & Performance
- Define and measure SLIs, SLOs, and operational metrics
- Implement monitoring, observability, logging, alerting, and incident management practices
- Drive performance optimization and capacity planning initiatives
- Lead root cause analysis and reliability improvement efforts

Infrastructure Automation
- Build Infrastructure-as-Code solutions using Terraform
- Design and optimize CI/CD pipelines
- Automate provisioning, deployments, scaling, and operational workflows

Cost Optimization
- Continuously evaluate cloud spending
- Develop capacity planning models
- Balance performance, reliability, and infrastructure costs

Required Experience
- 6-8 years of experience in DevOps, Platform Engineering, SRE, or Cloud Infrastructure roles.
- Proven experience designing and operating production-scale SaaS platforms.
- Strong expertise in AWS architecture, networking, security, and deployment strategies.
- Deep hands-on experience with Kubernetes, container orchestration, cluster operations, autoscaling, and workload management.
- Experience designing highly available, fault-tolerant, and scalable distributed systems.
- Strong understanding of system design, architecture trade-offs, and platform scalability.
- Hands-on experience with Infrastructure as Code (Terraform preferred).
- Experience building and maintaining CI/CD pipelines and deployment automation frameworks.
- Strong Linux, networking, and systems engineering fundamentals.
- Experience implementing observability, monitoring, logging, and incident management practices.
- Experience with disaster recovery planning, backup strategies, and business continuity design.
- Experience with cloud cost optimization, capacity planning, and resource utilization management.
- Hands-on experience supporting AI/ML workloads in production environments.
- Experience designing, provisioning, and operating GPU-based infrastructure for model training and/or inference workloads.
- Experience managing and optimizing AWS Bedrock, OpenSearch, and DocumentDB or equivalent platforms.
- Strong scripting and automation skills using Python, Bash, or similar languages.

Interested in this role?

Apply on the company's website.