Senior Backend & Infrastructure Engineer

Scispot · Worldwide

Full-time · Remote · 1d ago

Python · Azure · NoSQL · Redis · Docker · Kubernetes

About the role

Build the technical backbone for labs that run themselves. This is a hands-on backend and infrastructure role. You will write production code, make architecture choices, and own systems from design through deployment, monitoring, and incident response.

Why Scispot

Scispot began after my brother Guru and I (Satya) watched someone we loved run out of time while slow, manual lab processes delayed a promising treatment. We are building Scispot so life-saving science can move at software speed. Biotech and life science teams should not have to choose between moving fast and keeping their data clean, connected, traceable, and ready for AI.

We are building the digital backbone for scientific discovery. Scispot connects lab operations, instrument data, scientific workflows, and AI-driven insights in one platform. This becomes the memory layer that life science teams rely on for their agents.

Your code will not optimize clicks for another consumer app. It will help scientists run experiments faster, trace samples accurately, automate repetitive work, and move treatments closer to patients. This is a rare chance to build infrastructure at the intersection of software, AI, data, and biology.

Today, Scispot supports more than 100 labs, 250+ instrument types, over 1,000 experiments each month, and millions of samples. After raising an $8M Series A, we are expanding the engineering team to build the reliable platform beneath the next generation of lab automation and AI.

The role

We are looking for a senior backend and infrastructure engineer who treats production as a product. You will own the systems beneath Scispot: backend services, messaging, databases, cloud infrastructure, CI/CD, observability, security, and reliability. This is not an ops-only role.

A normal week may include:
Tracing a RabbitMQ bottleneck.
Building a FastAPI or Spring Boot service.
Tuning PostgreSQL or Elasticsearch.
Improving an EKS rollout.
Designing a safer AWS and Azure boundary.
Reducing cloud cost without weakening reliability.
Debugging a production issue across code, queues, caches, and infrastructure.
Optimizing workloads for AI pipelines.

You will work closely with the founders and product engineers. You will get broad goals, real customer stakes, and room to decide how to solve the problem. We want someone who acts like an owner, not someone who waits for a perfect ticket.

What you'll own

Design, build, and operate cloud infrastructure across AWS and Azure for scale, reliability, security, and cost efficiency.
Build and evolve backend services in Python and FastAPI, Java and Spring Boot, or closely related frameworks.
Own backend reliability and performance across services, dependencies, queues, caches, databases, and external integrations.
Build and improve CI/CD pipelines so the team can deploy quickly, safely, and with clear rollback paths.
Run production end to end. This includes deployments, monitoring, alerting, debugging, incident response, post-incident follow-up, and capacity planning.
Design event-driven and asynchronous workflows using RabbitMQ or similar messaging systems.
Use Redis and other caching patterns to improve latency, throughput, and resilience.
Operate relational data stores in RDS, graph workloads in Cosmos DB, and NoSQL or vector workloads in MongoDB Atlas.
Build useful observability with logs, metrics, traces, dashboards, and alerts using tools such as Datadog and ELK.
Improve network and application security. This includes VPC design, secrets management, access control, encryption, and auditability.
Turn repeated operational work into code, tools, runbooks, and guardrails that raise developer velocity.
Make clear trade-offs among speed, reliability, maintainability, compliance, and cloud cost.

Problems you may work on

How do we absorb bursts of instrument and workflow data without losing work, creating duplicates, or slowing customer-facing services?
How do we preserve sample lineage, permissions, and audit history as data moves across services and cloud systems?
How do we make graph and vector retrieval dependable enough to support AI features used in real lab workflows?
How do we let engineers ship many services quickly while keeping deployments observable, reversible, and safe?
How do we scale across AWS and Azure without building fragile one-off infrastructure or wasting cloud spend?
How do we find production risks before customers do, then remove the root cause instead of only treating the symptom?

What success looks like in your first 90 days

First 30 days

Map the architecture, critical customer flows, deployment path, data stores, and main production risks.
Ship at least one useful production improvement in a backend service, deployment workflow, observability path, or reliability issue.
Join incident response and learn the current operating model.
Establish a baseline for the system health metrics that matter most.
Review the AI traces captured in Langfuse and propose a feedback pipeline for AI workloads.

Interested in this role?

Apply on the company's website.
