Senior Site Reliability Engineer (SRE)

TOKENINSIGHT SG PTE. LTD. · Midview City, Singapore
S$120K–S$240K/yr · Full-time · Posted today
Information Technology

We are looking for an experienced Site Reliability Engineer who is passionate about building reliable, scalable, and automated infrastructure to support mission-critical platform services.

Responsibilities

  • Ensure the reliability, availability, and operational excellence of critical platform services and infrastructure.
  • Design, deploy, maintain, and optimize cloud-native infrastructure based on Kubernetes and Docker.
  • Build and improve observability systems including monitoring, alerting, logging, and distributed tracing.
  • Participate in architecture reviews and provide reliability-focused recommendations for high-concurrency, low-latency distributed systems.
  • Develop and maintain CI/CD pipelines to improve engineering productivity and deployment quality.
  • Lead capacity planning, performance tuning, disaster recovery planning, and resilience engineering initiatives.
  • Drive Infrastructure as Code (IaC) adoption and automation to reduce operational overhead and human error.
  • Define and continuously improve SLI/SLO/SLA frameworks across critical services.
  • Participate in incident response, root cause analysis (RCA), and postmortem reviews for production issues.
  • Collaborate closely with engineering, QA, product, and security teams to continuously improve platform reliability, scalability, and efficiency.
  • Leverage AI-powered tools (e.g., Cursor, Claude Code, ChatGPT) to enhance operational automation, troubleshooting, and productivity.

Requirements

  • Must-Have Skills
      • Bachelor's degree or above in Computer Science or a related field.
      • 5+ years of experience in SRE, DevOps, Infrastructure Engineering, or related roles.
      • Strong knowledge of Linux systems and performance optimization.
      • Proficiency in at least one programming language such as Go, Python, Java, or Rust.
      • Hands-on experience with Kubernetes, Docker, and cloud-native ecosystems.
      • Experience with CI/CD tools such as GitHub Actions, GitLab CI, or Jenkins.
      • Solid understanding of networking fundamentals including TCP/IP, HTTP, and WebSocket.
      • Strong troubleshooting, performance analysis, and capacity planning skills.
      • Experience building automation tools and operational platforms.
      • Demonstrated proficiency in AI-assisted development and operations tools such as Cursor and Claude Code.
  • Technical Stack
      • Container Platforms: Kubernetes, Docker
      • Observability: Prometheus, Grafana, Loki, ELK, OpenTelemetry
      • Messaging Systems: Kafka, RocketMQ, Redis
      • Databases: MySQL, PostgreSQL, ClickHouse, time-series databases
      • Infrastructure Automation: Terraform, Ansible, Helm
      • Cloud Platforms: AWS, GCP, Alibaba Cloud, Tencent Cloud
      • CI/CD: GitHub Actions, GitLab CI, Jenkins
  • Preferred Experience
      • Experience in large-scale internet, SaaS, fintech, e-commerce, or mission-critical platform environments.
      • Experience supporting high-concurrency distributed systems.
      • Strong understanding of distributed system architecture, scalability, and reliability engineering principles.
      • Experience operating multi-region or multi-datacenter infrastructure.
      • Experience managing large-scale Kubernetes clusters (1,000+ nodes).
      • Hands-on experience with Service Mesh technologies (e.g., Istio) and OpenTelemetry.
      • Expertise in Kafka, ClickHouse, and large-scale distributed system optimization.
      • Experience implementing Chaos Engineering practices.
      • Strong background in incident management and large-scale production recovery.
      • Experience with AIOps, intelligent alerting, and automated fault diagnosis systems.
