Staff Platform Engineer - Observability


Full-time · Remote · Today

Kubernetes · Google Cloud Platform · Grafana · Prometheus


About the role

We are hiring a Staff Platform Engineer with a dedicated focus on the Observability ecosystem for our Platform area in Brazil! In an environment of rapid growth and high-scale distributed architecture, your mission is to transform Observability from a passive toolset into a strategic asset using open source standards. You will act as an architect of efficiency and reliability, building a global platform that empowers engineering teams to "own what they build" with confidence.

We are moving beyond basic monitoring to build a comprehensive "Observability as a Service" ecosystem. You will be responsible for evolving a self-service platform that balances performance with cost-effectiveness, solving complex challenges related to high-cardinality metrics, log retention strategies, and distributed tracing.

We strive to eliminate friction. You will design the "Golden Paths" that allow developers to instrument their code instantly and gain high-fidelity signals without operational overhead.

You will work with some of the most popular and modern technologies in the cloud native industry, such as Grafana, OpenTelemetry, Prometheus, Thanos, Loki, Tempo, Pyroscope, Kafka, Kubernetes, AWS, GitHub Actions, ArgoCD, Crossplane, and HashiCorp Vault. You will also have the opportunity to code in Go, Ruby, or Python to build and maintain our products and tools.

Responsibilities

The team focuses on enabling engineering teams to own and run their decisions end-to-end, rather than taking over support for a set of services. We are responsible for building the necessary tooling and abstractions that simplify service lifecycles. By fulfilling this role, you can expect to:
Take full end-to-end ownership of our observability stack, ensuring it remains resilient, scalable, and capable of supporting our rapid growth
Evolve and maintain a cloud-native infrastructure based on Kubernetes, ensuring that our foundation stays state of the art
Bridge the gap between complex infrastructure and developer experience, building the necessary abstractions so teams can manage their own observability lifecycle seamlessly
Contribute to establishing and maintaining standards, guidelines, and best practices, and formalize them through processes like RFCs (Requests for Comments)
Live the mission: inspire and empower others by genuinely caring for your own well-being and that of your colleagues. Bring well-being to the forefront of work, and create a supportive environment where everyone feels comfortable taking care of themselves, taking time off, and finding work-life balance

Requirements

This role may fit if you have:
Proven technical experience with observability practices and tooling (metrics, logs, traces, profiling);
Proficiency with a major cloud provider (AWS, GCP, or Azure) and its ecosystem of services for building cloud-native applications;
Deep knowledge of Kubernetes and its ecosystem;
Hands-on experience with tools like Prometheus, Thanos, Grafana, Loki, Tempo, or similar open source solutions;
Hands-on experience with OpenTelemetry infrastructure and instrumentation;
Excellent analytical and problem-solving skills, and proven experience in identifying solutions for complex problems;
The ability to translate technical complexities into clear, actionable guidance for different stakeholders, fostering a culture of collaboration and knowledge sharing.

