Team Lead - Production Engineering
Responsibilities
- Oversee and govern production issue management across the team, acting as the senior escalation point for complex or high‑impact issues.
- Own overall production health for the service area, ensuring monitoring coverage, alert quality, and operational standards are consistently met.
- Ensure consistent operational readiness across platforms and regions through process ownership and team coordination.
- Help set observability standards and priorities, ensuring the team delivers meaningful, actionable monitoring aligned with platform risk.
- Ensure effective change governance, balancing delivery velocity with platform stability and risk management.
- Coordinate major incidents at a leadership level, ensuring appropriate technical ownership, communication, and stakeholder management.
- Own problem management outcomes, ensuring lessons learned are embedded into processes, tooling, and team priorities.
- Define reliability improvement priorities and roadmap, aligning team effort with platform risk and business needs.
- Act as the primary interface between production engineering, delivery teams, and senior stakeholders.
- Prioritize and sponsor automation initiatives, ensuring team capacity is focused on the highest operational value.
Experience / Competences
Essential
- Educated to degree level, or an equivalent combination of education and experience.
- Solid understanding of AWS operational environments, including load balancers, regional failover behaviour, instance lifecycles, and managed databases.
- Experience supporting business‑critical front/mid‑office applications.
- Deep knowledge of market data flows, instrument definitions, pricing mechanisms, and session‑based connectivity.
- Ability to interpret complex application logs and diagnose backend issues with accuracy and speed.
Operational & Analytical Essentials
- Strong root‑cause analysis capability with the ability to evaluate symptoms, isolate faults, and determine remediation paths.
- Solid experience in incident management, major‑incident coordination, and structured problem‑solving.
- Demonstrated ability to work across regions, managing concurrent issues, escalations, and stakeholder communications.
- Clear understanding of change‑management disciplines including risk assessment and deployment validation.
- Familiarity with observability tooling (e.g., Grafana, CloudWatch, ELK, Splunk), including metrics, logs, dashboards, and alerting used to assess system health and reliability.
Collaboration Essentials
- Proven ability to work across multidisciplinary teams (Business, Operations, Developers, DevOps).
- Strong customer‑focus and ability to communicate complex technical issues in a business‑friendly manner.
- Comfortable supporting global operations and adapting to multi‑region workflows.
Reliability & Platform Engineering (Evolving)
- Demonstrated interest or experience in applying SRE principles such as reliability metrics, automation, and continuous improvement within a support or operations role.
- Experience contributing to improved mean time to
Benefits
Additional Information
Group Overview
The TP ICAP Group is a world‑leading provider of market infrastructure. Our purpose is to provide clients with access to global financial and commodities markets, improving price discovery, liquidity, and distribution of data, through responsible and innovative solutions. Through our people and technology, we connect clients to superior liquidity and data solutions.
The Group is home to a stable of premium brands. Collectively, TP ICAP is the largest interdealer broker in the world by revenue, the number one Energy & Commodities broker in the world, the world's leading provider of OTC data, and an award‑winning all‑to‑all trading platform. The Group operates from more than 60 offices in 27 countries. We are 5,300 people strong. We work as one to achieve our vision of being the world's most trusted, innovative liquidity and data solutions specialist.
Role Overview
The Production Engineering Team Lead is responsible for leading a team of Production Engineers, accountable for the stability, reliability, and operational performance of business‑critical trading platforms. This role combines hands‑on technical leadership with people, priority, and process ownership. The Team Lead ensures production issues are handled effectively, operational standards are consistently applied, and reliability improvements are delivered in line with platform risk and business needs.
Acting as the primary escalation and coordination point, the Team Lead oversees major incidents, change activity, and problem management, while ensuring clear communication with engineering, platform, and business stakeholders. As the operating model continues to evolve, the Team Lead plays a key role in shaping an SRE‑aligned production engineering function, embedding reliability‑focused practices, reducing operational toil, and developing engineers toward higher levels of production ownership and technical maturity.