Additional Information
Position: Colo LL Strategic Specialist - Compute
Business Area: Infrastructure
Job Summary:
Squarepoint is looking for a talented and highly motivated Ultra Low Latency Platform Engineer to provide solutions across Squarepoint's global colocation (COLO) estate, consisting of 400+ servers across 30 global sites. The candidate will be responsible for project delivery, support escalations, monitoring, automation, security, documentation, and capacity management for Squarepoint's low latency infrastructure. This will involve collaborating with our business partners, application owners, clients, vendors, and internal teams (SRE, Network, Application Support, Application Development, Quants, etc.) to deliver end-to-end solutions in a timely manner.
Responsibilities:
Manage systems efficiently at scale through standardization, automation, testing, and in-depth monitoring
Enforce development standards for source control, testing, and continuous integration for infrastructure, OS, patches, and configuration management
Manage a distributed compute environment and multiple petabyte-scale storage systems
Install, manage, and monitor the Linux operating system (RHEL based)
Troubleshoot complex hardware and software issues throughout the Squarepoint technology stack
Create self-healing systems and automated recovery processes
Respond to system incidents and participate in on-call rotations
Conduct root cause analysis of incidents and outages
Reduce operational toil through the development of user-driven automated workflows
Work with business owners to regularly re-prioritize the book of work, while delivering both tactical and long-term objectives
Required Qualifications:
8+ years of experience working with Linux (RHEL/CentOS/Rocky preferred) in a large, complex, or niche environment, with a focus on operations, systems engineering, and systems performance.
Deep knowledge of server architecture: HP, SuperMicro, Dell, and various overclocked servers.
Low latency network interfaces and kernel bypass (configuration and optimization): Solarflare with Onload, Mellanox with VMA.
System hardware/OS tuning and performance troubleshooting, understanding of CPU architectures.
Experience with build and configuration management tools, specifically Chef or Ansible.
Experience with observability tools, specifically Grafana and Prometheus.
Highly motivated, with a keen eye for scripting and automation in Python, Ruby, Bash, etc.
Deep knowledge and experience of server network stack configuration, tuning, and troubleshooting, including TCP, UDP (unicast/multicast), NTP, PTP, and Wireshark/tshark.
Strong communication: verbal and written.
Critical thinking and problem-solving skills for troubleshooting unknown, intermittent, and obscure issues.
Well-organized, proactive, resourceful, and accountable, with an ownership mindset; able to handle a fast-paced environment and question the status quo.
Good understanding of trading venues such as Nasdaq, LSE, Euronext, etc.
Degree in Engineering or Computer Science, or related experience.
Well-rounded understanding of network architectures.