Senior Software Engineer, Agentic Systems

Horizon3.ai · Remote

Full-time · Remote · 2d ago

Application Security · AWS · LLMs · Neo4j · Observability · Python

About the role

We're building an autonomous, black-box web application penetration tester. It crawls and attacks real production websites the way a skilled human pentester would, finding broken access control, injection, XSS, SSRF, SSTI, and more, under a strict production-safe, no-false-positives mandate. We have deep offensive expertise on this team: people who know exactly how to find and exploit these vulnerabilities by hand. What we need is an engineer who can turn that expertise into autonomous agent capability: the reasoning, orchestration, tooling, and evaluation that lets an LLM-driven agent do this work reliably, at scale, and unattended.

You'd own and evolve the attack-agent layer: the part of the system that decides what to probe, forms and tests hypotheses, exploits, and verifies, without false positives and without touching anything it shouldn't. This is a build role, not a research role. We use models surgically (deterministic-first, LLM-as-scalpel), and the hard problems are in engineering reliability, not chasing benchmarks.

Essential Functions

- Build and evolve the agent harness and orchestration that turns an LLM into a reliable autonomous pentester: the loop that reasons over an application, forms attack hypotheses, acts, and verifies results.
- Design the tools and tool-shaped feedback the agent uses to probe and exploit, and the structured-output and validation layers that keep it reliable (e.g., hook-enforced mandatory validation, schema-constrained outputs).
- Translate the team's offensive expertise into repeatable agent capabilities, partnering directly with our attackers to encode how they think into something the agent can do consistently.
- Own and grow our evaluation infrastructure: benchmark suites, a failure-mode taxonomy across the pipeline (discovery → hypothesis → exploitation → verification), and regression detection, so we actually know whether the agent is getting better.
- Manage LLM inference in production: model selection, prompt and context engineering, and keeping cost and latency under control (we run on AWS Bedrock with centralized cost tracking).
- Hold the line on production-safety and no-false-positives: every finding the agent reports has to be real and reproducible.

Competencies/Requirements

- 5+ years building production software, with strong Python.
- Hands-on experience building LLM-powered applications or agents: tool use / function calling, structured outputs, multi-step orchestration, and the glue that makes it all hold together.
- A track record of making LLMs reliable in production: you've wrestled nondeterminism, designed around model limitations, and shipped something that worked when it mattered.
- Real experience with evaluation: you've built or owned the harness that tells you whether a model or agent change is an improvement, not just a vibe.
- Strong instincts for prompt and context engineering, and the judgment to keep the model's job small and well-scoped.
- Solid software fundamentals: testing, observability, and the discipline to keep a complex agent debuggable.
- Ownership mentality: comfortable owning a critical, fast-moving subsystem end to end.

Desired/Nice to Have

- Working knowledge of web application security (broken access control, IDOR/BOLA, SQLi, XSS, SSRF, SSTI), enough to collaborate fluently with offensive engineers.
- Experience building eval harnesses or benchmarks specifically for agents (synthetic environments, CVE-based test targets, capture-the-flag-style scoring).
- Experience with agent frameworks, and strong opinions about when not to reach for one.
- Familiarity with graph data models (e.g., Neo4j) for representing application state and attack context.

What makes you stand out: You've shipped an autonomous agent that did real, valuable work unattended in p

Benefits

Remote work options

Additional Information

Get to Know Us

Horizon3.ai is a fast-growing, remote cybersecurity company dedicated to the mission of enabling organizations to proactively find, fix, and verify exploitable attack vectors before criminals exploit them. Our flagship product, the NodeZero™ platform, delivers production-safe autonomous pentests and other key assessment operations that scale across the largest internal, external, cloud, and hybrid cloud environments. NodeZero has been adopted by organizations of all sizes, from small educational institutions to government agencies and Global 100 enterprises. It is used by ITOps/SecOps teams, consulting pentesters, and MSSPs and MSPs.

We are a fusion of former U.S. Special Operations cyber operators, startup engineers, and formerly frustrated cybersecurity practitioners. We're committed to helping solve our common security problems: ineffective security tools, false positives resulting in alert fatigue, blind spots, "checkbox" security culture, the cybersecurity skills shortage, and the long lead time and expense of hiring outside consultants. Collectively, we are a team of learn-it-alls, committed to a culture of respect, collaboration, ownership, and results.
