Web Scraping Engineer - European Public Procurement

Nessolabs · Indonesia
Full-time · Remote · 1mo ago
AWS · Cloudflare · HTML · Playwright · PostgreSQL · Python

Responsibilities

  • Build and maintain async scrapers (Python + Playwright) against Italian, and later other European, public procurement portals (Maggioli PortaleAppalti, ANAC, MePA, and others)
  • Handle real-world challenges: JSESSIONID session management, FriendlyCaptcha/Mosparo anti-bot, Cloudflare WAF, IP rotation with rate limit backoff
  • Parse Italian data formats - amounts (€ 1.234.567,89), dates (DD/MM/YYYY, textual), CIG/CUP identifiers with placeholder detection
  • Extract and process documents: PDF, .p7m (PKCS#7 signed), ZIP/7Z archives, with OCR fallback
  • Integrate scrapers into our Prefect orchestration pipeline with monitoring, alerting, and anomaly detection
  • Work with PostgreSQL, Supabase, ClickHouse, and S3 for dual-sink storage with upsert/idempotency patterns
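
To make the Italian data formats mentioned above concrete, here is a minimal sketch of parsing amounts such as € 1.234.567,89 and dates in DD/MM/YYYY or textual form. The function names and the month table are illustrative assumptions, not part of the Nessolabs codebase, and real portals will likely need more cases (abbreviated months, ordinals, missing values).

```python
import re
from datetime import date
from decimal import Decimal

# Italian month names for textual dates such as "15 marzo 2024".
ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}


def parse_italian_amount(text: str) -> Decimal:
    """Parse an Italian-formatted amount ('.' thousands, ',' decimals)."""
    # Keep only digits, separators and a possible minus sign.
    cleaned = re.sub(r"[^\d.,-]", "", text)
    # Drop thousands separators, then switch the decimal comma to a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_italian_date(text: str) -> date:
    """Parse 'DD/MM/YYYY' or 'D mese YYYY' (hypothetical helper)."""
    text = text.strip().lower()
    numeric = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if numeric:
        day, month, year = map(int, numeric.groups())
        return date(year, month, day)
    textual = re.fullmatch(r"(\d{1,2})\s+([a-z]+)\s+(\d{4})", text)
    if textual and textual.group(2) in ITALIAN_MONTHS:
        return date(
            int(textual.group(3)),
            ITALIAN_MONTHS[textual.group(2)],
            int(textual.group(1)),
        )
    raise ValueError(f"Unrecognised Italian date: {text!r}")
```

Using Decimal rather than float keeps monetary values exact, which matters when the same tender is upserted from several sources and compared for equality.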

Requirements

  • Strong async Python - you think in asyncio, not time.sleep()
  • Playwright or Selenium experience - you've intercepted XHR responses, handled SPAs, and debugged timing issues
  • Resilience mindset - retry with backoff, graceful degradation, circuit breakers. Your scraper doesn't crash at 3 AM.
  • Comfort with messy HTML - you can write a multi-strategy extractor that handles three different markup patterns on the same site
  • Data parsing skills - Italian locale, date formats, CIG validation, document type detection
  • Bonus: experience with Italian PA (Pubblica Amministrazione) portals, ANAC/PVL datasets, or OCDS data formats

Tech stack

  • Python 3.11+, Playwright, httpx, BeautifulSoup, Pydantic, SQLAlchemy 2.0, PostgreSQL, Prefect, AWS S3, Supabase

How we hire

  • No whiteboard algorithms. We'll send you a hands-on technical assessment: a mock procurement portal with real-world challenges. You build a scraper. We evaluate the code.
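
As an illustration of the "asyncio, not time.sleep()" and "retry with backoff" items in the requirements, the sketch below retries an async operation with exponential backoff and jitter. The helper name `retry_with_backoff` and its parameters are assumptions made for this example; a production scraper would probably catch only specific errors (timeouts, HTTP 429) rather than every exception.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await operation(), retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception:
            # Out of retries: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter avoids many workers retrying at the same instant.
            # asyncio.sleep yields to the event loop instead of blocking it.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    # Only reached when attempts < 1.
    raise RuntimeError("attempts must be at least 1")
```

A call such as `await retry_with_backoff(lambda: fetch_page(url))` would wrap a single page fetch, where `fetch_page` stands in for whatever coroutine performs the request.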

Benefits

Performance bonus

Additional Information

We're building the data backbone for European public procurement. Our platform aggregates tender data from 100+ e-procurement portals - each with its own quirks, anti-bot protections, and legacy HTML. We're looking for a scraping engineer who can navigate this landscape: someone who's comfortable with headless browsers, knows how to handle sessions and CAPTCHAs, and won't panic when the same platform serves three different HTML layouts across pages.

