Web Crawling & Scraping Engineer

🌟 We are looking for a highly skilled and motivated Web Crawling & Scraping Engineer. We crawl over 100k news websites daily, and we want someone who is as passionate about web crawling as we are and curious to explore new ways of handling difficult websites and automating our crawling techniques.

Functions:

  • Crawling Platform: Design, construct, test and maintain robust, reliable, and scalable crawling pipeline infrastructure.
  • Build an automated way to fix non-working crawlers
  • Provide metrics on website coverage
  • Data Pipeline: Design, construct, test and maintain robust and reliable data pipeline infrastructure.
  • Write automation and unit tests
  • Optimization: Optimize server performance and resource utilization of crawling infrastructure.
  • Regularly review and improve system performance and scalability.
  • Collaboration and Documentation: Maintain accurate and up-to-date documentation of server configurations, procedures, and policies.
  • Provide technical support and training to team members as needed.

Example Tasks:

  • Introduce a new automatic way of crawling a website that does not work with existing techniques
  • Come up with a way to diagnose why a specific crawler stopped working and fix it automatically
  • Use LLM-based methods to improve our crawling techniques

Experience:

  • Proven experience as a Web Crawling & Scraping Engineer or in a similar role.
  • Web Scraping and Web Crawling Techniques
  • Message queues or streaming/batch data processing frameworks, such as RabbitMQ.
  • Solid knowledge of SQL and NoSQL databases
  • Kubernetes and Docker experience is a must-have
  • Strong problem-solving skills
  • Excellent communication and collaboration skills.

Nice to have:

  • Experience with ElasticSearch (OpenSearch)

Your KPIs:

  • Number of non-working crawlers per website (should be small)
  • Time between a crawler going down and us coming up with a fix (should be small)

Compensation and Perks:

  • Competitive salary and equity.
  • Up to 24 days of vacation & 16 days of sick leave/holidays (all fully paid)
  • Learning and development compensation
  • One meeting-free day per week
  • Co-working Budget
  • Training Budget
  • We provide all the necessary equipment to work comfortably and efficiently from home.
  • Yearly company retreats (2024: Canary Islands, 2023: French Alps)

Required tools:

  • Scrapy
  • Crawlee