Data Engineer
ABOUT US
We are a multi-award-winning team of over 100 engineers, designers, and analysts based in Leicester, with development hubs in Ukraine and Spain. We specialise in bespoke software development, extended teams/staff augmentation, and support as a service.
OUR CLIENT
Our partner is Europe’s leading platform for buying and selling books, CDs, movies, games, and fashion.
The company fosters success by bringing together a team of professionals with diverse backgrounds who collaborate to bring bold ideas to life and find innovative solutions.
We are looking for a Data Engineer to join the Evolved Ideas team.
As a Senior Data Engineer, you will own one of our most business-critical data assets: the system that links customer identities across our businesses and powers better decisions in marketing, CRM, reporting, and analytics. You will join our Business Intelligence & Data Engineering team and work closely with Data Engineers and Business Analysts to build reliable, scalable, and trustworthy customer identity data.
Your mission
- Own the end-to-end pipeline that creates the unified customer_uuid across Books & Media and Fashion
- Maintain and evolve our customer identity master data with a strong focus on accuracy, reliability, and production quality
- Improve our probabilistic identity resolution model and make matching decisions measurable, transparent, and explainable
- Build scalable and cost-efficient data pipelines across BigQuery, GCS, and Cloud Run Jobs
- Introduce diagnostics, monitoring, and structured validation for every relevant model change
- Identify and resolve edge cases in customer matching logic before they become production issues
- Work closely with business and technical stakeholders to turn complex matching challenges into robust data solutions
Our Tech Stack
- BigQuery
- SQL
- Python
- Airflow
- Splink
- Google Cloud Storage
- Cloud Run Jobs
- Pub/Sub
Your profile
Must-Have:
- 5+ years of experience in production data engineering
- Strong experience with BigQuery and advanced SQL in large-scale analytical environments
- Strong Python skills for production-grade data engineering
- Solid Airflow experience and a strong understanding of reliable orchestration patterns
- Hands-on experience with incremental pipelines and idempotent data processing
- Experience with probabilistic record linkage or entity resolution in production
- Strong understanding of data quality, matching logic, and precision/recall trade-offs
- A careful, structured, and ownership-driven way of working
- Strong communication skills and the ability to explain technical decisions clearly
Nice-To-Have:
- Experience with Splink or other probabilistic record linkage tools
- Experience with Cloud Run Jobs, GCS, and event-driven patterns in GCP
- Experience with Pub/Sub as a source in data pipelines
- Familiarity with the trade-offs between data formats such as Parquet and Avro
- Experience with dbt
- Exposure to downstream BI use cases
- Experience in e-commerce or marketplace environments
- German language skills
YOU CAN LOOK FORWARD TO
- Contributing to a high-scale, complex product and seeing the real-time impact of your work
- Health insurance
- Educational budget
- Challenging tasks, professional development, and sharing of knowledge and best practices