Our multi-environment cloud infrastructure, hosted in Azure, is built on microservices deployed as Docker containers. This lets us scale our software components in a flexible, resilient, and automated way through Kubernetes clusters.
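As an illustration, each microservice in a setup like this is typically described by a Kubernetes Deployment manifest; the service name, image registry, and resource figures below are hypothetical, a minimal sketch rather than our actual configuration:

```yaml
# Hypothetical Deployment for one microservice (all names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3                 # in practice scaled automatically (e.g., via an HPA)
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: example-service
          image: exampleregistry.azurecr.io/example-service:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

Declaring replicas and resource bounds this way is what gives the cluster the information it needs to reschedule and scale containers automatically.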
We rely on Spark clusters, managed by Databricks, to process incoming data: either historical data coming from our data lakehouse (Parquet & Delta Lake) or streaming data (Kafka).
This input is processed by our distributed ETL jobs, implemented in Scala, which deliver the analysis results to our ML applications and frontend layer.
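The useful property of these ETLs is that the same transformation logic applies whether the input is a historical batch or an unbounded stream. A minimal pure-Python sketch of that idea (the real jobs run on Spark in Scala; the record fields and the `enrich` transform are hypothetical):

```python
from typing import Iterable, Iterator

def enrich(record: dict) -> dict:
    """Hypothetical per-record transformation: derive a field and tag the record."""
    return {**record, "value": record["value"] * 2, "processed": True}

def run_etl(records: Iterable[dict]) -> Iterator[dict]:
    """Apply the same transform whether `records` is a finite batch (a list)
    or an unbounded stream (a generator)."""
    return (enrich(r) for r in records)

# Batch mode: a small historical slice, as if loaded from the lakehouse.
batch = [{"value": 1}, {"value": 2}]
print(list(run_etl(batch)))  # [{'value': 2, 'processed': True}, {'value': 4, 'processed': True}]
```

Keeping the transform a pure function of each record is what makes it reusable across the batch and streaming paths.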
We build on MLOps pipelines optimized for scale, efficiency, and control. We rely on common Python frameworks (Pandas, Spark) for (big) data analysis and feature engineering; the resulting features are persisted in our feature store.
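In production this step runs on Pandas/Spark; as a language-level illustration, here is a stdlib-only sketch of the kind of aggregate features such a step derives before they land in the feature store (the feature names are hypothetical):

```python
from statistics import mean, pstdev

def engineer_features(values: list[float], window: int = 3) -> dict:
    """Derive simple aggregate features from a raw numeric series.
    Feature names are illustrative; real pipelines use Pandas/Spark."""
    mu = mean(values)
    sigma = pstdev(values)
    return {
        "mean": mu,
        "std": sigma,
        "rolling_mean": mean(values[-window:]),
        # z-score of the latest observation (0.0 if the series is constant)
        "last_zscore": (values[-1] - mu) / sigma if sigma else 0.0,
    }

features = engineer_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features["mean"], features["rolling_mean"])  # 3.0 4.0
```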
For model building and training we rely on state-of-the-art libraries such as PyTorch, scikit-learn, and ONNX. Every experiment and model deployment is tracked with MLflow, enabling fine-grained model selection and a standardized offline validation procedure.
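Conceptually, the selection step amounts to ranking tracked runs by an offline validation metric and promoting the best one. A stdlib-only stand-in for the query we would issue against the tracking server (run names and metric values are made up):

```python
def select_best_run(runs: list[dict], metric: str = "val_auc") -> dict:
    """Pick the run with the highest offline validation metric.
    In our stack this lookup goes through the MLflow tracking server."""
    return max(runs, key=lambda run: run["metrics"][metric])

# Hypothetical tracked experiments.
runs = [
    {"name": "pytorch-mlp-v1", "metrics": {"val_auc": 0.81}},
    {"name": "sklearn-gbm-v2", "metrics": {"val_auc": 0.86}},
    {"name": "pytorch-mlp-v2", "metrics": {"val_auc": 0.84}},
]
print(select_best_run(runs)["name"])  # sklearn-gbm-v2
```

Because every run is logged with the same metrics, the comparison is apples-to-apples: that is what makes the offline validation procedure standardized.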
Finally, we run a complete observability platform powered by Azure Monitor and Grafana that monitors our deployed APIs, providing the insights necessary for online validation and A/B testing.
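At the heart of an A/B test on a deployed API sits a two-proportion z-test on the conversion rates of the control and treatment variants; a stdlib sketch under made-up traffic numbers:

```python
from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference between two conversion rates,
    using the pooled-proportion standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: 10% vs 15% conversion over 1000 requests each.
z = two_proportion_z(100, 1000, 150, 1000)
print(round(z, 2))  # 3.38 — above 1.96, so significant at the 5% level
```

The request counts and conversion events feeding such a test are exactly what the monitoring platform collects from the deployed APIs.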
We deliver our ML applications with the latest web technologies. Our core is built with React and Redux, backed by the type safety of TypeScript and server-side rendering with Next.js. We like to style with CSS-in-JS solutions such as styled-components.
Our backends are polyglot: Go, Python, and Node. We are currently experimenting with Rust and WebAssembly to speed up our hot paths and deliver the best possible performance.
We have a fully automated CI/CD pipeline and follow DevOps best practices: code reviews, linting, and automated testing. As a small team, we follow a flexible kanban-based methodology. We work on GitHub and love anything that makes our lives easier as developers, e.g., GitHub Copilot.
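A minimal sketch of such a pipeline as a GitHub Actions workflow; the job name, tool choices, and commands are illustrative, and a real pipeline would add build and deployment stages:

```yaml
# .github/workflows/ci.yml (illustrative)
name: CI
on:
  pull_request:
  push:
    branches: [main]
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: ruff check .   # linting
      - run: pytest         # automated tests
```

Running this on every pull request is what turns the review, lint, and test practices into enforced gates rather than conventions.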