
Posts


Converting a Kaggle notebook to a Kedro Pipeline

This tutorial is aimed at data scientists who want to convert a Jupyter notebook to a Kedro pipeline. Notebooks are a great way to experiment with and explore data, figure out which machine learning model works well, and illustrate outcomes. But notebooks are very hard to maintain: they don’t play nicely with repositories, testing can’t be automated, and you can’t reuse code from them or integrate them into a bigger workflow. Kedro takes the good practices we have learned from software engineering and applies them to data science projects.
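As a rough illustration of the reuse and testing point, the sketch below rewrites notebook-style logic as small pure functions, which is the shape that Kedro nodes wrap. The function names and the toy data are hypothetical, and plain Python stands in for Kedro's own API here.

```python
from statistics import mean


def clean_rows(rows):
    """Drop rows with a missing 'price' value (hypothetical cleaning step)."""
    return [row for row in rows if row.get("price") is not None]


def average_price(rows):
    """Compute the mean price over the cleaned rows."""
    return mean(row["price"] for row in rows)


def run_pipeline(rows):
    """Chain the two steps in order, standing in for a pipeline runner."""
    return average_price(clean_rows(rows))


# Toy input: one row has a missing price and is filtered out.
raw = [{"price": 10.0}, {"price": None}, {"price": 20.0}]
result = run_pipeline(raw)
```

Because each step is an ordinary function with explicit inputs and outputs, it can be imported elsewhere and covered by automated tests, which is difficult with cells in a notebook.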

Read more…

Setting up a Virtual Environment and VS Code for Kedro

Utilising best software engineering practice with Kedro starts with setting up a virtual environment and an IDE to work with. This short guide shows you how to get started with Python’s built-in venv environment manager together with VS Code, so you can have a Kedro project set up in no time. I am assuming you have Python 3.7 or greater and VS Code installed. I am using Windows for the prompt examples; on macOS/Linux, replace python with python3. Most other commands should be pretty similar, though your prompt will look a bit different too.
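For a feel of what the venv step does, here is a minimal sketch that creates an environment through the standard library's `venv` module rather than the command line. The `.venv` folder name and the `create_env` helper are assumptions for illustration, and pip is skipped to keep the example fast; running `python -m venv .venv` at the prompt installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir):
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"  # hypothetical folder name
    # with_pip=False keeps this sketch quick; a real project would want pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    # Every venv contains a pyvenv.cfg file describing the base interpreter.
    has_config = (env_dir / "pyvenv.cfg").exists()
```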

Read more…

Tools Infrastructure for Production

Working in a team on a project that will span several months or years, you will need to create an infrastructure to support your team as the codebase grows. In this article I will discuss a range of tools, what they do and why you would need them. Every project, team and business is different, though, so this reflects my own experience, and you will need to tweak the selection of tools and their priorities to your own use case.

Read more…