
Recurrent Machine Learning ETL Using Luigi

Today, running the machine learning job I've written is done by hand. I download the needed input files, learn and predict things, output a .csv file, which I then copy into a data

Solution 1:

Your pattern looks largely correct. I would start with a cron job that calls a script to trigger the Load task pipeline. Your Load task already appears to check for new files in the S3 bucket, but you would also have to make its output conditional, for example by writing a status file when there is nothing to do. Alternatively, you could do the check in a higher-level WrapperTask (which has no output of its own) that requires the Load task only if there are new files. That WrapperTask could then require two different Load tasks, which would in turn require your Transform1 and Transform2 respectively.

As for containers: what my cron job actually calls is a script that pulls my latest code from git, builds a new image if necessary, and then calls docker run. A separate container is always up, running luigid. The daily docker run executes a shell script in the container, via CMD, that calls the Luigi task with the parameters needed for that day.
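The script that cron calls could be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the repository path, the image tag, and the `RUN_DATE` environment variable are hypothetical, and the container's CMD script is assumed to read `RUN_DATE` to pick that day's parameters. `docker build` reuses cached layers, so rebuilding on every run is cheap when nothing has changed.

```python
import datetime
import subprocess


def build_commands(repo_dir, image, run_date):
    """Return the commands for one daily run, in order."""
    day = run_date.isoformat()
    return [
        # Pull the latest code.
        ["git", "-C", repo_dir, "pull", "--ff-only"],
        # Rebuild the image; unchanged layers come from the build cache.
        ["docker", "build", "-t", image, repo_dir],
        # Run the container; its CMD script is assumed to read RUN_DATE.
        ["docker", "run", "--rm", "-e", f"RUN_DATE={day}", image],
    ]


def run_daily(repo_dir, image, run_date, runner=subprocess.run):
    """Execute each command, stopping at the first failure."""
    for cmd in build_commands(repo_dir, image, run_date):
        runner(cmd, check=True)
```

A cron entry would then invoke a small script that calls something like `run_daily("/opt/ml-etl", "ml-etl:latest", datetime.date.today())`, with the path and tag replaced by your own.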
