Managing the Data Pipeline with Git + Luigi
from Tumblr http://shiyamaz.tumblr.com/post/112106407828
One of the common pains of managing data, especially at larger companies, is that much of it becomes dirty (which you may or may not even notice!) and ends up scattered everywhere. Many ad hoc scripts run in different places, and these scripts silently generate dirty data.
February 26, 2015 at 01:14PM