Totango Engineering

Chronicles of a Distributed Data Pipeline (part 2)

Quick recap: So... in part 1 we had these daily data pipelines running on a bunch of servers. Jenkins schedules the daily runs, and Luigi manages the logical flow of each pipeline. The system works, but it's starting to show strain as our data grows. There's a new requirement to

Chronicles of a Distributed Data Pipeline (part 1)

In the beginning: Here at Totango we crunch loads of data. Where possible, we try to do this in real time; inevitably, however, most of our meaningful analytics are processed in batch pipelines on a daily or hourly basis. When Totango was in its infancy these pipelines were basically