The Hadoop ecosystem has improved markedly over the past few years. Meanwhile, MPP (massively parallel processing) databases seem to slot in nicely as complements to MapReduce batch jobs, because they let analytics teams easily query massive structured data sets.
Rex Gibson, Manager of Data Engineering at Knewton, and Scott Hoover, Data Scientist at Looker, walk through how these pipelines work. They discuss:
- their technology and data stacks
- possible drawbacks to Hadoop + Redshift
- the merits and drawbacks associated with making data processing and querying more “democratic.”