Season 3 · Episode 4
Data Lakehouses, Interoperability, and Accessibility with Tomer Shiran
This episode features an interview with Tomer Shiran, Founder and Chief Product Officer at Dremio. Dremio is a high-performance SQL lakehouse platform that helps companies get more from their data in the fastest way possible. Before Dremio, Tomer served as VP of Product at MapR and held product management and engineering roles at Microsoft and IBM Research. He holds a master’s degree from Carnegie Mellon University and a bachelor’s degree from Technion - Israel Institute of Technology.
In this episode, Tomer and Sam dive into the economics of storing data, how to build an open architecture, and what exactly a data lakehouse is.
Episode Transcript
Read the full transcript here.
Guest Quotes
(10:32): “I think in the world of data lakes and lakehouses, the model has shifted upside down. Now, instead of bringing the data into the engines, you’re actually bringing the engines to the data. So you have this open data tier built on open source technology. The data is represented in open source formats and stored in the company’s S3 account or Azure storage account. And then you can use a variety of engines. We at Dremio, we take pride in building the best SQL engine to use on the data. There are different streaming engines, like Spark and Flink. There are different batch processing and machine learning engines. Spark is an example of that as well that companies can use on that same data. And I think that’s one of the really important things from a cost standpoint, too, is that this really lowers your overall costs, both today and also in the future as you scale.”
Time Stamps
(02:04): What open source data means to Tomer
(03:14): Tomer’s motivation behind Apache Arrow
(06:42): How Tomer solved data accessibility
(08:43): The unit economics of storing data
(14:31): Tomer’s motivations for Iceberg and how it relates to Project Nessie
(17:06): What is a data lakehouse?
(18:31): What gives Dremio its magic?
(23:39): What cloud data architecture will look like in 5 years
(27:19): Advice for building an open data architecture
Links