Season 7 · Episode 7

From Standalone Models to Integrated Data Products with DataArt's Yuri Gubin

As AI continues to evolve, standalone models may give way to integrated data products with built-in AI capabilities. Yuri Gubin, CIO at DataArt, shares his predictions for the future of AI and how it will become increasingly intertwined with data platforms. Learn about the role of open-source implementations and the emerging expectations for mature data products in the AI-driven future.

Episode Transcript

Chet Kapoor: Yuri, welcome to Inspired Execution. You've been at DataArt for 16 years, and we'll talk more about that later. But I'd love to know more about how you got to where you are today. Tell us a story that shaped your career, right? Something that you wouldn't put on your resume, right? That people who are close to you or people that you would communicate with as a way to inspire them. 

Yuri Gubin: Yeah, and thank you for mentioning this. Exactly today, July 31st, 16 years ago, I joined DataArt. So today's my anniversary.

Chet Kapoor: That's awesome.

Yuri Gubin: Yeah, and I know that people change jobs, but what I did... So first, it started with software architecture, and I realized that it's my passion to actually create things, build things that work, figure out what doesn't work and fix it. And then I realized that I'm also doing the same thing with teams, with accounts. I help customers do the same, and I also can do the same within the company. So architecting, creating the design, the organizational design, architectural design, it's my passion, what I do. And I transformed from a purely technology guy. I moved towards leadership and driving something new, changing, transforming DataArt itself, and helping our customers do the same.

Chet Kapoor: How was... Was it hard? Because you're probably still really good at what you... like doing architectural stuff, having people eat from your hands, like inspiring people to do projects, seeing the projects deliver on time. Was it hard to say, I'm just going to teach instead of do? 

Yuri Gubin: Oh, yes, of course. Remember what they say, two biggest problems in software, one is naming conventions, and the other one is regular expressions. Now I can say that the biggest problem in IT is people, because I can code, I can script everything. I can never automate a person, like a real human who will be making decisions or will be listening to you or talking to you or someone who wants to do something, like making sure that you understand it and your team understands that. And the person who is asking for this also understands what he or she wants to build. This is a challenge. 

Chet Kapoor: You know, I talk about that, and half-jokingly, just for the record, I say it's very hard for you to know everybody's childhood issues, right? Because that's what is acting out. And that's the problem, right? That's what makes human beings unpredictable, right? Because software has no childhood issues, right? But people do. And so that makes it really, really hard. I realized very early on that, you know, as a product person it's about the best products you can deliver, but the people and process part is something that all the customers I have talked to, all the CIOs I have talked to, have said they struggle with. There's no easy recipe for it, right? And it seems like you've had to go through that in your own journey.

Yuri Gubin: Yep, yep. You're right. And something that I learned is that when you think about the organization where you work as your employer and a very formal structure, there will be issues by design. But if you think about the place where you work as a framework, you change your mind. I'm coming from engineering, from software. So if you try to think about it the same way, that there are components, there are functions, there are groups, there are people, there are interfaces, how the organization works, it helps to see it from the same perspective, to look at it through the same lens.

Chet Kapoor: Yeah, no, for sure. No, I love that. I love the, let me try. I know how to solve technical problems. How do I take some of the things I already know and modify them to actually think about how I solve the people problems, right? What I would call, and tell me if I'm wrong, taking a systems approach to solving these kinds of things. Is that a fair way to put it? That's great. All right, so next question. We talked about this right before we got started. DataArt is a consultancy, but a different kind of consultancy, right? Because you bring your best practices and you bring your reusable frameworks and reference architectures to do it. What are your clients, what is the biggest problem they're facing today?

Yuri Gubin: It's a very interesting question. And yes, there are technologies, different hypes and trends. But in a nutshell, under the hood, the real, biggest problem is uncertainty and ambiguity about what will happen three months from now. If we talk about Gen AI, if we talk about cybersecurity, if we talk about regulation, or what will happen with clouds and my data living in a cloud, it's that ambiguity, and organizations are trying to adjust and learn as they go. And everyone is trying to catch up on the technology front, but at the same time, we have to balance budgeting and priorities and what's really important, and try to distinguish between what is the actual value, what can be done, versus what is nice to have, and try to apply some critical thinking to everything that is happening. So this is the real challenge.

Chet Kapoor: And we'll talk about AI in a couple of minutes. You talk about DataArt being a people-first company, right? As you consult and you run a large team to go off and deliver this, what are some of, and now, and let's just say, now with cloud, right? Because AI is still early. With cloud, what are the hardest problems that you've been solving for organizations? Because cloud does actually change the dynamic on how people organize themselves, right?

Yuri Gubin: Yes, of course.

Chet Kapoor: You can include how you solved it for DataArt because that also affects you, right? As well as your customers.

Yuri Gubin: This is exactly right. So one of the challenges is, given all the speed and velocity of cloud and how quickly you can iterate and move there, how to do it in the right way so you don't destroy what works and you don't ruin a good initiative with a very bad implementation. So governance, somehow finding that balance between the velocity, the pace of innovation, and control and checks and governance and regulation, this is one of the challenges. And the other one, I think it's even bigger than the first one, is that many perceive cloud as a given: it just works. It has 99.99% reliability, so why do we need to care about it when there is cloud? No, you need to think about resilience, so you have to design with it in mind.

Chet Kapoor: It's not magical. It doesn't just work by itself. 

Yuri Gubin: Absolutely. 

Chet Kapoor: That's awesome. Did you have to make a shift in how you're organized because of cloud? 

Yuri Gubin: Yes, so we had to, and our first cloud projects date back to 2009 or something. So we have a long history there. We also have our own internal IT and things that we use internally for DataArt. So finding that balance between what, which cloud?

Chet Kapoor: Yeah. 

Yuri Gubin: What is the extent of, are we all in a cloud or there is a hybrid? And now, because we are services and consulting, we live in that environment when we have customers and customer projects. We have our internal projects, our internal IT. How to establish the governance model that handles both scenarios, that we can actually coexist together. How do we control it altogether? So that's one, it was another one. 

Chet Kapoor: That's awesome. Let's talk about AI, right? I'm sure probably 100% of your clients are talking to you about generative AI, right? And you've been very clear. You have a quote that says AI is not merely a tool, but a complex ecosystem. Tell us a little bit more about that.

Yuri Gubin: Yeah, so AI is very accessible now. You can just go and in no time, start talking to a chatbot. You can see all the magic computer. You can create images and video. Now, thinking about the real-life use cases, as a retailer, as an insurance company, as a bank, will your customers pay for a very good poem written by AI? Most likely, no.

Yuri Gubin: Now you need to make use of this power and beauty in your world, in your organization, with your particular problems and challenges and opportunities that will serve your customers. Yes. Fighting that, there is a long path from the model, the API to talk to the model, to making it actually work. So you think about compliance, security and access control and costs and all of this. And don't forget that if a model or system or application doesn't work, people don't know how good it is. So reliability and security, that's why it's an ecosystem. It is not just one model. It is the whole thing, including the strategy and going all the way to the organizational level. You need to think about it seriously because you cannot just think about it as one nice thing: we will implement the chatbot and new revenue will appear. It will not work this way.

Chet Kapoor: No, I agree. I agree. That's a good way to put it. Let's dig a little deeper into it. Most CIOs, even the younger ones, have actually been through waves, right? They have actually been through mobile. They've done cloud, right? I think they understand that this is an ecosystem and not a tool. Would you agree with that? That it's a little bit more complex than saying let's use ChatGPT.

Yuri Gubin: Yes, it is more complex. And I agree with the notion of waves, that there are trends, there are hype cycles. So it is there in the industry.

Chet Kapoor: And so in your experience, where are they getting stuck? 

Yuri Gubin: So by injecting Gen AI into their processes now or into their product, we have to touch different parts of the organization, data and InfoSec. And in some companies, all of these parts live in silos. So there is not enough data. There is not enough, we don't know how to handle AI. So our compliance department, they don't know what data can be shared or what functionality can be used in what areas. And when we deliver the POC, it can be done fast. The fastest POC I remember was two days. Yes. And it is very accessible now, again. So now when we have this POC, we need to think about how to move it forward into production. And now we start thinking about, oh, there are so many different aspects and there are so many different gaps, and the organization might not be ready to address these gaps. So it's.

Chet Kapoor: I love, listen, we have a biased point of view, right? Our biased point of view is, there is no AI without data. You basically said that. There is no AI without data at scale. You said that, right? It's in silos and you have to do it at scale. And so we definitely subscribe to that as people who are in the data-at-scale business, right? We think it absolutely matters. So I think the number one problem, if I can just go through and tell me your perspective on any or all of them. One is, can you get the data together in the right way? Right? The second thing is, it is not the traditional, this is not your father's data, right? This is not the traditional structured data. It is the unstructured PDFs, relevant engine, things like that, insights from your data warehouse, like a bunch of different things. And we think that synthesizing all of that to give smart context to a RAG-based application is really, really useful. So I'll stop there. I have other questions, but would you agree with the statement I just made?

Yuri Gubin: Yes, I agree. Data at scale, quality and availability of data and boundaries between structured, non-structured, basically everything, images, we have to deal with scans of documents, PDFs, unwritten information, the voice itself, for example, interviews, extracting knowledge from interviews, all sorts of sources of data.

Chet Kapoor: Yeah, no, that's for sure. That is for sure. So I'll give you my experience. I think there are two things if I abstract myself beyond technology, right? Because we can both talk about, is RAG going to work? Are there other techniques, right? Is one-shot going to work? Then we can talk about a bunch of, how do you do fine-tuning? When do you do fine-tuning? How often do you want to use large language models? Do you want to use small language models? Do you want to use teacher models? We can get into that discussion, but let's abstract up a little bit. I find there are two big issues that keep enterprises from delivering Gen AI apps. The first one is, I think they're doing exactly what you're saying, which is causing a problem. They're not rethinking the problem with Gen AI. They're just inserting Gen AI into what they're doing already. And I think that's a problem because the developers are not thinking of this as a new thing, right? By the way, did that happen with this as well? It happened with the web as well. People were doing client-server, two-tier stuff, three-tier stuff, right? In the web. And the web was different, right? Because of HTTP and things like that. So I think that's problem number one. The second one is a little bit more concerning because I call it, I don't know exactly, I don't have a catchy phrase yet, but after everything is done, somehow people show the power of the will, it still takes them weeks to put it into production. I mean, like somebody has to flip a switch because there's some kind of fear that this is going to blow up in their face, right? They're almost like stuck, like paralyzed. Would you agree with those two comments?

Yuri Gubin: Yeah, I have seen how Gen AI and models, they are just, again, they're used as a tool, as a replacement for something without rethinking the whole motion, without rethinking the whole system. And with respect to getting stuck, yes. We have seen different situations when, because of the uncertainty on the roadmap or product or vision, even a technology that works, it just takes time because of so many different factors. 

Chet Kapoor: Yeah, no, I agree. And do you think that your, do you think the customers that you work with will get more used to putting something in production or do you think that it will take a while? 

Yuri Gubin: So this is a very interesting question. I have talked to customers who actually think that right now this landscape is too hot. Regulation will catch up. Technology will evolve. Bigger players will consolidate and consume smaller players. Out of 1000 startups, only 12 will survive. And some of the customers, they prefer to actually be more mindful and do something pragmatic and tactical now, waiting for the market to cool down a little bit. The other side prefers to move a little bit faster, not afraid to invest and then throw it away and do something else. But only on one condition: it should be lightweight. It should not be a 12-month engagement and a large implementation project. Because of the uncertainty, we have to move very, very quickly, fail fast. If it works, it works. If it doesn't, it doesn't. So there are kind of two buckets, two ways to think about it.

Chet Kapoor: I think I have a slightly different view. I think just to make it a little bit more controversial for our listeners, I don't believe, I actually believe it seems like there'll be consolidation. And yes, over a five-year or eight-year period, there's always only 10 people that survive, right? But I think it is becoming very clear. You know, I'm having conversations with Jensen at NVIDIA, with all the different people. And it's becoming very clear that the open source ecosystem will be much more in play than it ever has been. I don't think this is just going to be four or five companies, right? And I think that's how they're going to build in optionality, not by going with a large vendor, but by saying, let me go off and use open source. Would you agree with that or would you disagree?

Yuri Gubin: It's just different way of thinking about this. I have an opinion, and hear me out, that with time, the models will be so perfect, 80% of customers will not notice a difference. Even developers. And it might be the case that 12 months, 24 months from now, only the group of researchers will actually understand all of the metrics, how they evaluate these LLMs and how they compare them with each other. The rest will be, there will be standards, there will be expectations, and pretty much you replace one with another and the product, the final product will work. And I also think that it might be the case that if you think about relational databases, every database, every data product has an indexing functionality, how to search data. But if you look at it as if you were like 40 years ago, implementation of the index, it's a science. You have to code it. It's an algorithm. It's a very difficult technique to implement the right index. So it might be the case that actually three years from now, five years from now, LLMs and generative AI will become a feature of the data product. And you would expect from a mature data product to have indexes, yes, to be horizontally scalable, yes, and to have Gen AI built in. So there will be open source implementations, yes, of course, but it will be the expectations that the data platform, the product you're using, it will have built-in functionality around Gen AI, around AI, because data will be so glued together with artificial intelligence. 

Chet Kapoor: No, I like that. I like that. We certainly feel that's our point of view on this because I think Gen AI is gonna show up everywhere. And I'm actually a firm believer in the OSS ecosystem on this, right? And I think the stuff that Meta is doing is just awesome. Yuri, this has been an awesome conversation. Let's go to the final stage of this, which is a rapid fire. I will ask you a bunch of questions. Give me responses as quickly as you can. So, I know you enjoy traveling and reading. What's one place everyone should travel to?

Yuri Gubin: I think it's New York City because of the depth and breadth of it.

Chet Kapoor: Yeah, I would agree. What's one book we should all read? 

Yuri Gubin: It's The Iceberg is Melting. It's about change management by John Kotter, as far as I remember. 

Chet Kapoor: Wow, that's awesome. I've not read that book, so I'll have to pick it up. What's the coolest Gen AI use case you've seen so far? 

Yuri Gubin: Oh yeah, when your CEO talks to you in Arabic about things that he never thought about. When you see it for real, it is fantastic, fascinating, and terrifying.

Chet Kapoor: Yeah, that is cool. What's the one thing you want AI to automate most in your life?

Yuri Gubin: Cooking.

Chet Kapoor: It's funny, I ask this question of many guests and everybody says either cooking or folding laundry. It seems like that's the consistent answer. We have more people saying fold the laundry, but cooking shows up as well. What global challenge do you hope humans and AI can solve together?

Yuri Gubin: I think it's cancer and continuous ongoing monitoring and screening.

Chet Kapoor: I would agree. As someone whose mother passed away because of it, I'm in 100% agreement with that answer.

Yuri Gubin: Sorry to hear that.

Chet Kapoor: Yuri, this has been awesome. Thank you very, very much for your time. We really appreciate it. I think as two people who've been in the AI space and are actually actively in it before and after this recording, I think our listeners are going to love hearing how enterprises are thinking about this, as well as about your great background. I really appreciate the time and I'm sure we'll figure out a way to have you back.

Yuri Gubin: Thank you. Thank you very much.

Chet Kapoor: Thank you.