Robin Lehmann: Data Unions – A radical new data ownership model

Robin is the CEO & CTO of DataUnion, a startup which is revolutionising Data Ownership.

We explore the opportunities that exist with this new data ownership model and touch on some of the work Robin and his team have done with DataUnion.


The following is a rough transcript which has not been revised by Ocean Missions. Please check with us before using any quotations from this transcript. Thank you.

[00:00:00] Scott: Thank you for joining us today. I’m going to be speaking with Robin, who is the CTO and CEO of DataUnion. Robin is one of the bedrocks of the OceanDAO community and one of the first people I spoke with when I joined OceanDAO. We’re going to be learning a bit more about data unions and his organization, DataUnion.

So welcome to the podcast, Robin.

[00:00:29] Robin: Thank you, Scott, for having me.

[00:00:31] Scott: My pleasure. I wanted to start off the conversation with you telling us a little bit about yourself. And could you also explain to listeners who may be hearing the term data union for the very first time what on earth a data union actually is?

[00:00:54] Robin: All right. Yeah. So, about myself: I’m a computer scientist by trade, and I worked in the automotive industry for a long time. I went into machine learning around 11 years ago, when the first online courses that led to Coursera happened. That’s where I taught myself machine learning. I have never stopped loving it since then, and I continued in the automotive industry, going from developer to program manager, to learn more about it: how to teach, how to organize projects, and how to make them successful.

My last job was at Rolls-Royce as AI program manager, overseeing the development of all AI projects in a certain part of the company. On the Web3 side, I joined around 2017 and built my career there, going from investor, to SingularityNET for one year, then head ambassador of Ocean Protocol, kind of since the Ocean Protocol ICO happened.

Then I got involved in the community. That was around 2018, and I never looked back.

So, yeah, I learned over time what the benefits of Ocean Protocol are, and I’m kind of hooked. That also brings us to the topic of data unions. Data unions, from my point of view, are a way to bring people or entities together to share their data as a shared asset.

On Ocean Protocol you have the opportunity to list assets on the marketplace. That is obviously about one entity. If I want to list my data, I can go to the market now, upload my data to some server, connect that to the marketplace, and offer my data assets. In a way that’s good, you know; some entities want to do that.

I might have a data set that I want to share this way too. But from my personal perspective, the power of data comes from collaboration, so that several entities can combine their data and then offer it as one data set. And especially with machine learning, which sits on top of it and which I see as one of the main use cases of data, the power comes from a lot of data.

A lot of data that is combined, unified, harmonized, and ready to be consumed for AI. That’s our definition of data unions in our project: bringing entities together for data collaborations, and especially for AI.

[00:03:15] Scott: You mentioned a couple of key concepts and ideas that are very much ingrained in, and important to, the idea of a data union and also Ocean Protocol. The first is the idea of a data asset, and that’s a foundational block: you have this data that is ownable and tradable through Ocean Protocol. And as that becomes a tradable, shareable, and consumable asset, you know, just like a building, which has an asset owner, you can also have buildings which have multiple owners. The idea there is that a data union is essentially the technological infrastructure that enables these multiple owners to all share assets

inside of the data union, and presumably also use those incentive mechanisms to help reward people who come through and add value to the data asset by cleaning it, promoting it, sharing it, finding buyers for it, and all those sorts of things.

I just wanted to touch on that. Maybe you could share a bit of an overview of version one of DataUnion. I’m not entirely sure if this is version one, but the version I’m talking about is the one from when I joined Ocean: you were working on a version which was categorizing and classifying images.

I was just wondering if you could talk a little bit about that first iteration of the project, just to help paint a bit of a picture for the listeners.

[00:05:05] Robin: Planet computer vision is the one that you are talking about, the one about image enhancement, and it’s also our prototype data union.

So for Planet computer vision, we started with our web app, which we call the mentors. It’s a web app where you can upload images. You can then annotate the images, basically telling where something is in the image and what is in the image, and then there is a third layer on top of that, where people can see what other people claimed about images

and then actually verify this. By doing that, we created our machine learning data pipeline. If this scales up, we can have thousands of images, or millions, or tens of millions of images coming in, all being part of a larger data set that is then offered on Ocean Protocol.

The important thing is that everybody uses MetaMask in this prototype, and that’s their identity. Everything that they do is recorded to this wallet. And then if later the data set, or an algorithm trained on top of the data set, is getting sold, there’s a distribution of the reward,

so of the value being created. I’m always explaining this to my UI/UX engineer: we are building, like, trainers that create brains. These are the models that extract the knowledge out of a part of the data union’s data, or a part of the images, and then put it into the robot.

The robots are then the inference algorithms that later will be connected to, for example, mobile apps or web apps. Images come into these apps and are looked at by these robots, which are powered by brains created out of the data union.

What we have also learned from our experience with the web app is that a web app is not scalable in terms of blockchain technology. The problem is the wallets and MetaMask: people are not really interested in doing it this way. They want to have something that abstracts from that.

That’s why we are also focusing heavily now on a new version of our mobile app, which basically abstracts from all of this. There we will have missions: you are asked to do certain tasks, either finding or hunting data, or verifying data, or challenging it. That’s all connected to images, as this is, from our point of view, the easiest medium.

And then you get rewarded for that. You get, say, 10 shares in the data set, and you can also get direct rewards. You’ll gain experience, leveling yourself up, so it’s more of a play-to-earn experience that we have been building.

The missions are then also connected to our data portal, where people can actively ask for certain kinds of data or for enhancements. So we’re closing the loop between the training and verification of algorithms and the people who are actually using the mobile app to become a part of all of this.

Yeah. So that’s our prototype data union.
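
[Editor’s sketch] The reward flow Robin describes (actions recorded against a wallet, shares earned per action, sale proceeds split among contributors) can be illustrated roughly as follows. This is a minimal sketch and not DataUnion’s actual implementation: the share weights, wallet names, and function names are all assumptions made for illustration, and nothing here touches a blockchain or Ocean Protocol.

```python
from collections import defaultdict

# Hypothetical share weights per action; the transcript gives no real values.
ACTION_SHARES = {"upload": 3, "annotate": 2, "verify": 1}


def tally_shares(actions):
    """Sum the shares earned by each wallet from (wallet, action) records."""
    shares = defaultdict(int)
    for wallet, action in actions:
        shares[wallet] += ACTION_SHARES[action]
    return dict(shares)


def distribute(revenue_units, shares):
    """Split an integer revenue amount pro rata by shares.

    Uses the largest-remainder method so the payouts always sum
    exactly to revenue_units (no units are lost to rounding).
    """
    total = sum(shares.values())
    if total == 0:
        return {}
    payouts = {w: revenue_units * s // total for w, s in shares.items()}
    leftover = revenue_units - sum(payouts.values())
    # Hand out leftover units to the largest fractional remainders first.
    by_remainder = sorted(
        shares, key=lambda w: (-(revenue_units * shares[w] % total), w)
    )
    for wallet in by_remainder[:leftover]:
        payouts[wallet] += 1
    return payouts


# Hypothetical activity log keyed by wallet address.
actions = [
    ("0xAlice", "upload"),
    ("0xAlice", "annotate"),
    ("0xBob", "verify"),
    ("0xBob", "verify"),
    ("0xCarol", "annotate"),
]
shares = tally_shares(actions)
payouts = distribute(1000, shares)
print(shares)
print(payouts)
```

Working in integer units avoids floating-point drift when money is split; a real system would also need to decide how verification outcomes and challenges change a contributor’s shares, which the conversation leaves open.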

[00:08:14] Scott: Yeah, very interesting. I like that the project started off with a specific use case and then saw that there was a wider need for data unions more broadly.

And then also the number of layers that can come in on top of just the core concept of a data union, leveraging this distributed technology from the tokenized ownership and rewards perspective. And then how that also

actually opens up this idea of data businesses. I don’t really have the language for this, but I have been thinking a lot recently about this idea of data assets becoming almost like data businesses themselves. That’s arguably what you’re working with there, where, you know, there is obviously the raw data, the images; there is the classification side; there are the algorithms on top of it;

there are the rewards and incentive programs to drive certain behaviors towards improving classification, or whatever that might be.

So, yeah, it’s a very interesting mental model when you start pulling all these pieces together, and quite fascinating, really, that all of this can be spawned from what is ultimately a relatively simple-sounding idea: tokenizing data assets. I’m sure it’s not simple to engineer, but it leads me to another question.

This concept of data unions: how did you actually become interested in the idea of data unions?

[00:10:00] Robin: So, yeah, the initial idea for me was that I was thinking about how to clean our planet of trash. That was a major point for me that I wanted to find a system for. I played around with Niantic software, so Pokémon Go and these kinds of games.

And I realized, okay, you can create an augmented reality around this planet. Then the idea was: how can we incentivize people with crypto tokens to pick up the trash and remove it, and then also create AI around it, to classify things, to find things, and to make a map around the world?

And then I realized it’s going to be very difficult to prove that the trash has actually been removed. But the idea that you could have people working together on collecting the data and then creating algorithms out of it, that kind of stuck. I was visiting Ocean Protocol at the time, and it was in 2020.

I was spitballing with a lot of people, Jamie from Outlier Ventures back then, the Ocean team, and so on. And I realized it’s going to be difficult to have the proof of trash removal, but it’s going to be relatively easy to bring people together to do the data collection part together.

So that was one part. The other part was that I was working in the automotive industry and we had a lot of data. We were building data sets for $20 million and using them one time. They were sitting on a server, never to be used again. That’s a little bit like what Ocean was also built for. But then I realized there was a company just two houses down

working on something similar, and we were connected to many others, because I knew the engineers from the other companies. And I realized we were never really going to work with our data together. That’s more of an industrial data union case: there’s so much value sitting there, and the companies are paying for the service.

We have to make this into a union together. And then, as the third component, I started with the Ocean marketplace, like many other people did when it came out, and saw how individuals were putting up their data sets. A lot of rug pulls happened, a lot of distrust, and then all of this kind of wound down a bit.

And that’s when I kind of came to claim the ground to make this image data union. So, yeah, it was basically the realization that it’s not going to be an individual who can vouch for that. An organization like the union has so many people at work that it will not really rug anything.

So you have much more trust in that than in individuals. There are also many more people behind it, so it’s basically directly a community. And this led me to the realization to do data governance. I know now there’s much more around it, like Streamr did something, and there are many other projects.

I was not really aware of this at the time. It came from my own learnings and my own experience to do something in this direction. And I also called it DataUnion, which is kind of the broader term for all of this. We stuck with this term to make a statement. So that’s kind of the history: from some ideas to make the world a better place,

it turned into a project that, you know, we are working on now.

[00:13:33] Scott: Awesome. The idea around the trash is a very interesting one. I recently heard about a program running in Nigeria where, basically, kids collect trash and then use that to generate money to pay for school.

I believe it’s been relatively successful, so thinking about how to leverage incentive mechanisms in various ways to solve some of those more difficult problems is definitely an area that I’m personally very interested in with regard to all of these new technologies that are coming online.

I was wondering if you could tell us a bit about some of the partners that you’re working with today that are leveraging the work that you are doing through DataUnion. I know there are a few, I believe, in healthcare, and some others maybe around wearables and things like that. If you want to just choose one of those as an example to illustrate

some of the work that is happening now that DataUnion has matured more into this kind of macro service provider around data unions, I think it would give a really good understanding of how all of these pieces are coming together.

[00:14:46] Robin: So one example that I would like to touch on is that we’re working with UNICEF, or one of its organizations, called Yoma, and the Yoma youth. There it’s about young people in developing and emerging countries who want to make an income. It’s about flying drones to monitor how trees are growing for carbon emission verification, so that you can sell the carbon credits, or, like the example you just gave, about

young children collecting litter to go to school. These are all their projects. We are now working together with them to enable them to have this income sharing, to have the data being used for algorithms, and to use the Ocean Protocol technology underneath it.

So that’s a very important part. We also help them connect with data sharing and data enhancement, and with different partners, to unite them around one datatoken, one data asset, which can then share the value that is being created amongst all of these different parties.

So that’s one interesting example, and there are so many other ones. But the thing that generally holds across all of them is this: they want to use Ocean Protocol, and they want to have the sharing economy around data amongst people and amongst companies, but they are not necessarily technology people.

They have knowledge around their domain. They know a lot about that. They have a good network, lots of other people who would be interested in using the services that can come out of the data and who can bring in the data. But the whole technology part of it is just too abstract: blockchain technology, machine learning, all of this is too much.

So we are very much trying to abstract from that. Our optimal use case is a single person, a domain expert, coming to us and getting a mobile app, the backend, and everything running for them: the data set being listed, and the data set being usable for machine learning.

[00:16:59] Scott: That follows on to my next question relatively nicely.

If there was a recipe for a successful data union, what are some of the things that we might see on that recipe?

[00:17:15] Robin: Yeah. So we are still experimenting, so I cannot give a definite answer to that. But what I think is important is that abstracting from the blockchain technology is an important piece, so that people can really contribute to the data union without having to care about that. This also involves making the rewards they get very easy to handle. If it’s a datatoken

that is the reward, it’s going to be very difficult for somebody to understand the value. So that’s a key piece of learning: abstracting from this and giving them options. For example, if you upload this many images, you can plant a tree, or you can donate to a charity, or you can be paid out directly in your local currency. This is one key learning.

The next key learning is that all of this machine learning and the like around it is also very complicated. They want to have this as a service: basically, they bring the data, but they don’t have to care too much about how it’s then going to be used. On the other hand, that is also very important to make this a successful thing.

Just offering data, from our point of view, is not enough. It has to be used for insights, and these insights also have to be sold.

But today, I would say it’s still an experiment. We don’t have a data union yet in the whole blockchain space which is actually, from my point of view, very successful. So there are still a lot of learnings to be made.

[00:18:49] Scott: Yeah. Well, I mean, I like the fact that you are driving what can be viewed as quite a highly technical and very abstract product like Ocean Protocol,

and then looking to cut through all of that to deliver something unique and of value to end users. I can imagine, one, how difficult that journey will be, and how many interesting challenges will come up and need to be overcome, but equally how valuable

it’ll be once it does reach that level of maturity and scale. So why don’t we fast forward for a moment and imagine that we have gotten to the point where all of the software has been built and all the mechanisms have been configured in such a way that we can start to produce and create

many, many different data unions. What are some of the use cases that personally excite you?

[00:19:59] Robin: So, yeah, for me, one of the use cases that I really like is this: let’s say you have restaurants in a city and they want to optimize the way that they order food, to optimize the amount of food that they buy.

Then they can come together as a data union and create an algorithm together on top of their data. They collect all the data that is produced by the restaurants, all the weather data, all the tourism data, and unite it in a data union to create an algorithm that helps each of them predict: How much should I buy? How many customers will I have? What will I sell

tomorrow? Or what will I sell next week? So it’s this kind of organization where businesses that are already collecting a lot of data today can come together and create an algorithm to make their business much better.

That’s something I see as a really important use case. Another one that I really like: now Tesla is coming up with a robot. The robot will have to learn a lot about the world. Here we have two sides. One is that they are partnering with large enterprises, which are going to collect data in the Web 2.0 style by harvesting from individuals all the data that is there.

And then the enterprises are going to take all the value from that. The other side is that we can connect with Tesla to make a data union around this, and everybody maybe sets up a camera system similar to how the robot works, collects data of their daily activities, and starts contributing that to a data union.

Then the algorithms are created out of that. And these robots, which are probably going to be so prevalent in the future in our everyday life, are then powered by the algorithms that are created by data unions. All the value from that is again coming back to the people that are actually contributing: finding the quirks, challenging the algorithms, making them much better.

So these are two potential futures that I see: the one where the enterprises continue to collect the data, harvest it, and create algorithms out of it, and the other one where people take over this process as data unions. Companies might also join.

Data unions do not have to be just individuals. But being able to capture the value of this AI is something that I think is important for a better future for us as humanity. If the enterprises can do that, they will. They will try for sure. So it’s up to us to decide whether we want to do something against it, or benefit from it by becoming a member of such a data union.

And that’s where I think the future is heading: whether we can decide to unite as the owners of the AI and the data, or whether we will still be giving it away in the future. This is a very interesting topic, and I’m trying to figure out how we can advocate this to people, so that they understand it’s so important to become part of a data union instead of being the data slave, basically, for all the enterprises. Because in the end, the enterprises are very efficient; they are very good at taking your data.

I mean, Web 2.0 has shown how well they can do that. Hopefully we can overcome this. That’s why we are also drastically trying to expand outside of the crypto space, with mobile apps and so on, to bring this to everyday people.

So that’s where I see things heading in the future. Lots of difficult decisions and lots of advertising. But in the end, I’m quite certain that the concept of being able to systematically take advantage of the data and enhance things through collaboration is just going to convince everybody.

That’s my approach.
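
[Editor’s sketch] The restaurant example Robin gives earlier in this answer can be made concrete with a toy model. The sketch below assumes each member contributes daily records of weather and customer counts; it learns a per-restaurant baseline and a weather multiplier pooled across all members, so a restaurant can get an estimate for conditions it has never recorded itself. The restaurant names, numbers, and function names are invented for illustration, and a real data union would presumably use far richer features (tourism data, weekdays, menus) and a proper forecasting model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pooled records: (restaurant, weather, customers served that day).
pooled = [
    ("bistro", "sunny", 120),
    ("bistro", "rainy", 80),
    ("cafe", "sunny", 60),
    ("cafe", "rainy", 50),
    ("grill", "sunny", 90),  # the grill has never logged a rainy day
]


def fit(records):
    """Learn each restaurant's average size and a shared per-weather multiplier."""
    by_restaurant = defaultdict(list)
    for name, _, customers in records:
        by_restaurant[name].append(customers)
    baseline = {name: mean(vals) for name, vals in by_restaurant.items()}

    # Ratio of each day to its restaurant's baseline, pooled across all members.
    ratios = defaultdict(list)
    for name, weather, customers in records:
        ratios[weather].append(customers / baseline[name])
    multiplier = {w: mean(r) for w, r in ratios.items()}
    return baseline, multiplier


def predict(model, restaurant, weather):
    """Estimate customers; unseen weather falls back to the plain baseline."""
    baseline, multiplier = model
    return baseline[restaurant] * multiplier.get(weather, 1.0)


model = fit(pooled)
print(round(predict(model, "grill", "rainy"), 1))
```

The point of the design is the one Robin makes: the grill alone has no rainy-day data, but the union’s pooled multiplier lets it borrow what the bistro and the cafe have already observed.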

[00:24:14] Scott: Yeah. And it’s a concept that strikes to the heart of the whole decentralization movement, really, doesn’t it? I’m just looking here at the Helium network’s website. They call themselves a people-powered network, where they’re building out a wireless network all around the world, and they’ve deployed over half a million low-band Wi-Fi routers around the world.

So that is an example where the same sort of concept, the same approach, is being implemented in the physical world and is leading to real-world physical outcomes: over half a million of these devices placed all around the world. You can actually go on and see a map of the world and where they’ve set them up.

So that’s for low-band Wi-Fi. This is basically applying those same principles to data, data creation, and data ownership. And, you know, this conversation is being converted into data as we speak. Everything around us, every day, we’re creating more data. So I believe we’re very much at a fork in the road.

The centralized model has well and truly got a head start on us in this. But we are now slowly approaching the point where the alternative option is actually becoming available. And so identifying how that’s going to play out, and where we can start to leverage the ownership and control, and the shared ownership and shared rewards, of all of this data that’s going on around us,

is definitely going to be a very interesting thing to see. Hey, thanks so much for taking the time to come and chat to us about the work that you’re doing and about this concept of data unions more broadly. I suspect it’s an idea which would be new to many

people coming from outside of the Ocean ecosystem, though it may be more familiar to those who know Ocean. If people are interested in learning more about DataUnion, where can they go?

[00:26:42] Robin: So, yeah, let me hold the question for one moment. We’re actually working with Helium now also on a project,

together with some other projects from the Ocean ecosystem. This half a million devices that you’re talking about: we’re also working on making a data union for these devices. I just got reminded when you talked about it. This is a thing that we are now also integrating into a nature data union.

On the weekend there’s an upcoming hackathon from the Sovereign Nature Initiative. They call it the nature data union: making all of the nature data collectible and accessible in one data union. I forgot about this use case before. That’s also very exciting, and that’s where we want to make it possible that everybody can bring in

their own data setup. If you have a temperature sensor at home, or you’re collecting images of trees, all of this can be in one data union too, so we have a sustainability data union around the world. But yeah, if you want to connect with us, we have social media channels all over the place.

We have a Telegram channel, we have a Discord, we have a Twitter channel, and we have our Planet computer vision data union. So check those out, and you will be updated in our social media channels about upcoming product releases. We also have our data set on the Ocean marketplace, if you’re interested in correcting something.

We are also going to make some progress in that direction, and with the new V4 there are going to be some changes there. And basically, we are very much interested in growing our community at the moment. It’s a bit of a difficult thing to do. We are looking basically for new data

unions. These data unions need communities of contributors. So, from the DataUnion Foundation: if you like the idea, then please join us in these channels and connect with us. Let us know if you have ideas for your own data unions. We are more or less very open to connecting with people who have ideas and to helping them do that.

In the future, we also want to launch our own token and then have a grants program of our own, so that we can help people build these out. I’m very accessible on Discord, on Telegram, on Twitter, wherever you want. Come talk to me if you like the idea, or if you have some ideas.

And join our community on whichever channel is your preferred choice. I’m looking forward to getting to know more people from the Ocean ecosystem, and also from outside of it, to connect with more people, and to bring in more ideas and more people who help us build all of these ideas and make them successful.


[00:29:26] Scott: Thanks so much, Robin. We will provide links to all of those channels in the show notes as well.