There are thousands of data businesses across industries, across a complex value chain. Some companies advertise that they curate data better than others, some advertise that they have exclusive access to X data set, some advertise that they offer the best analytics, and some advertise that they are the easiest to use.
The vast majority of these businesses do not go on to become large successes — even if they execute well and build a great data set — because they do not have a defensible moat.
This post walks through the different moats that exist in data businesses — and how a few dozen companies found defensibility and became $1+ billion data businesses over time.
Before jumping into what moats exist, I’ll outline some features that companies focus on that are not defensible:
All of these may improve the customer’s experience, and they may be very useful features — but they are not defensible enough to create a large business around (despite hundreds of companies trying to use these to differentiate). Ideally, companies may use some of these features to smooth their sales cycles on their paths to establishing a moat, but these are not sufficient themselves as they are easily replicable.
Almost every data business that is a $1+ billion enterprise has a data currency aspect to its model. A data currency is used by two (or more) parties that rely on a particular data set to complete a transaction; meanwhile the data company that controls the currency takes a tax on its use.
Once established, these businesses have massive network effects: the existence of the currency unlocks a large number of transactions, and it is difficult for any company to stop using the currency, because that’s what everyone else is using.
Examples of $1+ billion companies that have become data currencies:
If a company can become a currency, this is by far the most defensible business model in data — as it creates a sticky, multi-sided network. As a result, many of the biggest companies in data have this as a foundational component of their business model.
Another approach that can create large data businesses is navigating the long-tail of relationships. In these business models, a company focuses on a large investment in making thousands of partnerships and integrating data across them, and bets that it does not make economic sense for a second company to attempt to replicate that work once they have become an incumbent.
The primary risk in this business model: if a large portion of value can be captured with just a handful of relationships, the defensibility only applies to the value of the long-tail portion. Long-tail aggregators should assume that they’ll be disintermediated at the top of the market, and need to ensure there’s still a big enough prize if their moat only protects the long tail.
Some examples of multi-billion dollar companies built on this moat:
Many companies aim to have proprietary, differentiated data. While this can work as a short-term strategy, it is usually difficult to build a moat around. Typically there are substitutes to the data that exist, so gaining enough proprietary data that another aggregator can’t do the same is difficult.
This section will walk through various models of building a proprietary data set that can be defensible, and their pitfalls.
One approach is for a data aggregator to create exclusive relationships with data originators. This is difficult: if the aggregator proves that the data is valuable, the data sources generally try to capture the value (rather than allowing the aggregator to do so); as a result, it works best in industries without a single dominant data source where no individual party has enough market power to extract concessions from the aggregator.
There are a few examples of this:
Another approach to proprietary data is “give-to-get” models: companies have to share data, and in return they get access to data that has been shared with them. There are a number of industry-specific benchmarking tools and “data coops” that follow this model, and a handful of $1+ billion companies:
Another approach is proprietary data creation. This is also a risky model — as other companies can often create similar data assets and make similar investments — but there are several examples of proprietary data assets that have stood the test of time:
“Exhaust data” is data that comes from another line of business. While it is a near-impossible strategy to build from scratch, the largest data companies in the world generally employ this strategy.
***

Data businesses are fiercely competitive, both with other data aggregators, but also with the other players in the value chain (such as the data sources), so moats are foundational to building a large data business. In fact, many of the best data businesses employ multiple of these moats and have several layers of defensibility.
In the decade ahead, there will be dozens more $1+ billion data businesses created, and thousands more created that won’t hit that level of success. The ones that succeed won’t just be the companies with the most comprehensive data or the most curation, but they’ll also require thoughtful strategies designed around strong network effects.