There are three foundational bottlenecks to developing AI: algorithms, computational power, and data. While the first two have robust markets around them, the process of obtaining data for training AI is currently a wild west, suboptimal for both the owners of data and content and the developers of AI. Shaper Capital is thrilled to launch Protege to address this, alongside Bobby Samuels, Richard Ho, Ray Shi, and Engy Ziedan.

***

The process of getting the right data for training purposes currently ranges from arduous to impossible. Early-stage AI companies sometimes spend millions of dollars and years negotiating access to the right dataset; larger LLM companies seek every piece of rich data they can find yet often lack access to proprietary datasets. Meanwhile, data providers (ranging from Reddit to textbook companies to hospital systems) want to license data but don’t know where to start and are rightly concerned about the privacy, security, and IP implications of letting companies build models on top of their data. So data remains illiquid, even though there are eager buyers and sellers.

This isn’t the right industrial organization. The market needs a platform to connect data buyers and sellers in a way that puts control into the hands of data sources and helps them manage the privacy, security, and downstream usage of data. This would jumpstart the AI economy, opening up opportunities that previously weren’t possible and dramatically lowering the cost (in both dollars and time) of building AI. The winners will be both the data buyers building the models and the data holders.

To solve this problem, I’m excited to announce Protege. Protege will be the data layer helping to unlock private training data sets for AI.

We believe this layer will be a critical part of the AI stack. Every organization building an AI application or foundational model needs to look externally for data. This problem is not specific to any one industry. From healthcare to agriculture to marketing to finance, similar dynamics exist. The solution needed should span all industries.

A core part of Protege’s ethos is a belief in source-centricity. Data sources have deep concerns about their data being misused by AI models, ranging from privacy and security violations to unauthorized use of their IP by derivative models. We see our role as ensuring sources have complete control of their data and have confidence in the safety of how it’s used. We began with a privacy and security review before we wrote a single line of code.

AI today is where the internet was in 1995, and we expect orders of magnitude of growth in the coming years and decades. A data layer will be one of the critical parts of the infrastructure for the coming boom, and the winner will be a massive, incredibly impactful company.

In the coming weeks, we will announce our first vertical, and we will rapidly scale to more verticals shortly after. In the meantime, we are hiring engineers, superstar generalists, and GMs for new verticals. If it sounds like a fit, we’d love to hear from you at withprotege.ai/careers.

Travis May is the Founder and CEO of Shaper Capital, a company dedicated to building businesses that solve data fragmentation across industries. 

Travis has a proven track record as a serial entrepreneur, having previously led the two biggest data exits of the last 20 years as co-founder and CEO of both LiveRamp and Datavant. LiveRamp, which pioneered data onboarding, is now a publicly traded company (NYSE:RAMP); he scaled it to over $200 million in revenue. Travis then founded Datavant, which became the leading platform for healthcare data interoperability. Under his leadership, Datavant merged with Ciox Health in a $7 billion transaction, creating the largest health data ecosystem in the United States.

Travis graduated magna cum laude and Phi Beta Kappa from Harvard University with degrees in economics and mathematics. He has been recognized on Forbes’ “30 Under 30” list and AdAge’s “40 Under 40” list for his impact in technology and business. Travis lives in North Carolina with his family and is focused on building the next generation of world-changing companies.