AI: Decentralized Compute and Data in 2025

On Data:

In the nearly two decades since Clive Humby coined the phrase “data is the new oil,” companies have taken robust measures to hoard and monetize user data. Users have woken up to the reality that their data is the foundation upon which these multi-billion-dollar companies are built, yet they have little control over how their data is leveraged and little exposure to the profit it helps generate. The acceleration of powerful AI models makes this tension even more existential. If combating user exploitation is one part of the data opportunity, the other is solving for data supply shortfalls as ever larger and better models drain the easily accessible oilfields of public internet data and demand new sources.

On the first question, how we can use decentralized infrastructure to transfer the power of data back to its point of origin (users), the design space is vast and requires novel solutions across a range of areas. Some of the most pressing include: where data is stored and how we preserve privacy (during storage, transfer, and compute); how we objectively benchmark, filter, and value data quality; what mechanisms we use for attribution and monetization (especially when tying value back to its source post-inference); and what orchestration or data retrieval systems we use in a diverse model ecosystem.
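
To make the attribution and monetization piece slightly more concrete, here is a minimal, hypothetical sketch in Python. Every name and structure below is invented for illustration; it simply shows one way inference revenue could be routed back to data contributors according to attribution weights computed elsewhere.

```python
from dataclasses import dataclass

@dataclass
class DataContribution:
    contributor: str           # identifier for the data's point of origin (e.g. a wallet address)
    attribution_weight: float  # share of an inference's value attributed to this contributor's data

def split_inference_revenue(revenue: float,
                            contributions: list[DataContribution]) -> dict[str, float]:
    """Distribute revenue from a single inference back to data contributors,
    proportional to their (separately computed) attribution weights."""
    total = sum(c.attribution_weight for c in contributions)
    if total == 0:
        return {}
    return {c.contributor: revenue * c.attribution_weight / total for c in contributions}

# Example: an inference earned 0.10 units of revenue; two contributors supplied the data.
payouts = split_inference_revenue(
    0.10,
    [DataContribution("0xalice", 0.7), DataContribution("0xbob", 0.3)],
)
print(payouts)  # ~{'0xalice': 0.07, '0xbob': 0.03}
```

The hard part, of course, is producing the attribution weights themselves and doing so post-inference at reasonable cost; the split above is the easy, mechanical tail end of that pipeline.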

On the second question of solving supply constraints, it's not just about trying to replicate Scale AI with tokens, but about understanding where we can have an edge given technical tailwinds and how we can build thoughtful solutions with a competitive advantage, be it around scale, quality, or better incentive (and filtering) mechanisms that originate a higher-value data product. Especially as much of the demand side continues to come from web2 AI, it's important to think through how we bridge smart contract-enforced mechanisms with conventional SLAs and instruments.
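
As one way to picture that bridge, here is a hypothetical Python sketch (all names and figures invented) of an escrow that only releases payment for a labeled data batch once an agreed quality threshold, the kind of term a web2 buyer would normally put in an SLA, has been met by an audit.

```python
from dataclasses import dataclass

@dataclass
class DataSLA:
    min_accuracy: float  # quality bar a web2 buyer would normally encode in an SLA
    payment: float       # escrowed amount released on fulfillment

def settle(sla: DataSLA, measured_accuracy: float) -> float:
    """Mirror a smart contract's settlement path: release the full payment if the
    audited quality metric clears the SLA threshold, otherwise refund the buyer."""
    if measured_accuracy >= sla.min_accuracy:
        return sla.payment  # released to the data suppliers
    return 0.0              # escrow returned to the buyer

# A labeled batch audited at 96% accuracy against a 95% SLA releases the full payment.
print(settle(DataSLA(min_accuracy=0.95, payment=1_000.0), measured_accuracy=0.96))
```

In practice the interesting design questions sit around the audit itself (who measures quality, how it's verified) and around partial fulfillment, rather than the binary settlement shown here.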

On Compute:

If data is one fundamental building block in the development and deployment of AI, compute is the other. The legacy paradigm of large data centers with unique access to sites, energy, and hardware has largely defined the trajectory of deep learning and AI over the last few years, but physical constraints alongside open-source developments are starting to challenge this dynamic.

v1 of compute in decentralized AI looked like a replica of web2 GPU clouds with no real edge on supply (hardware or data centers) and minimal organic demand.

In v2, we're beginning to see some remarkable teams build proper tech stacks over heterogeneous supplies of high-performance compute (HPC), with competencies around orchestration, routing, and pricing, along with additional proprietary features designed to attract demand and combat margin compression, especially on the inference side.
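
As a rough illustration of what orchestration, routing, and pricing over heterogeneous supply can mean in practice, the following hypothetical Python sketch (all provider data invented) routes an inference job to the cheapest node that satisfies its hardware and latency requirements.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComputeNode:
    name: str
    gpu: str
    vram_gb: int
    price_per_hour: float  # rate the network quotes for this node
    latency_ms: float      # measured round trip to the requester

def route_job(nodes: list[ComputeNode],
              min_vram_gb: int,
              max_latency_ms: float) -> Optional[ComputeNode]:
    """Route to the cheapest node that satisfies the job's VRAM and latency needs."""
    eligible = [n for n in nodes
                if n.vram_gb >= min_vram_gb and n.latency_ms <= max_latency_ms]
    return min(eligible, key=lambda n: n.price_per_hour) if eligible else None

supply = [
    ComputeNode("dc-helsinki", "A100", 80, price_per_hour=1.60, latency_ms=120),
    ComputeNode("prosumer-austin", "RTX 4090", 24, price_per_hour=0.45, latency_ms=40),
    ComputeNode("dc-singapore", "H100", 80, price_per_hour=2.10, latency_ms=220),
]
print(route_job(supply, min_vram_gb=40, max_latency_ms=200))  # -> the Helsinki A100 node
```

Real networks layer far more onto this (reputation, verification of work, dynamic pricing), but the core competency is exactly this kind of matching across non-uniform hardware.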

Teams are also beginning to diverge across use cases and go-to-market (GTM) strategies, with some focused on incorporating compiler frameworks for efficient inference routing across diverse hardware, while others are pioneering distributed model-training frameworks atop the compute networks they're building.

We're even starting to see an AI-Fi market emerge, with novel economic primitives that turn compute and GPUs into yield-bearing assets or use onchain liquidity to offer data centers an alternative source of capital for acquiring hardware. The major question here is to what extent DeAI will be developed and deployed on decentralized compute rails, or whether, as with decentralized storage, the gap between ideology and practical needs never closes enough for the idea to reach its full potential.
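
For a sense of what turning GPUs into yield-bearing assets could look like in the simplest case, here is a hypothetical back-of-the-envelope sketch in Python (all figures invented) that converts a tokenized GPU's hourly rate and utilization into an annualized yield on its acquisition cost.

```python
def annualized_gpu_yield(hardware_cost: float,
                         hourly_rate: float,
                         utilization: float,
                         network_fee: float = 0.10) -> float:
    """Annualized yield of a tokenized GPU: rental revenue at a given utilization,
    net of the network's fee, divided by the cost of acquiring the hardware.
    (Ignores depreciation, power, and downtime for simplicity.)"""
    gross_revenue = hourly_rate * 24 * 365 * utilization
    net_revenue = gross_revenue * (1 - network_fee)
    return net_revenue / hardware_cost

# A GPU acquired for $25k, rented at $2.10/hr at 60% utilization, with a 10% network fee:
print(f"{annualized_gpu_yield(25_000, 2.10, 0.60):.1%}")  # roughly 40%
```

Whether yields like this hold up once depreciation, downtime, and hardware cycles are priced in is precisely the bet these AI-Fi primitives are making.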
