Problems with Centralized Storage
Storage is costly on blockchains, which makes it economically infeasible to store most NFT media on the blockchain itself. Instead, NFTs typically point to a media file that is hosted somewhere else; however, problems arise if the NFT’s media file is hosted on centralized storage like a web server or cloud drive. This defeats the purpose of a decentralized digital asset because whoever controls the server can alter the media that the NFT points to.
Most people are familiar with the concept of a URL, or uniform resource locator. This is fundamentally how we navigate the internet today. We use URLs that point to websites and files that are stored, and served, by a centralized server. In fact, a URL specifies the location of a file on the server. For example, look at the file path of the designWeb3 logo. The file resides in several nested folders such as “wp-content”, “uploads”, “2022”, and more. A link to it would no longer work if I changed the name, or location, of the file.
Now let’s apply this to NFTs. Say you purchased a rare dog NFT for several thousand dollars on OpenSea, and this NFT is simply a URL that points to the dog media file hosted on the creator’s web server. Not only can the creator delete the image entirely, resulting in a blank NFT, but they can also swap the dog image for a cat image. This calls the value of your NFT into question, doesn’t it? Luckily, emerging decentralized storage solutions remedy this.
It isn’t possible to have permissionless, censorship-resistant digital assets whose media and metadata are hosted on centralized cloud services offered by the likes of Amazon and Google. IPFS is a decentralized, peer-to-peer storage protocol where data can be replicated across multiple, globally distributed nodes. Companies, and their services, come and go but IPFS provides a more robust way to persist data over time due to its decentralized architecture.
IPFS also solves the problem of data authenticity with something called content-addressing. Whereas location-addressing specifies the path of a file, which can change, content-addressing derives a unique ID from the contents of each file using a cryptographic hash. Going back to the NFT example above, the dog image receives a unique content identifier (CID) when uploaded to IPFS. In practice, no other file in the world has the same CID. Even if we changed 1 pixel of the dog image, the CID of the altered image would be completely different from that of the original image. This is important for verifying NFT authenticity.
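To make content-addressing concrete, here is a minimal sketch in Python. It uses a plain SHA-256 hex digest as a stand-in for a CID. This is a simplification: real IPFS CIDs wrap the hash in additional encoding and large files are chunked, so the `content_id` helper and the sample bytes below are assumptions for illustration only, not IPFS's actual algorithm.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a simplified content identifier: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for the bytes of the dog image (not a real image file).
dog_image = bytes([10, 20, 30, 40] * 4)

# Flip a single bit in the first byte to simulate changing one pixel.
altered = bytearray(dog_image)
altered[0] ^= 0x01
altered_image = bytes(altered)

original_cid = content_id(dog_image)
altered_cid = content_id(altered_image)

print("original:", original_cid)
print("altered: ", altered_cid)
print("same content gives same ID:", content_id(dog_image) == original_cid)
print("altered content gives same ID:", altered_cid == original_cid)
```

Because the ID is computed from the bytes themselves, anyone holding the file can recompute it and check that it matches the identifier recorded in the NFT.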
Decentralized storage sounds great, but how do we ensure that globally-distributed, independently-acting IPFS nodes continue to store our files? Storage comes at a cost, and although it’s free to upload data to an IPFS node, and distribute it to other nodes, there’s no guarantee that any of them will continue to retain our data, especially over long time periods.
The first solution is a centralized IPFS pinning service like Pinata and Infura. These are centralized companies that run IPFS nodes and ensure user data remains “pinned” on the nodes. The user experience is much like that of centralized cloud services from Amazon or Google. You start with some amount of free storage, and can opt for paid plans to expand your storage limits.
Next, there are decentralized networks like Filecoin, Crust Network, and Shift, that incentivize IPFS nodes to store data. Filecoin is by far the most popular decentralized storage protocol. Filecoin miners are essentially IPFS nodes; however, Filecoin adds an incentivization layer on top of IPFS.
Filecoin miners and users enter into storage deals, where the miner agrees to store a file for a certain period of time, paid for by the user with $FIL. Thus, Filecoin creates an open marketplace for storage. These protocols also ensure that nodes store files for the specified duration. In Filecoin’s case, the protocol challenges miners with random data checks (Proof-of-Replication and Proof-of-Spacetime), and miners are penalized if they fail these checks.
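The idea of a random data check can be sketched in a few lines of Python. This is a toy challenge-response scheme, not Filecoin's actual proofs, which rely on far more sophisticated cryptography. The `Miner` class, the chunk size, and the sample file are all hypothetical.

```python
import hashlib
import secrets
from typing import List, Optional

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split the file into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes) -> List[str]:
    """The verifier keeps only the per-chunk hashes, not the file itself."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data)]


class Miner:
    """Hypothetical storage provider; pass None to model one that dropped the file."""

    def __init__(self, data: Optional[bytes]) -> None:
        self.chunks = split_chunks(data) if data is not None else []

    def respond(self, index: int) -> bytes:
        # A miner that no longer holds the data cannot produce the requested chunk.
        return self.chunks[index] if index < len(self.chunks) else b""


def challenge(miner: Miner, expected: List[str]) -> bool:
    """Ask for one randomly chosen chunk and compare its hash with the stored one."""
    index = secrets.randbelow(len(expected))
    answer = miner.respond(index)
    return hashlib.sha256(answer).hexdigest() == expected[index]


file_bytes = b"rare dog NFT image bytes"  # hypothetical file contents
expected_hashes = chunk_hashes(file_bytes)

honest_miner = Miner(file_bytes)
lazy_miner = Miner(None)

print("honest miner passes:", challenge(honest_miner, expected_hashes))
print("lazy miner passes:", challenge(lazy_miner, expected_hashes))
```

Because the miner cannot predict which chunk will be requested, the only reliable way to keep passing checks over time is to keep storing the whole file.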
There is a suite of options for programmatically interacting with Filecoin. Lotus is the reference implementation of the Filecoin node that can be spun up locally. Alternatively, Glif and Infura offer hosted Filecoin nodes. Running a node is required to become a Filecoin miner and supply unused hard disk space to the Filecoin network. Filecoin nodes also give application developers fine-grained control of storage deals such as duration, maximum file size and cost; however, simpler APIs exist that make it easier for developers to integrate Filecoin storage into their dApp:
- Estuary – for larger-scale applications and migrating data from IPFS hot storage
- NFT.Storage – for NFT data
- Web3.Storage – for general application data
NFT.Storage and ChainSafe give end-users access to Filecoin storage with Dropbox-like UIs. Storage is totally free at this time because of Protocol Labs’ subsidy program. All media types can be uploaded, and the only limit is 32GiB per upload. Check out some of the other exciting projects built on Filecoin.
Filecoin is by no means the only decentralized storage protocol, although it has the most data stored on it by far. Other decentralized storage protocols include Sia/Skynet, Storj, Swarm, and DatDot (powered by Hypercore). They each have their own architectures but are similar in that they are contract-based, meaning that storage deals eventually expire and must be renewed to ensure data persists.
Arweave is a decentralized storage protocol with a completely different approach. Its main selling point is permanent storage, or “pay once, store forever”. Users pay an upfront amount that acts as an endowment for whatever they upload. Interest is generated on their principal, and used to pay Arweave miners for retaining the data. These incentives enable perpetual storage.
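The endowment idea described above can be illustrated with some simple arithmetic. The sketch below assumes a fixed annual storage cost and a fixed annual interest rate. Both figures are hypothetical, and Arweave's actual pricing model is more involved and is not reproduced here.

```python
def minimum_endowment(annual_storage_cost: float, annual_interest_rate: float) -> float:
    """Smallest principal whose yearly interest covers the yearly storage cost."""
    if annual_interest_rate <= 0:
        raise ValueError("interest rate must be positive")
    return annual_storage_cost / annual_interest_rate


def years_funded(principal: float, annual_storage_cost: float,
                 annual_interest_rate: float, horizon: int = 200) -> int:
    """Simulate year by year; return how many full years the endowment pays for storage."""
    balance = principal
    for year in range(horizon):
        balance += balance * annual_interest_rate  # interest accrues on the principal
        balance -= annual_storage_cost             # miners are paid for the year
        if balance < 0:
            return year
    return horizon  # still solvent at the end of the simulated horizon


# Hypothetical numbers: storage costs 1.0 unit per year, the endowment earns 5% per year.
cost, rate = 1.0, 0.05
print("minimum endowment:", minimum_endowment(cost, rate))
print("years funded with 25.0 upfront:", years_funded(25.0, cost, rate))
print("years funded with 10.0 upfront:", years_funded(10.0, cost, rate))
```

An upfront payment above the minimum never runs out under these assumptions, while a smaller one is drained after a finite number of years. That is the difference between a perpetual endowment and a prepaid lease.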
Data is also continuously lost on the web as websites are taken down, companies go out of business, and services are discontinued. This is the problem of link rot, where hyperlinks stop working and return 404 errors. Researchers estimate that 43% of links from 2008, and 72% of links from 1998, no longer work. Arweave combats this kind of data loss and seeks to archive all web data. Arweave calls this the “permaweb”.
And Arweave’s permanent, decentralized storage is particularly attractive for NFTs. Imagine purchasing an expensive NFT whose metadata and media are lost over the course of several decades. That is why popular NFT products like OpenSea, Glass Protocol, Mirror, Solana’s Metaplex, and Singular store NFTs on Arweave during minting.
Arweave won’t be the only player in perpetual storage for long. Filecoin has announced upcoming support for smart contracts, and one of the conceivable use-cases is smart contracts that automatically renew storage deals.
Commonalities Amongst Decentralized Storage Protocols
Decentralized storage protocols all have some features in common. First, all of the data stored on these protocols can be referenced by its CID, or unique content identifier. We already discussed the benefits of content-addressing versus location-addressing in terms of data authenticity.
Second, protocols have checks in place to ensure nodes correctly store data and abide by their lease terms so that data is retrievable throughout the duration of the lease term. Nodes are penalized when they fail these checks.
Third, data uploaded to these protocols is replicated across multiple nodes. This redundancy means that data can still be retrieved even if a portion of the nodes go offline. Also, nodes are penalized when they go offline, and their data is replicated on other nodes in order to maintain an adequate level of redundancy.
Finally, application ecosystems are forming around all these protocols. The main use-case is indeed decentralized storage, but offerings are expanding with smart contract capabilities, and more. Check out their ecosystems to learn more about how these decentralized protocols are being used.
- IPFS Application Ecosystem
- Arweave’s PermaWeb
- Dfinity Project Showcase
- Ethereum’s Swarm Ecosystem
- Filecoin Ecosystem
- Hypercore Protocol/Dat
Storage aggregators make it easy for end-users to access multiple decentralized protocols at once, which adds an additional layer of redundancy. Filebase is a centralized service where users start with 5GB of free storage, and can upgrade this limit with paid plans. Each upload distributes user files across IPFS, Storj, and Sia. Pinata does the same by pinning data to IPFS and Filecoin. NFT.Storage, in turn, deploys NFTs to IPFS pinning services, Arweave, Storj, and even centralized cloud storage like AWS.
There are also P2P compute protocols where suppliers lease their unused CPU/GPU cycles, providing compute to consumers. Blockchain mining is one immediate use-case for this. Currently, the majority of blockchain miners and validators are hosted on centralized cloud platforms, which undermines the security and decentralization of blockchains. Migrating blockchain nodes over to decentralized compute platforms solves this problem.
Also, high-performance computing is needed for video transcoding, graphics rendering, AI/ML, and computer simulations in fields like aerodynamics, materials research, semiconductors, and pharmaceuticals. Some general, decentralized compute protocols include:
- Akash (a Cosmos blockchain)
- Phala (a Polkadot parachain)
- Golem (its own blockchain)
- CUDOS (Ethereum smart contracts and L2 compute nodes)
There are also a couple of specialized compute protocols. Render Network is focused on providing GPU compute for graphics rendering. Render has been integrated into 21 digital creation tools including the likes of Blender, AutoCAD, Unity, and Unreal Engine. Creators are able to submit their render jobs directly to Render Network. Beeple, perhaps the world’s most famous NFT artist, is using Render Network for rendering, and archiving, his art.
Livepeer is another decentralized compute network, specialized for transcoding video streams into viewable formats. This enables video on-demand and live streaming for applications like decentralized social media or decentralized YouTube.
A related topic is data privacy during computation. Compute nodes can employ a variety of solutions to ensure data privacy, such as AMD’s SEV (virtualization) and Intel SGX (hardware execution in TEEs).
We’ve talked about decentralized storage and decentralized compute. Adding them together gives us a decentralized cloud on which to host websites and web apps. This brings us to a decentralized internet, which is arguably at the heart of Web3.
As a quick note, Dfinity’s Internet Computer (IC) is a major Web3 project with the goal of decentralizing the internet as a whole. Dfinity plans to do this by decentralizing pre-existing data centers, and using a blockchain for the trustless execution of smart contracts.
It’s difficult to categorize Dfinity and other previously mentioned protocols as either storage or compute because they blend together. For example, Filecoin is adding GPU computation markets to its decentralized storage offering.
Let’s review why it might be advantageous to use the decentralized cloud (compute + storage) for hosting websites or web apps.
First, decentralized storage and compute, even in these early stages, are showing massive cost reduction compared to that of centralized services. Akash’s compute is 3x cheaper than centralized compute, and Filecoin offers 90% cost reduction compared to centralized storage. These decentralized services create open markets where suppliers compete for storage and compute deals. This drives down the cost for resource consumers.
Second, an application that is only partially decentralized will be as weak as its centralized components as these components always represent potential censorship vectors. Take Uniswap for example. Uniswap has a fully decentralized backend with smart contracts deployed to Ethereum; however, its frontend was hosted on centralized servers. Consequently, Uniswap was pressured by the US government to delist tokens from its frontend. This means the government was able to censor Uniswap for end-users who rely on its UI.
Finally, new business models are made possible when websites and web apps are hosted on decentralized protocols. Application developers can receive crypto micropayments for every website visit. This functionality is native on Sia’s Skynet. Also, when a dApp launches on Arweave, developers can mint profit sharing tokens, or PSTs. PST holders get streamed crypto every time someone accesses the app. Koi, an application building on top of Arweave, allows for the monetization of NFT views, streamed to the NFT owner.
Decentralized Web App Stack
Akash and Fleek bundle multiple Web3 technologies that fully decentralize the web stack. Fleek is more focused on streamlining webapp deployment for developers. Spheron Protocol is an alternative to Fleek. These services are still being built out, but roadmaps are clear and features include:
- Decentralized hosting and content delivery (IPFS)
Utilizes the P2P content delivery of IPFS as opposed to HTTP requests sent to centralized servers; however, users are still able to access content from internet browsers through IPFS gateways and browser plugins. Also see Sia’s Homescreen, which allows users to download and version-control frontends.
- Decentralized storage of application & user data (Filecoin/IPFS, Sia/Skynet, Storj, Filebase)
This relies on Web3 database technologies like Textile’s ThreadDB, GunDB, and OrbitDB, which abstract IPFS and make it similar to using Web2 databases like MySQL. There is an emphasis on user-controlled data, enabled by client-side encryption that ensures only the user can read and write their data by default. This hides user data from the application, and differs from the Web2 approach of server-side encryption and centrally managed encryption keys. Also see Sia’s SkyDB and Ceramic protocol.
- Decentralized compute for application frontends & middleware (Akash)
Provides the compute necessary to host middleware like RPC endpoints and application APIs. Compute is also needed for dynamic web apps and server-side script execution to update frontends based on user interaction.
- Decentralized naming services (HNS, ENS)
The current Domain Name System (DNS) is centrally managed. Decentralized options like Ethereum Name Service (ENS) and Handshake (HNS) have the potential to replace DNS.
Decentralized Web App Stack (continued)
It’s worth mentioning several other decentralized middleware technologies that enable fully decentralized web apps.
Decentralized applications must be able to read and write to the blockchain. Messages are sent to the blockchain through RPC endpoints, and client-side SDKs (e.g. web3.js and ethers.js) are used to make calls to these endpoints. There are a variety of ways to access RPC endpoints:
- Run local nodes
- Centralized node as a service (e.g. Infura, Cloudflare)
- Decentralized node as a service (e.g. Pocket, Ankr)
Running a local node complicates a dApp’s architecture and makes downtime more likely with reliance on a single node. Hosted nodes add in centralization risk as service providers can censor incoming transactions and manipulate outbound data. Pocket and Ankr seek to solve these problems by providing a decentralized network of RPC endpoints.
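Whichever kind of endpoint is used, the messages exchanged are JSON-RPC requests. The sketch below shows roughly what an SDK does under the hood for Ethereum's `eth_blockNumber` method: it builds a request body and decodes the hex-encoded result. No network call is made, and the sample response is a hypothetical value written by hand.

```python
import json
from typing import Any, List


def build_rpc_request(method: str, params: List[Any], request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body to send to an RPC endpoint."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


def parse_block_number(response_body: str) -> int:
    """Decode the hex-encoded block number from an eth_blockNumber response."""
    result = json.loads(response_body)["result"]
    return int(result, 16)


request_body = build_rpc_request("eth_blockNumber", [])
print(request_body)

# Hypothetical response body; a real one would come back from the endpoint.
sample_response = '{"jsonrpc": "2.0", "id": 1, "result": "0x10d4f"}'
print(parse_block_number(sample_response))
```

The request body is identical whether it is sent to a local node, a hosted provider, or a decentralized network of endpoints. Only the URL it is posted to changes, which is what makes switching providers feasible.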
Next, blockchain and smart contract data are natively stored in a way that is inconvenient to query. For example, a futuristic dApp may want a list of all the wallet addresses that own a CryptoPunk NFT to target them with advertising. This query, “return all the wallet addresses that own a Punk”, would be extremely cumbersome to make on native Ethereum data. In fact, some queries would require that we search through the entire Ethereum blockchain (~1TB and growing), which is much too slow. Data indexing solves this by providing a more efficient means of reading blockchain data.
Indexing the CryptoPunks smart contract would result in something like a table with 10,000 rows, one for each of the NFTs, and columns representing NFT metadata like token ID, owner address, tokenURI, and more. Anyone could index the smart contract, and host the resulting table as an API on their server, but this adds centralization back in.
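As a rough sketch of what indexing means, the Python below replays a simplified transfer log into an owner table and then answers the "who owns one?" query from that table. The `Transfer` record, the addresses, and the events are all hypothetical and are not taken from the real CryptoPunks contract.

```python
from typing import Dict, Iterable, List, NamedTuple


class Transfer(NamedTuple):
    """Simplified transfer event (hypothetical shape, for illustration only)."""
    block: int
    token_id: int
    sender: str
    recipient: str


def build_owner_index(events: Iterable[Transfer]) -> Dict[int, str]:
    """Replay events in block order; the last recipient of each token is its owner."""
    owners: Dict[int, str] = {}
    for event in sorted(events, key=lambda e: e.block):
        owners[event.token_id] = event.recipient
    return owners


def current_owners(index: Dict[int, str]) -> List[str]:
    """Answer 'return all wallet addresses that own a token' from the index."""
    return sorted(set(index.values()))


# Hypothetical event log, deliberately out of order as raw chain data might be read.
events = [
    Transfer(block=120, token_id=1, sender="0xaaa", recipient="0xccc"),
    Transfer(block=100, token_id=1, sender="0x000", recipient="0xaaa"),
    Transfer(block=101, token_id=2, sender="0x000", recipient="0xbbb"),
    Transfer(block=130, token_id=3, sender="0x000", recipient="0xccc"),
]

owner_index = build_owner_index(events)
print(owner_index)
print(current_owners(owner_index))
```

The expensive step, scanning the full event history, happens once when the index is built. Each query afterwards is a cheap lookup against the table.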
The Graph protocol provides a means for decentralized data indexing. Indexers are nodes on The Graph network that index smart contracts, resulting in subgraphs. Data consumers can then query these subgraphs, which are basically smart contract APIs, and pay query fees to the indexers for doing so. There are also mechanisms in place that penalize indexers if they serve incorrect data to consumers. This incentivizes data integrity.
Some believe The Graph is the “Google of Web3”, because it indexes many blockchain ecosystems as well as decentralized storage protocols like IPFS and Arweave. This is getting into the territory of a project called OriginTrail, which is creating a decentralized knowledge graph (DKG) for linking physical assets with digital Web3 assets. Tableland is another innovative project when it comes to Web3 data middleware. It gives developers the ability to build relational tables directly into NFTs, enabling SQL read/write operations.
Lastly, some dApps want the ability to communicate with centralized services, and the real-world. For example, there are Web3 betting platforms where people bet on the price of a cryptocurrency in the future, or the result of a presidential election. The smart contract needs an external data feed in order to determine the outcome, and distribute funds accordingly.
Oracles provide this service. Smart contracts read real-world data feeds from an oracle and, on the other hand, oracles can export blockchain events to the real-world. Use-cases include asset prices for finance, weather/accident information for insurance, randomness for gaming, IoT device data for supply chain, ID verification for government, and more.
However, there are two problems with single oracles. First, there’s a reliability problem. If the oracle goes down then the smart contract won’t have the data needed to function. Second, it adds in centralization risk, and the oracle owner can tamper with the data feed and thus manipulate the smart contract.
This implies that we need a decentralized oracle network like Chainlink that aggregates data from multiple nodes. Also, there’s a concept of node reputation based on the node’s previous performance in terms of data accuracy and availability. This incentivizes honest behavior amongst oracle nodes.
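The benefit of aggregating across nodes can be shown with a small sketch. It takes the median of several reported prices and flags reports that deviate too far from it. The node names, prices, and 5% tolerance are hypothetical, and this is a simplified illustration rather than Chainlink's actual implementation.

```python
from statistics import median
from typing import Dict, List


def aggregate(reports: Dict[str, float]) -> float:
    """Combine the nodes' reports into one value; the median resists outliers."""
    return median(reports.values())


def flag_outliers(reports: Dict[str, float], tolerance: float = 0.05) -> List[str]:
    """Return nodes whose report deviates from the median by more than the tolerance."""
    agreed = aggregate(reports)
    return sorted(
        node for node, value in reports.items()
        if abs(value - agreed) / agreed > tolerance
    )


# Hypothetical price reports; node_e is faulty or dishonest.
reports = {
    "node_a": 100.0,
    "node_b": 101.0,
    "node_c": 99.5,
    "node_d": 100.5,
    "node_e": 5000.0,
}

print("aggregated price:", aggregate(reports))
print("flagged nodes:", flag_outliers(reports))
```

A single bad report barely moves the median, whereas it would badly skew a simple average. Flagged nodes could then have their reputation lowered, in line with the reputation mechanism described above.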
Okay, let’s take a step back. We’ve talked about the decentralized cloud, and decentralized web app stack. But what about the oligopolies that control telecommunications networks, and are responsible for building and maintaining the internet’s physical infrastructure (e.g. cellular towers or cables)?
It’s difficult to compete with legacy telecom companies like AT&T and Verizon, because telecommunications networks are expensive to set up. The commercial rights to broadcast on a portion of the electromagnetic spectrum (i.e. frequency range) must be purchased from the government. And custom hardware and software must be developed for broadcasting data at these specific frequencies.
Because of this, we must accept the business models and quality of service imposed on us by current telecom companies. As an example, IoT use-cases suffer under the current regime. It costs about $8/month to provide cellular connectivity to a single IoT device. This makes large IoT networks prohibitively expensive to deploy. Other network protocols like LoRaWAN exist that are better suited for IoT use-cases; however, deploying the infrastructure necessary for such networks represents another major capital expense for telecom companies.
Several Web3 protocols have demonstrated an alternative approach to network building: bootstrapping the supply-side of a network with crypto incentives.
Helium is a Web3 protocol that has already bootstrapped a LoRaWAN network for IoT devices, and it is expanding into other network protocols including 5G, WiFi, CDN, and VPN. It wants to be the “Airbnb of telecommunications networks”.
Anyone can purchase, and set up, a network hotspot that increases coverage and supplies the network with bandwidth. Suppliers are rewarded crypto for doing so. Data consumers are incentivized to join the network as coverage increases and the cost of connectivity is shown to be cheaper than that of legacy providers.
Pollen Network is competing with Helium to bootstrap a decentralized cellular network with 5G hotspots. Andrena and Althea enable neighbors to supply internet access through a decentralized WiFi network.
Just to recap, these protocols are all similar in that crypto rewards incentivize people to set up antennas at their place of business or residence. This is effective for bootstrapping the physical infrastructure of a telecommunications network, creating a mesh network of hotspots that compete with legacy networks on coverage and cost.
So far we’ve discussed decentralized storage, computation, and connectivity. Web3 protocols are also being used to support a medley of location-based services, which include:
- Foam Protocol – a decentralized network of radio beacons that can verify someone’s physical location on-chain. This supports several use-cases like mobility and transportation, location-based gaming, and supply-chains.
- Hivemapper – a decentralized network of 4K dash cams that provide imagery for an up-to-date, Google Street View-style service. Developers can access this open API to request images, directions, and more.
- DIMO Network – a decentralized network of devices that connect to cars and transmit mobility data. Developers can build new mobility applications on this real-time dataset for car maintenance and insurance purposes.
- WeatherXM – a decentralized network of weather stations that improve local weather forecasting to support applications in agriculture, energy, outdoor sports, maritime, and more.
- PlanetWatch – a decentralized network of air-quality sensors for detecting pollution hotspots and trends.