The Next Wave of Crypto Unicorns
Thursday, 4th of April 2019 · by Mohamed Fouda
Every crypto venture capital fund is hunting for one thing: the next unicorn (a company valued at $1B+). Mining hardware manufacturers like Bitmain (which filed for an IPO last year) and exchanges like Coinbase were the first to reach unicorn status. At a high level, current crypto unicorns fall cleanly into two buckets: financial services (Coinbase, Kraken, Circle, Binance) and mining hardware (Bitmain, Bitfury). Within these two sectors, promising companies such as Bakkt and Innosilicon are lined up to achieve unicorn status. This article, however, focuses on a different sector where I expect multiple crypto unicorns to emerge: the blockchain data industry.
Blockchain data is hiding in plain sight
The idea that investment firms need data to gain a competitive edge in the information age is not new. What is new is that in the blockchain world almost all the data is public: every transaction on a public blockchain is recorded permanently, including the amount transferred and the addresses involved. Yet in a sense the data is still hidden, because extracting useful insights from it in the form blockchains store it is hard. As Raul Jordan, TD research partner and lead developer of the Ethereum 2.0 Prysm client, has noted, the database used by Bitcoin, Ethereum and many other blockchains (LevelDB) is optimized for transactional integrity in consensus, not for relational storage or retrieval. LevelDB is a key-value store: it has no relational model and does not support SQL queries, which makes extracting insights from data in this format a daunting task. Moreover, LevelDB has been shown to be prone to data corruption, which makes handling blockchain data even harder.
One clear example of blockchain data hiding in plain sight is Bitcoin Private's covert minting of 2 million additional coins during the Zclassic/Bitcoin merged fork in March 2018, which went unnoticed until December 2018, when Coin Metrics published an analysis revealing it. A public blockchain event took about 9 months to be noticed. This may be partly because Bitcoin Private was not a popular project; still, the episode shows that even "public" blockchain data requires dedicated handling to yield meaningful information.
The complexity of handling and analyzing blockchain data creates the perfect opportunity for data scientists and engineers to move into the blockchain industry and build companies that tackle these problems. Accurate data synthesis and analysis have more applications in the crypto industry than can easily be counted. In the following sections I explore these applications, the major data providers and their paths to unicorn status.
But before diving into that, let’s ask a simple question: Who needs to query the blockchain data and why?
The simple answer: everyone.
The answer may sound too general to be accurate, but it holds. Every crypto user regularly needs to query blockchain data. For instance, when a user wants to know whether a transaction is confirmed, she visits a block explorer website and searches by address or transaction ID. This is, in fact, a blockchain query: the user “searched” the whole blockchain for information about a specific address or transaction. The block explorer performed the task on her behalf and delivered the results. Behind the scenes, however, the company didn't query the blockchain directly. Instead, it queried a relational database that it had built from the blockchain data.
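To make the block explorer example concrete, here is a minimal sketch, assuming transfers have already been decoded from raw blocks into simple (txid, block height, sender, receiver, amount) rows. Python's built-in `sqlite3` stands in for whatever relational database a real explorer runs; the schema, the sample rows and the `build_index`, `address_history` and `confirmations` helpers are hypothetical illustrations, not any explorer's actual design.

```python
import sqlite3

# Hypothetical, heavily simplified transfers as an indexer might decode them
# from raw blocks; real Bitcoin (UTXO) and Ethereum (account) data are richer.
SAMPLE_TXS = [
    ("tx1", 100, "addrA", "addrB", 1.5),
    ("tx2", 101, "addrB", "addrC", 0.7),
    ("tx3", 102, "addrA", "addrC", 2.0),
]


def build_index(txs):
    """Load decoded transfers into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txs (txid TEXT PRIMARY KEY, block INTEGER, "
        "sender TEXT, receiver TEXT, amount REAL)"
    )
    # Secondary indexes turn "everything about this address" into a cheap
    # lookup instead of a scan over every block.
    conn.execute("CREATE INDEX idx_sender ON txs(sender)")
    conn.execute("CREATE INDEX idx_receiver ON txs(receiver)")
    conn.executemany("INSERT INTO txs VALUES (?, ?, ?, ?, ?)", txs)
    conn.commit()
    return conn


def address_history(conn, address):
    """All transfers touching an address, oldest first."""
    return conn.execute(
        "SELECT txid, block, sender, receiver, amount FROM txs "
        "WHERE sender = ? OR receiver = ? ORDER BY block",
        (address, address),
    ).fetchall()


def confirmations(conn, txid, tip_height):
    """Blocks from the tx's block up to the chain tip, or 0 if the tx is unknown."""
    row = conn.execute("SELECT block FROM txs WHERE txid = ?", (txid,)).fetchone()
    return 0 if row is None else tip_height - row[0] + 1


conn = build_index(SAMPLE_TXS)
print(address_history(conn, "addrC"))
print(confirmations(conn, "tx2", tip_height=106))  # 6
```

The design choice worth noting is the pair of secondary indexes: a key-value store like LevelDB is keyed for consensus lookups, whereas an explorer needs to answer "show me everything about this address", which a relational index serves directly.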
Building and running this infrastructure costs money, so the company needs revenue. Etherscan, the most popular Ethereum block explorer, for example, shows users ads. Almost every other company that serves blockchain data uses a similar revenue model, whether the data is dApp activity (e.g., DappRadar) or coin prices and market caps (e.g., CoinMarketCap). This model has often drawn criticism, particularly when the ads promote illicit or scam projects.
This example illustrates the size of the total addressable market for blockchain data companies. It runs to tens of billions of dollars or more, since virtually everyone in the space needs access to crypto data.
While the block explorer is free to the user and funded by ads, more substantial data requests are not. Investors, funds and even crypto companies need to perform far more sophisticated data collection and analysis for various reasons. For example, crypto exchanges, especially in the US and Europe, need to perform intensive blockchain data analysis to comply with AML regulations and to ensure that their users don't use the exchange for illegal activities such as cashing out stolen cryptocurrency or ransomware payments.
Exchanges are also obligated to ensure that their users are not transferring cryptocurrencies to sanctioned entities or funding illegal operations. Several companies, such as Chainalysis and Elliptic, provide blockchain analytics tools that governments and exchanges use to combat illicit uses of cryptocurrencies. While most exchanges prefer to contract with specialized analytics providers, Coinbase took a different route, bringing these capabilities in-house by acquiring the controversial analytics company Neutrino.
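One building block of this kind of screening is measuring how close a deposit sits to a flagged address in the transaction graph. The sketch below is a deliberately simplified illustration, not how Chainalysis, Elliptic or any other vendor works: it assumes a hypothetical graph mapping each address to the addresses that funded it, a hypothetical flagged set, and a breadth-first search capped at `max_hops`. Real tools also weigh amounts, cluster addresses into entities and rely on curated attribution data.

```python
from collections import deque

# Hypothetical transfer graph: each address maps to the addresses that sent it funds.
FUNDED_BY = {
    "deposit1": ["mixer_out"],
    "mixer_out": ["stolen_funds"],
    "deposit2": ["alice"],
    "alice": [],
}
# Hypothetical set of addresses linked to theft, ransomware or sanctions.
FLAGGED = {"stolen_funds"}


def hops_from_flagged(address, funded_by, flagged, max_hops=3):
    """Breadth-first search upstream; return hops to the nearest flagged address, or None."""
    seen = {address}
    queue = deque([(address, 0)])
    while queue:
        current, depth = queue.popleft()
        if current in flagged:
            return depth
        if depth == max_hops:
            continue  # do not trace further back than the configured limit
        for source in funded_by.get(current, []):
            if source not in seen:
                seen.add(source)
                queue.append((source, depth + 1))
    return None


print(hops_from_flagged("deposit1", FUNDED_BY, FLAGGED))  # 2
print(hops_from_flagged("deposit2", FUNDED_BY, FLAGGED))  # None
```

Breadth-first order guarantees the first flagged address found is the nearest one, and the hop limit reflects the practical reality that exposure several hops away is usually too diluted to act on.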
Blockchain data is also fundamental to extracting economic signals. Investors, funds and research shops need robust, clean data to inform their investment decisions. For this use case, three categories of data stand out: Network Data (derived from on-chain activity), Off-chain activity data, and Exchange Data.
Exchange Data is the most relevant from an economic perspective. It is also the most difficult to obtain, because cryptocurrencies trade on many exchanges, most of which are unwilling to share much information about the trading activity on their platforms. Additionally, many exchanges engage in deceptive practices such as wash trading (trading with themselves to inflate reported volume) to create a false impression of user adoption.
Off-chain activity data is similarly scattered, as it covers almost everything that happens outside exchanges and is not recorded on the blockchain. Moreover, it is the most difficult category to monetize, since it is essentially public information. Most of the work in this category comes from the community or from donation-supported niche websites such as Coin Dance, which provides historical data on the number and client distribution of nodes running on the Bitcoin network and its forks. Electric Capital's developer activity report is another good example of a community contribution, in this case from a fund, to measuring off-chain activity.
Economic Data Providers
The growing need for clean, standardized data has led to an increasing number of companies that aggregate, standardize and sell these types of data. Most of them focus first on the Exchange Data category, followed by Network Data. The off-chain activity category has not yet attracted significant commercial demand.
A good number of companies now tackle the Exchange Data and Network Data categories, which is reflected in the quality of information available today compared with a year ago. Listing them all would be nearly impossible, but noteworthy platforms include Kaiko, Coin Metrics and Messari. These companies have made notable contributions to the data ecosystem and are highly likely to reap the benefits in the form of larger investments in the future.
Kaiko is one of the leading providers of historical exchange data. It has been aggregating exchange data since 2014 and collects data from 30+ exchanges for 1000+ cryptocurrencies. Kaiko offers either a monthly license to access its exchange data or a subscription for unlimited API calls that costs up to €2.5k/month. Customers can also buy historical price, volume and order book data, and Kaiko plans to expand its offering to include OTC data. Recently, Kaiko provided data to Bitwise for its seminal report on real cryptocurrency exchange volume, which concluded that about 95% of reported Bitcoin exchange volume was fake, created largely through wash trading. One major outcome of this analysis was the “Real 10” volume, which aggregates volume from the 10 exchanges found to report reliable trading volume.
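The aggregation step behind a "Real 10"-style figure is simple once the hard part, deciding which exchanges report real volume, is done. The sketch below assumes invented exchange names, invented volumes and a hypothetical two-exchange whitelist (not the actual Real 10 list); it only shows how summing over verified venues changes the headline number.

```python
# Hypothetical 24-hour reported volumes in USD; names and figures are invented.
REPORTED_VOLUME_USD = {
    "exchange_a": 400_000_000,
    "exchange_b": 250_000_000,
    "exchange_c": 1_500_000_000,
    "exchange_d": 2_850_000_000,
}

# Hypothetical whitelist of exchanges whose volume passed some external
# verification; the verification itself is out of scope for this sketch.
VERIFIED = {"exchange_a", "exchange_b"}


def real_volume(reported, verified):
    """Aggregate volume only from exchanges on the verified list."""
    return sum(vol for name, vol in reported.items() if name in verified)


def unverified_share(reported, verified):
    """Fraction of total reported volume that comes from unverified exchanges."""
    total = sum(reported.values())
    return 1 - real_volume(reported, verified) / total


print(real_volume(REPORTED_VOLUME_USD, VERIFIED))  # 650000000
print(f"{unverified_share(REPORTED_VOLUME_USD, VERIFIED):.0%}")  # 87%
```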
Coin Metrics is best known for its Network Data offering, which pairs on-chain data with many in-house metrics such as Realized Capitalization. Investment firms are actively trying to use these metrics to develop valuation criteria for cryptocurrencies. These efforts usually fall short because of major challenges that I discuss below. Coin Metrics also gained recognition for exposing the covert minting of 2 million extra Bitcoin Private coins in December 2018. The company is currently considering expanding its offering to include Exchange Data.
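Realized Capitalization is commonly described as valuing each coin at the price when it last moved rather than at today's price. For a UTXO chain like Bitcoin, that means summing, over every unspent output, its amount times the price at the time the output was created. The sketch below assumes a toy UTXO set keyed by hypothetical outpoint IDs and a known USD price for each transaction; it ignores fees and the real difficulty of sourcing reliable historical prices, and it is not Coin Metrics' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utxo:
    amount: float              # coins locked in this unspent output
    price_when_created: float  # USD price when the output was created (coins last moved)


def realized_cap(utxo_set):
    """Sum of each unspent output valued at the price when it was created."""
    return sum(u.amount * u.price_when_created for u in utxo_set.values())


def apply_tx(utxo_set, inputs, outputs, price):
    """Spend the `inputs` outpoints and create `outputs` ({outpoint: amount}) at `price`."""
    for outpoint in inputs:
        del utxo_set[outpoint]
    for outpoint, amount in outputs.items():
        utxo_set[outpoint] = Utxo(amount, price)


utxos = {"a:0": Utxo(2.0, 1000.0), "b:0": Utxo(1.0, 4000.0)}
print(realized_cap(utxos))  # 6000.0
# Moving 2 coins when the price is 5000 re-prices them, even if no trade happened.
apply_tx(utxos, ["a:0"], {"c:0": 1.5, "c:1": 0.5}, price=5000.0)
print(realized_cap(utxos))  # 14000.0
```

The second call shows why the metric is sensitive to data quality: any transfer re-prices the coins involved, so wallet reshuffling that carries no economic meaning still moves Realized Capitalization.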
Messari started with the vision of bringing transparency to the crypto ecosystem by encouraging crypto projects to disclose important information about themselves and their founders, such as the developers involved, early investors, and ICO and pre-ICO details. The Agora disclosure database is public and free. Additionally, Messari's OnChainFx offers an improved data feed for major cryptocurrencies that includes several innovative metrics, such as the fully diluted “2050” market cap (current price times projected supply in 2050) and the “Real 10” trading volume for all listed cryptocurrencies. Messari does not charge for this public data and these services, but it does charge for its curated data insights and for the proprietary tools it builds to analyze the data. In addition to its off-chain activity data, Messari plans to provide historical price and volume data to its customers.
Challenges of the data sector
Although the blockchain data field has a good number of quality companies, they must address several major problems to unlock their real potential.
- Exchange data can be misleading, as it doesn't include OTC market volume. Many OTC trading desks negotiate prices and execute trades through direct transfers outside exchange order books. OTC activity is largely opaque, so data companies need to contract with multiple OTC providers to obtain this data.
- In many cases, activity derived from on-chain data doesn't represent real economic activity. One example is an entity moving coins between wallets under its own control, e.g., from an exchange's hot wallet to cold storage and back. Filtering out such events requires maintaining a list of the addresses each exchange uses, work that usually falls within the scope of data analytics companies.
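The cleaning step described in the last point can be sketched as follows, assuming a hypothetical address-to-owner label map of the kind analytics companies maintain. A transfer counts as internal, and is excluded from economic volume, only when both sides are labeled with the same owner; unlabeled addresses are treated as independent, which is itself a simplifying assumption.

```python
# Hypothetical address labels; real lists are built through clustering and attribution.
ADDRESS_OWNER = {
    "hot1": "exchange_x",
    "cold1": "exchange_x",
    "hot2": "exchange_y",
}


def is_internal(tx, owners):
    """True when sender and receiver are both labeled with the same owner."""
    src = owners.get(tx["from"])
    dst = owners.get(tx["to"])
    return src is not None and src == dst


def economic_volume(txs, owners):
    """Total transferred amount, excluding same-owner wallet reshuffling."""
    return sum(tx["amount"] for tx in txs if not is_internal(tx, owners))


txs = [
    {"from": "hot1", "to": "cold1", "amount": 500.0},  # exchange_x hot -> cold storage
    {"from": "hot1", "to": "user9", "amount": 3.0},    # withdrawal to an unlabeled user
    {"from": "hot2", "to": "hot1", "amount": 40.0},    # exchange_y -> exchange_x
]
print(economic_volume(txs, ADDRESS_OWNER))  # 43.0
```

Without the labels, the raw on-chain total would be 543.0, more than twelve times the economically meaningful figure in this toy example, which is why maintaining accurate address lists matters so much.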
These issues can undermine data integrity and lead to inaccurate insights and conclusions.
How close are these companies to achieving unicorn status?
Blockchain data analytics companies are currently much closer to achieving billion-dollar valuations because of the nature of their business and their customers. Their customers are governments, legislative bodies, and exchanges that must comply with KYC regulations. These well-funded, deep-pocketed customers work in favor of the analytics companies. Another advantage is the high barrier to entry, which reduces competition. Early involvement in blockchain data analytics provides a wealth of historical information, such as records of ransomware attacks and legal prosecutions, that new entrants (or, arguably, traditional software incumbents) cannot easily obtain.
The same advantage applies to Exchange Data companies that provide detailed order book and trading information. Their early involvement allowed them to collect historical exchange data that will not be available to new competitors. Network Data, on the other hand, can always be re-extracted from the blockchain itself, so innovation in this area centers on improving data quality and creating unique insights or metrics to interpret the data.
Decentralized Data Providers: The Graph
This raises a question: after decentralizing the world through blockchains, will we settle for centralized, rent-seeking data providers?
That remains a possibility, but decentralization advocates may not let it happen without a fight. The team behind The Graph, for example, has considered this question and answered it by building a decentralized protocol for indexing and querying Web 3.0 data. A proper discussion of The Graph, its goals, how it works and what it makes possible deserves a separate article. Suffice it to say that if its vision holds, we could end up with a global, multi-billion-dollar data-sharing protocol that can still be driven by strong, standalone data companies.
In conclusion, I am fairly confident that the blockchain data sector will grow significantly over the next few years, producing a number of solid unicorns. The major challenges discussed earlier could largely be solved if the different data providers combine into bigger entities. Economic analysts could then combine Exchange and Network Data to deliver a clearer picture of market activity, and that economic data would be even more reliable once cleaned using insights from blockchain analytics.
Consequently, I expect today's companies to expand into other areas of the industry. I also predict a number of mergers and acquisitions consolidating smaller players into larger entities that can win more market share and offer more sophisticated services.
Thanks to Qiao Wang and Ambre Soubiran for their feedback on this article.