This is an expansion on a recent conversation I had on Reddit. I thank the original commenter for inspiring me to finally put these thoughts into writing. I especially thank him/her for being open to conversation, even though we disagree on some things.
"It is just mind blowing to me that if someone asked you “How much daily/monthly unique traffic does your website get?” or “How many pageviews are you getting a month?” you would have absolutely no idea. Unreal. " — /u/DrinkMoreCodeMore
What would you change about your product if you knew that your website got 1,000 unique visitors per day vs. 10,000 vs. 100,000? What if you knew that your referrals came from blog #1 rather than blog #2?
What would the end goal be? How would you plan to use this information? Would you use this to improve your product and better serve your users? Or stroke your own ego? Or sell your users and their thoughts, desires, and attention?
MyCrypto doesn’t have analytics on any of our sites. This wasn’t an oversight; we decided early on not to implement them. Going without this data has been a unique experience, mostly because it’s unexpected and because today’s online environment thrives on this data. We use it to validate our existence, whether we are product creators, marketers, writers, or developers. I lived in that world for years and made a good living building marketing websites for clients and implementing Google Analytics, Kissmetrics, HubSpot, Mixpanel, Crazy Egg, or all of the above.
However, one reason the blockchain industry excites me is that it’s not that world. Building a decentralized product and then throwing a centralized, third-party analytics provider on it goes against what the blockchain is. The blockchain is about aligning incentives and decentralization. It’s not supposed to have central parties collecting all of your personal information. It is about breaking down walls and empowering the individual. It does this partly by making otherwise proprietary data (like financial transactions) public and verifiable, not by hoarding data and building moats.
The problem with Google and other analytics providers (especially the free ones) is that you, the owner of the website, and your users are the product. The Google Analytics script that you add to the footer of your website is loaded from a Google-owned domain. With your permission, your analytics provider can track every user action on your site. And they do.
Because we are building products in this space, we should ensure that all the decisions we make align with the greater vision. The excuse of “well, everyone else is doing it!” is a terrible one. We must strive to be better and to live by the values of this decentralized world we are building.
What Analytics Should & Shouldn’t Be
There are two primary types of data collection that I abhor.
- Companies that mindlessly collect data by throwing a script like Google Analytics in their footer. They don’t know what they are tracking or why they are tracking it. They are not only storing this data for themselves, but also giving a third party (e.g. Google) full access to it. In this case, not only is the user the product, but so is the company that added the analytics. Google uses all the data it collects to better serve itself and its products (like advertisements).
- Companies that mindfully collect, analyze, exploit, and sell their users and their personal information, likely without the users realizing it. In this case, the users are the product. Data is insanely valuable and is a business model in itself. See: Facebook.
Now, this doesn’t mean that all analytics are bad and all companies that collect metrics are evil. These are just two examples of ways data is harvested that I personally disagree with. In my opinion, the “right” way to collect analytics is mindfully and thoughtfully. And to get a better idea of what that is, let’s start by identifying what it’s not.
The ideal analytics do not sell your users or exploit their trust. They do not manipulate them or use dark patterns to trick them into revealing all their information. You should be able to sit across from a user (or Congress) and show them the information you collect without shame or fear. The data you collect should be limited to what empowers you to build a better product (one that better serves your users), and you should actively use that data to make better decisions. Collecting data for the sake of it is a waste. Building a business that exploits your users is wrong.
The most useful data while building a product is going to be usage metrics, not personal information (unless you are selling your users 😉). To improve the user experience, you need to know what features people are using, how they are using those features, and where those features are failing. You don’t need to know the features are being used by a 22-year-old male who lives in Toronto, holds over $10,000 worth of cryptocurrency, sneaks onto your website during his 9-to-5 job, has an adorable dog, and is obsessed with high-end gaming gear.
Simply acknowledging the difference between usage metrics and personal information goes a long way. Being honest about what type of data you are collecting and why will help you make more holistic decisions that (hopefully) serve your product and your users better.
Security Pitfalls of Analytics
As with everything in this crazy crypto world, there are additional risks we must be mindful of as we build the decentralized web. That means taking the necessary security precautions against every attack vector. Things that are a minor inconvenience elsewhere can be exploited for profit in this space, and the potential return on investment encourages attackers to get increasingly creative.
It should be assumed that anything on a server will be obtained by a nefarious party and/or made completely public. This includes anything you email, log, collect, store, etc. We must avoid thinking that metrics are useless or not valuable to attackers. They are. Case in point:
- Email Addresses: Obtaining a list of emails of people who are interested in cryptocurrency, use a certain wallet, or hold a specific token has resulted in targeted spear-phishing campaigns that trick users and steal funds with higher-than-average success rates.
- Referrers: Armed with a list of referrers, an attacker could attack the websites referring the traffic, or simply drop more phishing links on them. No matter how secure your own infrastructure is, the weak point is now upstream and not guarded with the same scrutiny. It is relatively easy to hack a poorly maintained WordPress blog in order to change all of its links into phishing links.
- “Painting a target on your back”: Sometimes those things that make you feel good about yourself and your growth can have some unintended consequences. For example, the (amazing) imToken team was rightfully proud of their “billions of dollars deposited by users” metric and it got them some great press. Unfortunately, it may have also helped attract an attack 11 days later.
- Yes, sometimes even usage metrics: If an attacker knows that 90% of our users unlock via Ledger hardware wallets, they may make sure that their phishing websites manipulate the Ledger addresses instead of just stealing the private keys.
When adding analytics, avoid creating new attack vectors. Data that can’t directly cause a loss can still contribute to one indirectly.
- Don’t collect information you don’t need: It’s easier to track everything, but taking the extra time to turn off features you don’t need goes a long way. Just because you don’t make use of the data doesn’t mean someone else won’t.
- Move the data offline: Instead of saving every tiny bit of data indefinitely, export your yearly and monthly reports to a separate, secure location and delete the raw data every 90 days or so.
- Secure your infrastructure: Treat the logins, passwords, keys, accounts, logs, data, etc. with the same scrutiny as the rest of your infrastructure. Limit access, make sure MFA is enabled, and keep everything patched and up-to-date.
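To make the “move the data offline” advice concrete, here is a minimal sketch in Python. It assumes a hypothetical `RawEvent` record that holds only a page path and a UTC timestamp, rolls raw events up into per-month, per-page counts suitable for exporting, and then purges raw events older than 90 days. The record shape, function names, and exact window are illustrative assumptions, not a description of any existing MyCrypto system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed retention window ("delete the raw data every 90 days or so").
RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw event: only a page path and a UTC timestamp, no user data."""
    page: str
    timestamp: datetime


def monthly_report(events):
    """Roll raw events up into {(YYYY-MM, page): count} for offline export."""
    report = Counter()
    for event in events:
        report[(event.timestamp.strftime("%Y-%m"), event.page)] += 1
    return dict(report)


def purge_expired(events, now=None):
    """Return only the raw events that are still inside the retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [event for event in events if event.timestamp >= cutoff]
```

In practice you would write the output of `monthly_report` to the separate, secure location first, and only then replace the stored raw events with the result of `purge_expired`.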
MyCrypto’s Plans for the Future
We have been actively discussing analytics / metrics on our sites for a while now. It’s not a very high priority, but it is something that I think is worth examining, especially as our product offerings grow and we have to decide what resources are allocated to what features.
Currently we rely pretty heavily on the anecdotal information that comes in via feedback across social media channels, email, our Discord channel, etc. The squeaky wheel does get the grease; if there is a feature you would like implemented or improved or a bug that you’d like to be fixed, find us and get in touch. :)
While anecdotal feedback is great, we would love to see what is used most on our site and our knowledge base so that we can make better decisions as a product and as a team. We want to ensure that the benefits outweigh the risks and that we limit any personal information collected. We want to ensure that these metrics help us give back to the wider community, not hurt it.
Things we would love to know:
- How should we allocate resources?
- Which features or knowledge base articles should we spend time improving?
- Which features or knowledge base articles are missing the mark and should be nixed, improved, or advertised better?
- What articles are most visited on the knowledge base?
- Which articles result in the most support emails being sent? Is the article failing to answer the question?
- Where do people “get stuck” and exit our site? Especially if they exit via the “Help & Support” link.
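Most of these questions can be answered with aggregate counts rather than per-user trails. As a rough sketch, assuming a hypothetical self-hosted log of `(article, action)` pairs where `action` is either `"view"` or `"help_click"` (a click on a “Help & Support” link), the following computes views, help clicks, and the share of views that ended in a help click for each knowledge base article. No user identifiers are involved.

```python
from collections import Counter


def article_stats(events):
    """Summarize (article, action) pairs into {article: (views, help_clicks, help_rate)}.

    `action` is assumed to be "view" or "help_click"; anything else is ignored.
    Articles with help clicks but no recorded views are skipped to avoid dividing by zero.
    """
    views = Counter()
    help_clicks = Counter()
    for article, action in events:
        if action == "view":
            views[article] += 1
        elif action == "help_click":
            help_clicks[article] += 1
    return {
        article: (count, help_clicks[article], help_clicks[article] / count)
        for article, count in views.items()
    }


def most_visited(stats, k=3):
    """Return the k articles with the most views."""
    return sorted(stats, key=lambda article: stats[article][0], reverse=True)[:k]
```

Because each event records only what happened and where, a leaked copy reveals which articles are confusing, not who was confused.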
Things I want to avoid:
- Personally identifiable information (e.g. name, location, email)
- Pseudo-personally identifiable information (e.g. IP addresses, Ethereum addresses)
- “Linkable” information (e.g. connecting IP addresses to Ethereum addresses or ETHAddress1 to ETHAddress2 to ETHAddress3)
- Things where the benefit of collecting the info doesn’t outweigh the risk if the data is compromised by an attacker. (e.g. everything above)
- Things that are solely ego-boosting or fun to have. (e.g. how many wallets are generated, how much money is sent via our site, how much ETH vs tokens were sent via our site)
- Collecting / storing the data via a third-party platform. It should be self-hosted so that nobody else has access to the data.
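One way to enforce parts of this list in code is to scrub events on the server before anything is written to disk. The sketch below is an illustration built on assumptions: it keeps only a hypothetical allowlist of fields (so IP addresses and wallet addresses are simply never stored) and redacts any `0x`-prefixed hex string of 40 or more characters, which covers Ethereum addresses and transaction hashes that might slip into free-text fields. It does not cover what your web server or reverse proxy may already log on its own.

```python
import re

# Hypothetical allowlist: anything not named here (IP, wallet address, user agent...) is dropped.
ALLOWED_FIELDS = ("page", "feature", "action")

# 0x-prefixed hex of 40+ characters: Ethereum addresses (40) and transaction hashes (64).
HEX_ID = re.compile(r"0x[0-9a-fA-F]{40,}")


def scrub_event(raw):
    """Return a copy of `raw` with only allowlisted fields and hex identifiers redacted."""
    clean = {}
    for key in ALLOWED_FIELDS:
        if key not in raw:
            continue
        value = raw[key]
        if isinstance(value, str):
            value = HEX_ID.sub("[redacted]", value)
        clean[key] = value
    return clean
```

An allowlist is used instead of a blocklist so that a new field added to the client later is dropped by default rather than stored by accident.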
As we explore collecting metrics, we encourage discussion with the wider community and our users, and also encourage healthy debate within our team. This ultimately allows us to be more confident in the decisions we make, ensures the proper balance is found, and helps us better serve this community.
What can you do to create a more privacy-minded world?
As a product creator, respect your users and their privacy:
- Treat them as you would want to be treated.
- Be mindful of what you collect and how you collect it.
- Limit how long you store information for.
- Utilize explicit opt-ins.
- Remove dark-patterns.
- Ask yourself, “If I showed my users our analytics dashboard, would they be surprised or horrified by what we collect?”
- Be honest with yourself and others. Acknowledge where you could be better...and strive to be better.
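For the “utilize explicit opt-ins” point, the key design choice is that the default is off. Here is a minimal sketch with hypothetical `ConsentSettings` and `Tracker` names: events are dropped unless the user has explicitly turned analytics on.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentSettings:
    """Hypothetical per-user settings. Analytics is off until the user opts in."""
    analytics: bool = False


@dataclass
class Tracker:
    consent: ConsentSettings
    queue: list = field(default_factory=list)

    def track(self, event):
        """Queue the event only if the user explicitly opted in; otherwise drop it."""
        if not self.consent.analytics:
            return False
        self.queue.append(event)
        return True
```

Dropping events before consent, rather than buffering them, matters: if pre-consent events were queued and sent once the user clicked “Share,” the opt-in would be retroactive rather than explicit.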
Learn from others & have open conversations:
Status recently presented at the Web3 UX Unconference and spoke about how they gather data without exploiting their users. Here is an example of an update that removes dark patterns and ensures users opt in to sharing.
Status double opt-in flow. Note the “Do not share” button is given equal weight as the “Share” button and placed on the right side—the opposite of a dark pattern. 🙇
Even more recently, Status removed it entirely, deciding that, “even if [users] consent, they have no idea how data is collected on them.”
Ledger’s new Ledger Live also has a great UI that lets users choose exactly what they share. Note that analytics is off by default. Again, this is the opposite of a dark pattern and is great to see.
Use (and/or build) self-hosted analytics solutions:
As a user, respect companies that respect you:
You may not know that not tracking users is much harder than tracking them. It makes product decisions harder. It makes validating those decisions harder. It makes it harder to prove that what you are building is worth it—whether that’s to investors, token holders, your peers, or just yourself.
It’s even harder to collect usage metrics responsibly. It requires real resources (time, energy, developers) to create a custom solution, while copying & pasting a Google Analytics snippet takes about five seconds. As a user...
- Be mindful about what companies know and collect about you.
- Be vocal. Let companies and products know that you value your privacy.
- Do not reward companies that can’t or won’t answer questions about what they do with your data.
- Reward companies who collect data responsibly, or not at all.
- Ask questions, especially when these companies are young. What are they doing with your data? How are they storing it safely? Why do they collect it in the first place? It’s too late to influence Facebook’s data collection, as it’s their entire business model. However, it isn’t too late to influence Status, Ledger, Peepeth, Akasha, or us.
Another option is to not give companies your information in the first place:
- Install Ghostery.
- Install an ad blocker (e.g. uBlock Origin).
- Install Privacy Badger.
- Use Incognito Mode.
- Use Brave.
- Use Tor Browser / Tails.
- Use DuckDuckGo.
- Don’t use those handy “sign in via Facebook / Twitter / Google” buttons.
- Reset your advertiser ID often on your Android or Apple device, Facebook, and Twitter.
- Audit & adjust your privacy settings with the same frequency you audit your passwords. Reset your ad IDs, disconnect linked accounts, view your profiles as other people, delete old posts, etc. A good starter guide is here.