Research
March 8, 2024

Announcing Sybil Detection

Using machine learning to catch sybils on blockchains at scale

Andrew Van Aken
Data Science

Introducing the Sybil Filter

A question we often hear at Artemis is “How engaged are blockchain users?” Bridges and new chains can have tens of thousands of users and hundreds of millions of dollars in TVL. However, because of the incentives in crypto, this activity is often inflated by “sybils.” For those unfamiliar, sybiling means a user creates as many different addresses as possible in the hope that each address will receive an airdrop. Users are hugely incentivized to do this, given that airdrops can be worth up to five figures for simply making a few transactions, and “sybil farms” have grown in complexity.

Our goal was to develop a scalable, real-time way to detect sybil addresses. While teams have done sybil detection manually, we wanted to lay the groundwork for large-scale sybil detection to give teams a starting point and help analysts better understand blockchain usage.

Methodology

Given the size and scope of the project, we turned to machine learning to produce a probability score that each address is a sybil, across many chains. Since a cleanly labeled sybil/non-sybil dataset does not exist, we had to get creative. We pooled previous sybil removal campaigns (Arbitrum, Hop, etc.) into a dataset: recipients of each airdrop were labeled non-sybil, while excluded addresses were labeled sybil. We built a feature set including transactions, distinct interacting addresses, wallet balance, etc., measured over the period before the airdrop was announced, to see if a model could learn patterns that predict sybil probability.
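The labeling and scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the addresses and feature values are made up, the log scaling is our choice, and logistic regression stands in for whatever model is used in production, which this post does not specify.

```python
import math

def label_addresses(recipients, excluded):
    """Airdrop recipients get label 0 (non-sybil); excluded addresses get 1 (sybil)."""
    labels = {addr: 0 for addr in recipients}
    labels.update({addr: 1 for addr in excluded})
    return labels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, targets, epochs=2000, lr=0.1):
    """Fit a logistic regression with plain batch gradient descent."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, targets):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def sybil_probability(x, weights, bias):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

# Hypothetical pre-announcement features per address:
# (transaction count, distinct interacting addresses, wallet balance)
raw_features = {
    "0xaaa": (120, 45, 3.2),
    "0xbbb": (80, 30, 1.5),
    "0xccc": (4, 2, 0.01),
    "0xddd": (3, 1, 0.02),
}
labels = label_addresses(recipients=["0xaaa", "0xbbb"], excluded=["0xccc", "0xddd"])

# Log-scale the features so large counts do not dominate the gradient (an assumption).
scaled = {a: tuple(math.log1p(v) for v in f) for a, f in raw_features.items()}
addresses = sorted(scaled)
weights, bias = train_logistic([scaled[a] for a in addresses],
                               [labels[a] for a in addresses])
probs = {a: sybil_probability(scaled[a], weights, bias) for a in addresses}
```

In this toy data the excluded addresses have little activity, so the model learns to give low-activity addresses a higher sybil probability. The same property is behind the caveat about low-engagement users that follows.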

We acknowledge that the term “sybil” can mean many things, and coming up with the right term is a challenge. Some of the excluded addresses might not have been sybils but just low-engagement users (only a few transactions). Since our end goal is to measure engaged users, the definition can fluctuate.

How can I use Sybil Filters on Artemis?

The results of scoring all addresses are now available in the Artemis Terminal and can be viewed on a chain or application level.

To look at sybil vs non-sybil users for chains, you can go to the following products:

  • Chains Page → Manage Charts → Select “Sybil Users” and “Non Sybil Users”

  • Activity Monitor → Select Chain of Interest → Click “Breakdown” → Select “By Breakdown” → Select “By Sybil”

  • Chart Builder → Select Chain of Interest → Search “Sybil” in the search box → Select “Sybil Users” and “Non Sybil Users”

To look at sybil activity for applications, you can go to our Activity Monitor → Select Application of Interest → Click “Breakdown” → Select “By Sybil” → Click “% Share”

Insights from Sybil Filters

Diving into the data, it’s no surprise that blue chip DeFi protocols have high organic usage. It is typically expensive to transact on Layer 1, and addresses can’t extract value from using protocols with established tokens. As expected, the sybil rate climbs as the chance of receiving a token increases. The Stargate and zkSync bridges have higher sybil rates because users of these protocols are trying to farm a token.
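To make “sybil rate” concrete, here is a minimal sketch of how a per-protocol rate could be derived from address-level scores. The 0.5 cutoff, the protocol names, the sample data, and the choice to treat unscored addresses as non-sybil are all assumptions for illustration, not a description of how the Artemis Terminal computes its numbers.

```python
from collections import defaultdict

SYBIL_THRESHOLD = 0.5  # hypothetical cutoff on the model's probability score

def sybil_rate_by_protocol(interactions, scores, threshold=SYBIL_THRESHOLD):
    """Return the share of each protocol's distinct users flagged as sybil."""
    users = defaultdict(set)
    for address, protocol in interactions:
        users[protocol].add(address)
    rates = {}
    for protocol, addrs in users.items():
        # Addresses without a score are treated as non-sybil (an assumption).
        flagged = sum(1 for a in addrs if scores.get(a, 0.0) >= threshold)
        rates[protocol] = flagged / len(addrs)
    return rates

# Made-up (address, protocol) interactions and sybil probability scores.
interactions = [
    ("0xa1", "blue_chip_dex"), ("0xa2", "blue_chip_dex"),
    ("0xa3", "blue_chip_dex"), ("0xa4", "blue_chip_dex"),
    ("0xa3", "pre_token_bridge"), ("0xb1", "pre_token_bridge"),
    ("0xb2", "pre_token_bridge"), ("0xb3", "pre_token_bridge"),
]
scores = {"0xa1": 0.05, "0xa2": 0.10, "0xa3": 0.20, "0xa4": 0.70,
          "0xb1": 0.90, "0xb2": 0.85, "0xb3": 0.60}

rates = sybil_rate_by_protocol(interactions, scores)
# rates == {"blue_chip_dex": 0.25, "pre_token_bridge": 0.75}
```

Counting distinct addresses rather than transactions keeps one very active address from dominating the rate. A transaction-weighted version would answer a different question.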

We can also look at sybil activity by category on different blockchains. On Arbitrum, gaming has a very low sybil rate, indicating that people are actually playing the games instead of farming them. EOA transfers and CeFi tend to rank higher on sybil identification because addresses often send funds to many other addresses, or deposit into exchanges, to break their sybil graph.

Sybil patterns can also change drastically over time. While StarkNet had a large number of sybil users who were token farming, the % of sybil users started to decline once the token launched, indicating organic usage of the bridge.

StarkNet Ethereum Bridge Sybil Breakdown Over Time

This tool can also be used to evaluate how addresses are using points programs. HyperLiquid had spikes in usage that were sybil-driven, but now we see engaged addresses using the protocol once the points program launched. This indicates HyperLiquid has built a great product and users will stick around even after the points program ends.

Sybil vs Non Sybil Usage HyperLiquid

Conclusion

By introducing these features, we hope to give insights into how addresses are interacting with protocols through a new lens. By no means do we consider our job done. As with fraud detection, once an ML algorithm spots one type of fraud, fraudsters move on to their next scheme. But we believe providing estimates can motivate the industry to collaborate on sybil detection methods and compare techniques. While it is a cat-and-mouse game, it's one that's worth playing to push the usage and understanding of crypto forward.
