There have been hundreds of thousands of ENS-related events on OpenSea, going well beyond their initial collaboration on short ENS names. It is a fun and intriguing ecosystem, revealing echoes of social, economic and psychological goals, from personal names (willie.eth) and big economic ideas (unbank.eth) to less-than-subtle adult themes (*******.eth). I was curious whether I could predict the price of a domain from the domain name itself, using simple out-of-the-box NLP tools.
I used OpenSea’s API to pull down about 150,000 ENS events from the platform. Almost 9,000 of these are successful domain purchases. There are about 50,000 bids in these data, and 50,000 successful transfers. About 5,000 unique addresses are associated with domain purchases. In total, these data appear to represent about half of the ENS domains reported on the OpenSea platform.
Here are some quick observations. Like many economic metrics, the frequency of ownership shows a Pareto-like distribution. In these data, the biggest ENS “whale” owns about 1,700 domains, and the next biggest about 1,000. Most addresses (73%) own only 1 or 2. An interactive version of the plot is available on my site; clicking a point there opens that address’s profile on OpenSea.
Domain ownership by address follows a power-law distribution, a pattern common across many economic, social and physical domains. There are a few whales, and many minnows.
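The ownership tally behind a plot like this can be sketched in a few lines. This is a minimal illustration in Python, not the code used for the post (which was written in R). The function name `ownership_summary`, the input shape (a mapping from domain to owner address) and the toy addresses are all assumptions made for the example.

```python
from collections import Counter

def ownership_summary(owner_by_domain):
    """Count domains per address and summarise how concentrated ownership is.

    owner_by_domain: dict mapping a domain name to its owner's address
    (a hypothetical input shape chosen for this sketch).
    """
    domains_per_owner = Counter(owner_by_domain.values())
    ranked = domains_per_owner.most_common()  # largest holders first
    small_holders = sum(1 for n in domains_per_owner.values() if n <= 2)
    return {
        "owners": len(domains_per_owner),
        "top_holdings": [n for _, n in ranked[:2]],
        "share_with_1_or_2": small_holders / len(domains_per_owner),
    }

# Toy data only; these are not real addresses or real holdings.
toy = {
    "a.eth": "0xwhale", "b.eth": "0xwhale", "c.eth": "0xwhale", "d.eth": "0xwhale",
    "e.eth": "0xmid", "f.eth": "0xmid", "g.eth": "0xmid",
    "h.eth": "0x1", "i.eth": "0x2", "j.eth": "0x2",
}
summary = ownership_summary(toy)
print(summary)
```

Sorting the per-address counts from largest to smallest and plotting them on log-log axes is the usual quick check for a Pareto-like shape.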
Perusing the domains reveals interesting patterns, as reported earlier by Makoto Inoue at ENS. There are homoglyph attacks in waiting, such as medicarе.eth, and a quick pick up of niсk.eth to match Nick Johnson’s nick.eth. The longest domain in these data is:
Many domains in other languages are present: French, Spanish, Marathi, Chinese, and more. My favorite Chinese domain is 毛主席语录.eth, which Google translates to “Chairman Mao’s quotes.”
It’s fun just to wander ENS domains and look over basic statistics. But the goal in this post is to show that ENS is now at a scale allowing prediction of market value using features of the product itself. These features can be extracted even from the single string of characters making up a domain. Features might include: How frequent is a domain’s word? How semantically positive or negative is it? Does it have a mixed character set? Does it mix alphabetic characters and numeric digits?
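A few of these string-level features can be sketched with the Python standard library alone. This is an illustrative sketch, not the feature code used for the post (which was written in R). The names `scripts_in` and `domain_features` are made up for the example, and taking the first word of a character's Unicode name as its script is a rough approximation. Word frequency and sentiment need an external corpus or lexicon, so they are left out.

```python
import math
import unicodedata

def scripts_in(label):
    """Approximate each letter's script by the first word of its Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. 'LATIN SMALL LETTER A' -> 'LATIN'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def domain_features(domain):
    """Extract simple features from a domain such as 'nick.eth'."""
    label = domain[:-4] if domain.endswith(".eth") else domain
    if not label:
        raise ValueError("empty domain label")
    has_alpha = any(ch.isalpha() for ch in label)
    has_digit = any(ch.isdigit() for ch in label)
    return {
        "length": len(label),
        "log_length": math.log(len(label)),
        # More than one script in a label is a hint of a possible homoglyph.
        "mixed_script": len(scripts_in(label)) > 1,
        "alphanumeric_mix": has_alpha and has_digit,
    }

# "\u0441" is CYRILLIC SMALL LETTER ES, which looks like a Latin "c".
print(domain_features("nick.eth"))
print(domain_features("ni\u0441k.eth"))
print(domain_features("abc123.eth"))
```

The mixed-script flag doubles as a crude homoglyph detector: a label that mixes Latin and Cyrillic letters is flagged, while an all-Latin or all-Chinese label is not.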
I coded up a multiple regression model on successful OpenSea purchases for which I had WETH pricing (about 8,100 domains). I built a variety of features with NLP tools and some other wrangling in R: length, word frequency, sentiment analysis, and character and alphanumeric mixtures.
The log of WETH value can be predicted quite well with a handful of such features. The model accounts for over 20% of the variance in ENS domain prices. See the figure below, plotting predicted WETH by observed WETH. The best features, in order of strength and also of intuitiveness, are: log(length of domain), log(Google word frequency), positive sentiment, and trust sentiment.
Predicted by observed WETH (log transformed). Basic regression with simple features can account for over 20% of the price variability.
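To make "variance accounted for" concrete, here is a minimal sketch of a multiple regression on log price, with R² computed by hand. It is written in plain Python for illustration and is not the model from the post, which was fit in R on about 8,100 purchases. The function names, the two predictors (log length and log word frequency) and all the numbers in the toy data are assumptions.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, beta):
    """Share of variance in y explained by the fitted linear model."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy data only: (log length, log word frequency) -> log WETH price.
rows = [(1.1, 9.0), (1.4, 7.5), (1.6, 8.2), (1.8, 5.0), (2.1, 6.1), (2.3, 3.9)]
log_price = [0.2, -0.5, -0.4, -1.6, -1.5, -2.4]

beta = fit_ols(rows, log_price)
r2 = r_squared(rows, log_price, beta)
print("coefficients:", beta)
print("R^2:", r2)
```

An R² just above 0.2 means the fitted line explains a bit more than a fifth of the spread in log prices. Exponentiating a predicted log value gives the predicted price in WETH.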
There are now enough data in ENS to plug in a domain name and get a decent prediction of what its market value may be. Obviously length and familiarity are the greatest factors, but subtler NLP variables, like sentiment, can help with accuracy.
In an interesting way, ENS is very quickly approximating economic dynamics in other domain services (though you can do a lot more with ENS!). This is revealed in various patterns of economic strategy (squatting, homoglyphs, etc.), along with the “at-scale” product-market predictability. With an understanding of such dynamics, ENS implements aggressive anti-squatting policies in the registration service. Perusing and aggregating over the countless thousands of events on OpenSea, you get the sense this was a very good move.