Rest of World turned to W3Techs, a web-scanning firm based in Austria, to count all of the publicly accessible web addresses on the internet and get hard numbers on the languages websites use.
The survey shows a clear discrepancy. A little more than half the sites on the web use English as their primary language. That’s a lot more than one might expect, given that native English speakers make up just under 5% of the global population. Meanwhile, Chinese and Hindi are the second and third most-spoken languages in the world, but the same scan found they account for just 1.4% and 0.07% of domains, respectively.
There are some obvious limitations to this survey, detailed in the article, but for some groups this is an ominous sign for the future. In 2003, UNESCO issued a recommendation on the promotion and use of multilingualism in cyberspace. Moreover, since the internet is used to train large language models like GPT-4 and Bard, we might be building the same imbalance into the future of AI.