5 Comments

Interesting insights, Antoine. How do you think the development of NLP applications will influence the "internet languages"? I assume countries whose language is not widely used and whose national market is small will either rely on language-agnostic apps or adapt to the dominant market language.

Interesting question. My guess: about 7,000 languages are spoken in the world. "Roughly 40% of languages are now endangered, often with less than 1,000 speakers remaining." (Ethnologue) These languages will disappear before 2050. Another 50% of languages have between 1,000 and 1,000,000 speakers. They are too small, have no literature, and often lack official recognition, so they will also slowly disappear by the end of the century as speakers switch to bigger languages (French, English, Indonesian, etc.).

The remaining 10% (~700 languages) are each spoken by more than 1 million people, and together they account for more than 90% of the world population. They often have official status in at least one country or region. I don't know how things will play out. As you say, NLP can help these survive: you could automatically translate a lot of English content into these languages (see for instance: https://www.adssx.com/p/abstract-wikipedia-towards-a-multilingual ). Live translation could also remove the need to learn another language. But with globalization and international migration, the pressure may be too great and they may have to adapt. Some languages could survive only as liturgical languages or for tradition and community.
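As a quick sanity check on the split above, here is a minimal Python sketch that turns the quoted percentages into approximate language counts. The total of 7,000 and the 40/50/10 shares are taken from this comment; the band labels and the `language_counts` helper are only illustrative names.

```python
# Rough split of the world's languages, using the figures quoted in the comment.
TOTAL_LANGUAGES = 7_000  # approximate total, as stated above

# Shares in percent of all languages (assumed to be exact for this sketch).
SHARES_PERCENT = {
    "endangered": 40,
    "1,000 to 1,000,000 speakers": 50,
    "more than 1,000,000 speakers": 10,
}


def language_counts(total, shares_percent):
    """Convert percentage shares into approximate language counts."""
    return {band: total * pct // 100 for band, pct in shares_percent.items()}


counts = language_counts(TOTAL_LANGUAGES, SHARES_PERCENT)
for band, n in counts.items():
    print(f"{band}: ~{n:,} languages")
```

The last band comes out at the ~700 languages mentioned above, so the estimate of about 500 living languages by the end of the century assumes that some of these larger languages also lose their critical mass.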

Also, we rarely talk about the emergence of new languages. Latin unified Europe but then gave birth to French, Spanish, Catalan, Italian, Romanian, etc. What if English varieties diverge enough to become separate languages? Singlish, "Euro English", African-American Vernacular English, and "BBC English" are probably already not mutually intelligible. Tok Pisin, an English-based pidgin, became an official language of Papua New Guinea (together with English). Haitian Creole, a French-based creole, is now an official language of Haiti (together with French). Similarly, 525 languages are spoken in Nigeria, most of them with fewer than 1 million speakers. The most spoken language is Nigerian Pidgin, which doesn't have official status yet. I wonder if all the other languages of Nigeria could disappear and Nigerian Pidgin become the national language.

tl;dr: By the end of the century, there will be about 500 living languages with a critical mass of speakers, helped by NLP to translate content. Many of them could be English- or French-based pidgins.

What do you think?

When I look at the editor statistics, I feel that there might be a "first phase", in which a lot of the articles you would expect in an encyclopedia are missing, and a "second phase", in which every such article already exists and most of the editing is either (1) improving these articles or (2) writing articles on specialized topics that would never have been covered in another encyclopedia (like articles on individual movies, towns and so on). In this "second phase" there is therefore less editing activity.

The German peak we see could then date from when the German Wikipedia was in its first phase. It completed that phase so quickly that it soon reached its "second phase", in which there is less editing than in Spanish, for instance (maybe only in relative terms but not in absolute terms?), which still has to finish its first phase. It seems that the Japanese Wikipedia follows the same pattern as the German one (probably because Japan and Germany were the biggest economies in 2005, along with the US). But French and Spanish seem to have been slower, so maybe their first phase is spread over many years.

Yes, there are different phases. You can see it here with the decline of active editors on the English WP: https://stats.wikimedia.org/#/en.wikipedia.org/contributing/active-editors/normal|line|all|(page_type)~content*non-content|monthly

The German Wikipedia was created right after the English one and experienced a similar decline, so I don't know what can explain the difference in the graph in the article: https://stats.wikimedia.org/#/de.wikipedia.org/contributing/active-editors/normal|line|all|(page_type)~content*non-content|monthly

And yes the situation is quite different for French, Spanish, and Japanese. Here's French: https://stats.wikimedia.org/#/fr.wikipedia.org/contributing/active-editors/normal|line|all|(page_type)~content*non-content|monthly

French is actually still growing. My first guess was that French was growing in Africa and that this growth was compensating for the decline in France/Belgium/Canada/Switzerland/Luxembourg. However, since 2015, page views from developed French-speaking countries have continued to rise among readers of the French Wikipedia, while those from Africa haven't significantly increased: https://commons.wikimedia.org/wiki/File:Wikipedia_fr_-_Page_views_by_country_over_time.png

Another explanation is the one you gave: maybe France, Spain, etc. were "late" to the internet (you can see that in e-commerce as well, where the US and the UK are far ahead of continental Europe) and they're still in their "first phase".

Anyway, if you want to see real growth, look at Farsi: https://stats.wikimedia.org/#/fa.wikipedia.org/contributing/active-editors/normal|line|all|(page_type)~content*non-content|monthly

Farsi will soon overtake Italian: https://stats.wikimedia.org/#/it.wikipedia.org/contributing/active-editors/normal|line|all|(page_type)~content*non-content|monthly

It's better to look at all editors and not only "active editors" (5 or more edits per month): https://stats.wikimedia.org/#/de.wikipedia.org/contributing/editors/normal|line|all|~total|monthly

It seems that
