
Wikifunctions:Type proposals/Alphabet


This would be an ordered list of Code point (Z86) objects associated with one Natural language (Z60). A language may have multiple alphabets associated with it for different purposes.

Uses

Sorting

The most obvious use case would be language-aware sorting, as even Latin-based alphabets disagree on the order of letters.
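As a rough illustration of how such a type could drive sorting, the sketch below treats an alphabet as an ordered list of integer code points and builds a sort key from it. The names `make_sort_key`, `SWEDISH` and `DANISH` are hypothetical and not part of any existing Wikifunctions function; the alphabets cover lowercase letters only, and characters outside the alphabet are simply sorted last.

```python
from typing import Callable, List, Sequence, Tuple


def make_sort_key(alphabet: Sequence[int]) -> Callable[[str], List[Tuple[int, int]]]:
    """Build a sort key from an ordered list of code points (hypothetical helper)."""
    rank = {code_point: position for position, code_point in enumerate(alphabet)}
    unknown = len(alphabet)  # anything outside the alphabet sorts after it

    def key(word: str) -> List[Tuple[int, int]]:
        # Compare by position in the alphabet; fall back to the raw code point
        # so that characters outside the alphabet still get a stable order.
        return [(rank.get(ord(ch), unknown), ord(ch)) for ch in word.lower()]

    return key


# Assumed lowercase alphabets, written as ordered lists of code points.
SWEDISH = [ord(c) for c in "abcdefghijklmnopqrstuvwxyzåäö"]
DANISH = [ord(c) for c in "abcdefghijklmnopqrstuvwxyzæøå"]

swedish_words = ["öl", "zebra", "ärlig", "åka"]
danish_words = ["ål", "øl", "æble"]

by_code_point = sorted(swedish_words)
by_swedish = sorted(swedish_words, key=make_sort_key(SWEDISH))
by_danish = sorted(danish_words, key=make_sort_key(DANISH))
```

Plain code point order puts ä before å, whereas the Swedish alphabet puts å first; the same helper with the Danish list places å after ø.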

Language-dependent string evaluation/manipulation

This covers miscellaneous cases where an alphabet is passed as one argument to a function. Some existing functions where this could be useful:

Comments

This still leaves some sorting-related issues unresolved, like the handling of foreign orthography. In Swedish, the Danish Ø and ø are treated like the native Ö and ö in sorting, like in this Wikipedia category. But those could be handled using language-specific replacement maps, since an alphabet passed to the function would indicate which natural language to use. --Autom (talk) 01:33, 30 March 2024 (UTC)
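A minimal sketch of the replacement-map idea, under the assumption that the map is a simple per-language lookup applied before ranking. `SWEDISH_ALPHABET`, `SWEDISH_REPLACEMENTS` and `swedish_sort_key` are hypothetical names, and the map contains only the Ø/ø case mentioned above.

```python
from typing import Dict, List, Sequence

# Assumed lowercase Swedish alphabet as an ordered list of code points.
SWEDISH_ALPHABET = [ord(c) for c in "abcdefghijklmnopqrstuvwxyzåäö"]

# Hypothetical replacement map: foreign letters folded onto native ones.
SWEDISH_REPLACEMENTS: Dict[str, str] = {"ø": "ö"}


def swedish_sort_key(
    word: str,
    alphabet: Sequence[int] = SWEDISH_ALPHABET,
    replacements: Dict[str, str] = SWEDISH_REPLACEMENTS,
) -> List[int]:
    """Rank each character by alphabet position after applying the replacement map."""
    rank = {code_point: position for position, code_point in enumerate(alphabet)}
    unknown = len(alphabet)  # characters outside the alphabet sort last
    normalized = "".join(replacements.get(ch, ch) for ch in word.lower())
    return [rank.get(ord(ch), unknown) for ch in normalized]


names = ["Överkalix", "Østerbro", "Örebro"]
with_map = sorted(names, key=swedish_sort_key)
without_map = sorted(names, key=lambda w: swedish_sort_key(w, replacements={}))
```

With the map, Østerbro sorts between Örebro and Överkalix as if it were spelled with Ö; without it, Ø falls outside the alphabet and the name sorts last.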

Do you think this should be a String, rather than an (ordered) list of code points? Jdforrester (WMF) (talk) 18:17, 1 April 2024 (UTC)
@Jdforrester (WMF): I wrote it like that because some languages treat double letters differently for sorting (like how Aa is sorted under Å in w:da:Kategori:Købstæder). Using single code points would be more elegant and intuitive, but a small string can do all the same things and more. --Autom (talk) 11:42, 10 May 2024 (UTC)
Sorry, this wouldn't solve my example as they are treated as equivalent. The Dutch Ij is already in its alphabetical position, but I'm certain there are other exceptions I haven't thought about. I have only limited knowledge of European languages, after all.
You have me convinced that it might be best to use code points and solve edge cases on a per-language basis instead. --Autom (talk) 11:51, 10 May 2024 (UTC)