Wikifunctions:Type proposals/Typed string

Summary

A Typed string is a String (Z6) whose value is a member of a specific subset of strings. There is only one persistent object of Type (Z4) for all the distinct subsets of strings that we support. Each subset determines the string’s subtype. Each subtype has different validation but within functions and their implementations all typed string values are handled as strings.

This is a Generic type, so a specific subtype becomes a Z4/type when the Z896/Typed string function is evaluated. Z896 is a new function.

Suggested ZID

Z96

This value is suggested (and assumed here) because the type is a Z6/string with a Z9/reference and the Z6n range is rather crowded. (Arguably, and in practice, it should be “Z896”. This will change later.)

Typed strings distinguished by Natural language (Z60) may be a special case. That is, there might be a case to be made for language-specific subtypes with four additional keys (for value, Z9/reference, Z8/function and Z60/natural language), but that is beyond the scope of this proposal (for the time being).

Uses

Typed strings are used for functions that are designed to handle only particular types of string, such as those using specific alphabets, particular types of word (e.g. lemmas or infinitives), or standard codes like SI units or IUPAC chemical element symbols.

The use of a Reference (Z9) in the definition of a subtype allows the function to be unambiguous about which subset of strings it is appropriate for. If changes are made to the referenced subset, affected functions can be identified, modified if necessary and tested.

The string type reference would be passed to an Implementation, which might use its value to control the evaluation, returning values using different alphabets or in upper or lower case, for example. The string type reference indicates that the string value could or should be a member of the relevant subset. The strength of the indication depends on the validation provided by the subtype function.
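As a rough illustration of an implementation using the string type reference to control evaluation, the following Python sketch changes letter case according to the reference it receives. The function name `render_letter` and the lookup table are hypothetical; only uppercase letter (Q98912) is taken from this page, and any other reference leaves the value unchanged.

```python
# Hypothetical mapping from a string type reference to a transformation.
# Only Q98912 (uppercase letter) comes from this page.
CASE_BY_REFERENCE = {"Q98912": str.upper}


def render_letter(letter: str, type_reference: str) -> str:
    """Return the letter in the form indicated by the string type reference."""
    transform = CASE_BY_REFERENCE.get(type_reference, lambda s: s)
    return transform(letter)
```

An unrecognised reference falls through to the identity transformation, which reflects the idea that the reference is an indication rather than a guarantee.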

There are already functions on Wikifunctions that are only correct or meaningful for a subset of all possible strings.

List of example string functions

| Function | Input type | Language | Evaluation type |
| --- | --- | --- | --- |
| Brahui (Perso-Arabic) nominative case plural (Z12082) | nominative singular | Brahui (Z1293) | nominative plural |
| English plural (Z11089) | singular | English (Z1002) | plural |
| is uppercase (Z10336) | uppercase (?) | | Boolean (Z40) |
| is a chemical element symbol (Z11854) | element symbol (P246)(?) | | Boolean (Z40) |
| is pangram (Latin alphabet) (Z12626) | Latin-script alphabet (Q29575627) string | | Boolean (Z40) |
| Decode NATO phonetic alphabet code (Z10970) | NATO phonetic codewords | | Latin alphabet string |
| [next] | input | Natural language (Z60) | output |

Structure

Typed strings are objects with three additional keys:

  1. the string value (a String (Z6));
  2. a Z9/reference to a persistent Wikifunctions object or to a Wikidata item;
  3. a Z9/reference to a Function (Z8) that provides the validation.
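For readers who think in code, the three keys can be modelled as a small Python record. This is only a sketch: the class name `TypedString` and its field names are hypothetical, and the ZIDs in the usage line are those used in the examples on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedString:
    """Hypothetical in-memory model of a Typed string (names are illustrative)."""

    value: str             # K1: the string value (Z6)
    type_reference: str    # K2: Z9/reference to a Wikifunctions object or Wikidata entity
    subtype_function: str  # K3: Z9/reference to the validating Z8/function


# An uppercase "O": uppercase letter (Q98912), validated by is uppercase (Z10336).
uppercase_o = TypedString("O", "Q98912", "Z10336")
```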

Generic type function call

In practice, it is likely that users would supply typed string values using a constructor function. This avoids the need to supply the string type reference along with the typed string value. The constructor function or the user (particularly in new or complex scenarios) needs to provide values for the object’s three additional keys.

Z896/typed string(Z6/string, Z9/reference, Z8/function (reference))

An uppercase “O”

{
 "type": {
   "type": "function call",
   "function": "typed string",
   "typed string value": "string",
   "string type reference": "reference",
   "subtype function": "function"
 },
 "string": "O", 
 "reference": "uppercase letter",
 "function": "is uppercase"
}
{
 "Z1K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z896",
   "Z896K1": "Z6",
   "Z896K2": "Z9",
   "Z896K3": "Z8"
 },
 "K1": "O",
 "K2": "Q98912",
 "K3": "Z10336"
}

Evaluated Z4/type for an uppercase character

{
 "type": "type",
 "identity": {
   "type": "function call",
   "function": "typed string",
   "typed string value": "string",
   "string type reference": "uppercase letter",
   "subtype function": "is uppercase"
 },
 "keys": [
   "key",
   {
     "type": "key",
     "id": "K1",
     "value type": "string"
   },
   {
     "type": "key",
     "id": "K2",
     "value type": "reference"
   },
   {
     "type": "key",
     "id": "K3",
     "value type": "function"
   }
 ],
 "validator": "validate typed string"
}
{
 "Z1K1": "Z4",
 "Z4K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z896",
   "Z896K1": "Z6",
   "Z896K2": "Q98912",
   "Z896K3": "Z10336"
 },
 "Z4K2": [
   "Z3",
   {
     "Z1K1": "Z3",
     "Z3K2": "K1",
     "Z3K1": "Z6"
   },
   {
     "Z1K1": "Z3",
     "Z3K2": "K2",
     "Z3K1": "Z9"
   },
   {
     "Z1K1": "Z3",
     "Z3K2": "K3",
     "Z3K1": "Z10336" //not "Z8"??
   }
 ],
 "Z4K3": "Z196"
}
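The identity of the evaluated type is the Z896 function call with its arguments filled in. A minimal Python sketch of building that identity follows. The helper name `typed_string_identity` is hypothetical, and `Z896K3` is assumed to be the intended key for the subtype function.

```python
def typed_string_identity(type_reference: str, subtype_function: str) -> dict:
    """Build the Z7/function call that identifies an evaluated Typed string type."""
    return {
        "Z1K1": "Z7",
        "Z7K1": "Z896",
        "Z896K1": "Z6",
        "Z896K2": type_reference,
        "Z896K3": subtype_function,  # assumed key for the subtype function
    }


uppercase_letter = typed_string_identity("Q98912", "Z10336")
element_symbol = typed_string_identity("P246", "Z11854")
```

Two subtypes are then distinct types exactly when their identities differ, as `uppercase_letter` and `element_symbol` do here.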

Note that the string type reference has the valid form for a Z9/reference but there may be no corresponding Persistent object (Z2) (depending on the interaction with Wikidata).

The referenced function will not be a validator but it has the same purpose, providing the conditions that must apply for a value to be a member of the specific subset (rather than those that apply to a valid object with the persistent Z4/type). Unlike a Validator, a subtype function returns no list of errors; if the value is not valid, the subtype function simply returns Boolean false (this may change in the future) and the Validator is responsible for handling the error (or, in the future, errors).
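The division of labour between subtype function and validator might look like the following Python sketch. Both function names are hypothetical, `str.isupper` is only a stand-in for is uppercase (Z10336), and the shape of the error object (in particular the `message` key) is an assumption made for illustration.

```python
def is_uppercase(value: str) -> bool:
    """Stand-in for a subtype function such as is uppercase (Z10336)."""
    return value.isupper()


def check_subtype(value: str, subtype_function) -> list:
    """Turn the subtype function's Boolean into a list of errors.

    Returns an empty list when the value is accepted, or a single generic
    error (hypothetical shape) when the subtype function returns False.
    """
    if subtype_function(value):
        return []
    return [{"Z1K1": "Z5", "message": f"{value!r} rejected by subtype function"}]
```

The subtype function itself stays a plain predicate, while everything to do with error reporting lives in the validator.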

Example values

Value O (oxygen (Q629)) element symbol (P246)

{
 "type": {
   "type": "function call",
   "function": "typed string",
   "typed string value": "string",
   "string type reference": "reference",
   "subtype function": "function"
 },
  "typed string value": "O",
  "string type reference": "element symbol",
  "subtype function": "is a chemical element symbol"
}
{
 "Z1K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z896",
   "Z896K1": "Z6",
   "Z896K2": "Z9",
   "Z896K3": "Z8"
 },
  "Z96K1": "O",
  "Z96K2": "P246",
  "Z96K3": "Z11854"
}

This asserts that “O” is a chemical element symbol (not that oxygen is an element).

Value O uppercase letter (Q98912)

An uppercase “O”

{
 "type": {
   "type": "function call",
   "function": "typed string",
   "typed string value": "string",
   "string type reference": "reference",
   "subtype function": "function"
 },
 "string": "O", 
 "reference": "uppercase letter",
 "function": "is uppercase"
}
{
 "Z1K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z896",
   "Z896K1": "Z6",
   "Z896K2": "Z9",
   "Z896K3": "Z8"
 },
 "K1": "O",
 "K2": "Q98912",
 "K3": "Z10336"
}

Value O English lexeme for the letter O

{
 "type": {
   "type": "function call",
   "function": "typed string",
   "typed string value": "string",
   "string type reference": "reference",
   "subtype function": "function"
 },
  "typed string value": "O",
  "string type reference": "L20831",
  "subtype function": "is English for letter"
}
{
 "Z1K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z896",
   "Z896K1": "Z6",
   "Z896K2": "Z9",
   "Z896K3": "Z8"
 },
  "Z96K1": "O",
  "Z96K2": "L20831",
  "Z96K3": "Z?????"
}

Uppercase “O” is one of two forms in lexeme L20831, the other being lowercase “o”.

Note: we are not replicating Wikidata here, so it doesn’t much matter, for function implementation or evaluation, whether we refer to an item, a property or a lexeme. The community will evolve best practices, such as when to prefer an item over a lexeme or vice versa.

Persistent objects

A subtype function may (in the future?) refer to a Typed list containing all the permitted values for a given subtype (i.e. zero or one list for each subtype function). Depending on the Reference, these lists may be maintained by the Wikifunctions community, by WMF staff (e.g. Z60/natural language), or be extracted from Wikidata (?).

Validator

The validator ensures that:

  • the typed string value is a valid String (Z6)
  • the string type reference is a valid Reference (Z9)
  • a referenced Wikifunctions Persistent object (Z2) exists
  • the subtype function has a Boolean return type (or (later?) returns a list of errors). A Boolean False becomes a generic Z5/error in the validator.

The subtype function ensures that:

  • The format of the typed string value is correct
  • The value exists (if there is an enumeration)
… Any interaction with Wikidata is currently undefined, but if the Z9/Reference is a Wikidata item, the domain may be specified using SPARQL

The subtype function provides validation specific to the subtype and additional to that in the Typed string’s Type validator. At this time, the actual validation of a specific subtype must be specified as a community function. In the future it might be specified as a SPARQL query or as a persistent Typed list. (Initially, a Typed list can be specified as a function call, the evaluation of which returns the Typed list.)
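The type-level checks listed for the validator can be sketched in Python as below. This is an illustration under several assumptions: the two dictionaries are hypothetical stand-ins for lookups against Wikifunctions, the regular expression is an assumed shape for identifiers, and only references beginning with "Z" are checked for existence because interaction with Wikidata is undefined.

```python
import re

# Hypothetical registries standing in for Wikifunctions lookups.
PERSISTENT_OBJECTS = {"Z6", "Z10336", "Z11854"}
RETURN_TYPES = {"Z10336": "Z40", "Z11854": "Z40"}

# Assumed shape of an identifier: one capital letter followed by a positive number.
ID_PATTERN = re.compile(r"^[A-Z][1-9][0-9]*$")


def validate_typed_string(value, type_reference, subtype_function) -> list:
    """Return a list of problems found by the type-level checks."""
    problems = []
    if not isinstance(value, str):
        problems.append("value is not a string")
    if not (isinstance(type_reference, str) and ID_PATTERN.match(type_reference)):
        problems.append("string type reference is not a valid reference")
    elif type_reference.startswith("Z") and type_reference not in PERSISTENT_OBJECTS:
        problems.append("referenced persistent object does not exist")
    if RETURN_TYPES.get(subtype_function) != "Z40":
        problems.append("subtype function does not return a Boolean")
    return problems
```

Calling the subtype function on the value would follow these checks; it is left out here so that the sketch covers only the validation common to all subtypes.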

The generic type returned by the Z896/typed string function call includes a reference to the type’s validator. This validator needs to include the validation specified in the subtype function. (For clarity, safety and convenience, the subtype function is currently included in the generated type; see “Evaluated Z4/type for an uppercase character” above.)

The function’s Z2K1/id is the value in the Z96K3. This means that the required validation is immediately available and apparent, and clearly seen as definitional for the subtype.

Identity and equivalence

Identity

Two Typed string objects are the same (“identical”) if they have the same string type reference and subtype function, as well as the same typed string value. This is “Typed string identity”. This means, for example, that Typed string (“O”, element symbol, is uppercase) is not the same as Typed string (“O”, uppercase letter, is uppercase). If all pairs of values differ, the Typed strings are fully distinct.

Where only one or two pairs of values match, different types of similarity apply.

Where only the typed string values match, there is mere string equality (Z866). String equality can also exist between Typed strings and other types of string, notably Z6/string and Z9/reference.

When the string type reference or the subtype function matches, or both match, interpretation will depend on the context. A new partial match function will compare two Typed strings and return three Boolean values, one for each pair of keys, according to whether the values match.
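A partial match function of this kind could be sketched in Python as follows. The names `PartialMatch`, `partial_match` and `identical` are hypothetical, and Typed strings are represented simply as (value, string type reference, subtype function) triples.

```python
from typing import NamedTuple


class PartialMatch(NamedTuple):
    """One Boolean per pair of keys (hypothetical result shape)."""

    value: bool
    type_reference: bool
    subtype_function: bool


def partial_match(a: tuple, b: tuple) -> PartialMatch:
    """Compare two (value, type reference, subtype function) triples key by key."""
    return PartialMatch(a[0] == b[0], a[1] == b[1], a[2] == b[2])


def identical(a: tuple, b: tuple) -> bool:
    """Typed string identity: all three pairs of values match."""
    return all(partial_match(a, b))
```

For the example given under Identity, comparing ("O", "P246", "Z10336") with ("O", "Q98912", "Z10336") yields a match on the value and the subtype function but not on the string type reference, so the two are not identical.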

Equivalence

Because a typed string reference can be to a Z4/type, a Typed string may be an alternative representation for some types of object. This means that constructor functions can safely convert such alternative representations into an equivalent, regular Wikifunctions object, and vice versa. If a regular object converted into a Typed string is identical to another Typed string, the unconverted object and the Typed string are equivalent. Similarly, if a Typed string converted into a regular object is identical to some other object of the same type, the unconverted Typed string and the regular object are equivalent.

For example, Typed string (“42”, natural number, is a natural number) is equivalent to a Z10/natural number representation of 42 ({"Z1K1": "Z10", "Z10K1": "42"}) and Typed string (“Z10”, Z9, Z109) is equivalent to {"Z1K1": "Z9", "Z9K1": "Z10"} (a Z9/reference to the Z10/natural number Z4/type).
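The conversion behind these two examples can be sketched in Python. The helper names are hypothetical, and the sketch assumes that the referenced type keeps its value in a single K1 key, which holds for both examples here but will not hold for every type.

```python
def to_regular_object(value: str, type_reference: str) -> dict:
    """Convert a Typed string's value into a regular object of the referenced type.

    Assumes the referenced type stores its value under "<ZID>K1".
    """
    return {"Z1K1": type_reference, f"{type_reference}K1": value}


def equivalent(value: str, type_reference: str, regular_object: dict) -> bool:
    """True if the converted Typed string is identical to the regular object."""
    return to_regular_object(value, type_reference) == regular_object
```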

String equivalence means that any Z6/string can also be represented as Typed string (Z6K1, Z6, Z106).

{
  "type": "typed string", //actually a generic type
  "typed string value": "O",
  "string type reference": "string",
  "subtype function": "is a valid string"
}
{
  "Z1K1": "Z96",
  "Z96K1": "O",
  "Z96K2": "Z6",
  "Z96K3": "Z106" //or Boolean wrapper
}


The Z4/type for a generic type has no Z9/reference, as its Z4/type is not persistent, so objects with generic types do not have such alternative representations.

…This equivalence is a somewhat accidental consequence of the use of a Z9/reference for the string type reference. A Validator (function) is not strictly a subtype function, since it does not have a Boolean return type. This is conveniently ignored for the time being!

Converting to code

No conversions will occur that differ from the existing Z6/string.

If an implementation returns a Z96K2/string type reference that differs from that of the function’s Z8K2/return type, this shall not be considered an error, so long as the Z96K1/string value passes the validation for the specified Z96K3/subtype function (as well as for the Z4/type). An implementation cannot change the value of the Z96K3/subtype function. (This allows for subtype specialization, changing “O” from “element symbol” to “group 16 (Q104567) symbol”, for example, but this is likely to be supported better by a contextualized type.)
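The rule for returned values might be checked along the following lines. This Python sketch is an assumption-laden illustration: `accept_result` and `SUBTYPE_FUNCTIONS` are hypothetical names, and the set of element symbols is a tiny illustrative subset, not the real list.

```python
# Hypothetical registry mapping subtype function ZIDs to Python stand-ins.
SUBTYPE_FUNCTIONS = {
    "Z11854": lambda s: s in {"H", "O", "S"},  # illustrative subset of element symbols
}


def accept_result(result: dict, declared_subtype_function: str) -> bool:
    """Decide whether an implementation's returned Typed string is acceptable."""
    if result["Z96K3"] != declared_subtype_function:
        return False  # an implementation cannot change the subtype function
    check = SUBTYPE_FUNCTIONS[declared_subtype_function]
    # Z96K2 is deliberately not compared with the declared reference,
    # so a more specific reference is allowed.
    return check(result["Z96K1"])
```

A result that narrows the reference from element symbol (P246) to group 16 (Q104567) is accepted, provided the value still passes the declared subtype function.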

Python

JavaScript

Renderer

Renderers are the responsibility of the community and specific Renderers will not generally be required.

Typed string Renderers always output strings. Standard and non-standard formatting may be applied and substitute Unicode code points may be used. (Transformations to non-string types will be possible but are not considered to be renderers.)

Parsers

Parsers are the responsibility of the community and specific Parsers will not generally be required.

In general, any transformations that a Renderer may apply will be reversible or ignored by a corresponding Parser. It is expected that a specific typed string subtype will have only one Parser (or, most frequently, none).


Alternatives

We might consider having a strongly typed version of Z96/typed string in addition to the more weakly typed version proposed here.


Comments

Types in the equivalent example need converting to generic type (for all the good it will do).--GrounderUK (talk) 22:14, 3 March 2024 (UTC)

Now all we need (?) is to work out how to get the subtype function into the generic type without dislodging Z896 🤔 I’ve added it to K3 for now.--GrounderUK (talk) 00:06, 4 March 2024 (UTC)

There appears to be no Z4 for a Z881/Typed list but I believe there should be exactly one (Z81). I considered changing Z96 to Z896, but I’m leaving it for now.

Not directly relevant to the proposed type, but for calculations involving SI units, for example, the objects could be typed numbers of some sort, with an additional key for units.--GrounderUK (talk) 10:28, 4 March 2024 (UTC)