Wikifunctions:Type proposals/bytes

Summary

Bytes is a type for an array of raw bytes.

Uses

To store content that is not a string, e.g. images, audio or video (note that external data is currently not supported in Wikifunctions). Short examples of content that is not a printable string include protobuf and ASN.1 encoded data.

  • We need 33-35 bytes to store one tinyint (i.e. 0-255) in an array of Natural number (Z13518), so one persistent object can store no more than about 60,000 such numbers; similarly, we need 28 bytes per element for an array of Byte (Z80), so we are limited to about 74,000 bytes that way.
  • Storing bytes in Base64 allows creating a binary file of up to 1.5MB (1MB if using hex and 0.5MB if using a double-escaped string).
  • Data larger than 1.5MB cannot be stored as a persistent object and must be stored elsewhere (e.g. in Commons) and retrieved via web calls.
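
The figures above are consistent with a single size budget per persistent object. The following Python sketch reproduces them under two assumptions that are not stated in this proposal: a budget of 2 MiB per object, and the per-byte serialization costs implied by the bullets. The names BUDGET, COST_PER_BYTE and max_payload are hypothetical and used only for illustration.

```python
from fractions import Fraction

# ASSUMPTION: 2 MiB per persistent object. This value is inferred because it
# reproduces the figures quoted above; it is not a documented limit.
BUDGET = 2 * 1024 * 1024

# Serialized characters needed per raw byte, as implied by the text.
COST_PER_BYTE = {
    "natural_number_list": Fraction(35),  # upper end of the 33-35 range
    "byte_list": Fraction(28),
    "double_escaped": Fraction(4),        # e.g. \xd0
    "hex": Fraction(2),
    "base64": Fraction(4, 3),
}


def max_payload(budget: int, cost: Fraction) -> int:
    """Largest number of raw bytes whose serialization fits in the budget."""
    return int(budget // cost)


capacities = {
    name: max_payload(BUDGET, cost) for name, cost in COST_PER_BYTE.items()
}

for name, capacity in capacities.items():
    print(f"{name}: {capacity} bytes")
```

Under these assumptions the capacities come out at roughly 60,000 and 75,000 list elements, and 0.5 MiB, 1 MiB and 1.5 MiB for the double-escaped, hex and Base64 forms.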

See also: m:Abstract_Wikipedia/Tasks#Task_P1.17:_REST_calls and m:Abstract_Wikipedia/Tasks#Task_O22:_Binary_type

Therefore we can define:

  • Data shorter than 60,000 bytes is "light" data: it can be stored directly as an array of Byte objects in JSON (though storing it like ["Z80",{"Z1K1":"Z80","Z80K1":"12"},{"Z1K1":"Z80","Z80K1":"34"}] is not efficient).
  • Data between 60,000 and 1,500,000 bytes is "medium" data: it currently cannot be stored directly as an array of bytes, but can be stored as Base64 or generated indirectly via function calls.
  • Data longer than 1,500,000 bytes is "heavy" data: Wikifunctions usually cannot represent or handle it.
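
The three classes above can be sketched as a small Python helper. The thresholds are taken from the list; the function name classify and the handling of the exact boundary values (60,000 counts as "medium", 1,500,000 counts as "medium") are assumptions, since the text does not say which side the boundaries fall on.

```python
LIGHT_LIMIT = 60_000       # below this: "light"
MEDIUM_LIMIT = 1_500_000   # up to and including this: "medium" (assumed)


def classify(size_in_bytes: int) -> str:
    """Return "light", "medium" or "heavy" for a payload of the given size."""
    if size_in_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_in_bytes < LIGHT_LIMIT:
        return "light"
    if size_in_bytes <= MEDIUM_LIMIT:
        return "medium"
    return "heavy"


for size in (8, 60_000, 2_000_000):
    print(size, classify(size))
```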

Structure

JSON does not support strings with non-UTF-8 data, so we need to either (1) double-escape it (e.g. '\\xd0\\xcf\\x11\\xe0\\xa1\\xb1\\x1a\\xe1'), (2) store the data as Base64, or (3) store it as hex.
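
As an illustration, the following Python sketch produces all three serializations for the same eight bytes used in the example on this page. It uses only the standard library; the variable names are arbitrary, and the escaping shown here escapes every byte, which is enough for this input but is not a full escaping scheme.

```python
import base64
import json

raw = bytes.fromhex("d0cf11e0a1b11ae1")

# (1) Escaped text: every byte written as \xHH. When placed in JSON the
# backslash is escaped again, which is why the form is "double-escaped".
as_escaped = "".join(f"\\x{b:02x}" for b in raw)

# (2) Base64 text: about 4 characters per 3 bytes.
as_base64 = base64.b64encode(raw).decode("ascii")

# (3) Hex text: 2 characters per byte.
as_hex = raw.hex()

print(json.dumps(as_escaped))
print(json.dumps(as_base64))
print(json.dumps(as_hex))

# Each form decodes back to the same raw bytes.
assert base64.b64decode(as_base64) == raw
assert bytes.fromhex(as_hex) == raw
```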

Note: this is the serialization format only. When executing a function, bytes in intermediate results should be kept in raw form, not encoded and decoded once per (indirect) function call.

We could also represent it as a typed list of bytes, but (1) this does not provide a proper interface to input or output the data; (2) this is not how bytes are implemented in programming languages.

Example values

(double escaped example)

{
  "type": "bytes",
  "value": "\\xd0\\xcf\\x11\\xe0\\xa1\\xb1\\x1a\\xe1"
}
{
  "Z1K1": "Zxyz",
  "ZxyzK1": "\\xd0\\xcf\\x11\\xe0\\xa1\\xb1\\x1a\\xe1"
}


Validator

The validator ensures that:

  • (double-escape) there are no overescaped characters, and no nonprintable characters
  • (base64) the base64 is valid
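
A minimal Python sketch of both checks follows. The function names are hypothetical. For the double-escaped form, "overescaped" is interpreted here as an escape \xHH used for a byte that is printable ASCII and could have been written literally, and a literal backslash is assumed to be written as two backslashes; both interpretations are assumptions, since the proposal does not define them.

```python
import base64
import binascii
import re

_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def is_valid_base64(text: str) -> bool:
    """True if text is strictly valid Base64 (alphabet and padding)."""
    try:
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def is_valid_escaped(text: str) -> bool:
    """True if text has only printable ASCII and necessary \\xHH escapes."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            if text.startswith("\\\\", pos):  # assumed form of a literal backslash
                pos += 2
                continue
            match = _ESCAPE.match(text, pos)
            if match is None:
                return False  # malformed escape
            value = int(match.group(1), 16)
            if 0x20 <= value <= 0x7E:
                return False  # overescaped: printable byte written as \xHH
            pos = match.end()
        elif " " <= ch <= "~":
            pos += 1  # printable ASCII, allowed literally
        else:
            return False  # nonprintable or non-ASCII character
    return True
```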

Identity

Bytes can be compared in the normal way.
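
One way to read this, sketched in Python below, is that equality is decided on the raw bytes rather than on any serialized text, since two different serializations can denote the same value. This reading is an assumption; the proposal does not spell it out.

```python
import base64

# The same four bytes, obtained from two different textual forms.
from_upper_hex = bytes.fromhex("D0CF11E0")
from_lower_hex = bytes.fromhex("d0cf11e0")
from_base64 = base64.b64decode("0M8R4A==")

# The texts differ, but the decoded values are identical.
print("D0CF11E0" == "d0cf11e0")          # text comparison
print(from_upper_hex == from_lower_hex)  # byte comparison
print(from_upper_hex == from_base64)     # byte comparison
```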

Converting to code

Python

Python has a built-in bytes type.
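
A minimal sketch of the conversion, assuming the double-escaped serialization and the placeholder type ID "Zxyz" from the example values on this page. The function names are hypothetical, and the decoder relies on Python's unicode_escape codec, which accepts more escape forms than this proposal describes, so it should be read as an illustration rather than a strict parser.

```python
import json


def zobject_to_bytes(zobject: dict) -> bytes:
    """Decode the escaped text in ZxyzK1 into a Python bytes value."""
    escaped = zobject["ZxyzK1"]
    # \xHH escapes become code points 0-255, which latin-1 maps 1:1 to bytes.
    return escaped.encode("ascii").decode("unicode_escape").encode("latin-1")


def bytes_to_zobject(data: bytes) -> dict:
    """Encode bytes as escaped text, keeping printable ASCII literal."""
    parts = []
    for b in data:
        if b == 0x5C:
            parts.append("\\\\")  # assumed form of a literal backslash
        elif 0x20 <= b <= 0x7E:
            parts.append(chr(b))
        else:
            parts.append(f"\\x{b:02x}")
    return {"Z1K1": "Zxyz", "ZxyzK1": "".join(parts)}


serialized = r'{"Z1K1": "Zxyz", "ZxyzK1": "\\xd0\\xcf\\x11\\xe0\\xa1\\xb1\\x1a\\xe1"}'
raw = zobject_to_bytes(json.loads(serialized))
print(raw.hex())
```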

JavaScript

JavaScript has a built-in ArrayBuffer type.

Renderer

Either we render it as hex (e.g. d0 cf 11 e0 a1 b1 1a e1), or use Python-style byte escaping (e.g. b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1').
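
Both renderings are available from Python's built-in bytes type, as this short sketch shows. The function names are hypothetical; bytes.hex with a separator argument requires Python 3.8 or later.

```python
def render_hex(data: bytes) -> str:
    """Render bytes as space-separated hex pairs."""
    return data.hex(" ")


def render_python(data: bytes) -> str:
    """Render bytes as a Python bytes literal."""
    return repr(data)


value = bytes.fromhex("d0cf11e0a1b11ae1")
print(render_hex(value))
print(render_python(value))
```

Note that the Python-style form writes printable ASCII bytes literally, so only bytes outside that range appear as \xHH escapes.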

Parsers

Similar to renderer
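
A minimal Python sketch of parsers for the two rendered forms, using only the standard library. The function names are hypothetical. ast.literal_eval is used for the Python-style form because it evaluates literals only and does not execute arbitrary code.

```python
import ast


def parse_hex(text: str) -> bytes:
    """Parse hex pairs, optionally separated by whitespace."""
    return bytes.fromhex(text)


def parse_python(text: str) -> bytes:
    """Parse a Python-style bytes literal such as b'\\xd0\\xcf'."""
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("input is not a bytes literal")
    return value


print(parse_hex("d0 cf 11 e0 a1 b1 1a e1"))
print(parse_python(r"b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'"))
```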

Alternatives

Comments

'Data shorter than 60,000 bytes is "light" data - can be stored directly as array of byte objects in JSON'