email - Unicode for scientific communication -- useful but inconsistent? (Specifically superscript/subscript)

08
2014-07
  • kdb

    Unicode provides a decent set of characters for scientific purposes. You have things like the angle brackets commonly used in quantum mechanics and statistical physics (|ψ⟩, ⟨T⟩), symbols for commonly used constants (ℏ), and even things like superscript and subscript numbers, parentheses, and letters (χ⁽²⁾).

    I am always a bit baffled by the inconsistency of the latter, though. Looking at Wikipedia, for example, you'll notice that a large subset of the Latin alphabet is available as subscripts, but not all of it. I understand why people might not want to put just about ALL characters from all alphabets into Unicode in superscript and subscript versions, but I do wonder why the Latin alphabet was only partially included for subscripts, and even less completely for superscripts.

    Is there any reasoning behind this or is it an actual omission?
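
    The gap can be checked against the Unicode data that ships with Python. The following is a minimal sketch, not a definitive inventory: it assumes that "superscript" and "subscript" characters are those whose compatibility decomposition carries the `<super>` or `<sub>` tag, and the helper name `scripted_letters` is made up for this illustration. The result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def scripted_letters(tag):
    """Map each basic Latin lowercase letter to the characters whose
    compatibility decomposition is `tag` followed by that single letter.

    `tag` is "<sub>" or "<super>". Letters without any such character
    are simply absent from the returned dict.
    """
    found = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only decompositions of the form "<tag> XXXX" (one base character).
        if len(parts) == 2 and parts[0] == tag:
            base = chr(int(parts[1], 16))
            if "a" <= base <= "z":
                found.setdefault(base, []).append(chr(cp))
    return found


subscripts = scripted_letters("<sub>")
superscripts = scripted_letters("<super>")

alphabet = "abcdefghijklmnopqrstuvwxyz"
print("subscript letters:   ", "".join(c for c in alphabet if c in subscripts))
print("no subscript form:   ", "".join(c for c in alphabet if c not in subscripts))
print("superscript letters: ", "".join(c for c in alphabet if c in superscripts))
print("no superscript form: ", "".join(c for c in alphabet if c not in superscripts))
```

    Note that the `<super>` tag also catches a few characters that were not meant as general-purpose superscripts, such as the ordinal indicators, so the lists are an approximation of what a reader would call "superscript letters".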

    PS: I fear this question might not fit Super User very well, but I couldn't think of a more fitting Stack Exchange site.

    PPS: I am writing such symbols using Emacs' "TeX" input method or, alternatively, an AutoHotkey script generated from its symbol table.

  • Answers
  • Jukka K. Korpela

    Unicode is a standard for encoding plain text. Thus, any symbol used in mathematical texts is a candidate for encoding as a Unicode character, and a very large number of such characters have been encoded. The process is ongoing, and new characters will be added if they have actually been taken into use.

    Superscripting and subscripting is as such not plain text but “rich text”, just like italic, bolding, specific fonts, colors, backgrounds, borders, and animated letters are. A superscript “2” is still the character “2”, just in a raised position and typically in smaller size. From this perspective, we could say that superscripts and subscripts need not be encoded at all. Normal characters can be used, and devices beyond the plain text level, or “higher level protocols” can be used, such as commands in a word processor, style settings, HTML or MathML markup, etc.
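
    The relationship between a superscript character and its plain counterpart is recorded in the Unicode character data itself. A small sketch using Python's standard `unicodedata` module (the variable names are only illustrative) shows that SUPERSCRIPT TWO is defined as a compatibility variant of the digit 2, and that compatibility normalization (NFKC) folds it back to the plain digit:

```python
import unicodedata

superscript_two = "\u00b2"  # the character "²"

# The character database describes "²" as a <super> variant of U+0032 ("2").
print(unicodedata.name(superscript_two))           # SUPERSCRIPT TWO
print(unicodedata.decomposition(superscript_two))  # <super> 0032

# NFKC normalization replaces compatibility variants with their base
# characters, so the raised position is lost: "χ⁽²⁾" becomes "χ(2)".
chi_two = "\u03c7\u207d\u00b2\u207e"  # χ⁽²⁾
folded = unicodedata.normalize("NFKC", chi_two)
print(folded)  # χ(2)
```

    This also illustrates the flip side: software that applies NFKC to text will silently turn "mc²" into "mc2", which changes the meaning when the raised position was significant.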

    So the question is really why superscripts and subscripts have been included at all in Unicode, rather than why they do not constitute a uniform set. One reason is that other character codes have superscript and subscript characters, so Unicode has to include them. Another reason is given in the note Unicode in XML and other Markup Languages: “Super and subscripted letters and digits are quite common in some forms of phonetic or phonemic transcriptions, where the use of styles is both awkward and prone to data integrity issues when exported to plain text. For super or subscripted letters in phonetic transcription in particular, a change from superscript or subscript to regular style would alter the meaning. Note that such use in transcription is not limited to letters: superscripted small digits are often used to indicate tone. When used for these purposes, these characters should be retained and markup should not be used.”

    However, adding superscript and subscript versions of every character would mean adding about 200,000 characters. Next, someone would want italic and bold versions of every character, and so on, and we would run out of encoding space. Before that, typographers would have nervous breakdowns: they really don’t want to design glyphs for such characters (most of which would never be used).

    This is why the cited document adds: “When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts. This is because mathematical layout allows not just individual symbols, but entire expressions to be superscripted or subscripted in a regular, nested manner.”


  • Related Question

    windows - Can you map a key combination to output a specific Unicode character when typing?
  • Andreas Grech

    I am searching for a way (on a Windows machine) to map a keystroke, or a sequence of keystrokes, so that it produces a specific Unicode character when pressed.

    The applications that I found map keys to a function such as 'run another application', but I couldn't find anything that lets you map multiple keystrokes to output a string of predefined text.


  • Related Answers
  • 8088

    Your best bet for this is a key combo mapper like AutoHotkey.