Introduction to MathML
Introduction
Have you always wanted to display something like the
following on your web page?

No matter what you answered, I urge you to read on,
as "MathML is a major breakthrough in content representation on the
web" according to Vincent Quint, W3C User
Interface Domain Leader.
The Mathematical Markup Language (MathML) was released
as a W3C (World Wide Web
Consortium) Recommendation on April 7th, 1998. MathML is the first
application of XML to be issued as a W3C
Recommendation, and is used for displaying mathematical notation and content
on the web.
The goal of MathML is to enable mathematics to be
served, received, and processed on the web, just as HTML has enabled this for
text.
As with XML,
MathML tries to address some of the limitations of HTML. Even though the web was
initially conceived and implemented by scientists for scientists, the capability
to include math in HTML is very limited.
The image-based methods that are currently the
predominant means of transmitting scientific notation over the Web are primitive
and inadequate. Document quality is poor, authoring is difficult, load time is
slow, and mathematical information contained in images is not available for
searching, indexing, or reuse in other applications.
As an effective way to include mathematical expressions
in web documents, MathML gives control over the presentation and the meaning of
such expressions. It does this by providing two sets of markup tags: one set
presents the notation of mathematical data in markup format (introducing 28 new
tags with about 50 attributes), and the other set relays the semantic meaning of
mathematical expressions (introducing 75 new tags with about a dozen
attributes), enabling complex mathematical and scientific notation to be encoded
in an explicit way.
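To make the two tag sets concrete, here is a minimal sketch of the expression a + b written both ways. The element names are the ones used later in this article; the comments are only annotations, and a complete document would wrap each expression in math tags.

```xml
<!-- Presentation markup: describes how "a + b" is laid out -->
<mrow>
  <mi>a</mi>
  <mo>+</mo>
  <mi>b</mi>
</mrow>

<!-- Content markup: describes what "a + b" means (plus applied to a and b) -->
<apply><plus/>
  <ci>a</ci>
  <ci>b</ci>
</apply>
```

The first form tells a renderer what to draw, while the second tells a computer algebra system what to compute.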
A short history of MathML
The problem of encoding mathematics for computer
processing or electronic communication is much older than the web. It was very
common to write papers in ASCII and then e-mail them, but it was rather
difficult to do anything serious with ASCII alone. In 1986 TeX
became a markup method for mathematics, and it was widely used by the time the
web turned up in 1989.
The TeX typesetting system, developed by Donald Knuth,
is now a de facto standard in the mathematical research community. TeX sets a
standard for quality of visual rendering. TeX was the most important influence
on MathML, and a great deal of effort has been put into giving MathML the same
visual rendering quality, plus providing TeX to MathML conversion.
Later came plug-ins such as EzMath,
which uses an easy-to-learn notation for embedding mathematical expressions in
web pages. I'll write more about EzMath later; hands up those of you who like
plug-ins!
There have also been numerous attempts to use and
display math in Java applets, but still this has not solved some of the basic
needs for a math markup language on the web.
Who's MathML for?
The design goals of MathML required a system for
encoding mathematical material for the web that is flexible and extensible,
suitable for interaction with external software, and capable of producing
high-quality rendering in different media. Any markup language that encodes
enough information to do all these tasks will of necessity involve some
complexity.
As a consequence, MathML is not primarily
intended for direct use by authors. I've put 'not' in italics to illustrate
that MathML is mainly to be used by machines, facilitating the searching and
indexing of mathematical and scientific information. I've also put 'primarily'
in italics to show that MathML is also a nice and simple way for students and
other groups to include math in web pages by hand.
It is anticipated that most authors will use equation
editors, conversion programs and other specialized software to generate MathML.
This is not unlike people using HTML editors to write HTML.
One can envision a student using a menu-driven equation
editor that can write out MathML to an HTML file. A researcher might use a
computer algebra package, which automatically encodes the mathematical content
of an expression, so that it can be cut from a web page and evaluated by a
colleague. An academic journal publisher might use a program that converts TeX
markup to HTML and MathML. Regardless of the method used to create a MathML web
page, once it exists, all the advantages of a powerful and general
communications layer become available. A variety of MathML software could all
be used with the same document to render it in speech or in print, to send it to
a computer algebra system, or to manage it as part of a large web document
collection.
The Top-Level math element
MathML specifies a single top-level math
element, which encapsulates each instance of MathML markup within an HTML page.
As such, the math element provides an attachment point for
information that affects a MathML expression as a whole. For example, the math
element is the logical place to attach style sheet or macro information in the
future, when these facilities become available for MathML.
All other MathML content must be contained in a math
element; equivalently, every valid, complete MathML expression must be contained
in <math> tags. The math element must
always be the outermost element in a MathML expression; it is an error for one math
element to contain another.
Applications that return sub-expressions of other
MathML expressions, for example as the result of a cut-and-paste operation,
should always wrap them up in <math> tags. Similarly,
applications that insert MathML expressions in other MathML expressions must
take care to remove the <math> tags from the inner
expressions.
The attributes of the math element are:
class="value" and style="value" – Provided for future CSS support.
macros="URL URL…" – This attribute provides a way of pointing to external
macro definition files. Macros are not part of the MathML specification, but a
macro mechanism is anticipated as a future extension to MathML.
mode="display | inline" – The mode attribute specifies whether the enclosed
MathML expression should be rendered in a display style or an in-line style.
The default is mode="inline".
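As a minimal sketch of the rules above, the following shows one complete expression wrapped in a single math element with the mode attribute set to display. The expression itself (a + b) is only an illustration.

```xml
<math mode="display">
  <mrow>
    <mi>a</mi>
    <mo>+</mo>
    <mi>b</mi>
  </mrow>
</math>
```

If an application cut the sub-expression <mi>b</mi> out of this formula, it would be expected to hand it on as <math><mi>b</mi></math>, so that the fragment is again a complete MathML expression.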
Elements and attributes
Many people are somewhat familiar with HTML-style
syntax. In HTML, one mixes keywords in angle brackets with the text to indicate
logical sections like paragraphs and titles. Different kinds of logical blocks
display in different styles. Often, one can specify variants on a theme by
adding attributes in the start tags of a particular block. For example, in HTML,
the start and end tags <table> and </table> mark a table section,
and you can specify variations by adding attributes like <table
width="85%">.
MathML uses a very similar style of mark-up. In MathML,
because of the nature of the subject matter, the ratio of tags to text is much
higher than in HTML, but the start tag/end tag syntax and the use of attributes
is the same.
There are a few small differences, which we will go
over below. These stem from the fact that the HTML syntax follows the rules of
SGML while MathML follows the rules of XML. Both SGML and XML are systems for
defining mark-up languages like HTML and MathML. SGML has been around a long
time, especially in industry and the government, and SGML applications are very
good for data that must remain accessible as software and hardware change.
However, it is quite complicated, so a simplified version tailored to Web
applications called XML has recently been formulated.
In MathML there are two kinds of elements. Most
elements have start and end tags of the form:
<element_name> ... </element_name>
These elements can have other data in between the start
and end tag, such as text, extended characters, or other elements. The remaining
MathML elements are empty elements of the form:
<element_name/>
These elements have just one tag, which looks like a
hybrid between a start and an end tag.
All MathML elements accept a few attributes, and some
accept a dozen or more. Attributes generally specify additional information
about the element. Each attribute has a name and a value. When used with an
element that has both start and end tags, the attributes go in the start tag
between the element name and the final '>'. In empty elements, attributes go
in between the element name and the final '/>'.
Attribute values must always be enclosed in quotes. In
XML, either double or single quotes are permitted. A couple of templates
illustrate the general format for attributes:
<element_name attrib_name1='val1'
attrib_name2='val2' ... >
and:
<element_name attrib_name='value'/>
Most MathML attribute values are required to be in a
particular format, such as a positive integer, or one of a short list of
keywords like "true" and "false". The proper format for a
given attribute is listed in the presentation reference section.
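The two kinds of elements and the placement of attributes can be sketched with real element names. The linethickness attribute on mfrac is described later in this article; the width attribute on mspace and its value are assumptions added here for illustration.

```xml
<!-- Element with start and end tags; the attribute sits in the start tag -->
<mfrac linethickness="0">
  <mi>n</mi>
  <mi>k</mi>
</mfrac>

<!-- Empty element: a single tag, with the attribute before the closing "/>" -->
<mspace width='1em'/>
```

Note that the first example quotes its value with double quotes and the second with single quotes; both are acceptable in XML.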
The final thing you need to know about MathML syntax is
how the actual text and symbol characters needed for mathematical formulas are
encoded. First of all, characters and symbols can only appear inside a handful
of special MathML elements called token elements. Consider an example:
<mrow>
<mi>a</mi>
<mo>+</mo>
<mi>b</mi>
</mrow>
Most MathML elements, like the outer mrow element,
expect to only find other MathML elements in their content. By contrast, the mi
and mo elements are tokens, and their content consists of characters and
symbols.
Within token elements, one can have plain text
characters, which display as themselves, or special entity references. Entity
references are just keywords in a special format, which represent extended
characters. Examples of entity references are &alpha; and &cap;, which
stand for a lower case Greek alpha and the intersection sign, respectively.
MathML renderers like WebEQ, with access to symbol fonts, will display the
actual extended character glyph in place of the entity reference.
The format for an entity reference is a keyword
preceded by an ampersand (&) and followed by a semicolon (;). That is, a
generic entity reference looks like: &entity_name;.
Most of the MathML entity names are nearly identical
to LaTeX symbol names. To write a LaTeX symbol such as \alpha in a form used by
MathML, remove the initial backslash and add an ampersand to the beginning and a
semicolon to the end of the word. Thus, \alpha becomes &alpha;.
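A short sketch of entity references inside token elements follows. It uses the alpha and intersection entities mentioned above, plus &beta;, which is assumed here to follow the same naming pattern.

```xml
<mrow>
  <mi>&alpha;</mi>
  <mo>&cap;</mo>
  <mi>&beta;</mi>
</mrow>
```

A renderer with suitable symbol fonts would be expected to show the Greek letters alpha and beta with an intersection sign between them.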
Tokens and Basic Layout Schemata
The most common MathML presentation elements are the
token elements mi, mn and mo. Recall that token elements are the only elements
which directly contain character data, so each individual identifier, operator,
and number that appears in an expression must be wrapped in a token element.
<mi> ... </mi>
mi elements indicate that their contents should be
displayed as identifiers. This means that single character identifiers like 'x'
and 'h' should appear in italics, while multi-character identifiers like 'sin'
and 'log' should be in an upright font.
Attributes include font properties like fontweight,
fontfamily and fontslant as well as general properties like fontcolor and
background.
<mn> ... </mn>
mn elements indicate that their contents should be
rendered as numbers, which generally means in an upright font.
Attributes are like those for mi.
<mo> ... </mo>
mo elements are the most complex token schema. They
indicate that their contents should be displayed as operators, but how operators
are displayed is often quite complicated. For example, the spacing around
operators varies depending on the operator. Other operators like sums and
products have special conventions for displaying limits as scripts. Still other
operators like vertical rules stretch to match the size of the expression which
they enclose.
In MathML, rendering software is expected to contain an
"operator dictionary" which contains information about how different
operators are conventionally rendered. However, everything about how an operator
should be displayed can be controlled directly by using attributes. Attributes
include properties like lspace, rspace, stretchy, and movablelimits.
The mo element is also used to mark-up other symbols
which are only operators in a very general sense, but whose layout properties
are like those of an operator. Thus, mo elements are used to mark-up delimiter
characters like parentheses (which stretch), punctuation (which has uneven
spacing around it) and accents (which also stretch). One can use attributes to
indicate that the contents of an mo should be treated as one of these related
types.
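The three token elements can be seen together in a small sketch of the expression sin(x) + 2. The choice of stretchy="false" on the parentheses is only an illustration of overriding the operator dictionary, not a recommendation.

```xml
<mrow>
  <!-- Multi-character identifier: rendered upright -->
  <mi>sin</mi>
  <!-- Parentheses are marked up as operators; stretching is switched off -->
  <mo stretchy="false">(</mo>
  <!-- Single-character identifier: rendered in italics -->
  <mi>x</mi>
  <mo stretchy="false">)</mo>
  <mo>+</mo>
  <!-- Number: rendered upright -->
  <mn>2</mn>
</mrow>
```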
Now that we are acquainted with a few token elements
for marking up individual characters and symbols, we need some layout schemata
for arranging tokens into expressions. The most common and important general
purpose layout schema is the mrow element. The following list describes mrow and
some other common elements in more detail:
<mrow> child1 ... </mrow>
The mrow element can contain any number of child
elements, which it displays aligned along the baseline in a horizontal row.
However, in addition to positioning schemata in a row, the mrow is very handy
for grouping together terms into a single unit. One might do this in order to
make a collection of expressions into a single subscript, or one might nest some
terms in an mrow to limit how much a stretchy operator grows, and so on.
<mfrac> numerator denominator </mfrac>
The mfrac element expects exactly two children, the
first of which will be positioned as the numerator of a fraction, and the second
will be the denominator. By setting the linethickness attribute to 0, the mfrac
element can also be used for binomial coefficients.
<msqrt> child1 ... </msqrt>
The msqrt element accepts any number of children, and
displays them under a radical sign.
<mroot> base index</mroot>
The mroot element is nearly identical to the msqrt
element, except it expects a second child, which is displayed above the radical
in the location of the n in an nth root.
<mfenced> child ... </mfenced>
The mfenced element is like an mrow, except that it
displays enclosed in parentheses. Using attributes, one can set the beginning
and ending delimiter character, as well as internal separator characters like
commas.
<mstyle> child ... </mstyle>
The mstyle element is also like an mrow except that it
handles attributes differently. The mrow element has almost no attributes of its
own, while the mstyle element can be used to set any MathML attribute.
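The layout schemata listed above can be sketched with a few small, independent expressions. The formulas are invented for illustration, and each one would need its own math wrapper in a real page.

```xml
<!-- Fraction with a square root in the numerator: sqrt(a + b) over 2 -->
<mfrac>
  <msqrt>
    <mi>a</mi>
    <mo>+</mo>
    <mi>b</mi>
  </msqrt>
  <mn>2</mn>
</mfrac>

<!-- Cube root of x: the base comes first, the index second -->
<mroot>
  <mi>x</mi>
  <mn>3</mn>
</mroot>

<!-- Binomial coefficient "n choose k": mfenced adds the parentheses,
     linethickness="0" removes the fraction bar -->
<mfenced>
  <mfrac linethickness="0">
    <mi>n</mi>
    <mi>k</mi>
  </mfrac>
</mfenced>

<!-- mstyle sets an attribute for everything it contains -->
<mstyle fontweight="bold">
  <mi>v</mi>
</mstyle>
```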
Containers
Token elements represent identifiers and numbers. Of
course, an identifier can refer to any kind of mathematical object, but in the
case of common objects like vectors and sets, it would be nice to directly
encode the structure of the object as well as its name. For this, new elements
are needed to represent other kinds of mathematical objects and data types.
MathML uses container elements to represent basic
mathematical objects and data types. In general, container elements represent
things like sets which are constructed out of other data. The main examples are
sets, intervals, vectors, and matrices.
<set> [<elt1> <elt2> ... |
<condition>] </set>
The set element constructs a mathematical set whose
elements are specified by the set element's children. This can be done in two
ways. The children can either be a list of tokens and containers which represent
the individual elements of the set, or the set elements can be specified by a
single condition child element. The condition element is discussed below, and
encodes expressions like "all x such that x < 2".
<interval> <pt1> <pt2>
</interval>
Intervals in the real line can be specified with the
interval element. It expects exactly two child elements, which encode the end
points. The closure attribute determines which of the end points lie in the
interval, and can have the values "open", "closed",
"open-closed" and "closed-open". The default is "closed".
<vector> <elt1> <elt2> ...
</vector>
A vector element constructs a vector whose components
are given in order by its children. By convention, in MathML vectors are column
vectors for matrix multiplication.
<matrix> <row1> <row2> ...
</matrix>
Matrices actually require two elements, matrix and
matrixrow. Although matrix rows are a little odd to single out from a
mathematical viewpoint, they are a necessary crutch for encoding matrices. A
matrix element expects any number of children, but they have to all be matrixrow
elements. The children of the matrixrow elements represent the individual
entries in the matrix. All matrix rows should have the same number of elements.
The best way to understand how all this works is to
look at an example, the statement that the set of all x with x ≥ 0 equals the
interval [0, ∞):

In MathML this becomes:
<e> <eq/>
  <set>
    <condition>
      <ci>x</ci>
      <e> <geq/><ci>x</ci><cn>0</cn> </e>
    </condition>
  </set>
  <interval closure='closed-open'>
    <cn>0</cn>
    <ci>&infin;</ci>
  </interval>
</e>
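The vector and matrix containers described above can be sketched in the same way. The numeric entries are invented for illustration.

```xml
<!-- Column vector with the components 1, 2, 3 -->
<vector>
  <cn>1</cn>
  <cn>2</cn>
  <cn>3</cn>
</vector>

<!-- 2 x 2 matrix: every child of matrix is a matrixrow,
     and both rows hold the same number of entries -->
<matrix>
  <matrixrow>
    <cn>1</cn>
    <cn>0</cn>
  </matrixrow>
  <matrixrow>
    <cn>0</cn>
    <cn>1</cn>
  </matrixrow>
</matrix>
```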
Demo of MathML in a browser
Currently only W3C's Amaya browser supports MathML and
still only some of the Presentation Tags. Amaya allows users to browse and edit
web pages containing mathematical expressions. Like the rest of the document,
these expressions are manipulated through a WYSIWYG interface.
If you want to check how much your browser supports
MathML, I've created a sample budget.
This screen shot is a demonstration of MathML. With
Amaya, an author can see the formatted view and the structure view at the same
time.

Both Microsoft and Netscape have recently stated that
their version 5 browser will not support MathML, due to the increase in
complexity and size.
With Netscape's release of their source code, and the
rapid change that version 5 is currently undergoing, it is quite possible that
the final version 5 will support some sort of MathML.
The EzMath plug-in
Dave Raggett, the co-chair of the W3C Math working
group (the group behind MathML), made EzMath. EzMath provides an easy to learn
notation for embedding mathematical expressions in web pages. The notation is
inspired by how expressions are spoken aloud together with a few abbreviations
for conciseness.
EzMath covers a widely used subset of mathematics.
EzMath focuses on the meaning of mathematical notation rather than just how it
looks on paper (or screen). The EzMath editor makes it very easy to create the
markup for pasting into your web pages, either in the EzMath notation or as
MathML.
The ability to create MathML code in EzMath is
quite useful, because it means you don't force your users to install a plug-in
to view your page. You'll notice, though, that EzMath code normally takes up
only half as much space as the equivalent MathML code.
You can download EzMath at http://www.w3.org/People/Raggett/EzMath.zip.
The future of MathML
In the MathML 1.0 specification there are still areas
that have not yet been fully developed. We still need MathML to fully integrate
into HTML. With the use of "math" stylesheets, it might become easier
for you to separate form from content. There is also ongoing work on the use of
macros and a labeled diagram facility.
If you want MathML to become simpler, you
should look forward to the day when macros arrive. One possible macro would be
an abbreviation macro, so that someone hand-coding MathML would not have to
repeat some complicated but constant notation.
An important issue, which might be solved in the
stylesheets, is how to print MathML.
There are many different possibilities in MathML, but
there's still a long way to go before MathML will give you a web-based
spreadsheet without the use of applets and plug-ins.
Conclusion
Even though MathML is still in its infancy, it's now
ready for you to use.
Several software companies and organizations have
already publicly announced that they will support MathML, including IBM, HP,
the American Mathematical Society, Geometry Technologies and Design Science.
It's not hard to write some simple code in MathML, as
shown in the example below:
<apply><plus/>
<apply><sin/><ci>x</ci></apply>
<cn>9</cn>
</apply>
This means: sin x + 9
Last but not least, it's important to remember that MathML is the first
XML application to ship. Together
with RDF (Resource Description Framework),
which will ship later this year, it is supposed to be one of the two test cases
that show whether, and how, XML will succeed.
References
MathML 1.0 Specification
http://www.w3.org/TR/REC-MathML-19980407
W3C's Amaya browser, which supports MathML:
http://www.w3.org/Amaya
Including Math Notation in Web Pages:
http://forum.swarthmore.edu/typesetting/web.choices.html
HTML Math Implementation Goals:
http://www.geom.umn.edu/~rminer/w3c/interface
EzMath:
http://www.w3.org/People/Raggett/EzMath.zip
Content Element Reference
Token Elements
| <cn> | Content Number |
| <ci> | Content Identifier |
Basic Content Elements
| <apply> | explicit application of a function to its argument |
| <e> | equation or relation |
| <fn> | user-defined function |
| <interval> | interval constructor |
| <inverse/> | generic inverse |
| <sep/> | separator in numeric values |
| <condition> | domain constructor |
| <declare> | declaration |
| <lambda> | function construction from an expression |
Arithmetic, Algebra and Logic
| <idiv/> | division modulo base |
| <exp/> | exponentiation |
| <factorial/> | factorial |
| <over/> | division |
| <max/> | maximum |
| <min/> | minimum |
| <minus/> | subtraction |
| <plus/> | addition |
| <power/> | to the power of |
| <rem/> | remainder modulo base |
| <times/> | multiplication |
| <root/> | nth root |
| <gcd/> | greatest common divisor |
| <and/> | boolean and |
| <or/> | boolean or |
| <xor/> | boolean exclusive or |
| <not/> | boolean not |
Relations
| <eq/> | equal |
| <neq/> | not equal |
| <gt/> | greater than |
| <lt/> | less than |
| <geq/> | greater than or equal |
| <leq/> | less than or equal |
| <implies/> | boolean implies |
Calculus
| <ln/> | natural logarithm |
| <log/> | logarithm to given base |
| <int/> | integral |
| <diff/> | derivative, differentiation |
| <partialdiff/> | partial derivative |
| <totaldiff/> | total derivative |
| <lowlimit> | lower limit (of integral, sum etc) |
| <uplimit> | upper limit (of integral, sum etc) |
| <bvar> | bound variable (e.g. for integral) |
| <degree> | holds the n in "nth derivative" |
Theory of Sets
| <set> | |
| <list> | |
| <union/> | union or meet |
| <intersect/> | intersection or join |
| <in/> | is in, is a member |
| <notin/> | is not in, is not a member |
| <subset/> | is a subset |
| <prsubset/> | is a proper subset |
| <notsubset/> | is not a subset |
| <notprsubset/> | is not a proper subset |
| <setdiff/> | set difference |
Sequences and Series
| <sum/> | sum terms of a sequence |
| <product/> | multiply terms in a sequence |
| <limit/> | limiting value of a sequence |
| <tendsto/> | relation on sequences |
Trigonometry - Standard Trigonometric Functions
| <sin/> | <cos/> | <tan/> | <sec/> |
| <cosec/> | <cotan/> | <sinh/> | <cosh/> |
| <tanh/> | <sech/> | <cosech/> | <cotanh/> |
| <arcsin/> | <arccos/> | <arctan/> | |
Statistics
| <mean/> | mean or average |
| <sdev/> | standard deviation |
| <var/> | variance |
| <median/> | median |
| <mode/> | mode |
| <moment/> | moment |
Linear Algebra
| <vector> | vector |
| <matrix> | matrix |
| <matrixrow> | matrix row |
| <determinant/> | determinant |
| <transpose/> | transpose |
Semantic Mapping Elements
| <annotation> | <semantics> | <xmlannotation> |
Presentation Element Reference
Token Elements:
| <mi> | identifier |
| <mn> | number |
| <mo> | operator, fence, or separator |
| <mtext> | text |
| <mspace/> | space |
| <ms> | string literal |
General Layout:
| <mrow> | group any number of sub-expressions horizontally |
| <mfrac> | form a fraction from two sub-expressions |
| <msqrt> | form a square root sign (radical without an index) |
| <mroot> | form a radical with specified index |
| <mstyle> | style change |
| <merror> | enclose a syntax error message from a preprocessor |
| <mpadded> | adjust space around content |
| <mphantom> | make content invisible but preserve its size |
| <mfenced> | surround content with a pair of fences |
Scripts and Limits:
| <msub> | attach a subscript to a base |
| <msup> | attach a superscript to a base |
| <msubsup> | attach a subscript-superscript pair to a base |
| <munder> | attach an underscript to a base |
| <mover> | attach an overscript to a base |
| <munderover> | attach an underscript-overscript pair to a base |
| <mmultiscripts> | attach prescripts and tensor indices to a base |
Tables:
| <mtable> | table or matrix |
| <mtr> | row in a table or matrix |
| <mtd> | one entry in a table or matrix |
| <maligngroup/> | alignment group marker |
| <malignmark/> | alignment point marker |
Actions:
| <maction> | bind actions to a sub-expression |