xhtmlatex: a LaTeX-to-XHTML+MathJax conversion tool.


Given a LaTeX book or document, the aim is to produce an XHTML version of it, with TeX+MathJax formulas, that is as accessible as possible. I wrote a few scripts wrapping the work done by Emma Cliffe on plasTeX, and decided to name the main program xhtmlatex, as a reminder that it takes a LaTeX source and converts it to XHTML.

The final output is an accessible version of the book/article, with MathJax-embedded LaTeX formulas, which is meant to be readable with MathPlayer or text-only browsers (such as lynx), together with a set of tagged images that can be printed for tactile display. The program is not easy to use, is badly documented, and does not always work. But sometimes it works well.


  1. Linux, Unix, or Mac OS X. Python (2.7) with PIL (Python Imaging Library) installed. It probably works on Windows, but this has not been tested.
  2. A recent TeX distribution (such as TeXLive http://www.tug.org/texlive/), including William Park's braille.sty.


The zip archive xhtmlatex.zip contains everything needed except Python and TeX. I am not sure about licensing and redistribution, but it contains a copy of plasTeX-0.9.2 http://plastex.sourceforge.net/ (in the version modified by Emma Cliffe, and further modified by me), together with a stripped-down distribution of MathJax 2.0 (20-g07669ac) http://www.mathjax.org/ (to avoid relying on a CDN) and some scripts. Please read the README and LICENSE files in the zip archive before using it.

  1. Download xhtmlatex.zip, and unzip it somewhere. The best place could be ${HOME}/local/
  2. (for /bin/bash) Add to .bash_profile (or .bashrc or .profile) the following lines:
    export XHTMLATEXDIR=[absolute path of the unzipped xhtmlatex directory]
    export PATH=${XHTMLATEXDIR}/bin:${PATH}
  3. Load the new variables with `. .bash_profile`, or log out and log in again.
  4. Compile a LaTeX file filename.tex with the command
    xhtmlatex --output=outputdir filename.tex
    Then open outputdir/index.html with a browser.
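
A quick way to confirm that steps 2 and 3 took effect is to test the two variables before running the program. The following is only a minimal sketch: `check_xhtmlatex_env` is a hypothetical helper name, it is not shipped in xhtmlatex.zip, and it only checks the environment, not the Python or TeX installation.

```shell
# check_xhtmlatex_env: hypothetical helper, NOT part of xhtmlatex.
# It succeeds only if XHTMLATEXDIR is set and ${XHTMLATEXDIR}/bin is on PATH.
check_xhtmlatex_env() {
    if [ -z "${XHTMLATEXDIR:-}" ]; then
        echo "XHTMLATEXDIR is not set" >&2
        return 1
    fi
    # Surround PATH with colons so that every entry can be matched as ":entry:".
    case ":${PATH}:" in
        *":${XHTMLATEXDIR}/bin:"*)
            ;;
        *)
            echo "${XHTMLATEXDIR}/bin is not in PATH" >&2
            return 1
            ;;
    esac
    echo "environment looks fine: ${XHTMLATEXDIR}"
}
```

After pasting the function into the shell, `check_xhtmlatex_env && xhtmlatex --output=outputdir filename.tex` runs the conversion only when both variables are in place.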

A more complex way of using it is

xhtmlatex --tidy --fix-labels --add-images -x qedhere \
-x bigskip -x medskip -x textellipsis -x smallskip \
--output notes1 -s -n eserc -n equatio notes.tex
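
When many commands have to be excluded, the repeated `-x` options can be generated instead of typed. This is a rough sketch under the assumption that the excluded command names contain no spaces; `exclude_args` is a hypothetical helper name, not something provided by xhtmlatex.

```shell
# exclude_args: hypothetical helper, NOT part of xhtmlatex.
# Prints "-x NAME " for every command name given as an argument.
exclude_args() {
    for name in "$@"; do
        printf '%s %s ' -x "$name"
    done
}
```

It can then be used as `xhtmlatex --tidy --add-images $(exclude_args qedhere bigskip medskip textellipsis smallskip) --output notes1 -s notes.tex`, where the unquoted `$(...)` is deliberately left to word splitting so that each `-x` and each name becomes a separate argument.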


xhtmlatex [OPTIONS] [--output=OUTPUTDIR] [filename][.tex]

        --output|-o = OUTPUTDIR 
        --numberwithin|-n = COUNTERNAME
        --stopmode|-s   : stop LaTeX at the first error
        --split-level|-l: SPLIT-LEVEL (=Highest section level that generates a new file [1])
        --exclude|-x = COMMANDNAME : remove all occurrences of \COMMANDNAME
                        (no backslash in the CLI argument) in TeX 
                     (repeat -x .. -x .. -x .. to exclude more commands).
        --add-images|-a : *try* to add an image for each includegraphics.
        --latexcmd|-c   : latex command to use for compiling.
        --fixlabels|-f  : fix labels (converting to plain strings)
        --tidy|-t       : use html-tidy to clean XHTML and remove broken links

Basically the program executes the following steps:

  1. It creates a temporary LaTeX file OUTPUTDIR.tex, which is a preprocessed version of <filename.tex>.
  2. It runs plasTeX on OUTPUTDIR.tex with the MathJax Renderer, and creates the XHTML files in the directory OUTPUTDIR (which is created if it does not exist).
  3. It post-processes the files in OUTPUTDIR to fix crosslinks in the document and to fix some shortcomings of plasTeX (on theorem numbering or images).
  4. If the `--add-images` option is given, it tries to create the file OUTPUTDIR/images.pdf, containing all images without a description, to be printed on swell paper or embossed.
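
Since the program does not always work, it can help to check what a run actually left in OUTPUTDIR. The sketch below only looks for the two files named above (index.html and images.pdf); `check_output` is a hypothetical helper, not part of xhtmlatex. A missing images.pdf is reported as a warning rather than an error, on the assumption that the file may legitimately be absent, for instance when every image already has a description.

```shell
# check_output: hypothetical helper, NOT part of xhtmlatex.
# Usage: check_output OUTPUTDIR [--add-images]
check_output() {
    outdir=$1
    # index.html is the entry page that should be opened in the browser.
    if [ ! -f "${outdir}/index.html" ]; then
        echo "error: missing ${outdir}/index.html" >&2
        return 1
    fi
    # images.pdf is only attempted when --add-images was given.
    if [ "${2:-}" = "--add-images" ] && [ ! -f "${outdir}/images.pdf" ]; then
        echo "warning: ${outdir}/images.pdf was not created" >&2
    fi
    echo "found ${outdir}/index.html"
}
```

For the longer invocation shown earlier, the call would be `check_output notes1 --add-images`.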

The book/article must not use packages that conflict with plasTeX, and theorems and equations must use different counters. Other counters can be reset manually at each section with the option `--numberwithin`.

If plasTeX reports errors during the compilation, try commenting out the offending command, or redefining it with \renewcommand in a simpler way (using \ifplastex if necessary).

An example of a suitable definition of theorems is the following (in the preamble):




The MathGen software helped me to make two test cases, randomly generated books that were then converted to XHTML: Euclidean Category Theory and Introductory Microlocal Graph Theory.



Somewhat related to accessibility and blind teachers, I also wrote MCQ-XeLaTeX, a piece of software to author LaTeX multiple-choice quizzes and grade them with (free) Optical Mark Recognition imaging software.