123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455 |
- # $Id: README,v 1.2 2007/06/13 10:09:47 ssttoo Exp $
- Introduction
- ============
- Text_Highlighter is a class for syntax highlighting. The main idea is to
- simplify creation of subclasses implementing syntax highlighting for
- particular language. Subclasses do not implement any new functioanality, they
- just provide syntax highlighting rules. The rules sources are in XML format.
- To create a highlighter for a language, there is no need to code a new class
- manually. Simply describe the rules in XML file and use Text_Highlighter_Generator
- to create a new class.
- This document does not contain a formal description of API - it is very
- simple, and I believe providing some examples of code is sufficient.
- Highlighter XML source
- ======================
- Basics
- ------
- Creating a new syntax highlighter begins with describing the highlighting
- rules. There are two basic elements: block and region. A block is just a
- portion of text matching a regular expression and highlighted with a single
- color. Keyword is an example of a block. A region is defined by two regular
- expressions: one for start of region, and another for the end. The main
- difference from a block is that a region can contain blocks and regions
- (including same-named regions). An example of a region is a group of
- statements enclosed in curly brackets (this is used in many languages, for
- example PHP and C). Also, characters matching start and end of a region may be
- highlighted with their own color, and region contents with another.
- Blocks and regions may be declared as contained. Contained blocks and regions
- can only appear inside regions. If a region or a block is not declared as
- contained, it can appear both on top level and inside regions. Block or region
- declared as not-contained can only appear on top level.
- For any region, a list of blocks and regions that can appear inside this
- region can be specified.
- In this document, the term "color group" is used. Chunks of text assigned to
- same color group will be highlighted with same color. Note that in versions
- prior 0.5.0 color goups were refered as CSS classes, but since 0.5.0 not only
- HTML output is supported, so "color group" is more appropriate term.
- Elements
- --------
- The toplevel element is <highlight>. Attribute lang is required and denotes
- the name of the language. Its value is used as a part of generated class name,
- and must only contain letters, digits and underscores. Optional attribute
- case, when given value yes, makes the language case sensitive (default is case
- insensitive). Allowed subelements are:
- * <authors>: Information about the authors of the file.
- <author>: Information about a single author of the file. (May be used
- multiple times, one per author.)
- - name="...": Author's name. Required.
- - email="...": Author's email address. Optional.
- * <default>: Default color group.
- - innerGroup="...": color group name. Required.
-
- * <region>: Region definition
- - name="...": Region name. Required.
- - innerGroup="...": Default color group of region contents. Required.
- - delimGroup="...": color group of start and end of region. Optional,
- defaults to value of innerGroup attribute.
- - start="...", end="...": Regular expression matching start and end
- of region. Required. Regular expression delimiters are optional, but
- if you need to specify delimiter, use /. The only case when the
- delimiters are needed, is specifying regular expression modifiers,
- such as m or U. Examples: \/\* or /$/m.
- - contained="yes": Marks region as contained.
- - never-contained="yes": Marks region as not-contained.
- - <contains>: Elements allowed inside this region.
- - all="yes" Region can contain any other region or block
- (except not-contained). May be used multiple times.
- - <but> Do not allow certain regions or blocks.
- - region="..." Name of region not allowed within
- current region.
- - block="..." Name of block not allowed within
- current region.
- - region="..." Name of region allowed within current region.
- - block="..." Name of block allowed within current region.
- - <onlyin> Only allow this region within certain regions. May be
- used multiple times.
- - block="..." Name of parent region
-
- * <block>: Block definition
- - name="...": Block name. Required.
- - innerGroup="...": color group of block contents. Optional. If not
- specified, color group of parent region or default color group will be
- used. One would only want to omit this attribute if there are
- keyword groups (see below) inherited from this block, and no special
- highlighting should apply when the block does not match the keyword.
- - match="..." Regular expression matching the block. Required.
- Regular expression delimiters are optional, but if you need to
- specify delimiter, use /. The only case when the delimiters are
- needed, is specifying regular expression modifiers, such as m or U.
- Examples: #|\/\/ or /$/m.
- - contained="yes": Marks block as contained.
- - never-contained="yes": Marks block as not-contained.
- - <onlyin> Only allow this block within certain regions. May be used
- multiple times.
- - block="..." Name of parent region
- - multiline="yes": Marks block as multi-line. By default, whole
- blocks are assumed to reside in a single line. This make the things
- faster. If you need to declare a multi-line block, use this
- attribute.
- - <partgroup>: Assigns another color group to a part of the block that
- matched a subpattern.
- - index="n": Subpattern index. Required.
- - innerGroup="...": color group name. Required.
- This is an example from CSS highlighter: the measure is matched as
- a whole, but the measurement units are highlighted with different
- color.
- <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
- innerGroup="number" contained="yes">
- <onlyin region="property"/>
- <partGroup index="1" innerGroup="string" />
- </block>
-
- * <keywords>: Keyword group definition. Keyword groups are useful when you
- want to highlight some words that match a condition for a block with a
- different color. Keywords are defined with literal match, not regular
- expressions. For example, you have a block named identifier matching a
- general identifier, and want to highlight reserved words (which match
- this block as well) with different color. You inherit a keyword group
- "reserved" from "identifier" block.
- - name="...": Keyword group. Required.
- - ifdef="...", ifndef="..." : Conditional declaration. See
- "Conditions" below.
- - inherits="...": Inherited block name. Required.
- - innerGroup="...": color group of keyword group. Required.
- - case="yes|no": Overrides case-sensitivity of the language.
- Optional, defaults to global value.
- - <keyword>: Single keyword definition.
- - match="..." The keyword. Note: this is not a regular
- expression, but literal match (possibly case insensitive).
- Note that for BC reasons element partClass is alias for partGroup, and
- attributes innerClass and delimClass are aliases of innerGroup and
- delimGroup, respectively.
-
- Conditions
- ----------
- Conditional declarations allow enabling or disabling certain highlighting
- rules at runtime. For example, Java highlighter has a very big list of
- keywords matching Java standard classes. Finding a match in this list can take
- much time. For that reason, corresponding keyword group is declared with
- "ifdef" attribute :
- <keywords name="builtin" inherits="identifier" innerClass="builtin"
- case="yes" ifdef="java.builtins">
- <keyword match="AbstractAction" />
- <keyword match="AbstractBorder" />
- <keyword match="AbstractButton" />
- ...
- ...
- <keyword match="_Remote_Stub" />
- <keyword match="_ServantActivatorStub" />
- <keyword match="_ServantLocatorStub" />
- </keywords>
- This keyword group will be only enabled when "java.builtins" is passed as an
- element of "defines" option:
- $options = array(
- 'defines' => array(
- 'java.builtins',
- ),
- 'numbers' => HL_NUMBERS_TABLE,
- );
- $highlighter =& Text_Highlighter::factory('java', $options);
- "ifndef" attribute has reverse meaning.
- Currently, "ifdef" and "ifndef" attributes are only supported for <keywords>
- tag.
- Class generation
- ================
- Creating XML description of highlighting rules is the most complicated part of
- the process. To generate the class, you need just few lines of code:
- <?php
- require_once 'Text/Highlighter/Generator.php';
- $generator =& new Text_Highlighter_Generator('php.xml');
- $generator->generate();
- $generator->saveCode('PHP.php');
- ?>
- Command-line class generation tool
- ==================================
- Example from previous section looks pretty simple, but it does not handle any
- errors which may occur during parsing of XML source. The package provides a
- command-line script to make generation of classes even more simple, and takes
- care of possible errors. It is called generate (on Unix/Linux) or generate.bat
- (on Windows). This script is able to process multiple files in one run, and
- also to process XML from standard input and write generated code to standard
- output.
- Usage:
- generate options
- Options:
- -x filename, --xml=filename
- source XML file. Multiple input files can be specified, in which
- case each -x option must be followed by -p unless -d is specified
- Defaults to stdin
- -p filename, --php=filename
- destination PHP file. Defaults to stdout. If specied multiple times,
- each -p must follow -x
- -d dirname, --dir=dirname
- Default destination directory. File names will be taken from XML input
- ("lang" attribute of <highlight> tag)
- -h, --help
- This help
- Examples
- Read from php.xml, write to PHP.php
- generate -x php.xml -p PHP.php
- Read from php.xml, write to standard output
- generate -x php.xml
- Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php
- generate -x php.xml -p PHP.php -x xml.xml -p XML.php
- Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
- /some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">, and
- php.xml contains <highlight lang="php">)
- generate -x php.xml -x xml.xml -d /some/dir/
- Renderers
- =========
- Introduction
- ------------
- Text_Highlighter supports renderes. Using renderers, you can get output in
- different formats. Two renderers are included in the package:
- - HTML renderer. Generates HTML output. A style sheet should be linked to
- the document to display colored text
- - Console renderer. Can be used to output highlighted text to
- color-capable terminals, either directly or trough less -r
- Renderers API
- -------------
- Renderers are subclasses of Text_Highlighter_Renderer. Renderer should
- override at least two methods - acceptToken and getOutput. Overriding other
- methods is optional, depending on the nature of renderer's output and details
- of implementation.
- string reset()
- resets renderer state. This method is called every time before a new
- source file is highlighted.
- string preprocess(string $code)
- preprocesses code. Can be used, for example, to normalize whitespace
- before highlighting. Returns preprocessed string.
- void acceptToken(string $group, string $content)
- the core method of the renderer. Highlighter passes chunks of text to
- this method in $content, and color group in $group
- void finalize()
- signals the renderer that no more tokens are available.
- mixed getOutput()
- returns generated output.
- Setting renderer options
- --------------------------------
- Renderers accept an optional argument to their constructor - options array.
- Elements of this array are renderer-specific.
- HTML renderer
- -------------
- HTML renderer produces HTML output with optional line numbering. The renderer
- itself does not provide information about actual colors of highlighted text.
- Instead, <span class="hl-XXX"> is used, where XXX is replaced with color group
- name (hl-var, hl-string, etc.). It is up to you to create a CSS stylesheet.
- If 'use_language' option with value evaluating to true was passed, class names
- will be formatted as "LANG-hl-XXX", where LANG is language name as defined in
- highlighter XML source ("lang" attribute of <highlight> tag) in lower case.
- There are 3 special CSS classes:
- hl-main - this class applies to whole output or right table column,
- depending on 'numbers' option
- hl-gutter - applies to left column in table
- hl-table - applies to whole table
- HTML renderer accepts following options (each being optional):
-
- * numbers - line numbering style.
- 0 - no numbering (default)
- HL_NUMBERS_LI - use <ol></ol> for line numbering
- HL_NUMBERS_TABLE - create a 2-column table, with line numbers in left
- column and highlighted text in right column
- * tabsize - tabulation size. Defaults to 4
- Example:
-
- require_once 'Text/Highlighter/Renderer/Html.php';
- $options = array(
- 'numbers' => HL_NUMBERS_LI,
- 'tabsize' => 8,
- );
- $renderer =& new Text_Highlighter_Renderer_HTML($options);
- Console renderer
- ----------------
- Console renderer produces output for displaying on a color-capable terminal,
- either directly or through less -r, using ANSI escape sequences. By default,
- this renderer only highlights most common color groups. Additional colors
- can be specified using 'colors' option. This renderer also accepts 'numbers'
- option - a boolean value, and 'tabsize' option.
- Example :
- require_once 'Text/Highlighter/Renderer/Console.php';
- $colors = array(
- 'prepro' => "\033[35m",
- 'types' => "\033[32m",
- );
- $options = array(
- 'numbers' => true,
- 'tabsize' => 8,
- 'colors' => $colors,
- );
- $renderer =& new Text_Highlighter_Renderer_Console($options);
- ANSI color escape sequences have the following format:
- ESC[#;#;....;#m
- where ESC is character with ASCII code 27 (033 octal, 0x1B hexadecimal). # is
- one of the following:
- 0 for normal display
- 1 for bold on
- 4 underline (mono only)
- 5 blink on
- 7 reverse video on
- 8 nondisplayed (invisible)
- 30 black foreground
- 31 red foreground
- 32 green foreground
- 33 yellow foreground
- 34 blue foreground
- 35 magenta foreground
- 36 cyan foreground
- 37 white foreground
- 40 black background
- 41 red background
- 42 green background
- 43 yellow background
- 44 blue background
- 45 magenta background
- 46 cyan background
- 47 white background
- How to use Text_Highlighter class
- =================================
- Creating a highlighter object
- -----------------------------
- To create a highlighter for a certain language, use Text_Highlighter::factory()
- static method:
- require_once 'Text/Highlighter.php';
- $hl =& Text_Highlighter::factory('php');
- Setting a renderer
- ------------------
- Actual output is produced by a renderer.
- require_once 'Text/Highlighter.php';
- require_once 'Text/Highlighter/Renderer/Html.php';
- $options = array(
- 'numbers' => HL_NUMBERS_LI,
- 'tabsize' => 8,
- );
- $renderer =& new Text_Highlighter_Renderer_HTML($options);
- $hl =& Text_Highlighter::factory('php');
- $hl->setRenderer($renderer);
- Note that for BC reasons, it is possible to use highlighter without setting a
- renderer. If no renderer is set, HTML renderer will be used by default. In
- this case, you should pass options as second parameter to factory method. The
- following example works exactly as previous one:
- require_once 'Text/Highlighter.php';
- $options = array(
- 'numbers' => HL_NUMBERS_LI,
- 'tabsize' => 8,
- );
- $hl =& Text_Highlighter::factory('php', $options);
- Getting output
- --------------
- And finally, do the highlighting and get the output:
- require_once 'Text/Highlighter.php';
- require_once 'Text/Highlighter/Renderer/Html.php';
- $options = array(
- 'numbers' => HL_NUMBERS_LI,
- 'tabsize' => 8,
- );
- $renderer =& new Text_Highlighter_Renderer_HTML($options);
- $hl =& Text_Highlighter::factory('php');
- $hl->setRenderer($renderer);
- $html = $hl->highlight(file_get_contents('example.php'));
- # vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */
|