# $Id: README,v 1.2 2007/06/13 10:09:47 ssttoo Exp $

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses implementing syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.

This document does not contain a formal description of the API - it is very
simple, and I believe providing some examples of code is sufficient.

Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text matching a regular expression and highlighted with a single
color. A keyword is an example of a block. A region is defined by two regular
expressions: one for the start of the region, and another for the end. The
main difference from a block is that a region can contain blocks and regions
(including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, characters matching the start and end of a region
may be highlighted with their own color, and region contents with another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both on the top level and inside regions. A block or
region declared as not-contained can only appear on the top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.

In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 HTML is no longer the only supported output, so "color group" is
the more appropriate term.

Elements
--------

The toplevel element is <highlight>. Attribute lang is required and denotes
the name of the language. Its value is used as a part of the generated class
name, and must only contain letters, digits and underscores. Optional
attribute case, when given the value yes, makes the language case sensitive
(default is case insensitive). Allowed subelements are:

    * <authors>: Information about the authors of the file.
        <author>: Information about a single author of the file. (May be
        used multiple times, one per author.)
            - name="...": Author's name. Required.
            - email="...": Author's email address. Optional.

    * <default>: Default color group.
        - innerGroup="...": color group name. Required.

    * <region>: Region definition
        - name="...": Region name. Required.
        - innerGroup="...": Default color group of region contents.
          Required.
        - delimGroup="...": color group of start and end of region.
          Optional, defaults to value of innerGroup attribute.
        - start="...", end="...": Regular expressions matching start and
          end of region. Required. Regular expression delimiters are
          optional, but if you need to specify a delimiter, use /.
          Delimiters are only needed when specifying regular expression
          modifiers, such as m or U. Examples: \/\* or /$/m.
        - contained="yes": Marks region as contained.
        - never-contained="yes": Marks region as not-contained.
        - <contains>: Elements allowed inside this region.
            - all="yes" Region can contain any other region or block
              (except not-contained). May be used multiple times.
                - <but> Do not allow certain regions or blocks.
                    - region="..." Name of region not allowed within
                      current region.
                    - block="..." Name of block not allowed within
                      current region.
            - region="..." Name of region allowed within current region.
            - block="..." Name of block allowed within current region.
        - <onlyin> Only allow this region within certain regions. May be
          used multiple times.
            - region="..." Name of parent region

    * <block>: Block definition
        - name="...": Block name. Required.
        - innerGroup="...": color group of block contents. Optional. If not
          specified, the color group of the parent region or the default
          color group will be used. One would only want to omit this
          attribute if there are keyword groups (see below) inherited from
          this block, and no special highlighting should apply when the
          block does not match the keyword.
        - match="..." Regular expression matching the block. Required.
          Regular expression delimiters are optional, but if you need to
          specify a delimiter, use /. Delimiters are only needed when
          specifying regular expression modifiers, such as m or U.
          Examples: #|\/\/ or /$/m.
        - contained="yes": Marks block as contained.
        - never-contained="yes": Marks block as not-contained.
        - <onlyin> Only allow this block within certain regions. May be
          used multiple times.
            - region="..." Name of parent region
        - multiline="yes": Marks block as multi-line. By default, whole
          blocks are assumed to reside in a single line. This makes things
          faster. If you need to declare a multi-line block, use this
          attribute.
        - <partGroup>: Assigns another color group to a part of the block
          that matched a subpattern.
            - index="n": Subpattern index. Required.
            - innerGroup="...": color group name. Required.

          This is an example from the CSS highlighter: the measure is
          matched as a whole, but the measurement units are highlighted
          with a different color.

            <block name="measure"
                match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                innerGroup="number" contained="yes">
                <onlyin region="property"/>
                <partGroup index="1" innerGroup="string" />
            </block>

    * <keywords>: Keyword group definition. Keyword groups are useful when
      some of the words matched by a block should be highlighted with a
      different color. Keywords are defined with a literal match, not
      regular expressions. For example, you have a block named identifier
      matching a general identifier, and want to highlight reserved words
      (which match this block as well) with a different color. You inherit
      a keyword group "reserved" from the "identifier" block.
        - name="...": Keyword group name. Required.
        - ifdef="...", ifndef="...": Conditional declaration. See
          "Conditions" below.
        - inherits="...": Inherited block name. Required.
        - innerGroup="...": color group of keyword group. Required.
        - case="yes|no": Overrides case-sensitivity of the language.
          Optional, defaults to global value.
        - <keyword>: Single keyword definition.
            - match="..." The keyword. Note: this is not a regular
              expression, but a literal match (possibly case insensitive).

Note that for BC reasons element partClass is an alias for partGroup, and
attributes innerClass and delimClass are aliases of innerGroup and
delimGroup, respectively.

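The elements described above can be combined into a small rules file. The
following is a minimal sketch for a hypothetical language called "mini"; the
language name, author, color group names and regular expressions are all
invented for illustration, and only elements and attributes listed above are
used. It has not been run through the generator.

```xml
<?xml version="1.0"?>
<!-- Hypothetical toy language; every name and pattern here is illustrative -->
<highlight lang="mini" case="yes">
    <authors>
        <author name="Jane Doe" email="jane@example.com"/>
    </authors>

    <default innerGroup="code"/>

    <!-- Region: the braces get their own color group, contents default to
         "code", and any block or region may appear inside -->
    <region name="block" delimGroup="brackets" innerGroup="code"
            start="\{" end="\}">
        <contains all="yes"/>
    </region>

    <!-- Blocks: single regular expression, single color group -->
    <block name="identifier" match="[a-z_]\w*" innerGroup="identifier"/>
    <block name="number" match="\d+" innerGroup="number"/>

    <!-- Keyword group: literal words that also match "identifier" -->
    <keywords name="reserved" inherits="identifier" innerGroup="reserved">
        <keyword match="if"/>
        <keyword match="else"/>
    </keywords>
</highlight>
```

Because "reserved" inherits from "identifier", the words if and else would be
assigned to the "reserved" color group, while every other identifier keeps
the "identifier" group.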
Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very big list of
keywords matching Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is
declared with the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning.

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.

Class generation
================

Creating the XML description of the highlighting rules is the most
complicated part of the process. To generate the class, you need just a few
lines of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';

    // Parse the XML rules, then write the generated class to PHP.php
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>

Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors which may occur while parsing the XML source. The package
provides a command-line script that makes generating classes even simpler
and takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script is able to process multiple files in
one run, and also to process XML from standard input and write generated code
to standard output.

Usage:
    generate options

Options:
    -x filename, --xml=filename
        source XML file. Multiple input files can be specified, in which
        case each -x option must be followed by -p unless -d is specified.
        Defaults to stdin
    -p filename, --php=filename
        destination PHP file. Defaults to stdout. If specified multiple
        times, each -p must follow -x
    -d dirname, --dir=dirname
        Default destination directory. File names will be taken from XML
        input ("lang" attribute of <highlight> tag)
    -h, --help
        This help

Examples
    Read from php.xml, write to PHP.php
        generate -x php.xml -p PHP.php
    Read from php.xml, write to standard output
        generate -x php.xml
    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php
        generate -x php.xml -p PHP.php -x xml.xml -p XML.php
    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write
    to /some/dir/XML.php (assuming that xml.xml contains
    <highlight lang="xml">, and php.xml contains <highlight lang="php">)
        generate -x php.xml -x xml.xml -d /some/dir/

Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked
      to the document to display colored text
    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r

Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods - acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and
details of implementation.

    string reset()
        resets renderer state. This method is called every time before a new
        source file is highlighted.
    string preprocess(string $code)
        preprocesses code. Can be used, for example, to normalize whitespace
        before highlighting. Returns preprocessed string.
    void acceptToken(string $group, string $content)
        the core method of the renderer. The highlighter passes chunks of
        text to this method in $content, and the color group in $group
    void finalize()
        signals the renderer that no more tokens are available.
    mixed getOutput()
        returns generated output.

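The methods listed above can be illustrated with a toy renderer. This is only
a sketch: the class name Bracket_Renderer, its output format and the color
group names are invented for the example, and the tokens are fed in by hand
rather than by a real highlighter. A stand-in base class is declared so that
the sketch runs without the package installed; with the package, load the
real Text_Highlighter_Renderer instead.

```php
<?php
// Stand-in for the real base class, ONLY so this sketch runs on its own.
// It is an assumption about the shape of the API, taken from the list above.
if (!class_exists('Text_Highlighter_Renderer')) {
    class Text_Highlighter_Renderer
    {
        function reset() {}
        function preprocess($code) { return $code; }
        function acceptToken($group, $content) {}
        function finalize() {}
        function getOutput() {}
    }
}

// Hypothetical renderer: writes every token as [group:content].
class Bracket_Renderer extends Text_Highlighter_Renderer
{
    private $output = '';

    // Forget anything collected from a previous source file
    function reset()
    {
        $this->output = '';
    }

    // Called once per chunk of text with its color group
    function acceptToken($group, $content)
    {
        $this->output .= '[' . $group . ':' . $content . ']';
    }

    function getOutput()
    {
        return $this->output;
    }
}

// Drive the renderer manually, in the order the methods are described above
$renderer = new Bracket_Renderer();
$renderer->reset();
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('string', '"hi"');
$renderer->finalize();

echo $renderer->getOutput(), "\n";
// prints: [reserved:echo][code: ][string:"hi"]
```

Only acceptToken and getOutput are strictly needed here; reset is overridden
as well so that one renderer object can be reused for several inputs.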
Setting renderer options
------------------------

Renderers accept an optional argument to their constructor - an options
array. Elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main - this class applies to the whole output or the right table
        column, depending on the 'numbers' option
    hl-gutter - applies to the left column in the table
    hl-table - applies to the whole table

The HTML renderer accepts the following options (each being optional):

    * numbers - line numbering style.
        0 - no numbering (default)
        HL_NUMBERS_LI - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in
            the left column and highlighted text in the right column
    * tabsize - tabulation size. Defaults to 4

Example:

    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);

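Since the stylesheet is left to the user, one way to produce it is to
generate the rules from a map of color groups. The sketch below is not part
of the package: the function hl_stylesheet and the colors are invented for
illustration, and it only covers the plain "hl-XXX" class names (not the
"LANG-hl-XXX" form).

```php
<?php
// Hypothetical map: color group name => CSS declarations.
// 'main' and 'gutter' are the special classes; 'var' and 'string' are
// ordinary color groups. The colors themselves are arbitrary.
$styles = array(
    'main'   => 'font-family: monospace;',
    'gutter' => 'color: #999999;',
    'var'    => 'color: #000080;',
    'string' => 'color: #008000;',
);

// Turn the map into one ".hl-XXX { ... }" rule per line
function hl_stylesheet(array $styles)
{
    $css = '';
    foreach ($styles as $group => $rules) {
        $css .= '.hl-' . $group . ' { ' . $rules . " }\n";
    }
    return $css;
}

echo hl_stylesheet($styles);
```

The result can be saved to a .css file and linked from the page that shows
the highlighted code.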
Console renderer
----------------

The console renderer produces output for display on a color-capable
terminal, either directly or through less -r, using ANSI escape sequences. By
default, this renderer only highlights the most common color groups.
Additional colors can be specified using the 'colors' option. This renderer
also accepts the 'numbers' option - a boolean value, and the 'tabsize'
option.

Example:

    require_once 'Text/Highlighter/Renderer/Console.php';
    $colors = array(
        'prepro' => "\033[35m",
        'types' => "\033[32m",
    );
    $options = array(
        'numbers' => true,
        'tabsize' => 8,
        'colors' => $colors,
    );
    $renderer = new Text_Highlighter_Renderer_Console($options);

ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal).
# is one of the following:

    0 for normal display
    1 for bold on
    4 underline (mono only)
    5 blink on
    7 reverse video on
    8 nondisplayed (invisible)
    30 black foreground
    31 red foreground
    32 green foreground
    33 yellow foreground
    34 blue foreground
    35 magenta foreground
    36 cyan foreground
    37 white foreground
    40 black background
    41 red background
    42 green background
    43 yellow background
    44 blue background
    45 magenta background
    46 cyan background
    47 white background

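Writing these sequences by hand is error-prone, so a small helper can
assemble them from the numeric codes in the table above. This is a sketch
only: ansi_sequence is a made-up name and not part of the package, and the
'comment' color group is an assumed group name.

```php
<?php
// Build ESC[#;#;...;#m from a list of numeric attribute codes
function ansi_sequence(array $codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

// A 'colors' array in the shape used by the console renderer example
$colors = array(
    'prepro'  => ansi_sequence(array(35)),     // magenta foreground
    'types'   => ansi_sequence(array(1, 32)),  // bold, green foreground
    'comment' => ansi_sequence(array(1, 36)),  // bold, cyan foreground
);

// Print the sequences in readable form (ESC shown as text)
foreach ($colors as $group => $sequence) {
    echo $group, ' => ESC', substr($sequence, 1), "\n";
}
// prints:
// prepro => ESC[35m
// types => ESC[1;32m
// comment => ESC[1;36m
```

The resulting $colors array can then be passed as the 'colors' option shown
in the console renderer example.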
How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() static method:

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');

Setting a renderer
------------------

Actual output is produced by a renderer.

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for BC reasons, it is possible to use a highlighter without setting
a renderer. If no renderer is set, the HTML renderer will be used by default.
In this case, you should pass options as the second parameter to the factory
method. The following example works exactly like the previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl = Text_Highlighter::factory('php', $options);

Getting output
--------------

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    // highlight() returns whatever the renderer produces, here HTML
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */