A Quick Introduction to TREEBAG

Frank Drewes

Below, you will find some brief instructions on how to install, run, and use TREEBAG. To learn more about TREEBAG, see the TREEBAG home page.

Installation
To install TREEBAG, simply download the system from the home page and store the file (which is a jar archive, called the TREEBAG archive below) on your computer.
Running TREEBAG
In order to execute TREEBAG, the Java runtime environment (version 1.4 or later) must be installed on your system. If this is the case, it should be sufficient to double-click the TREEBAG archive if you just want to start TREEBAG without an argument. However, you gain flexibility by executing TREEBAG from the command line. For this, let us assume that you have copied the TREEBAG archive to <my directory>/treebag.jar. Then you can start TREEBAG using the command
java -jar <my directory>/treebag.jar [<worksheet>]
where the last argument is optional and should, if present, be the file name of the TREEBAG worksheet to be opened (see below). Under Windows, you may prefer to substitute javaw for java in order to suppress the console window which java opens (and you should probably use Windows' path separator character '\' instead of '/').
To avoid running out of stack or memory space, it is advisable to start the Java interpreter using the options -Xss and -Xmx, e.g., -Xss8m (8MB stack space) and -Xmx800m (800MB memory limit). This would turn the basic command displayed above into
java -Xss8m -Xmx800m -jar <my directory>/treebag.jar [<worksheet>]
For convenience, one may define a small command called treebag or the like, which takes one optional argument and acts as an abbreviation for java -Xss8m -Xmx800m -jar <my directory>/treebag.jar. The way in which such a command can be defined (e.g., as a shell script under Unix, Linux, or Mac OS X) depends on the operating system used.
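As one possible way to define such a command independently of the operating system, here is a minimal sketch in Python rather than a shell script. The location stored in TREEBAG_JAR and the function names build_command and run_treebag are placeholders chosen for this illustration and are not part of TREEBAG; only the java options are taken from the text above.

```python
import subprocess

# Assumption: replace this with the place where you stored the TREEBAG archive.
TREEBAG_JAR = "/path/to/treebag.jar"


def build_command(worksheet=None, jar=TREEBAG_JAR, java="java",
                  stack="8m", heap="800m"):
    """Return the argument list that starts TREEBAG.

    Pass java="javaw" under Windows to suppress the console window.
    """
    command = [java, f"-Xss{stack}", f"-Xmx{heap}", "-jar", jar]
    if worksheet is not None:
        # The worksheet file name is the single optional argument.
        command.append(worksheet)
    return command


def run_treebag(worksheet=None, **options):
    """Start TREEBAG and return the exit code of the Java process."""
    return subprocess.call(build_command(worksheet, **options))
```

Calling run_treebag() starts TREEBAG without a worksheet, whereas run_treebag("some.worksheet") opens the given file (the file name here is only an example). Building the argument list separately from running it makes it easy to inspect the exact command before it is executed.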
Examples
A wealth of examples related to picture generation can be found on the CD enclosed with the book Grammatical Picture Generation. The examples can also be downloaded from the web site related to the book.
The TREEBAG worksheet
The worksheet is the main window of TREEBAG. It allows the user to create and modify a graph consisting of TREEBAG components: tree generators (i.e., tree grammars and tree transducers), algebras, and so-called displays. The output of a tree generator can be fed into tree transducers and algebras; that of an algebra can be fed into displays. New input relations can be established by clicking first on the source and then on the target node. A double click on a node opens a control pane that allows you to interact with the respective component (in the case of displays, the first double click opens the associated window whereas the second opens its control pane). A double click on an unoccupied area of the worksheet opens a file selector that allows you to load another component. See the TREEBAG manual on how to define such components.
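The rules about which component may feed which can be summarised in a small data structure. The following Python sketch is only a model of the description above and has nothing to do with how TREEBAG is actually implemented; the class name Worksheet, the kind labels, and the method names are invented for this illustration. It does not model further restrictions TREEBAG may impose, such as the number of inputs a node accepts.

```python
# Which kind of component may feed which, as described in the text:
# tree generators (grammars and transducers) feed transducers and algebras,
# algebras feed displays, and displays produce no output.
ALLOWED_TARGETS = {
    "grammar": {"transducer", "algebra"},
    "transducer": {"transducer", "algebra"},
    "algebra": {"display"},
    "display": set(),
}


class Worksheet:
    """A toy model of the worksheet graph (hypothetical, for illustration)."""

    def __init__(self):
        self.kinds = {}     # node name -> component kind
        self.edges = set()  # (source, target) input relations

    def add(self, name, kind):
        if kind not in ALLOWED_TARGETS:
            raise ValueError(f"unknown component kind: {kind}")
        self.kinds[name] = kind

    def connect(self, source, target):
        """Establish an input relation from source to target, if permitted."""
        source_kind = self.kinds[source]
        target_kind = self.kinds[target]
        if target_kind not in ALLOWED_TARGETS[source_kind]:
            raise ValueError(
                f"output of a {source_kind} cannot be fed into a {target_kind}")
        self.edges.add((source, target))
```

In this model, connecting a grammar to an algebra and the algebra to a display succeeds, whereas connecting a grammar directly to a display raises an error, because a display expects the output of an algebra.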
Classes of tree grammars
Below follows a brief description of the classes of tree grammars used in the examples on the CD mentioned above.
- regularTreeGrammar
  As its name indicates, this class provides an implementation of regular tree grammars. It has two basic modes, called enumeration and random generation. In the first, the button advance makes it possible to step through the elements of the generated language one by one. Often, the random generation mode is more useful. It provides commands refine and back, the first replacing all nonterminals in parallel using randomly chosen rules whereas the second undoes such a parallel derivation step. Note that the current version of TREEBAG does not provide any means to choose rules interactively. Repeatedly executing refine and back is often the only way to achieve a satisfactory result.
  In enumeration mode, the command derive stepwise lets you view the generation of the currently enumerated tree step by step, using the commands single step, parallel step, and back. Note that this is not the same as refine and back in random generation mode, as the derivation is now fixed. Thus, one may use random generation mode to produce a satisfactory tree (in the way described above), switch to enumeration mode (which automatically terminates the derivation if possible), and then use derive stepwise in order to examine the derivation in detail.
- ET0LTreeGrammar
  Like regularTreeGrammar this implementation of ET0L tree grammars has two main modes. They are called table enumeration and random tables. In the first, the command advance lets you step through all table sequences (well, not really all – see below for more about this) and TREEBAG will only return output trees of the grammar. In this mode, there is also a command derive stepwise similar to the corresponding command in regularTreeGrammar. In random tables mode, tables are chosen at random and output trees as well as nonterminal ones are produced. In both modes, rules within tables are chosen randomly if the grammar is nondeterministic. In this case, there is another command (available in both modes) called new derivation that keeps the table sequence but makes new random choices to select the rules within the chosen tables. Note that new derivation is unavailable in deterministic grammars (for obvious reasons).
  It may be interesting to know that the implementation of ET0LTreeGrammar decomposes every such grammar into a regular tree grammar and a top-down tree transducer. If the file name of the grammar is grammar, these automatically generated components are stored as grammar.1 and grammar.2, respectively. As these files describe ordinary TREEBAG components, one may load them onto the worksheet like any other component.
  For readers who want to design their own grammars: The class ET0LTreeGrammar tries to enumerate only table sequences resulting in output trees. This is done in an unsophisticated way which may sometimes lead to unexpected results: If there is at least one "terminating" table (i.e., a table having for each nonterminal at least one rule whose right-hand side consists entirely of output symbols), only table sequences are enumerated whose last table is terminating. If there is no terminating table at all, TREEBAG simply enumerates all table sequences – something which frequently yields undefined results. Hence, for such grammars, random tables mode is much more useful. (If you are designing ET0L tree grammars yourself, recall that a missing rule for a nonterminal A gives rise to the implicit rule A -> A. Thus, to make TREEBAG realize that a table is supposed to be terminating you have to include terminating rules for all nonterminals that are not output symbols.)
- pdtGrammar
  The class name of this component stands for parallel deterministic tree grammar. It implements the special case of an EDT0L tree grammar consisting of exactly two tables, where the right-hand sides of the second table are terminal trees. In terminal results mode, the component returns all trees obtained by applying the first table i times and the second table once. Here, i can be increased and decreased by the commands advance and back, respectively. In nonterminal results mode, all trees are returned that are generated by applying the first table i times.
  Compared with the more general class ET0LTreeGrammar, the major advantage of pdtGrammar is its greater efficiency. In many directories containing examples that make use of some EDT0L tree grammar stored in a file grammar, implemented as an instance of ET0LTreeGrammar, you can therefore find a file grammar.pdt containing the equivalent pdtGrammar, which you may load onto the worksheet. (For the sake of uniformity, ET0LTreeGrammar has been used as the default for all worksheets; the choice affects only efficiency, as the generated trees and pictures remain the same.)
- BSTGrammar
  This implementation of branching tree grammars has been made recently and has not yet been tested as much as the classes above. The two main modes as well as most commands are similar to those of the class ET0LTreeGrammar. The command new derivation has been replaced with choose new supertables at depth i and choose new rules (where the latter is available only if the grammar is nondeterministic). An additional command is show sync info, which switches to a mode in which the synchronization strings are added to nonterminals as monadic subtrees. This is useful if you want to understand what happens during a derivation and watch the intermediate trees in a tree display.
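To make the two modes of pdtGrammar described above more concrete, the following Python sketch mimics what happens for a given value of i. It is a simplified model, not TREEBAG code: trees are represented as nested tuples, nonterminals are assumed to be leaves, each table is a dictionary with one right-hand side per nonterminal, and a missing rule is treated as the implicit rule A -> A mentioned for ET0L tree grammars (assumed here to carry over). The names apply_table and pdt_result as well as the example tables are invented for this illustration.

```python
# A tree is either a leaf (a string) or a tuple (symbol, child, child, ...).

def apply_table(tree, table):
    """Replace all nonterminal leaves of the tree in parallel."""
    if isinstance(tree, str):
        # A leaf without a rule stays as it is: it is either an output
        # symbol or a nonterminal with the implicit rule A -> A.
        return table.get(tree, tree)
    symbol, *children = tree
    return (symbol, *(apply_table(child, table) for child in children))


def pdt_result(axiom, first, second, i, terminal=True):
    """Apply the first table i times; in terminal results mode,
    apply the second table once afterwards."""
    tree = axiom
    for _ in range(i):
        tree = apply_table(tree, first)
    return apply_table(tree, second) if terminal else tree


# Hypothetical example grammar: S -> f(S, S) in the first table,
# S -> a in the second (terminating) table.
FIRST = {"S": ("f", "S", "S")}
SECOND = {"S": "a"}
```

With these tables, pdt_result("S", FIRST, SECOND, 1) yields the terminal tree ("f", "a", "a"), while pdt_result("S", FIRST, SECOND, 1, terminal=False) yields ("f", "S", "S"), which still contains nonterminals. Increasing or decreasing i corresponds to the commands advance and back.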
