Home of http://sourceforge.net/projects/pyfmf
Contact: danperl@users.sourceforge.net
Downloads: http://sourceforge.net/project/showfiles.php?group_id=116235

pyfmf: a file management framework in python
Page Changed: 2/15/2005


http://www.python.org

Design

The framework consists of a few base classes in Python.  The zigo toolkit is the simpler of the two extensions of the framework; zago offers essentially the same functionality as zigo, wrapped in a GUI.  This document describes the design of the framework and the zigo toolkit.

Structure


[Figure: class diagram]

The implementation of the framework and the zigo toolkit consists of the top-level class (Controller, in src/ctrlr/zigo.py) and two packages (config and handlers).  The handlers package (under the src directory) contains a base class for all the handlers and all the derivations of the base class.

The source files can be viewed here:  http://cvs.sourceforge.net/viewcvs.py/pyfmf/danperl/zigzag.  Or you can download them as a package from: http://sourceforge.net/project/showfiles.php?group_id=116235.  Or better still, download all the files from CVS (see Zigo).

The Controller Class

This is the top-level class to be used by any application based on the framework.  It controls the creation of the handler stack and the traversal of the directory trees.  It invokes the handler stack for every directory that is traversed and for every sub-directory and file in that directory.  [Actually, the controller sees the handler stack as a single handler, the one at the top of the stack.  Each handler in the stack is aware only of the handler right below it.  See more in 'The Handler Stack'.]

The entire functionality of the Controller class is in two methods, walk (a generator based on the code in os.walk) and run, which invokes walk in a loop and thus traverses all the trees in its configuration.  See the code in zigo.py for an example of how to use this class.
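The division of labour between walk and run can be sketched as follows.  This is only an illustration written for current Python 3, not the code in zigo.py: the RecordingHandler class, the handler method names (treeTop, beginParentDir, handleChildDir, handleFile, endParentDir, finalize) and their signatures are assumptions, and walk here simply delegates to os.walk instead of reimplementing it.

```python
import os


class RecordingHandler:
    """Hypothetical stand-in for the handler at the top of the stack."""

    def __init__(self):
        self.events = []

    def treeTop(self, top):
        self.events.append(("treeTop", top))

    def beginParentDir(self, parent):
        self.events.append(("begin", parent))

    def handleChildDir(self, parent, name):
        self.events.append(("dir", name))

    def handleFile(self, parent, name):
        self.events.append(("file", name))

    def endParentDir(self, parent):
        self.events.append(("end", parent))

    def finalize(self):
        self.events.append(("finalize", None))


class Controller:
    """Sketch of a controller: walk() yields directories, run() drives the handler."""

    def __init__(self, topDirs, handler):
        self.topDirs = topDirs
        self.handler = handler

    def walk(self, top):
        # Generator over one tree, top-down; entries are sorted for stable order.
        for parent, dirs, files in os.walk(top):
            yield parent, sorted(dirs), sorted(files)

    def run(self):
        # Traverse every configured tree and hand each entry to the handler.
        for top in self.topDirs:
            self.handler.treeTop(top)
            for parent, dirs, files in self.walk(top):
                self.handler.beginParentDir(parent)
                for name in dirs:
                    self.handler.handleChildDir(parent, name)
                for name in files:
                    self.handler.handleFile(parent, name)
                self.handler.endParentDir(parent)
        self.handler.finalize()
```

In this sketch the controller talks to one handler object only, which matches the description of the controller seeing the stack as a single handler.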

The Handler Classes

All Handler classes are subclasses of baseClass.Handler.  Each handler is defined in its own module in the handlers package and the class has to be named Handler.  This is a convention used by the controller to create handler instances from the configuration.
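The naming convention makes it possible to go from a module name in the configuration to a handler instance.  A minimal sketch of that idea, assuming the standard importlib module is used (the actual Controller code may do this differently); the module name 'sizeFilter' and the helper functions are hypothetical, and the module is faked in memory so that the example runs without files on disk:

```python
import importlib
import sys
import types


def registerFakeHandlerModule(moduleName):
    """Create an in-memory module that follows the 'class Handler' convention."""
    module = types.ModuleType(moduleName)

    class Handler:
        origin = moduleName  # lets us see which module the class came from

    module.Handler = Handler
    sys.modules[moduleName] = module  # import_module() finds it here
    return module


def createHandler(moduleName):
    """Import the named module and instantiate the class called Handler in it."""
    module = importlib.import_module(moduleName)
    return module.Handler()


registerFakeHandlerModule("sizeFilter")
handler = createHandler("sizeFilter")
```

Because every module exposes its class under the same name, the configuration only needs to carry module names.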

The base class (baseClass.Handler) has several 'hook' methods that can be overridden in Handler subclasses:
  1. treeTopHook - initialization that is needed for every tree.
  2. beginParentDirHook - invoked for each parent directory, BEFORE processing the children directories and the files in it.
  3. endParentDirHook - invoked for each parent directory, AFTER processing the children directories and the files in it.
  4. handleChildDirHook - invoked for each child directory in the parent directory.
  5. handleFileHook - invoked for each file in the parent directory.
  6. finalizeHook - save all the results if needed and release resources.
All these hook methods return True or False, a value of False indicating to the handlers above in the stack that the directory or the file is filtered out and should not be handled.  The base class provides defaults for these hook methods that simply return True.  Handler subclasses implement at least one of these hook methods to process and/or filter the directories and the files in the tree.
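As an illustration of the hook methods, here is a minimal sketch of a base class whose hooks all return True and of a subclass that filters directories.  The hook names come from the list above; the parameter lists and the SkipCvsDirs class are assumptions made for the example.

```python
class Handler:
    """Minimal stand-in for baseClass.Handler: every hook accepts by default."""

    def treeTopHook(self, topDir):
        return True

    def beginParentDirHook(self, parentDir):
        return True

    def endParentDirHook(self, parentDir):
        return True

    def handleChildDirHook(self, parentDir, dirName):
        return True

    def handleFileHook(self, parentDir, fileName):
        return True

    def finalizeHook(self):
        return True


class SkipCvsDirs(Handler):
    """Hypothetical filter: rejects child directories named 'CVS'."""

    def handleChildDirHook(self, parentDir, dirName):
        # False tells the handlers above that this directory is filtered out.
        return dirName != "CVS"
```

Only handleChildDirHook is overridden; all the other hooks keep the default behaviour of accepting everything.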

There are other methods in baseClass.Handler that are wrappers around hook methods.  These wrapper methods take care of the recursion in the handler stack and they are the methods called by the walkDirs.Controller class.

There is also a setConfig method that takes a dictionary parameter (configDict) and updates the handler with the configuration it describes.  Handler classes have to override the setConfig method with configuration specific to the handler.  Another important method in baseClass.Handler is __add__, which overloads the + operator.  It is used to append handlers to each other and thus form the handler stack (see more in 'The Handler Stack').

The Handler Stack

A handler stack is created by appending handlers to each other, using the __add__ operator.  For example,
    hStack = hInst1 + hInst2 + hInst3
creates hStack from three instances of Handler subclasses.  hStack in this example is actually another name for the object represented by hInst1.  The same result would be achieved by:
    hInst1 + hInst2 + hInst3
    hStack = hInst1
Even more, the same result is achieved by:
    hStack = hInst1
    hStack + hInst2
    hStack + hInst3
or by:
    hInst2 + hInst3
    hStack = hInst1 + hInst2
or even by:
    hStack = hInst1 + hInst2
    hInst2 + hInst3


All these different uses are possible because the effect of h1.__add__(h2) is to append the h2 handler to h1 and to return a reference to h1.  h1 then has a reference to h2 in the nextHandler member of baseClass.Handler.  Every handler in the stack is aware only of the handler that was appended to it.  The last handler in the stack (we'll call it the bottom of the stack) has the nextHandler member set to None.  Similarly, the Controller builds the stack but then accesses only the first handler in the stack (we'll call it the top of the stack).
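A minimal sketch of an __add__ method with this behaviour is shown below.  It assumes that the new handler is appended at the bottom of the chain that starts at the left operand, which is what the equivalent forms in the examples imply; the real baseClass.Handler may be implemented differently.  The name attribute and the stackNames helper are only for illustration.

```python
class Handler:
    """Minimal stand-in for baseClass.Handler, showing only the chaining."""

    def __init__(self, name):
        self.name = name
        self.nextHandler = None  # None marks the bottom of the stack

    def __add__(self, other):
        # Walk down to the current bottom of the chain that starts at self ...
        last = self
        while last.nextHandler is not None:
            last = last.nextHandler
        # ... append the new handler there and return the left operand.
        last.nextHandler = other
        return self


def stackNames(top):
    """Helper for illustration: list handler names from top to bottom."""
    names = []
    while top is not None:
        names.append(top.name)
        top = top.nextHandler
    return names


hInst1, hInst2, hInst3 = Handler("hInst1"), Handler("hInst2"), Handler("hInst3")
hStack = hInst1 + hInst2 + hInst3
```

Since __add__ returns its left operand, hStack and hInst1 refer to the same object, and the order in which the individual additions are made does not change the resulting chain.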

The Handler hook methods are invoked bottom-up in the stack.  In the example above, when processing a file, hInst3.handleFile would be invoked first, then hInst2.handleFile, then hInst1.handleFile.  The following diagram describes this example:
[Figure: Handlers Stack]
An explanation is in order here.  The wrapper methods of the handlers are the ones invoked directly by the Controller and they are invoked from the top of the stack.  However, inside these wrapper methods, every handler first invokes the hook method of the next handler and only then invokes its own hook method.
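The interplay between wrapper methods and hook methods can be sketched as follows.  The wrapper name handleFile, the single fileName parameter, the calls list and the RejectBackups class are assumptions made for the example; the sketch also assumes that a False result from below makes the handlers above skip their own hooks.

```python
class Handler:
    """Minimal stand-in for baseClass.Handler (signatures are assumptions)."""

    calls = []  # records the order in which hooks run, for illustration only

    def __init__(self, name, nextHandler=None):
        self.name = name
        self.nextHandler = nextHandler

    def handleFile(self, fileName):
        # Wrapper: first recurse to the handler below ...
        if self.nextHandler is not None:
            if not self.nextHandler.handleFile(fileName):
                return False  # filtered out below, so skip our own hook
        # ... and only then invoke this handler's own hook.
        return self.handleFileHook(fileName)

    def handleFileHook(self, fileName):
        Handler.calls.append(self.name)
        return True


class RejectBackups(Handler):
    """Hypothetical filter: rejects file names ending in '~'."""

    def handleFileHook(self, fileName):
        Handler.calls.append(self.name)
        return not fileName.endswith("~")


# Stack built by hand here: hInst1 is the top, hInst3 the bottom.
hInst3 = Handler("hInst3")
hInst2 = RejectBackups("hInst2", nextHandler=hInst3)
hInst1 = Handler("hInst1", nextHandler=hInst2)
```

Calling hInst1.handleFile on an ordinary file runs the hooks of hInst3, hInst2 and hInst1, in that order.  For a file name ending in '~', the hook of hInst1 is never reached because hInst2 returns False.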

Note: This is in essence a Pipes and Filters architectural pattern.  See Pattern-Oriented Software Architecture by Frank Buschmann et al.

Data Passing Between Handlers

baseClass.Handler has a member, stackData, that is a dictionary and that can be used by handlers to pass data to each other.  The stackData dictionary is cleared when a hook is invoked on the bottom handler and data can be passed only upwards, in the direction in which hooks are invoked.  It is up to the user of the handler stack to ensure that a handler writes to a key in the stackData dictionary when a handler above it expects that key.  That is usually a matter of configuration, meaning that some handlers are meant to be used together with other, specific, handlers.
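A sketch of this mechanism, under the assumption that stackData is a single dictionary shared by all the handlers (modelled here as a class attribute) and that the wrapper method clears it when it reaches the bottom handler.  The NameLength and LongNamesOnly classes and the 'nameLength' key are hypothetical.

```python
class Handler:
    """Minimal stand-in for baseClass.Handler with shared stackData."""

    stackData = {}  # assumption: one dictionary shared by the whole stack

    def __init__(self, nextHandler=None):
        self.nextHandler = nextHandler

    def handleFile(self, fileName):
        if self.nextHandler is None:
            # Bottom of the stack: start with an empty dictionary.
            Handler.stackData.clear()
        elif not self.nextHandler.handleFile(fileName):
            return False
        return self.handleFileHook(fileName)

    def handleFileHook(self, fileName):
        return True


class NameLength(Handler):
    """Lower handler: publishes a value for the handlers above it."""

    def handleFileHook(self, fileName):
        self.stackData["nameLength"] = len(fileName)
        return True


class LongNamesOnly(Handler):
    """Upper handler: expects the 'nameLength' key written below it."""

    def handleFileHook(self, fileName):
        return self.stackData["nameLength"] > 5


top = LongNamesOnly(nextHandler=NameLength())
```

LongNamesOnly works only when NameLength (or another handler that writes 'nameLength') sits below it, which is the kind of dependency between handlers that has to be respected in the configuration.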

Configuration

The configuration is contained in a dictionary, configDict.  This dictionary has four entries, with the keys 'topDirs', 'workDir', 'handlersCfg' and 'description'.  There are two example configuration modules: config/ctrlrCfg.py and config/exampleCfg.py.  Both define exactly the same configuration, but ctrlrCfg.py imports other modules and builds configDict in steps, whereas exampleCfg.py has a flat, explicit definition of configDict.

The value associated with 'topDirs' is a list of the roots of directory trees that are to be traversed and processed.  The value associated with 'workDir' is the directory where all result files are located.  The value associated with 'description' is a brief description of the configuration that will be used in the future by zago.

The value associated with 'handlersCfg' is a list of tuples.  Each one of the tuples in this list is associated with a handler in the stack.  The order of the list follows the order of the stack, top-to-bottom, the first element in the list corresponding to the top of the stack and the last element of the list corresponding to the bottom of the stack.  Processing is done in reverse order: the bottom handler is the first one to process directories and files, and the top handler is the last.

Each tuple in the list associated with 'handlersCfg' has 2 items.  The first item is the name of the module which contains the Handler.  The second item is a dictionary used to configure the Handler instance.  To understand what members are expected in the dictionary for each Handler class, look at the unbound member _metaCfg of that class; it is a tuple of metaCfgCls (a class nested in class baseClass.Handler) objects, each object describing a member of the dictionary.  These objects have three attributes: 'label', 'desc', and 'typ'.  'label' represents the key used in the configuration dictionary, 'desc' represents the description of the value, and 'typ' represents the type of the value ('string', 'bool', or 'sequence').  A 'sequence' value can be either a tuple or a list.

Example of a metaCfgCls object:
    metaC = baseClass.Handler.metaCfgCls(
        label='name',
        description='Optional, unused.  Identifies a handler in the stack.',
        typ='string')

In configDict, this object may be represented by an entry:
    'name': 'fodSeqP'
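
Putting the pieces of this section together, a complete configDict might look like the sketch below.  Only the four top-level keys and the (module name, dictionary) shape of the 'handlersCfg' entries come from the description above; the directory paths, the module names 'reportWriter' and 'sizeFilter' and the values of 'name' are made up for the example.

```python
configDict = {
    'description': 'Hypothetical example configuration',
    # Roots of the directory trees to traverse.
    'topDirs': ['/tmp/tree1', '/tmp/tree2'],
    # Directory where result files are written.
    'workDir': '/tmp/pyfmf-work',
    # Handlers listed top-to-bottom: (module name, handler configuration).
    'handlersCfg': [
        ('reportWriter', {'name': 'top'}),
        ('sizeFilter', {'name': 'bottom'}),
    ],
}

# Hooks run bottom-up, so processing follows the reverse of the list order.
processingOrder = [moduleName
                   for moduleName, handlerCfg in reversed(configDict['handlersCfg'])]
```

With this configuration the 'sizeFilter' handler would see every directory and file first, and 'reportWriter' would see only what 'sizeFilter' lets through.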

Implementing New Handlers

A new handler can be easily implemented by subclassing baseClass.Handler and overriding one or more of its hook methods.  A template is also provided (handlers/handlerTemplate.py).  Most of the time, a handler overrides only the handleChildDir or the handleFile methods.  The new handler has to be implemented in a module in the handlers package and the class implementing it has to be named Handler (this is a convention used by the Controller class to create handler instances from the configuration).  The new handler has to be added to the configuration by modifying the ctrlrCfg.py file (or any other configuration script that you are using).  For the purpose of configuring through the zago GUI, new Handler classes should also implement their _metaCfg unbound attribute.  See other Handler classes for examples.
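A new handler module could look roughly like the following sketch.  It is not a copy of handlers/handlerTemplate.py: BaseHandler is a stand-in for baseClass.Handler so that the example is self-contained, and the hook signature and the 'extensions' configuration key are assumptions.  The _metaCfg attribute is left out because its exact form is not described here.

```python
class BaseHandler:
    """Stand-in for baseClass.Handler; only the parts used in this sketch."""

    def setConfig(self, configDict):
        pass

    def handleFileHook(self, parentDir, fileName):
        return True


class Handler(BaseHandler):
    """Hypothetical handler: lets through only files with configured extensions.

    In the framework this class would live in its own module in the handlers
    package and, by convention, be named Handler.
    """

    def __init__(self):
        self.extensions = ()

    def setConfig(self, configDict):
        # 'extensions' is a made-up key holding a 'sequence' value.
        self.extensions = tuple(configDict.get('extensions', ()))

    def handleFileHook(self, parentDir, fileName):
        if not self.extensions:
            return True  # nothing configured: accept every file
        return fileName.endswith(self.extensions)
```

A matching 'handlersCfg' entry would pair the name of the module with a dictionary such as {'extensions': ['.py', '.txt']}.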

Viewer handlers (used to view results in zago) must subclass the mixin class ViewerMixin, defined in handlers/resultsViewer.py.  The ViewerMixin class has two methods that can be overridden: createViewWidget and getViewWidget (normally, the default method can be used).  See already implemented viewers for examples.