DRAFT -- bdBuild -- DRAFT

Tutorial and Reference Manual

Rawld Gill

Chief Engineer
ALTOVISO LLC

This document was generated 2011-03-02 20:02:29.

Abstract

bdBuild is a program that optimizes a browser-based application in order to improve the load-time performance of that application. At its core, bdBuild implements a simple but elegant engine that marshals a set of resources through a set of gates, causing pluggable, resource-dependent transforms to be applied as a prerequisite to passing each gate. Since new/different transforms are pluggable, the functionality of the program can be easily extended. Further, the engine allows the transforms to be applied either synchronously or asynchronously. Employing asynchronous transforms for tasks such as reading and writing data results in extremely high performance. This article describes the motivation, design, and use of bdBuild.


Table of Contents

Overview
Discovering Resources
basePath, destBasePath, and Discovering Files
Discovering Directories and Trees
Discovering Packages
Transforms
Design and Implementation of Transform Functions
Configuring Transforms
Transforms Supplied with bdBuild
Transform Jobs
Build Control Property Reference
Job Control Properties
bdBuild Configuration Properties
Build Control Scripts
Command Line
Recipes
AMD Module Identifier to Filename Mapping Algorithm
Writing Custom Discovery and Transform Processes
Writing a Custom Discovery Process
Writing a Custom Transform
Design and Implementation

Overview

bdBuild is a general-purpose program for transforming resources. Although general purpose, it was built to solve the problem of transforming a set of resources that comprise a browser-based application in order to improve the load-time performance of that application. For this particular application of bdBuild, two kinds of transformations are typical:

  • The content of a resource is analyzed and those portions not required are removed. An example is applying dojo build pragmas to Javascript code.

  • Several resources may be bundled into a single resource so that a single server transaction results in downloading several resources. An example is combining several AMD module definitions into a single resource.

The first transform results in smaller resources which decreases transmission time. Even better, depending upon the environment, the reduction in resource size may result in the user agent caching the resource, thereby completely eliminating download time. The second transform results in fewer server transactions which, independent of bandwidth, reduces the latency costs of loading an application. This effect is particularly noticeable when the organization of the program results in a serial chain of downloads (for example, module A requires module B, but the program doesn't know this until module A is evaluated). Using these techniques to optimize non-trivial applications often results in improving load times by a factor of 10 or more.

The semantics of any particular transform range from trivial to quite powerful. bdBuild includes several transforms; these are described in the section called “Transforms”. Further, bdBuild is designed so that transforms may be easily constructed and plugged into the transform engine.

The overall design of bdBuild is simple. It "discovers" a set of resources and applies an ordered set of resource-dependent transforms to those resources. Both the discovery process and the transforms are controlled by a "build control script" which is a user-configurable Javascript object. I'll use the acronym BCS instead of build control script from now on.

bdBuild includes a discovery process that I'll describe in detail in the next section. You can add additional discovery processes and/or remove the bdBuild-included discovery process. For now, the important point is that there is some process that "discovers" resources and then "starts" these resources in the configurable transformation process.

When a resource is started, bdBuild queries the BCS for the set of transforms that should be applied to that particular resource. The BCS includes the property transformJobs, a vector of [predicate, transform vector] pairs. Each started resource is applied to each predicate and the transform vector associated with the first predicate that returns true gives the set of transforms to apply to that particular resource.
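The first-match lookup just described can be sketched as follows. This is a minimal illustration rather than bdBuild's actual code; the function name selectTransforms and the sample predicates are hypothetical.

```javascript
// Hypothetical sketch: find the transform vector for a resource by taking
// the first [predicate, transform vector] pair whose predicate returns true.
function selectTransforms(transformJobs, resource) {
  for (var i = 0; i < transformJobs.length; i++) {
    var predicate = transformJobs[i][0],
      transforms = transformJobs[i][1];
    if (predicate(resource)) {
      return transforms;
    }
  }
  return null; // no predicate matched; how bdBuild treats this case is not modeled here
}

// Sample transformJobs; the predicates and pairings are illustrative only.
var transformJobs = [
  [function(resource) { return /\.js$/.test(resource.src); }, ["read", "jsTokenize", "jsParse", "write"]],
  [function(resource) { return true; }, ["read", "write"]]
];

console.log(selectTransforms(transformJobs, {src: "myModule.js"}));
console.log(selectTransforms(transformJobs, {src: "logo.png"}));
```

Because the first matching predicate wins, more specific predicates belong earlier in the vector and a catch-all predicate belongs last.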

Transforms are functions that take a resource and do some resource-dependent work. Reading, applying dojo pragmas, parsing, static semantic analysis, and writing are all examples of transforms. Obviously, transforms must be applied in a prescribed order (for example, reading before writing). Further, some transforms may operate on multiple resources that have already undergone a prescribed set of transforms. For these kinds of transforms, bdBuild provides synchronization machinery to ensure that all resources have completed transformation up to a prescribed step before any resource is allowed to proceed to the next step. bdBuild calls these steps "gates". Any gate that is designated such that all resources must pass the previous gate before any resource is allowed to begin the designated gate is termed a "synchronized gate". Finally, bdBuild associates a gate with each transform.

Let's consider an example. Suppose a set of resources caused the same predicate in the transformJobs vector to return true and the transform vector associated with that predicate was [T1, T2, T3, T4]. Further assume T1 and T2 are associated with the "read" gate, T3 with the "optimize" gate, and T4 with the "write" gate, with the gate order set at read-optimize-write and all gates being synchronized gates. In this case, bdBuild would apply T1 and T2 to all resources (in that order), and ensure all resources finished T2 before any resource was allowed to begin T3; similarly for T3 and T4. Notice that since T1 and T2 are associated with the same gate, bdBuild will run T2 on a resource immediately after T1 has completed for that resource without waiting for other resources to complete T1. This is not true when proceeding from T2 to T3. Since T3 is associated with a synchronized gate, bdBuild will not start any resource with T3 until all resources have completed T2.
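To make the ordering concrete, here is a simplified, purely synchronous model of this example. It is a sketch under the assumption that every gate is synchronized and every transform completes immediately; runGates and the resource names are hypothetical, and the real engine also handles asynchronous transforms.

```javascript
// Simplified, synchronous model of the example (not bdBuild's engine).
// Every gate is treated as synchronized: all resources finish a gate's
// transforms before any resource starts the next gate.
function runGates(gateOrder, transforms, resources) {
  var log = [];
  gateOrder.forEach(function(gate) {
    resources.forEach(function(resource) {
      // Transforms that share a gate run back to back for a given resource.
      transforms.forEach(function(t) {
        if (t.gate === gate) {
          log.push(resource + ":" + t.name);
        }
      });
    });
  });
  return log;
}

var log = runGates(
  ["read", "optimize", "write"],
  [
    {name: "T1", gate: "read"},
    {name: "T2", gate: "read"},
    {name: "T3", gate: "optimize"},
    {name: "T4", gate: "write"}
  ],
  ["a.js", "b.js"]
);
console.log(log.join(" "));
// a.js:T1 a.js:T2 b.js:T1 b.js:T2 a.js:T3 b.js:T3 a.js:T4 b.js:T4
```

Note that a.js runs T2 before b.js has run T1, but no resource reaches T3 until both have finished T2.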

Here's a summary of the design:

  1. Transforms are functions applied to resources.

  2. Each transform is associated with a gate; gates give the order transforms are applied.

  3. Any gate may be designated as a synchronized gate; a synchronized gate causes all transforms associated with any previous gate to be completed before any transform is applied to any resource in the particular synchronized gate.

  4. transformJobs maps a particular resource to an ordered set of transforms to apply to that resource.

  5. Resources are discovered by a discovery process; once discovered, bdBuild applies each resource to the predicates in transformJobs to find the transforms to apply, and then controls the application of all prescribed transforms to all discovered resources until the last gate is passed and then terminates.

bdBuild is highly configurable: the discovery process(es), transforms, transform-gate associations, and transformJobs are all configurable through the BCS. bdBuild provides a default configuration so you won't have to bother with configuring this machinery until and unless you want to do something special (see the section called “Writing Custom Discovery and Transform Processes”). Now that you understand the basic operation of bdBuild, let's get out of this design talk and start using it.

Discovering Resources

The default discovery process looks for a set of files, directories, directory trees, and/or packages as specified by a BCS. This section describes how to specify these resources in a BCS.

basePath, destBasePath, and Discovering Files

Let's start with a super-simple example. The BCS files property gives a list of file names to discover. The following BCS discovers the file myModule.js:

{
  basePath:"sample-project",
  files:[
    "myModule.js"
  ]
}

Of course, this raises the question, "where should bdBuild look for the file myModule.js?"

Before executing any discovery process, bdBuild converts all file names and paths mentioned in the BCS to absolute paths. Throughout this tutorial, I'll use the term "relative" loosely to mean any path that is not absolute. Notice that all of the names in the example are relative; this is typical.

In order to convert a relative path to an absolute path a "base" path must be designated, and the base path must be absolute. Given a base path and a relative path, bdBuild computes an absolute path by concatenating the base path, "./", and the relative path. Notice if the relative path actually happens to include a "./" prefix, the computed value remains the same.
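The computation can be sketched with node's path module. This is an illustration only; the helper name computePath and the sample directory are hypothetical, and path.join is used here to take care of separators.

```javascript
var path = require("path").posix; // POSIX flavor so the output is the same on every platform

// Hypothetical helper: compute an absolute path from an absolute base path
// and a relative path; an already-absolute path is returned normalized.
function computePath(base, relative) {
  if (path.isAbsolute(relative)) {
    return path.normalize(relative);
  }
  return path.join(base, "./", relative);
}

console.log(computePath("/home/me/sample-project", "myModule.js"));
console.log(computePath("/home/me/sample-project", "./myModule.js"));
// both print /home/me/sample-project/myModule.js
```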

For the files property, the property basePath is designated as the base for any relative source names and the property destBasePath is designated as the base for any relative destination names. basePath is computed as follows:

  1. By the basePath property explicitly set in the BCS; if this value is relative, then the path that contains the BCS is used as a prefix to make the path absolute. Otherwise...

  2. By the path that contains the BCS.

destBasePath is computed as...

  1. By the destBasePath property explicitly set in the BCS; if this value is relative, then basePath is used for the base path when computing an absolute path. Otherwise...

  2. By appending "-build" to the value of basePath.
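The two sets of rules above can be sketched as follows. This is a minimal illustration, not bdBuild's code; computeBasePaths and the argument bcsDir (the absolute path of the directory that contains the BCS) are hypothetical names.

```javascript
var path = require("path").posix; // POSIX flavor so the output is the same on every platform

// Hypothetical sketch of the basePath and destBasePath rules.
// bcs is the build control script object; bcsDir is the absolute path of
// the directory that contains the BCS.
function computeBasePaths(bcs, bcsDir) {
  // An explicit basePath wins; a relative value is taken relative to bcsDir.
  var basePath = bcs.basePath ? path.resolve(bcsDir, bcs.basePath) : bcsDir;
  // An explicit destBasePath wins; a relative value is taken relative to basePath.
  var destBasePath = bcs.destBasePath ?
    path.resolve(basePath, bcs.destBasePath) :
    basePath + "-build";
  return {basePath: basePath, destBasePath: destBasePath};
}

var result = computeBasePaths({basePath: "sample-project"}, "/dev/bdBuild/test/tutorial/discovery");
console.log(result.basePath);     // /dev/bdBuild/test/tutorial/discovery/sample-project
console.log(result.destBasePath); // /dev/bdBuild/test/tutorial/discovery/sample-project-build
```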

So the BCS given above tries to discover the resource myModule.js at the location basePath/myModule.js, and, if found, associates the destination destBasePath/myModule.js with the discovered resource.

An explicit destination can be indicated by providing a pair of names:

{
  basePath:"sample-project",
  files: [
    ["myModule.js", "someDir/myModule.js"]
  ]
}

This BCS discovers the resource basePath/myModule.js and associates the destination destBasePath/someDir/myModule.js with that resource. Let's run bdBuild and see this work.

The directory tree bdBuild/test/tutorial contains all the resources used as examples in this tutorial. The first BCS shown above is located at bdBuild/test/tutorial/discovery/ex1.bcs.js. Let's have bdBuild execute this BCS. Get a command prompt and navigate over to bdBuild/test/tutorial/discovery and execute the command

~/dev/bdBuild/test/tutorial/discovery> node ../../../lib/main.js -b ex1

bdBuild is a node.js program and the startup resource for bdBuild is bdBuild/lib/main.js, so node ../../../lib/main.js simply causes node to execute bdBuild (assuming the current working directory is bdBuild/test/tutorial/discovery). The command line argument -b ex1 instructs bdBuild to execute the BCS ex1.bcs.js. When a BCS is specified without a file type with the -b command line option, a file type of .bcs.js is assumed. You could also explicitly specify the filename completely by typing -b ex1.bcs.js. If all goes well, the resource sample-project/myModule.js should be discovered and the destination sample-project-build/myModule.js associated with the discovered resource. As we'll see below, the default transformJobs will cause the discovered resource to be copied to the destination location. You should see bdBuild print the following output:

discovering resources...
reading resources...
processing resource AST...
executing global optimizations...
writing resources...
cleaning up...
done
Total build time: 0.167 seconds

And if you navigate over to sample-project-build, you should see that the resource myModule.js was indeed copied. Running bdBuild on the second BCS example above (stored at ex2.bcs.js), will cause myModule.js to be copied to sample-project-build/someDir/myModule.js.

If you specify a resource that doesn't exist, bdBuild will report the error and terminate. The BCS at ex3.bcs.js includes such a resource; here is the output:

bdBuild/test/tutorial/discovery]$ node ../../../lib/main.js -b ex3
discovering resources...
reading resources...
ERROR: error while transforming resource: /usr/home/rcgill/dev/bdBuild/test/tutorial/discovery/sample-project/NOT-EXIST-myModule.js
transform: 0
Error: ENOENT, No such file or directory '/usr/home/rcgill/dev/bdBuild/test/tutorial/discovery/sample-project/NOT-EXIST-myModule.js'

The first error message gives the resource and transform that encountered a problem; in this case, bdBuild encountered a problem with the resource NOT-EXIST-myModule.js during the first transform. The next error message happens to be a node.js error message because node threw an ENOENT (error-no-entity) exception when it tried to read the resource. This is as expected since the resource does not exist. bdBuild attempts to provide copious feedback when errors occur to help track down the problem.

Discovering Directories and Trees

Specifying an entire project one file at a time would become quite tedious. The BCS properties dirs and trees specify directories and directory trees respectively. In the case of dirs, just the files contained in the specified directories are discovered, while trees discovers all files in the tree rooted at the specified directory. Both dirs and trees are vectors of items, and each item can be a single name (a string) or a pair of [source, destination] names (strings)--just like the files property described above. dirs and trees items can also be a vector of more than two items. In this case, items in the vector after the second item specify "exclusions"--patterns that are to be excluded from discovery. Exclusions can be either strings, in which case they are treated as glob patterns[1], or regular expressions. Consider the following example (you can find this at ex4.bcs.js):

{
  basePath:"sample-project",
  dirs:[
    [".", ".", "*.bak"]
  ],

  trees:[
    ["../../../../../bdParse", "bdParse", /\/(demo|test)\//]
  ]
}

This BCS discovers all resources in the basePath directory, but excludes any filename that ends with ".bak"; the destination destBasePath/filename is associated with each discovered resource filename. Similarly, all resources in the bdParse tree, excluding any filename that includes either "/demo/" or "/test/", are discovered and associated with the destination destBasePath/bdParse. As in previous examples, this BCS assumes that basePath is set to bdBuild/test/tutorial/discovery/sample-project and that bdParse is a sibling of bdBuild. This allows the longish relative filename "../../../../../bdParse" to resolve correctly. Notice how source and destination names are mapped for both dirs and trees items: the source path is chopped off and replaced with the destination path. Although the example provides a single item for each of dirs and trees, and a single exclusion for each item, any number of items and/or exclusions can be provided.
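Exclusion matching and the source-to-destination mapping can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes a glob is matched against the whole filename using the two rules in footnote [1]; bdBuild's own matching may differ in detail.

```javascript
// Hypothetical sketch: convert a glob exclusion to a regular expression using
// the rules in footnote [1] (* matches zero or more characters, ? matches one).
function globToRegExp(glob) {
  // Escape regular expression metacharacters other than * and ?.
  var escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Returns true if filename matches any exclusion (a glob string or a RegExp).
function isExcluded(filename, exclusions) {
  return exclusions.some(function(exclusion) {
    var regex = typeof exclusion === "string" ? globToRegExp(exclusion) : exclusion;
    return regex.test(filename);
  });
}

// Chop the source path off the front of filename and replace it with the
// destination path.
function mapDestination(filename, srcPath, destPath) {
  return destPath + filename.substring(srcPath.length);
}

console.log(isExcluded("/proj/notes.bak", ["*.bak"]));                  // true
console.log(isExcluded("/bdParse/test/t1.js", [/\/(demo|test)\//]));    // true
console.log(isExcluded("/bdParse/lib/parser.js", [/\/(demo|test)\//])); // false
console.log(mapDestination("/src/bdParse/lib/parser.js", "/src/bdParse", "/build/bdParse"));
// /build/bdParse/lib/parser.js
```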

Discovering Packages

In addition to files, dirs, and trees, the default discovery process can discover all the resources that comprise one or more CommonJS packages by specifying a package configuration in the BCS just as you would provide such a configuration to an AMD loader such as bdLoad. There are many subtleties to package configuration; see bdLoad - Tutorial and Reference Manual for a detailed explanation. Within the context of a BCS, a single package configuration item may define the following properties:

name

The name of the package. This is the only required property.

location

The location of the package; in the context of a loader, this is a URL, in the context of bdBuild, this is a file path. Relative paths are relative to the property basePackagePath (see below). If not provided, defaults to "./name".

lib

A path fragment that, when concatenated with the location property, gives the location of the package modules. If not provided, defaults to "lib".

main

The name of the main module for the package. For example, for the package named "myPackage", the main module is given by the AMD module identifier "myPackage" and this module will be located at location/lib/main. If not provided, defaults to "main".

packageMap

A map from package name to package name (a Javascript object where each property is a package name and each property value is a package name). The map resolves package names as given by AMD module identifiers contained in the package's modules to the package name known by the loader for the particular application. This allows two different packages to internally reference modules in two (or more) other packages with the same name, yet resolve these modules to physically different package locations. With this machinery, applications can use packages that are dependent on other packages that have a name clash. If not provided, defaults to a map that maps any package name to itself. This is a bdLoad extension to the CommonJS package specification; see bdLoad for details.

pathTransforms

A vector of transform functions that take an AMD module identifier and map it to a URL (in the context of bdLoad) or a file path (in the context of bdBuild). If not provided, defaults to an empty vector. This is also a bdLoad extension to the CommonJS package specification; see bdLoad for details.

files

Just like the BCS files property discussed in the section called “basePath, destBasePath, and Discovering Files” except that relative names are relative to the path given by the package location property. If not provided, defaults to an empty vector.

dirs

Just like the BCS dirs property discussed in the section called “Discovering Directories and Trees” except that relative names are relative to the path given by the package location property. If not provided, defaults to an empty vector.

trees

Just like the BCS trees property discussed in the section called “Discovering Directories and Trees” except that relative names are relative to the path given by the package location property. If not provided, defaults to location/lib.

modules

A vector of AMD module identifier names that should be explicitly discovered. If not provided, defaults to an empty vector.

As described above, the location property says where to find the package contents. When relative, the location is taken relative to basePath during discovery and relative to the BCS property destPackageBasePath when determining a destination name for a discovered resource. If destPackageBasePath is not provided, it defaults to destBasePath/packages. Similarly, relative source and destination paths in files, dirs, and trees are taken relative to basePath and destPackageBasePath respectively.

Packages can be specified by giving a packages property (a vector of package configuration items) and/or by providing a packagePaths property (a map from location prefix to vector of package configuration items). Finally the CommonJS paths property (a map from AMD module identifier prefix to URL/file path) may be included in a BCS. Here is a quick review of these concepts by example:

// packages is a vector of package items...
packages:[
  // an item can be a string...
  "myPackage",
  // which is equivalent to {name:"myPackage", location:"./myPackage", lib:"lib", main:"main"}

  // or an under-specified object...
  {name:"p2"},
  // which is equivalent to {name:"p2", location:"./p2", lib:"lib", main:"main"}

  // or a fully-specified object...
  {
    name:"p2",
    location:"some/path/to/somewhere",
    lib:"someDirectoryName",
    main:"someModuleName"
  }
]

// packagePaths is a map from location root to package items...
packagePaths:{
  "my/root":[
    // each item in this vector is assumed to have location root of "my/root"

    "myPackage",
    // which is equivalent to {name:"myPackage", location:"my/root/myPackage", lib:"lib", main:"main"}

    // or an under-specified object...
    {name:"p2"},
    // which is equivalent to {name:"p2", location:"my/root/p2", lib:"lib", main:"main"}

    // or a fully-specified object...
    {
      name:"p2",
      location:"some/path/to/somewhere", //therefore, location is ultimately my/root/some/path/to/somewhere
      lib:"someDirectoryName",
      main:"someModuleName"
    }
  ]
}
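The defaulting rules illustrated by these comments can be sketched as a small normalization function. This is an illustration, not bdBuild's code; normalizePackage and its optional root argument (which plays the role of a packagePaths key) are hypothetical.

```javascript
// Hypothetical sketch: expand a package configuration item into a fully
// specified object using the defaults described above. The optional root
// argument plays the role of a packagePaths key.
function normalizePackage(item, root) {
  var pack = typeof item === "string" ? {name: item} : item;
  var location = pack.location || pack.name;
  if (root) {
    location = root + "/" + location;  // packagePaths: prefix with the location root
  } else if (!pack.location) {
    location = "./" + location;        // packages: default location is "./name"
  }
  return {
    name: pack.name,
    location: location,
    lib: pack.lib || "lib",
    main: pack.main || "main"
  };
}

console.log(normalizePackage("myPackage"));
console.log(normalizePackage({name: "p2"}, "my/root"));
console.log(normalizePackage({name: "p2", location: "some/path/to/somewhere"}, "my/root"));
```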

Given a package configuration, the default discovery process will discover all resources as given by files, dirs, and trees. Further, the process will try to determine if each discovered resource is an AMD module. It does this by extracting the module name from the filename, and then submitting that module name to the standard loader module name to filename mapping algorithm. This algorithm takes into consideration all of the package configuration properties and the package-independent paths property and maps the module name to a URL (in the case of the loader) or a filename (in the case of bdBuild). See the section called “AMD Module Identifier to Filename Mapping Algorithm” for a detailed discussion of this algorithm. If the mapping algorithm returns the same filename as was discovered, then the resource is adjudicated to be an AMD module and the following properties are included when the resource is created and started:

src

the filename of the source (this is included in all resources)

dest

the destination for the resource (this is also included in all resources)

pid

(package identifier) the name of the package

mid

(module identifier) the AMD module name (without the package name)

pqn

(package-qualified name) "pid*mid"

path

full module identifier "pid/mid"

pack

a reference to the BCS package configuration

deps

a vector that will hold the AMD dependencies for this module; initialized to []

Various transforms may use this information to process the module.
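As a rough illustration of how these properties relate to one another, the following sketch builds such a resource object by hand. The function name makeModuleResource and the sample paths are hypothetical; only the property names and the "pid*mid" and "pid/mid" formats come from the list above.

```javascript
// Hypothetical sketch: build the property set listed above for a resource
// that has been adjudicated to be an AMD module.
function makeModuleResource(src, dest, pack, mid) {
  return {
    src: src,                     // filename of the source
    dest: dest,                   // destination for the resource
    pid: pack.name,               // package identifier
    mid: mid,                     // module identifier, without the package name
    pqn: pack.name + "*" + mid,   // package-qualified name
    path: pack.name + "/" + mid,  // full module identifier
    pack: pack,                   // reference to the BCS package configuration
    deps: []                      // filled in later with the AMD dependencies
  };
}

var resource = makeModuleResource(
  "/proj/myPackage/lib/widgets/button.js",
  "/proj-build/packages/myPackage/lib/widgets/button.js",
  {name: "myPackage", lib: "lib", main: "main"},
  "widgets/button"
);
console.log(resource.pqn);  // myPackage*widgets/button
console.log(resource.path); // myPackage/widgets/button
```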

Notice that the default discovery process will not find any module that doesn't map directly to a filename relative to the package lib directory. For example, in dojo version 1.6, the text! plugin module maps to dojo/lib/plugins/text. That is, the AMD module identifier "text", not a member of any package, actually maps to a resource that resides within the dojo package tree. A module that's "not a member of any package" is said to be a member of the default package. The default package is a modeling device and is loosely defined as all "modules" in the tree rooted at basePath that are not part of another package. bdBuild denotes the default package with the package name "*". So, to find the text module, you must provide a package configuration that requests the text module explicitly as part of the default package configuration and further indicates how to find that module either through the package-dependent pathTransforms property or the package-independent paths property. Here is an example using the paths property:

paths:{
  "i18n":"../../../dojotoolkit/dojo/lib/plugins/i18n",
  "text":"../../../dojotoolkit/dojo/lib/plugins/text"
},
packages:[{
  name:"*",
  modules:{
    i18n:1,
    text:1
  }
}]

This is part of the BCS for a backdraft demonstration and is submitted to bdBuild with a basePath that causes ../../../dojotoolkit to resolve to the root of the dojo toolkit source distribution on the local file system. When bdBuild tries to discover the i18n and text AMD modules for the default package by traversing the lib tree, it won't find them since they actually reside within the dojo tree. However, the discovery process will then explicitly attempt to discover the i18n and text modules by submitting these module identifiers to the loader name to filename mapping algorithm, which will direct the discovery process to the correct filenames in the dojo tree.

Before we leave this section, let's pause a moment to make a high-level observation: much of the information in a BCS serves the same purpose and is specified in the same way as a loader configuration (for example, a bdLoad or RequireJS configuration object). This isn't surprising since a loader configuration informs the loader where to find resources just as a BCS informs bdBuild where to find resources. Although there are several BCS configuration options that don't apply to loaders, typically, you'll use your loader configuration as a starting point for a BCS. We'll have more to say about constructing BCSs in the section called “Build Control Property Reference”.



[1] In a glob pattern, * matches zero or more of any character, ? matches any single character.

Transforms

Now that we know how to specify the set of resources to transform, we need to describe how to specify the transforms themselves.

Design and Implementation of Transform Functions

Transforms are AMD modules that have the value of a function with the signature function(resource, callback), where resource is a resource object as discovered by a discovery process, perhaps transformed by one or more previous transforms. The transform function may return...

  • falsy, to indicate the transform was executed synchronously and completed successfully

  • callback, to indicate the transform is executing asynchronously. Note carefully, the return is precisely callback, not an application of callback; callback is applied later to signal the completion of the asynchronous process.

  • any other value, to indicate the transform failed

If the transform is implemented as an asynchronous process, then upon completion, the transform must apply callback to (resource, err), where resource is the resource that was just transformed, and err is falsy to indicate the transform completed successfully or any other value to indicate the transform failed.
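The return-value protocol can be sketched from the engine's point of view as follows. This is an illustration of the contract described above, not bdBuild's implementation; applyTransform and the two sample transforms are hypothetical.

```javascript
// Hypothetical sketch of how an engine could interpret a transform's return
// value; done(resource, err) is called once when the transform finishes.
function applyTransform(transform, resource, done) {
  var result = transform(resource, done);
  if (result === done) {
    return; // asynchronous: the transform applies done(resource, err) later
  }
  if (!result) {
    done(resource, 0); // falsy: synchronous success
  } else {
    done(resource, result); // any other value: synchronous failure
  }
}

// A synchronous transform that succeeds...
function upperCase(resource) {
  resource.text = resource.text.toUpperCase();
  return 0;
}

// ...and an asynchronous transform that completes a little later.
function delayed(resource, callback) {
  setTimeout(function() { callback(resource, 0); }, 10);
  return callback;
}

applyTransform(upperCase, {text: "abc"}, function(resource, err) {
  console.log("upperCase finished:", resource.text, err ? "failed" : "ok");
});
applyTransform(delayed, {text: "abc"}, function(resource, err) {
  console.log("delayed finished:", err ? "failed" : "ok");
});
```

Returning the callback itself as the "asynchronous" signal means a transform needs no extra bookkeeping object; the engine only has to compare the return value against the function it passed in.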

Here is an example of a simple synchronous transform:

define(["../buildControl", "bdParse"], function(bc, bdParse) {
  var 
    filterComments= bdParse.filterComments,
    parse= bdParse.parse;
  return function(resource) {
    try {
      resource.tree= parse(filterComments(resource.tokens));
      return 0;
    } catch (e) {
      bc.logError("failed to parse");
      return e;
    }
  };
});

This is the jsParse transform that's included with bdBuild. The transform requires the resource have already completed the jsTokenize transform that tokenizes a Javascript resource and stores the token stream in the resource property tokens. The transform uses two bdParse functions to parse the token stream into an abstract syntax tree and stores the result in the resource property tree. If all goes well, the transform returns 0, indicating success; otherwise, any errors are caught and returned. The transform signature doesn't mention the callback parameter since it is synchronous and has no use for this parameter.

Here is an example of an asynchronous transform:

define(["../buildControl", "../fileUtils", "fs", "../replace"], function(bc, fileUtils, fs, replace) {
  return function(resource, callback) {
    fileUtils.ensureDirectoryByFilename(resource.dest);
    fs.writeFile(resource.dest, resource.getText(), resource.encoding, function(err) {
      callback(resource, err);
    });
    return callback;
  };
});

This is the write transform that's included with bdBuild. The transform uses the node.js asynchronous file system function writeFile to write the value of the resource asynchronously. Since the transform is asynchronous, it returns the callback parameter. When the asynchronous write completes, the callback is applied to the resource and the error condition (zero if no error).

Notice the high degree of orthogonality of these transforms. This makes simple transforms trivial to implement and greatly simplifies difficult transforms.

Configuring Transforms

In order to employ a transform, it must be described in a BCS. The BCS property transforms gives a map from transform name (a string) to ordered pair of [AMD module identifier, gate] (both strings). AMD module identifier gives the AMD module that implements the particular transform. Recall from the section called “Overview”, gates give an ordered sequence, and each transform is associated with a single gate. Further, gates may be designated as "synchronized". For a synchronized gate, all transforms associated with previous gates must be completed for all resources before any resource is allowed to begin a transform associated with the particular synchronized gate (or any later gate).

bdBuild defines the following gates:

Table 1. Default Gates

Name      Order  Synchronized?  Semantics
read      1      no             read the resource
text      2      no             transform the raw resource text
tokenize  3      no             transform the raw resource text into a token stream
tokens    4      no             transform the token stream
parse     5      no             transform the token stream into an abstract syntax tree
ast       6      yes            transform the abstract syntax tree
optimize  7      yes            global optimizations (transforms that analyze multiple resources)
write     8      yes            write the resource
cleanup   9      yes            execute any post-write chores
report    10     yes            provide any process reports


Here is the value of the default transforms property (I'll describe the semantics of each transform next).

transforms:{
  read:["bdBuild/transforms/read", "read"],
  dojoPragmas:["bdBuild/transforms/dojoPragmas", "read"],
  jsTokenize:["bdBuild/transforms/jsTokenize", "tokenize"],
  jsParse:["bdBuild/transforms/jsParse", "parse"],
  has:["bdBuild/transforms/has", "ast"],
  amd:["bdBuild/transforms/amd", "ast"],
  write:["bdBuild/transforms/write", "write"],
  writeAmd:["bdBuild/transforms/writeAmd", "write"],
  readBdLoad:["bdBuild/transforms/readBdLoad", "read"],
  writeBdLoad:["bdBuild/transforms/writeBdLoad", "write"],
  compactCss:["bdBuild/transforms/compactCss", "optimize"],
  writeCss:["bdBuild/transforms/writeCss", "write"]
}

The key point here is that the transforms property associates a name (the transform name) with a particular transform function (as given by an AMD module identifier) and further associates that function with a particular gate. As I'll describe in the section called “Transform Jobs”, transformJobs will associate a set of transform names with a set of resources to which the set of transforms should be applied.
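The name-to-module-to-gate association can be sketched as a simple lookup. This is an illustration only; resolveTransforms is a hypothetical helper, the gates vector restates Table 1, and the transforms object is a subset of the default value shown above.

```javascript
// Gate names in order, taken from Table 1.
var gates = ["read", "text", "tokenize", "tokens", "parse", "ast", "optimize", "write", "cleanup", "report"];

// A subset of the default transforms property.
var transforms = {
  read: ["bdBuild/transforms/read", "read"],
  jsTokenize: ["bdBuild/transforms/jsTokenize", "tokenize"],
  jsParse: ["bdBuild/transforms/jsParse", "parse"],
  write: ["bdBuild/transforms/write", "write"]
};

// Hypothetical helper: resolve transform names to the AMD module identifier,
// the gate, and the gate's position in the gate order.
function resolveTransforms(names) {
  return names.map(function(name) {
    var item = transforms[name];
    if (!item) {
      throw new Error("unknown transform: " + name);
    }
    return {name: name, mid: item[0], gate: item[1], order: gates.indexOf(item[1]) + 1};
  });
}

resolveTransforms(["read", "jsTokenize", "jsParse", "write"]).forEach(function(t) {
  console.log(t.order + " " + t.gate + " " + t.name + " (" + t.mid + ")");
});
```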

Transforms Supplied with bdBuild

Each of the transforms listed below is provided as part of the bdBuild standard release.

Note: all resources are expected to contain the properties src and dest which give the source and destination file name respectively.

read (asynchronous)

Reads the contents of resource.src into resource.text. Encoding is determined by the file type suffix of resource.src. The following file types map to "utf8": css, html, htm, js, json, asc, c, cpp, log, conf, text, txt, dtd, xml; all other file types map to binary.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/read.js
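The file-type rule can be sketched as follows. This is not the linked implementation; getEncoding is a hypothetical helper, and returning the string "binary" for all other types is an assumption about how "binary" is represented.

```javascript
var path = require("path");

// File types that map to "utf8", as listed above.
var utf8Types = ["css", "html", "htm", "js", "json", "asc", "c", "cpp", "log", "conf", "text", "txt", "dtd", "xml"];

// Hypothetical helper: choose an encoding from the file type suffix.
// Assumption: every other file type is represented by the string "binary".
function getEncoding(filename) {
  var type = path.extname(filename).substring(1); // drop the leading "."
  return utf8Types.indexOf(type) !== -1 ? "utf8" : "binary";
}

console.log(getEncoding("sample-project/myModule.js")); // utf8
console.log(getEncoding("sample-project/logo.png"));    // binary
```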

dojoPragmas (synchronous)

Applies all Dojo pragmas to resource.text. Dojo pragmas bracket a set of contiguous lines of code by an opening and closing pragma with the following syntax:

opening::= //>> directive ws* ( ws* quote id quote ws* , expr ) eol
closing::= //>> directive ws* ( ws* quote id quote ws* ) eol
directive::= excludeStart | includeStart
ws::= white-space-but-not-new-line
eol::= end-of-line
quote::= " | '
id::= [a-zA-Z0-9_]+
expr::= javascript-expression

When an opening pragma is found, a closing pragma with the same directive and id is expected to follow in the source code; and an opening pragma with no matching closing signals an error. When a matching closing is found, the expression given by the opening pragma is evaluated with the variables filename and kwargs in scope. Prior to evaluation, filename is set to the source filename of the resource and kwargs is set to the value of the BCS property dojoPragmaKwArgs. Source code lines may be deleted depending upon the directive and the result of evaluating the expression as follows:

Table 2. Dojo Pragma Semantics

directive     value of expression  action
excludeStart  true                 the bracketed lines are deleted
excludeStart  false                none
includeStart  true                 none
includeStart  false                the bracketed lines are deleted


In all cases, the pragmas are deleted from the source text.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/dojoPragmas.js
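The decision in Table 2 reduces to a single predicate. The following sketch is not the linked implementation; deletesBracketedLines is a hypothetical name, and it covers only the delete-or-keep decision, not the parsing or expression evaluation.

```javascript
// Hypothetical helper: given a directive and the value of the pragma's
// expression, decide whether the bracketed lines are deleted (Table 2).
function deletesBracketedLines(directive, value) {
  if (directive === "excludeStart") {
    return !!value; // exclude when the expression is true
  }
  if (directive === "includeStart") {
    return !value;  // include (keep) only when the expression is true
  }
  throw new Error("unknown directive: " + directive);
}

console.log(deletesBracketedLines("excludeStart", true));  // true
console.log(deletesBracketedLines("includeStart", true));  // false
```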

jsTokenize (synchronous)

Tokenizes the resource and initializes machinery so that chunks of the resource can be easily specified for deletion.

  • Splits resource.text into a vector of lines on each new-line detected in resource.text; replaces resource.text with the result.

  • Tokenizes resource.text with the bdParse tokenizer and stores the result in resource.tokens.

  • Initializes resource.deleteList to an empty vector; resource.deleteList is expected to be filled with zero to many bdParse location objects.

  • Initializes resource.getText with a method that applies resource.deleteList (expected to be a vector of bdParse location objects) to resource.text (expected to be a vector of strings).

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/jsTokenize.js
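The idea behind resource.getText can be illustrated as follows. This is a sketch, not the code in jsTokenize.js: the shape of a location object ({startLine, startCol, endLine, endCol}, zero-based, with endCol exclusive) is an assumption made for illustration and may not match bdParse, and the locations are assumed not to overlap.

```javascript
// Apply a vector of (hypothetical) location objects to a vector of lines
// and return the remaining text as a single string.
function applyDeleteList(lines, deleteList) {
  const result = lines.slice();
  // Process the last location first so earlier positions stay valid.
  const ordered = deleteList.slice().sort(
    (a, b) => b.startLine - a.startLine || b.startCol - a.startCol
  );
  for (const loc of ordered) {
    const head = result[loc.startLine].slice(0, loc.startCol);
    const tail = result[loc.endLine].slice(loc.endCol);
    // Replace the affected lines with what is left of the first and last line.
    result.splice(loc.startLine, loc.endLine - loc.startLine + 1, head + tail);
  }
  return result.join("\n");
}
```

A transform such as has only needs to push locations onto the delete list; the text is not modified until a write transform asks for it.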

jsParse (synchronous)

Parses resource.tokens into an abstract syntax tree with the bdParse parser, and stores the result at resource.tree.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/jsParse.js

has (synchronous)

Traverses resource.tree and looks for has applications. Any has applications that have a constant value in the BCS property staticHasFlags are evaluated and any resulting dead code is marked for deletion in the delete list provided by the jsTokenize transform.

Records all has feature identifiers encountered in the BCS property hasLocations (a map from feature name to the vector of locations that reference the feature). This map can later be used to optimize a has.js module to include only the feature tests referenced in the discovered resources.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/has.js

amd (synchronous)

Traverses resource.tree and looks for the application of the AMD define function that defines the module. If found, traverses the argument list to discover the dependency vector and then resolves each name into a resource. If a resource is not found (that is, the discovery process failed to discover all resources mentioned in the dependency vector of a discovered AMD module), then an error condition is signaled; otherwise, resource.deps is augmented to include all discovered dependencies.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/amd.js
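The resolution step can be sketched without the tree traversal. The sketch assumes the define arguments have already been extracted as plain values and that discovered resources are kept in a Map keyed by module identifier; both are illustrative assumptions, as is the function name. It relies only on the AMD convention that define takes an optional identifier, an optional dependency vector, and a factory.

```javascript
// Find the dependency vector among the arguments of define and resolve each
// name to a discovered resource; an undiscovered name signals an error.
function resolveDependencies(defineArgs, resources) {
  const names = defineArgs.find(Array.isArray) || [];
  return names.map((name) => {
    if (!resources.has(name)) {
      throw new Error("dependency was not discovered: " + name);
    }
    return resources.get(name);
  });
}
```

The returned vector corresponds to what the amd transform adds to resource.deps.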

write (asynchronous)

Writes the result of resource.getText to the file given by resource.dest using the encoding given by resource.encoding.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/write.js

writeAmd (asynchronous)

If resource.layer is truthy, then writes the following modules to the file name given by resource.dest:

  • the module given by the resource

  • the dependency tree of the resource (typically, as computed by the amd transform)

  • all modules given by resource.layer.includes and their dependency trees

  • except that any resource in resource.layer.excludes together with any of their dependencies are not written

The set of modules that is written is all the modules that would be downloaded by an AMD loader in order to load the resource and any module in resource.layer.includes after all the modules in resource.layer.excludes were already loaded.

If resource.layer is falsy, then simply writes the module to the file name given by resource.dest.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/writeAmd.js
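The set of modules written for a layer can be computed as a difference of two dependency closures. The sketch below assumes the dependency graph is available as a Map from module identifier to a vector of identifiers; that representation and the function name are illustrative and are not taken from writeAmd.js.

```javascript
// Compute the identifiers written for a layer: the closure of the layer
// module plus includes, minus the closure of excludes.
function layerModules(root, layer, deps) {
  const closure = (roots) => {
    const seen = new Set();
    const stack = [...roots];
    while (stack.length) {
      const id = stack.pop();
      if (seen.has(id)) continue;
      seen.add(id);
      for (const dep of deps.get(id) || []) stack.push(dep);
    }
    return seen;
  };
  const excluded = closure(layer.excludes || []);
  const included = closure([root, ...(layer.includes || [])]);
  return [...included].filter((id) => !excluded.has(id)).sort();
}
```

Because a closure is transitive, any module reachable only through an excluded module is itself excluded, which matches the description of what a loader would download once the excluded modules were already loaded.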

readBdLoad (asynchronous)

Identical to the read transform except that prior to starting the asynchronous read, the BCS property loader is set to reference the current resource, and resource.boots is initialized to an empty array. The default discovery process inserts all layers that provide a boot property into boots so that the transform writeBdLoad may write these bootstrap layers. The default discovery process is able to find the loader resource as a consequence of this transform publishing it in the BCS property loader.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/readBdLoad.js

writeBdLoad (asynchronous)

Writes the backdraft loader (bdLoad), its configuration, and optionally, one or more bootstraps. A bootstrap is the loader and its configuration plus a layer, all written to a single file. A bootstrap is designated by providing the property boot (a string) that gives a destination filename within a layer specification. See xxx.

The configuration is computed by combining three components:

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/writeBdLoad.js

compactCss (synchronous)

Inspects the BCS property compactCssSet[resource.src]. If the value of that property is a string, then memorizes that string as the destination of a compacted version of the current resource; otherwise, memorizes the destination provided by the discovery process.

Transforms the resource.text by removing comments and white space. Further, resolves all CSS import statements (recursively) and inserts the also-transformed referenced resource text. This results in a single string that contains the compressed contents of the entire tree of CSS files implied by the import statements contained in the resource.

Maps all image URLs encountered during the tree traversal described above to minimized names; replaces all image URLs within the transformed text with the minimized names; memorizes the map from original URL to minimized name. See writeCss.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/compactCss.js

writeCss (asynchronous)

Writes the transformed text computed by the compactCss transform to the memorized destination. Copies all referenced images as given by the map of memorized image URLs to minimized name to the same directory as given by the resource destination.

Current implementation: https://github.com/altoviso/bdBuild/blob/master/lib/transforms/writeCss.js

Transform Jobs

We now have everything we need to actually do some work. Recall the basic algorithm implemented by bdBuild:

  1. Discover some resources.

  2. For each discovered resource, look up and apply the resource-dependent ordered set of transforms.

The transformJobs BCS property gives a vector of pairs of [predicate, vector of transforms]. When a resource is started, bdBuild applies the resource to each predicate in the transformJobs vector; the vector of transforms associated with the first predicate that returns true is selected for the given resource. By default, bdBuild provides the transform jobs listed below.

bdLoad Job

Applied to: any resource with a source filename that matches /.*\/bdLoad\/lib\/require\.js$/

Transforms: readBdLoad, jsTokenize, jsParse, has, writeBdLoad

AMD Modules

Applied to: any resource discovered while executing discovery on a package configuration that is determined to be an AMD module.

Transforms: read, dojoPragmas, jsTokenize, jsParse, has, amd, writeAmd

Normal, Non-i18n Javascript Code Modules

Applied to: any resource that has file type .js, does not have file type .bcs.js, is not an AMD module, and does not have /nls/ in its filename.

Transforms: read, dojoPragmas, jsTokenize, jsParse, has, write

CSS Files Designated for Compaction

Applied to: any resource named in the BCS compactCssSet property.

Transforms: read, compactCss, writeCss

Catch All

Applied to: any resource that is not a member of any previous transform job.

Transforms: read, write
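The selection rule for transformJobs (the first predicate that returns true wins) can be sketched as follows. The two jobs shown are simplified stand-ins for illustration, not the defaults from defaultBuildControl.js, and the function name is hypothetical.

```javascript
// Return the vector of transforms for the first predicate that accepts the
// resource; returns undefined if no predicate matches.
function selectTransforms(resource, transformJobs) {
  for (const [predicate, transforms] of transformJobs) {
    if (predicate(resource)) return transforms;
  }
  return undefined;
}

// Illustrative jobs: plain Javascript files, then a catch-all.
const exampleJobs = [
  [
    (resource) => /\.js$/.test(resource.src) && !/\.bcs\.js$/.test(resource.src),
    ["read", "dojoPragmas", "jsTokenize", "jsParse", "has", "write"]
  ],
  [() => true, ["read", "write"]]
];
```

Since the vector is searched in order, a catch-all predicate must come last; any job placed after it would never be selected.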

Build Control Property Reference

A "build control script" (BCS) instructs bdBuild how to discover resources and then what transforms to apply to those discovered resources. A BCS is embodied by a file that contains Javascript code that ultimately returns the value of a Javascript object termed a "build control object" (BCO). The build control object contains various properties that control bdBuild. This section describes how to construct build control scripts and how to specify build control objects within those scripts.

Loosely speaking, build control objects contain two classes of properties:

  • Job control properties: those properties that control the discovery and transform processes. These are the properties used by the typical user.

  • bdBuild configuration properties: those properties that configure which discovery process(es) are used, which transforms are available, and how transforms are applied to resources. These properties are used when changing or extending the operation of bdBuild. bdBuild is delivered with defaults for all of these properties.

Note that, as far as bdBuild is concerned, there's no difference between job control properties and bdBuild configuration properties. But most users will never touch bdBuild configuration properties, so it's convenient to describe them in separate sets.

Job Control Properties

basePath, string, default = the path to the BCS that is being processed

A relative or absolute path that provides the base for many other source path properties when those path properties are given as relative paths:

  • the base of any relative source names given in the files, dirs, and trees properties

  • the base of any relative path given in the location property in a package configuration when resolving source names

  • the location property for the default package (if any)

  • the base of any relative source name given in the compactCssSet property

  • the base of any relative source name given in the replacements property

If basePath itself is given as relative, then it is automatically made absolute by prepending the path to the BCS that provided the basePath value. For example, if a BCS located at /home/rcgill/dev/myProject/build/ex1.bcs.js provided a basePath value of "..", then the basePath would be computed to be /home/rcgill/dev/myProject.

destBasePath, string, default = basePath + (basePathSuffix || "-build").

A relative or absolute path that provides the base for many other destination path properties when those path properties are given as relative paths:

  • the base of any relative destination names given in the files, dirs, and trees properties

  • the base of any relative path given for the destPackageBasePath property

  • the base of any relative destination given in the compactCssSet property

If destBasePath itself is given as relative, then it is automatically made absolute by prepending the basePath.
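The resolution rules for basePath and destBasePath can be sketched with node's path module. The function name is hypothetical, and path.posix is used only so the result is the same on every platform; buildControl.js may perform this resolution differently.

```javascript
const path = require("path");

// Resolve basePath and destBasePath for a build control object (bco) that
// was read from the file bcsFilename.
function resolveBasePaths(bcsFilename, bco) {
  const bcsDir = path.posix.dirname(bcsFilename);
  // A relative basePath is made absolute against the path to the BCS.
  const basePath = path.posix.resolve(bcsDir, bco.basePath || ".");
  // A relative destBasePath is made absolute against basePath; when it is
  // missing, the default is basePath + (basePathSuffix || "-build").
  const destBasePath = bco.destBasePath
    ? path.posix.resolve(basePath, bco.destBasePath)
    : basePath + (bco.basePathSuffix || "-build");
  return { basePath, destBasePath };
}
```

For the example above, a BCS at /home/rcgill/dev/myProject/build/ex1.bcs.js with a basePath of ".." yields the basePath /home/rcgill/dev/myProject and the default destBasePath /home/rcgill/dev/myProject-build.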

destPackageBasePath, string, default= destBasePath + "./packages"

The destination base for all packages where the destination location property is relative.

files, vector of fileItems, default = []

A vector of fileItems that give source file names that point to resources to discover and associated destinations for those discovered resources. A fileItem may be either a string or a pair of strings (a vector of two strings). A string implies a source file name relative to basePath with an associated destination relative to destBasePath. A pair implies a source file name and associated destination file name; basePath and destBasePath are used for bases for relative source and destination names respectively.

dirs, vector of dirItems, default = []

A vector of dirItems that give source directories in which to discover resources and associated destinations for any discovered resources. A dirItem may be either a string or a vector. A string implies a source directory name relative to basePath with an associated destination relative to destBasePath, and all resources that reside in the source directory are discovered. If a vector is given, then the first two items give the source directory name and associated destination directory name; basePath and destBasePath are used for bases for relative source and destination names respectively. Any items after the first two items may either be strings or regular expressions, and these items give patterns to exclude from discovery. String patterns are understood to be globs where "*" matches zero or more of any character and "?" matches any single character. All resources in the source directory that do not match an exclusion pattern are discovered.
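The exclusion patterns can be illustrated by converting each glob to a regular expression. This is a sketch: the function names are hypothetical, and it assumes a glob must match the whole name it is tested against, which is not spelled out here.

```javascript
// Convert a glob to a RegExp: "*" matches zero or more of any character and
// "?" matches any single character; everything else is matched literally.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// True if the filename matches any exclusion pattern (string glob or RegExp).
function isExcluded(filename, patterns) {
  return patterns.some((pattern) =>
    (pattern instanceof RegExp ? pattern : globToRegExp(pattern)).test(filename)
  );
}
```

A dirItem such as ["src", "out", "*.bak", /\.min\.js$/] would then skip backup files and already-minified scripts while discovering everything else in src.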

trees, vector of dirItems, default = []

Same as dirs except that discovery traverses the tree rooted at each source directory.

packages, vector of packageItems, default= []

A vector of packageItems that gives the set of CommonJS packages to discover. A package item may be either a string or a hash of package properties. If a string, then the string is understood to be the package name and implies the following hash of package properties:

{
  name: string-value-given,
  location: "./" + string-value-given,
  lib:"lib",
  main:"main",
  trees:["./" + string-value-given]
}

If a hash, then the following properties may be specified:

name, string, must be specified

The name of the package

location, string, default = "./" + name

The root directory of the package resources.

lib, string, default = "lib"

The root directory of the package AMD modules.

main, string, default = "main"

The name of the package's root AMD module.

packageMap, map:string --> string, default = {}

A map of package names used by the package's AMD modules to package names as configured by the build control object.

paths, map:string --> string, default = {}

A map from path prefixes to replacement path prefixes.

pathTransforms, vector of path transform, default = []

TODOC

files, vector of fileItems, default = []

Same as files property in a build control object except that relative names are relative to the location property.

dirs, vector of dirItems, default = []

Same as dirs property in a build control object except that relative names are relative to the location property.

trees, vector of dirItems, default = [location + "/" + lib]

Same as trees property in a build control object except that relative names are relative to the location property.

A package item specifies where to discover package resources in bdBuild just like the same object specifies how to locate package resources in an AMD loader (e.g., bdLoad). The destination of all resources is computed relative to a destination location property which defaults to destPackageBasePath + "/" + package-name.

layers, map:string --> layerItem, default = {}

Each key names an AMD module that, when written to its destination, will include a bundle of modules (that is, zero or more additional modules). The bundle contents for a particular (key --> layerItem) includes...

  • the module as given by the key, together with its dependency tree, plus...

  • all modules given by the layerItem includes property, and their dependency trees, except that...

  • any module given by the layerItem excludes property and their dependencies are not written

layerItem is a hash with the following properties:

includes, vector of strings (AMD module identifiers), default = []

A list of additional modules which, together with their dependencies, are included in the layer subject to exclusion by excludes.

excludes, vector of strings (AMD module identifiers), default = []

A list of modules which, together with their dependencies, are unconditionally excluded from the layer.

boot, string, default = undefined

A destination to write the bundle other than the AMD module resource. When the bundle is written, it is preceded by the bdLoad loader and loader configuration, and followed by the value of the layerItem bootText property.

bootText, string, default = ""

A string of Javascript code to write at the end of a bootstrap layer. Typically, this is some kind of AMD require application.

loaderConfig, bdLoad configuration object, default = {}

Gives the configuration to provide for the bdLoad bootstrap.

locales, vector of strings (locale identifiers), default = []

Gives the set of locales that should be written when including i18n bundles in any layers.

has, string, default = undefined

Gives the has.js implementation that should be included with the bdLoad configuration.

staticHasFlags, map:string --> {-1, truthy, falsy}, default = see below

Gives a map from has.js feature names to build-time-known values: truthy indicates the feature exists, falsy indicates the feature does not exist, -1 indicates the feature is not known at build time. The -1 value facilitates mixing build control objects as described in the next section. The default staticHasFlags value is contained in https://github.com/altoviso/bdBuild/blob/master/lib/defaultBuildControl.js.

bdBuild Configuration Properties

bdBuild provides defaults for all of these properties in the resource bdBuild/lib/defaultBuildControl.js. See https://github.com/altoviso/bdBuild/blob/master/lib/defaultBuildControl.js for the current value in trunk.

discoveryProcs, vector of string (AMD module identifiers)

Gives the list of discovery procedures to be executed.

gates, vector of [boolean, string, message] triples (synchronized gate?, gate name, gate message)

Gives the list of gates and their properties that control movement through the transform jobs.

transforms, map:transformId --> [string, string] pairs (AMD module identifier, gate name)

Gives the set of transforms and their associated gates that are available to transformJobs.

transformJobs, vector of [predicate, vector of string (transformId)] pairs

Gives a predicate function to test resources against. The first predicate that returns true for a particular resource indicates the set of transforms to apply to that resource.

plugins, map:AMD-plugin-id --> string (AMD module identifier)

Gives the processor to apply to any discovered AMD plugin modules.

Build Control Scripts

bdBuild will accept multiple build control scripts on the command line. If more than one is provided, then each script is processed left to right and the properties found in later scripts are preferred to properties found in earlier scripts on a per-property basis.[2] For example, if the first of two scripts submitted to bdBuild had the value...

// assume this BCS is processed first
{
  basePath:"acmeApp",
  destBasePath:"staging/acmeApp"
}

... and the second has the value...

// assume this BCS is processed second
{
  destBasePath:"/corp-www/apps/acmeApp",
  destPackageBasePath:"/corp-www/packages"
}

Then the effective value is given by...

{
  basePath:"acmeApp",                        // value only in first script
  destBasePath:"/corp-www/apps/acmeApp",     // value in second script preferred to value in first script
  destPackageBasePath:"/corp-www/packages"   // value only in second script
}

A few properties behave differently:

  • Properties in package items contained within the packages and packagePaths properties are preferred on a per-package-property basis. For example, if a later BCS simply adds a trees property to a package configuration, this will not affect other properties for that particular package.

  • The staticHasFlags property is processed on a per-flag basis. Specifying a flag value of -1 indicates the flag should be removed from the staticHasFlags set.
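The mixing rules can be sketched in a few lines. This sketch covers only top-level properties and staticHasFlags; the per-package-property mixing of packages is omitted, and the function name is hypothetical rather than taken from buildControl.js.

```javascript
// Mix a vector of build control objects, left to right.
function mixBuildControl(scripts) {
  const result = {};
  for (const bcs of scripts) {
    for (const [name, value] of Object.entries(bcs)) {
      if (name === "staticHasFlags") {
        // Processed per flag; -1 removes the flag from the set.
        const flags = result.staticHasFlags || (result.staticHasFlags = {});
        for (const [flag, flagValue] of Object.entries(value)) {
          if (flagValue === -1) delete flags[flag];
          else flags[flag] = flagValue;
        }
      } else {
        // The value found in the later script is preferred.
        result[name] = value;
      }
    }
  }
  return result;
}
```

Applied to the two scripts in the example above, this yields the effective value shown there; a later script that sets a flag to -1 returns that feature to "not known at build time".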

I've observed that BCSs and loader configuration objects have much in common. For example the packages and packagePaths properties serve the same purpose and the items they contain share many properties. Recognizing this, bdBuild allows you to specify a loader configuration for use as a BCS. Generally, loader configurations have two forms: applying the loader function require to a configuration object or setting the global variable require to the value of a configuration object before bootstrapping the loader. bdBuild accepts both of these patterns along with the pattern of a script that has the value of an object as we've used in all of the examples. The command line flag used to specify the script instructs bdBuild which pattern to expect as follows:

command line flag --build or -b

Generic Javascript code that has the value of an object. bdBuild will apply Javascript eval to the code contained in the resource after wrapping it with parentheses to obtain the build control object. This allows for a couple of different patterns. Here is an example of just supplying an object:

{
  packages:[{
    name:"myPackage",
    location:"packages/myPackage"
  }],
  // et cetera...
}

And here is an example of supplying an object after some computation:

(function() {
  // any code can go here...

  // assuming the code above initialized the variable result as required...
  return result;
})()
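The reason for wrapping the code in parentheses is that a leading "{" would otherwise be parsed as the start of a block statement rather than an object literal. A minimal sketch of the evaluation, assuming the script text has already been read into a string (the function name is hypothetical):

```javascript
// Evaluate the text of a --build script and return the build control object.
function evaluateBuildScript(text) {
  // The newline before ")" keeps a trailing line comment in the script from
  // swallowing the closing parenthesis.
  return eval("(" + text + "\n)");
}
```

Both patterns shown above work with this wrapping: an object literal evaluates to the object, and an immediately applied function evaluates to whatever it returns.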

command line flag --require or -r

A script that defines the variable require as it would be set to initialize bdLoad or RequireJS. For example,

// any code can go here...

var require= {
  packages:[{
    name:"myPackage",
    location:"packages/myPackage"
  }],
  // et cetera...
}

// any code can go here...
// HOWEVER; the last value of require is used for the BCO

command line flag --loader or -l

A script that configures bdLoad or RequireJS. For example,

// any code can go here...

require({
  packages:[{
    name:"myPackage",
    location:"packages/myPackage"
  }],
  // et cetera...
})

// any code can go here...
// HOWEVER; the last application of require must be to a configuration object
// this is to be used for the BCO

In some cases, it is convenient to include all configuration information within a single script. The problem with this idea is that you will often want a slightly different configuration for development compared to an optimized release. bdBuild provides for this by allowing a single BCS to include two sets of properties, with the second, more significant set residing at the property build. Consider the following BCS:

var require= {
  paths:{
    "i18n":"../../../dojotoolkit/dojo/lib/plugins/i18n",
    "text":"../../../dojotoolkit/dojo/lib/plugins/text"
  },

  packages:[{
    name:"bd",
    location:"../.."
  },{
    name:"dojo",
    location:"../../../dojotoolkit/dojo",
    lib:".",
    main:"lib/main-browser"
  }],

  deps:["main"],

  build:{
    packages:[{
      // since dojo uses the "text!" and "i18n!" plugin, and these are not really in the default package tree
      // we must tell bdBuild to discover them by explicitly asking for them which will cause paths
      // to be inspected
      name:"*",
      modules:{
        i18n:1,
        text:1
      }
    }]
  }
};

The BCS above is equivalent to providing two BCSs, one without the build property, followed by one containing only the properties contained by the build property. Note also that bdLoad will process such a script properly and simply ignore the build property.
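The equivalence described above can be expressed as a small helper that splits one build control object into the sequence of objects it stands for. The function name is hypothetical; this is an illustration of the rule, not the code bdBuild uses.

```javascript
// Split a build control object into [object without build, build object].
// The second element is the more significant one when the two are mixed.
function splitBuildProperty(bcs) {
  const { build, ...rest } = bcs;
  return build ? [rest, build] : [rest];
}
```

Mixing the resulting objects left to right gives the same result as submitting them as two separate scripts on the command line.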

Lastly, if no BCS is provided through a command line argument, bdBuild will attempt to read the file config.js in the current working directory as if the --require command line switch had been specified. bdBuild informs the user by writing a message to the console when this default action is invoked.

Command Line

bdBuild is a node program located at bdBuild/lib/main.js. bdBuild recognizes the following command line flags:

--build, -b, --require, -r, --loader, -l

As described above in the section called “Build Control Scripts”

--check

Process all build control scripts, print out the resulting build control object, and terminate.

--version

Print the current version of bdBuild.

--help

Print help.

--unit-test, --unit-test-param

Used internally by bdBuild for testing.

--javascript-identifier argument, -javascript-identifier argument

javascript-identifier is assumed to designate a build control object property and argument a string value for that property. No check is made to see that the property-value pair is rational since user-defined discovery procedures and transforms may define build control properties/values not known to bdBuild. If any such properties are provided on the command line, they can be visualized as initializing a build control script "on the fly" that has precedence over all other scripts. For example, the command line...

node ../bdBuild/lib/main.js --build myPackage --someProperty someValue

...is equivalent to creating the build control script myQuickie.bcs.js (in the same directory as the myPackage BCS) as follows:

{
  someProperty:"someValue"
}

And then issuing the command...

node ../bdBuild/lib/main.js -b myPackage -b myQuickie
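The "on the fly" build control script can be pictured as the result of a simple pass over the command line. This is a sketch under assumptions: the function name is hypothetical, the list of known flags is copied from this section, and a property value is assumed not to begin with "-". The actual argument processing in bdBuild may differ.

```javascript
// Collect "--name value" and "-name value" pairs that are not flags known
// to bdBuild into an object that acts as the most significant BCS.
function commandLineProperties(argv) {
  const known = new Set([
    "build", "b", "require", "r", "loader", "l",
    "check", "version", "help", "unit-test", "unit-test-param"
  ]);
  const result = {};
  for (let i = 0; i < argv.length; i++) {
    const match = /^--?(.+)$/.exec(argv[i]);
    if (!match) continue;              // an argument that belongs to a known flag
    if (known.has(match[1])) continue; // handled elsewhere
    result[match[1]] = argv[++i];      // the next item is the property value
  }
  return result;
}
```

For the command line above, the result is an object whose only property is someProperty with the value "someValue", which is the content of myQuickie.bcs.js.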



[2] The default build control script located at bdBuild/lib/defaultBuildControl.js is always implied as the left-most, initial value for the build control script.

Recipes

TODO

AMD Module Identifier to Filename Mapping Algorithm

The mapping algorithm used to map AMD module identifiers to filenames in bdBuild is very nearly identical to the algorithm used to map AMD module identifiers to URLs in bdLoad. The only real difference is what is used for the loader configuration variable baseUrl:

  • when resolving a source location, basePath is used.

  • when resolving a destination location, destPackageBasePath is used.

Here is the algorithm (the variable base is used for either basePath or destPackageBasePath, depending upon the application):

Filename Computation for AMD Module Identifiers

  1. The first segment of the module identifier is assumed to be the package name and the remaining segments (if any) the module within that package. The package name is mapped by the packageMap configuration variable for the reference package; if mapped successfully, then the mapped name indicates the target package. If no mapping occurs, and the package name is known to the loader, that name indicates the target package; otherwise, the assumption was wrong and the module identifier is not a member of any package, but rather a member of the "default" package (designated as the set of modules that reside in the tree rooted at basePath), and the default package is the target package.

    If the target package was not the default package and the module name was composed of a single segment (that is, just the package name), then the target module is set to the main configuration variable given by the target package.

  2. The computed filename is set to the location configuration property of the target package concatenated with the lib configuration variable of the target package concatenated with the target module; a "/" is inserted at each point of concatenation.

  3. The pathTransforms mapping is applied to the computed filename.

  4. If the filename computed so far is not absolute, then the value of the base is prepended to the computed filename.

  5. If the filename computed so far does not include a file type, then the suffix ".js" is appended to the computed filename.
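The steps above can be sketched as follows. Several details are assumptions made for illustration: packages is a plain object keyed by package name, every package carries location, lib, and main, the default package contributes no location or lib of its own, and step 3 is omitted because the shape of pathTransforms is not documented here. The function name is hypothetical.

```javascript
const path = require("path");

const own = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Map an AMD module identifier to a filename. base is basePath for sources
// and destPackageBasePath for destinations.
function moduleIdToFilename(mid, referencePackage, packages, base) {
  // Step 1: find the target package and target module.
  const segments = mid.split("/");
  const map = (referencePackage && referencePackage.packageMap) || {};
  const pkgName = own(map, segments[0]) ? map[segments[0]] : segments[0];
  let parts;
  if (own(packages, pkgName)) {
    const pkg = packages[pkgName];
    const target = segments.length > 1 ? segments.slice(1).join("/") : pkg.main;
    parts = [pkg.location, pkg.lib, target]; // step 2
  } else {
    parts = [mid]; // member of the default package
  }
  let filename = parts.filter(Boolean).join("/");
  // Step 3 (pathTransforms) is intentionally omitted.
  // Step 4: make the filename absolute.
  if (!path.posix.isAbsolute(filename)) filename = path.posix.join(base, filename);
  // Step 5: add the default file type.
  if (!/\.[^./]+$/.test(filename)) filename += ".js";
  return filename;
}
```

With a package dojo located at packages/dojo and a base of /src, the identifier dojo resolves to /src/packages/dojo/lib/main.js, dojo/string resolves to /src/packages/dojo/lib/string.js, and an identifier whose first segment names no package, such as app/util, resolves to /src/app/util.js.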

Writing Custom Discovery and Transform Processes

TODOC

Writing a Custom Discovery Process

TODOC

Writing a Custom Transform

TODOC

Design and Implementation

Although highly capable, the bdBuild program design and implementation is also quite simple. The main program control machinery is implemented in bdBuild/lib/main.js. The code contained there loads the AMD loader (bdLoad), which loads the remainder of the program. Once loaded, control is transferred to the module bdBuild/lib/argv, which processes the command line, and then to bdBuild/lib/buildControl, which computes the build control object that controls how bdBuild discovers and transforms resources.

bdBuild/lib/buildControl is a tedious but straightforward module that mixes build control scripts, resolves all relative paths to absolute paths, and fills in missing BCS properties with defaults. When bdBuild isn't doing what you expect, the cause usually lies in an improperly specified property somewhere in one of the build control scripts. One of the best ways to diagnose this problem is to run bdBuild with the command line argument --unit-test dumpbc, which dumps the fully computed build control object. You will see lots of internal properties in this object, but they are all well-named and their semantics obvious.

Once the build control object is computed, all of the transform and plugin modules are loaded by the AMD loader and then the discovery process(es) are started. As each resource is discovered, the discovery process creates a resource object and publishes that object to the function start contained in bdBuild/lib/main. This function ensures that no two resources are attempting to write to the same destination, and then enters the resource in the engine that moves resources through the ordered set of gates. The functions advance and passGate synchronize and move resources through their prescribed set of transforms. When all resources have been moved through all gates, bdBuild prints a message and terminates. If an error occurs along the way, bdBuild allows the process to complete through the current synchronized gate and then terminates.
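The destination check performed when a resource is started can be sketched as follows. The names makeStart and enterEngine are hypothetical, and throwing an Error is an assumption; the actual start function may report the conflict differently.

```javascript
// Build a start function that rejects a second resource aimed at a
// destination that is already taken, then hands the resource to the engine.
function makeStart(enterEngine) {
  const destinations = new Map(); // destination filename -> source filename
  return function start(resource) {
    if (destinations.has(resource.dest)) {
      throw new Error(
        "two resources write to " + resource.dest + ": " +
        destinations.get(resource.dest) + " and " + resource.src
      );
    }
    destinations.set(resource.dest, resource.src);
    enterEngine(resource);
  };
}
```

Catching the conflict at this point means the problem is reported before any transform has run, rather than after one output file has silently overwritten another.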