NAME

a4intro - Introduction to and general information about A4


AUDIENCE

This man page discusses things that you should know about A4 whether you are using it via the a4(1) command or the A4::Ops perl(1) module.

This document assumes that you know all about Perforce and want to get into all the nitty gritty details of A4. A more light-handed document is supplied as a4quickstart.html.


DESCRIPTION

A4 is a source configuration management utility that builds on a somewhat restricted subset of the capabilities of p4(1), and adds the following capabilities:

A4 is not intended to supplant direct access to p4(1) for routine operations. Instead, A4 can be used to simplify certain otherwise nontrivial operations that are often necessary to satisfy certain useful properties, as described in this man page. Furthermore, some p4(1) operations must be undertaken with certain reasonable restrictions, in order to avoid violating those properties.


DEPOT STRUCTURE

The depot is structured as an arbitrarily deep hierarchy of branches. Each branch directory in the depot contains a number of sub-branches, and a directory named top. Each branch's top directory corresponds to the root directory of every client that is based on that branch.

Each client is based on exactly one branch. The client view is some (possibly improper) subset of the files in the branch. The location of each file in the client corresponds exactly to the file's location within the branch. [Yes -- this is a lot more restrictive than what Perforce allows, but as we'll see in GENERATING VIEWS, it's necessary for reproducible results, so we can trust regressions.]

Branch names always end with a forward slash (/). Normally, a branch is located in the //depot depot, in the subdirectory with the same name as the branch name; however, a branch may be located anywhere, including in another depot, provided that no component of the branch's location contains ``top''.
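As a rough illustration of the default layout described above, the following Python sketch maps a file in a client to its depot path. The function name and the example branch are hypothetical, and the sketch assumes that the branch lives at its default location under //depot; it is not A4's actual implementation.

```python
def depot_path(branch, client_relative_path, depot="//depot"):
    """Map a file in a client to its depot path, assuming the default layout.

    branch is an A4 branch name, which always ends with "/".
    """
    if not branch.endswith("/"):
        raise ValueError("branch names always end with a forward slash")
    # The client root corresponds to the branch's "top" directory.
    return f"{depot}/{branch}top/{client_relative_path}"


print(depot_path("proj/dev/", "src/main.c"))
```

A branch that has been placed somewhere other than its default location would need its real location looked up instead of being derived from its name.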

Every branch (except for the TRUNK) has exactly one parent. Normally, a branch's parent's name is obtained by stripping the trailing component from the child branch's name; however, a branch can have any other branch as its parent, provided that the branch ancestry graph is acyclic.

The TRUNK is a pseudo-branch that corresponds to the root of the branch hierarchy. It is referred to by the reserved names ``'' (the empty string) or ``TRUNK''. It has no branch record in the Perforce database.
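The default parent rule and the acyclicity requirement can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that an explicitly assigned parent, if any, is supplied through the parent_of callback; how A4 actually records parents is not shown here.

```python
TRUNK = ""  # reserved name; "TRUNK" is the other accepted spelling


def default_parent(branch):
    """Return the default parent: the name minus its trailing component."""
    if branch in ("", "TRUNK"):
        raise ValueError("the TRUNK has no parent")
    head, _, _ = branch.rstrip("/").rpartition("/")
    return head + "/" if head else TRUNK


def ancestry(branch, parent_of=default_parent):
    """List the branch and its ancestors up to the TRUNK, rejecting cycles."""
    chain = []
    while branch not in ("", "TRUNK"):
        if branch in chain:
            raise ValueError(f"cycle in branch ancestry at {branch!r}")
        chain.append(branch)
        branch = parent_of(branch)
    chain.append(TRUNK)
    return chain


print(ancestry("proj/dev/"))
```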

In addition to the usual Perforce naming restrictions, it is prohibited to end a directory name in whitespace. (In particular, this would make the output of p4 where ambiguous.)


USER-DEFINED FIELDS

A4 stores some of its information in the Perforce database by piggy-backing onto the ``Description:'' field of certain types of records, specifically clients and branches.

You shouldn't delete or modify the affected portion of the ``Description:'' field unless you know what you're doing.


GENERATED FILES

When a client is generated, a .p4env file is created in the client's root directory. It is assumed that the P4CONFIG environment variable is set to .p4env, such that this file causes it to be the default client whenever you run p4(1) or a4(1) from inside the client. Depending on your site configuration, .p4env might also set the Perforce port. See $keep_port in the A4::Config manpage. If your home directory has a .p4passwd file, then .p4env will set the Perforce password to its contents. (Since other users can see your command-line arguments and environment on some systems, this is the only secure way to specify your Perforce password.)

.p4env files are always excluded from every client's view. If you want to customize .p4env, then you should create a .p4props file in the same directory. The contents of this file are prepended to the corresponding .p4env file.
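To make the relationship between .p4props and .p4env concrete, here is a minimal Python sketch of how the generated text might be composed. P4CLIENT, P4PORT and P4PASSWD are standard Perforce settings that a P4CONFIG file may contain, but the function name, the ordering of the generated lines and the exact text that A4 writes are assumptions.

```python
def compose_p4env(client, props_text="", port=None, password=None):
    """Return hypothetical .p4env text; the .p4props contents come first."""
    lines = [f"P4CLIENT={client}"]
    if port is not None:      # only when the site configuration calls for it
        lines.append(f"P4PORT={port}")
    if password is not None:  # e.g. the contents of ~/.p4passwd
        lines.append(f"P4PASSWD={password.strip()}")
    generated = "\n".join(lines) + "\n"
    if props_text and not props_text.endswith("\n"):
        props_text += "\n"
    return props_text + generated


print(compose_p4env("myclient", props_text="P4EDITOR=vi"), end="")
```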

Also, a newly-generated client will have a .p4top symbolic link in every directory, that targets the relative path to the client root directory. This is useful for traditional build systems that expect the installation prefix variable (usually called ``PREFIX'') to be an absolute path. Since .p4top always points to the same directory no matter where you are in a given client, you can use it as the first component of the installation prefix if you want to install inside the client instead of in some global location. This is particularly useful when part of a project depends on an evolving tool.

You (or your build system) can create such links for newly-created directories within an existing client using the maketree subcommand of a4(1).
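The effect of such links can be approximated with the Python sketch below. It is not the implementation of maketree; the function name is hypothetical, and the choice to skip directories that already contain a .p4top entry is an assumption.

```python
import os


def make_p4top_links(client_root):
    """Create a relative .p4top symlink to client_root in every directory."""
    for directory, _subdirs, _files in os.walk(client_root):
        link = os.path.join(directory, ".p4top")
        if os.path.lexists(link):
            continue  # leave existing links (or files) alone
        # For the root itself the target is "."; one level down it is "..".
        os.symlink(os.path.relpath(client_root, directory), link)
```

Because os.walk does not follow symbolic links by default, the links created along the way are never themselves descended into.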

Note that although you can replace a directory containing files under source control with a symbolic link (for example, a link to another storage volume) without confusing Perforce, this will confuse maketree. As a result, each workspace must reside within a single storage volume.


THE CFG SUBTREE

In the root of every client workspace, there is a special subdirectory called CFG. This directory contains the configuration information for all the branches.

The configuration information for a branch mybranch is contained in the directory CFG/mybranch/top/. Note that the configuration information for every branch gets propagated into every other branch through the usual integration mechanisms. That's good because:

Because A4 forces the entire CFG tree to be in-view (see GENERATING VIEWS), it is important to minimize the amount of data in this tree. In particular, programs that are needed by executables in the CFG tree (other than makecfg) should be located elsewhere in the branch's file space. Replicated information should be generated via makecfg or via symbolic links where it makes sense. Programs used by makecfg should be located in CFG/bin and shared by all branches that need them.

For each branch, a number of ``special'' files are defined as described in the following sections. More special files may be defined in the future, so user-defined files and directories should begin with an underscore (_) to avoid collisions.

Except as noted, if a given file is missing for the present client's branch, then the parent branch is checked, and so on. Executables are run with their own directory as the current directory, and with the name of the present client's branch as an argument. Also, the PERL5LIB environment variable is set to point only to the additional library directories used by A4. File lists may contain both inclusionary and exclusionary locations, but the mapping between the depot location and the client location is fixed and inferred. A location beginning with dash (-) is disambiguated from an exclusionary mapping by including a leading slash (/), which is otherwise optional, in the location.
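The rule for telling an exclusionary entry apart from a location that itself begins with a dash can be sketched in Python as follows. The function name is hypothetical, and the handling of an excluded location that begins with a dash (written here as ``-/-name'') is an inference from the rule above rather than documented behaviour.

```python
def parse_location(entry):
    """Split a file-list entry into (excluded, location)."""
    excluded = entry.startswith("-")
    if excluded:
        entry = entry[1:]
    # The leading slash is optional; it only matters for disambiguation.
    if entry.startswith("/"):
        entry = entry[1:]
    return excluded, entry


for entry in ["src/...", "-doc/...", "/-weird/..."]:
    print(entry, parse_location(entry))
```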

makecfg

Build all of the generated files for the given branch's configuration. If another branch's configuration is a prerequisite, then it must arrange for that to be made first. It's a good idea to employ make(1) to avoid repeating work unnecessarily. A4 normally invokes this just after it sync's the CFG tree.

makecfg should normally ignore the branch name argument. If it's invoked on behalf of a sub-branch, then you'll probably still want to build your own configuration only, and expect that any information that was intended to be inherited will be picked up through the normal configuration search process.

files

A list of locations in the branch's default view. Should include anything necessary to run presub and refbuild, as well as anything else of general interest to many users of the branch. If not found for any ancestor branch, then the entire tree is viewed.

See GENERATING VIEWS for more information on files, allfiles, prefiles and postfiles.

allfiles

A list of locations in the branch's integration view. Should include anything relevant to this branch or any of its progeny. Only the present branch is searched. If not found in the present branch, then instead a files file is sought in the present branch and its ancestors.
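The two search rules (ancestor search for most special files, present-branch-only search for allfiles with a fallback to files) might look like this in Python. The function names and the parent_of callback are hypothetical, and the sketch assumes that the TRUNK's configuration, if it has one, would sit in CFG/top/.

```python
import os


def find_special(cfg_root, branch, name, parent_of):
    """Search the branch, then its ancestors, for CFG/<branch>/top/<name>."""
    while True:
        candidate = os.path.join(cfg_root, branch, "top", name)
        if os.path.exists(candidate):
            return candidate
        if branch == "":  # reached the TRUNK without a match
            return None
        branch = parent_of(branch)


def find_integration_list(cfg_root, branch, parent_of):
    """allfiles is sought in the present branch only, then files as usual."""
    own = os.path.join(cfg_root, branch, "top", "allfiles")
    if os.path.exists(own):
        return own
    return find_special(cfg_root, branch, "files", parent_of)
```

A return value of None from find_special corresponds to the ``not found for any ancestor branch'' case, in which the documented default applies.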

prefiles, postfiles

prefiles is a list of locations to be prepended to the view on behalf of this client only. postfiles is a list of locations to be appended to the view on behalf of this client only. Only the present branch is searched. Furthermore, these files are forced out-of-view.

This is useful in order to bring rarely-used files in-view for a particular workspace without burdening all of the branch's clients by default. prefiles also takes effect for the integration view, so adding locations indiscriminately could result in irrelevant files being integrated.

In order to reproduce a condition that cannot be observed with the default view, you should copy prefiles and/or postfiles to some other location in the tree (preferably outside of CFG), such that it becomes visible to other clients.

presub

Check the presubmission criteria for the branch. If the branch has an A4 submit trigger and this does not exit normally with zero status, then the submission is rejected. If not found for any ancestor branch, then the default checks are performed. See REGRESSION.

refbuild

Check the reference build criteria for the branch. If this does not exit normally with zero status, then the reference build will not be made the ``current'' build. If not found for any ancestor branch, then run regr with an argument of 2 following the branch name argument.

regr

Run a regression test on the workspace. Usually called by presub and refbuild. Only the present branch's configuration is searched. If regr is not found, then the regression does nothing and always passes.

See REGRESSION for more information.


GENERATING VIEWS

A4 expects to have control of the ``View:'' fields of branches and clients. If you edit that field manually, then you might get unexpected results.

A4 will always map files in your workspace to the corresponding file in the corresponding branch. Although this is somewhat restrictive, it makes life easier in the long run, because:

A4 determines the set of included locations from the configuration corresponding to the present branch. That's good, because the set of locations that are relevant to the branch is shared by all workspaces, so it is not necessary for every client to specify those locations independently. A given workspace can still add or remove locations from the default view via prefiles and postfiles.

There are three views associated with each workspace:

The default view contains all of the locations that are necessary to test the branch, as well as anything else that is of interest to many users. This is the view that is normally used by workspaces. Files that are not relevant to the branch, as well as files that are of interest only to one or two users, can and should be removed from the view in order to improve performance.

The integration view contains all of the locations that are relevant to the branch, even if they are of interest to only a single user. This view is normally used during integration from another branch. That's because obscure but relevant files should be stored in the branch even if they are not in view by default.

The configuration view contains the locations necessary to bring the configuration information into view, such that other views can be generated. This view is hard-coded into A4 and cannot be modified.

A view that is broader than necessary should still work (by convention, it is a bug if this is not the case). However, a broad view causes most of the accesses to the Perforce database to be slower. It is up to the maintainers of each branch to narrow the view sufficiently to minimize performance issues without bringing any necessary files out-of-view, or having to spend too much time maintaining the view.

To determine the list of locations in the view, A4 looks for the following files in the present branch's configuration (see THE CFG SUBTREE for more information):

prefiles
(not for configuration view)

files
(only for default view)

allfiles
(only for integration view)

postfiles
(only for default view)

It then appends the following forced locations:

        CFG/...
        -.p4env
        -.p4top
        -CFG/prefiles
        -CFG/postfiles
        -.../.p4env
        -.../.p4top
        -CFG/.../prefiles
        -CFG/.../postfiles
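The assembly of a view's location list can be summarized by the following Python sketch. The names are hypothetical, the lists are assumed to have been read already, and the fallbacks described in THE CFG SUBTREE (for example, using files when allfiles is missing) are deliberately left out.

```python
FORCED = [
    "CFG/...",
    "-.p4env", "-.p4top", "-CFG/prefiles", "-CFG/postfiles",
    "-.../.p4env", "-.../.p4top", "-CFG/.../prefiles", "-CFG/.../postfiles",
]

# Which configuration files contribute to which view, in order.
SOURCES = {
    "default": ["prefiles", "files", "postfiles"],
    "integration": ["prefiles", "allfiles"],
    "configuration": [],
}


def view_locations(kind, lists):
    """lists maps a file name to its already-read list of locations."""
    locations = []
    for name in SOURCES[kind]:
        locations.extend(lists.get(name, []))
    return locations + FORCED


lists = {"prefiles": ["rare/..."], "files": ["src/..."], "allfiles": ["..."]}
print(view_locations("integration", lists)[:3])
```

Since the forced locations come last, they take precedence over anything that the configuration files specify.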


REGRESSION

Each branch may have a submission gate, implemented as a standard submit trigger. Passing the submission gate, if there is one, is a condition for the acceptance of a submission to the branch.

The main reason for having a submission gate is that it prevents grossly malfunctional changes from ever getting into the branch. Once in the branch, a grossly malfunctional change can be very costly, because anybody who sync's to the branch (which may be required in order to make a submission) becomes unable to make progress until the cause can be isolated, resolved and distributed. The overall effect of this is that a few careless individuals can adversely affect the entire project unless countermeasures are taken.

If it is necessary to record or share changes that are not expected to be functional, then a development sub-branch should be used.

Submit Locks

The submit subcommand of a4(1) obtains an exclusive submit lock on the branch from the a4 server. If successful, it then makes a submission request to the p4 server.

If the branch is gated, then the p4 server will run the standard submit trigger. The trigger verifies that the submitting user owns the submit lock, logs into the client machine, and runs a4 presub as the submitting user by means of the SUID script that the user created by running a4 setup.

If a4 presub succeeds, then the SUID script will print ``exit 0'', signifying to the trigger that the submission check passed. Otherwise, the SUID script will print something else as the last line, and the trigger will fail, resulting in the submission being rejected. Either way, a4 submit will then release the submit lock.
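The pass/fail convention on the script's last line of output can be sketched in Python as follows. The function name is hypothetical, and whether the real trigger tolerates surrounding whitespace is an assumption.

```python
def presub_passed(script_output):
    """True if the last output line is 'exit 0', ignoring surrounding spaces."""
    lines = script_output.splitlines()
    return bool(lines) and lines[-1].strip() == "exit 0"


print(presub_passed("running regression\nexit 0\n"))
```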

The advantages of using an a4 server instead of a filesystem to keep track of submit locks are that locks cannot be orphaned, even if there is a system failure, and that not all client hosts need to share a filesystem. The a4 server is also useful for other things (see LOCKING-TYPE FILES).

presub

The presub executable (see THE CFG SUBTREE) is invoked by a4 presub, which is invoked by the standard submit trigger, if any. By default, it checks conditions 1 through 4 below and then runs regr as described in item 5:

  1. All files (unopened files in particular) in the workspace are up-to-date with respect to the latest versions in the depot.

  2. All opened files in the workspace are in the submission change list.

  3. All open locking-type files in the workspace are either not open on any other branch, or are authorized by the owner of the branch on which the merge would take place. (TBD)

  4. If the branch is designated fully-merging with its parent, then each file whose last pending forward integration is being submitted has contents and type that match the parent branch exactly. Ditto for integrations from fully-merging child branches. (TBD)

  5. Run regr with the -e option and an argument of 1 following the branch name argument.

regr

The regr executable in the configuration directory (see THE CFG SUBTREE) is usually responsible for testing the contents of a workspace according to the requirements of its associated branch. regr can be invoked by presub, refbuild, or explicitly by the user.

regr should not assume that any non-source files exist, nor should it assume that any existing non-source files are up-to-date unless so indicated by the modification timestamps, nor that existing source files are write protected. It's a good idea to employ make(1) or a similar tool to avoid repeating work (including testing) unnecessarily, but care must be taken to consider all of the possible dependencies.

regr should not modify the source configuration (e.g. via p4 sync) or the source control database (e.g. via p4 submit). Its sole purpose is to determine whether the set of source files as they presently appear in the given workspace meet some criteria, along the way generating (and usually leaving behind) whatever non-source files are necessary to do that.

With the -e option, it should (if possible) exit with a nonzero status if there is any nontrivial work to be done in order to complete the regression. (This is useful to prevent the branch from being submit-locked for long periods of time. That is, the submitter is expected to run and pass the regression before submitting, and then the submit trigger will merely check that this has been done.)

The second argument (following the branch name) should be used to indicate the thoroughness of the regression. There are no strict rules on how to interpret it, and your interpretation should vary with the proximity to release (both temporally and in terms of which branch you're on), but here's a rough guideline:

  0. Minimal regression. Check that compilation is possible. Perhaps some simple liveness tests.

  1. Submit regression. Check that all deliverables compile. Get as much test coverage as possible in 10-60 minutes.

  2. Reference regression. Maximize test coverage over a period of 8-16 hours.

  3. Release regression. Maximize test coverage over a period of 2-5 days. Pseudo-randomized testing is usually appropriate.
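A regr executable's handling of the -e option and the thoroughness argument might be structured as in the Python sketch below. The function names, the notion of ``pending steps'' and the run_step callback are all hypothetical, and the sketch assumes that both the branch name and the thoroughness level are always supplied.

```python
import argparse


def parse_regr_args(argv):
    """Parse: regr [-e] <branch> <level>."""
    parser = argparse.ArgumentParser(prog="regr")
    parser.add_argument("-e", dest="check_only", action="store_true",
                        help="fail if nontrivial work remains to be done")
    parser.add_argument("branch")
    parser.add_argument("level", type=int, help="thoroughness of the regression")
    return parser.parse_args(argv)


def regr_exit_status(args, pending_steps, run_step):
    """Return the exit status; pending_steps are the steps not yet up to date."""
    if args.check_only and pending_steps:
        return 1  # -e: work remains, so report failure without starting it
    for step in pending_steps:
        if not run_step(step, args.level):
            return 1
    return 0
```

In practice the list of pending steps would come from a dependency-tracking tool such as make(1) rather than being passed in by hand.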

The submit and reference regressions (typically thoroughness levels 1 and 2, respectively) should always be completely deterministic in terms of any material results. See CONFIGURATION CONVENTIONS. In particular, every randomized test in those regressions should be converted to a pseudo-randomized test with a fixed seed. However, tests not involved in those regressions may use a truly random seed, provided that the seed is recorded and that the test can also be run with a specified seed in order to materially reproduce the results.
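The seed handling described above can be sketched in Python as follows. The function name and the logging callback are hypothetical; the point is only that the seed is recorded before the test runs and that a specified seed reproduces the same sequence.

```python
import random


def run_randomized_test(test, seed=None, log=print):
    """Run test(rng) with a recorded seed so that the run can be reproduced."""
    if seed is None:
        # Truly random seed: acceptable only outside the deterministic regressions.
        seed = random.SystemRandom().getrandbits(32)
    log(f"seed={seed}")  # record the seed before running
    return test(random.Random(seed))


def sample(rng):
    return [rng.randint(0, 99) for _ in range(5)]


# A fixed seed, as the submit and reference regressions require.
assert run_randomized_test(sample, seed=1234) == run_randomized_test(sample, seed=1234)
```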


REFERENCE BUILDS

A reference build is a clean build of some source configuration under a default environment. Reference builds are expensive to store for long periods of time because they contain generated files; however, they are very useful for the following reasons:

The last advantage is particularly important. A very large project might take days to build, whereas the resulting data might take only minutes to copy over a LAN. [Ideally, even that could be avoided by using symbolic links to the reference build. However, that places rather onerous requirements on the build system to know when to explode links, so it's not recommended.]

Reference regression

All changes that are in a reference build are available via p4 sync, so nobody should be waiting for the reference build with bated breath. Therefore, it's usually a good idea to have reference builds undergo a more thorough regression, typically overnight. That way, a user who starts with the same source configuration as a reference build will have a pretty good idea of the initial level of quality, which can be quite valuable.

If necessary, reference builds can be pipelined in order to partially compensate for the thoroughness of their regression.

The canonical location of refbuilds is determined by your site configuration. See %refbuild_map in the A4::Config manpage and platform2pdir in the A4::Config manpage.


CONFIGURATION CONVENTIONS

It is an error for the default view (as defined by the files file) not to include a file that is needed in order to run presub or refbuild (or if they don't exist for any ancestor branch, their default counterparts that invoke regr).

It is an error for presub or refbuild (or their default counterparts) to have material results that vary nondeterministically or due to reasonable variations in the environment. The exit status is specifically deemed material. Generation timestamps are specifically deemed immaterial. Running on an unsupported platform, setting environment variables to misleading values, or using a umask including a bit in 0711, is specifically deemed unreasonable.

It is an error for any file result to be tied to a particular workspace, such that it cannot be copied verbatim into the same location in a different workspace on the same platform without undesirable consequences. (However, one shouldn't do that unless it is certain that a materially identical file would have been generated, or else the build system will probably get confused.)

It is an error for the integration view (as defined by the allfiles file) not to include anything included by the files file, or anything included by the allfiles file of any of its subordinate branches.

It is an error for any executable to rely on the nonexistence of an out-of-view file, even if the branch's default view does not include the file. [That's because it should be possible to bring more files into view via prefiles and/or postfiles without breaking things.]


USING BRANCHES

Branches, even the somewhat restricted style of branches supported by A4, can be used to overcome a number of otherwise intractable development obstacles. A complete discussion of branching is beyond the scope of this document, but we'll touch on a couple of examples here.

Pipelined Projects

In order to deploy product generations more frequently, it is sometimes necessary to begin development on the following generation before the current generation is released. In order to do this, the development of the current project must be isolated from the following project in order to avoid incorporating inappropriate or malfunctional changes shortly before release. However, this is problematic because the following project is likely to fail unless the appropriate improvements from the current project are incorporated. Detecting and incorporating such improvements manually is extremely tedious and error-prone.

A good solution is to have the projects explicitly share whatever data should be shared, for example, via symbolic links or by generating slightly different files from a common base. (By convention, a relative symbolic link should not traverse outside its workspace, and an absolute symbolic link should point to stable data only, because otherwise reproducibility is jeopardized.) In order to temporarily isolate changes among projects, such that the current project is not destabilized and the following project can proceed without the burden of a lengthy regression, each project has its own branch whose parent is the TRUNK. Changes are propagated at controlled intervals by integrating through the TRUNK, and most incompatible changes are automatically detected and rejected along the way by regressions.

Development Branches

Because a project branch usually has a regression, a change that requires submissions from multiple contributors in order to reach a coherent state is problematic, because no single contributor can successfully submit.

This is best addressed by making such changes on a development sub-branch whose parent is the project branch. The development branch may have either a limited regression or no regression at all. When the development branch reaches a coherent state, it can be integrated back into the project branch.

The Tributary Flow Discipline

The term ``Tributary Flow Discipline'' was (AFAIK) coined by Laura Wingerd of Perforce. In short, it means that there should not be multiple integration paths between any two files in the depot. In terms of A4, it means that:

  1. Integration between the same file location in two different branches should always be either from parent to child branch (``forward integration'') or from child to parent branch (``reverse integration''), and

  2. Integration between a given pair of different file locations should always occur within the same branch every time.

We further suggest that forward integrations be performed in-order, and that reverse integrations be performed only when the child branch already contains all of the parent branch's changes. (By convention, it is an error to reject a change from the parent branch when integrating to a child branch that is designated fully-merging.)
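The first rule of the discipline can be checked mechanically, as in the Python sketch below. The function name and the parent_of callback are hypothetical; the sketch only classifies an integration of the same file location between two branches and does not address ordering.

```python
def integration_kind(source, target, parent_of):
    """Classify an integration of one file location between two branches."""
    if parent_of(target) == source:
        return "forward"  # parent -> child
    if parent_of(source) == target:
        return "reverse"  # child -> parent
    raise ValueError(f"{source!r} and {target!r} are not parent and child")


parents = {"proj/dev/": "proj/", "proj/": ""}
print(integration_kind("proj/", "proj/dev/", parents.get))
```

An integration that skips a generation, such as one directly between a development sub-branch and the TRUNK, raises an error here because it would create a second integration path.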

The advantage of following this discipline is that the basis of merging (that is, the common starting point of parallel edits) is always clear. This means that changes can't get lost in the shuffle, causing branches to get persistently out-of-sync.

Unfortunately, there exist scenarios under which it is not feasible to adhere to this discipline. When that happens, you'll need to exercise a certain degree of caution, and expect that some amount of manual clean-up may be required.

Inter-file Branching

As far as Perforce is concerned, all branching is between different files in the depot. However, since A4 formalizes the distinction between the branch location and the file location, there is a semantic distinction between intra-file and inter-file branching. All branches created by A4 (upon which all A4 clients are based) are intra-file branches, because they map between identical file locations on different branch locations.

It is also sometimes necessary to use p4 integrate directly to branch files to a different location on the same A4 branch. (This is better than copying and p4 add'ing, but you should still exercise restraint here. It's usually better to share. See Pipelined Projects.)

Inter-file branching is essentially ad-hoc, but in order to conform to the Tributary Flow Discipline you'll need to always do the inter-file integration between a given pair of files on the same A4 branch. In order to avoid having to edit integrations before submitting (which tends to leave the integration history in a misleading state), that branch should be a non-gated branch. See Development Branches.

If the branch mapping is complex and/or frequently used, then you should consider storing the mapping in the Perforce database using p4 branch. This is also a way (albeit a sneaky way) to enforce integrating on the same A4 branch every time.


LOCKING-TYPE FILES

[TBD: This section needs work. None of this is implemented yet.]

Locking-type files are (by definition) files that are automatically locked when they are opened. Their Perforce type contains the +l (that's the letter 'l') modifier.

Locking a file is usually counter-productive, because it prevents multiple clients from making independent changes on the same file, which is a normal part of the development process. However, if there is no means for automatically merging independent changes into the file, then it typically makes more sense to lock the file than to manually merge an unbounded array of changes together after the fact.

Perforce already prevents locking-type files from being opened on multiple clients. However, it has no mechanism for preventing a given unmergeable file from being opened simultaneously on different branches. This is problematic if the path between those branches is fully-merging, because at some point you'll need to merge.

A4 addresses this problem by providing the a4 edit operation (TBD). This operation is a variant of p4 edit in which edited locking-type files are registered with the a4 server. An attempt to edit a file that has already been registered will fail unless either:

  1. All of the previously registered edits have already been sync'ed into the version on the client, or

  2. Authorization to edit has been granted to the client in question by the owner of every branch on which any unsync'ed registered edits would need to be merged (considering only reverse integration).

Note that the edit of a non-locking-type file can still fail if the same file is registered as locking-type on a different branch.

The default submit trigger will attempt to register the edit of all submitted revisions to locking-type files, and will fail if it can't. See presub. This prevents bypassing the registration by using p4 edit directly.


SEE ALSO

a4(1), A4, the A4::Ops manpage