Top process, top tools, shame about the configuration management.

Free Your XP IDE with Coarse Grained SCM

The Role of the IDE

Modern IDEs, and in particular Eclipse and IDEA, are getting very powerful. The feature set just keeps on growing, and the tools make it harder to introduce errors. Great! But there is one area where the growth of functionality and the usage of tools are very much constrained, and that is SCM (Software Configuration Management).

The current set of SCM tools and their IDE integrations are quite limiting. The functionality that currently exists covers renaming files (while preserving the history) and version differencing (which includes patching). For small changes and small source trees there is no problem. But ask anyone who has done a major refactoring exercise using these integrations and they will tell you that you need to be very careful if you want to maintain any kind of traceability, and that most merge tools are useless. Language-aware SCM systems are one way of improving things, but only the merge tools really exist and the whole solution is still experimental (tell me if I am wrong, John).

So if you follow the rules of XP and Refactor Mercilessly, then you are going to have to take the hit and lose traceability, right? No, not if you change the granularity of your unit of source control.

The Theory

Martin Fowler defines refactoring thus: 'Refactoring is making changes to a body of code in order to improve its internal structure, without changing its external behavior'. So if you take this notion of a 'body of code' and make that your unit of source control, then you can refactor to your heart's/budget's/IDE's content without losing traceability.

The problem is that most SCM systems use the file as the basic unit of control. For this purpose that is too fine grained: firstly, because you lose the history if a name change is untraceable, and secondly, because the overhead of doing a fine-grained merge is high (network access, storage size etc.).

So, in theory, you control your source at the level of the component. This sits very well with CBD (Component Based Development), because when you refactor you should only need to update one entry in your repository. For a typical component you might have 4 units of control:

  1. The component public interfaces. Optional: you may want to keep these individually controlled at a fine-grained level, but then it gets harder to identify and maintain build dependencies.
  2. The component implementation.
  3. The component test suite. You may want to split this into different units for the different types of test.
  4. Documentation. API, user guides etc.

If you follow the Blueprints about project structure then you can also see how this might sit very nicely with a deployment scheme.

Your unit of source control is therefore an archive file.
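
As a rough sketch of what this could look like in practice, the following Python packs each unit of a component into its own tar archive. The directory names (interfaces, impl, test, doc) and the function name pack_component are assumptions made for illustration; they are not taken from any particular project layout or SCM tool.

```python
import tarfile
from pathlib import Path

# Hypothetical layout: one subdirectory per unit of control.
UNITS = ("interfaces", "impl", "test", "doc")


def pack_component(component_dir, out_dir):
    """Pack each unit subdirectory of a component into its own tar archive.

    Returns the list of archive paths that were written.
    """
    component_dir = Path(component_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for unit in UNITS:
        unit_dir = component_dir / unit
        if not unit_dir.is_dir():
            continue  # a unit is optional, e.g. no separate interfaces
        archive = out_dir / f"{component_dir.name}-{unit}.tar"
        with tarfile.open(archive, "w") as tar:
            # Sort for a stable member order between revisions.
            for path in sorted(unit_dir.rglob("*")):
                if path.is_file():
                    name = path.relative_to(component_dir).as_posix()
                    tar.add(path, arcname=name)
        archives.append(archive)
    return archives
```

Each archive written this way would then be checked in as a single controlled element, so a refactoring inside one unit shows up as one new revision of one element.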

Rollback a Moment

Anyone who knows the inner workings of a VC system will be screaming 'are you quite mad?' at this point. Why? Because zip or tar archives are considered to be binary files by most SCM storage media, and therefore they usually store the 'whole' thing at every revision. It also means that to merge parallel changes you need a 'pro' merge tool like ediff that can handle zip/tar archives.

But it can work...

In Practice

Most SCM tools do suffer from the inability to handle zip/tar archives directly. You should check with your vendor or do some tests to ensure that archives can be merged, which also implies that incremental storage is possible (but not always used).

If your SCM tool does not support using zip/tar archives, then try it with a shar archive. This (ancient) archive format is designed for emailing trees of files, and creates a self-extracting Unix shell script, which is of course text. Even ClearCase merge should be able to cope with that. And Cygwin has a sharutils package for Windows environments.
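
To illustrate why a text archive helps, here is a minimal Python sketch. It does not produce real shar output: text_bundle is a hypothetical stand-in that writes one marker line per file followed by that file's lines. The point is only that once the whole unit is plain text, an ordinary line-based diff (here the standard difflib module) can isolate a change inside it.

```python
import difflib


def text_bundle(files):
    """Join a {path: text} mapping into one list of plain-text lines.

    This is NOT the real shar format, just a stand-in with one marker
    line per file so that line-based tools can work on the result.
    """
    lines = []
    for path in sorted(files):
        lines.append(f"=== {path} ===")
        lines.extend(files[path].splitlines())
    return lines


def bundle_diff(old_files, new_files):
    """Return the added and removed lines between two bundle revisions."""
    diff = list(difflib.unified_diff(
        text_bundle(old_files), text_bundle(new_files),
        fromfile="old", tofile="new", lineterm="", n=0,
    ))
    # Skip the two file header lines, then keep only +/- content lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


old = {"A.java": "class A {\n  int f() { return 1; }\n}", "B.java": "class B {}"}
new = {"A.java": "class A {\n  int g() { return 1; }\n}", "B.java": "class B {}"}
for line in bundle_diff(old, new):
    print(line)
```

Only the renamed method shows up in the diff; the untouched file in the same bundle contributes nothing, which is the behaviour a text merge tool relies on.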

One of the big plus points is that changes are consistent between elements of the same unit. So if you have some closely coupled sources then there is no division between the checkin of one half and the checkin of the other half.

An unexpected bonus of using archive files as the unit of source control is in the build environment. Because the archive preserves the timestamps of the files, Ant and Make can be set up to do incremental rebuilds more reliably. This applies not only to the implementation unit but also to the test units.
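
The timestamp point can be checked with a small experiment. The sketch below, which assumes Python's standard tarfile module as the packing tool (the function name roundtrip_mtime is made up for this example), gives a file a known modification time, packs it, unpacks it into a different directory, and reports the modification time of the unpacked copy.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def roundtrip_mtime(mtime):
    """Pack a file with a known modification time, unpack it elsewhere,
    and return the modification time of the unpacked copy."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "src"
        src.mkdir()
        source_file = src / "Foo.java"
        source_file.write_text("class Foo {}\n")
        os.utime(source_file, (mtime, mtime))  # set atime and mtime

        archive = tmp / "impl.tar"
        with tarfile.open(archive, "w") as tar:
            tar.add(source_file, arcname="Foo.java")

        out = tmp / "out"
        out.mkdir()
        with tarfile.open(archive) as tar:
            tar.extractall(out)  # tarfile restores each member's mtime
        return int((out / "Foo.java").stat().st_mtime)
```

If the returned value matches the one passed in, a timestamp-driven build tool sees the unpacked file as unchanged. Not every unpacker behaves this way (Python's zipfile.extractall, for instance, does not restore modification times), so it is worth running a check like this against whatever archive tool you settle on.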

Conclusion

If you do XP properly then you need to empower your developers to refactor. So that refactoring does not eat your budget, you need to make it quick and easy. Good IDEs like Eclipse and IntelliJ let you make sweeping changes quickly and safely, but you don't want to bog down the developers with hours of merging and SCM admin. By increasing the size of your controlled granule you get the following benefits:

  • Consistent checkins.
  • More traceable refactoring.
  • Smaller number of controlled units.
  • Better abstraction of interface and implementation for CBD.
  • Simpler build process: unpack source, build, pack deployable.
  • Higher level component relationships.
  • Better support for incremental builds.
  • Faster refactoring, because the tools do not need to keep checking if things are checked out and checking them out if they are not.

Copyright © Hugh Reid, Creative Commons License
This work is licensed under a Creative Commons License.