I have been working with ClearCase since 1994 and have become very familiar with its problems and shortcomings. I am using this page to accumulate a list of what is wrong, broken, or sub-optimal with ClearCase. This page has been written gradually over several years, often when I was in a bad mood after running into a problem. There are a number of good features of ClearCase which are not included in this page, but that information is readily available from IBM marketing.
Update: I attended the IBM Rational User's conference in Jun 2007, and it appears that some of these problems are finally getting addressed. Version 8.0 should be sweet. I just hope I can hold out until mid 2009.
If I had a nickel for every time someone complained to me about ClearCase performance I could have retired by now. The network architecture of ClearCase assumes that all users will be accessing the vob server via a high-speed LAN; that is, most ClearCase operations require a huge number of round-trips between the vob server and the client.
I did some rough measurements of the packets exchanged during common operations and found that a simple "desc" operation takes over 100 round-trips, a "checkout" takes over 500 round-trips, and a "checkin" requires over 1000 round-trips.
I also did a comparison of creating a snapshot view and doing an initial checkout from SVN of an identical source tree. Subversion took about 49 round trips, but ClearCase took 117-144. Because of this difference, it took 19 seconds to pull the source with Subversion from Google Code (through an https proxy), but 30 seconds to pull it with ClearCase from a neighboring site over the intranet.
Clearly, even the slightest increase in latency between these hosts will mean a huge performance degradation. According to [It's the Latency, Stupid] the theoretical minimum latency between machines on opposite shores of the USA is 42ms; in Siebel it seems to be about 62ms. With over 1000 round-trips per checkin, that translates to a minimum checkin time of 62 seconds, and that does not account for any processing time on any of the involved machines.
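The arithmetic behind that estimate is easy to sketch. The snippet below is only a back-of-the-envelope illustration: it multiplies the approximate round-trip counts I measured by a latency figure and ignores processing time entirely. The function name and the 0.5ms LAN latency are assumptions made up for the comparison, not measurements.

```python
# Rough lower bound on operation time: round trips x latency per round trip.
# Round-trip counts are the approximate measurements quoted in the text;
# the function name and the LAN latency are illustrative assumptions.

ROUND_TRIPS = {"desc": 100, "checkout": 500, "checkin": 1000}


def min_seconds(round_trips: int, latency_ms: float) -> float:
    """Lower bound in seconds, ignoring all processing time."""
    return round_trips * latency_ms / 1000.0


if __name__ == "__main__":
    for op, trips in ROUND_TRIPS.items():
        lan = min_seconds(trips, 0.5)  # assumed 0.5 ms LAN latency
        wan = min_seconds(trips, 62)   # 62 ms cross-country figure from the text
        print(f"{op:9s} LAN >= {lan:5.2f}s   WAN >= {wan:5.1f}s")
```

Since the measured counts are all "over" the figures used here, the real numbers can only be worse.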
Both albd and the lock manager are single-threaded. This means that for a large user population you must have multiple servers in order to get reasonable performance. Update: it appears that the lock manager has been fixed in ClearCase 7.0.
Access control is very limited. It uses the old Unix model: user/group/other. If you have a vob which must be restricted to the members of two different groups, you will be in trouble. The suggestion always given by IBM is to create different regions for different user populations, but that suffers from the same multi-group issue, not to mention that a machine's region can be changed.
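To make the two-group problem concrete, here is a minimal sketch of the classic user/group/other check. It models only the generic Unix permission model described above, not ClearCase's actual implementation, and every name in it (the `Resource` class, the users, the groups) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    owner: str
    group: str  # exactly one group slot, as in the classic Unix model
    mode: int   # permission bits, e.g. 0o770


def can_read(res: Resource, user: str, user_groups: set) -> bool:
    """Classic check: owner bits first, then group bits, then other bits."""
    if user == res.owner:
        return bool(res.mode & 0o400)
    if res.group in user_groups:
        return bool(res.mode & 0o040)
    return bool(res.mode & 0o004)


# Hypothetical vob that should be readable by team_a and team_b only.
vob = Resource(owner="vobadm", group="team_a", mode=0o770)
alice_ok = can_read(vob, "alice", {"team_a"})   # member of the one group slot
bob_ok = can_read(vob, "bob", {"team_b"})       # second group has no slot

# The only way to let team_b in is to open the "other" bits to everyone.
open_vob = Resource(owner="vobadm", group="team_a", mode=0o774)
bob_open = can_read(open_vob, "bob", {"team_b"})
outsider_open = can_read(open_vob, "mallory", set())
```

With a single group slot, admitting the second team means admitting everybody, which is exactly the trouble.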
I implemented rudimentary access control by applying an ACL on the vob storage directory. This prevents Windows users from mounting the vobs. Unfortunately, since vob mounting on Unix is done by root, those ACLs are ignored.
Update: Version 8.0 should include ACLs! Version 7.0.1 has a group to region mapping mechanism which is a reasonable stop-gap for CCRC until then.
In one sense this is the greatest feature of ClearCase: creation of a view (a.k.a. workspace) is a constant-time operation, i.e. creating a view for a 1 MB source tree takes the same amount of time as for a 1 TB source tree. Most source control systems require you to have a copy of every file on your local disk, which can be prohibitive for large source trees, both in terms of time and space.
But here's the rub: this means ClearCase lets you avoid careful segmentation/componentization of a product; instead, developers can throw everything into one big source tree. But who cares? Since dynamic views are so cheap, it doesn't matter, right? Wrong! When the source tree becomes so big that snapshots are no longer possible there are big downsides:
The ClearCase Web interface has been included with the product for many years and is still severely limited. One of these limitations is that interactive triggers will not work. It seems like this would simply have been a matter of making "clearprompt" understand that it is being run via the web interface and interoperate with it. But they didn't bother with that (see the "Triggers" section for further criticism of clearprompt). Upon testing with our extensive set of triggers (only one of which is "interactive"), we find that "describe" does not work, so their documentation is dead wrong: it's not "interactive" triggers that won't work, but triggers that call almost any external ClearCase command. I know that numerous companies use triggers for policy enforcement; to throw all those out the window in order to use the ClearCase Web interface is absurd.
Update: Version 7.0.1 seems to fix this so that almost all triggers work (those that modify the source file are said not to work). Even "clearprompt" seems to work.
Writing triggers is a major undertaking. Making complex triggers work is very tricky, time-consuming work for a variety of reasons:
I took the second option, and wrote an elaborate trigger infrastructure to work around all the platform foibles and perl anachronisms. It's about 4000 lines of perl (including perldoc).
Displaying good error messages is very difficult since the UI on Windows loses the output generated by the triggers, as does the web interface. Therefore I took to using "clearprompt", but it is very ill suited for displaying long messages: the text ends up wrapped in odd ways and is often chopped off. Furthermore, you can't select text from it (say, for a URL).
Oh, and I discovered a serious problem many years ago. When you do a triggerable action, ClearCase searches for all applicable triggers and builds a list, during which the vob is in a semi-locked state. If you have hundreds of triggers, this can cause all kinds of problems. Admittedly, it was not smart to have that many triggers, and it was easily fixed.
Also, see Web Interface section above.
In pre-MultiSite days, if you had development at multiple sites, someone was going to be stuck accessing a vob via a WAN, which is unacceptably slow (see performance section above). MultiSite promised to fix that by allowing vobs to be replicated between sites, such that each site would have local copies of each vob. It sounded wonderful, and my employer at the time (Informix) was lobbying hard for this product and was one of the first to deploy it.
Sadly, there was a hitch: mastership. MultiSite makes a key assumption:
In all my years I have never seen such a situation, and over the years teams have become more widely distributed. As such, "mastership" was troublesome for administrators and confusing for users.
In order to mitigate this, explicit mastership was introduced (in v3, I think). So now mastership of a branch could be moved around on different files. This is an improvement given the following assumption:
Strike two. There are always files that multiple teams need to modify. Furthermore, this sort of mastership is confusing.
Next, request mastership was introduced, which allowed users to request mastership for a given branch or branch instance. This seems like a good idea, but there are several problems:
Here's a different problem: When packets are being imported each action has to be replayed. Normally this is quick... but if the packet contains 50,000 mklabel commands, your MultiSite queues become jammed.
And another issue: The entire vob database and source pools are replicated, even though it is rare for a remote site to use more than a few branches/versions. 90% of what's being replicated is of no interest to a given site. As I understand it, Perforce has a better replication strategy where local replicas simply cache what is used locally, which would be a much smarter way of doing things.
There is no formal relationship between a branch and its base point; that key bit of information is in the config spec. So, given a branch name, there is no way to find out the base of the branch without asking someone. Guessing is a sure way to run into trouble.
Now, this is actually a feature since it means that the base point can be changed, which is a great optimization of the merge process. For example, given the following branch structure:
               dev -------o
              /
main --------o----------o
             C1         C2
So, this means the "dev" branch was created based on the "C1" checkpoint. Let's assume that 100 files have been changed on the "dev" branch, but on main 1000 files have been changed between "C1" and "C2". If you do a merge from "C2" to the "dev" branch (which seems an intuitive way of rebasing) you will bring 1000 files into your "dev" branch. This means that another 1000 files will need to be merged from now on. However, if you first change your config spec to base "dev" on "C2" and then do a merge from "C2", you have done the same thing except you will only merge files which have been changed on both branches (which will be 100 or less).
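The difference between the two approaches comes down to set arithmetic. The following is a minimal sketch under simplifying assumptions: files are just names in a set, the function name is made up, and the overlap of 40 files is an arbitrary value chosen for illustration. It does not model how ClearCase actually selects versions.

```python
def files_to_merge(dev_changed: set, main_changed: set, rebase_first: bool) -> set:
    """Files that need a merge when bringing C2 into the dev branch.

    rebase_first=False: merge from C2 while still based on C1, so every
    file changed on main between C1 and C2 is pulled onto dev.
    rebase_first=True: the config spec already selects C2 for files that
    were never touched on dev, so only files changed on both need merging.
    """
    if rebase_first:
        return dev_changed & main_changed
    return set(main_changed)


# 1000 files changed on main between C1 and C2 (hypothetical names).
main_changed = {f"file{i}" for i in range(1000)}
# 100 files changed on dev; 40 of them also changed on main (assumed overlap).
dev_changed = {f"file{i}" for i in range(960, 1060)}

naive = files_to_merge(dev_changed, main_changed, rebase_first=False)
rebased = files_to_merge(dev_changed, main_changed, rebase_first=True)
print(len(naive), len(rebased))
```

Whatever the real overlap is, the rebased count can never exceed the number of files changed on "dev".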
While the merge tools with ClearCase are some of the best I have seen, there are several key shortcomings:
Labels are a linear-time operation, that is, the time taken is proportional to the number of elements being labeled. The fastest labeling rate I ever saw was about 25 files per second. For small source trees this is irrelevant, but for large ones it is insanely slow (see the Dynamic View section about large source trees). This can be mitigated to a certain degree by running the mklabel commands in parallel.
Furthermore, in a MultiSite environment, the update packets containing these mklabel commands clog things up since MultiSite replays these events at about the same pace they took to run in the first place. This clog can be made worse by labeling in parallel as suggested above.
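For a sense of scale, here is a rough estimate of labeling time at the best rate I ever saw. It is only a sketch: the function name is made up, and the assumption that parallel mklabel runs scale perfectly with the number of workers is optimistic.

```python
def label_seconds(elements: int, rate_per_sec: float = 25.0, workers: int = 1) -> float:
    """Estimated wall-clock seconds to label `elements` files.

    Assumes the 25 files/second rate from the text and ideal linear
    scaling across parallel workers (an optimistic assumption).
    """
    return elements / (rate_per_sec * workers)


if __name__ == "__main__":
    n = 50_000  # same order as the 50,000 mklabel commands mentioned earlier
    print(f"serial:    {label_seconds(n) / 60:.1f} minutes")
    print(f"4 workers: {label_seconds(n, workers=4) / 60:.1f} minutes")
```

If the replay at the remote site runs at about the original serial pace, the time saved by parallel labeling locally simply turns into queue backlog there.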
Of course, using timestamps in a config spec can work just as well as a label, provided the engineering managers are willing to accept such a thing.
There should be a new type of label, which is really just a config spec excerpt, which would, in turn contain a branch and timestamp.
When view profiles first appeared (in v3, I think), it seemed like they might address some problems with config specs. However, after working with them for several years, they seem like more of a hindrance than a help. First off, they are not portable to Unix; what is truly astounding about this mistake is that simply using forward slashes instead of backslashes would have done the trick. Portability is further hindered by the software's inability to handle Unix line endings. Secondly, there is no command line interface, which means if you want your build scripts to use the same view profile that developers use, you have to cook up a wrapper to do so (my ClearCase::ConfigSpec perl module does so). It is nice that it gives you an easy, graphical way to create branches and deliver changes from them; however, this UI is missing a key feature: rebasing! How could such an essential feature have been forgotten?
It also astounds me that a product that specializes in version control would write the view profile mechanism so that it is exceedingly hard to incorporate into a VOB. I spent a lot of time figuring out how to check in view profiles and distribute them to all sites.
Also, if a view is associated with a view profile, all relevant vobs should be automatically mounted when the view is started. A great feature! However, it doesn't work much of the time, and no errors are recorded as to why.
Another problem is that the automatically generated private branch config specs use the old, cumbersome, -mkbranch modifiers, rather than the mkbranch rule. Furthermore, they neglected to include the "-override" modifier which would have greatly simplified how they set up the private branch config specs.
It seems to me that the view profile mechanism was written by someone who knew nothing of Windows/Unix portability, version control, the command line, recent config spec features or typical branch/merge techniques.
Of course, this all begs the question as to why they didn't simply extend the config spec mechanism to include vob lists and the like? They extended it for snapshot views...
As noted above, snapshot views can be slow to populate due to the number of round-trips required. For large source trees, it can be prohibitively slow. With some systems (like Visual SourceSafe) this could be mitigated by running many updates in parallel on different directories, but, unfortunately, that is out of the question for ClearCase as the snapshot view update is single-threaded and will not permit more than one to be running at a time.
In ClearCase v4 a new "Scheduler" system was introduced, which purported to "fix" many of the problems with cron. However, in practice it is a very cumbersome system. The first problem is that current job status and schedule information is mashed together into one "configuration file", which makes version control of these files very tricky (it is odd for a company which specializes in version control to prevent its use). The job numbers are problematic and redundant (why not just use a job name?). Creating a new job is tricky as there are so many entries that you have to set up. You cannot tell from the sched file what command will be run. That is stored in another file which cannot be modified via the "sched" command! It appears as though this is a system which expects to be manipulated via a GUI, but in the years that this system has been in existence, no such GUI has surfaced.
Most version control systems I know of (e.g. CVS, SVN, VSS, RCS, SCCS, &c.) will expand certain keywords (like $Header$) inside text files to contain information about the version of the file being looked at. This is essential for identifying which files/versions contributed to a given version of a product. However, ClearCase has no such thing. It is often suggested to implement this via a trigger. The problem is that this trigger will cause any non-trivial merge to be a conflicting merge, since the same line has been modified on both branches.
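Here is a small sketch of why trigger-based keyword expansion poisons merges. It uses a deliberately naive line-by-line three-way comparison, not the real ClearCase merge algorithm, and the function names, file contents and version strings are all made up for illustration.

```python
import re

# Matches both the bare keyword and an already expanded one.
HEADER_RE = re.compile(r"\$Header[^$]*\$")


def expand_header(lines: list, version: str) -> list:
    """What a hypothetical checkin trigger would do: stamp the version in."""
    return [HEADER_RE.sub(lambda m: f"$Header: {version} $", line) for line in lines]


def conflicting_lines(base: list, ours: list, theirs: list) -> list:
    """Naive three-way check: a line conflicts if both sides changed it differently."""
    return [
        i
        for i, (b, o, t) in enumerate(zip(base, ours, theirs))
        if o != b and t != b and o != t
    ]


base = ["# $Header$", "a = 1", "b = 2"]
main_edit = ["# $Header$", "a = 10", "b = 2"]  # main changed line 1 only
dev_edit = ["# $Header$", "a = 1", "b = 20"]   # dev changed line 2 only

# Without expansion the edits do not overlap, so the merge is trivial.
plain = conflicting_lines(base, dev_edit, main_edit)

# With expansion, both branches rewrote the header line differently.
stamped = conflicting_lines(
    base,
    expand_header(dev_edit, "/main/dev/1"),
    expand_header(main_edit, "/main/2"),
)
print(plain, stamped)
```

The two branches never touched the same code, yet the header line alone is enough to turn the merge into a conflict.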
Some then suggested that a type manager be set up to help with this. This is a good idea except for one thing: there is no mechanism for deploying type managers. Every client needs to have the new type manager. That's not happening with over 1000 clients.
Such a type manager should have been a stock part of ClearCase from the beginning.
I brought up a big issue with type managers in the previous section. Another problem is that the type manager mechanism confuses two separate concepts: how the versions are stored and how differences will be presented.
Case in point: the "ms_word" type manager is based on "file", which will store full copies of every version. Old versions of Word documents are rarely going to be used, so devoting all that disk space to them is dumb. I could convert the element type to "binary_delta_file", but that would lose the MS Word diff magic.
This is not unique to ClearCase, by any means; programmers should be ashamed of the poorly written, uninformative or downright misleading error messages that have become commonplace. Any error message should answer the usual set of questions: who? what? when? how? why? That means it should include all relevant file names, the reason for the failure, identity information (if relevant), and, ideally, some hint as to how to fix it.
Here is a little error message "hall of shame":
Internal Error detected in "../map_db.c" line 822
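By contrast, a message that answers the who/what/when/why questions is not hard to produce. The helper below is a minimal sketch of the idea; the function, its parameters and the sample values are all hypothetical and have nothing to do with how ClearCase builds its messages.

```python
from datetime import datetime, timezone


def format_error(who: str, what: str, path: str, why: str, hint: str,
                 when: datetime) -> str:
    """Build a message that says who, what, when, why, and how to fix it."""
    return (
        f"{when:%Y-%m-%d %H:%M:%SZ} {who}: {what} failed for '{path}': "
        f"{why}. {hint}"
    )


if __name__ == "__main__":
    # Hypothetical values, purely for illustration.
    print(format_error(
        who="alice",
        what="checkin",
        path="/vobs/src/foo.c",
        why="element is locked",
        hint="Ask the vob owner to unlock it.",
        when=datetime.now(timezone.utc),
    ))
```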
Here's my take on the state of ClearCase: Around the time that Rational took over ClearCase (1998, I think) the core product stagnated entirely. There have been no significant bug fixes or improvements since that time; note that most of the problems I mention above have been like that for 10 years. A few new things were added, but they all seemed botched in that they left part of the product out or didn't support all platforms (plenty of examples above). When IBM took over, I had hoped that they would shake things up, and it seems that in the last year or so they have. But I fear it may be too late. I have seen many teams abandon ClearCase out of frustration and competitive products pop up in the meantime (e.g. Subversion), and, to be honest, I'm not entirely sad about that since I am tired of being the messenger that everybody shoots at.