By Josef Betancourt, Created 12/27/09 Posted 28 Mar 2010
You need to modify a group of files to accomplish something. For example, you’re trying to get a server or network operational again after new security requirements were applied. During this process you also need to keep track of changes so that you can back out of dead ends or combine different approaches. Or you simply need to keep temporary revisions of a group of files for a task-based project.
Version Control, Revision Control (RCS), Configuration Management (CM), Version Control System (VCS), Distributed Version Control System (DVCS), Mercurial, Git, Agile, Subversion
A lightweight Ad-Hoc Version Control (AHVC) approach may be desirable, even when other solutions are already in place. What are the requirements of a lightweight and workable solution?
- Automated: Through human error a file or setting may not get versioned, or may even be lost. Thus, all changes must be tracked automatically.
- Small: A large sprawling system that could not even fit on a thumb drive is too big.
- Multiplatform: It should be able to run on the major operating systems.
- Non-intrusive: Use of this system should not change the target systems in any way. Ideally it should run from a thumb drive or CD. And, if there is a change, backing it out should be foolproof.
- Simple: Anything that requires training or complexity will not be used or adopted. This reduces collaborative adoption and improvements in tools and process.
- Fast: Should be fast and optimized for local use.
- Distributed: Since issues can span “boxes”, it should be able to work with network resources. This reduces the applicability of GUI-based solutions.
- Scripting: Should be easy to optimize patterns of use by creating high-level scripts.
- Small load: Of course, we don’t want to grab excessive CPU and memory resources from the target system.
- Non-Admin: Even in support situations, full admin access may not be available.
- Transactional: Especially in server configuration, changes should be consistent. Failure to save or revert even one file could be disastrous.
- Agile: Not about tools but results.
At home when I create a folder to work on some files, like documents or programming projects, I will usually create a version control system right in the folder. This has saved me a few times. I also tried to do this at a prior professional assignment and was partially successful (will be discussed later).
I used a Distributed Version Control System (DVCS). Since it does not require a centralized server or complicated setup, a DVCS meets most of the lightweight requirements. Though a VCS is usually used for collaborative management of changing source files, it may be ideal here. One popular use case is managing one’s /etc folder in Linux with a VCS.
It may seem contradictory that a DVCS is great for local ad hoc use. But that is just a misconception of what a DVCS is: every working copy carries its own complete repository, so no server or network is needed.
A good example of a DVCS is:
“(n) a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.” – http://mercurial.selenic.com/
Another is Git:
“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.”
Note: I use Mercurial as the suggested system simply because I started with it. Git and others are just as applicable.
To create a repository in a folder, one simply executes three commands: init, add, and commit. The “init” creates a subfolder that holds the folder history, the true repository. The “add” is recursive, scheduling all the files for version control, and the “commit” makes these changes permanent. Of course, one can “add” a subset of files, create directives for files to skip, and so forth.
In a command shell, from within the folder:
hg init
hg add
hg commit -m "initial commit of project"
The terminology may be a little confusing. What happened is that now the GIZMO folder has a Mercurial repository which consists of a new .hg folder, and the other existing files and folders comprise the working directory (see Mercurial docs for a more accurate description). There are no other changes!
That’s all it takes to create a repository. No puzzling about storage, unique names, hierarchy, and all the details that go with central servers. The Mercurial docs show how to do the other required tasks, like going back to a previous changeset or retrieving file versions. Here is how to view the list of files in a particular changeset:
c:\Users\jbetancourt…cts\adhocVersioning>hg -v log -r 0
changeset:   0:f29a0b0ad03c
user:        Josef Betancourt <josef.betancourt>
date:        Sat Jun 21 10:53:11 2008 -0400
files:       AdhocVersioning.doc
description:
first commit
And, here is a log output using the optional graph log extension (http://mercurial.selenic.com/wiki/GraphlogExtension)
c:\Users\jbetancourt...adhocVersioning>hg glog -l 2
@ changeset: 9:25f4c55e4860
| tag: tip
| user: Josef <josef.betancourt>
| date: Fri Mar 26 22:43:56 2010 -0400
| summary: removed repo.bat
o changeset: 8:43a33533c992
| user: Josef <josef.betancourt>
| date: Thu Mar 25 22:08:35 2010 -0400
| summary: removed old files
For the lone individual using ad hoc versioning, a sample workflow is given at Learning Mercurial in Workflows.
Ad Hoc Sharing
A DVCS, true to its name, shines in how it allows distributed sharing of these local repositories. Thus, when a team is working on a technical issue (ad hoc), it is very easy to share each other’s work. Mercurial includes an embedded web server that can be used for this.
Mercurial’s hg serve command is wonderfully suited to small, tight-knit, and fast-paced group environments. It also provides a great way to get a feel for using Mercurial commands over a network.
This is illustrated with the coffee shop scenario, see manual.
A sprint or a hacking session in a coffee shop are the perfect places to use the hg serve command, since hg serve does not require any fancy server infrastructure … Then simply tell the person next to you that you’re running a server, send the URL to them in an instant message, and you immediately have a quick-turnaround way to work together. They can type your URL into their web browser and quickly review your changes; or they can pull a bugfix from you and verify it; or they can clone a branch containing a new feature and try it out.
Of course, this would not scale and is for “on-site” use between task focused group members.
A great workflow image by Leon Bambridge for team sharing.
Another simple scenario is taking a few documents from one location to another on a flash drive (in lieu of using a cloud storage service). Instead of doing a copy or cp, one can create a DVCS repository in the work directory, then clone it onto the flash drive. At home, one pulls from the flash drive into the repository there. When finished editing the files, one pushes to the flash repo and does the reverse process at the work site. Not only are you not missing any files, you are also keeping track of prior versions. Note that, for security reasons, not everyone has unfettered web access, nor should they.
Revisiting the flash drive scenario above: if you plan to use a flash drive for transport multiple times and the group of files is large, the hg “bundle” and “unbundle” commands are a good tool; see Communicating Changes on the Mercurial site.
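The flash drive round trip can be written down as a short script. What follows is only a minimal sketch under stated assumptions: the function name and the three paths are hypothetical, the home folder is assumed to already hold an empty or related Mercurial repository, and the code merely builds the hg command lines so they can be reviewed before anything is run.

```python
from typing import List


def flash_round_trip(work: str, flash: str, home: str) -> List[List[str]]:
    """Return the hg commands for one work -> flash -> home -> flash -> work trip.

    All three paths are placeholders. `work` is assumed to hold a Mercurial
    repository and `home` an empty or related one.
    """
    return [
        # At work: put a copy of the repository on the flash drive.
        # (On later trips the flash repo already exists, so this step
        # would become a push from `work` instead of a clone.)
        ["hg", "clone", work, flash],
        # At home: bring in the flash drive's changesets and update the files.
        ["hg", "--cwd", home, "pull", "--update", flash],
        # At home, after editing: record the edits, then send them to the flash drive.
        ["hg", "--cwd", home, "commit", "--addremove", "-m", "edits made at home"],
        ["hg", "--cwd", home, "push", flash],
        # Back at work: pull the home edits from the flash drive.
        ["hg", "--cwd", work, "pull", "--update", flash],
    ]


if __name__ == "__main__":
    for command in flash_round_trip("C:/work/gizmo", "E:/gizmo", "C:/home/gizmo"):
        print(" ".join(command))
```

Printing the commands first, rather than executing them, keeps the script non-intrusive; each list could later be handed to a process runner once the paths are confirmed.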
Every connection must be secure and every file must be encrypted, especially if on flash drives. The security policies of the employer come first. Even if only for your own personal ad-hoc use, you should be careful with exposing your data.
- Easy to use. The commands needed to perform normal tracking of changes are few and simple. The conceptual model is also simple, especially if one is not fixated on the use of a centralized Version Control System.
- Some file changes may be dependent on or result in other file changes. In a DVCS, commits or check-ins create a “changeset” in the local repository. This naturally keeps track of related changes.
- You may need to work on different operating systems. Mercurial runs on many systems including Windows.
- You don’t want to change the existing system, low intrusion. Mercurial can be deployed to a single folder, and the repositories it creates do not pollute the target folders. For example, in the Subversion VCS, “.svn” folders are created in each subfolder in the target. Not a drawback, but it complicates things down the line, such as when using file utilities and filters.
Unfortunately, the use of a DVCS is not perfect and has its own complexities. For Mercurial, in the context of this post, these are handling binary files, versioning non-nested folders, non-admin installation, and ignore-file syntax. Then there is a problem that probably applies to any VCS: the semantic gap between the task-based view of the project and the versioning mindset.
1. Binary Files
Mercurial is really meant for tracking non-binary files; that is where the advantages of versioning are realized. Diffs and merges are not normally applied to binary files. Further, the size of binary files impacts performance and storage once they reach a certain size. Yet, for ad hoc use, binary files have to be easily tracked. Binary files could be images, libraries, jars, zips, documents, or data.
Large binaries are a problem with all VCS systems. One author discussed a technique to allow Git to handle them in lieu of his continued use of Unison. He said to use Git’s “--shared” option: git clone --shared /mnt/fileserver/stuff.git stuff
Note that Mercurial extensions exist to handle binary files. One of these is the BigFiles extension. In essence, BigFiles and other similar approaches, handle large binaries using versioned references to the actual binaries which are stored elsewhere.
Update Oct 29, 2011: Looks like Mercurial 2.0 will have a built-in extension for handling binary files, LargeFiles extension.
Another issue is that binary files may not be diffable within the DVCS tool set. In a DVCS one can set an external merge agent. If one is not available, using the app that created the binary to diff and merge is cumbersome. For example, a Word doc is in effect binary (even though internally it could be all XML). Thus, a diff would not reveal a usable view. One must “checkout” particular revisions and then use Word to do a diff, or just manually eyeball it. Same thing with a zip, jar, image, etc.
Update 02-02-2012: Some tools allow direct use of external tools to diff “binary” files. I think TortoiseSVN does this, allowing Microsoft Word, for example, to diff.
2. Non-nested target folders.
A scenario may involve the manipulation of folders that are not nested. For example, a business system employs two servers and changes must be made to both for a certain service to work; further, physically moving these folders or creating links is not possible or allowed. Mercurial, at this time, works on a single folder tree, and AFAIK there is no way to use symlinks or junctions to create a network folder graph, at least in my testing. The ForestExtension and the experimental subrepositories feature in Mercurial 1.3 do not qualify, since they only enable the handling of a folder tree as multiple repositories.
Sure, each folder tree in the graph can be managed separately, but if a particular change affects files in each tree, there is no easy way to transactionally version them into one changeset, though there are ways to share history between repositories (as in the ShareExtension).
A possible solution is to allow the use of indirect folders. In Mercurial, work files and the actual repository, the .hg folder, are colocated. Instead the repository can point to the target folders (containing the work files) to be versioned. In this way multiple non-nested folders can be managed. Note that this is not a retreat to the centralized VCS since the repository is still local and distributed using DVCS operations. Below, the user has created a new Mercurial repository in folder “project”. This creates the actual repo subdirectory “.hg”, and the indirect actual folders to be versioned are pointed to in a “repos” directive file or using actual symlinks.
repos ——> src_folder1
Whether this is useful, possible, or already planned for is unknown.
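Until something like that exists, the idea can only be approximated by copying. The sketch below is purely illustrative and is not a Mercurial feature: the “repos” file format (one “alias = folder” entry per line) and both function names are invented here. It mirrors each listed folder into the project folder so that a single commit there records all of them together.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def read_repos_file(repos_file: Path) -> Dict[str, Path]:
    """Parse lines of the form 'alias = /path/to/folder' (a made-up format)."""
    targets = {}
    for line in repos_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        alias, _, folder = line.partition("=")
        targets[alias.strip()] = Path(folder.strip())
    return targets


def mirror_targets(project: Path) -> None:
    """Copy every folder listed in project/repos into project/<alias>.

    This is copying, not true indirection. Files deleted in a target folder
    are not removed from the mirror. Requires Python 3.8+ for dirs_exist_ok.
    """
    for alias, folder in read_repos_file(project / "repos").items():
        shutil.copytree(folder, project / alias, dirs_exist_ok=True)


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    (base / "src_folder1").mkdir()
    (base / "src_folder1" / "gizmo.conf").write_text("port=80\n")
    project = base / "project"
    project.mkdir()
    (project / "repos").write_text(f"server1 = {base / 'src_folder1'}\n")
    mirror_targets(project)
    print(sorted(p.name for p in (project / "server1").iterdir()))
```

The obvious cost is that changes must be copied back by hand, which is exactly the kind of step the indirect-folder proposal would remove.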
I mentioned this “limitation” on the Mercurial mailing list and was told that this is not a use case for a DVCS. There are many good reasons why all (?) VCS are focused on the single folder tree.
Update, 2011-08-31 T 11:37
Just learned that Git does have an interesting capability:
It is also possible to have a working tree where .git is a plain ASCII file containing gitdir: <path>, i.e. the path to the real git repository
Though this doesn’t fulfill the non-nested project folders scenario, it does help Git be more applicable in an ad-hoc solution. For example, the repo could be located in a different storage location when the target folder is in a constrained unit.
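To make the mechanism concrete, here is a minimal sketch that writes such a pointer file. The function name and the example paths are hypothetical; the only thing taken from the Git documentation is the single-line “gitdir: <path>” content.

```python
import tempfile
from pathlib import Path


def point_worktree_at(worktree: Path, real_repo: Path) -> Path:
    """Write a '.git' file in `worktree` whose one line names the real repository."""
    git_file = worktree / ".git"
    # Forward slashes are used so the same text works on Windows and *nix.
    git_file.write_text(f"gitdir: {real_repo.as_posix()}\n")
    return git_file


if __name__ == "__main__":
    worktree = Path(tempfile.mkdtemp())
    pointer = point_worktree_at(worktree, Path("/mnt/storage/gizmo.git"))
    print(pointer.read_text(), end="")
```

In practice one would more likely let Git create this file itself, for example with the --separate-git-dir option of git init; the sketch only shows how little is involved.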
3. Non-admin install
Updated 25 Aug 2010: In the requirements, non-admin install of the VCS was mentioned. This is where Mercurial fails, somewhat. The default install using the binary, at least on Windows, requires admin privileges. I got around this by first installing on another Windows system, then copying the install target folder to the PC I need to work on. This worked even when I installed on a Windows 7 Pro, and then copied to a Windows XP Pro. No problems yet. The Fossil DVCS does not have this problem.
4. Ignore Files
This is, perhaps, a minor issue. Mercurial, as most VCS do, allows one to filter the files that are versioned in the repo.
In Mercurial one creates an .hgignore file and within it uses glob or regular expression syntax to specify the files to ignore. Well, this can be tricky. See the newsgroup discussion that was started by this post. IMHO, having another syntax declaration that allows explicit specification of directories and files is the way to go. How do other systems do this? Ant patternsets seem to be pretty usable.
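One way to get explicit directory and file entries without a new syntax is to generate the .hgignore content from plain lists. The following is a sketch, not part of Mercurial: the function name is hypothetical, and it relies on two documented .hgignore behaviors, the “syntax: regexp” header and the fact that a regexp pattern starting with ^ is rooted at the top of the repository.

```python
import re
from typing import Iterable, List


def hgignore_lines(directories: Iterable[str], files: Iterable[str]) -> List[str]:
    """Build .hgignore lines that ignore exactly the named directories and files.

    Paths are relative to the repository root and use forward slashes.
    """
    lines = ["syntax: regexp"]
    for directory in directories:
        # Rooted pattern: everything below this one directory.
        lines.append("^" + re.escape(directory.strip("/")) + "/")
    for name in files:
        # Rooted and terminated: this one file only.
        lines.append("^" + re.escape(name) + "$")
    return lines


if __name__ == "__main__":
    print("\n".join(hgignore_lines(["build/"], ["notes.tmp"])))
```

Escaping each name with re.escape means a dot in a file name is treated literally, which avoids one of the common surprises with hand-written patterns.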
5. Semantic Gap
There is a semantic gap when working on a maintenance problem and switching to the versioning viewpoint. When versioning systems are routinely used, as in Software Development, this is not an issue, just part of the Software Development Lifecycle or best practice (amazing that some shops don’t use version control). But, when one uses VC only occasionally as a last resort it’s another story. QA, Support, and Project Managers, may not be comfortable with repositories, branches, tags, labels, pull, push, and so forth.
When I first tried to use Mercurial for ad hoc service professionally, it quickly lost some of its advantages as the task (fixing a system) reached panic levels, which is usually the case with customer support and deployment schedules. Simply creating and looking at commit messages failed to follow the workflow. Manually tracking which tag or branch related to which event of system testing was cumbersome. Further use would have eventually revealed the patterns of use that would have worked better, but that was a onetime experiment.
A partial solution, other than just getting more expert with the DVCS and better work patterns, is to implement a higher level Domain Specific Language (DSL) that hides the low level DVCS command line and repository centric view. This could even have a GUI counterpart. This is not the same as a GUI interface to the DVCS such as TortoiseHg or the Eclipse HG plugin. What should that DSL be and is it even possible or useful?
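As a thought experiment only, such a layer could start as a handful of task-level verbs that translate into hg command lines. Everything in this sketch is an assumption: the class name, the verbs, and the “begin-” tag naming are invented for illustration, and the commands are built rather than executed.

```python
from typing import List


class TaskLog:
    """Translate task-level verbs into hg command lines (built, not executed)."""

    def __init__(self, folder: str) -> None:
        self.folder = folder  # folder that holds the Mercurial repository

    def _hg(self, *args: str) -> List[str]:
        return ["hg", "--cwd", self.folder, *args]

    def begin_task(self, name: str) -> List[List[str]]:
        # Record the starting state, then give it a readable local tag.
        return [
            self._hg("commit", "--addremove", "-m", f"begin task: {name}"),
            self._hg("tag", "--local", f"begin-{name}"),
        ]

    def checkpoint(self, note: str) -> List[List[str]]:
        # One commit per test event, described in the user's own words.
        return [self._hg("commit", "--addremove", "-m", note)]

    def back_out_to_start(self, name: str) -> List[List[str]]:
        # Discard uncommitted edits and return the files to the task's start.
        return [self._hg("update", "--clean", "-r", f"begin-{name}")]


if __name__ == "__main__":
    log = TaskLog("/srv/gizmo")
    for command in log.begin_task("ssl-fix") + log.checkpoint("firewall rule added"):
        print(" ".join(command))
```

The point of the sketch is the vocabulary: a support engineer sees “begin task”, “checkpoint”, and “back out”, and never has to think about changesets, tags, or revisions.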
Workflow Updates
June 26, 2011: git-flow is an example of providing high-level operations to enable a specific workflow or model. Perhaps such an approach would be applicable to these AHVC requirements.
Sept 17, 2011: The Mercurial Flow-Extension implements the git-flow branching model.
The usual approach is to just make copies of the affected folders or files that you will be changing. You can use suffixes to distinguish them, such as gizmo.1.conf. It’s very common to see (even in production!) files or folders with people’s initials, such as gizmo.jb.19.conf.
This gets out of hand very quickly, especially if you are multitasking or working as part of a team, and may forget after a good lunch what file “gizmo.24.conf” solved. This problem is compounded when you need to change multiple files, so for example, gizmo.jb.19.conf may depend on changes to “widget.22.conf”. This also gets very chaotic when the files to change and track are in different folder trees or even storage systems. Most importantly, this will not withstand the “throw clay at the wall and see what sticks” school of real-world maintenance.
One method I’ve seen and used myself is to just clone each folder tree to be modified. This gives an easy way to back out any changes. This, alas, is also error prone, resource intensive, and may not be possible on large file sets or very constrained systems.
A traditional client-server VCS like Subversion can, of course, be used for Ad Hoc Versioning. With Subversion one can run svnserve, its lightweight server, in daemon mode. Then create a task-based repository:
svnadmin create /var/svn/adHoc
And, import your tree of files:
svn import c:\Users\XXX\Documents\projects\adhocVersioning file:///var/svn/adHoc/project
Plus, Subversion supports offline use. I think. Have not used Subversion in a while.
Another effective Subversion feature is the use of local repositories using the “file:// protocol”.
Many systems are managed through various forms of management consoles with graphical user interfaces. These are client or web based and may be part of the system or a third-party solution. This is a big advantage from a hands-on administrative point of view. However, from an automation and scripting viewpoint this is not optimal. Thus, one hopes there is an API or tool-based method of accessing the actual configuration files or databases of the system. In that sense, these systems are within the scope of this discussion.
This is not always the case. One application server comes to mind that was (is?) so complex that there was no way to script it. Thus, no way to automate the build and release process and versioning of the system. Consequently, there was also no way to automate the various QA tests that were always panic driven and manually applied.
The correct or more tractable method is to use a managed approach. This is a software configuration and distribution system that is usually under the control of the IT staff, for example Microsoft’s Systems Management Server (SMS) or System Center Essentials (SCE) for SMB. Non-Microsoft solutions are of course available, such as those from IBM’s Tivoli product lineup.
Why is this not always the best approach? There may be situations where a subset of a managed resource must be modified. For example, you are a field service engineer and must travel or remotely connect to a client’s system to handle a service call. This process may also entail making changes to other hosted apps and system configurations, such as network configurations. Trying to get the IT department to collaborate or change the configuration or schedule of the managed solution may not be possible or timely. In fact, this would be discouraged (rightly so) since it can wreak havoc on a production system. Thus, even changing some resource may entail admin of multiple systems, not just a push of a few files and run of some batch files. It could require interactive set and test. Picture the Mars Rovers and how the OS reset problem was finally solved.
Closely related to the managed approach is to use a centralized version control system (VCS) or backup support. Fortunately many operating systems have versioning capabilities built in or readily available. For example, in the Windows platform one can make System Restore points or use the supported backup subsystems (as found in Windows 7 Professional). Many *nix’s also have built-in versioning support in the form of installable Version Control Systems or differential backup. In high-end virtualized systems there are facilities for backup or making snapshots and even transport of live systems.
While these work, there is a certain amount of complexity involved. Also there are issues using the same approach on multiple operating systems. Another important drawback is that one cannot always modify the target system and, for example, install a VCS, even temporarily. The common factor in these approaches is that there is a central “server” and associated repository for revision storage. This is fine when available but not very conducive to ad hoc use.
Versioning File System
A VFS could be of some use. As far as I know there are no popular file systems that support versioning (as used here). Digital Equipment’s VAX systems had single-file versioning, and OpenVMS still does. Microsoft’s Windows was supposed to get this in the planned WinFS, which is no longer in the plan(?), though Windows 7 and current servers can allow access to previous file versions as part of the system protection feature. Plan 9 has a snapshot feature. ZFS has advanced stuff too, and I would not be surprised if one can set a folder to be “versioned”.
However, a VFS would not help in task based versioning since as discussed previously, there may be a need to change multiple subsystems and track these changes as “change sets”. Thus, a VFS is not a Revision Control System.
Of course, using a scripted solution (discussed next) in conjunction with a file change notification system (inotify), one could cobble together a folder based VCS. However, this is outside of our lightweight requirements.
Of course, it should be possible, especially in *nix systems, to use the utilities available and construct a tool chain for a lightweight versioning support. The rich scripting and excellent tools like rsync make this doable. Some languages such as Perl or Python are ideal for gluing this together.
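As a small illustration of that kind of glue, the sketch below detects what changed in a folder between two points in time by comparing content hashes. It is an assumption-laden toy, not a replacement for rsync or a VCS: the function names are invented, and every file is read fully into memory.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Set


def manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its content."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            result[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def changed(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Set[str]]:
    """Compare two manifests and report added, removed, and modified paths."""
    common = set(before) & set(after)
    return {
        "added": set(after) - set(before),
        "removed": set(before) - set(after),
        "modified": {path for path in common if before[path] != after[path]},
    }


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "gizmo.conf").write_text("port=80\n")
    start = manifest(root)
    (root / "gizmo.conf").write_text("port=8080\n")
    print(changed(start, manifest(root)))
```

Taking a manifest before and after a maintenance step gives a quick answer to “what did I actually touch?”, though it records no history and cannot restore anything.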
Yet, this is not optimal since these same tools will not work on all operating systems or would require duplication. For various reasons, for example, it is not always possible to install cygwin on Windows and make use of the excellent *nix utilities and scripting approach. Likewise, at the time of writing, it is not possible to use the outstanding Windows PowerShell in Linux. This is only a problem, of course, if we are talking about empowering staff to work seamlessly on different OSes or resources. Having the same tools and workflow is valuable.
Another thing about this alternative is that a custom solution will eventually become or have functions of a version control system like Git, so why not just start with one?
One approach made possible by the aforementioned scripted solution is to create a snapshot system. The DVCS gives us fine-grained control of file revisions. But do we really need to diff and find out that a command in one batch file used the “-R” option, or do we just want to get the file with the desired option? We would know which file we want using task-based snapshots. Before a task is begun, we initiate a snapshot. This is analogous to the OS type of restore points, except we do this for a specific target set.
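A task-based snapshot of a specific target set can be sketched in a few lines. This is a hedged illustration, not a finished tool: the function names, the per-task folder layout, and the index.json file are all invented here, and nothing makes the copy or the restore transactional.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def take_snapshot(targets: Iterable[Path], store: Path, task: str) -> Path:
    """Copy each target file into store/<task>/ and record where it came from."""
    snap = store / task
    snap.mkdir(parents=True)  # raises if this task name was already used
    index = {}
    for number, target in enumerate(targets):
        # The number prefix keeps same-named files from different folders apart.
        saved = snap / f"{number}_{target.name}"
        shutil.copy2(target, saved)
        index[saved.name] = str(target)
    (snap / "index.json").write_text(json.dumps(index, indent=2))
    return snap


def restore_snapshot(snap: Path) -> None:
    """Copy every saved file back over its original location."""
    index = json.loads((snap / "index.json").read_text())
    for saved_name, original in index.items():
        shutil.copy2(snap / saved_name, original)


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    conf = base / "gizmo.conf"
    conf.write_text("run -R\n")
    snapshot = take_snapshot([conf], base / "snapshots", "before-upgrade")
    conf.write_text("run\n")
    restore_snapshot(snapshot)
    print(conf.read_text(), end="")
```

Because the targets are an explicit list, they can live in unrelated folder trees, which is the one thing a single-tree repository cannot offer directly.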
Finally, there have been alternatives to the Relational Database Management System (RDBMS) for many years. Most recently, this is the NoSQL group of projects such as CouchDB. CouchDB claims that it is: “Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.” Those features sound like something an ad hoc version control system should have. Yet CouchDB, and perhaps all of these projects, are document-centric. Still, worth pondering.
Presented were a few thoughts on an approach to ad hoc versioning. A DVCS was proposed as a lightweight solution and some issues were mentioned. Alternatives were looked at. More research is required to evaluate the proposal and determine best practices for the discussed scenarios.
7/15/10: Changed “maintain” to “accomplish” in Scenario as per feedback from K. Grover.
7/23/10: Forgot that I visited Ben Tsai’s blog where he discusses using Mercurial within an existing VCS such as Subversion, which I’ve also done, but not really the topic I discussed.
“HgInit: Ground up Mercurial”, http://hginit.com/01.html
Setting up for a Team
“Easy Automated Snapshot-Style Backups with Linux and Rsync” http://www.mikerubel.org/computers/rsync_snapshots/
Using Mercurial as ad-hoc local version control
“Intro to Distributed Version Control (Illustrated)”
Version Control, infrastructures.org
“The Risks of Distributed Version Control”,
“Subverting your homedir, or keeping your life in svn”
http://kitenet.net/~joey/svnhome/ (He now uses Git)
http://microseeds.com/blog/?p=95
Home directory version control notification
“Managing your web site with Mercurial”, Tim Post, http://echoreply.us/tuto/mercurial_site_management.html
Mercurial by example
Mercurial (hg) with Dropbox
Mercurial for Git users
Versioning File System
Agile Operations in the Enterprise
Michael Nygard, http://www.infoq.com/articles/agile-operations
Microsoft System Center Essentials
A utility that keeps track of changes to the etc configuration folder:
Version Control for Multiple Agile Teams
DVCS vs Subversion smackdown, round 3
“Using Mercurial as ad-hoc local version control”; Tsai, Ben;
Tracking /etc etc
For a more detailed exposition, see the Mercurial tutorial:
The Hg manpage is available at: http://www.selenic.com/mercurial/hg.1.html
There’s also a very useful FAQ that explains the terminology:
There’s also a good README: http://www.selenic.com/mercurial/README
HG behind the scenes:
Mercurial Basic workflows
Mercurial BigFiles Extension
Mercurial LargeFiles Extension
Mercurial Subrepos: A past example revisited with a new technique
Mercurial(hg) Cheatsheet for Xen http://xen.org/files/hg-cheatsheet.txt
A Guide to Branching in Mercurial
Tracking 3rd-party sources
Git as an alternative to unison