windows 7 - Can I use version control (mercurial) with files distributed in different directories?

25
2013-11
  • Martin

    I want to keep track of files I create when taking notes (mostly text files). I am working with a Notebook in a corporate environment on Windows 7.

    At this time, I have one directory for such files in my Windows user profile directory, so the files are stored locally (and should be synced to the server profile).

    My problem:

    Some of those files shall be shared with coworkers. This is not possible at the moment, as they do not (and shall not) have access to my user profile.

    I could (and probably should) store those files in server directories, but then I don't have them with me when disconnected from the corporate network, and I can no longer easily run a full-text search over them, as they are no longer all in one directory, etc.

    I'm already using Mercurial version control (TortoiseHg) for this local directory to create backups (and to be able to get back to an earlier version, if necessary).

    Would it be possible to add single files located in other server paths to my version control (all in one repository, even though they are not all in one directory on my hard disk)?

    My Goal:

    • I can save my text files where I want/must in the network directories in our intranet
    • I can add all of them to one mercurial repository, which lets me
      have a local copy of the file and its version history on my hard disk
  • Answers

    Related Question

    Version Control for MP3s?
  • Electrons_Ahoy

    I've got a lot of "binary media", which I'll abstract away as "MP3s". I've also got several computers that I'd like to have the whole library on - a desktop, media box, a laptop here or there, etc. In short, it would be nice to be able to sync all these machines with each other such that they all have the same stack of files.

    A Version Control system, as opposed to an rsync/robocopy lashup, in the rough sense seems like the way to go. First, there are several OSs involved (Windows, Mac, Linux flavors). Second, it would be nice if, when ID3 tags and such are updated, the system could just update the file delta, not re-copy the whole file. (Finally, being able to update the library over the internet, rather than the LAN, would be very cool.)

    But your classic CVS/SVN system has the obvious drawback of needing a full repository to work, and I'd really rather not have two copies of my 60gb+ MP3 folder sitting on a machine somewhere, as well as not traditionally dealing with binary deltas very well.

    So, Distributed Version Control starts sounding pretty good at this point. Mercurial, git, and bazaar all look good on paper, but I don't have any experience with any of them. Has anyone tried to set up a "binaries-only" DVCS with any of them? Any recommendations? Pitfalls?


  • Related Answers
  • Ludwig Weinzierl

    But your classic CVS/SVN system has the obvious drawback of needing a full repository to work, and I'd really rather not have two copies of my 60gb+ MP3 folder sitting on a machine somewhere, as well as not traditionally dealing with binary deltas very well.

    With CVS/SVN you have one repository, and several working copies. So the repository contains every file once plus the whole history for every file. The working copy contains every file once plus some additional data per file (usually approx. the size of the file).

    Very roughly: Let's assume our revision control system cannot store diffs of binary files efficiently (not really true, but for simplicity). Your collection is 60 GB of MP3 files. If you have 10 revisions per file on average and we neglect compression (because MP3s compress badly), your repo will be ca. 600 GB and your working copy ca. 120 GB.

    So, Distributed Version Control starts sounding pretty good at this point.

    In a distributed system every working copy is essentially a repository, that means every working copy contains every file plus history.

    Same assumptions as above, every copy will have ca. 600 GB.

    Bottom line: a distributed system will require more space than a centralized one.
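
    The back-of-the-envelope estimate above can be reproduced with a minimal sketch. It uses the same simplifying assumptions (every revision stored as a full copy, no compression, a centralized working copy holding roughly twice the file size). The function names and the machine count of 3 are hypothetical, chosen only for illustration.

```python
def history_size_gb(collection_gb, revisions_per_file):
    """Full history, assuming each revision is stored as a full copy (no deltas)."""
    return collection_gb * revisions_per_file


def centralized_working_copy_gb(collection_gb, overhead_factor=1):
    """Each file once, plus per-file bookkeeping of roughly the file's own size."""
    return collection_gb * (1 + overhead_factor)


collection_gb = 60   # size of the MP3 collection, from the question
revisions = 10       # average revisions per file, from the estimate
machines = 3         # hypothetical number of machines holding the library

central_repo = history_size_gb(collection_gb, revisions)
central_wc = centralized_working_copy_gb(collection_gb)
distributed_clone = history_size_gb(collection_gb, revisions)

# Total disk space across all machines under each model.
central_total = central_repo + machines * central_wc
distributed_total = machines * distributed_clone

print(central_repo, central_wc, distributed_clone)  # 600 120 600
print(central_total, distributed_total)             # 960 1800
```

    With real delta storage the history would be much smaller than this worst case, so the numbers are an upper bound rather than a prediction.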

    EDIT:

    Even if your question is more about a large number of binary files than about large binary files in version control, the following post might be interesting: Revisiting large binary files issue.

  • The How-To Geek

    This isn't really an answer to your question, but I've started using DropBox for the same purpose. It's cross-platform, and you can get a 100GB account if you don't mind paying a little more. It also stores revisions to files, very similar to source control.

  • Ryan Bolger

    The problem with trying to shoehorn version control systems into file synchronization systems is that you'll end up wasting a ton of disk space keeping all the old version history data in the repositories.

    Personally, for my large binary media collections, I don't care about being able to revert changes to any given file. All I care about is that the collection is synchronized between my systems. There are many file synchronization solutions out there, but they all have their various pros and cons. Some claim they're cross-platform, but that only means Win/Mac. Others really are cross-platform, but don't have large enough file size/quantity limits to be useful for large collections. Some offer web access to the files, but also suffer from the file size/quantity limitations. Any solution that keeps a copy of your files on a 3rd party server is inevitably going to cost you money if you have a large collection of files.

  • Oskar Duveborn

    Not really an answer, but I thought I'd share. I've started using SVN for my HD video projects (like events and weddings where the result is a heavily edited video). This is starting to become really awesome for several reasons.

    Usually a video project contains a few or perhaps tens or even hundreds of GB of raw AVCHD files (most just a few hundred MB each though since moving from DV tapes ;). These are added and committed once and then never changed, as all the work is then done on (very small and often text or XML-based) video editing software project files, some still images (which are sometimes but not very often changed) and various other descriptor files.

    Tagging and naming of clips are also stored in the project files and not added to the actual raw video files, which makes this ideal. Say a project repository database starts at 10 GB; it will usually end at 11 GB and consist of ~100 revisions. The rendered final result in various formats is of course not stored in the repository at all, as it can always be re-generated.

    As MP3s in particular store their metadata in the actual MP3 file, this will present much more of a challenge. According to this Stack Overflow question, though, Subversion might handle this decently in the end, as ID3 tag data is stored at the beginning (or, for v1, at the end) of the file. However, as a v2.x tag can be any length, I have no idea what happens if you add additional tag data: the file may grow larger and perhaps mess up the delta comparison. Worth testing...
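
    To make the tag layout concrete, here is a minimal sketch that reports how many bytes at the start and end of a file belong to ID3 tags. It relies on the ID3v2 header being 10 bytes ("ID3", two version bytes, a flags byte, a 4-byte synchsafe size) and ID3v1 being a fixed 128-byte block starting with "TAG". The function names and the fake file contents are made up for illustration, and the optional ID3v2.4 footer is ignored.

```python
def synchsafe_to_int(b):
    """Decode a 4-byte synchsafe integer (7 payload bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def id3_regions(data):
    """Return (id3v2_bytes_at_start, id3v1_bytes_at_end) for raw file bytes."""
    v2 = 0
    if len(data) >= 10 and data[:3] == b"ID3":
        # 10-byte header plus the tag body whose size the header declares.
        v2 = 10 + synchsafe_to_int(data[6:10])
    # ID3v1 is always exactly the last 128 bytes, if present.
    v1 = 128 if len(data) >= 128 and data[-128:-125] == b"TAG" else 0
    return v2, v1


# Fake "MP3": a 257-byte ID3v2.3 tag body, dummy audio bytes, an ID3v1 block.
v2_header = b"ID3" + b"\x03\x00" + b"\x00" + bytes([0, 0, 2, 1])  # size = 257
fake_mp3 = v2_header + b"\x00" * 257 + b"\xff\xfb" + b"\x00" * 1000 + b"TAG" + b"\x00" * 125

print(id3_regions(fake_mp3))  # (267, 128)
```

    Everything between those two regions is audio data. If the v2 tag grows, that audio shifts to a later offset, which is the case worth testing against the delta algorithm.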

    And storage is cheap - only 60 GB? Get a few 1 TB drives for the repository and be done with it ;)

  • STW

    Windows Vista & 7 offer Shadow Copy / Previous Versions. It's definitely not as feature-rich as a true source-control provider but does give you some of the benefits. As others have said, the storage required to house multiple revisions will likely be fairly massive, depending on the size of the files.

    The free and popular SCMs are all so-so at the task. SVN for example will work fine, but the repository will quickly grow and the local .svn folder will be quite large as well.

    When all is said and done, you might want to consider simply copying the whole lot of files to a safe place prior to making any large changes to your collection. When you're actually using MP3s in a normal day-to-day fashion, there's not much reason for changing the files, and the expense of having a revision system watching rarely-changed large binary files seems hard to justify. But if you're set on it, then SVN at least does binary diffs, whereas CVS does full copies (much larger).