I've always been a huge fan of the VisualSVN Server product and how easy it makes installation and setup of Subversion on Windows. You can be up and running with a single install in a matter of minutes. It's a very nice and neat package for source control. Best of all, it's FREE.
The one drawback with the product (and Subversion in general) is that there's no easy way to search the contents. Sure, there are products like FishEye and SvnQuery - but FishEye costs $$$ and SvnQuery can't easily search across repositories.
Does it really have to be so hard? Well, maybe not. If you happen to have some sort of web crawler/spider software (e.g. Copernic, dtSearch, MS Search Server 2010 Express, etc.) that's capable of crawling sites and indexing the contents - you might be thinking, "Why don't I just point the spider at the SVN root and let it crawl?" What you'd quickly learn is that VisualSVN's repository browser doesn't actually serve HTML. It serves XML and XSLT, and your browser transforms it into HTML as you browse. Now I don't have a lot of experience with web crawlers, but I couldn't find one smart enough to do that.
Then it hit me - why not just write a simple HTTP Handler and let it transform the XML/XSLT into HTML server-side? That way the crawler would have access to real HTML it could crawl.
That night I threw together a little C#/ASP.NET project doing just that - and guess what? It works great! The crawler was able to crawl the generated HTML and index the content just fine. The coding was a little tricky until I got my head around how everything basically points to the handler aspx page, but includes a querystring parameter with the true URL to render.
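To make the querystring idea concrete, here is a minimal sketch in Python (not the project's C#) of how links might be rewritten so that every link points back at the handler and carries the true URL as a parameter. The handler page name, the `url` parameter name, and the SVN root below are all assumptions for illustration; the real project may use different names.

```python
from urllib.parse import parse_qs, urlencode, urljoin

# Hypothetical names: the actual handler page and parameter may differ.
HANDLER_PATH = "/SvnBrowser.aspx"
SVN_ROOT = "https://svn.example.com/svn/"


def to_handler_link(page_url, href):
    """Resolve href against the page it appeared on, then wrap it in a handler link."""
    true_url = urljoin(page_url, href)
    return HANDLER_PATH + "?" + urlencode({"url": true_url})


def true_url_from_query(query_string):
    """Recover the true URL from the handler's querystring, defaulting to the SVN root."""
    values = parse_qs(query_string).get("url")
    return values[0] if values else SVN_ROOT


link = to_handler_link("https://svn.example.com/svn/repo1/", "trunk/Program.cs")
print(link)
print(true_url_from_query(link.split("?", 1)[1]))
```

Because every generated link goes through the handler, the crawler never sees the raw XML pages, only the rendered HTML.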
After I got it going and started playing with the indexed content, I realized my crawler's search page actually provides links to the cached document it received during the crawl, as well as live links to the document. I thought it'd be kind of cool if the live links could serve up syntax-highlighted source files, so I implemented Alex Gorbatchev's SyntaxHighlighter. Basically the Handler checks the UserAgent string and if you're a spider, it returns the plain text content. If you're not a spider, it dynamically loads the needed JavaScript and CSS files to syntax highlight the document.
I did wind up making one additional change so that when the handler serves content to a spider, it prepends the Repository location and full filename to the contents of the source file. I found this increased the hit accuracy.
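The spider/browser split and the prepended header can be sketched roughly like this, again in Python rather than the project's C#. The list of User-Agent markers and the exact header format are assumptions; the real handler may detect spiders differently. The `<pre class="brush: ...">` markup is the convention SyntaxHighlighter uses to pick up code blocks.

```python
import html

# Assumed markers; a real deployment would match its own crawler's User-Agent.
SPIDER_TOKENS = ("bot", "crawler", "spider")


def is_spider(user_agent):
    """Return True if the User-Agent string looks like a crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def render_source(user_agent, repository, path, source_text, brush="plain"):
    """Return (content_type, body) for a source file request."""
    if is_spider(user_agent):
        # Spiders get plain text, prefixed with the repository and full filename
        # so those terms are indexed alongside the file contents.
        header = f"Repository: {repository}\nFile: {path}\n\n"
        return "text/plain", header + source_text
    # Browsers get escaped source wrapped in the markup SyntaxHighlighter looks for.
    body = f'<pre class="brush: {brush}">{html.escape(source_text)}</pre>'
    return "text/html", body


print(render_source("ExampleSpider/1.0", "repo1", "/trunk/Program.cs", "class A {}")[1])
```

In the browser case the page would also need to reference the SyntaxHighlighter script and stylesheet files, which is omitted here.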
The entire app is completely config file driven. Most settings are in the standard web.config file - things like the URL to SVN, UserId & Password (if needed), Paths to your SyntaxHighlighter files, etc. There's also an additional config file named BrushConfig.xml that contains mappings for all the enabled file extensions and their brush aliases. Simply add/remove file extensions as needed to enable/disable SyntaxHighlighting of a particular file type.
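As a rough illustration of the extension-to-brush mapping, the Python sketch below parses a small XML document and looks up the brush alias for a filename. The element and attribute names here are invented for the example and are not the actual BrushConfig.xml schema; only the aliases (`csharp`, `js`, `xml`) are standard SyntaxHighlighter brush aliases.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical layout; the real BrushConfig.xml may be structured differently.
SAMPLE_CONFIG = """
<brushes>
  <brush extension=".cs" alias="csharp" />
  <brush extension=".js" alias="js" />
  <brush extension=".xml" alias="xml" />
</brushes>
"""


def load_brush_map(xml_text):
    """Build a dict of lowercase file extension -> brush alias."""
    root = ET.fromstring(xml_text.strip())
    return {b.get("extension").lower(): b.get("alias") for b in root.iter("brush")}


def brush_for(filename, brush_map):
    """Return the brush alias for a file, or None if its extension is not enabled."""
    ext = os.path.splitext(filename)[1].lower()
    return brush_map.get(ext)


brush_map = load_brush_map(SAMPLE_CONFIG)
print(brush_for("Program.cs", brush_map))
print(brush_for("notes.txt", brush_map))
```

A file whose extension has no entry simply gets no brush, which matches the idea of enabling or disabling highlighting by adding or removing extensions.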
To try it out, just open the solution in Visual Studio, edit the web.config AppSettings to point it to your SVN root, and hit run. The solution is configured to run using the integrated development webserver so it should fire right up and you'll be browsing your SVN root. When you come to a source file, it should SyntaxHighlight as soon as you view it. The generated repository browsing pages are very bare-bones, but remember they're just for the spider. You'll actually be interacting with your Search product after it's crawled this generated content.
So after you set it up in IIS somewhere (I actually installed it on the same server with VisualSVN Server) - point your Search crawler at the Handler and let 'er rip! It'll index across multiple repositories and everything-
Comments & Critiques welcome-
interesting idea - i will try it out
COOL!!!
You have a really nice blog here and I am saving it in my favorites list. Keep posting similar stuff.
I am having this error when I try to run the demo: Microsoft JScript runtime error: DOM Exception: INVALID_CHARACTER_ERR (5)
Anonymous-
Are you by any chance running IE9? If so, just out of curiosity could you try a different browser?
Thanks-
Kenneth