The DVCS that is not a DVCS

Today, I found out about Veracity. The opening paragraphs set off my Skepticism Alert. It claims to be able to do things that git and mercurial are unable to do. It is right that DVCS has changed the landscape. The two articles with the deepest insights on these changes are "Forking, The Future of Open Source, and Github", written by @sogrady, an analyst over at Redmonk, and "Distributed Version Control is here to stay, baby", written by Joel Spolsky, a self-proclaimed holdout from the days of CVS.

However, the first feature listed for Veracity already tells me it will open up a lot of possibilities (assuming it does what it claims to do):
Veracity goes beyond versioning of directories and files to provide management of records and fields, with full support for pushing, pulling and merging database changesets, just like source tree changesets.

Years ago, right around when YouTube got popular and we saw the prolific rise of other competing offerings such as Vimeo, there was an obscure site whose maintainer had painstakingly compiled links to TV episodes on YouTube, Vimeo, and other streaming sites. There were many TV series with complete episode listings. I was able to watch much of Star Trek: DS9 and Stargate (yeah, I had a limited childhood). The site eventually got shut down. Hulu didn't show up until three years later, and they were constrained by broadcast rights. This being the year that git came to sweep out the centralized version control systems, it occurred to me that had this set of links been decentralized, it would have been much more difficult to take down. Further, instead of relying on a single maintainer, I could have subscribed to several different sources and done my own merging. A knowledge base for links to TV shows doesn't change the world; however, there are a number of datasets that would. The attempt to figure out how to implement this with git stopped my brain cold. I filed it away for later.

A year later, I stumbled over CouchDB. At the time, it had promised many things. Its unstructured JSON format is platform-agnostic. Its very architecture, which assumes replication and unreliable nodes, implied that maybe, just maybe, this could form the basis of a distributed, decentralized knowledge base. Sadly, the documentation made it seem as if all of this was implemented, but it wasn't. CouchDB only went 1.0 today. Further, I would have had to implement the multi-version merging myself.
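
To make "multi-version merging" a little more concrete, here is a minimal sketch of a field-level three-way merge on a single record. It is in Python purely for illustration. The function name `merge_record`, the field names, and the example URLs are all made up here, and a missing field is treated the same as a field set to `None`. This is not how CouchDB or Veracity actually resolve conflicts; it only shows the shape of the problem.

```python
def merge_record(base, ours, theirs):
    """Three-way merge of one record (a dict of field -> value).

    base is the common ancestor; ours and theirs are two edited copies.
    Returns (merged, conflicts). Assumption: None means "field absent".
    """
    merged, conflicts = {}, []
    for field in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:        # both sides agree, or neither touched it
            value = o
        elif o == b:      # only their side changed it
            value = t
        elif t == b:      # only our side changed it
            value = o
        else:             # both sides changed it differently
            conflicts.append(field)
            value = o     # keep ours provisionally; a human decides later
        if value is not None:
            merged[field] = value
    return merged, conflicts


# Hypothetical episode record edited by two different maintainers.
base = {"title": "Emissary", "url": "http://a.example/1"}
ours = {"title": "Emissary (Part 1)", "url": "http://a.example/1"}
theirs = {"title": "Emissary", "url": "http://b.example/1", "season": 1}

merged, conflicts = merge_record(base, ours, theirs)
```

Here each maintainer changed a different field, so the merge picks up both edits with no conflicts. Had both changed `url` to different values, the field would land in `conflicts` for someone to resolve by hand.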

In the years since, we have seen, among others, the Amazon Public Datasets. Tim Berners-Lee had been talking about the semantic web for years, and in 2010, he was finally able to give the TED Talk, 2009: The Year Open Data Went Worldwide.

In fact, @sogrady took his ideas about how Github and DVCS impacted the world of open-source software and extended them to datasets:
In the open source world, forking used to be an option of last resort, a sort of “Break Glass in Case of Emergency” button for open source projects. What developers would do if all else failed. Github, however, and platforms with decentralized version control infrastructures such as Launchpad or, yes, Gitorius, actively encourage forking (coverage). They do so primarily by minimizing the logistical implications of creating and maintaining separate, differentiated codebases. The advantages of multiple codebases are similar to the advantages of mutation: they can dramatically accelerate the evolutionary process by parallelizing the development path.

The question to me is: why should data be treated any different than code? Apart from the fact that the source code management tools at work here weren’t built for data, I mean. The answer is that it shouldn’t [Emphasis mine] (@sogrady, "The Future of Open Data Looks Like ... Github?")
... but wait ... doesn't Veracity claim to do decentralized, versioned data?

According to the announcement, Veracity uses this particular feature for decentralized user accounts, as well as tags, commits, etc. It has a pluggable storage engine, so we can theoretically use filesystems, SQL, and NoSQL solutions. But again, to call this a DVCS for versioning mere source code is missing the most significant feature -- the ability to decentralize data.

We'll see what the full capability of Veracity is when it comes out.

Update: Eric Sink comments on the decentralized database in Veracity --



You can tell when an artist makes big motions,
and his skill when they are small,
but sometimes I wonder what happened
to those I can't see at all?


Intangible Assets and Sunk Cost Fallacy

I once read Jerry Pournelle's series centered around Falkenberg's Legion. There, he describes how people in that time no longer researched computer science. Software developers never wrote software. They pieced together building blocks. This was partly due to the dominant polity forbidding anyone from researching computer science. However, its grip on the colony worlds was slipping, and condottieri mercenary groups emerged from the colony worlds as colonial exports. One such group came from a world conducting computer science research on the sly. Their mercenaries came to be well-known as technoninjas, slipping into software that largely remained opaque to conventional experts. Most other colonies stuck to piecing together software building blocks put together by these experts. Apparently, a job putting together digital Lego pieces was still in demand.

When I first read of this, it did not sit well with me. In my youthful quest to gain the upper hand through technological innovation and planned obsolescence, it never dawned on me that I might become one of those obsolete fogies. This idea was a subtle crack in my armor of adolescent immortality. So instead, I kept searching for that next thing. My specialization into Ruby on Rails started because I saw a demo of creating a blog in 15 minutes; my adoration of Ruby allowed me a technical expression of code unsurpassed by any other imperative, non-functional language I have worked with. I didn't want to admit that I gravitated to Ruby and Ruby on Rails because I thought I was smart, and despite marketing promises of the Rails brand, you do need to be a cut above average to write Ruby well.

As I grew older, though, I also came to accept that what is, is. Pournelle's idea is now palatable and insightful.

I've read (and misplaced) an article that sounds crazy on the surface. The author compares the craft of programmers to that of typesetters. Earlier this century, typesetters were a respected crafts profession. Word processors and automatic kerning have made that obsolete, despite the fact that word processors don't do nearly as good a job as typesetters. They were good enough. My last encounter with a word processor is on my office computer in the form of Microsoft Word and in the browser with Google Docs. My last encounter with a typesetter was a year ago, reading a flyer for a typesetter in the Avondale area of Atlanta.

This author continued with how a technology might make a programmer obsolete. In his scheme, a user might specify an input and an expected output, and the software will self-assemble out of pipes to get there. It is horribly inefficient compared to code hand-crafted by a competent programmer. There would be a lot of wasted CPU cycles as data flows down dead ends. But this scheme is not inconceivable. Our current technology base is already moving in that direction, with standardized data formats (XML, JSON) on top of standardized communications protocols (HTTP, RESTful architecture). Declarative-style programming, such as you might find with Ruby DSLs, Erlang, Lisp, and Haskell, is coming back into vogue. NoSQL, unstructured databases are now slowly eating away at SQL. Graph languages such as Gremlin, which uses XPath on a generalized graph, might as well be a reincarnation of Prolog. We're not there yet, but we are heading there.
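
A toy version of that scheme is easy to sketch. The Python below is only an illustration under my own assumptions: the names `run`, `assemble`, and `DEAD_END`, and the four string "building blocks", are made up here and do not come from the misplaced article. Given example inputs and expected outputs, it blindly tries every chain of blocks up to a fixed length until one fits.

```python
from itertools import product

DEAD_END = object()  # marks a chain that crashed part-way through


def run(blocks, names, value):
    """Feed value through the named blocks, in order."""
    for name in names:
        try:
            value = blocks[name](value)
        except Exception:
            return DEAD_END  # wrong kind of data for this block
    return value


def assemble(blocks, examples, max_len=3):
    """Return the shortest chain of block names that maps every example
    input to its expected output, or None if no chain fits."""
    for length in range(1, max_len + 1):
        for names in product(blocks, repeat=length):
            if all(run(blocks, names, x) == y for x, y in examples):
                return list(names)
    return None


# Hypothetical building blocks: plain string operations.
blocks = {
    "strip": str.strip,
    "upper": str.upper,
    "split": str.split,
    "reverse": lambda s: s[::-1],
}
examples = [("  hello world ", ["HELLO", "WORLD"])]

pipeline = assemble(blocks, examples)  # -> ['upper', 'split']
```

Every chain that gets tried and thrown away is one of those dead ends, and the number of candidates grows exponentially with the chain length. That is exactly the wasted CPU time the scheme trades for never having to write the glue by hand.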

I've dug up references to Buckminster Fuller speaking about the acceleration of innovations. He said, the speed at which we innovate will grow faster and faster; at the same time the size of innovation will grow smaller and smaller. What were vacuum tubes became silicon. The key thing though, is that innovation will become so small, it becomes invisible. When innovations become invisible, innovations start accelerating faster. It becomes pervasive.

In our current technology base, nowhere is this process more evident than with Cloud Computing. Google proved the concept of using massively distributed commodity boxes that you can throw away. Xen's paravirtualization techniques kickstarted the competition with VMware and VirtualBox, efficiently abstracting hardware into software. Amazon took it to the next level with a public API that allowed you to create and take down boxes. Many competitors have sprung up with their own offerings, to the point where we've had to invent a whole slew of jargon just to talk about these technologies.

And now in this past year, Chef has taken this to the next level after that. Chef lets a programmer develop infrastructure as if it were software. No more rack-and-stack. No more creating custom scripts to bring up Amazon EC2 instances. No more creating custom scripts to configure the operating systems inside those instances. Since the recipe for each environment is declared, environments can be recreated quickly. They become disposable. Your choice of Linux distro no longer matters because Chef, not you, ultimately sets up those boxes.

Chef abstracts the infrastructure. It makes this infrastructure invisible. Chef or its successors will eventually drive most system administrators out of a job, since it is often easier to teach a programmer basic system administration than it is to teach a system administrator how to develop software. There would still be a need for rack-and-stack in cloud datacenters (unless someone builds a rack-and-stack robot), but the vast majority of future startups will use some form of Cloud Computing in their infrastructure.

Further, this is all happening behind the scenes. Corporations now are slowly adopting Cloud Computing. By contrast, most web-technology startups are deploying to the Cloud, and Chef has been rapidly gaining ground. Finally, as the library of Chef recipes grows and comes out into the wild, the people coming in after the pioneers will begin assembling their infrastructure from pre-made recipes rather than writing their own from scratch.

Chef and its recipes are what Arnold Kling and Nick Schulz describe as an intangible asset in their book, From Poverty to Prosperity. The benefits of intangible assets are more difficult to quantify than those of tangible assets. Chef lowers the barriers of entry for new innovations and new startups. Entrepreneurs can focus on getting their product out the door and solving difficult problems. And if their product fails in the marketplace, the entire infrastructure can be thrown away because it is the blueprint that matters and the blueprint may be reused down the road. Chef increases the adoption efficiency.

Or, to put it another way, Chef is a technology that lowers Sunk Cost and hedges against the Sunk Cost Fallacy.

In my martial arts training, I was taught to extract principles and not let myself get distracted by techniques. Principles may be creatively applied to many different situations and different arts. Techniques are specific to circumstance and to a particular art. The difficulty of technical mastery is such that people often fall prey to Sunk Cost Fallacy. We're only human. In the martial arts, Sunk Cost Fallacy is a psychological weakness one exploits in others.

Sadly, even though I "knew" this, I had fallen prey to Sunk Cost Fallacy when it came to my software development career. Writing Ruby code allows me the pleasure of technical elegance. There are many apps I would like to pay others to write, yet I feel anxious about outsourcing them. I don't want to outsource the code out to India because their code tends to suck. I'd rather outsource them to someone I know who writes code at least as good as mine, or better. They're expensive, and they don't scale. Yet, when you come down to it, how many times do I have to write CRUD backends before I get tired of writing them?

Ruby metaprogramming, Ruby duck-typing, code generators, ORMs, MVC architecture, RESTful controllers, mountable Rack applications, Test Driven Development, Behavior Driven Development, RSpec, Cucumber, Agile, Scrum -- those are all techniques. The singular principle of the Ruby on Rails platform is this:

"Developer time is expensive. Computer time is cheap."

Rails and Chef were built from that principle. Entrepreneurs such as Tim Ferriss and Robert Kiyosaki know this principle, if expressed in a different way. Software -- the full nuance of intangible assets -- is costly to produce, yet once created, infinitesimally cheap to reproduce. Business systems, contracts, intellectual property, branding, positioning, social norms, rule of law, and government impartiality are all examples of what Kling and Schulz consider intangible assets. I didn't apply this principle to software development because I had fallen prey to Sunk Cost Fallacy. I had spent so much time improving my skill at software development that I lost sight of this principle.

I recently completed a project where I rewrote a small backend for an online web company. They publish information related to mortgage lenders. Their original backend sucked. Seeing this as a way to exercise my skill at software development -- and to play with the Rails 3.0 Beta release -- I spent a week rewriting the whole thing with Rails 3.0. I spent a lot of time polishing the code, even though it mostly amounted to managing minor data about mortgage lenders and the deployment environment was messy. I was polishing a turd. I spent a total of 40 hours on the whole project, including deployment.

All of that applies technique and not principle. Even though Rails has a lot of convenience abstractions, I still spent a lot of time wiring things together. I noticed many common patterns emerge. Each lender has several child attributes, depending on the type of lender. I've written this stuff before and ganked the code from there. One piece of code needed little modification and worked once I cut out what I didn't need. I originally wrote it fairly clean, so it slipped into place with minimal fuss. I felt cheated, rather than satisfied. Surely, this should be more difficult?

As I thought on the ideas in Kling & Schulz's book, it occurred to me that I should develop something to make me obsolete. If I had a system that could create CRUD code the way I like it, I could spend more time focusing on the entrepreneurial aspect. I might even outsource it to India, if the system were robust enough. In fact, I considered a DSL that described a CRUD. This system would generate all the necessary components via metaprogramming, not by writing files onto the disk. This is similar to how Chef works, using JSON "dna" strands merged into a recipe which Chef then executes to assemble the infrastructure.
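
As a rough sketch of what "declare the CRUD, generate it at runtime" could look like, here is a minimal Python illustration (Python rather than Ruby, purely to keep the example short). Everything in it is an assumption of mine: the `build_crud` function, the shape of the spec, the `Lender` fields, and the in-memory dict standing in for a real database. It is not how Hobo, Rails, or Chef work internally.

```python
def build_crud(spec):
    """Build a CRUD class in memory from a declarative spec.

    No source files are generated; the class is created with type().
    Storage is a plain dict, standing in for a real database.
    """
    fields = tuple(spec["fields"])

    def check(values):
        unknown = sorted(set(values) - set(fields))
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create(self, **values):
        check(values)
        row_id = self._next_id
        self._next_id += 1
        # Fields not supplied default to None.
        self._rows[row_id] = {name: values.get(name) for name in fields}
        return row_id

    def read(self, row_id):
        return dict(self._rows[row_id])  # return a copy, not the stored row

    def update(self, row_id, **values):
        check(values)
        self._rows[row_id].update(values)

    def delete(self, row_id):
        del self._rows[row_id]

    methods = {"__init__": __init__, "create": create, "read": read,
               "update": update, "delete": delete}
    return type(spec["name"], (), methods)


# Hypothetical declaration: the whole "backend" is this one dict.
LENDER = {"name": "Lender", "fields": ["name", "kind", "rate"]}

Lender = build_crud(LENDER)
lenders = Lender()
lender_id = lenders.create(name="Acme Mortgage", kind="bank")
lenders.update(lender_id, rate=4.5)
record = lenders.read(lender_id)
```

The point of the design is that the spec is plain data, much like the JSON that Chef consumes. It could be versioned, merged, or handed to someone else without them ever touching the generator.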

One use-case is a CRM. CRM systems are tightly coupled to business practices, and as such, it is difficult to write one that fits all businesses. One of the better Rails CRMs I've come across is FatFreeCRM. I had attempted to use that for a real estate investing business I started, and quickly came to the conclusion that I could not use it in that way. I later moved to 37signals' Highrise, and though it is much better, it still does not allow me to run my business on there the way I want.

If I had a CRUD toolkit that let me declare exactly what I want, then I could put together something that fits my business needs. In fact, I liked that idea enough that I started looking around for CRUD generators. ActiveScaffold was not what I needed. Streamlined and AutoAdmin no longer appear to be actively developed. Hobo looks great. It looks flexible enough to do what I want it to do.

Funny thing is, I had looked at Hobo a while ago and dismissed it. I was under the spell of Sunk Cost Fallacy. Why would I ever want to use this system when I can hand-write better CRUD? And now, when I look at it through the lens of principle, I find that despite Hobo's capability, you cannot declare the entire CRUD in a single file. It still depends on code generators -- the Ruby code generators, not Lisp code generators.

Maybe the Hobo developers have Sunk Cost Fallacy stashed away too.