Sunday, November 30, 2008

_The_ bubble has burst.

Do you think the "Internet bubble" pop was bad? Assuming so, you haven't seen anything yet. The recent investment banking collapse will change our lives for decades to come. I've been trying to coax out a blog post on this topic for a few months now, and Condé Nast's Portfolio magazine, which recently ran "The End" by Michael Lewis, finally inspired me.

The 1990s were so fun!
The net bubble was a function of greedy analysts & company insiders pairing up with investment bankers to jack up IPO prices for newly minted securities in newly fashioned industries (online advertising, various Internet-driven technologies, the PC business, etc.). The result plowed large sums of money into the hands of many, and consumer spending took its cue, resulting in big-ticket items flying off the shelves (jets, 2nd mansions, $100k watches, fast cars, etc.). Many became "millionaires," and if you weren't one of them, you at least knew a few firsthand.

2nd Verse; the bubble bursts
When the reality of the situation set in, and folks realized the Emperor had no clothes (you can only support massive P/E ratios for so long before you have to illustrate at least some value), the tech sector cratered, taking its periphery with it. How quaint this mere $5 Trillion wipeout turns out to have been. Not one to go down without a fight, the Fed dropped the federal funds rate to keep credit markets cranking and consumers spending. The masses, not wanting the memory of freewheeling spending to fade, saddled themselves with more debt in order to keep spending. Money was cheap, and this time 2nd mortgages flew off the shelves, along with home equity "secured" lines of credit. The mortgage industry was on a tear, and "regular" 1st mortgages weren't going to be enough to keep things raging; there are only so many homes you can build and buy. American minds were made up, and we were going to continue spending come hell or high water. If our bloated salaries and cash from equity stakes in "successful" investments during the Internet Boom couldn't float our spending consciousness, mortgage-backed debt would!

3rd Verse; popular leverage

Wall Street has always led our thinking about how to measure financial and cultural "success." Big spending bankers, traders, and brokers have had their place in pop-culture for decades. The masses watched in drooling amazement as finance industry employees were bonused beyond recognition, and spent lots of money on toys. We eat up the stories of $10m birthday parties like turkey on Thanksgiving. For a time only the big kids on Wall Street had access to the real money; the kind whose creation few could actually understand. Then, overnight, the perfect storm of greed (hungry Wall Street execs), derivative innovation (CDO-creating quants), and deregulation entered the room; she was beautiful, single, and everyone wanted a piece. Derivatives have always been a neat trick on Wall Street. Leveraging one equity to create another, in a "side bet" manner, is a model that has been around forever. They've always been hard to explain, but until the mid-1990s, the common man could get his head around them with enough explanation and description. Too many abstractions away from the original asset, however, yields mysterious confusion, and wool can easily be pulled over another's eyes. It was the employee's turn now though; step aside CEO! Relative peons were creating new mortgage-derived securities by the dozens, and selling them like hotcakes to buyers with massive bank accounts (e.g. investment banks, state pensions, school districts, etc.). The equity markets couldn't satiate the post-Internet-bubble appetite that bankers had worked up. They needed a bigger market with tighter focus. Mortgages and associated bonds were the new game in town.

Before anyone knew it, hundreds of billions of dollars in securities were being repackaged as new derivatives (e.g. CDOs) and re-rated at ratings higher than the underlying securities' ratings. That was the real trick! Bankers, and corrupt/lame ratings agencies, were turning shit into gold, and selling it back to every industry you could imagine. One thing to note here: Moody's (the bond rating firm) is 20% owned by Warren Buffett. I've forever admired Mr. Buffett, but this new awareness has caused me to take a second look at him on my "most admired" list.

Unlike the Internet Boom's collapse, which was relatively confined to the technology industry, this time Wall Street had infected one of the largest finance vehicles known to man: mortgages ($15 Trillion worth in 2008). Mortgages, particularly during a housing/interest rate boom, comprise the underpinnings of the private financial industry. They are hugely leveraged debt vehicles. Think about it: you pony up, say, 20% of a speculative value in a down payment, then pay the rest off over a few decades; crazy! Undermine mortgages, and bad things happen. Enter present day; here we stand, wondering what's next. The money the commoner has invested over the years has shrunk by a full third on average. Nest eggs have cracked. Banks aren't able to lend money for the foreseeable future, and Americans can't buy houses like they used to. Our personal wells of vast amounts of money for consumer spending have dried up.
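
To put numbers on how leveraged a 20% down payment is, here is a minimal sketch. The $300,000 price and the 10% decline are made-up illustrative inputs, not figures from any data set, and the calculation ignores interest, amortization, and fees.

```python
def leverage_ratio(down_payment_fraction: float) -> float:
    """Dollars of house controlled per dollar of the buyer's own money."""
    return 1.0 / down_payment_fraction


def equity_change(price: float, down_payment_fraction: float, price_change: float) -> float:
    """Fractional change in the owner's equity after the home price moves.

    The loan balance is held fixed, so the owner absorbs the whole price move.
    """
    equity_before = price * down_payment_fraction
    loan = price - equity_before
    equity_after = price * (1.0 + price_change) - loan
    return (equity_after - equity_before) / equity_before


# Hypothetical example: $300,000 home, 20% down, price falls 10%.
print(leverage_ratio(0.20))
print(equity_change(300_000, 0.20, -0.10))
```

With 20% down the buyer controls five dollars of house for every dollar put in, so a 10% fall in price wipes out roughly half of the buyer's equity.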

Now What!?

Obviously no one knows, but I predict a rather harsh reality in the coming years. Jobs will be lost. Homes will continue to be foreclosed upon. Housing inventory will shoot through the roof, and home prices will fall. Stratification will find its way back into society, as the "haves" and the "have nots" will become more apparent, now that "having" will be more of a function of one's ability to raise capital/earn money, rather than one's ability to plow a hole into the ground with personal debt. The great normalizer, consumer debt, will be reeled in, and things will get weird.

Our banking heroes have fallen, and I wonder who will take their place. One thing is for sure, free-market capitalism always finds a way.

Saturday, November 22, 2008

Anatomy of a shared memory node failure.

After my previous post about teams, I quickly received several requests to provide more details. Voici!

Gnip's redundant; you can walk up to any of our cloud instances, vaporize it, and Gnip chugs along its merry way. We use shared memory (via TerraCotta) to replicate memory across nodes. As you can imagine, shared memory across network nodes isn't all that cheap. Just like anything else, when it's overused, things can melt down.

One of our customers started injecting hundreds of thousands more actors into their Filter rules than we'd tested for in a long time (or... ever, in the true production environment; there's a "you can never actually replicate production conditions in your staging/demo/review environment" blog post brewing in me). This caused one of the nodes to start working really hard to build the objects to support the additional actors. In turn, TerraCotta had to keep up its replication, going on its own merry way. The number, and size, of objects we were asking TC to manage (across clients, and three TC nodes as well: one primary, one secondary, and a third for good measure) caused too much lock contention across the system, and TC clients started dropping (heartbeats couldn't be kept up between clients and servers) because they were spending too much time processing locks. Once a TC client drops out of rotation, it has to be bounced in order to reconnect to the TC server (in shared memory situations, you can't let your objects between client and server get "too" far away from each other, otherwise you have bigger problems).

So, a node was dropping out of the TC network, we'd bounce it, it would come back up, try to recreate all the objects again, and crater. We'd restart it, it'd come back up.... rinse, repeat, rinse, repeat. Vicious cycle.

We resolved the issue by dramatically (several orders of magnitude) reducing the number of objects TC was managing in this code path. We optimized the object model to only keep the bare minimum in TC in order to keep our cherished clustered approach; the rest of the state stays put in local VM space, and is not shared.
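
The shape of that fix can be sketched roughly as follows. This is an illustration, not Gnip's code: Gnip runs on the JVM with TerraCotta, whereas the sketch is in Python, a plain dict stands in for the clustered store, and `shared_store`, `ActorMatcher`, and `FilterNode` are hypothetical names. The point is only that the replicated store holds a compact rule definition, while the heavyweight per-actor objects are built and cached locally on each node.

```python
from dataclasses import dataclass, field

# Stand-in for the clustered store. In a real deployment everything placed
# here would cost locks and network traffic, so it holds as little as possible.
shared_store: dict[str, tuple[str, ...]] = {}


@dataclass
class ActorMatcher:
    """Hypothetical heavyweight per-actor object, kept out of shared memory."""
    actor: str
    normalized: str = field(init=False)

    def __post_init__(self) -> None:
        self.normalized = self.actor.strip().lower()


class FilterNode:
    """One node: reads compact rules from the shared store, caches the rest locally."""

    def __init__(self) -> None:
        self._local_cache: dict[str, dict[str, ActorMatcher]] = {}

    def matchers_for(self, rule_id: str) -> dict[str, ActorMatcher]:
        # Derived state is built lazily on this node and never replicated.
        if rule_id not in self._local_cache:
            built = (ActorMatcher(a) for a in shared_store[rule_id])
            self._local_cache[rule_id] = {m.normalized: m for m in built}
        return self._local_cache[rule_id]

    def matches(self, rule_id: str, actor: str) -> bool:
        return actor.strip().lower() in self.matchers_for(rule_id)


# Only the minimal definition is shared: one tuple of strings per rule.
shared_store["rule-1"] = ("Alice", "bob")

node_a, node_b = FilterNode(), FilterNode()
print(node_a.matches("rule-1", "ALICE"))
print(node_b.matches("rule-1", "carol"))
```

The sketch leaves out cache invalidation: when a rule changes in the shared store, each node would need some signal to drop its local copy.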

There were other side effects floating around which got cleaned up in the process, which was nice. We reduced some function call times from 45 minutes at their worst, to 45 seconds. We reduced our TC data set size from 16G to a few hundred meg. In the process, we also upgraded to TerraCotta 2.7 which further reduced in-memory, and on-disk, data set sizes.

Teams, Rookies & Vets

A week ago, almost to the minute, the following message was generated by an internal Gnip monitoring server, and sent to the person on-call.

** PROBLEM alert - production-head2/production-head2-gnip is CRITICAL **

It was the start of a very long, non-stop few days at Gnip.

Much of the rapport on your team gets defined in moments like this. Your team's ability to solve hard, live, problems is thrust into the foreground. About five hours into the ordeal, my appreciation for having focused very hard on bringing software veterans into Gnip was peaking. The problem was being sliced and diced, and the collective experience of everyone on the team was winnowing things down quickly. "It can't be that!" "It must be this!" "I think we should focus here."
  • Step 1: we isolated the symptoms. Exactly what was going on!?!
  • Step 2: we checked configurations/environments
  • Step 3: we identified potential code inefficiencies
  • Step 4: we verified probabilities
  • Step 5: we placed a bet on what we thought the problem was, and wrote code to address it
  • Step 6: we watched our hard work pay off; production issue resolved; it was the right bet
Knowing which bet to place comes from experience. The only problem with experience at a startup is that it can be expensive. Like so much in life, you get what you pay for. Had Gnip been tilted toward relatively inexperienced, inexpensive, junior team members, what turned out to be a production blip could have been a true nightmare for the company.

Glad that problem is behind us, and we all have a nice new chunk of experience to put into the bag of tricks for future use.

Friday, November 7, 2008

Load me, tease me, please me.

Someone kicked computer start/restart/boot times into the media again. The last time I saw this much mainstream media coverage on this topic was a decade or so ago, before I had turned into a Mac user. To cut to the chase, if you're tired of booting/restarting your computer, just buy a Mac, which doesn't need rebooting much.

The conversation is akin to a problem we faced at Netscape many moons ago; software load times. When we were battling it out with Microsoft's IE browser, we kept adding features, and hence size, to the binaries that had to load for the app to run; it became a problem. It became a real problem considering more and more of IE's code was being baked into core operating system libraries and components, which are loaded into memory at boot-time, not when you fire up IE. Because of the way Netscape was loading its code, it was much slower to "start" than IE (orders of magnitude slower). Our response was to start loading base libs at OS start; in effect "pre-loading" all the code before the user fired up Netscape. It worked great; solved the problem.

The same model applies to operating systems and their state. If you're having to un-load (shutdown) and load (startup/reboot) OS libraries and components, guess what, you're doomed to incredibly slow startup times. I see a few solutions to this:
  • Run/use an operating system that doesn't leak enough memory or lock up often enough for a "long boot time" to be an issue. If you're restarting your computer once a week (or more), the start time really matters. If you're restarting your computer once a month, it doesn't matter so much. Switch to Mac. Why *nix based OSes don't crash/leak as much as Windows is a topic that's beaten to death; I won't beat it more here.
  • Build OSes that don't need much code to start.
  • Dynamically load libs on demand (this is related to the previous idea). This one helps, but the reality is lots of code needs to load to bring an OS up, so it often doesn't buy you much.
  • Build faster hardware; faster memory, faster solid-state drives, etc. This one's expensive and takes big science brains (rare).
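
The trade-off between loading code up front and loading it on demand can be sketched in a few lines. This is an application-level analogy in Python, not how an OS loader or Netscape's pre-loader actually worked; `LazyModule` is a hypothetical helper, and `json` is just a stand-in for a large library.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyModule:
    """Defers importing a module until one of its attributes is first used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def __getattr__(self, attr: str):
        # Only reached for names not found on the wrapper itself,
        # i.e. names that belong to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


json_lib = LazyModule("json")        # cheap: nothing is imported yet
print(json_lib.loaded)               # False
print(json_lib.dumps({"ok": True}))  # first use pays the load cost
print(json_lib.loaded)               # True
```

Pre-loading is simply the reverse: import everything at startup, so the cost is paid before the user asks for anything.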
Unless the computer industry tries to come up with new separations of church and state between hardware, operating system, and apps, some combination of the above is really all you can do. I do see the need for such feature-rich operating systems going away with time. We use probably 5% max of the code that actually comprises an operating system. Specialized devices will eventually win as computer use cases continue to winnow. Most consumers don't need computers that can run database applications, email clients, graphics intensive games, etc.

Now that I've written that, I think that's the path. 20 years from now, today's OS will be a relic. Hardware will be specialized, and hence its software will be too. Mobile phone OSes are a good example of this kind of evolution.

Wednesday, November 5, 2008

From Defrag to Glue; simplicity

I was on the last Defrag '08 panel yesterday and I had a blast! We were talking about "glue" as a promotion of next year's Glue conference. Geir Magnusson and Aaron Fulkerson were on the panel with me and we had a fun conversation, well stoked by moderator Seth Levine, around the new way to build apps and the various glue components that keep those apps together.

It was a joy to hear Aaron talk about how they've built Mindtouch. It struck a chord with me as it's precisely how we've built Gnip; simply. Bare metal HTTP/REST (we've even built a custom, lightweight, REST layer) is where it's at! Be wary of heavy frameworks to scaffold all of your business logic. Write the code you need, and run it.
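
As a rough illustration of how little is needed, here is a minimal sketch of a REST-style endpoint using only Python's standard library. It is not Gnip's or Mindtouch's code; the `/publishers` path, the `ROUTES` table, and the function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory resource.
PUBLISHERS = ["example-publisher"]


def list_publishers() -> tuple[int, dict]:
    return 200, {"publishers": PUBLISHERS}


# Route table: (method, path) -> handler function.
ROUTES = {("GET", "/publishers"): list_publishers}


def dispatch(method: str, path: str) -> tuple[int, dict]:
    """Look up a handler; unknown routes get a 404."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "not found"}
    return handler()


class RestHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = dispatch("GET", self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Serve on localhost until interrupted; not called automatically."""
    HTTPServer(("127.0.0.1", port), RestHandler).serve_forever()
```

Calling `serve()` starts the server; keeping the routing in a plain `dispatch` function means the business logic can be exercised without any HTTP machinery at all.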

The conversation evolved into some SOAP bashing, and how apps should/will be built going forward; always a fun conversation.

The conversation got me excited for GlueCon. I can't wait to dig in even more!

Monday, November 3, 2008

By default...

Software is all about default behavior, from pure infrastructure plays (e.g. Gnip) to general-population, consumer-facing applications (e.g. Apple products).

Whether you call your software's configuration "preferences," "settings," "configuration," or "config," it's all the same, and the choices you make about default values define your product, and how your product will be used by the vast majority of your users.

If you screw up a "default" setting, from whether or not a calendar entry should "remind" the user that it's there, to how XML is generated, you will find that out when your users engage with your product; or not, as the case may be.

Coming from Netscape/Mozilla, one of the most configurable products known to man (just type "about:config" into your URL bar to see what I'm talking about), and now being a heavy Apple product user (where giving the user options is considered a design flaw), I'm trying to strike the right balance between default behavior decisions and giving end users control of Gnip.
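
One way to keep that balance explicit in code is to define every setting's default in one place and let users override only what they care about. The sketch below is hypothetical: the setting names (`remind`, `xml_pretty_print`) echo the examples above but are not Gnip's actual configuration.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Settings:
    """Every field's default is a product decision most users will never change."""
    remind: bool = True             # hypothetical: calendar entries remind by default
    xml_pretty_print: bool = False  # hypothetical: compact XML by default


def load_settings(overrides: dict) -> Settings:
    """Start from the defaults and apply only recognized user overrides."""
    known = {f.name for f in fields(Settings)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return replace(Settings(), **overrides)


print(load_settings({}))
print(load_settings({"xml_pretty_print": True}))
```

A user who passes nothing gets the product exactly as its designers intended, which is what most users will see.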

By default, do the right thing.