Wednesday, July 8, 2009

Sarcasm and Sentiment Analysis

Sentiment analysis of digitized content (tweets, email, blog posts, etc.) is hard. Sarcasm makes it even harder. Consider how many sarcastic comments are made in our online communications each day: "I love being delayed at the airport." "I can't stand it when everything is going my way." Text like that has got to throw even the best sentiment analysis engines for a loop, and the false positives start flying.
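To make the problem concrete, here is a toy sketch of a word-counting sentiment scorer tripping over the airport example. The tiny word lists and the naive_sentiment function are made up for illustration; real engines are far more sophisticated, but the failure has the same shape.

```python
# Hypothetical, deliberately tiny sentiment lexicons (assumptions for illustration).
POSITIVE = {"love", "great", "wonderful"}
NEGATIVE = {"hate", "awful", "terrible"}


def naive_sentiment(text):
    """Label text by counting positive and negative words, ignoring context."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip('.,!?"').lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# The scorer sees "love" and nothing else it recognizes, so it misses the sarcasm.
print(naive_sentiment("I love being delayed at the airport."))  # positive
```

The scorer has no notion that being delayed is unpleasant, so the one word it does recognize decides the label.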

If you're sarcastic, like me, you've learned to keep your sarcasm to a minimum when you're writing because the context just isn't there for your reader, much less a machine, to understand the subtle shifts in tone or where you're coming from.

I'm looking forward to sentiment deduction getting better, but I'd like to see how the logic evolves to understand age-old sarcasm.

Maybe we will all just stop being sarcastic to support the machines running our lives.

Thursday, July 2, 2009

Speed Date with Google App Engine

I assumed that in the months since the announcement of Google App Engine, its glaring HTTP client deficiencies would have been resolved. Nope.

Any modern platform needs a robust HTTP client (timeout controls, full method support, custom headers, compression support, authentication support, and redirect handling). Unfortunately, GAE's urlfetch client (which the standard Python HTTP clients all funnel down to) doesn't let you set certain headers (including Referer), nor can you customize the connection timeout. Both are tools of the modern web services programming trade. Consequently, I have to cast GAE onto the tinker toy pile with the rest of today's high-level web apps. Sadly, a quick look at the app repository bears this out.
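For reference, this is roughly what those two controls look like with the standard library HTTP client in a current Python outside of GAE. It is a minimal sketch, not GAE's urlfetch API; the build_request and fetch helpers and the example URLs are my own placeholders.

```python
import urllib.request


def build_request(url, referer):
    """Build a GET request that carries a custom Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer}, method="GET")


def fetch(url, referer, timeout_seconds=5.0):
    """Send the request, giving up if the server takes too long to respond."""
    req = build_request(url, referer)
    # timeout is in seconds; urlopen raises an error if it is exceeded.
    with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
        return resp.status, resp.read()


# Building the request needs no network access, so the header is easy to inspect.
req = build_request("http://example.com/", "http://example.org/page")
print(req.get_header("Referer"))
```

Calling fetch with the same arguments would actually send the request, with the Referer header attached and the timeout enforced.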

On the other hand, take a look at what's been built on Amazon's AWS. Goes to show what you can do when you have an open platform.

That said, GAE does show promise for hosting simple user-facing web applications, or offline data crunching/hosting apps (à la google.com) with quick user-facing response times and little reliance on the outside web.