Dirty Data, Python & Gnip

I've been tinkering with Python (2.5) over the past few months, both in Google App Engine and in free-running apps/processes. My initial free-running apps ingested "structured" content from a variety of web APIs, and they would crash roughly every 12 hours and need to be restarted. I eventually took a "short-lived" process approach to managing them: cron would check that they were still running, and restart them if they'd died.
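The cron-driven "check and restart" approach can be sketched roughly like this. This is a minimal illustration, not the post's actual code; the pidfile path and worker command are hypothetical, and the script would be invoked from a crontab entry every few minutes.

```python
# Sketch of a cron watchdog: restart the ingest worker if it has died.
# PIDFILE and WORKER_CMD are hypothetical, not from the post.
import os
import subprocess

PIDFILE = "/tmp/ingest_worker.pid"       # hypothetical pidfile location
WORKER_CMD = ["python", "ingest_worker.py"]  # hypothetical worker command


def is_running(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs no action, only an existence check
        return True
    except OSError:
        return False


def ensure_worker():
    """Start the worker unless the recorded pid is still alive."""
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        if is_running(pid):
            return False  # worker alive, nothing to do
    except (IOError, OSError, ValueError):
        pass  # missing or corrupt pidfile: treat the worker as dead
    proc = subprocess.Popen(WORKER_CMD)
    with open(PIDFILE, "w") as f:
        f.write(str(proc.pid))
    return True  # worker (re)started
```

A crontab line like `*/5 * * * * python watchdog.py` would then run the check every five minutes.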

More recently I built an app that digested data from a known clean API (Gnip). Digesting data from Gnip ensures consistency in data format and structure, and as a result the Python app has been running for several days without issue. Now, of course, the initial apps crashed because of bugs in code I wrote, but bulletproofing against broken/dirty/poorly-encoded/inconsistently-encoded data coming from random web APIs is a pain. Covering every case in modern apps takes a lot of energy. I opted for the "bounce it" strategy rather than debugging the issue (a major time sink given the variability and inconsistency of the data; any engineer's worst nightmare).
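The kind of bulletproofing the post calls a pain often boils down to normalizing every inbound string before use. A minimal sketch, assuming a fallback chain of encodings (the chain itself is my assumption, not anything Gnip or the post specifies):

```python
# Defensive decoding for payloads arriving in inconsistent encodings.
# The utf-8 -> latin-1 fallback chain is an illustrative assumption.
def to_unicode(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes to text, trying each candidate encoding in order."""
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: substitute undecodable bytes rather than crash the worker.
    return raw.decode("utf-8", errors="replace")
```

A clean API makes this layer unnecessary; against random web APIs, every field that crosses the boundary needs it.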

The new application has instilled faith in Python as a choice for long-lived app processes, and reinforced how important clean input data is.

Jud Valeski

Parent, photographer, mountain biker, runner, investor, wagyu & sushi eater, and a Boulderite. Full bio here: https://valeski.org/jud-valeski-bio
Boulder, CO