NUBIC decided to hold a FedEx Day (following the Atlassian model). It was our first, and about a third of our staff, drawn from the NUBIC software development and EDW groups, built or collaborated on projects. Everyone agreed that it was a good exercise and that we should do it again.
Here’s a rundown of the projects built in 24 hours, in the order in which they were presented:
Project: Bundle Recorder
People: Rhett Sutphin
Bundle Recorder is a Jenkins plugin for tracking the gem dependencies used in each build of a bundler-using Ruby project. It stores bundler’s Gemfile.lock at the end of each build and provides a way to view changes between adjacent builds. While it can be used with any Ruby project, it will be most useful for gems, since you don’t usually commit the Gemfile.lock for gems.
I built this plugin to address an issue I ran into more than a few times. NUBIC uses Jenkins for continuous integration, and one of the things we use it for is nightly tests of our various gems, to ensure that they continue to work as new versions of their dependencies are released. Sometimes a new dependency version does cause a failure. That’s good as far as it goes; it means the builds are doing their jobs. However, when a nightly build fails, it’s often hard to tell what changed: by default, all we have is the console output from `bundle update`. That output lists every dependency and its version, in no particular order, but not which ones just changed. (Everything is listed because we reinstall all dependencies on each build, which also verifies that none of them have been yanked from `rubygems.org`.) By parsing the bundler lockfile, Bundle Recorder can give a summary of just what’s changed.
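To make the idea concrete, here is a minimal sketch of comparing two lockfiles. It is not the plugin’s actual code: it is written in Python for illustration, the function names `locked_versions` and `diff_locks` are hypothetical, and it assumes the usual `Gemfile.lock` layout in which each resolved gem appears on a line indented by four spaces as `name (version)`.

```python
import re

# A resolved gem in Gemfile.lock is assumed to look like "    rake (10.0.3)":
# exactly four spaces, the gem name, then the version in parentheses.
# Nested dependency constraints are indented six spaces and are skipped.
SPEC_LINE = re.compile(r"^ {4}(\S+) \(([^)]+)\)$")


def locked_versions(lockfile_text):
    """Return a dict mapping gem name to the locked version string."""
    versions = {}
    for line in lockfile_text.splitlines():
        match = SPEC_LINE.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def diff_locks(old_text, new_text):
    """List (name, old_version, new_version) for every gem that differs.

    A version of None means the gem is absent from that lockfile.
    """
    old = locked_versions(old_text)
    new = locked_versions(new_text)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


# Made-up sample lockfiles from two adjacent builds.
OLD_LOCK = """\
GEM
  remote: https://rubygems.org/
  specs:
    rake (10.0.3)
    rspec (2.12.0)
      rspec-core (~> 2.12.0)
    rspec-core (2.12.2)

DEPENDENCIES
  rake
  rspec
"""

NEW_LOCK = """\
GEM
  remote: https://rubygems.org/
  specs:
    json (1.7.7)
    rake (10.0.4)
    rspec (2.12.0)
      rspec-core (~> 2.12.0)
    rspec-core (2.12.2)

DEPENDENCIES
  json
  rake
  rspec
"""

if __name__ == "__main__":
    for name, before, after in diff_locks(OLD_LOCK, NEW_LOCK):
        print(f"{name}: {before} -> {after}")
```

Sorting by gem name sidesteps the "no particular order" problem with the console output, and reporting only the entries that differ keeps the summary short even when every gem is reinstalled.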
Project: Natural Language Processing (NLP) Abstraction Tool
People: Luke Rasmussen, Tuan Nguyen, Thomas Elbert, Daniel Scheider
Our project was to create a web application that lets someone annotate text documents. Why is this useful? When you’re working with text documents in any kind of automated fashion, having a “gold standard” to validate against is really important. For NLP, this gold standard can help with machine learning efforts to improve the NLP engine. It can also help verify whether all patient information has been removed from documents, and mark where anything was missed. The main goals were to create a tool that is:
- lightweight
- easy to learn and use quickly
- built on top of the UIMA framework (http://uima.apache.org/).
Making this all happen, from design to development, in 24 hours was quite a challenge, but a fun one. We split into a “data team” and a “UI team”. The data team worked on getting documents from the EDW so the annotation tool could access them. The UI team developed the annotation system and the web services that connect everything together. Probably the biggest challenge was scaling back what we wanted the tool to do (the infamous scope creep – yes, even programmers do it) to the point where we could get something working in the time allowed. Getting free time to work on a pet project is probably the greatest gift a programmer can receive, and we had a lot of fun making the system work and talking about what we’d like to do with it. We’re definitely hoping for another FedEx Day in the future!
Project: tardis – your friend in time
People: Jeff Lunt
tardis is a grapher for events against a timeline. I wanted a simple way to graph arbitrary events against a timeline using the MIT Simile timeline widget. The goal was to provide a visual representation of events in web apps, for example graphing the incidence of application errors over time alongside other events such as server maintenance, server load, etc. The hoped-for outcome is a timeline that covers every level of your stack (host, VM usage, process load, web app framework and errors, user load, etc.) in one view, so that it’s easier to correlate problems and changes in application performance with events at other levels in the stack.
It’s important to realize that typical graphing libraries, which plot numerical data across two axes, don’t really handle momentary or duration events, or if they do, they don’t do it well. That’s what makes the MIT Simile timeline widget especially useful for this purpose.
tardis can also accept any series of events, software-related or not, through a simple RESTful API, which will be documented and published in the near future.
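The distinction between momentary and duration events can be sketched as a small data model. This is an illustration only, not tardis’s code or its API: the `Event` class and `timeline_payload` function are hypothetical, and the JSON field names (`dateTimeFormat`, `events`, `start`, `end`, `title`, `durationEvent`) are modeled on the Simile Timeline JSON event source and should be checked against its documentation.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    """One point on the timeline; `end` is None for a momentary event."""
    title: str
    start: datetime
    end: Optional[datetime] = None

    def to_timeline(self):
        # Field names are assumptions based on Simile Timeline's JSON format.
        entry = {
            "title": self.title,
            "start": self.start.isoformat(),
            "durationEvent": self.end is not None,
        }
        if self.end is not None:
            entry["end"] = self.end.isoformat()
        return entry


def timeline_payload(events):
    """Build a JSON-serializable dict with events ordered by start time."""
    ordered = sorted(events, key=lambda event: event.start)
    return {
        "dateTimeFormat": "iso8601",
        "events": [event.to_timeline() for event in ordered],
    }


if __name__ == "__main__":
    sample = [
        Event("Application error", datetime(2012, 3, 1, 12, 0)),
        Event("Server maintenance",
              datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 1, 10, 30)),
    ]
    print(json.dumps(timeline_payload(sample), indent=2))
```

Keeping errors (momentary) and maintenance windows (duration) in one event list is what allows them to be shown on the same timeline and correlated by eye.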
Project: tahoe file system implementation
People: Dong Fu
Tahoe-LAFS is an open-source distributed file system project that implements the Principle of Least Authority (POLA) and Reliable Array of Independent Nodes (RAIN). By leveraging an independent pool of SAN storage devices through WAN or LAN links, the software presents end users with a fault-tolerant and secure resource for backup and online storage. For the FedEx Day, I was able to set up a virtual machine with Tahoe-LAFS installed and configured. I was also able to register the test VM on the public storage grid to demonstrate its potential. Other aspects of the product, such as performance, fault tolerance, and cross-platform compatibility, will be explored at the next available opportunity.
A one-page description of the design of Tahoe-LAFS can be found here.
Project: nubic bootstrap – virtual machine build automation using Puppet
People: William Dix, John Dzak, Dong Fu
From Will: Onboarding new developers is a slow and difficult process. Installing dependencies, particularly Oracle, is time-consuming to say the least. To improve this process, I wanted to build a VM with Oracle installed and set it up as a Vagrant base box (Vagrant is a Ruby gem that provides easy management of VirtualBox VMs). Using Vagrant, a developer can quickly set up a new Oracle VM on their local machine whenever there is a need.
My project was not as successful as I would have liked, primarily because of the time required to move large VMs around, convert them to the proper formats, etc. Because it took so long just to get the VMs into the right place and the right format, I was not able to get to other desired tasks like automated schema loading.
From John: So, my FedEx Day project was the nubic bootstrap project that I worked on with Dong and William. Initially I was thinking of working on a set of scripts that would make setting up an Oracle VM easier, but after talking with William and Dong we decided to take an approach similar to how bundler works: you specify your dependencies in a file, run the gem command `nubic_bootstrap install`, and all those dependencies are installed. Here’s an example config file:
example nubic_bootstrap.yml

    vms:
      oracle:
        schemas: [cc_pers, cc_notis]
        database: //localhost:15210/XE
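As a rough sketch of the bundler-like idea (declare what you need, then let one command work out the steps), the following walks a config like the one above and lists what an install would do. It is not the gem’s code: it is written in Python for illustration, `plan_install` is a hypothetical name, and it assumes the file has already been parsed into a nested dict with `schemas` and `database` sitting under each VM entry.

```python
def plan_install(config):
    """Return the ordered list of steps implied by a parsed config dict."""
    steps = []
    for vm_name, settings in config.get("vms", {}).items():
        steps.append(f"fetch and start VM '{vm_name}'")
        # The nesting of 'database' under the VM entry is an assumption.
        target = settings.get("database", "(default database)")
        for schema in settings.get("schemas", []):
            steps.append(f"import schema '{schema}' into {target}")
    return steps


# Hand-written stand-in for the parsed example config.
EXAMPLE_CONFIG = {
    "vms": {
        "oracle": {
            "schemas": ["cc_pers", "cc_notis"],
            "database": "//localhost:15210/XE",
        }
    }
}

if __name__ == "__main__":
    for step in plan_install(EXAMPLE_CONFIG):
        print(step)
```

Separating the plan from its execution keeps the declarative file as the single source of truth, much as a Gemfile is for bundler.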
I was also hoping to build on some of the functionality that bcdatabase/Oracle offered while building the gem. After a couple of hours of work I realized that invoking the commands I needed on the guest VM was trickier than I had initially thought, since I needed to know information about the guest VM (e.g., the Oracle home) to invoke certain commands. As time started running out, I retreated to the initial, more basic idea of commands to import and export databases in the VM, along with hard-coding some of the settings. The gem ended up being very hard-coded, but it could download the Oracle VM and import a schema into it. Also, as we neared the end of FedEx Day, William found a way to invoke commands inside a VM using Vagrant, which could be useful in the future. Once I have some more free time on my hands, I think I will continue with this project, since it is something our group could use when migrating to full application environments inside the VM for projects that use Oracle.
Project: The cost of changing health insurance, to insurance companies
People: Justin Starren