A Fresh Cup is Mike Gunderloy's software development weblog, covering Ruby on Rails and whatever else I find interesting in the universe of software. I'm a full-time software developer: most of my time in recent years has been spent writing Rails, though I've dabbled in many other things and like most people who have been writing code for decades I can learn new stuff as needed.

Currently I'm employed as the Vice President of Engineering at Faria Education Group. If you're interested in working with me, we're often hiring smart developers. Drop me a comment if you're interested or email MikeG1 [at] larkfarm.com.

Monday, March 23, 2009

Batting Clean-up

I've spent a lot of time over the past few years working with Rails projects that were written by other people. Sometimes I've come on as a subcontractor to an existing codebase, sometimes I've taken over when another developer got bored or fired, sometimes I've been asked to do a code review.

Over the course of these engagements I've come up with a strategy for getting up and running on a new-to-me Rails codebase quickly. There aren't any hard and fast rules; there are still a lot of variable factors. But overall, I find these guidelines useful:

Start in environment.rb and figure out which version of Rails the project needs. In my case, I'm working with everything from 1.1.6 to 2.3.2.1 at the moment, so using gem Rails is pretty much a non-starter. If a project doesn't have vendored Rails when I get it, it will as soon as I figure out what version it wants to see. If you do vendor Rails into a shared project where others are working with gems, don't forget to .gitignore vendor/rails.
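In concrete terms, the first step looks something like this. (A self-contained sketch: the fake environment.rb stands in for a real app's, and the rake commands are shown as comments.)

```shell
# Fake app layout so the sketch runs anywhere; a real app already has this file.
mkdir -p demo_app/config
echo "RAILS_GEM_VERSION = '2.3.2' unless defined? RAILS_GEM_VERSION" > demo_app/config/environment.rb

# Which version of Rails does this app want?
grep RAILS_GEM_VERSION demo_app/config/environment.rb

# In the real app, vendor that version:
#   rake rails:freeze:gems
# ...and if others on the repo work from gem Rails, keep the copy out of git:
echo "vendor/rails" >> demo_app/.gitignore
```

In Rails 2.x, `rake rails:freeze:gems` copies the version named by RAILS_GEM_VERSION into vendor/rails, which is what makes the project self-contained.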

Next I look at bringing the database up from scratch. If at all possible, I ask for a copy of a current production database to avoid doing this. If I can't get one, I'll start by running migrations, just to see whether the migrations have been maintained. If migrating from scratch blows up, there's always schema.rb.

Next comes searching the code for require statements to see if I can figure out what gems the project needs (in rare cases, gems are specified via the config.gem route, but so far I'm not seeing much of that). I install or upgrade any gems that look required, and then try to actually run the project via script/server. Usually this fails a few times as I discover missing dependencies, but I like to give it a few tries so I can have the code up and running while I explore it.
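A rough version of that hunt as a throwaway script (a sketch, not a tool; the directory list assumes a stock Rails layout, and it only catches literal require strings):

```ruby
# Scan Ruby files for require statements and print the unique library names --
# a first guess at which gems the app depends on.
def required_libraries(dirs)
  names = []
  dirs.each do |dir|
    Dir.glob(File.join(dir, "**", "*.rb")).each do |file|
      File.foreach(file) do |line|
        names << $1 if line =~ /^\s*require\s+['"]([^'"]+)['"]/
      end
    end
  end
  names.uniq.sort
end

puts required_libraries(%w(app lib config)) if __FILE__ == $0
```

It won't catch conditional or computed requires, but it shortens the fail-and-retry loop with script/server considerably.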

I spend a few minutes exploring vendor/plugins to see what non-gem plugins the project depends on as well. If it's using any that I'm not familiar with, I check out the readme - assuming there is one.
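That exploration is a one-liner at the shell (self-contained sketch; the fake plugin layout stands in for a real app's vendor/plugins):

```shell
# Fake layout so the sketch runs anywhere; skip this in a real app.
mkdir -p vendor/plugins/will_paginate vendor/plugins/mystery_meat
echo "Pagination helpers for ActiveRecord and ActionView" > vendor/plugins/will_paginate/README

# List each plugin with the first lines of its README, if it has one.
for plugin in vendor/plugins/*/; do
  echo "== ${plugin}"
  head -n 3 "${plugin}"README* 2>/dev/null || echo "   (no README)"
done
```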

With the app running, I turn to the MVC heart of things. Here, the models are my first stop. If there are a reasonable number of models (say, anything under about 30) I just start at the top and look at each one, making particular note of association declarations. I haven't found an automatic ERD tool for Rails that I like, so I use this information to sketch out an ERD by hand, sorting out how the major entities connect to each other.
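A few lines of Ruby will harvest the raw material for that hand-drawn ERD (a sketch; it only sees one-line association declarations, which covers most Rails models):

```ruby
# Grep the model files for association macros and emit one edge per line,
# e.g. "post has_many comments" -- ready to turn into boxes and arrows.
ASSOCIATION = /^\s*(belongs_to|has_one|has_many|has_and_belongs_to_many)\s+:(\w+)/

def association_edges(model_dir)
  edges = []
  Dir.glob(File.join(model_dir, "*.rb")).sort.each do |file|
    model = File.basename(file, ".rb")
    File.foreach(file) do |line|
      if (m = ASSOCIATION.match(line))
        edges << [model, m[1], m[2]]
      end
    end
  end
  edges
end

if __FILE__ == $0
  association_edges("app/models").each { |model, macro, target| puts "#{model} #{macro} #{target}" }
end
```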

Next for me is routes.rb. Looking at this file is usually a good way to judge the sophistication of the previous developers. It also gives me some URLs to try out on the running code to see what happens. After I understand the basic routing, I'll spend some time in controller code, looking to see if it seems unduly fat or otherwise confusing.
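The contrast I'm looking for, sketched in Rails 2.x routing terms (resource names invented):

```ruby
# config/routes.rb
ActionController::Routing::Routes.draw do |map|
  # A good sign: deliberate, RESTful resource routes near the top...
  map.resources :posts, :has_many => :comments
  map.root :controller => 'posts'

  # ...versus an app that never got past the generated defaults:
  map.connect ':controller/:action/:id'
  map.connect ':controller/:action/:id.:format'
end
```

Running `rake routes` against the live app dumps the generated routes, which doubles as a list of URLs to try out.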

Finally I'll spot check some views and helpers to see how clean the code looks. Usually I don't try to read all the views, though, unless the first couple I touch on show me systemic problems.

By the time this process is done, I usually have a rough handle on how the code is structured. Combine that with some exploratory use of the running application on my local box, and Rails' conventions, and I can go in and find the code that needs to be improved, evaluated, or fixed.

Reader Comments (5)

Greetings,
This is part advice, and part griping...sorry for the latter. :)

Funny you would mention this; I've just started maintaining and trying to improve a Rails app developed by an 'outsourced' group. The only tests were the ones generated automatically by 'restful authentication', and they were never maintained, so they didn't come close to passing. Swaths of the program are written in terribly complex (and sometimes computed) SQL, migrations didn't bring up a fresh database (poor use of acts_as_enumerated causes great hurt), and vendor/plugins should have just had one named 'kitchen_sink'.

It hurts to see Rails abused like that; you want to take the poor application under your arm and say, 'It'll be okay...we'll add some tests and get you right as rain in no time!', but you know you'd be lying...

I did much of what you describe (half the gems it used were config.gem'ed, the other half weren't), vendor'ed rails (it breaks on newer than 2.1.0), and brought the development database kicking and screaming into life. There was no schema.rb, it had been .gitignore'd, and the migrations added data, used models, and everything else you can imagine doing wrong. (Including using a field on a model after adding that column in the previous line...I don't know what version of Rails that ever worked on...) I didn't want a production database; who knows what's been done to that by hand. I want to know what the database is _supposed_ to look like; I can figure out the difference with production later.

Once the clean (only data inserted by migrations) dev database was up, I brought the site up to see if it worked. Surprisingly enough, it did; apparently they used manual QA as their only testing methodology. I appreciate their QA a lot; it means it's a working application, even if it's not going to help me refactor it.

I ran flog and flay and looked at the pain points they found to get an idea how bad things might be. I picked an innocuous join table (with some extra data and functionality) to build the first set of tests for, which gave me insight into both sides of the join without having to REALLY dig into the ball of fur on either side. I viciously stripped all the 'test_truth' tests. I looked for large files that flog and flay hadn't picked up to pore over. Check out custom rake tasks, because those often are clear stories and easy to quickly understand in a small context.

Checking out the deployment process tells you a lot also, although it turns out this was stock Engine Yard Capistrano.

Skimming views (sort by size!) will tell you a lot also, especially when you find SQL queries being run in them...
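The sort-by-size trick is one line at the shell (fake view files included here just so the sketch runs anywhere; in a real app, start at app/views):

```shell
# Stand-in view tree; skip in a real app.
mkdir -p app/views/posts
printf '%s\n' '<h1>Posts</h1>' '<%= render :partial => "post", :collection => @posts %>' '<%= will_paginate @posts %>' > app/views/posts/index.html.erb
printf '%s\n' '<%= post.title %>' > app/views/posts/_post.html.erb

# Largest views first -- the big ones are where the bodies are buried.
find app/views -name "*.erb" | xargs wc -l | sort -rn
```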

Use the site for a little while, and watch the log in another window. Just let it skim by; if you've looked at log files much, things that seem wrong will jump out even if it's going faster than you can really read.

In my case, the code's mine now, so it's my responsibility to make it better before anybody else has to touch it. I've got about a week of 'free fix-it-up time' before I need to start actually implementing new features and (thankfully) stripping out old ones... At my previous company, I was the guy pushing folks to test, now I've inherited a codebase with zero tests. Poetic justice, I suppose. :)

Anyhow, good (and timely, for me) article, and I hope my suggestions of other things to look at outweigh my griping. :)

Good luck!

-- Morgan

March 24, 2009 | Unregistered CommenterMorgan

Morgan,

What alternative do you suggest to using models in migrations? I've been in several situations where I had to change not only the underlying db structure but also the contained data. I recognize the problem (especially when running older migrations against a newer code revision where the models have changed), but won't declaring the model inside the migration help with that?

March 24, 2009 | Unregistered CommenterHenning

Greetings,
Henning: Data changes, especially moving data around, are almost always rake task-worthy in my experience.
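For instance, something like this (task and model names invented, and a Struct standing in for the real ActiveRecord model so the sketch runs standalone; a real task would depend on :environment):

```ruby
require 'rake'
include Rake::DSL  # lets us define tasks outside a Rakefile for the sketch

# Stand-in for an ActiveRecord model.
Post = Struct.new(:id, :comments_count)
POSTS = [Post.new(1, nil), Post.new(2, nil)]

namespace :data do
  desc "Backfill cached comment counts (run once, on deploy of the branch)"
  task :backfill_comment_counts do
    POSTS.each { |post| post.comments_count ||= 0 }
  end
end

Rake::Task['data:backfill_comment_counts'].invoke
POSTS.each { |post| puts "post #{post.id}: #{post.comments_count}" }
```

The point is that the data fix lives outside the migration chain, so a fresh database never has to run it.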

The other side of that, populating large amounts of seed data into new databases, is a difficult task no matter the method; seed_fu attempted to deal with it, but it's not an optimal solution, and it's pretty old. I'm not even sure it still works. It's worse if your tests need seed data from a legacy database (e.g. a nutritional database). Reloading lots of data each time a clone_structure_to_test is done makes your tests very slow.

I break down migrations into three kinds: structural (tables, columns, indices, etc.), data (pre-populating tables, etc.), and procedural (moving data around, recalculating counts, etc.). The first is what I strive to limit migrations to. I feel like there should be a good answer for the second, but I haven't found it. The third I try to relegate to rake tasks that are usually run once, on deployment of the branch.

The procedural tasks don't need to be run when building a fresh database, because there isn't legacy data to correct. That's why you can usually define the model in the migration to force it to work even if the real model is gone or renamed; there's no data, so the operations often don't matter. If they don't NEED to be run when building a fresh database, I try not to put them in the migrations.
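Roughly, the shape I mean (invented names, Rails 2.x style -- a sketch, not code from any real app):

```ruby
class BackfillPublishedFlag < ActiveRecord::Migration
  # Throwaway model scoped to the migration: a frozen snapshot,
  # unaffected if app/models/post.rb is later renamed or deleted.
  class Post < ActiveRecord::Base; end

  def self.up
    add_column :posts, :published, :boolean, :default => false
    Post.reset_column_information  # pick up the column we just added
    Post.update_all(:published => true)
  end

  def self.down
    remove_column :posts, :published
  end
end
```

The reset_column_information call is also the cure for the "use a column right after adding it" bug I was griping about above.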

It's not 'hard and fast', because I usually work in startups to small companies, where dogma doesn't work so well. Imagine, though, a large and thin piece of foam. It's flexible, and you can make it into all sorts of shapes, and yet it's simple. Each time you add code that makes reasonable changes in the future painful, it's like putting a thin glass rod into the foam. It's still flexible, but there are some bends you can't make without breaking it. Add too many and you've got an inflexible and brittle object, no matter how dynamic the base material is.

The fear of breaking things by changing the code is deeply demotivating for everyone.

I know I waterboarded that analogy, but hopefully it makes sense...

-- Morgan

March 24, 2009 | Unregistered CommenterMorgan

Hi,
does anyone have experience with ongoing refactoring strategies? At the moment I'm working on a project where I have just a few hours per week for refactoring, and not enough time to refactor everything that needs it at once.

Thanks for any advice.

March 24, 2009 | Unregistered CommenterMartin

For seed data, I'm currently using http://github.com/ffmike/db-populate/tree/master .

March 24, 2009 | Unregistered CommenterMike Gunderloy
