How to Write Rock Solid Rake Tasks

by Brad Bollenbach

As developers, we spend a lot of time thinking about and debating software best practices. We espouse the benefits of small classes made up of small methods, clean separation of concerns, clear, precise names for things, and the importance of writing tests before writing actual code.

But all that seems to fly out the window when it comes to rake tasks. Crack open your lib/tasks folder, and you're almost guaranteed to find stuff like this:

desc "Clear out stale print job pdfs"
task :clear_stale_pdfs do
  pdf_dir = ENV["PDF_DIR"]

  if pdf_dir.blank?
    raise ArgumentError, "missing required argument PDF_DIR"
  end

  unless Dir.exist?(pdf_dir)
    raise ArgumentError, "directory #{pdf_dir} does not exist"
  end

  num_removed = 0

  Dir[Pathname.new(pdf_dir).join("*.pdf")].each do |pdf_filename|
    if File::Stat.new(pdf_filename).mtime < 1.day.ago
      File.unlink(pdf_filename)
      num_removed += 1
    end
  end

  puts "Removed #{num_removed} stale pdf(s)"
end

In fact, this example is generous. It's more likely that you'll find some rake tasks in your project that are two or three times the size of this, if not more. But even this short example shares the same two problems that are often present in "real world" rake task code:

  • Poor abstractions. Rake tasks written in this "throwaway" style have no abstractions - they're just a big hunk of code inside a task block, which increases the cognitive load required to understand them.

  • Lack of tests. Worse still, they're unlikely to be tested. Sure, you could write a test that calls Rake::Task['my_task_name'].invoke and test them that way, but that's uncommon in my experience. Further, without abstractions, such tasks are likely to be awkward to test anyway (difficult to set up, stub, etc.). The lack of tests makes your rake tasks brittle and hard to maintain.
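For reference, here is a minimal sketch of what testing through Rake::Task#invoke can look like. The task name say_hello and the helper run_task are hypothetical, invented for illustration; the task is defined inline only so the snippet is self-contained, whereas in a Rails app you would load your .rake files instead.

```ruby
require 'rake'
require 'stringio'

# Hypothetical task, defined inline so the sketch is self-contained.
Rake::Task.define_task(:say_hello) do
  puts "hello from rake"
end

# Hypothetical helper: run a task by name and return what it printed to stdout.
def run_task(name)
  original_stdout = $stdout
  $stdout = StringIO.new
  task = Rake::Task[name]
  task.reenable # a task runs only once per process unless re-enabled
  task.invoke
  $stdout.string
ensure
  $stdout = original_stdout
end

output = run_task(:say_hello)
puts output
```

Note how much ceremony is needed just to capture output and to allow the task to run more than once. That overhead is part of why tests at this level tend not to get written.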

Thankfully, if you do consider your rake tasks to be suffering from neglect, and are looking for a way to improve them, there is a really simple rule you can follow to make things better.

Limit Rake Task Bodies to a Single Method Call

An effective way to write rock solid rake tasks is to reduce the task to a single method call, and write tests for that method. There are a few sticking points with rake tasks that seem to make them harder to test than "normal" code. Just like our example above, rake tasks often:

  • Manipulate objects on the filesystem.

  • Write to stdout.

  • Take parameters as environment variables, or in the rake foo[bar,baz] form.

With that in mind, let's look at one simple approach you can use to test rake tasks like this. I'm going to use minitest, but the same ideas apply to RSpec or whatever other framework you prefer. This is by no means the One True Way to test rake tasks, but it is an approach that has worked well for me for several years.

The File System Is Your Friend

Developers sometimes get weird about reading from and writing to the filesystem in a test, as if this will somehow make your tests non-deterministic, or difficult to maintain. Hence hackarounds like fakefs.

IMHO, this fear is unwarranted. When you need to test code that does file IO, using the actual filesystem is far less bug-prone than relying on monkey patches to core file manipulation APIs.

In this example, our file setup/teardown logic is simple:

before do
  @test_dir = Pathname.new(
    FileUtils.mkdir_p("/tmp/bugroll-examples/test_pdfs").first)
  FileUtils.rm(Dir.glob("#{@test_dir}/*.pdf"))

  @files = []
  @files << create_file(@test_dir.join("pdf1.pdf"), mtime: 1.hour.ago)
  @files << create_file(@test_dir.join("pdf2.pdf"), mtime: 2.days.ago)
  @files << create_file(@test_dir.join("pdf3.pdf"), mtime: 10.hours.ago)
end

after do
  @files.each { |f| f.delete if f.exist? }
end
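If you would rather not manage a fixed path under /tmp yourself, one variation is to let Ruby's standard library create and remove a scratch directory for you. This is a sketch under that assumption, not the setup used in this article; create_aged_file is a hypothetical helper, and plain Time arithmetic stands in for the ActiveSupport durations.

```ruby
require 'tmpdir'
require 'fileutils'
require 'pathname'

# Hypothetical helper: create a file whose mtime is `age_in_seconds` in the past.
def create_aged_file(path, age_in_seconds)
  FileUtils.touch(path.to_s, mtime: Time.now - age_in_seconds)
  Pathname.new(path.to_s)
end

# Dir.mktmpdir yields a fresh directory and deletes it when the block exits.
stale_names = Dir.mktmpdir("test_pdfs") do |dir|
  test_dir = Pathname.new(dir)
  create_aged_file(test_dir.join("fresh.pdf"), 60 * 60)          # 1 hour old
  create_aged_file(test_dir.join("stale.pdf"), 2 * 24 * 60 * 60) # 2 days old

  one_day_ago = Time.now - 24 * 60 * 60
  test_dir.children
          .select { |file| file.mtime < one_day_ago }
          .map { |file| file.basename.to_s }
end

puts stale_names.inspect
```

The trade-off is that the directory name changes on every run, so it is slightly harder to poke around in after a failure.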

Use climate_control to Set Environment Variables

If you use environment variables to configure your rake tasks at runtime, I recommend the climate_control gem for manipulating the value of ENV in your tests. Example:

ClimateControl.modify(PDF_DIR: @test_dir.to_s) do
  StalePdfCleaner.clean!(out)
end
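To make it clearer what a block-scoped ENV override buys you, here is a hand-rolled sketch of the same idea using only core Ruby. The with_env helper is hypothetical and is not how climate_control is implemented; it only illustrates the set, yield, restore pattern.

```ruby
# Hypothetical helper: override ENV entries for the duration of a block,
# then put the previous values back, even if the block raises.
def with_env(overrides)
  overrides = overrides.transform_keys(&:to_s)
  saved = overrides.keys.to_h { |key| [key, ENV[key]] }
  overrides.each { |key, value| ENV[key] = value } # assigning nil deletes the key
  yield
ensure
  saved.each { |key, value| ENV[key] = value } if saved
end

original = ENV["PDF_DIR"]
inside = with_env(PDF_DIR: "/tmp/example") { ENV["PDF_DIR"] }
puts inside
```

The point is that the override never leaks into other tests, which is what makes environment-driven code safe to test.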

Take the Output Stream as an Argument

Lastly, if your rake task writes to stdout, you probably want to actually test the printed output, while also avoiding polluting your test runs with that output. To do that, take the output stream as an argument, and then call puts on that object. Example:

class StalePdfCleaner
  def self.clean!(out = $stdout)
    # ...do stuff...
    out.puts "Removed #{num_files_removed} stale pdf(s)"
  end
end

Then in your test:

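out = StringIO.new

StalePdfCleaner.clean!(out)

out.rewind
# Wrap the value in _() since calling must_equal directly on objects is deprecated in recent minitest.
_(out.read).must_equal "Removed 1 stale pdf(s)\n"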
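The injected object only needs to respond to puts, so the same method works unchanged with $stdout, a StringIO, or an open File. A small self-contained sketch, where report_removed is a hypothetical stand-in for the real cleanup method:

```ruby
require 'stringio'

# Hypothetical stand-in for the real method: it only reports a count.
def report_removed(count, out = $stdout)
  out.puts "Removed #{count} stale pdf(s)"
end

captured = StringIO.new
report_removed(1, captured)

# StringIO#string returns everything written so far, with no rewind needed.
message = captured.string
print message
```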

Putting It All Together

So bringing these ideas together, what do we end up with? First, we have a test where no test existed before:

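require 'minitest/autorun'
require 'active_support/all'
require 'climate_control'
require 'fileutils'
require 'pathname'
require 'stringio'
# Also load the file that defines StalePdfCleaner (the path depends on your project).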

describe StalePdfCleaner do
  describe '.clean!' do
    before do
      @test_dir = Pathname.new(
        FileUtils.mkdir_p("/tmp/bugroll-examples/test_pdfs").first)
      FileUtils.rm(Dir.glob("#{@test_dir}/*.pdf"))

      @files = []
      @files << create_file(@test_dir.join("pdf1.pdf"), mtime: 1.hour.ago)
      @files << create_file(@test_dir.join("pdf2.pdf"), mtime: 2.days.ago)
      @files << create_file(@test_dir.join("pdf3.pdf"), mtime: 10.hours.ago)
    end

    after do
      @files.each { |f| f.delete if f.exist? }
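    end

    it "must raise an ArgumentError if PDF_DIR is not specified" do
      # Explicitly unset PDF_DIR so the test does not depend on the caller's environment.
      ClimateControl.modify(PDF_DIR: nil) do
        _ { StalePdfCleaner.clean! }.must_raise(ArgumentError)
      end
    end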

    it "must remove pdfs that are more than one day old" do
      out = StringIO.new

      ClimateControl.modify(PDF_DIR: @test_dir.to_s) do
        StalePdfCleaner.clean!(out)
      end

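      out.rewind
      _(out.read).must_equal "Removed 1 stale pdf(s)\n"

      # Sort the listing: Dir.glob does not guarantee an order on older Rubies.
      remaining = Dir.glob("#{@test_dir}/*.pdf").
        map { |fn| Pathname.new(fn).basename.to_s }.
        sort
      _(remaining).must_equal %w(pdf1.pdf pdf3.pdf)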
    end
  end
  
  private

  def create_file(file_path, options = {})
    Pathname.new(FileUtils.touch(file_path, options).first)
  end
end

Next, we have a small, simple class to contain the logic for our rake task:

require 'active_support/all'

class StalePdfCleaner
  def self.clean!(out = $stdout)
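    num_files_removed = 0
    pdf_dir = ENV['PDF_DIR']
    raise ArgumentError, 'missing required argument PDF_DIR' unless pdf_dir.present?
    # Keep the directory check from the original task.
    raise ArgumentError, "directory #{pdf_dir} does not exist" unless Dir.exist?(pdf_dir)

    Dir[Pathname.new(pdf_dir).join("*.pdf")].each do |pdf_filename|
      if File.stat(pdf_filename).mtime < 1.day.ago
        File.unlink(pdf_filename)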
        num_files_removed += 1
      end
    end

    out.puts "Removed #{num_files_removed} stale pdf(s)"
  end
end

And finally, the original, "throwaway" style code we started with is condensed down to this:

desc "Clear out stale print job pdfs"
task :clear_stale_pdfs do
  StalePdfCleaner.clean!
end
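The same rule carries over to tasks that take parameters in the rake foo[bar,baz] form mentioned earlier: the task body stays a single call, and the arguments are simply handed to the method. A minimal sketch, in which the Greeter class and the greet task are hypothetical names made up for illustration:

```ruby
require 'rake'

# Hypothetical class holding the task's logic, so it can be tested directly.
class Greeter
  def self.greet!(name, out = $stdout)
    raise ArgumentError, "missing required argument name" if name.nil? || name.empty?
    out.puts "Hello, #{name}"
  end
end

# Invoked from the shell as: rake greet[World]
Rake::Task.define_task(:greet, [:name]) do |_task, args|
  Greeter.greet!(args[:name])
end
```

Because the argument arrives as an ordinary method parameter, the test for Greeter.greet! needs no Rake machinery at all.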

Even if your rake task doesn't read from environment variables, or write to stdout, or do file IO, you'll still get all the same benefits from this process of turning your tasks "inside out", reducing them to a single method call each, and writing a test against it. Tasks written this way are easier to read, and can be modified with confidence.
