Built-In Database Upgrades

Often times an application (web application or otherwise) involves a database for storing it’s persistent data. As the application evolves over time, the database needs to change, and sometimes existing data needs to be updated to reflect those changes.

How do we keep the app in sync with the schema changes of the corresponding database, in all the environments we might deploy into – such as local, testing, perhaps staging, and production? In some situations we want to re-create the database from scratch (such as a testing environment, for instance). In others (such as a production environment), we definitely don’t want to re-create the database, but just make carefully scripted modifications. In all cases, we want to make only the necessary modifications, based on the current state of the database.

A tool that a colleague recently recommended to me for this job is dbdeploy, a utility for maintaining synchronization between our application and our database. Dbdeploy is less complex than other solutions, such as Database Migrations in Rails, arguably easier to implement and more portable to non-Ruby/Rails environments, and uses just SQL as the language to express the database changes.

Like Migrations, dbdeploy allows us to quickly and easily script our database changes, then apply those changes as required to a specific database instance in each of our deployment environments.

We can create a series of simple SQL files, like

  • 001_create_base.sql
  • 002_expand_customer_name.sql
  • 003_add_order_status.sql
  • 004_update_customers.sql

…and so forth. Each file represents a single simple change to our database schema, and is numbered in the order in which they occur. We don’t edit them, but add new ones as further changes are made, to ensure backwards-compatibility with all versions of our database.

Dbdeploy uses a change-log table, added to each of our target databases, to keep track of which scripts have already been run in that environment, automatically producing a script containing the set of changes that bring the current database up to the latest specification, including only the necessary changes and updates.

Dbdeploy comes with a command-line interface and an Ant task implementation, each of which is very straightforward to use. DbDeploy by default does not apply the changes, however – it only produces an SQL script that we can then review and either apply manually or via a second Ant task (or other means). Both dbdeploy and this second task need the connection information for the database supplied, however – the driver, JDBC URL, login and password. This is where I started to think “this can be simpler”…

If we’re deploying an application whose configuration already includes the necessary data to connect to the database, we could employ dbdeploy in a different way that overcomes this issue, as well as adding a few other advantages.

Let’s say in our development environment we keep our dbdeploy delta scripts in a directory that gets bundled into our finished application as resources. Maven can easily do this if you just put the files in the src/main/resources directory (by default), for instance. Now our database scripts can be bundled with our application automatically every time it’s built, without having any extra files to worry about.

We can also include dbdeploy in our classpath, instead of using it as a separate utility, and wrap up our application as a single jar file that includes our database definition scripts. We can extract these files from our application jar on demand with code like this (we don’t show the getResources method for clarity, but it gets a list of resource paths matching a specified pattern from our current classpath):

<code>
  public static String extractScripts() throws Exception {
        File scriptsDir = new File(SCRIPT_DIR_NAME);
        if (!scriptsDir.exists())
            if (!new File(SCRIPT_DIR_NAME).mkdirs())
                throw new RuntimeException("Could not create dir 'scripts'");
        Collection<String> scripts = getResources(".*.sql");
        for (String script : scripts)
            extractToDir(script);
        return SCRIPT_DIR_NAME;
    }

    private static void extractToDir(String script) throws Exception {
        String endName = new File(script).getName();
        InputStream in = script.getClass().getResourceAsStream("/" + endName);
        if (in == null)
            throw new RuntimeException("Can't find resource " + endName);
        FileOutputStream fos = new FileOutputStream(new File(SCRIPT_DIR_NAME + "/" + endName));
        byte[] buf = new byte[1024];
        int i = 0;
        while ((i = in.read(buf)) != -1)
            fos.write(buf, 0, i);
        fos.close();
    }
</code>

Let’s say our main application is invoked with “java -jar someApplication-1.0.jar”. This implies we’ve set up a default main class and method in our jar meta-data – it’s the same as saying “java -cp someApplication-1.0.jar com.point2.main.MyMainClass”, just easier to type.

Nothing stops us having another main class, however, that we can invoke on demand, e.g.:
java -cp someApplication-1.0.jar com.point2.main.DbDeploy

Code like this allows us to run dbdeploy on our extracted scripts, while passing it the database connection info we already know:

<code>
 public static void prepScripts(String dbType, String driverName, String dbUrl, String userName, String password) throws Exception {
        com.dbdeploy.DbDeploy runner = new com.dbdeploy.DbDeploy();
        runner.setDbms(dbType);
        runner.setDriver(driverName);
        runner.setUrl(dbUrl);
        runner.setOutputfile(new File("scripts/output.sql"));
        runner.setUserid(userName);
        runner.setPassword(password);
        runner.setScriptdirectory(new File(SCRIPT_DIR_NAME));
        runner.go();
    }
</code>

This code produces our “output.sql” script, which contains exactly the required changes to our specified database to bring it up to date:

Then we can actually execute the resulting script (if we want to) via a method like so:

<code>
 public static void applyScript(String script, String driverName, String dbUrl, String userName, String password) throws Exception {
        StringBuilder sb = new StringBuilder();
        FileInputStream fstream = new FileInputStream(SCRIPT_DIR_NAME + "/" + script);
        
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        while ((strLine = br.readLine()) != null) {
            sb.append(strLine);
            sb.append("n");
        }
        in.close();
        executeSql(sb.toString(), driverName, dbUrl, userName, password);
    }

    private static void executeSql(String sql, String driverName, String dbUrl, String userName, String password) throws Exception {
        Driver driver = (Driver) Class.forName(driverName).newInstance();
        Connection connection = driver.connect(dbUrl, null);
        Statement statement = connection.createStatement();
        statement.executeUpdate(sql);
        connection.close();
    }
</code>

Doing this starts a sequence that

  1. Extracts the database change scripts from the jar file
  2. Uses the application’s own configuration to get the JDBC Driver, URL, login and password information
  3. Invokes dbdeploy (internally) to create the database update script appropriate to whatever database we’re connected to
  4. Optionally, if we include the “execute” option on our command-line, run the resulting script – if necessary, creating the “change log” table used by dbdeploy in the process

As a finishing touch, we can examine the changelog table and determine exactly what incremental version our target database has been brought up to, and compare this to a known required version number in our application. In this way, our application can automatically verify that the dbdeploy process has been done on the database it’s being run against – and if it isn’t, we can output the proper command-line to perform the upgrade as a suggestion to the user (or even do it automatically, if permissable in our environment).

Now, instead of having a number of extra pieces to deploy to our target environment, we’re down to one nice tidy self-contained jar file, and we can choose to either simply update the database immediately or dump the script so we can examine what it’s going to do first, and either then decide to go ahead and apply or apply the changes manually ourselves.

Compared to dbdeploy in either Ant or standalone mode we’ve got the following advantages:

  1. We keep the simplicity of our single-jar deployment
  2. We can upgrade our database in any environment our application can run in – no need for extra tools
  3. We don’t need to repeat our database configuration information in two places – so they can’t get out of sync and end up running against the wrong database
  4. We ensure that the scripts required for the current version of the code are always with the code
  5. We guarantee that our application always runs against the database versioned to the expected state

I definitely recommend dbdeploy for this type of scenario.

Principles and Practices

Tired of the Software Development Grind? Know it can be done better? Check out my book: Principles and Practices of Software Craftsmanship or sign up for my Craftsmanship Dispatches newsletter.