Case Study:
Converting the CATME application
from MySQL to Postgres

Julie Baumler
@competentgirl
juli@baumler.com
Postgres Open 2013

Me

About Me

I tend to rotate between being a sys admin who codes a lot and and a coder who sys admins a lot.
Since 2011 primary programmer on the CATME Project

CATME Application

CATME Smarter Teamwork System

Mission Statement

CATME SMARTER Teamwork prepares students to function effectively in teams and supports faculty as they manage their students’ team experiences.

CATME Smarter Teamwork System

We make it easy for faculty to apply best practices in team building and find teams exhibiting problem behavior.

Feedback for both students and faculty.

In the process of adding more tools to help students learn and practice good teamwork behaviors.

CATME Smarter Teamwork System

Free to use

Web based

CATME Smarter Teamwork System

CATME App History

Data and Scheme Convertion

Data and Scheme Convertion

I talked in depth about the scheme and data conversion at OpenSource Bridge in June, my slides and tools are available at: http://www.baumler.com/postgres

Data and Scheme Conversion TL;DR;

Application Code Changes

Application Code Changes

We didn't really do a lot of this because most of the data transformation in our app is done via SQL.

Application Code Changes

MySQL has a nifty row definition attribute for keeping track of change time:

    ON UPDATE CURRENT_TIMESTAMP
    
Postgres doesn't support this sadly.

Application Code Changes

Options:

We choose to handle in the application code since we didn't actually need to update it every time the row changed.

Automated Changes

Automated Changes

sed & AWK

Date Display

date_format(PARAM, '%d-%b-%Y')
becomes
to_char(PARAM, 'DD-Mon-IYYY')
In hindsight, we wish we'd switched to ISOdates while we were changing everything as this format doesn't work with automatic translation tools.

Null handling

 ifnull(PARAM1, PARAM2) 
becomes
 coalesce(PARAM1, PARAM2) 
There were some other changes we had to make with regard to NULL handling differences, these are covered later.

Time Zones

convert_tz( PARAM1 , @@global.time_zone, PARAM2)
becomes
PARAM1 at time zone PARAM2

Date Math

datediff( DATE1, DATE2)
becomes simply
(DATE1 - DATE2)
likewise
date_add( DATE1, DATE2)
becomes
(DATE1 + DATE2)

Random

rand(SEED)
becomes
random()
Postgres's random function is self-seeding.

Array to string conversion

group_concat( EXPRESSION  separator STRING)
becomes
array_to_string(array_agg(EXPRESSION), STRING)
MySQL defaults to a comma if no string is given, so you also need to handle that case
group_concat( EXPRESSION )
becomes
array_to_string(array_agg(EXPRESSION),',')
This is more powerful, but I also found it more confusing.

sed script for changes


s/date_format(\(.*\), '%d-%b-%Y')/to_char(\1, 'DD-Mon-IYYY')/ig
s/ifnull(\(.*\), \(.*\))/coalesce(\1, \2)/ig
s/convert_tz(\(.*\), \@\@global.time_zone, (.*\))/
    \1 at time zone \2/ig
s/datediff(\(.*\), \(.*\))/(\1 - \2)/ig
s/date_add(\(.*\), \(.*\))/(\1 + \2)/ig
s/rand(.*)/random()/ig
s/group_concat(\(.*\) separator \(.*\)/
    array_to_string(array_agg(\1), \2)/ig
s/group_concat(\(.*\))/array_to_string(array_agg(\1),',')/ig

    

Manual Changes

Manual Changes

Remaining changes in our code required some programmer input or were ripe for refactoring and simplification.

Dates with comparison operations

datediff had been used with comparisons in many cases and could be simplified to things like:

 select created from table_with_dates
   where date_part( 'years', age(created)) >= 1;
   
OR
    select created from table_with_dates
    where created >= (CURRENT_TIMESTAMP - interval '1 year');
    

Conditional Expressions

 if (a = b, c, d) 
becomes
    case a when b then c
	else d
    end
    

Conditional Expressions

BUT Conditional expression in our code were usually replacable with

greatest()
least() 
or even
coalesce()

between

Both MySQL and Postgres support the between keyword

x between y and z
but Postgres supports the overlaps keyword for dates
(x, interval '0 day') overlaps y, z 
often better represented our desired behavior

Subqueries

Postgres needs subqueries to be named with "AS" in situations where MySQL doesn't care. You could write a regular expression to fix this.

Production Conversion

Christmas Morning 2012

User Experience

We made no interface changes in this release and users surveyed with did not notice a difference.

Development and Research Team Results

Positive Results

Development and Research Team Results

Negative Results

BUGS!

Development and Research Team Results

Negative Results

Stricter Typing

Postgres is stricter about column types than MySQL

If your database is set to use UTF-8, all string and text must consist of valid utf-8 characters.

Result: bugs when people enter non-UTF-8 chars

Stricter Typing

Boolean values were a particular problem in our code, we had a lot of places where we'd depended on MySQL's shortcut evaluation and automatic casting. ie

select something from table 
	where (PARAM1 or COLNAME=PARAM1);
    
in places where PARAM1 was either a number or string or NULL

Postgres requires casting to boolean with ::boolean

In most cases this was overly complex code optimized for use with prepare statements and I just simplified it.

Success Factors

Success Factors - 1

good test coverage

Success Factors - 2

solid test data set

7 years of extreme and previously problematic data

Success Factors - 3

lots of data conversion testing

including regularly updated copies of the production database

Success Factors - 4

our application was written using mostly standard SQL

project programmers are all SQL version polyglots, we had only used MySQLism where necessary.

Success Factors - 5

standard coding style for SQL

allowed for easier use of automated tools

Questions

Julie Baumler
@competentgirl
juli@baumler.com
http://www.catme.org

An Aside on Testing

Resources