Case Study:
Converting the CATME application
from MySQL to Postgres
Julie Baumler
@competentgirl
juli@baumler.com
Postgres Open 2013
Me
About Me
I tend to rotate between being a sys admin who codes a lot
and a coder who sys admins a lot.
Since 2011 I have been the primary programmer on the CATME Project.
CATME Application
CATME Smarter Teamwork System
Mission Statement
CATME SMARTER Teamwork prepares students to function
effectively in teams and supports faculty as they manage their
students’ team experiences.
CATME Smarter Teamwork System
We make it easy for faculty to apply best practices in team building
and to spot teams exhibiting problem behavior.
Feedback for both students and faculty.
We are in the process of adding more tools to help students learn and
practice good teamwork behaviors.
CATME Smarter Teamwork System
Free to use
Web based
CATME Smarter Teamwork System
CATME App History
- 2005 - built on mod_perl and MySQL
- 2008 - Sun buys MySQL AB
- 2010 - Oracle buys Sun
- We start to worry about the future of MySQL
- 2012 - conversion of the CATME backend to Postgres
Data and Schema Conversion
Data and Schema Conversion
I talked in depth about the schema and data conversion
at Open Source Bridge in June; my slides and tools are available at:
http://www.baumler.com/postgres
Data and Schema Conversion TL;DR:
- mysqldump --postgres is useless
- mysqldump --ansi is a good start
- Regexes are your friends
- Know your data!
Application Code Changes
Application Code Changes
We didn't really do a lot of this
because most of the data transformation in our app
is done via SQL.
Application Code Changes
MySQL has a nifty column attribute for
keeping track of change time:
ON UPDATE CURRENT_TIMESTAMP
Sadly, Postgres doesn't support this.
Application Code Changes
Options:
- a Postgres trigger function
- handle it in application code
We chose to handle it in the application code, since we didn't actually
need to update the timestamp every time the row changed.
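For anyone who does want the trigger route, it can be sketched roughly like this (table name "some_table" and column name "updated_at" are hypothetical, not from our schema):

```sql
-- A PL/pgSQL trigger function that stamps the row on every UPDATE,
-- emulating MySQL's ON UPDATE CURRENT_TIMESTAMP.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_updated_at
    BEFORE UPDATE ON some_table
    FOR EACH ROW
    EXECUTE PROCEDURE set_updated_at();
```

One function can be shared by every table that has an updated_at column; only the CREATE TRIGGER is per-table.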
Automated Changes
Automated Changes
- AWK for schema and most data changes
- sed for simple code changes
Date Display
date_format(PARAM, '%d-%b-%Y')
becomes
to_char(PARAM, 'DD-Mon-YYYY')
In hindsight, we wish we'd switched to ISO dates while
we were changing everything, as this format doesn't work with
automatic translation tools.
Null handling
ifnull(PARAM1, PARAM2)
becomes
coalesce(PARAM1, PARAM2)
There were some other changes we had to make around NULL handling
differences; these are covered later.
Time Zones
convert_tz( PARAM1 , @@global.time_zone, PARAM2)
becomes
PARAM1 at time zone PARAM2
Date Math
datediff( DATE1, DATE2)
becomes simply
(DATE1 - DATE2)
likewise
date_add( DATE1, INTERVAL1 )
becomes
(DATE1 + INTERVAL1)
Random
rand(SEED)
becomes
random()
Postgres's random function is self-seeding.
Array to string conversion
group_concat( EXPRESSION separator STRING)
becomes
array_to_string(array_agg(EXPRESSION), STRING)
MySQL defaults to a comma if no separator is given, so
you also need to handle that case:
group_concat( EXPRESSION )
becomes
array_to_string(array_agg(EXPRESSION),',')
This is more powerful, but I also found it more confusing.
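Worth noting: Postgres 9.0 and later also provide string_agg(), which is a closer one-to-one replacement when you don't need the intermediate array (the "students"/"name" names below are made up for illustration):

```sql
-- MySQL: select group_concat(name separator '; ') from students;
-- Postgres 9.0+ equivalent:
select string_agg(name, '; ') from students;
```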
sed script for changes
s/date_format(\(.*\), '%d-%b-%Y')/to_char(\1, 'DD-Mon-YYYY')/ig
s/ifnull(\(.*\), \(.*\))/coalesce(\1, \2)/ig
s/convert_tz(\(.*\), @@global.time_zone, \(.*\))/\1 at time zone \2/ig
s/datediff(\(.*\), \(.*\))/(\1 - \2)/ig
s/date_add(\(.*\), \(.*\))/(\1 + \2)/ig
s/rand(.*)/random()/ig
s/group_concat(\(.*\) separator \(.*\))/array_to_string(array_agg(\1), \2)/ig
s/group_concat(\(.*\))/array_to_string(array_agg(\1),',')/ig
Manual Changes
Manual Changes
Remaining changes in our code required some programmer input
or were ripe for refactoring and simplification.
Dates with comparison operations
datediff had been used with comparisons in many cases
and could be simplified to things like:
select created from table_with_dates
where date_part( 'years', age(created)) >= 1;
OR
select created from table_with_dates
where created >= (CURRENT_TIMESTAMP - interval '1 year');
Conditional Expressions
if (a = b, c, d)
becomes
case a when b then c
else d
end
Conditional Expressions
BUT
Conditional expressions in our code were usually replaceable with
greatest()
least()
or even
coalesce()
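A couple of typical patterns and their simpler, portable replacements (column and table names here are hypothetical):

```sql
-- MySQL: if(score > 100, 100, score)
select least(score, 100) as capped_score from students;

-- MySQL: if(nickname is null, name, nickname)
select coalesce(nickname, name) as display_name from students;
```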
between
Both MySQL and Postgres support the between keyword:
x between y and z
but Postgres also supports the overlaps operator for dates and times:
(x, interval '0 day') overlaps (y, z)
which often better represented our desired behavior.
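For instance, to ask whether an instant falls inside a window (the timestamps below are made up; a zero-length period represents the single instant x):

```sql
select (timestamp '2013-09-16 12:00', interval '0 day')
       overlaps
       (timestamp '2013-09-16 00:00', timestamp '2013-09-17 00:00');
```

Each period is treated as half-open (start <= time < end), which is usually what "between these two dates" actually means.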
Subqueries
Postgres requires subqueries in FROM to be given an alias ("AS")
in situations where MySQL doesn't care.
You could write a regular expression to fix this.
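A minimal illustration (the table names are made up):

```sql
-- MySQL accepts this; Postgres rejects it with
-- ERROR:  subquery in FROM must have an alias
select count(*)
from (select student_id from responses group by student_id);

-- Adding the alias satisfies Postgres:
select count(*)
from (select student_id from responses group by student_id) as s;
```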
Production Conversion
Christmas Morning 2012
User Experience
We made no interface changes in this release, and
users surveyed did not notice a difference.
Development and Research Team Results
Positive Results
- tests immediately started running faster, even as we
  were working on the conversion
- queries that had previously timed out regularly now complete
  - they could still use optimization
- we were ready when asked to support a 10,000-plus-student MOOC
- other requested improvements were easier than they would have been with MySQL
- better UTF-8 support
Development and Research Team Results
Negative Results
BUGS!
Development and Research Team Results
Negative Results
- stricter typing meant a few existing bugs became user-visible
- we found areas with no test coverage
- differences in how undef is handled turned poorly written code
  into user-visible bugs
- better UTF-8 support
Stricter Typing
Postgres is stricter about column types than
MySQL.
If your database is set to use UTF-8, all string and text values must
consist of valid UTF-8 characters.
Result: bugs when people enter non-UTF-8 characters from
- Microsoft code pages
- some translation services
Stricter Typing
Boolean values were a particular problem in our code:
we had a lot of places where we'd depended on
MySQL's shortcut evaluation and automatic casting, e.g.
select something from table
where (PARAM1 or COLNAME=PARAM1);
in places where PARAM1 was either a number, a string, or NULL.
Postgres requires casting to boolean with ::boolean.
In most cases this was overly complex code
optimized for use with prepared statements, and I just
simplified it.
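A sketch of the two ways this can go (the names "some_table", "colname", and "something" are placeholders, not from our code):

```sql
-- Option 1: keep the structure and cast the parameter explicitly.
-- Only works when PARAM1 holds a value Postgres can cast to boolean.
select something from some_table
where (PARAM1::boolean or colname = PARAM1);

-- Option 2 (what we usually did): drop the redundant branch entirely.
select something from some_table
where colname = PARAM1;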
Success Factors
Success Factors - 1
good test coverage
Success Factors - 2
solid test data set
7 years of extreme and previously problematic data
Success Factors - 3
lots of data conversion testing
including regularly updated copies of the production database
Success Factors - 4
our application was written using mostly standard SQL;
project programmers are all SQL-dialect polyglots, and
we had only used MySQLisms where necessary.
Success Factors - 5
standard coding style for SQL
allowed for easier use of automated tools
Questions
Julie Baumler
@competentgirl
juli@baumler.com
http://www.catme.org
An Aside on Testing
Resources