Case Study:
Converting the CATME application
from MySQL to Postgres
Julie Baumler
@competentgirl
juli@baumler.com
Postgres Open 2013
Me
About Me
I tend to rotate between being a sys admin who codes a lot
and a coder who sys admins a lot.
Since 2011 I have been the primary programmer on the CATME Project.
CATME Application
CATME Smarter Teamwork System
Mission Statement
CATME SMARTER Teamwork prepares students to function
effectively in teams and supports faculty as they manage their
students’ team experiences.
CATME Smarter Teamwork System
We make it easy for faculty to apply best practices in team building
and to spot teams exhibiting problem behavior.
Feedback for both students and faculty.
We are in the process of adding more tools to help students learn and
practice good teamwork behaviors.
CATME Smarter Teamwork System
Free to use
Web based
CATME Smarter Teamwork System
CATME App History
- 2005 - built on mod_perl and MySQL
- 2008 - Sun buys MySQL AB
- 2010 - Oracle buys Sun
- We start to worry about the future of MySQL
- 2012 - conversion of the CATME backend to Postgres
Data and Schema Conversion
Data and Schema Conversion
I talked in depth about the schema and data conversion
at Open Source Bridge in June; my slides and tools are available at:
http://www.baumler.com/postgres
Data and Schema Conversion TL;DR:
- mysqldump --postgres is useless
- mysqldump --ansi is a good start
- Regexes are your friends
- Know your data!
Application Code Changes
Application Code Changes
We didn't really do a lot of this
because most of the data transformation in our app
is done via SQL.
Application Code Changes
MySQL has a nifty column attribute for
keeping track of change time:
ON UPDATE CURRENT_TIMESTAMP
Sadly, Postgres doesn't support this.
Application Code Changes
Options:
- a Postgres trigger function
- handle it in application code
We chose to handle it in the application code, since we didn't actually
need to update the timestamp every time the row changed.
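For anyone who does want the trigger route, it can be sketched roughly like this (table name "some_table" and column name "updated_at" are hypothetical, not from our schema):

```sql
-- A PL/pgSQL trigger function that stamps the row on every UPDATE,
-- emulating MySQL's ON UPDATE CURRENT_TIMESTAMP.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_updated_at
    BEFORE UPDATE ON some_table
    FOR EACH ROW
    EXECUTE PROCEDURE set_updated_at();
```

One function can be shared by every table that has an updated_at column; only the CREATE TRIGGER is per-table.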
Automated Changes
Automated Changes
- AWK for schema and most data changes
- sed for simple code changes
Date Display
date_format(PARAM, '%d-%b-%Y')
becomes
to_char(PARAM, 'DD-Mon-YYYY')
In hindsight, we wish we'd switched to ISO dates while
we were changing everything, as this format doesn't work with
automatic translation tools.
Null handling
ifnull(PARAM1, PARAM2)
becomes
coalesce(PARAM1, PARAM2)
There were some other changes we had to make around NULL handling
differences; these are covered later.
Time Zones
convert_tz( PARAM1 , @@global.time_zone, PARAM2)
becomes
PARAM1 at time zone PARAM2
Date Math
datediff( DATE1, DATE2)
becomes simply
(DATE1 - DATE2)
likewise
date_add( DATE1, INTERVAL1 )
becomes
(DATE1 + INTERVAL1)
Random
rand(SEED)
becomes
random()
Postgres's random function is self-seeding.
Array to string conversion
group_concat( EXPRESSION separator STRING)
becomes
array_to_string(array_agg(EXPRESSION), STRING)
MySQL defaults to a comma if no separator is given, so
you also need to handle that case:
group_concat( EXPRESSION )
becomes
array_to_string(array_agg(EXPRESSION),',')
This is more powerful, but I also found it more confusing.
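Worth noting: Postgres 9.0 and later also provide string_agg(), which is a closer one-to-one replacement when you don't need the intermediate array (the "students"/"name" names below are made up for illustration):

```sql
-- MySQL: select group_concat(name separator '; ') from students;
-- Postgres 9.0+ equivalent:
select string_agg(name, '; ') from students;
```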
sed script for changes
s/date_format(\(.*\), '%d-%b-%Y')/to_char(\1, 'DD-Mon-YYYY')/ig
s/ifnull(\(.*\), \(.*\))/coalesce(\1, \2)/ig
s/convert_tz(\(.*\), @@global.time_zone, \(.*\))/\1 at time zone \2/ig
s/datediff(\(.*\), \(.*\))/(\1 - \2)/ig
s/date_add(\(.*\), \(.*\))/(\1 + \2)/ig
s/rand(.*)/random()/ig
s/group_concat(\(.*\) separator \(.*\))/array_to_string(array_agg(\1), \2)/ig
s/group_concat(\(.*\))/array_to_string(array_agg(\1),',')/ig
Manual Changes
Manual Changes
Remaining changes in our code required some programmer input
or were ripe for refactoring and simplification.
Dates with comparison operations
datediff had been used with comparisons in many cases
and could be simplified to things like:
select created from table_with_dates
where date_part( 'years', age(created)) >= 1;
OR
select created from table_with_dates
where created >= (CURRENT_TIMESTAMP - interval '1 year');
Conditional Expressions
if (a = b, c, d)
becomes
case a when b then c
else d
end
Conditional Expressions
BUT
Conditional expressions in our code were usually replaceable with
greatest()
least()
or even
coalesce()
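A couple of typical patterns and their simpler, portable replacements (column and table names here are hypothetical):

```sql
-- MySQL: if(score > 100, 100, score)
select least(score, 100) as capped_score from students;

-- MySQL: if(nickname is null, name, nickname)
select coalesce(nickname, name) as display_name from students;
```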
between
Both MySQL and Postgres support the between keyword:
x between y and z
but Postgres also supports the overlaps operator for dates and times:
(x, interval '0 day') overlaps (y, z)
which often better represented our desired behavior.
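For instance, to ask whether an instant falls inside a window (the timestamps below are made up; a zero-length period represents the single instant x):

```sql
select (timestamp '2013-09-16 12:00', interval '0 day')
       overlaps
       (timestamp '2013-09-16 00:00', timestamp '2013-09-17 00:00');
```

Each period is treated as half-open (start <= time < end), which is usually what "between these two dates" actually means.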
Subqueries
Postgres requires subqueries in FROM to be given an alias ("AS")
in situations where MySQL doesn't care.
You could write a regular expression to fix this.
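A minimal illustration (the table names are made up):

```sql
-- MySQL accepts this; Postgres rejects it with
-- ERROR:  subquery in FROM must have an alias
select count(*)
from (select student_id from responses group by student_id);

-- Adding the alias satisfies Postgres:
select count(*)
from (select student_id from responses group by student_id) as s;
```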
Production Conversion
Christmas Morning 2012
User Experience
We made no interface changes in this release, and
users surveyed did not notice a difference.
Development and Research Team Results
Positive Results
- tests immediately started running faster, even as we
  were working on the conversion
- queries that had previously timed out regularly now complete
  - they could still use optimization
- we were ready when asked to support a 10,000-plus-student MOOC
- other requested improvements were easier than they would have been with MySQL
- better UTF-8 support
Development and Research Team Results
Negative Results
BUGS!
Development and Research Team Results
Negative Results
- stricter typing meant a few existing bugs became user-visible
- we found areas with no test coverage
- differences in how undef is handled turned poorly written code
  into user-visible bugs
- better UTF-8 support
Stricter Typing
Postgres is stricter about column types than
MySQL.
If your database is set to use UTF-8, all string and text values must
consist of valid UTF-8 characters.
Result: bugs when people enter non-UTF-8 characters from
- Microsoft code pages
- some translation services
Stricter Typing
Boolean values were a particular problem in our code:
we had a lot of places where we'd depended on
MySQL's shortcut evaluation and automatic casting, e.g.
select something from table
where (PARAM1 or COLNAME=PARAM1);
in places where PARAM1 was either a number, a string, or NULL.
Postgres requires casting to boolean with ::boolean.
In most cases this was overly complex code
optimized for use with prepared statements, and I just
simplified it.
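A sketch of the two ways this can go (the names "some_table", "colname", and "something" are placeholders, not from our code):

```sql
-- Option 1: keep the structure and cast the parameter explicitly.
-- Only works when PARAM1 holds a value Postgres can cast to boolean.
select something from some_table
where (PARAM1::boolean or colname = PARAM1);

-- Option 2 (what we usually did): drop the redundant branch entirely.
select something from some_table
where colname = PARAM1;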
Success Factors
Success Factors - 1
good test coverage
Success Factors - 2
solid test data set
7 years of extreme and previously problematic data
Success Factors - 3
lots of data conversion testing
including regularly updated copies of the production database
Success Factors - 4
our application was written using mostly standard SQL;
project programmers are all SQL-dialect polyglots, and
we had only used MySQLisms where necessary.
Success Factors - 5
standard coding style for SQL
allowed for easier use of automated tools
Questions
Julie Baumler
@competentgirl
juli@baumler.com
http://www.catme.org
An Aside on Testing
Resources