In this episode, we're going to be looking at how we can use the citext module when working with case insensitive data.
Postgres Version 9.5
In previous episodes, we explored the issues presented by case-insensitive data:
We had to jump through a number of hoops to make sure that we can efficiently query our tables as well as prevent duplicate records.
We learned how to efficiently query our tables and prevent duplicate
records, but there is an easier way – the
Let’s look at how to use it.
First, here is the table that we left off with:
In this table we used the
varchar data type for our email column.
In our new table we will be replacing it with the
citext data type
provided by the
citext module. This data type works just like the
data type, but during comparisons it internally calls the
Like I said,
citext is going to make things easier.
To get started,
create extension citext;
Then we create our new
create table users ( id serial primary key, email citext not null unique );
and we can take a quick look at the description:
Let’s insert an initial record so that we have something to work with, pay attention to the casing here:
insert into users (email) values ('PERSON1@example.com');
Now we can see the case-insensitive comparison in action by querying for the record with different casing:
select * from users where email = 'firstname.lastname@example.org';
We find the record and the casing from the original insertion is preserved.
And what if we try to insert a duplicate record with different casing:
insert into users (email) values ('Person1@example.com');
As expected, we get an error, the insertion was prevented by our uniqueness constraint.
Lastly, let’s see how the performance compares.
We can insert some additional records to our new
insert into users (email) select 'person' || num || '@example.com' from generate_series(2,10000) as num;
Now we can look at the
explain analyze output for queries on each of the
explain analyze select * from old.users where lower(email) = lower('email@example.com');
explain analyze select * from users where email = 'firstname.lastname@example.org';
We can see that we get comparable performance when using the citext field with a unique index.
You may be wondering where the
ilike operator fits into all of this. We
will look at that in the following episode.
Until then, may your data be consistent and your queries performant.
create schema old; create table old.users ( id serial primary key, email varchar not null unique ); create unique index users_unique_lower_email_idx on old.users (lower(email)); insert into old.users (email) select 'person' || num || '@example.com' from generate_series(1,10000) as num; create schema old; create table old.users ( id serial primary key, email varchar not null unique ); create unique index users_unique_lower_email_idx on old.users (lower(email)); insert into old.users (email) select 'person' || num || '@example.com' from generate_series(1,10000) as num;