February 04, 2008

PostgreSQL version 8.3 Released

I just got word that version 8.3 of PostgreSQL has been released.  Along with the usual amount of improvements there are some new features in 8.3 that should be of interest to PostgreSQL admins and developers such as:

  • Integrated TSearch
  • ENUM and UUID data types
  • Faster sorting technique used for LIMIT operations
  • Faster LIKE and ILIKE operations
  • Lazy XID assignment which will make many read only operations much faster

Check out the full list of features at the PostgreSQL site or download it from the download section of their site.

January 24, 2008

EveryBlock.com is now launched

My friend and former co-worker Adrian Holovaty and his team just launched their new project EveryBlock.com. EveryBlock takes the term hyperlocal to a whole new level.  They aggregate tons of public data sources by geo location so you can for example find all of the recent crime around a particular address, neighborhood, zip code, etc.  Or maybe you might be interested in the building code violations of where you live or work?

Right now they have San Francisco, Chicago, and New York up and running, but will be adding more cities as time goes on.  Adrian asked me to help performance tune their PostgreSQL database a couple of months ago and so far things seem to be humming along nicely.

Here are some links to other blogs talking about EveryBlock.com:

Congrats and the best of luck to EveryBlock! I'm sure we'll see even more new and interesting things from this team in the future!

October 05, 2007

Log Buffer #65: a Carnival of the Vanities for DBAs

Welcome to the 65th edition of Log Buffer, the weekly survey of database related blogs.

First let's start with some miscellaneous entries that could be of interest to any DBA.  Crazy DBA has an interesting post about how attending conferences helped to grow his professional network, which in turn has made him a better DBA.  And Thomas Kyte has a great post about why it's the data, not the application itself, that matters.  Brian Aker gives us a great link to a Werner Vogels' entry on Dynamo, one of the key technologies used behind the scenes at Amazon.

Oracle users will certainly find these two links of interest.  First off, Frederik Visser shows you how to play with Oracle 11g RAC in VMWare. And Alex Gorbachev has a nice write up about Miracle Open World.

SQL Server DBAs might enjoy the following posts.  If you're thinking about using or upgrading to Idera SQLsafe v4.5, you'll want to check out Sean McCown's post about some of that product's issues. Steve Jones has some thoughts on monitoring and alerting with your SQL Server, but are valid for any database. Need to know when your SQL Server instance was started? Check out Joe Webb's tip on how to find out.  And Mladen Prajdić has some advice on how to notify a client in a long running process with SQL Server.

MySQL users will find this post on accurately measuring how far behind your slave is lagging. Over at the MySQL Performance Blog there is an opportunity to ask questions of Heikki Tuuri, the creator of InnoDB, and Peter has some thoughts on a few serious bugs in the MySQL 5.0 release.  Kaj Arnö has an interesting post on how MySQL GmbH and MySQL AB help birds of a feather to flock together, quite literally and about how they have opened up the call for papers for the 2008 MySQL Users Conference.

Kevin Burton talks about how to avoid swapping insanity with InnoDB. Want a free MySQL Magazine? Lewis Cunningham has found one for us all. Jan Kneschke introduces us to the Wormhole storage engine for MySQL.  Not really sure how useful it is, but it is definitely interesting.

Hubert Lubaczewski has written a great tool to help you determine the optimal layout of tables, indexes, etc. on your various tablespaces for PostgreSQL.  Robert Treat follows up with some additional thoughts to consider

Joshua Drake has announced the speakers and topics for the PostgreSQL Conference Fall 2007, which is October 20th 2007 at Portland State University. Greg Sabino Mullane has a nice explanation of why you can't used prepared queries when using DBD::Pg and pg_bouncer. And to finish our this week's  links, Francisco Figueiredo Jr mentions that PostgreSQL will have a UUID data type in version 8.3.

Enjoy!

July 30, 2007

Which PostgreSQL backend am I using?

Someone asked me how to determine which PostgreSQL backend a particular client was connected to.  Everyone's first thought is to do a ps aux | grep postgres which will show you the IP and user, but if you have different processes connecting from the same IP with the same usernames, how do you know which is which?

One way to tell would be to see which queries are being executed by which backend and match that up to your client side.  But you can quickly get confused, especially if the various connections are all executing the same SQL statements, a web application for example.

The simplest way was suggested by Jacob Kaplan-Moss, which is to use the pg_backend_pid() function like:

SELECT pg_backend_pid();

I love it when the solution is something really simple!

July 09, 2007

Real PostgreSQL Benchmark

Josh Berkus and Sun have put together the first real PostgreSQL performance benchmark.  I run into people often who are still working under dubious performance comparisons done years ago against against competition like MySQL and Oracle. Hopefully this in depth comparison will put some of these arguments to rest.

If you're just interested in the conclusions, PostgreSQL is as fast or faster than MySQL and nearly as fast as Oracle.  I know the performance improvements over the last few years have been nothing short of phenomenal, glad to see there is now a report to back up my gut feelings.

May 23, 2007

PostgreSQL error messages confusing to new users

In the spirit of my blog post last week, I've created a new page that shows a couple of the more common error messages that confuse newer PostgreSQL users. It is my intention to expand this over time as I see people having trouble.

If you have any error messages you feel should be included or you find any technical inaccuracies please post a comment and I'll include it on the page.   

May 16, 2007

Common PostgreSQL problem

I see this problem pop up in the #postgresql IRC channel so often I felt it was necessary to blog about it. This problem trips up so many new users it might even be worth changing the default error message to indicate what is going on. The error message happens when the user tries to run psql for the first time:

psql: FATAL: database "root" does not exist

Where "root" is the current Unix username of the operator.  By default PostgreSQL attempts to log you into a database that is the same as your username.  However, it does not setup this database for you because it would be silly to setup 500 databases for all of the Unix users on your system, if only two of them are going to be using PostgreSQL. 

When setting up PostgreSQL for the first time you need to do the following:

  1. su ( or otherwise ) become your root user
  2. su ( or otherwise ) become your PostgreSQL user, typically 'postgres'
  3. Create your first database

The ultimate goal here is to become your PostgreSQL user, typically this involves becoming root and then switching to user postgres.  Upon setup this is the only user that is allowed to create users and databases.

Your "first" database can be created in one of two ways:

  1. Run the command 'psql template1' followed by a 'CREATE DATABASE' SQL call
  2. Run the command 'createdb <dbname>'

While you're still the postgres user it is probably best to also create a user with 'createuser <username>' or a 'CREATE USER' SQL call. See this section of the PostgreSQL documentation for more information on creating users and roles. You'll also want to read up on managing databases.

NOTE: The programs createdb and createuser may not be, by default, in your PATH so it may be necessary to use locate or type in the full path to your PostgreSQL bin/ directory.

Hope this helps!

November 22, 2006

Articles claims Open Source databases to have lower TCO

Ran across this article shows some Forrester Research data that Open Source databases such as PostgreSQL are 60% cheaper than the commercial alternatives.

I think most geeks are already aware of this. What I found interesting is the quote:

"Eighty per cent of the applications typically use only 30 per cent of the features found in commercial databases," Yuhanna told vnunet.com. "The open source databases deliver those features today."

In my experience working with clients it is more like 95% of applications use only 10% of the features found in commercial databases. I can't even count the number of times a company absolutely needed Oracle for a 100MB, 3 table, simple CRUD database.

One of my favorite themes that comes up is when a company again needs Oracle/DB2/whatever because it has all of the mission critical features they need such as clustering, fail over, etc.  Then when it comes to implementation time the tune changes to "Oh we don't really need a cluster.  And now that I think about it, we can handle downtime easily, so fail over isn't required either."

I think it comes from the fear that they might need those features and so they play a safe bet.  Much like when you buy something like a car and get the luggage rack on the roof, "just in case", but then realize years later that you've never used it.

Moral of the story? As with any type of project, even non-technical ones, you should worry the most about what you know you need today and let tomorrow take care of itself. The tomorrow you are worried about may never come.

August 04, 2006

Automatically updating a timestamp column in PostgreSQL

One of the great things about modern databases is you can let your database automate some of what used to only happen in application logic.  The example I love to show people is automatically updating a "last modified time" timestamp column in a table. 

This is easily accomplished if you always use the same name for those types of columns.  I like to use 'created' for the creation timestamp and 'modified' for the last modified time.  First we create a simple function:

	CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
   NEW.modified = now();
   RETURN NEW;
END;
$$ language 'plpgsql';

This function simply sets any column named 'modified' to the current timestamp for each row passed to it by the trigger. If you use the same column name consitently you only have to do this step once.  Now, you just have to create your trigger like so:

        CREATE TRIGGER update_customer_modtime BEFORE UPDATE
        ON customer FOR EACH ROW EXECUTE PROCEDURE
        update_modified_column();

This technique is very useful when you don't want to have to rely on your application developers to always remember to update the time stamps.  You can just let PostgreSQL handle it for you.

You should note that you will have to create a separate trigger for each table, which isn't a big deal.  Also, the BEFORE UPDATE is very important.  If you attempt to use AFTER UPDATE you put yourself into an infinite loop!

March 09, 2006

New product marketing trend

I've noticed a trend with software businesses that I think is wonderful. It may not be very new, but I sure haven't seen it before. The trend is to donate money to a charity or Open Source project for each product purchased.

For example, there is a company out there with a product that retails for around $500 for I believe a 25 user license.  They setup a promotion only linked to by blogs that allowed users to download a 5 user copy for $5.  And they donate the $5 to the Red Cross. I unfortunately can't seem to find the link in my book marks.

Now there is another company AM Software Design who is offering to donate a dollar to the PostgreSQL project provided they sell 1,000 copies of their product. The product is a GUI PostgreSQL administration application.

My question is why we don't see more of this in the FOSS world? Why aren't there more companies who will donate a small share of their profits to help out an Open Source project that is directly related to their product? Why doesn't every vendor who sells a Perl application not donating one, five, ten, or more dollars to the Perl Foundation?

If I was debating between two vendors products or whether or not to purchase a particular product at all, a donation to a good cause would help persuade me. It obviously would not make up for a shoddy product or one that doesn't fit my needs, but it would make a big difference when choosing between two like products.

I encourage more vendors to adopt this approach in their marketing promotions. I wager that you will be pleasantly surprised at the results.

January 17, 2006

Article showing PostgreSQL materialized views

I recently came across this article Materialized Views In PostgreSQL which shows different ways of using this technique.

For example it shows techniques much like the one I used in my previous blog entry on the subject along with going into depth on how techniques for tables whose data changes frequently.

January 06, 2006

Tuning your PostgreSQL Database

Several months ago I wrote an article on tuning your PostgreSQL database for performance that has gained a lot of attention. While I think the article covers most of the basic to intermediate level options you can use to better tune your database server, it is by no means all you're ever going to need to know.  If you use PostgreSQL often I strongly suggest you at least scan the posts on the postgresql-performance mailing list.

What surprised me most is how many companies and individual developers that are in need of a consultant to help them get the most out of their PostgreSQL setup. Because of this we've launched a new PostgreSQL Performance Tuning Service designed to help organizations receive better performance out of their systems and reduce the need to upgrade their server hardware. We often find that a few well placed configuration, query, or stored procedure changes can dramatically impact the speed of your application or website.

The problem with online tuning guides and the standard documentation is that every company's database is designed and/or used just differently enough from everyone else that a customized tuning is the best option. Contact us to find out more and schedule a performance analysis. 

January 03, 2006

When to use a materialized view in PostgreSQL

A materialized view is defined as a table which is actually physically stored on disk, but is really just a view of other database tables. In PostgreSQL, like many database systems, when data is retrieved from a traditional view it is really executing the underlying query or queries that build that view. This is great for better representation of data for users, but does not do anything to help performance.

Materialized views are different in that they are an actual physical table that is built from the data in other tables. To use another example from my NewsCloud application in order to achieve the performance I needed, I used a materialized view for representing the tag cloud.

In this particular application the data used to build the tag cloud changes very infrequently, but to generate the actual tag cloud the ORDER BY needed to rank the results was terribly slow. They query in question is:

SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

This query was taking an average of 2 seconds to complete which would mean, when you figured in all of the other time aspects such as mod_perl, Apache, transporting the HTML back to the browser, etc. this could easily mean the user would see a 3-4 second page load time. However, by creating a new table with:

CREATE TABLE test AS SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

And then I dropped my old view table ( named count_mview ) and renamed the test table to the old name. A quick vacuum analyze afterwards and everything is happy. With this simple change I can then directly query the count_mview data and it is returned in the order I need, but this query takes just slightly less than 1 millisecond!
 

If the data in your underlying tables changes more frequently you will be better served by using triggers on those tables that fire when INSERTs, UPDATEs, and/or DELETEs are performed on them and update the materialized view table according. For a good introduction to this check out the PostgreSQL manual section on triggers and PL/pgSQL Trigger Procedures.

Hopefully you can use this technique in the future to speed up some of your slower performing queries.