March 30, 2009

Some quick updates

It's been a busy and exciting week for us.  Jacob has been at PyCon in Chicago, where he is participating in a number of panel discussions and giving quite a few talks as well.  Right now I imagine he's neck-deep in code at the Django sprint, helping to finish up the upcoming 1.1 release. If you're running a production site built with Django, you should absolutely check out the talk he is giving with James Bennett on Real World Django.

While my week has been busy hacking away on several client projects and moving my main work machine to a shiny new MacBook Pro (can't recommend these highly enough), I was interviewed by Daniel Dern of Business Trends Quarterly for his post about scaling and performance titled For Scaling, Brains May Beat Brawn. We talk about how just throwing more money and hardware at a problem is not always the best solution.  Often there are architectural, design, and/or configuration changes that can bring significant cost savings to your project, both in terms of the hardware necessary to keep everything flowing and in ongoing system maintenance labor costs.  I'm not talking about the evils of premature optimization or complicating things for your admin. Often these changes are transparent to day-to-day operations, but certainly not to your bottom line. For example, just using the proper RAID levels and physical disk configurations for your particular PostgreSQL database can be a huge win in performance.

I also added a tidbit of wisdom in an advice post to budding entrepreneurs called 163 Ways How To Become An Entrepreneur.

 

  

March 05, 2009

Welcome Jacob Kaplan-Moss

I'm very pleased to announce that Jacob Kaplan-Moss has joined Revolution Systems to head up a new line of services around the ever-growing Django web development framework. First up are commercial Django Support Plans, but look for more Django-related offerings in the near future.

Jacob has been a good friend of mine since before Django was even released.  It was a pleasure to work with him at our previous day jobs and I'm very excited for the future ahead.  Not only is he obviously an authority on Django, he's an amazing developer and generally an expert on all things tech. Jacob and Adrian are both great examples of how to lead an Open Source project and grow a real community around it.

By offering Django Support packages we hope to help adoption of Django in the business world, which helps grow the community at large.  

February 03, 2009

ORD Camp a Huge Success

I was lucky enough to be invited to attend ORD Camp this last weekend in blisteringly cold Chicago.  ORD Camp is an invite-only, FooCamp-style unconference targeted at geeks living in the Midwest. Having never attended a FooCamp-style event, I wasn't sure what to expect.  I can now say that if you ever have the opportunity to attend an event like this, it is well worth your time.

As you can see from the attendee list it was a very diverse group of people, not just the usual crowd of notable Open Source geeks.  The amount of brain power in that room was simply amazing and I can't remember when I had as much fun.  Some sessions were presentations, others were just focused discussions.  Topics covered everything from how words work, brewing beer, and life hacking to what not to do as a startup.

While I loved the sessions, the most fun was getting into random conversations. Some ended up being NSFW after midnight and many beers; others were more typical.  I spent some time talking with people about PostgreSQL's advantages over MySQL, alternative business models, how a certain entrepreneur might improve the performance of their servers, etc.

It is difficult to determine how important this conference will be to my business in the future, but I can easily say that it has increased my drive, ambition, and overall excitement level.  Passionate people doing amazing things will do that to you! I can't wait to attend next year.

December 29, 2008

ResumeBucket.com Launches

I hope everyone had a great holiday this year.  For the past few months I've been working on an online resume site, ResumeBucket.com, and I need your help taking it for a test drive.  Our goal is to create a site where you can upload your current resume in Word form, build a new resume using our online resume creation tool, or even just type in what you want using our online text editor.

The site gives you a unique URL you can give out to friends and prospective employers so they can instantly access an up to date copy of your resume.  There are options for them to also download a copy of your resume in Word or PDF format.  For an example of how your resume can look see our CEO's resume page.

Unlike the other resume services out there, employers are able to search the database without paying any huge fees, which will drive more qualified employment leads to your INBOX.

Please take a few minutes to kick the tires.  You can leave feedback here in the comments or E-mail me at frank@revsys.com. Thanks!

September 16, 2008

Why isn't PostgreSQL using my index?

This is a common question I see from PostgreSQL users.  In fact, someone was just in IRC asking it and it prompted this post.  The exchange usually goes:

Steve: I have this column foo and I have an index on it, but when I do SELECT * FROM table WHERE foo = X, EXPLAIN doesn't use the index.  What am I doing wrong?

90% of the time the answer is unfortunately "Because the query optimizer is smarter than you are". Or maybe it's fortunately depending on how you think about it!

In this user's particular case he was mocking up a database schema and had only one row in the table he was querying against.  People who are more familiar with PostgreSQL will probably roll their eyes at the question, but if you put yourself in the user's shoes you can see how people would be confused by this.  They are thinking, "I put an index on there on purpose, why the hell isn't it working?"

PostgreSQL's query optimizer is smart, really smart, and unless you have evidence otherwise you should trust what it is doing.  In this particular case, the optimizer realizes that when a table has only a few rows, using the index is actually slower than just spinning through the entire table: reading the index and then the table costs more page fetches than reading the one small table page directly.  Just because PostgreSQL isn't using your index today with a small number of rows does not mean it won't choose to use it later when you have more data or the query changes. Because he was just mocking up a design he didn't have real world data, and tuning without real world data is almost always a bad way to performance tune your system unless you are very familiar with how PostgreSQL behaves.
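If you want to see this for yourself, something along these lines should reproduce it. This is only a sketch: the table and index names (mockup, mockup_foo_idx) are made up for illustration, and the exact plans and cost numbers you get will depend on your PostgreSQL version and settings.

```sql
-- Hypothetical mock-up table holding a single row, like the one from IRC
CREATE TABLE mockup (
    id  serial PRIMARY KEY,
    foo integer
);
CREATE INDEX mockup_foo_idx ON mockup (foo);

INSERT INTO mockup (foo) VALUES (42);
ANALYZE mockup;

-- With one row, the planner will normally choose a sequential scan
EXPLAIN SELECT * FROM mockup WHERE foo = 42;

-- Load enough rows to make the index worthwhile, refresh the statistics,
-- and compare the plan
INSERT INTO mockup (foo)
SELECT g FROM generate_series(1, 100000) AS g;
ANALYZE mockup;

EXPLAIN SELECT * FROM mockup WHERE foo = 42;
```

Comparing the two EXPLAIN outputs side by side is usually enough to show that the index was never broken; the table was just too small for it to pay off.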

Now there are other reasons why it might not be using the index.  If you have lots of data and the query you're running looks like it would benefit from the index, it might be a simple matter of forgetting to run an ANALYZE on the table or not having autovacuum turned on.  The planner knows the index exists as soon as you create it, but until PostgreSQL re-analyzes your table it is working from stale (or missing) statistics about the data in the table, so it can't accurately judge whether the index is worth using when building the query plan.
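Checking on this only takes a minute. A rough sketch, assuming a hypothetical table named mytable and a reasonably recent PostgreSQL where pg_stat_user_tables has the last_analyze and last_autoanalyze columns:

```sql
-- Refresh the planner statistics for one table (table name is hypothetical)
ANALYZE mytable;

-- See when the table was last analyzed, by hand or by autovacuum
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'mytable';

-- Check whether autovacuum is turned on
SHOW autovacuum;
```

If both timestamps come back NULL on a table with lots of data, stale statistics are a likely suspect.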

While performance tuning PostgreSQL is much easier and better documented than in days gone by, it can still be very confusing and time consuming for the inexperienced.  If your business needs help tuning its system you might consider my PostgreSQL Tuning Service.

August 20, 2008

Fret Free -- Introduction to Django and the Django Software Foundation

LinuxPro Magazine just released my latest article, an introduction to Django and some discussion about the newly created Django Software Foundation. Being a lifelong Perl user, I didn't think I would enjoy Django at all, but I have to admit that it is a VERY polished system.  It has great PostgreSQL support; in fact, the core developers smartly prefer it over MySQL for their own systems.

You can download a PDF copy of the article at Fret Free -- Django and the Django Software Foundation.  The print issue will hit the stands in October.  Hope you enjoy it!

February 04, 2008

PostgreSQL version 8.3 Released

I just got word that version 8.3 of PostgreSQL has been released.  Along with the usual amount of improvements there are some new features in 8.3 that should be of interest to PostgreSQL admins and developers such as:

  • Integrated TSearch
  • ENUM and UUID data types
  • Faster sorting technique used for LIMIT operations
  • Faster LIKE and ILIKE operations
  • Lazy XID assignment which will make many read only operations much faster

Check out the full list of features at the PostgreSQL site or download it from the download section of their site.

January 24, 2008

EveryBlock.com is now launched

My friend and former co-worker Adrian Holovaty and his team just launched their new project EveryBlock.com. EveryBlock takes the term hyperlocal to a whole new level.  They aggregate tons of public data sources by geolocation so you can, for example, find all of the recent crime around a particular address, neighborhood, zip code, etc.  Or maybe you might be interested in the building code violations where you live or work?

Right now they have San Francisco, Chicago, and New York up and running, but will be adding more cities as time goes on.  Adrian asked me to help performance tune their PostgreSQL database a couple of months ago and so far things seem to be humming along nicely.

Congrats and the best of luck to EveryBlock! I'm sure we'll see even more new and interesting things from this team in the future!

October 05, 2007

Log Buffer #65: a Carnival of the Vanities for DBAs

Welcome to the 65th edition of Log Buffer, the weekly survey of database related blogs.

First let's start with some miscellaneous entries that could be of interest to any DBA.  Crazy DBA has an interesting post about how attending conferences helped to grow his professional network, which in turn has made him a better DBA.  And Thomas Kyte has a great post about why it's the data, not the application itself, that matters.  Brian Aker gives us a great link to Werner Vogels' entry on Dynamo, one of the key technologies used behind the scenes at Amazon.

Oracle users will certainly find these two links of interest.  First off, Frederik Visser shows you how to play with Oracle 11g RAC in VMware. And Alex Gorbachev has a nice write-up about Miracle Open World.

SQL Server DBAs might enjoy the following posts.  If you're thinking about using or upgrading to Idera SQLsafe v4.5, you'll want to check out Sean McCown's post about some of that product's issues. Steve Jones has some thoughts on monitoring and alerting with your SQL Server that are valid for any database. Need to know when your SQL Server instance was started? Check out Joe Webb's tip on how to find out.  And Mladen Prajdić has some advice on how to notify a client in a long running process with SQL Server.

MySQL users will find this post on accurately measuring how far behind your slave is lagging useful. Over at the MySQL Performance Blog there is an opportunity to ask questions of Heikki Tuuri, the creator of InnoDB, and Peter has some thoughts on a few serious bugs in the MySQL 5.0 release.  Kaj Arnö has an interesting post on how MySQL GmbH and MySQL AB help birds of a feather to flock together, quite literally, and writes about how they have opened up the call for papers for the 2008 MySQL Users Conference.

Kevin Burton talks about how to avoid swapping insanity with InnoDB. Want a free MySQL Magazine? Lewis Cunningham has found one for us all. Jan Kneschke introduces us to the Wormhole storage engine for MySQL.  Not really sure how useful it is, but it is definitely interesting.

Hubert Lubaczewski has written a great tool to help you determine the optimal layout of tables, indexes, etc. on your various tablespaces for PostgreSQL.  Robert Treat follows up with some additional thoughts to consider.

Joshua Drake has announced the speakers and topics for the PostgreSQL Conference Fall 2007, which is October 20th, 2007 at Portland State University. Greg Sabino Mullane has a nice explanation of why you can't use prepared queries when using DBD::Pg and pg_bouncer. And to finish out this week's links, Francisco Figueiredo Jr mentions that PostgreSQL will have a UUID data type in version 8.3.

Enjoy!

July 30, 2007

Which PostgreSQL backend am I using?

Someone asked me how to determine which PostgreSQL backend a particular client was connected to.  Everyone's first thought is to do a ps aux | grep postgres which will show you the IP and user, but if you have different processes connecting from the same IP with the same usernames, how do you know which is which?

One way to tell would be to see which queries are being executed by which backend and match that up to your client side.  But you can quickly get confused, especially if the various connections are all executing the same SQL statements, as a web application does, for example.

The simplest way was suggested by Jacob Kaplan-Moss, which is to use the pg_backend_pid() function like:

SELECT pg_backend_pid();
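
The number it returns is the operating system process ID of your backend, so you can match it against the ps output. You can also look your own connection up in pg_stat_activity. A sketch, assuming PostgreSQL 9.2 or later where the column is named pid (older releases call it procpid):

```sql
-- Show the pg_stat_activity row for the connection running this query
-- (on releases before 9.2, use procpid instead of pid)
SELECT *
FROM pg_stat_activity
WHERE pid = pg_backend_pid();
```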

I love it when the solution is something really simple!