August 05, 2007

A Guide to Hiring Programmers: The High Cost of Low Quality

I was invited to a wonderful dinner party (I swear it wasn't too spicy Sarah!) with some St. Louis Perl peoples this week while I'm here on business.  At one point we were talking about hiring programmers, specifically Perl programmers.

We agreed on the following:

  • Finding good programmers is hard in any language, and a good programmer can be as effective as 5-10 average programmers.
  • Average pay rates between equivalent programmers are out of sync and are based more on the language used than the skill of the programmer.
  • You don't need to hire an expert in language X; you can and should look for expert programmers who are willing to learn language X. An expert can easily move past the novice stage in any language in a matter of a few weeks.
  • You should seriously consider allowing your expert developers to telecommute full-time. Restricting your search to programmers who live in your area or are willing to move limits the talent you can acquire. Arguments regarding "face time", productivity, etc. can easily be nullified when you look at how some of the largest and most successful Open Source projects such as Linux, Apache, and Firefox are developed by individuals rarely living in the same time zone or even country.
  • We love Perl and think it's a great language that you graduate to after you have been forced to use less agile languages such as Java, C/C++/C#, etc. It is not necessarily a first language you get your feet wet with before moving on to a *cough* "real" language.

Many people in the Perl community have been writing on this topic lately, and I wanted to share my opinions on the subject, as it is one I have put many hours of thought into. I am doing my best to keep this language agnostic, as I believe these tips can be applied to any programming language. I will, however, use Perl in some examples as it is my preferred language.

Why is it so hard to find good programmers?

The simplest reason is that when a company finds a good developer, it does more to keep that person happy, which leads to longer tenures. Better salary, more flexible working conditions, good tools, interesting projects, and better perks can often keep a programmer working for you longer.

Another obvious reason is that experts in any field are small in number, so your possible talent pool is limited. This leads managers and HR departments to settle for average or even below average developers.  I believe this is the single biggest mistake a technology oriented company can make, regarding developers, just short of not using a good version control system.

We're not talking about customer service representatives or sales people here. Just having a body to fill the seat is not, I repeat not, always a win for the company. Sub-standard programmers drag down the efficiency of your other developers with beginner questions, poor comments/documentation, and bad code that someone else will later be forced to spend time fixing.

Companies need to stop thinking about their developers as cogs in the machine. They are more akin to artists, authors, designers, architects, scientists, or CEOs.  Would your HR department rush to find the first person who would be willing to take on the role of Chief Scientist, Art Director, or CEO in your company? Of course not, they would spend the time to do a thorough talent search for just the right candidate, court them, and then compensate them appropriately. They realize that having the wrong person in that seat is much worse than having the seat empty. It is absolutely the same with programming.

Anyone who has been a developer or managed developers can tell you that an expert can accomplish as much as 10 average developers.  However, companies typically pay only a 10-20% premium for an expert over the average programmer, whether their title is Lead, Architect, Development Manager, Guru, or whatever nomenclature the company uses. I am not saying that if your average developer is paid $50k/year you should pony up $500k/year for an expert. The employer/employee relationship never works like that, but what employers don't seem to realize is that in the end paying more saves them more.

Let's look at an example.  One common argument from HR departments is that they "can't find any Perl programmers, but they can't swing a cat without hitting a Java developer".  While this is fairly accurate, they are approaching the problem from the wrong direction.  If you fill your shop with 15 average Java developers, paying an average of $60k per developer, you have an approximate labor cost of $900k/year for your development staff, not counting any non-salary benefits.

Suppose you instead took the time to find 5 experts, or at least above average, Perl developers at $120k each per year. Here is a partial list of the pros and cons of such a scenario:

Cons:

  • You must spend extra time finding, evaluating, and courting these more sought after developers.
  • Your company and what the developer may be asked to build may simply not be attractive to this class of developer.  Very few people want to work for a spammer or a small web design firm that caters solely to freelance accountants for example. Smart people find boring things even more boring than the masses.
  • When one of them leaves the company, there is the feeling that your company's business objectives are more at risk due to having only 4/5ths of your normal resources. Or that a larger chunk of your corporate knowledge just walked out the door. This is more of a perceived problem than an actual one as good developers are better at writing readable/maintainable code, commenting their work, and writing effective documentation.

Pros:

  • Each developer will be more content with their job, due in part to the higher than average salary, but also because his or her co-workers are of a much higher quality which improves anyone's job satisfaction.
  • Development would require less overall communication as there are fewer people to communicate with.  This obviously improves efficiency, as anyone who has been on a 20+ person conference call can attest. Or read The Mythical Man-Month if you want a more in-depth analysis of this phenomenon.
  • Experts travel in the same social circles.  Having one expert on staff makes it much easier to find other experts in the same field, no matter what field that may be.
  • You would save 2/3rds on infrastructure costs.  Things like cubicles, computers, cell phones, free lunches, training costs, travel, office space, air conditioning, electricity, etc, etc. The list is essentially endless.
  • Your HR department would have 1/3rd the number of developers to take care of. Less paperwork, fewer questions, less everything, and less turnover because of the lower number of employees.
  • Oh and you'd save $300k/year on your labor costs.  Not to mention non-salary benefits such as stock options, retirement matches, health insurance premiums, perks, etc. You could spend as much as $100k/year on your talent searches and still be $200k/year ahead.  Hell, you could dedicate an entire HR person just to this task.
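
The arithmetic behind that last point can be spelled out in a few lines of Perl. This is only a sketch of the salary comparison using the figures from the scenario above; the labor_cost helper is a name made up for illustration, and benefits and infrastructure are deliberately left out.

```perl
use strict;
use warnings;

# Salary-only labor cost for a team: headcount times average salary.
# (labor_cost is a hypothetical helper, not part of any library.)
sub labor_cost {
    my ($headcount, $salary) = @_;
    return $headcount * $salary;
}

# Figures from the scenario above; benefits and infrastructure excluded.
my $average_team = labor_cost(15, 60_000);     # 15 average Java developers
my $expert_team  = labor_cost(5,  120_000);    # 5 expert Perl developers

my $savings       = $average_team - $expert_team;
my $search_budget = 100_000;                   # yearly talent-search spend
my $net_savings   = $savings - $search_budget;

printf "Average team:        \$%d/year\n", $average_team;
printf "Expert team:         \$%d/year\n", $expert_team;
printf "Salary savings:      \$%d/year\n", $savings;
printf "After talent search: \$%d/year\n", $net_savings;
```

Plug in your own headcounts and salaries to see where the break-even point sits for your shop.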

What is an expert programmer?

Experience is key, but not necessarily in ways you might imagine.  Time in the saddle with a particular language is not as important as diversity of experience.  Someone who has worked in several disparate industries, a generalist, is often a much better developer than one who has spent years in the same industry.  There are exceptions to this, but in general I have found this to be the case.  Bonus points if your developer was a systems administrator in a former life.

Some of the best developers I know were originally trained as journalists, mathematicians, linguists, and other professions not normally associated with software development.

Experts use better tools and care deeply about their craft.  They aren't assembling bits on an assembly line; they are crafting a unique product to solve a unique problem.  Experts are lazy: they work smarter rather than harder.  Experts prefer the easiest solution that gets the job done. Experts aren't interested in creating complex solutions simply to have the complexity; that misguided egoism is the territory of more junior developers. They often get it right on the first try and almost always on the second one.

Simply put, experts write readable code.  They comment and document it appropriately based on the complexity and criticality of that particular piece of code.

All of this pays huge dividends when the next developer has to pick up where they left off. Especially if the next person isn't an expert.

More reasons you want an expert programmer

Is your business technology oriented?  Perhaps the software you create is even your main product. If nothing else, I'm sure we can agree that the software your developers create is to some degree critical to your business.

I've worked in many different environments, with people of every skill level, and it's very easy to tell whether or not a company has expert developers. Do you often find that the software is down? That it has as many bugs or even just idiosyncrasies that make no sense to the user as it does features?  Do the users find it difficult to use?  Is the problem at hand relatively simple compared to the training or documentation necessary to begin using the software?

If you answered yes to any of those questions you more than likely have average or below average developers.

When you work in an environment with experts things simply work.  They are easier to use and require less initial training. The software is easier to modify.  Requested changes happen more frequently and easily.  Things just flow.  It is the difference between Apple and Microsoft.  It's the difference between the iPod and a 400 disc CD changer with 50+ buttons.

As with many things in life, sometimes you get what you pay for. I'd love to hear your comments and opinions on the subject.

UPDATE: I've written a response to some of the questions and comments I've received on this article in a follow-up post, A follow up to "A Guide for Hiring Programmers".

May 31, 2007

Email, Templates, and Perl

I have been meaning to talk about one of my new favorite Perl modules, MIME::Lite::TT::HTML, for quite a while now.  As I mentioned in a previous post, there are a bazillion different ways to send an Email message from Perl.  This one is just my new favorite.

Here is a short list as to why:

  • Can be used for complex multi-part messages and handles attachments easily
  • Built upon the equally great MIME::Lite module
  • Allows you to easily template your messages using the familiar Template Toolkit package

The templating part is, in my opinion, the important part.  How many times have you had to go edit some source code just to change the text or subject of a message?  Isn't that just terribly annoying? We use configuration files, MVC with HTML templates, etc. to avoid hard coding things into our apps, but for some reason many people (myself included, for years) have neglected Email.

Not any longer. I've switched to using this module as my standard way of sending Email.  If you are interested in learning more about MIME::Lite::TT::HTML, check out my short howto on the subject, Sending Email with Perl Best Practice.
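
To give a feel for what this looks like, here is a minimal sketch based on the module's documented constructor options. The addresses, template file names, template directory, attachment path, and the first_name and order_id parameters are all placeholders invented for illustration, and the two template files are assumed to already exist.

```perl
use strict;
use warnings;
use MIME::Lite::TT::HTML;

# Values made available inside the templates (hypothetical names).
my %params = (
    first_name => 'Frank',
    order_id   => 1234,
);

# Options passed through to Template Toolkit.
my %options = (
    INCLUDE_PATH => '/path/to/templates',    # assumed template directory
);

# Builds a multi-part message from a plain text and an HTML template.
my $msg = MIME::Lite::TT::HTML->new(
    From        => 'orders@example.com',
    To          => 'frank@example.com',
    Subject     => 'Your recent order',
    Template    => {
        text => 'order.txt.tt',
        html => 'order.html.tt',
    },
    TmplOptions => \%options,
    TmplParams  => \%params,
);

# Attachments use the normal MIME::Lite interface.
$msg->attach(
    Type        => 'application/pdf',
    Path        => '/path/to/invoice.pdf',
    Filename    => 'invoice.pdf',
    Disposition => 'attachment',
);

$msg->send;
```

A template such as order.txt.tt would then contain ordinary Template Toolkit markup, for example "Hello [% first_name %], your order [% order_id %] has shipped." Changing the wording of the message becomes a template edit rather than a code change.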

May 16, 2007

Common PostgreSQL problem

I see this problem pop up in the #postgresql IRC channel so often I felt it was necessary to blog about it. This problem trips up so many new users it might even be worth changing the default error message to indicate what is going on. The error message happens when the user tries to run psql for the first time:

psql: FATAL: database "root" does not exist

Here "root" is the current Unix username of the operator.  By default PostgreSQL attempts to log you into a database that has the same name as your username.  However, it does not set up this database for you, because it would be silly to set up 500 databases for all of the Unix users on your system if only two of them are going to be using PostgreSQL.

When setting up PostgreSQL for the first time you need to do the following:

  1. su (or otherwise become) your root user
  2. su (or otherwise become) your PostgreSQL user, typically 'postgres'
  3. Create your first database

The ultimate goal here is to become your PostgreSQL user. Typically this involves becoming root and then switching to the user postgres.  On a fresh install this is the only user that is allowed to create users and databases.

Your "first" database can be created in one of two ways:

  1. Run the command 'psql template1' followed by a 'CREATE DATABASE' SQL call
  2. Run the command 'createdb <dbname>'

While you're still the postgres user it is probably best to also create a user with 'createuser <username>' or a 'CREATE USER' SQL call. See this section of the PostgreSQL documentation for more information on creating users and roles. You'll also want to read up on managing databases.

NOTE: The programs createdb and createuser may not be, by default, in your PATH so it may be necessary to use locate or type in the full path to your PostgreSQL bin/ directory.
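
If you would rather script this step, the same thing can be done from Perl with DBI and DBD::Pg. This is only a sketch: it assumes it is run as the postgres operating-system user on the database host, and the database name myapp and the username frank are placeholders.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical names, for illustration only.
my $new_db   = 'myapp';
my $new_user = 'frank';

# Connect to template1, which exists on a fresh install.
# AutoCommit must stay on: CREATE DATABASE cannot run inside a transaction.
my $dbh = DBI->connect('dbi:Pg:dbname=template1', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Equivalent to 'createuser' followed by 'createdb'.
$dbh->do('CREATE USER ' . $dbh->quote_identifier($new_user));
$dbh->do('CREATE DATABASE ' . $dbh->quote_identifier($new_db)
       . ' OWNER ' . $dbh->quote_identifier($new_user));

$dbh->disconnect;
```

Once the database exists, that user can connect with 'psql myapp' instead of hitting the "does not exist" error.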

Hope this helps!

March 12, 2007

SMTP Connections: How to handle large loads

I recently came across several blog posts about the declining state of E-mail due to spam.

I've been running E-mail servers for myself and others for over 10 years now and I have to agree that with the current version of SMTP we all use, there isn't much that can be done about spam that isn't already being done.  If you've got some RBLs, SPF, anti-virus, and a decent spam filter setup, there isn't much more you can do.  Sure you can get +/- another percentage point, but you won't really find a solution that is 100% effective 100% of the time.  It just isn't possible with the current standards.

However, these articles also discuss another issue that is overlooked by everyone who doesn't run a large E-mail system.  By large I'll say over 1,000 users. Around that point you start to run into problems handling the sheer number of incoming SMTP connections.  Note that you may hit this point long before 1,000 users depending on hardware and your personal traffic levels. But I digress.

I haven't found any Open Source solution to this problem, but I've come pretty close.  TrafficControl is a commercial product from MailChannels that is built upon the good Open Source base of Apache and mod_perl. TrafficControl does two things for you:

  • It uses async I/O to proxy incoming SMTP connections.  This has allowed me to handle 10x the number of connections on the exact same hardware.
  • It allows you to configure throttling rules based on RBLs, operating system, etc. and choke off the bandwidth to suspicious sending servers.  This reduces a great deal of spam, as most bots simply move on if they are not getting good flow.  After a configurable time period it removes the choke hold and allows the message to continue on normally.  Servers that follow the SMTP RFC have no problem with this, as they hang on during the process.

I've been using it for about a year and couldn't imagine trying to run a large E-mail system without it.  I encourage you to check it out if you are having a similar problem.


July 25, 2006

Doing a LEFT OUTER join with DBIx::Class

I have recently been using DBIx::Class instead of the more popular Class::DBI. It has many advantages over Class::DBI that I won't go into here, but if you haven't used it yet you should definitely check it out.

One thing I found the other day is how to set up a special LEFT OUTER join query. Say you need to do a LEFT OUTER join on your data, but only in one particular script, or maybe in a one-off report that you won't be keeping around. You could go ahead and put this relationship in your main model class, but for a one-off that is a bit of overkill.

What I hadn't thought about was that you can define those relationships from outside the MyModelClass.pm file itself.  Take for example a simple Artist -> CD relationship, where you want all artists even if they don't have any CDs:

 

use ExampleSchema;

ExampleSchema::Artist->has_many('left_outer_albums' =>
                       'ExampleSchema::Cd', 'artist_id',
                       { join_type => 'LEFT_OUTER' } );

my $schema = ExampleSchema->connect('dbi:Pg:dbname=outer', '', '');

my $rs = $schema->resultset('Artist')->search(
    undef,
);

while ( my $artist = $rs->next ) {
    print "Name: " . $artist->name . "\n";
    print "Albums: \n";
    foreach my $album ( $artist->left_outer_albums ) {
        print "\t" . $album->title . "\n";
    }
}

The nice thing about this is that the special left_outer_albums relationship is defined and used only in the one-off script, so it doesn't pollute your main ExampleSchema::Artist relationships, where it might confuse someone. It may not be the best practice, but it is something to consider.

January 06, 2006

Tuning your PostgreSQL Database

Several months ago I wrote an article on tuning your PostgreSQL database for performance that has gained a lot of attention. While I think the article covers most of the basic to intermediate level options you can use to better tune your database server, it is by no means all you're ever going to need to know.  If you use PostgreSQL often I strongly suggest you at least scan the posts on the postgresql-performance mailing list.

What surprised me most is how many companies and individual developers are in need of a consultant to help them get the most out of their PostgreSQL setup. Because of this we've launched a new PostgreSQL Performance Tuning Service designed to help organizations get better performance out of their systems and reduce the need to upgrade their server hardware. We often find that a few well placed configuration, query, or stored procedure changes can dramatically impact the speed of your application or website.

The problem with online tuning guides and the standard documentation is that every company's database is designed and/or used just differently enough from everyone else that a customized tuning is the best option. Contact us to find out more and schedule a performance analysis. 

January 03, 2006

When to use a materialized view in PostgreSQL

A materialized view is defined as a table which is actually physically stored on disk, but is really just a view of other database tables. In PostgreSQL, like many database systems, when data is retrieved from a traditional view it is really executing the underlying query or queries that build that view. This is great for better representation of data for users, but does not do anything to help performance.

Materialized views are different in that they are an actual physical table that is built from the data in other tables. To use another example from my NewsCloud application: in order to achieve the performance I needed, I used a materialized view to represent the tag cloud.

In this particular application the data used to build the tag cloud changes very infrequently, but the ORDER BY needed to rank the results when generating the tag cloud was terribly slow. The query in question is:

SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

This query was taking an average of 2 seconds to complete. Once you figure in all of the other time aspects such as mod_perl, Apache, transporting the HTML back to the browser, etc., the user could easily see a 3-4 second page load time. To fix this, I created a new table with:

CREATE TABLE test AS SELECT k.id, k.keyword, c.count FROM news_keywords AS k, news_keyword_total_count AS c WHERE k.id = c.keyword ORDER BY c.count DESC;

I then dropped my old view table (named count_mview) and renamed the test table to the old name. A quick vacuum analyze afterwards and everything is happy. With this simple change I can directly query the count_mview data, it is returned in the order I need, and the query takes just slightly less than 1 millisecond!
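
Since the data changes rarely, the rebuild can be scripted and run by hand or from cron. The following is a rough sketch of the steps described above using DBI and DBD::Pg, not the exact script I used: the DSN and the temporary table name count_mview_new are assumptions, and the drop-and-rename is wrapped in a transaction so a failure leaves the old table in place.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical DSN; adjust the database name and credentials.
my $dbh = DBI->connect('dbi:Pg:dbname=newscloud', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# PostgreSQL DDL is transactional, so the swap is all-or-nothing.
$dbh->begin_work;
eval {
    $dbh->do(q{
        CREATE TABLE count_mview_new AS
        SELECT k.id, k.keyword, c.count
          FROM news_keywords AS k, news_keyword_total_count AS c
         WHERE k.id = c.keyword
         ORDER BY c.count DESC
    });
    $dbh->do('DROP TABLE count_mview');
    $dbh->do('ALTER TABLE count_mview_new RENAME TO count_mview');
    $dbh->commit;
    1;
} or do {
    my $error = $@ || 'unknown error';
    $dbh->rollback;
    die "Rebuild failed, old table kept: $error";
};

# VACUUM cannot run inside a transaction, so it runs after the commit.
$dbh->do('VACUUM ANALYZE count_mview');
$dbh->disconnect;
```

One caveat: PostgreSQL does not promise to return rows in the order they were stored, so if the ranking must be exact, keep an ORDER BY on the query against count_mview. On a table this small it costs very little.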
 

If the data in your underlying tables changes more frequently, you will be better served by using triggers on those tables that fire when INSERTs, UPDATEs, and/or DELETEs are performed on them and update the materialized view table accordingly. For a good introduction to this, check out the PostgreSQL manual section on triggers and PL/pgSQL Trigger Procedures.

Hopefully you can use this technique in the future to speed up some of your slower performing queries. 

January 02, 2006

Sendmail Virtual User Trick

One of the things people find difficult about Sendmail is virtual users.  These are defined in the virtusertable file (usually /etc/mail/virtusertable). This file instructs Sendmail to translate a "virtual" user into a real user or alias. The reason I mention aliases here is that, with Sendmail, you can have a virtual user that translates into an alias for multiple local and/or remote E-mail accounts.

A situation some people run into is wanting all usernames at domain2.com delivered to the same username at domain1.com, except for a few users. If you really wanted all users to map to the other domain, it would be as simple as adding the domain to the local-host-names file.  It is the need to have *most* users map to the other domain where the virtusertable file comes in handy.  For this example, let's assume that we want all usernames @domain2.com, except for postmaster, webmaster, and support, to map to the same username @domain1.com. This is accomplished by adding domain2.com to the local-host-names file, which tells Sendmail that we wish to receive mail for that domain, and then adding the following to the virtusertable file:

postmaster@domain2.com    mailadmin@example.com
webmaster@domain2.com     webmaster@example.com
support@domain2.com       support@example.com
@domain2.com              %1@domain1.com

The first three lines tell Sendmail to deliver messages for those three usernames at domain2.com to the appropriate remote E-mail addresses. The last line instructs Sendmail to send mail for *any* other username at domain2.com to the same username at domain1.com. Note that this will also include any fake usernames that a spammer might send to. The E-mail server at domain1.com will still be responsible for determining what is or is not a valid username. After you've added those entries to the virtusertable file, all you need to do is rebuild it (typically with makemap) and it becomes active.
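
If the lookup order is unclear, the following Perl sketch models it. It is a simplified illustration of the behaviour described above, not Sendmail's actual implementation: an exact address match wins, otherwise the domain-wide entry is used with %1 replaced by the user part. The resolve function is a name made up for this example.

```perl
use strict;
use warnings;

# The mappings described above, as a key => value map.
my %virtusertable = (
    'postmaster@domain2.com' => 'mailadmin@example.com',
    'webmaster@domain2.com'  => 'webmaster@example.com',
    'support@domain2.com'    => 'support@example.com',
    '@domain2.com'           => '%1@domain1.com',
);

# Simplified model of the lookup: exact address first, then the
# domain-wide catch-all, with %1 replaced by the user part.
sub resolve {
    my ($address) = @_;
    my ($user, $domain) = split /\@/, lc($address), 2;
    return unless defined $domain;

    my $exact = $virtusertable{ $user . '@' . $domain };
    return $exact if defined $exact;

    my $catch_all = $virtusertable{ '@' . $domain };
    return unless defined $catch_all;

    $catch_all =~ s/%1/$user/g;
    return $catch_all;
}

print resolve('postmaster@domain2.com'), "\n";    # mailadmin@example.com
print resolve('alice@domain2.com'), "\n";         # alice@domain1.com
```

An address at a domain with no entries at all, such as someone@other.com, falls through both lookups and is left untouched.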