The Tech Faucet: postgresql

Showing posts with label postgresql. Show all posts

Monday, October 29, 2012

The Internet needs weeding

In librarian terminology, Weeding is jargon for the process of going through a collection of works (books, magazines, etc) and removing ones that are no longer worth having. These works may be:

Out of date to the point of uselessness (like Windows 95 for Dummies);
Damaged and worn out;
Discredited;
Superceded by new revisions;
Surplus to requirements, where they're potentially still useful but space is needed for other more important things; etc

Why do you care? Because the Internet needs weeding, too. Right now individual website operators must take responsibility for that themselves. Some of them aren't; either they can't manage their large libraries of ageing content or they just don't want to.

This TechRepublic article was dubious when it was written, and it's now amazingly out of date and plain wrong, yet there's a steady stream of comments suggesting that people still refer to it. This article, from 2002, doesn't bother to mention little details like version numbers that might help place it in context. It claims, among other things, that MySQL doesn't support subqueries, views, or foreign keys. It also simply says that MySQL is "faster" and PostgreSQL is "slower". It's never been that simple, and it sure isn't now.

I discovered it because someone linked to it on Stack Overflow as if it was current information. Someone who usually does a fairly decent job writing informative answers; they just didn't bother to look at this particular article and see if it was any good before citing it.

In print, at least you can look at a book and go "Ugh, that's old". When an article has been carried over to a site's nice shiny new template and is surrounded by auto-included content with recent dates and context, how's a newbie to know it's complete garbage?

By the way, I don't claim it's easy to manage a library of tens or hundreds of thousands of ageing articles. Periodic review simply isn't practical. Websites that host large content libraries need to provide ways for users to flag content as obsolete, misleading, discredited or otherwise problematic. They also need to make an effort to ensure that their articles will age well by including prominent versions, dates, "as of ..." statements, etc at time of writing. This article would've been OK if it'd simply said "PostgreSQL 7.2" and "MySQL 3.3" (for example) instead of just "MySQL" and "PostgreSQL". It's easy to forget to do this, but being responsive to feedback means you can correct problems and remain a reasonably reputable source.

One of the things you and I - the community - can do is to flag use of these articles when you see them linked to, and try to contact site owners to take them down, add warnings indicating their versions and age, or otherwise fix them.

Time for me to try to have a chat with TechRepublic.

Wednesday, October 24, 2012

Natural sorting: An example of the utility of Pg's composite types and arrays

While looking at a recent stack overflow question I found myself wondering if it was possible to write a natural sort for strings containing numbers interleaved with non-number text using only PostgreSQL's core functionality.

Natural sorts are an important usability feature, as Jeff points out in his post on natural sorts above.

So I asked for ideas, and it turns out that yes, you can, though it's a bit long-winded. Props to Erwin Brandstetter for persistently refining the approach. The general idea is to create a composite type of `(text,integer)` then sort on an array of that type. See the linked question for details.

This illustrates how powerful Pg's composite types and arrays are, though I'm not sure you should consider any of the proposed solutions for real world production use.

It it also helps to show how nice it'd be to have access to native OS-independent Unicode collation in PostgreSQL using the International Components for Unicode (ICU) project, which would not only solve those nasty Windows-vs-Linux locate name issues when restoring dumps, but would also allow the use of advanced collation flags like UCOL_NUMERIC_COLLATION.

I'd really love to be able to use a custom collation function in Pg, either via an ORDER BY extension or by creating a collation that uses a user-defined collation function then using that collation in the COLLATE clause. Then I could write a C function to use ICU to do the special collation required for a particular job. This doesn't appear to be possible at the moment.

I recommend reading Jeff's post on natural sorting and why it's important; as usual, it's excellent.

Thursday, October 18, 2012

Generating random bytea values in PostgreSQL

While playing around with answering an interesting question on dba.stackexchange.com I wrote a simple C extension to PostgreSQL that generates random bytea values of a user-specified size. Fast.

In case anyone else is looking for a good way to dummy up random binary data in PostgreSQL, you can find the code in my scrapcode repository on GitHub

See the extension's README for details.

There's a pure SQL version fast enough to use for generating a few 100s of KB of data, too, or a couple of MB if you're patient.

Sunday, October 14, 2012

Avoiding PostgreSQL database corruption

TL;DR: Don't ever set fsync=off, don't kill -9 the postmaster then delete postmaster.pid, don't run PostgreSQL on network file systems.

Reports of database corruption on the PostgreSQL mailing list are uncommon, but hardly rare. While a few data corruption issues have been found in PostgreSQL over the years, the vast majority of issues are caused by:

Administrator action;
Misconfiguration; or
Bad hardware

A recent mailing list post asked what can be done to reduce the chance of corruption and keep data safe.

If you think you have a corrupt PostgreSQL database, stop the database server and take a complete copy of the data directory now. See the Corruption page on the wiki. Then ask for help on the pgsql-general mailing list, or for critical/urgent issues contact a profesional support provider.

Do not attempt to fix the problem before taking a complete copy of the entire data directory. You might make the problem much worse, turning a recoverable problem into unrecoverable data loss.

Here's my advice for avoiding DB corruption issues, with some general PostgreSQL administration advice thrown in for good measure:

PostgreSQL packaging on Mac OS X is a mess

It appears to me - based primarily on what I see on Stack Overflow rather than direct Mac use experience - that PostgreSQL packaging on Mac OS X is a real mess.

There are at least four widely-used competing package systems:

An ancient version pre-installed in some Mac OS X versions
Homebrew
Postgres.app from Heroku; and
The EnterpriseDB One-Click installer for Pg

Fink and MacPorts packages also exist, but seem to have either fallen into disuse or "just work" so I don't see breakage reports about them.

PostgreSQL rocks, and so does explain.depesz.com

I was modifying one of my aggregate views today and I was struck by just how impressive PostgreSQL is, and how powerful relational databases in general really are. PostgreSQL was executing queries against the view a completely different way when one entry was being queried vs when the whole view was being requested.

You might take this for granted - but when you think about it, it's seriously amazing how a database can take your description of what you want and work out the how for its self. Mostly.

Imagine writing this yourself. Say you're working in Java. You have a complex Criteria query against several different tables related to information about customers. You now want to get information for just one customer, so you add a WHERE clause to the top level. If the database didn't nearly magically push this filter criterion down into all those complex sub-queries and joins you'd be at it for hours; days or weeks if you wanted to make it re-usable and generic across a set of similar criteria queries.

Instead, the DB just does it for you.

PostgreSQL usability - PgAdmin-III and Pg need some usability love

As anyone who's read much here will know, I'm a huge fan of PostgreSQL, the powerful open source relational database management system. It is an amazingly project with a team that keeps on releasing high quality updates full of very useful new features and improvements.

This is my blog, so there must be a "but", right? You're quite right.

I already wrote a PostgreSQL: Great even when you can see the warts in response to a perhaps overly glowing write-up someone did a while ago. I'm not covering that again here; this post is specifically about a topic more and more dear to my heart, that of usability.

As a long-time UNIX geek I've always found PostgreSQL quite easy to use, because I live on the command line where psql is just wonderful. Recently, though, I had to use PostgreSQL on Windows, and in the process couldn't help seeing it from the new user's point of view. In fact, I tried to think like a new user as I performed the tasks I needed to do.

It wasn't all roses. I knew enough about Pg to get past the issues pretty easily, but some of them would be real roadblocks for new or even intermediate users, and I think they're worth highlighting. Many of the issues were with PgAdmin-III, but far from all of them.

This post started out as a minor critique of a few usability sore points in Pg and PgAdmin-III. As I went through step-by-step producing screenshots and thinking through the process as a user, though, I realised it's an absolute, complete and utter train-wreck. If this had been my first experience with PostgreSQL, I'd be a MySQL, MS-SQL or Oracle user now.

PostgreSQL: Great even when you can see the warts

I’m a very happy PostgreSQL user, and was interested to read a recent post by an MS SQL Server DBA advocating PostgreSQL.

This post is a response, enumerating areas where Pg may not be so hot after all. You might say it's my tendency to always see the negative, or perhaps it's just my desire to inform others of the downsides of something so the loud advocacy doesn't leave them feeling let-down when the reality hits. I'd like to think it's the latter. Telling someone how it is from the start is much better than showing them the rosy-goggles view until they're committed. So, from an enthusiastic PostgreSQL user who spends a lot of time helping out on the mailing lists, here's a list of some of the warts and quirks I'm aware of, so you don't have to find out about them the hard way:

This post is based on PostgreSQL 9.2. Each version improves a great deal, so don't be surprised if some of these issues are gone soon.

Database preferences and product selection methodolgy

I recently stumbled across an interesting weblog post by a DBA who expresses a strong preference for Oracle over PostgreSQL. I thought I'd respond to it with a few thoughts here, not so much because of the opinion expressed as the reasons given for it.

Drupal 7, PostgreSQL, unserialize, and bytea_output

I just found out that PHP 5.3's PostgreSQL PDO driver (as used by Drupal) is broken by the Postgresql 8.0 transition from octal to hex encoding for bytea. It passes the raw hex through to the app, including the leading x , causing PHP's unserialize() function to choke.

I'm using Drupal with Apache and PostgreSQL on Windows (sigh) because I'm writing up a step-by-step hand-holding guide for someone who needs to do some testing against our Drupal database. I wouldn't be using that configuration voluntarily ;-)

Drupal doesn't check that bytea_output is set to 'escape' as a workaround or do a sanity test to detect this fault, so the results are ... interesting:

Notice: unserialize(): Error at offset 0 of 27 bytes in variable_initialize() (line 749 of C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\drupal-7.0\includes\bootstrap.inc).

If anybody else lands up whacking their head against this for a while: just

ALTER DATABASE drupal SET bytea_output = 'escape'

to have your sanity restored.

The amount of breakage being seen in drivers really makes me wonder if the the octal to hex transition should've been done with an opt-in from drivers during negotiation or an explicit SET, rather than just making it globally default. But, hey, hindsight.

Sunday, January 2, 2011

PostgreSQL automatic crash dumps for windows

It seems my first real* PostgreSQL patch has been accepted and committed! By the time patch review was done I think it was almost more Magnus Hagander's patch than mine, but I'm still very happy to have put it together and seen it through into Pg mainline. Thanks very much to Marcus for testing, enhancing and committing the patch.

PostgreSQL 9.1 will now have automatic crash dump generation under Windows. That'll allow Windows users to run their production sites with crash dumps so we can do post-mortem debugging without holding up their server or requiring them to run it under a debugger. That gives us a much better chance of tracking down the cause of hard-to reproduce or intermittent faults. To activate the crash dump collection feature, just create a crashdumps directory in the data directory and grant the postgres user (or whatever user your server runs under) "full control" of that folder in Properties->Security. No configuration file changes or server re-starts are needed.

Helpfully, this change will also let the majority who don't know what Visual Studio or windbg.exe are send crash dumps off to someone more experienced with the tools already configured. There should be much less need for this page - and while it took me long enough to write, I'll be glad to see it rendered unnecessary.

* The one-liner fix for the X.509 client certificate validation bug doesn't really count, despite the truly epic amount of convincing required to get the patch applied.

Thursday, May 20, 2010

Using PKCS#12 client certificates with PgJDBC

(This post is a reference copy of a mailing list post I made explaining how to use client certificates with PgJDBC if you wanted to be able to accept user-provided PKCS#12 files.)

The Tech Faucet

Monday, October 29, 2012

The Internet needs weeding

Wednesday, October 24, 2012

More uses for PostgreSQL arrays

Friday, October 19, 2012