wiki.postgresql.org Open in urlscan Pro
2001:4800:3e1:1::201  Public Scan

Submitted URL: https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_upper_case_table_or_column_names%20%20Pleas
Effective URL: https://wiki.postgresql.org/wiki/Don't_Do_This
Submission: On September 11 via manual from US — Scanned from CA

Form analysis 1 forms found in the DOM

/index.php

<form action="/index.php" id="searchform"><input type="hidden" value="Special:Search" name="title"><input type="search" name="search" placeholder="Search PostgreSQL wiki" title="Search PostgreSQL wiki [alt-shift-f]" accesskey="f"
    id="searchInput"><input type="submit" name="go" value="Go" title="Go to a page with this exact name if it exists" id="searchGoButton" class="searchButton">&nbsp; <input type="submit" name="fulltext" value="Search"
    title="Search the pages for this text" id="mw-searchButton" class="searchButton"></form>

Text Content

Want to edit, but don't see an edit button when logged in? Click here.




DON'T DO THIS

From PostgreSQL wiki


Jump to navigationJump to search


CONTENTS

 * 1 Database Encoding
   * 1.1 Don't use SQL_ASCII
     * 1.1.1 Why not?
     * 1.1.2 When should you?
 * 2 Tool usage
   * 2.1 Don't use psql -W or --password
     * 2.1.1 Why not?
     * 2.1.2 When should you?
   * 2.2 Don't use rules
     * 2.2.1 Why not?
     * 2.2.2 When should you?
   * 2.3 Don't use table inheritance
     * 2.3.1 Why not?
     * 2.3.2 When should you?
 * 3 SQL constructs
   * 3.1 Don't use NOT IN
     * 3.1.1 Why not?
     * 3.1.2 When should you?
   * 3.2 Don't use upper case table or column names
     * 3.2.1 Why not?
     * 3.2.2 When should you?
   * 3.3 Don't use BETWEEN (especially with timestamps)
     * 3.3.1 Why not?
     * 3.3.2 When should you?
 * 4 Date/Time storage
   * 4.1 Don't use timestamp (without time zone)
     * 4.1.1 Why not?
     * 4.1.2 When should you?
   * 4.2 Don't use timestamp (without time zone) to store UTC times
     * 4.2.1 Why not?
     * 4.2.2 When should you?
   * 4.3 Don't use timetz
     * 4.3.1 Why not?
     * 4.3.2 When should you?
   * 4.4 Don't use CURRENT_TIME
     * 4.4.1 Why not?
     * 4.4.2 When should you?
   * 4.5 Don't use timestamp(0) or timestamptz(0)
     * 4.5.1 Why not?
     * 4.5.2 When should you?
   * 4.6 Don’t use +/-HH:mm as a text Time Zone name
     * 4.6.1 Why not?
     * 4.6.2 When should you?
 * 5 Text storage
   * 5.1 Don't use char(n)
     * 5.1.1 Why not?
     * 5.1.2 When should you?
   * 5.2 Don't use char(n) even for fixed-length identifiers
     * 5.2.1 Why not?
     * 5.2.2 When should you?
   * 5.3 Don't use varchar(n) by default
     * 5.3.1 Why not?
     * 5.3.2 When should you?
 * 6 Other data types
   * 6.1 Don't use money
     * 6.1.1 Why not?
     * 6.1.2 When should you?
   * 6.2 Don't use serial
     * 6.2.1 Why not?
     * 6.2.2 When should you?
 * 7 Authentication
   * 7.1 Don't use trust authentication over TCP/IP (host, hostssl)
     * 7.1.1 Why not?
     * 7.1.2 When should you?

A short list of common mistakes.

 * Kristian Dupont provides schemalint a tool to verify the database schema
   against those recommendations.


DATABASE ENCODING


DON'T USE SQL_ASCII


WHY NOT?

SQL_ASCII means "no conversions" for the purpose of all encoding conversion
functions. That is to say, the original bytes are simply treated as being in the
new encoding, subject to validity checks, without any regard for what they mean.
Unless extreme care is taken, an SQL_ASCII database will usually end up storing
a mixture of many different encodings with no way to recover the original
characters reliably.


WHEN SHOULD YOU?

If your input data is already in a hopeless mixture of unlabelled encodings,
such as IRC channel logs or non-MIME-compliant emails, then SQL_ASCII might be
useful as a last resort—but consider using bytea first instead, or whether you
could autodetect UTF8 and assume non-UTF8 data is in some specific encoding such
as WIN1252.


TOOL USAGE


DON'T USE PSQL -W OR --PASSWORD

Don't use psql -W or psql --password.


WHY NOT?

Using the --password or -W flags will tell psql to prompt you for a password,
before trying to connect to the server - so you'll be prompted for a password
even if the server doesn't require one.

It's never required, as if the server does require a password psql will prompt
you for one, and it can be very confusing when setting up permissions. If you're
connecting with -W to a server configured to allow you access via peer
authentication you may think that it's requiring a password when it really
isn't. And if the user you're logging in as doesn't have a password set or you
enter the wrong password at the prompt you'll still be logged in and think you
have the right password - but you won't be able to log in from other clients
(that connect via localhost) or when logged in as other users.


WHEN SHOULD YOU?

Never, pretty much. It will save a round trip to the server but that's about it.





DON'T USE RULES

Don't use rules. If you think you want to, use a trigger instead.


WHY NOT?

Rules are incredibly powerful, but they don't do what they look like they do.
They look like they're some conditional logic, but they actually rewrite a query
to modify it or add additional queries to it.

That means that all non-trivial rules are incorrect.

Depesz has more to say about them.


WHEN SHOULD YOU?

Never. While the rewriter is an implementation detail of VIEWs, there is no
reason to pry up this cover plate directly.





DON'T USE TABLE INHERITANCE

Don't use table inheritance. If you think you want to, use foreign keys instead.


WHY NOT?

Table inheritance was a part of a fad wherein the database was closely coupled
to object-oriented code. It turned out that coupling things that closely didn't
actually produce the desired results.


WHEN SHOULD YOU?

Never …almost. Now that table partitioning is done natively, that common use
case for table inheritance has been replaced by a native feature that handles
tuple routing, etc., without bespoke code.

One of the very few exceptions would be temporal_tables extension if you are in
a pinch and want to use that for row versioning in place of a lacking SQL 2011
support. Table inheritance will provide a small shortcut instead of using UNION
ALL to get both historical as well as current rows. Even then you ought to be
wary of caveats while working with parent table.


SQL CONSTRUCTS


DON'T USE NOT IN

Don't use NOT IN, or any combination of NOT and IN such as NOT (x IN (select…)).


WHY NOT?

Two reasons:

1. NOT IN behaves in unexpected ways if there is a null present:

select * from foo where col not in (1,null); -- always returns 0 rows

select * from foo where foo.col not in (select bar.x from bar); -- returns 0 rows if any value of bar.x is null


This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise
(i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is
still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing
as col NOT IN (1,null)) can return TRUE under any circumstances.

2. Because of point 1 above, NOT IN (SELECT ...) does not optimize very well. In
particular, the planner can't transform it into an anti-join, and so it becomes
either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the
planner only allows that plan for small result sets; the plain subplan is
horrifically slow (in fact O(N²)). This means that the performance can look good
in small-scale tests but then slow down by 5 or more orders of magnitude once a
size threshold is crossed; you do not want this to happen.

Alternative solution: In most cases, the NULL behavior of NOT IN (SELECT …) is
not intentionally desired, and the query can be rewritten using NOT EXISTS
(SELECT …):

select * from foo where not exists (select from bar where foo.col = bar.x);



WHEN SHOULD YOU?

NOT IN (list,of,values,...) is mostly safe unless you might have a null in the
list (via a parameter or otherwise). So it's sometimes natural and even
advisable to use it when excluding specific constant values from a query result.





DON'T USE UPPER CASE TABLE OR COLUMN NAMES

Don't use NamesLikeThis, use names_like_this.


WHY NOT?

PostgreSQL folds all names - of tables, columns, functions and everything else -
to lower case unless they're "double quoted".

So create table Foo() will create a table called foo, while create table "Bar"()
will create a table called Bar.

These select commands will work: select * from Foo, select * from foo, select *
from "Bar".

These will fail with "no such table": select * from "Foo", select * from Bar,
select * from bar.

This means that if you use uppercase characters in your table or column names
you have to either always double quote them or never double quote them. That's
annoying enough by hand, but when you start using other tools to access the
database, some of which always quote all names and some don't, it gets very
confusing.

Stick to using a-z, 0-9 and underscore for names and you never have to worry
about quoting them.


WHEN SHOULD YOU?

If it's important that "pretty" names are displaying in report output then you
might want to use them. But you can also use column aliases to use lower case
names in a table and still get pretty names in the output of a query: select
character_name as "Character Name" from foo.





DON'T USE BETWEEN (ESPECIALLY WITH TIMESTAMPS)


WHY NOT?

BETWEEN uses a closed-interval comparison: the values of both ends of the
specified range are included in the result.

This is a particular problem with queries of the form

SELECT * FROM blah WHERE timestampcol BETWEEN '2018-06-01' AND '2018-06-08'


This will include results where the timestamp is exactly 2018-06-08
00:00:00.000000, but not timestamps later in that same day. So the query might
seem to work, but as soon as you get an entry exactly on midnight, you'll end up
double-counting it.

Instead, do:

SELECT * FROM blah WHERE timestampcol >= '2018-06-01' AND timestampcol < '2018-06-08'



WHEN SHOULD YOU?

BETWEEN is safe for discrete quantities like integers or dates, as long as you
remember that both ends of the range are included in the result. But it's a bad
habit to get into.





DATE/TIME STORAGE


DON'T USE TIMESTAMP (WITHOUT TIME ZONE)

Don't use the timestamp type to store timestamps, use timestamptz (also known as
timestamp with time zone) instead.


WHY NOT?

timestamptz records a single moment in time. Despite what the name says it
doesn't store a timestamp, just a point in time described as the number of
microseconds since January 1st, 2000 in UTC. You can insert values in any
timezone and it'll store the point in time that value describes. By default it
will display times in your current timezone, but you can use at time zone to
display it in other time zones.

Because it stores a point in time it will do the right thing with arithmetic
involving timestamps entered in different timezones - including between
timestamps from the same location on different sides of a daylight savings time
change.

timestamp (also known as timestamp without time zone) doesn't do any of that, it
just stores a date and time you give it. You can think of it being a picture of
a calendar and a clock rather than a point in time. Without additional
information - the timezone - you don't know what time it records. Because of
that, arithmetic between timestamps from different locations or between
timestamps from summer and winter may give the wrong answer.

So if what you want to store is a point in time, rather than a picture of a
clock, use timestamptz.

More about timestamptz.


WHEN SHOULD YOU?

If you're dealing with timestamps in an abstract way, or just saving and
retrieving them from an app, where you aren't going to be doing arithmetic with
them then timestamp might be suitable.





DON'T USE TIMESTAMP (WITHOUT TIME ZONE) TO STORE UTC TIMES

Storing UTC values in a timestamp without time zone column is, unfortunately, a
practice commonly inherited from other databases that lack usable timezone
support.

Use timestamp with time zone instead.


WHY NOT?

Because there is no way for the database to know that UTC is the intended
timezone for the column values.

This complicates many otherwise useful time calculations. For example, "last
midnight in the timezone given by u.timezone" becomes this:

date_trunc('day', now() AT TIME ZONE u.timezone) AT TIME ZONE u.timezone AT TIME ZONE 'UTC'


And "the midnight prior to x.datecol in u.timezone" becomes this:

date_trunc('day', x.datecol AT TIME ZONE 'UTC' AT TIME ZONE u.timezone)
  AT TIME ZONE u.timezone AT TIME ZONE 'UTC'



WHEN SHOULD YOU?

If compatibility with non-timezone-supporting databases trumps all other
considerations.





DON'T USE TIMETZ

Don't use the timetz type. You probably want timestamptz instead.


WHY NOT?

Even the manual tells you it's only implemented for SQL compliance.

> The type time with time zone is defined by the SQL standard, but the
> definition exhibits properties which lead to questionable usefulness. In most
> cases, a combination of date, time, timestamp without time zone, and timestamp
> with time zone should provide a complete range of date/time functionality
> required by any application.


WHEN SHOULD YOU?

Never.





DON'T USE CURRENT_TIME

Don't use the CURRENT_TIME function. Use whichever of these is appropriate:

 * CURRENT_TIMESTAMP or now() if you want a timestamp with time zone,
 * LOCALTIMESTAMP if you want a timestamp without time zone,
 * CURRENT_DATE if you want a date,
 * LOCALTIME if you want a time


WHY NOT?

It returns a value of type timetz, for which see the previous entry.


WHEN SHOULD YOU?

Never.





DON'T USE TIMESTAMP(0) OR TIMESTAMPTZ(0)

Don't use a precision specification, especially not 0, for timestamp columns or
casts to timestamp.

Use date_trunc('second', blah) instead.


WHY NOT?

Because it rounds off the fractional part rather than truncating it as everyone
would expect. This can cause unexpected issues; consider that when you store
now() into such a column, you might be storing a value half a second in the
future.


WHEN SHOULD YOU?

Never.





DON’T USE +/-HH:MM AS A TEXT TIME ZONE NAME


WHY NOT?

PostgreSQL doesn’t accept fixed time zone offsets in place of ISO time zone
names or abbreviations. If you specify such an offset it will be interpreted as
a custom POSIX time zone specification with the unfortunate property that
positive values shift west while negative values shift east (the ISO convention
is for eastward shifts to be denoted as negative.)

Note that if you provide an interval typed value the ISO convention does apply.
So if you really want to specify a fixed offset you can write:

AT TIME ZONE INTERVAL '04:00'


WHEN SHOULD YOU?

A string timestamptz literal in ISO format may be written using a signed offset
and have the direction of the sign be interpreted by ISO conventions.

select '2024-01-31 17:16:25+04'::timestamptz; -- yields 1pm UTC





TEXT STORAGE


DON'T USE CHAR(N)

Don't use the type char(n). You probably want text.


WHY NOT?

Any string you insert into a char(n) field will be padded with spaces to the
declared width. That's probably not what you actually want.

The manual says:

> Values of type character are physically padded with spaces to the specified
> width n, and are stored and displayed that way. However, trailing spaces are
> treated as semantically insignificant and disregarded when comparing two
> values of type character. In collations where whitespace is significant, this
> behavior can produce unexpected results; for example SELECT 'a '::CHAR(2)
> collate "C" < E'a\n'::CHAR(2) returns true, even though C locale would
> consider a space to be greater than a newline. Trailing spaces are removed
> when converting a character value to one of the other string types. Note that
> trailing spaces are semantically significant in character varying and text
> values, and when using pattern matching, that is LIKE and regular expressions.

That should scare you off it.

The space-padding does waste space, but doesn't make operations on it any
faster; in fact the reverse, thanks to the need to strip spaces in many
contexts.

It's important to note that from a storage point of view char(n) is not a
fixed-width type. The actual number of bytes varies since characters may take
more than one byte, and the stored values are therefore treated as
variable-length anyway (even though the space padding is included in the
storage).


WHEN SHOULD YOU?

When you're porting very, very old software that uses fixed width fields. Or
when you read the snippet from the manual above and think "yes, that makes
perfect sense and is a good match for my requirements" rather than gibbering and
running away.





DON'T USE CHAR(N) EVEN FOR FIXED-LENGTH IDENTIFIERS

Sometimes people respond to "don't use char(n)" with "but my values must always
be exactly N characters long" (e.g. country codes, hashes, or identifiers from
some other system). It is still a bad idea to use char(n) even in these cases.

Use text, or a domain over text, with CHECK(length(VALUE)=3) or CHECK(VALUE ~
'^[[:alpha:]]{3}$') or similar.


WHY NOT?

Because char(n) doesn't reject values that are too short, it just silently pads
them with spaces. So there's no actual benefit over using text with a constraint
that checks for the exact length. As a bonus, such a check can also verify that
the value is in the correct format.

Remember, there is no performance benefit whatsoever to using char(n) over
varchar(n). In fact the reverse is true. One particular problem that comes up is
that if you try and compare a char(n) field against a parameter where the driver
has explicitly specified a type of text or varchar, you may be unexpectedly
unable to use an index for the comparison. This can be hard to debug since it
doesn't show up on manual queries.


WHEN SHOULD YOU?

Never.





DON'T USE VARCHAR(N) BY DEFAULT

Don't use the type varchar(n) by default. Consider varchar (without the length
limit) or text instead.


WHY NOT?

varchar(n) is a variable width text field that will throw an error if you try
and insert a string longer than n characters (not bytes) into it.

varchar (without the (n)) or text are similar, but without the length limit. If
you insert the same string into the three field types they will take up exactly
the same amount of space, and you won't be able to measure any difference in
performance.

If what you really need is a text field with an length limit then varchar(n) is
great, but if you pick an arbitrary length and choose varchar(20) for a surname
field you're risking production errors in the future when Hubert Blaine
Wolfe­schlegel­stein­hausen­berger­dorff signs up for your service.

Some databases don't have a type that can hold arbitrary long text, or if they
do it's not as convenient or efficient or well-supported as varchar(n). Users
from those databases will often use something like varchar(255) when what they
really want is text.

If you need to constrain the value in a field you probably need something more
specific than a maximum length - maybe a minimum length too, or a limited set of
characters - and a check constraint can do all of those things as well as a
maximum string length.


WHEN SHOULD YOU?

When you want to, really. If what you want is a text field that will throw an
error if you insert too long a string into it, and you don't want to use an
explicit check constraint then varchar(n) is a perfectly good type. Just don't
use it automatically without thinking about it.

Also, the varchar type is in the SQL standard, unlike the text type, so it might
be the best choice for writing super-portable applications.





OTHER DATA TYPES


DON'T USE MONEY

The money data type isn't actually very good for storing monetary values.
Numeric, or (rarely) integer may be better.


WHY NOT?

lots of reasons.

It's a fixed-point type, implemented as a machine int, so arithmetic with it is
fast. But it doesn't handle fractions of a cent (or equivalents in other
currencies), it's rounding behaviour is probably not what you want.

It doesn't store a currency with the value, rather assuming that all money
columns contain the currency specified by the database's lc_monetary locale
setting. If you change the lc_monetary setting for any reason, all money columns
will contain the wrong value. That means that if you insert '$10.00' while
lc_monetary is set to 'en_US.UTF-8' the value you retrieve may be '10,00 Lei' or
'¥1,000' if lc_monetary is changed.

Storing a value as a numeric, possibly with the currency being used in an
adjacent column, might be better.


WHEN SHOULD YOU?

If you're only working in a single currency, aren't dealing with fractional
cents and are only doing addition and subtraction then money might be the right
thing.





DON'T USE SERIAL

For new applications, identity columns should be used instead.


WHY NOT?

The serial types have some weird behaviors that make schema, dependency, and
permission management unnecessarily cumbersome.


WHEN SHOULD YOU?

 * If you need support to PostgreSQL older than version 10.
 * In certain combinations with table inheritance (but see there)
 * More generally, if you somehow use the same sequence for multiple tables,
   although in those cases an explicit declaration might be preferable over the
   serial types.





AUTHENTICATION


DON'T USE TRUST AUTHENTICATION OVER TCP/IP (HOST, HOSTSSL)

Don't use trust authentication over any TCP/IP method (e.g. host, hostssl) in
any production environment.

Especially do not set a line like this in your pg_hba.conf file:

host all all 0.0.0.0/0 trust

which allows anyone on the Internet to authenticate as any PostgreSQL user in
your cluster, including the PostgreSQL superuser.

There is a list of authentication methods you can choose that are better for
establishing a remote connection to PostgreSQL. It is fairly easy to set up a
password based authentication method, the recommendation being scram-sha-256
that is available in PostgreSQL 10 and above.


WHY NOT?

The manual says:

> trust authentication is only suitable for TCP/IP connections if you trust
> every user on every machine that is allowed to connect to the server by the
> pg_hba.conf lines that specify trust. It is seldom reasonable to use trust for
> any TCP/IP connections other than those from localhost (127.0.0.1).

With trust authentication, any user can claim to be any other user and
PostgreSQL will trust that assertion. This means that someone can claim to be
the postgres superuser account and PostgreSQL will accept that claim and allow
them to log in.

To take this a step further, it is also not a good idea to allow trust
authentication to be used on local UNIX socket connections in a production
environment, as anyone with access to the instance running PostgreSQL could log
in as any user.


WHEN SHOULD YOU?

The short answer is never.

The longer answer is there are a few scenarios where trust authentication may be
appropriate:

 * Running tests against a PostgreSQL server as part of a CI/CD job that is on a
   trusted network
 * Working on your local development machine, but only allowing TCP/IP
   connections over localhost

but you should see if any of the alternative methods work better for you. For
example, on UNIX-based systems, you can connect to your local development
environment using peer authentication.

Retrieved from
"https://wiki.postgresql.org/index.php?title=Don%27t_Do_This&oldid=39778"





NAVIGATION MENU


PAGE ACTIONS

 * Page
 * Discussion
 * View source
 * History


PAGE ACTIONS

 * Page
 * Discussion
 * More
 * Tools
 * In other languages


PERSONAL TOOLS

 * Log in




NAVIGATION

 * Main Page
 * Random page
 * Recent changes
 * Help


TOOLS

 * What links here
 * Related changes
 * Special pages
 * Printable version
 * Permanent link
 * Page information


SEARCH

 

 * This page was last edited on 4 August 2024, at 17:16.
 * Privacy policy
 * About PostgreSQL wiki
 * Disclaimers