Bobby Tables

From Pin Eight
Revision as of 06:04, 9 June 2015 by Eighty5cacao (Talk | contribs) (https for mysql and xkcd (the latter now uses // links to the imgs subdomain, implying some level of "official" support))

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

This is a mini-rant, a short essay refuting a common misconception among users of an Internet forum. If you think this essay is FUD, feel free to explain why on the essay's talk page.

In a nutshell: Parameterized SQL statements are useful but don't work everywhere.

Robert'); DROP TABLE Students;--, nicknamed Bobby Tables,[1] is the personification of failure to escape characters with a special meaning, leading to injection of SQL commands that perform unauthorized changes to a database.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. (what's this?)

One web site about SQL injection warns readers: "Don't try to escape invalid characters. Don't try to do it yourself. Learn how to use parameterized statements. Always, every single time."[2] In the majority of cases, the number of parameters is known in advance, and this advice is practical. But in a few cases, this runs into limitations of parameterized statements in various database management systems. So to avoid SQL injection, applications that use SQL SHOULD use parameterized statements and MUST be careful to escape those strings for which the database management system does not allow parameterization.

Table-valued parameters

Say I have a bunch of users that I want to look up. I could retrieve information about these users by executing a separate SQL statement per user, but that would be slow due to various communication overheads between the application server and the database server. So instead, I use SQL's operator IN, which takes an expression (usually a column name) on the left and a 1-column table on the right.

SELECT username, joinDate FROM dxt_users
WHERE username IN ('filbert', 'bluebear', 'chief')

But a lot of database servers don't support passing a table as a parameter in a parameterized statement. Nor can you reliably do WHERE username IN (?, ?, ?) because you don't know whether you're going to need one placeholder or 100 placeholders. The workaround involves making one well-tested function that does only one thing: serialize arrays into valid SQL expressions for 1-column tables.

/* This code has been tested on PHP 5 MySQLi. It comes with
   ABSOLUTELY NO WARRANTY but serves to document the concept. */
function db_escape_list($conn, $set) {
  $escaped = array();
  foreach ($set as $p) {
    $escaped[] = $conn->escape_string($p);
  $glued = implode("','", $escaped);
  return "('$glued')";

$eusernames = db_escape_list($conn, $usernames);
$stmt = "
SELECT username, joinDate FROM dxt_users
WHERE username IN $eusernames
$results = $conn->query($stmt);
/* omitted: fetch the rows */

If you make a function similar to db_escape_list, do not deploy it into production until you have made and run a suite of test cases. Make sure that this test suite includes strings with all SQL special characters that you can think of, including Bobby's name.

Some databases support passing a table as a parameter, but you usually need the newest version of the database and the newest version of the connection driver for this to work. For example, Microsoft SQL Server 2008 supports table-valued parameters, but earlier versions don't. Nor do older database connection drivers support table-valued parameters even when connecting to Microsoft SQL Server 2008.[3]

Another tactic is to upload the list one entry at a time as a temporary table. But this returns to executing a separate statement per entry, and furthermore, it requires the privilege to create temporary tables and to read and write temporary tables. If you're using MySQL and enforcing privileges on individual tables rather than granting SELECT, INSERT, UPDATE, and DELETE on the whole database to the application's user, you have to create a separate empty database dedicated to the use of temporary tables.[1]

Query by example

Another case where purely parameterized SQL is inconvenient is query by example. For example, the user is searching for products and choosing to narrow the search by price range or by manufacturer. In this case, we need to translate the filters that the user has chosen into a WHERE expression. If the user did not specify a value for a given field, the field should not appear in WHERE.

For each parameter, one could add another parameter that controls whether it should be used or ignored:

WHERE (? = 0 OR username = ?)
  AND (? = 0 OR email = ?)
  AND (? = 0 OR join_date > ?)

This method works wonderfully when all query requirements are known up front. But it may make it difficult to separate the part of the program that parses the human-readable query string into SQL from the part of the program that executes and stores the results, especially in cases with a wide variety of terms that a user may specify.

General workaround

The administrator of has recommended a way to use parameterized queries even in these cases. It involves building the SQL statement (with ? placeholders) and the parameter list in parallel in two separate variables, then submitting them together to the SQL driver.

But even this method isn't without its drawbacks. It can lead to the parameters getting out of order relative to the placeholders; the care needed to keep the order straight is close to the care needed to escape all dynamic arguments. And the APIs of some SQL drivers make it difficult to use a number of parameters that change from call to call, MySQLi in PHP in particular needing a horrible workaround involving call_user_func_array whose details vary from one version of PHP to another.

Perhaps the best plan is to port the entire application to a database framework that supports named placeholders instead of ? placeholders. Instead of the database driver scanning for ? placeholders from start to finish and binding them to successive elements of a list, it scans for the named placeholders and binds them to corresponding elements of an associative array. This makes it much easier to build the SQL statement and parameter dictionary in parallel. For example, one might consider porting an application that uses MySQLi to PHP Data Objects (PDO).