Saturday, 5 March 2016

INSERT INTO table SET field=value,...

Doing a few searches I guess this is a mysql specific variation. I thought for a moment I was being stupid.

One of the pitfalls of learning any new programming language or system is that your knowledge can be incomplete and you do not realise it. I have had this on occasion with various systems, usually because of ongoing improvements to the system itself (like linux libraries) and simply not knowing some new and simpler way to do things now exists. This is one of the reasons code gets old - looking at code I wrote a year ago I cringe as I have learned new ways to do things over time.

Just occasionally my original learning of the system meant I missed some detail, because it was not needed in the first few cases of using the system, and then over time I never go back and re-read the text books (web pages) to realise I have missed something.

This syntax for SQL INSERT is an example I have only just realised exists and I was beginning to think I was going mad, and cursing why I did not know about it. Another example was when I discovered the "ON DUPLICATE" feature of INSERT statements.

I did a few searches, and one rather odd comment was "What a confusing syntax! If I didn't know how to use web search to lookup the documentation, I'd have thought someone mixed up UPDATE and INSERT!", so I thought I would explain good reasons why this syntax is so much better for some uses.

The main example is when using this in a program. (For a start, I have formatting functions that know how to escape values for SQL to avoid injection attacks, obviously).

1. The fact that INSERT and UPDATE were different syntax meant you have to have two separate bits of code for creating and entry and for updating the entry. This was annoying as often you want almost all of the same fields. Now you can make code that creates the query for either in one go :-

(a) INSERT INTO or UPDATE depending if existing or new
(b) table name
(c) SET
(d) lots of field=value pairs (and comma)
(e) either a comma or WHERE depending if new or existing
(f) keyfield=value;

Nice and simple.

2. The other issue with the INSERT INTO table (fields...) VALUES (values...) syntax is maintenance of code, and even worse if some fields are not always needed... You end up with something like

(a) INSERT INTO table (
(b) List of field names
(c) ) VALUES (
(d) List of format controls like %s, %d, etc
(e) );
(f) List of variables to be printed in the formatting where the %s, %d, etc are located

Points (b), (d), and (f) have to all match up exactly, so adding a new field you have to count the position in the list and add field name, and in list of format controls, and in list of variables. If one of the field does not always want to be included you have to have a conditional entry in all three lists. It is tedious in code and messy and easy to get wrong.

Using the SET syntax you have

(b) Add each entry separately using field=, format control, variable

This means that if any field is optional, you have one condition test around that field, not three. It means if a new field is needed, you add one formatted print command. If a field is to be removed, you remove one formatted print command.

If you don't have libraries for formatting SQL as I do (uses %#s as a quoted escaped string, for example), you can have an add_string(field,value) function that adds ,field='escapedvalue' to the query in one go.

It just makes for easier to understand queries and neater code.

Now I have to resist the urge to grep all my code and change the syntax of existing usage - as that will definitely introduce new bugs. I just have to use this on new code and when maintaining old code, one step at a time...


  1. It's rather unusual to hear someone talk about writing SQL code for simple CRUD. You'd be much better off using an ORM or API above the cursor level. Not only would your code be more portable to different backends but you'd benefit from reduced maintainance work when upgrading db versions.

  2. Putting %s in insert or update statements is bad practice IMO. I'd always use bind variables because you never know what's going to end up there..

    Not sure you save much anyway. Once you've written the code to generate the update/insert - which normally happens once - you're passing it arrays of fields and things anyway. Unless you're hand coding the queries, which has its place of course.. OTOH in that case you save nothing as you're writing it each time anyway.

    1. We have tools where it is not simply %s, it is actually properly escaped in query strings that are malloced not foxed size. We know Little Bobby Tables quite well and keep him locked up.

    2. Escaping still isn't good enough, use the placeholders in the way they were meant to be used. There are way too many corner cases for escaping. Here's an interesting one you may have not heard of - only works under MySQL because MySQL likes to do weird stuff, but the principle remains.

      Consider a parameter consisting of two bytes (one of which is a quote mark) passed to your escaping function.

      ¿' (0xbf 0x27)

      The escaping routine takes this and returns three bytes:

      ¿\' (0xbf 0x5c 0x27)

      This is actually interpreted as a multi-byte encoded sequence in MySQL! The result ends up being:

      縗' (0xbf5c 0x27)

      and you've successfully injected an unescaped quote mark.

      So yes, use SQL placeholders in every case. Never allow arbitrary user-supplied information to be used in generated SQL.

    3. that does not look like valid UTF8. Indeed UTF8 is carefully designed so you cannot have a "normal" character as part of the sequence. In it an interesting one which I will check anyway.

    4. Hang on, even the result is not valid UTF8 and never could be. I will look in to that but I am not ℅ incredibly makes sense.

    5. Wow that was odd autocorrect

  3. Insert and Update are such common SQL commands that usually follow the same syntax each time (unlike a select that may involve a complicated join). It's a wonder that a library doesn't exist to build the SQL needed.

    1. Lots of libraries exist to generate SQL. That's how ORMs are able to transparently provide an abstraction of a database upon object oriented function calls.