Normally an article like this would save the answer for the end, but I can already tell you that SQL definitely stands for Structured Query Language, not Standard Query Language.

First of all, because Structured Query Language is the official expansion, as you will find in, for example, the Oxford English Dictionary. The name actually originates from SEQUEL (Structured English Query Language), a language initially developed at IBM in the early 1970s. The specification was later turned into a standard by the American National Standards Institute (known as SQL-86). The latest revision of this standard dates from 2011 (and is referenced as ISO/IEC 9075 SQL:2011).

That aside, I would certainly not call it standard. Database vendors sell their products as compliant with certain standards, but on top of that they offer a variety of features that are not available in, or differ from, competing products. This makes it very difficult to switch between RDBMS vendors for whatever reason. For data migration architects like me, this is one of the many challenges we have to cope with in migration projects.

In the basic technical examples that follow, I will show that SQL is indeed not as standard as one would assume. These examples are based on the following three well-known RDBMS: DB2 for LUW (Linux, UNIX and Windows) from IBM, SQL Server from Microsoft, and Oracle Database from Oracle Corporation.

A basic example

The best known and most widely used SQL statement is, without any doubt, the SELECT .. FROM .. statement. It can be used to retrieve data from virtually any database system. But one does not need to dig deep to find big syntax differences when trying to retrieve the same result.

Let us, for example, take a look at the syntax for retrieving only the first row of a data set, which you might expect to be a simple task.

When working with an Oracle database, the way to code this would be as follows:

SELECT * FROM employee
 WHERE ROWNUM = 1 ;

This same select will fail in SQL Server, but the following syntax will provide you with a single row of the employee table:

SELECT TOP (1) * FROM employee ;

Both the SQL Server and the Oracle statements will fail in DB2, but there you can achieve the same result using the following syntax:

SELECT * FROM employee
 FETCH FIRST 1 ROW ONLY;

So we have three different ways of coding to get the same simple result: the first row of a table. Not very standard, I would dare to say. From a migration point of view, this is again something that needs to be tackled. Changing the syntax is in this case very straightforward, but that is not always so, because there is also an underlying implementation difference when this construct is used in combination with an ORDER BY clause.

If we want to retrieve the first row of a table ordered on a certain field, for example employee name (ename), the query to be used in SQL Server can be written as:

SELECT TOP (1) * FROM employee
 ORDER BY ename;

With what we know, it is very easy to change the syntax to be valid for Oracle: we replace the TOP (1) clause with a WHERE ROWNUM condition:

SELECT * FROM employee
 WHERE ROWNUM = 1
 ORDER BY ename;

Unfortunately, this does not guarantee that the same row of data is returned, because behind the scenes Oracle handles the WHERE clause before the ORDER BY: it takes the first row it encounters and then sorts that single row. SQL Server and DB2 behave differently; they first sort the data and then return the first row of the sorted result.
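
The usual workaround in Oracle (before 12c) is to perform the sort in a subquery, so that the ROWNUM filter is applied to the already ordered result:

SELECT *
  FROM (SELECT * FROM employee ORDER BY ename)
 WHERE ROWNUM = 1;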

In the ANSI/ISO SQL:2008 standard, the FETCH FIRST clause was added. Next to DB2, Oracle (since the 12c version released in 2013) and SQL Server (since SQL Server 2012) now also support this clause.
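
With the standard clause, the ordered first-row query can finally be written almost identically on all three systems; note, though, that SQL Server implements it as part of its OFFSET .. FETCH syntax, which requires both an ORDER BY and the OFFSET clause:

-- DB2 and Oracle 12c
SELECT * FROM employee
 ORDER BY ename
 FETCH FIRST 1 ROW ONLY;

-- SQL Server 2012 variant
SELECT * FROM employee
 ORDER BY ename
 OFFSET 0 ROWS FETCH NEXT 1 ROW ONLY;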

This is just one example of the differences to be found in the SELECT statements of DB2, Oracle and SQL Server, but if you have a look at the complete syntax diagrams of the different dialects (DB2 for LUW v10.5, Oracle 12c, SQL Server 2012), you will quickly see that this is just the 'top (1)' of the iceberg.

A frowning example

Tackling differences like the above example in a migration project is not a very big deal. It is just a matter of investigating all possibilities of the source and target environment and defining the correct conversion approach. But not all differences between competing RDBMS can be identified and tested so easily. Let us, for example, have a look at the term 'isolation'. Isolation in a database context can be briefly described as the way changes made to the database are made visible to concurrent processes. The ANSI/ISO SQL-92 standard describes four isolation levels:

  1. Read Uncommitted
  2. Read Committed
  3. Repeatable Read
  4. Serializable

This standard also refers to three different read phenomena that may or may not take place depending on the isolation level used:

  1. Dirty Reads (read of uncommitted data)
  2. Non-repeatable Reads (multiple reads of same row with different data)
  3. Phantom Reads (multiple reads, receiving different collection of rows)

Isolation Level     Dirty Reads     Non-repeatable Reads     Phantom Reads
Read Uncommitted    possible        possible                 possible
Read Committed      not possible    possible                 possible
Repeatable Read     not possible    not possible             possible
Serializable        not possible    not possible             not possible
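
To make these read phenomena a bit more tangible, here is a sketch of a non-repeatable read happening under Read Committed, using a hypothetical account table (under Repeatable Read or Serializable, the second read would return the original value, or the concurrent update would be blocked):

-- Session 1, running under Read Committed
SELECT balance FROM account WHERE id = 1;   -- returns 100

-- Session 2, concurrently
UPDATE account SET balance = 50 WHERE id = 1;
COMMIT;

-- Session 1, later in the same transaction
SELECT balance FROM account WHERE id = 1;   -- now returns 50: a non-repeatable read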

It all looks well-defined, and nothing to worry about. But suppose you have an application using a SQL Server database with the isolation level specified as Repeatable Read (which is stricter than the default level Read Committed), and you want to swap to a DB2 database. You would expect that a Repeatable Read isolation level would also be available in DB2, and that DB2 would likewise prevent Dirty Reads and Non-repeatable Reads, giving your application the same level of protection. But when you check the DB2 isolation level options, you will notice that DB2 also supports four isolation levels, with more or less the same names: Repeatable Read (RR), Read Stability (RS), Cursor Stability (CS) and Uncommitted Read (UR).

The mapping of the ISO/SQL isolation levels to the DB2 isolation levels looks like this:

  1. Read Uncommitted → Uncommitted Read (UR)
  2. Read Committed → Cursor Stability (CS)
  3. Repeatable Read → Read Stability (RS)
  4. Serializable → Repeatable Read (RR)
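
In practice this means the same intent has to be expressed differently on each system. A small sketch: in SQL Server you would set Repeatable Read for the session, while the DB2 equivalent according to the mapping above is Read Stability, set either via the CURRENT ISOLATION special register or per statement with an isolation clause:

-- SQL Server: Repeatable Read for the current session
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- DB2: the equivalent Read Stability, session-wide ...
SET CURRENT ISOLATION = RS;

-- ... or per statement
SELECT * FROM employee WITH RS;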

A bit frightening, wouldn't you agree? Just copying your Repeatable Read isolation level from SQL Server to DB2 can cause your application to behave completely differently. Not so standard, I would say.

A limited example

Lastly, let us look at the physical limitations imposed by the different RDBMS. Consider a limitation you may think inconsequential, given how rarely it becomes an issue: the maximum number of columns in a table.

RDBMS               Max. number of columns
Oracle 12c          1000
DB2 LUW 10.5        1012
SQL Server 2016     1024

All three have their own limit. This is something that must be taken into account when you want to change RDBMS. In most cases it will not be an issue, but you can be "lucky" and have a table containing 1001 columns that you want to move to Oracle, where a 1-to-1 swap will not be possible. For completeness on this topic, I also want to add that SQL Server allows so-called wide tables to be created. These tables can have up to 30,000 columns (but their row size is still limited, meaning most of the column values must be NULL). This wide table concept only exists in SQL Server, which again makes switching difficult.
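
For the curious, here is a minimal sketch of what such a wide table definition looks like in SQL Server (table and column names are made up): a table becomes "wide" once it contains a column set, and the individual columns must be declared SPARSE:

-- SQL Server wide table: sparse columns plus a column set
CREATE TABLE wide_example (
    id  INT PRIMARY KEY,
    c1  INT         SPARSE NULL,
    c2  VARCHAR(50) SPARSE NULL,
    -- the column set makes this a wide table and exposes
    -- all sparse columns together as one XML value
    all_sparse_cols XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);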

Conclusion

As already alluded to in the introduction, the 'S' in SQL stands for Structured and not Standard. Database vendors all try to follow the standard as much as possible, in order to make a switch to their RDBMS as easy as possible. On the other hand, they also deviate from or extend the standards, making it as hard as possible to switch to another RDBMS. A logical way of thinking, which keeps the world of data migrations, where I work, very interesting.