The Roots of Relational

Tenth in a series of posts in response to Tim Ford's #EntryLevel Challenge.

Special guest this month: Chris Date

I’m here this month with Chris Date, one of the founders of our field and known for his clear thinking and rigorous writing in books such as SQL and Relational Theory, Relational Theory for Computer Professionals, and The New Relational Database Dictionary. It is my pleasure to have Chris here today to talk about the term relational and how that term came to be used to describe database engines such as SQL Server and Oracle Database that work in terms of tables and columns.

Chris, welcome! 

Glad to be here, Jonathan. Thank you for inviting me.

Chris, I want to ask about the term relational. So often I hear incorrect explanations of why databases are said to be relational. Those of us who work with relational databases ought at least to know why they are described as such.

Jonathan, I agree. It’s commonly thought that relational derives from the ability to define foreign-key relationships between tables. That’s a wrong understanding. 

Right, and it’s a misunderstanding I held myself back in the day. We have relational databases, and we have foreign-key relationships between tables. It’s easy to see why people might think relational as a qualifier derives from the existence of those relationships.

That’s right, Jonathan. The word relational in relational database actually refers to the relationship between columns in a table, and not between tables in a database.

Chris, how so? Can you back up and walk us through an explanation? Does the term relation come to us from set theory?

The term relation itself derives, at least primarily, not so much from set theory as it does from predicate logic; in fact, it’s tightly bound up with the logic notion of a predicate as such. By way of example, consider the predicate (let’s call it P) “X gave Y to Z”. Note right away in this example that X, Y, and Z can reasonably be said to be related to one another by the very fact of their being parameters to the same predicate.  

Wow! Ok. So Chris, is this ultimately where we get the term relation that is used in theoretical discussions to describe what is termed as a table in SQL implementations?

It is! Let me explain further. Substituting arguments for the parameters of a given predicate has the effect of converting that predicate into a proposition, which in logic is a statement that’s categorically either true or false. In the case of predicate P, for example, substituting Jonathan for X, The Wind in the Willows for Y, and Chris for Z yields the proposition “Jonathan gave The Wind in the Willows to Chris.”  That proposition, like all propositions, can be represented by a tuple—to be specific, the three-tuple {X Jonathan, Y The Wind in the Willows, Z Chris}. 
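An aside for readers who think in code: the step from predicate to proposition to tuple can be sketched in a few lines of Python. This is only an illustration under loose assumptions. The format string `P` stands in for the predicate, and a plain dict keyed by parameter name stands in for the tuple; neither is a faithful model of the formal definitions.

```python
# Predicate P: a statement with named parameters X, Y, and Z.
# (A format string is only a stand-in for the logical predicate.)
P = "{X} gave {Y} to {Z}."

# A three-tuple: one argument for each parameter of P.
# A dict keyed by parameter name is used because the pairing of
# name and value is what matters, not the order of the entries.
t = {"X": "Jonathan", "Y": "The Wind in the Willows", "Z": "Chris"}

# Substituting the arguments for the parameters yields a proposition.
proposition = P.format(**t)
print(proposition)

# Listing the same name/value pairs in another order gives an equal tuple.
same_t = {"Z": "Chris", "X": "Jonathan", "Y": "The Wind in the Willows"}
print(t == same_t)
```

The three values in `t` belong together only because they are arguments to the same predicate, which is the sense of "related" under discussion here.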

I see! And that tuple ends up corresponding to what SQL databases call a row. The elements within a tuple are related by virtue of their being part of the same proposition. And by extension to SQL, the column values within a given row are related because the things they represent are all part of the same proposition, namely the proposition denoted by that row. Hence the term relational database: not because of relations between tables, but because of relations between columns within a table. Am I correct so far?

Jonathan, that’s exactly right. Please let me elaborate and take you from tuple to relation. Let S be the set of all propositions that can be obtained, or derived, from P in the foregoing manner (i.e., by substituting arguments for the parameters X, Y, and Z); in other words, let S be the set of all propositions of the form “x gave y to z” such that x is a legitimate value for X, y is a legitimate value for Y, and z is a legitimate value for Z. 

Chris, aren’t we interested only in the true propositions? 

You’re getting ahead of me! Now let T be the set of all propositions in S that are true, and let F be the set of all propositions in S that are false (so S is in fact the disjoint union of T and F).  Finally, let RT be the set of all tuples corresponding to propositions in T.  Then RT is a relation—specifically, the relation representing what’s called the extension of predicate P.  
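Another aside in code: the sets S, T, F, and RT can be built explicitly for a toy universe. Everything concrete below is an assumption made for illustration: the two-person, two-book domains, the single fact taken to be true, and the choice to represent each proposition directly by its tuple. The `Gave` class is a hypothetical name, and a `NamedTuple` is ordered where a relational tuple is not, so treat this as a loose sketch.

```python
from itertools import product
from typing import NamedTuple


class Gave(NamedTuple):
    """One tuple for predicate P: "X gave Y to Z"."""
    X: str
    Y: str
    Z: str


# Assumed legitimate values for each parameter (a deliberately tiny universe).
people = ["Jonathan", "Chris"]
books = ["The Wind in the Willows", "SQL and Relational Theory"]

# S: every proposition obtainable from P by substituting arguments.
S = {Gave(x, y, z) for x, y, z in product(people, books, people)}

# Assumed state of the world: the only gift that actually happened.
facts = {Gave("Jonathan", "The Wind in the Willows", "Chris")}

T = {p for p in S if p in facts}  # the true propositions
F = S - T                         # the false propositions

# RT: the tuples corresponding to the true propositions. Because each
# proposition is represented here by its tuple, RT has the same members as T.
RT = set(T)

print(len(S), len(T), len(F))
for row in sorted(RT):
    print(row)
```

In this sketch RT plays the role of the table's contents: each member is a row, and the presence of a row asserts that the corresponding proposition is true.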

Ah! So RT is a relation, termed so because the values within each tuple are related. And it is that relationship between values in a tuple that ultimately shows up in the term relational database. Since a tuple loosely corresponds to a row, then it’s fair to say that it is the relation between columns in a row that is being referred to by the term relational database.

Jonathan, that’s essentially correct. 

Chris, thank you so much. 

Jonathan, you’re welcome! 

Postscript: Chris is most gracious in allowing me this interview piece. He wanted me to note at the end that although his answers are correct as far as they go, they’re still (and deliberately) somewhat loose. As Bertrand Russell once memorably said, "writing can be either readable or precise, but not at the same time."