Bitemporal Data

Bitemporal Data

CHAPTER BITEMPORAL DATA: PRELIMINARIES 1 All things change. Heraclitus. Objects show up, change in various ways, and eventually go away. Sometimes ...

418KB Sizes 1 Downloads 12 Views

CHAPTER

BITEMPORAL DATA: PRELIMINARIES

1 All things change. Heraclitus.

Objects show up, change in various ways, and eventually go away. Sometimes the same object shows up again. Events happen and then they’re over. An event shows up only once. Objects come into and go out of existence. While they exist, some change frenetically, some change at a leisurely pace, and some change hardly at all. Between these changes are the states of those objects. Objects are in different states at different times. The transition between two successive states of an object occurs during an event in which that object takes part. Events happen either at moments in time, or within an interval of time. But either way, events don’t change. Once they happen, they’re over, and they fall immediately into the past. The connection is this: objects change because they are affected by other objects, and events are the occasions on which objects affect and are affected by other objects. So there is only one way that events are, but there are many ways that objects are, those ways being the states of those objects. The succession of states that an object moves through over time constitutes the life history of that object. The one state of an event is all there is to its life history. The data we keep in our databases is in general data about contingent matters of fact. It’s data about the things that are of interest to us, and the interesting things that happen to them. This data is the inscription of information about those things. Transactions are what take place during events.1 If we are interested in those events, we keep a record of those transactions in a transaction table, and we keep that transaction table available for querying. From this event-centric point of view, objects are temporally enduring threads that run through a succession of events. For example, a bank account is a temporally enduring thread that runs through a succession of deposit and withdrawal events. By starting with the creation event for any object, and then tracking that object through the succession of events in which it was/is/will be involved, we can recreate any state of that object, as of any point in time. 1 There are two senses of the term “transaction”. Here, it means what takes place during an event. For example, making a bank deposit is a transaction. The other sense of the term “transaction” is its technical sense, in which it refers to inserts, updates and deletes to data in databases. The connection is that transactions in this second sense modify a database in order to record transactions in the first sense.

Bitemporal Data. DOI: http://dx.doi.org/10.1016/B978-0-12-408067-6.00001-2 © 2014 Elsevier Inc. All rights reserved.

1

2

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

That’s one way to keep track of how objects change over time. Another way is to move events to the background, and track objects directly, state by state, from their initial states to their current and/or terminal states, and sometimes on to their anticipated future states as well. From this objectcentric point of view, events are the occasions on which objects change state or cause other objects to change state. If we are interested in what happened to any object in any event in which it took part, that is simply the delta between its pre-event state and its post-event state, or else the continuance of its pre-event state through the event and out the other side. But most of our tables about objects do not track multiple states of those objects. In these object tables, each object is represented by only one row, and that one row tells us, at all times, about the current state of that object. With every update, that data is overwritten; with every deletion, that data is removed. Nor do object tables tell us anything about future states. Object tables tell us about objects, specifically about their current states. State-time tables tell us about states of objects, past, present and/or future. So in an object table, there is only one row for each object because only one state of each object is being kept track of. In a state-time table, many states of the same object may be kept track of, so there may be many rows for the same object, one for each state. Also  and this limitation applies to event tables as well as to object tables  these tables only show us what we currently believe to be true. Over time, as we learn more about the states of objects or about the events they take part in, we may be able to fill in missing information about those states or about those events, or correct data that we later discover to be in error. A series of such additions to information about a given state of an object, or to information about an event, may be worth keeping track of, and making available for querying. So too for a series of corrections. But we cannot keep either kind of series in object tables or in event tables. State-time tables are the first step in removing these limitations. With state-time tables, we need not be limited to information about only the current states of objects. We can keep data about past states as well. This data will not be locked up in a difficult-to-access logfile, or segregated from current data in an historical data warehouse. With state-time tables, we can also keep a record of anticipated future states. In both cases, that data can be just as readily available and just as easily retrievable as current state data is. Also, as we extend the concept of a state-time table, we will not be limited to the data about the current states of objects and events that we currently believe is correct. We will be able to keep track of a history of supplements and/or corrections to data we already have about the past, present or future states of objects, or about events. I will need a term that covers both objects and events. I will use the term referent for this purpose, indicating something that can be referred to. It will also be convenient to have a more colloquial term to rely on and, for that purpose, I will occasionally use the term thing. In ordinary language, the term “thing” usually designates some object that is being referred to, but it can be used in a more general sense as well. For example, in “That thing over there can be used as a hammer”, “thing” refers to an object. But in “When the new immigration law is passed, three things will happen right away”, “thing” refers to three events. So “thing” is indeed an adequate colloquialism for “referent”, since it can be used to refer to either objects or events.

NONTEMPORAL, UNITEMPORAL AND BITEMPORAL DATA

3

In summary: every object is a referent, and so is every event. Every referent is either an object or an event, and no referent is both. This important set of concepts will be developed more fully in later chapters, especially in Chapter 5.

NONTEMPORAL, UNITEMPORAL AND BITEMPORAL DATA Bitemporal data is a special kind of temporal data, one in which there are two time periods which are part of the primary key of a table. The version tables most of us have worked with at one time or another are unitemporal tables. They are tables in which there is only one time period which is part of the primary key. And just as unitemporal tables are nontemporal tables whose primary keys have been supplemented with one time period, bitemporal tables are unitemporal tables whose primary keys have been supplemented with a second time period. Figure 1.1 shows these three kinds of tables. PK non-temporal

unitemporal

data

id PK bt1 et1 id

data

PK bitemporal

bt1 et1 bt2 et2 id

data

FIGURE 1.1 Nontemporal, Unitemporal and Bitemporal Tables.

In this Figure, “bt” stands for begin time and “et” stands for end time. “id” stands for the component of a primary key which identifies the referent that the row of data refers to and describes. I will hereafter refer to it as the referent identifier, or RefId.2 In a nontemporal table, no two rows have the same RefId, since a RefId, by itself, is a unique identifier of a row in the table. This is what we want, of course, because in a nontemporal table, if two rows could designate the same referent, then since both would be about the current state of that referent, they could say contradictory things. For example, in a Customer table, one row could say that a customer’s current name is “Smith”, and another row designating that same customer could say that her current name is “Jones”. In a unitemporal or a bitemporal table, however, several rows may have the same RefId without conflicting with one another, as long as each row is about a different state of that referent, or (in the case of a bitemporal table) a different description of a state. How this works is what temporal and bitemporal data is all about. 2

Of course, there are many other ways of making temporal tables out of nontemporal ones, for example by adding only a begin time or an end time to a table’s primary key, instead of adding both. These variations are categorized and analyzed in Chapter 4 of MTRD.

4

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

NONTEMPORAL TABLES Unless otherwise noted, all tables used for examples in this book  temporal tables and nontemporal tables  are relational tables.3 As relational tables, they are relations. As relations, they are sets. Their columns are sets. The set members of those tables are the rows they contain. Since the members of a set must be distinct from one another, nontemporal relational tables must satisfy the entity integrity constraint. The DBMS enforces this constraint by requiring that every primary key value be unique within its table. To illustrate some notational conventions, four conventional tables are shown in Figure 1.2. S-Cat sc-id SC1 SC2 SC3

sc-abbr SCA SCB SCC

sc-desc tier 1 tier 2 off contract

Supplier s-id s-nm s-scfk s-type S1 Acme SC1 T4 S2 Superior SC3 T22 Part p-id P1 P2 P3

p-nbr W45 A02 C01

p-nm wheel axle chassis

p-upr $3.25 $5.50 $13.25

Supplier-Part sp-id sp-sfk sp-pfk SP1 S1 P3 SP2 S2 P1 SP3 S1 P2

FIGURE 1.2 Four Conventional Tables.

Table Names The name of each table is shown above the table. Here, there are four tables, whose names are S-Cat, Supplier, Part, and Supplier-Part. S-Cat is a table of supplier categories. Supplier-Part is an associative table which shows, for each supplier, the parts he is eligible to supply, and for each part, the suppliers who are eligible to supply it. 3

I will also frequently refer to nontemporal tables as conventional tables.

NONTEMPORAL, UNITEMPORAL AND BITEMPORAL DATA

5

Schema Rows Next comes the schema row for the table. A schema row is a list of all the columns in the table. It is sometimes called the “header” of the table. Note that it is an abstraction from the table’s full definition in the catalog of the database that contains it, or in the physical data model from which that catalog entry is generated. It is an abstraction because it doesn’t contain all the information in the table’s definition. For example, the schema row doesn’t show the datatype of each column. In the schema row, primary key columns are listed left-most, and their column headings are underlined. Foreign key column headings are italicized.

Foreign Keys and Primary Keys Foreign keys may appear anywhere in a schema row. The Supplier table contains the foreign key s-scfk to the S-Cat table. The Supplier-Part table contains two foreign keys, sp-sfk and sp-pfk to, respectively, the Supplier and Part tables.4 In many databases, there are tables which include one or more foreign keys in their primary keys. But most of the nontemporal tables in this book use a single surrogate-valued column as the primary key, and so do not have foreign keys as part of their primary keys. For example, neither foreign key in the Supplier-Part table is part of the primary key of that table. Nor do the temporal tables in this book, in general, contain foreign keys as part of their primary keys. Nothing in this book depends on this convention, however. Temporal tables, including bitemporal tables, may have foreign keys as part of their primary keys. Temporal tables, including bitemporal tables, may use multiple columns of business data instead of single surrogate-valued columns in their primary keys.

Data Model Diagrams I don’t show a data model diagram for any of the tables used as examples in this book. Instead, I use sample data tables. My examples require only a small number of tables in each case, and so the inter-table relationships should be easy to see. For example, there is a parent-child relationship from S-Cat to Supplier, and the Supplier-Part table is an associative table between the Supplier and Part tables. Another reason for not using data model diagrams is that relationship lines drawn in data model diagrams always represent a relationship in which a row containing a foreign key  the child row  is related to exactly one other row  the parent row. But this is not necessarily the case with the temporal form of the relationship, in which a child row is more commonly related to several parent rows.

Stand-Alone Representation of Rows Rows of tables are always shown in a non-proportionally-spaced font so that, when several of them are shown together, the columns of each row line up one under the other. Also, rows of tables are sometimes shown on their own, rather than as rows underneath a table name and a schema row. 4 My foreign key naming convention is to indicate the related tables with a short abbreviation, separate the abbreviations with a dash, and append “fk” to the name. Thus, for example, in the foreign key name “sp-sfk”, “sp” stands for “Supplier-Part”, the “s” following the dash stands for “Supplier”, and the suffix “fk” stands for “foreign key. I also italicize these foreign key names as they appear in the text, or else replace then with their expanded form.

6

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

In these cases, each row will be enclosed in braces, and its columns will be separated by a vertical bar. For example, the second row in the Part table in Figure 1.2 is: ½P2 jA02jaxlej$5:50

and the third row in the Supplier-Part table is [SP3|S1|P2]. Note that these stand-alone rows may be either set apart on their own line of text, or included in-line as part of the narrative. Note also that the underlining and italicizing conventions used for schema rows are carried over to the data rows themselves when those data rows are shown in stand-alone mode or in-line with the text.

UNITEMPORAL TABLES As well as conventional tables, there are also temporal tables. They are either unitemporal or bitemporal, containing either a single time period in their primary keys or containing two time periods in their primary keys. One of the two temporal dimensions of bitemporal data is called valid time in the computer science literature. I used the term “effective time” for this temporal dimension in my first book, MTRD. In this book, I will use the term “state time” because it seems to me to be the least misleading term to apply to this concept. And this leads to my choice of the column headings sbeg and send to stand for, respectively, state begin time and state end time.5,6 All time periods used in the examples in this book are shown as a begin month and an end month. Each date is shown as a three-character month abbreviation together with two numerals representing a year. For example, “Nov15” represents the month of November 2015. Figure 1.3 illustrates how time periods are shown in these tables. The state-time periods of these tables are shown as a begin time and an end time. So the time period for the second row in the Supplier-Part-S table, for example, extends from October 2014 up to January 2015. I use a “from” and “up to” terminology to emphasize that October 2014 is part of that time period, but that January 2015 is not, and is, rather, the first month after the last month of that time period. This way of representing time periods by means of a begin time and end time, in which the begin time is part of the time period but the end time is the next clock tick after the end of the time period, is called the closed/open convention for representing time periods by saying when they begin and when they end.7 5

All column names (not just foreign key names) are represented in a proportional font in the text, or else replaced with the expanded version of those names. Thus “sbeg” and “state begin time” are equivalent expressions. 6 I regret contributing to the profusion of terminology surrounding bitemporal data, a profusion, however, to which the computer science community, DBMS vendors, and standards committees have also contributed. I use a terminology of my own choosing because I think that terminology is important, that inappropriate terminology can persistently mislead those who use it, and that appropriate terminology can greatly enhance the clarity of an explanation. In fact, I originally wrote most of this book using the terminology standardized in the ISO 9075:2011 SQL standard, and only late in the project went back and substituted my own terms. The fact that there is now a de jure standard terminology, and that I have decided not to use it, indicates the importance I attach to terminology. I will, of course, describe a mapping among the different terminologies, which include the standard computer science terminology, DBMS vendor terminology, and the terminology used in the temporal SQL standards. For a summary of those mappings, see the end of this chapter. For details, see Chapter 2. 7 Another way of representing time periods, for example, would be to say when they begin, and how long they last.

NONTEMPORAL, UNITEMPORAL AND BITEMPORAL DATA

S-Cat-S sbeg send Jan14 9999 Jun14 9999 May14 Jun15

sc-id SC1 SC2 SC3

Supplier-S sbeg send Feb14 9999 Aug14 Mar15 Mar15 Nov15

s-id S1 S2 S2

s-nm Acme Superior Superior

Part-S sbeg send May14 9999 Jul14 Mar16 Mar16 9999 Jun15 9999

p-id P1 P2 P2 P3

p-nbr W45 A02 A02 C01

Supplier-Part-S sbeg send sp-id Sep15 9999 SP1 Oct14 Jan15 SP2 Jan15 9999 SP3

sc-abbr SCA SCB NUC

sc-desc tier 1 tier 2 off contract

s-scfk SC1 SC3 SC3

p-nm wheel axle axle chassis

sp-sfk S1 S2 S1

7

s-type T4 T22 T15

p-upr $3.25 $5.50 $6.25 $13.25

sp-pfk P3 P1 P2

FIGURE 1.3 Four State-Time Tables.

In reality, most time periods for real data will be specified at the level of dates or timestamps. The level at which time periods are specified is called the granularity of those time periods. I have chosen a granularity of months simply because it takes fewer characters to represent time periods this way, and this makes it easier to show each row of a table on one line across a page. I use the special string 9999 to represent the latest point in time, at the chosen level of granularity, that a specific DBMS can represent. For example, in SQL Server, and for our level of granularity, that is December of the year 9999. For a date datatype, it is 12/31/9999. Time periods that end on the latest representable point in time are said to be open time periods, and all others closed time periods. The semantics of an open time period is that it is one whose end point is unknown.8 This unknown end point could be represented by a null. But for reasons that I will present later, most implementations of time periods use the highest value a specific DBMS can represent, at the chosen level of granularity, for these time periods, instead of a null. As far as the DBMS is concerned, a 8 An open time period could also be one whose begin point is unknown. However, it could not be one in which both begin and end points are unknown.

8

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

time period ending on December 9999 is a time period that ends just under eight-thousand years from now. But as far as users of a database are concerned, it is a time period which does not yet have a determinate end point in time. In Figure 1.3, begin and end times are underlined to show that they are part of the primary keys of their tables. Temporal columns which are not primary key or candidate key columns, as far as this book is concerned, are just like any other non-key columns. All non-key temporal columns describe something about whatever it is that each row represents. But primary and candidate key temporal columns do something quite different. They change what it is that each row represents. For example, each row in the Supplier table of Figure 1.2 refers to a supplier. No two rows in that table refer to the same supplier. But in the Supplier-S table of Figure 1.3, two rows both refer to supplier S2. What distinguishes those two rows is their non-overlapping state-time periods. Each row refers to a different state of S2. In state-time tables, it is not referents that each row refers to. It is states of those referents. I shall also say that each row represents a timeslice from an episode in the life history of a referent.

Table Names In Figure 1.3, there are four tables, whose names are S-Cat-S, Supplier-S, Part-S and Supplier-Part-S. “S” is a suffix standing for “state time”. These state-time tables are all unitemporal tables, utilizing only one of the two kinds of bitemporal time in their primary keys. For now, we can think of a unitemporal table as one that is created from a nontemporal table by adding one of the two temporal dimension time periods to the primary key of the nontemporal table.

Schema Rows As shown in Figure 1.3, a unitemporal table contains both the begin time and end time of a time period as part of its primary key. But as I said before, that isn’t necessary. A table is a state-time table if its rows represent state-time intervals in the life history of something that can exist over time, and can change while remaining that same thing  something such as a supplier, a supplier category, a part, or a relationship between a supplier and a part. Whatever column or columns function, in a state-time or bitemporal table, as a unique identifier of a state-time segment in an episode in the life history of a referent, I will call the state identifier (StId) of that table.

Temporal Foreign Keys and Temporal Primary Keys A temporal primary key is a primary key of a temporal table in which the one or two time periods associated with the table are included in the primary key.9 A temporal foreign key is a foreign key which points to a parent row (or rows!) that exists in a temporal table. For example, the s-scfk column of Supplier-S, and the sp-sfk and sp-pfk columns of Supplier-Part-S are not conventional foreign keys; they are temporal foreign keys. They are temporal foreign keys because the tables they reference  S-Cat-S for s-scfk, and Supplier-S and Part-S, respectively, for sp-sfk and sp-pfk  are temporal tables. 9 For the most part, I will use temporal primary keys in my sample tables, because this emphasizes the point that they are the unique identifiers of those tables, and also because this helps to fit one row of data on one printed line. However, my preferred implementation would be to use a single surrogate-valued primary key, and change temporal primary keys into non-primary-key temporal unique identifiers.

NONTEMPORAL, UNITEMPORAL AND BITEMPORAL DATA

9

Notice that s-scfk points to the RefId of the S-Cat-S table, and not to the full primary key of that table. Similarly, sp-sfk points to the RefId of the Supplier-S table, and sp-pfk points to the RefId of the Part-S table. Since there are two rows for the Superior company in the Supplier-S table, the sp-sfk temporal foreign key value of “S2” does not uniquely identify one parent row. By the same token, the sp-pfk temporal foreign key value of “P2” does not uniquely identify one parent row. It is only because RefIds are the sole components of the primary keys of nontemporal tables that nontemporal foreign keys uniquely identify single parent rows. Temporal and nontemporal foreign keys are both RefId pointers. The temporally-extended constraints of entity integrity and referential integrity  which I call, respectively, temporal entity integrity and temporal referential integrity, apply to these temporal tables. If they did not, then temporal tables would not be relational tables, as I have said they are. As we will see later on, it is the temporal form of these constraints, not the conventional form, which is basic. Conventional entity integrity and conventional referential integrity are merely special cases of temporal entity integrity and temporal referential integrity.

Stand-Alone Representation of Temporal Rows Sometimes rows of temporal tables are shown on their own rather than as rows underneath a table name and a schema row. In these cases, as with rows of nontemporal tables, each row is enclosed in braces. For example, the second row in the Part-S table in Figure 1.3 is: [Jul14|Mar16|P2|A02|axle|$5.50]

and the third row in the Supplier-Part-S table is [Jan15|9999|SP3|S1|P2]. Note that these stand-alone rows may be either set apart on their own line of text, or included in-line as part of the narrative.

A Note on Assertion-Time Tables State-time tables are one kind of unitemporal table. In the computer science literature, the term “transaction-time table” designates another kind of unitemporal table. But the term “transaction time”, I believe, has and continues to be misleading. As I will explain later, that term is a homonym. As designating a physical feature of data, it means one thing. As designating the semantics supported by that physical feature, it means something different. The best term for the semantics which this kind of time expresses is assertion, because these semantics are that these time periods mark the time during which a row of data is asserted to make a true statement. And so I will use the terms assertion begin time and assertion end time to mark the delimiters of these assertion time periods, and will use the abbreviations abeg and aend as the column names for these delimiters. I will also use “A” as the table-name suffix for these assertion-time tables. Except for those changes, these tables use the same names, the same schema row conventions, and the same conventions for stand-alone representations as do statetime tables. A more detailed discussion of the difference between the physical and semantic senses of “transaction time” will be found in Chapter 14. The semantics of assertion time, as I will explain

10

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

there, are an extension of the semantics of transaction time.10 Assertion time is a proper superset of transaction time, both in its semantics and in its implementation. However, since those extensions will not be discussed until Chapter 14, the change in terminology, until then, is nothing more than that  a change in terminology. One important difference between state time and (the pre-Chapter 14 concept of) assertion time is that only state time may be specified on temporal transactions. Furthermore, an insert, update or delete temporal transaction may specify a state-time period that ends in the past or begins in the future, and therefore is not restricted to state-time periods that include the current moment in time. As for assertion time, all new rows added to a table by temporal transactions are assigned their assertion-time periods by the DBMS. This is the time period during which a row has the status of representing a statement which its authors assert to be true. Until Chapter 14, we will assume that this time period is identical to the time period delimited by the moment in time on which the transaction creating the row completes, to the moment in time at which another transaction (logically) deletes that row. In Chapter 14, I will discuss how and why assertion time can be specified on temporal transactions, and how and why temporal transactions can add rows to temporal tables with a transaction-time period that begins and/or ends in the future. Clearly, if I kept the term “transaction time”, the very term future transaction time would be an oxymoron. But a point about terminology is not an argument. For data that carries useful information, and that can be managed so as to preserve a consistent semantics, an oxymoronic label isn’t an argument against it. It’s an indication that the label was ill-chosen.

BITEMPORAL TABLES Just as unitemporal tables are nontemporal tables with a time period added to their primary keys, bitemporal tables are unitemporal tables with a second time period added to their primary keys. As indicated above, the second of the two temporal dimensions of bitemporal data is called transaction time in the computer science literature. I used the term assertion time for an extended form of this temporal dimension in MTRD, and will continue to use that term in this book.

Table Names In Figure 1.4, there are four tables, whose names are S-Cat-B, Supplier-B, Part-B and SupplierPart-B. “B” is a suffix standing for “bitemporal time”.

Temporal Foreign Keys and Temporal Primary Keys Like temporal primary keys in unitemporal tables, temporal primary keys in bitemporal tables contain a RefId. To be more precise, RefIds are components in non-surrogate candidate keys. For like any table, a temporal table may have a surrogate key as its primary key, and may indeed have a single-column surrogate key as its primary key. But for the most part, I will ignore this issue, and speak about temporal primary keys including RefIds. Most of the temporal 10

I will use the term standard theory to refer to the theory of bitemporal data developed by computer scientists, implemented in several major DBMSs, and standardized in the ISO and TSQL2 standards. The extension to that theory, which I will describe in Chapter 14, I will refer to as the Asserted Versioning theory of bitemporal data. That is the theory which I originally developed with Randy Weis, the co-author of MTRD.

SEMANTICS AND ITS IMPLEMENTATIONS

S-Cat-B abeg aend Jan14 9999 Jun14 9999 May14 9999

sbeg Jan14 Jun14 May14

send 9999 9999 Jun15

sc-id SC1 SC2 SC3

Supplier-B abeg aend Feb14 9999 Aug14 9999 Mar15 9999

sbeg Feb14 Aug14 Mar15

send 9999 Mar15 Nov15

s-id S1 S2 S2

s-nm Acme Superior Superior

Part-B abeg aend May14 9999 Mar16 9999 Mar16 9999 Jun15 9999

sbeg May14 Jul14 Mar16 Jun15

send 9999 Mar16 9999 9999

p-id P1 P2 P2 P3

p-nbr W45 A02 A02 C01

Supplier-Part-B abeg aend sbeg Sep15 9999 Sep15 Oct14 9999 Oct14 Jan15 9999 Jan15

send 9999 Jan15 9999

sp-id SP1 SP2 SP3

sc-abbr SCA SCB NUC

sc-desc tier 1 tier 2 off contract

s-scfk SC1 SC3 SC3

p-nm wheel axle axle chassis

sp-sfk S1 S2 S1

11

s-type T4 T22 T15

p-upr $3.25 $5.50 $6.25 $13.25

sp-pfk P3 P1 P2

FIGURE 1.4 Four Bitemporal Tables.

tables used in the examples in this book will consist of a RefId plus additional columns to represent time periods.

SEMANTICS AND ITS IMPLEMENTATIONS Data is persisted by means of database tables or software-specific file structures. Data is transformed by means of algorithms which exist as program code, macros, or scripts. Data is moved by means of messages, or by the physical movement of persisted copies of that data. Data is presented by means of screen displays, reports, and query result sets, or in various non-character-based forms such as graphs, charts and icons. Throughout all these activities, the data itself is something physical. And yet these physical things carry information, and these physical activities preserve that information. In creating data, we encode information in a physical form. In updating data, we preserve that information.

12

CHAPTER 1 BITEMPORAL DATA: PRELIMINARIES

In consuming data, we decode it. That is, we understand it, and in understanding it, we recover the information it contains. All these things we do to and with data  creating it, persisting it, transforming it, presenting it, moving it, archiving it, and so on  are rule-governed. The rules align the data with what the data is about. The person creating the data follows those rules, and the person consuming that data understands those rules, and operates on the assumption that the person creating the data also understands those rules, and has followed those rules in creating that data. This is how data has meaning, how it is something more than vibrations in the air, marks on a page, or strings of zeroes and ones in a computer’s storage. The physical structures and instances that we manage, and the transformations we apply to those instances, constitute the syntax of data management. The rules by virtue of which data conveys information constitute the semantics of that data. Most books on information management focus on data, on algorithms, and on the technology by which data is created, persisted, transformed, presented, moved, and archived. That technology is the infrastructure; that data and those algorithms are the syntax. Semantics exists in the background, and little is said about it. Perhaps this is because the semantics of data seem so obvious. A very simple set of transformations, for example, is this: • • •

At some point in time, a row is inserted into a database table. At various later points in time, one or more columns of that row are updated. At a yet later point in time, that row may be deleted from that database table.

The semantics implemented by these transformations is the semantics of keeping track of what happens to things. For example, one of the objects any company wants to keep track of is its employees. At some point in time, someone is hired as an employee. At later points in time, important things about that employee may change, such as the department she works in. And at a yet later point in time, that person may leave the company and cease being an employee. Semantics are the rules for managing data so that the data provides us with information. In this example, one rule is that when a person is hired as an employee, a row is added to an Employee table. Another rule is that when something important about the employee changes, the row is updated to reflect that change. A third rule is that when the employee leaves the company, that row is deleted from the Employee table. The real semantics of real data will be more detailed than these very basic correlations, of course. For example, there will be rules which associate each column of the Employee table with a property of an employee, or with a relationship an employee has with someone or something else. I introduced the distinction between semantics and syntax in MTRD, and used it to structure the extensive Glossary in that book. Because data is the means by which we store information in databases, semantics and syntax run in parallel. Syntax is about the data. Semantics is about the information it carries. The constraints imposed on introducing data into a database, altering data in a database, or removing data from a database, exist so that those transformations will reflect what is happening to what that data is about. In doing so, the syntax of those transformations preserve the semantics of that data. The constraints which every valid database state must conform to  such as entity integrity, referential integrity, domain integrity, and application-specific business rules  exist so that the data in the database will accurately and intelligibly describe what that data is about to those who query the database.

GLOSSARY LIST

13

For any physical structures or processes dealing with data in databases, it should be possible to state the semantics supported by those structures or those processes. This is because the whole reason for having those structures and processes is to persist information about the objects we are interested in, and to make that information available on demand. Conversely, for any semantic rule, it should be possible to describe the physical structures and processes that implement that rule. For example, the rule that at no time may a database contain contradictory statements is implemented by enforcing the relational constraint of entity integrity. That is what entity integrity is  a mechanism for keeping contradictory statements out of a database by permitting only one statement at a time in the database about any object, or any state of an object, represented in that database. The absence of contradictory statements in a database is a universal semantic requirement.

ASIDE Many readers will be familiar with the standard terminology of “transaction time” and “valid time”, especially those working with the releases of DB2 or of Teradata that support bitemporal data. For them, I note that the standard term “valid time” and my term “state time” are synonymous. Throughout this book, “state time” may be read as “valid time”, as much as I advise against using the latter term. As for the standard term “transaction time”, it is only in Chapter 14 and in a few places downstream from that chapter, that transaction time and assertion time differ. So except for those places, “assertion time” may be read as “transaction time”, albeit with the same caveat. For those who read my earlier book Managing Time in Relational Databases, or who wish to consult it as they read this book, the terminological mappings are quite simple. Assertion time in this book is assertion time in that book. State time in this book is effective time in that book. Other terminological differences are insignificant.

GLOSSARY LIST assertion time assertion assertion-time period assertion-time table bitemporal table clock tick closed time period closed/open conventional table episode event table event future transaction time granularity information inscription

interval life history object table object open time period referent identifier referent schema row standard theory state identifier state time state state-time period state-time table temporal entity integrity

temporal foreign key temporal primary key temporal referential integrity temporal transaction temporal unique identifier thing time period timeslice transaction table transaction time transaction unitemporal table valid time