Extending relational algebra to manipulate temporal data

Extending relational algebra to manipulate temporal data

In/arm. Sysrems Vol. 13, No. 3, pp. 289-296, 1988 0306-4379/88 $3.00+ 0.00 Printed in Great Britain. All rights reserved Copyright c EXTENDING REL...

814KB Sizes 0 Downloads 14 Views

In/arm. Sysrems Vol. 13, No. 3, pp. 289-296, 1988

0306-4379/88 $3.00+ 0.00

Printed in Great Britain. All rights reserved

Copyright c

EXTENDING RELATIONAL ALGEBRA MANIPULATE TEMPORAL DATA

1988 Pergamon Press plc

TO

NIKOS A. LORENTZOS and ROGER G. JOHNSON Department of Computer Science, Birkbeck College, London University, Malet Street, London WClE 7HX, U.K.

(Received

17 September

1987; in revised form 30 March 1988)

Abstract--Some weaknesses of relational algebra when manipulating temporal data are identified. An extension of this algebra is described, which helps overcome the problems. The proposed model is simple in structure, powerful and its operational capabilities are readily understandable by the user. In addition, an experimental implementat& is describeh. Keywords:

Temporal data, temporal database.

1. INTRODUCTION In a conventional database only the most recent snapshot of the real world is recorded, which is usually referred to as the current state. This state is not steady, since the contents of the database is subject to change, whenever a real world event is recorded. New pieces of data are added to the database and existing items are deleted or updated. In this paper such a database is called a Current-time Database (CDB). In a Temporal Database (TDB) past and future states, in addition to the current one, are also recorded. Its data is called temporal and it is “stamped” with the time during which it is valid. In the relational model, an effective method for manipulating data is by the use of Relational Algebra. However, certain weaknesses of Relational Algebra are identified, in this paper, when it is applied to temporal relations. For the purposes of this paper, the conventional relational algebra is called Current-time Relational Algebra (CRA). An extension of it, to manipulate temporal data, is called a Temporal Relational Algebra (TRA). This paper presents a formalisation of a TRA model, introduced by the authors in [l]. It is part of a research programme on temporal data management. In Section 2, some weaknesses of CRA, when manipulating temporal data are outlined. The proposed model is described in Section 3. Section 4 contains a series of examples, which demonstrate its expressive power. In Section 5, an experimental implementation is described. Some conclusions are drawn in the last section.

Fig. 1, the salary history of a group of employees is recorded. In this relation, data is “stamped” with time-points. Thus a new tuple is appended to it whenever a salary increase occurs. The first two tuples show that John’s salary was 10 k on date d2 and it became 11 k on date d8. Modelling temporal data, in a way similar to this, has been previously proposed by Bubenko in [2] and Kowalski in [3], though not in the context of defining of TRA. However, the method for retrieving data on a date which is not explicitly recorded in a relation, is not fully explained in these papers. To overcome this problem, Clifford, in [4], suggests the use of interpolation functions but he has to introduce three types of null values. Two of these are shown in Fig. 1. The value NULL, in the fifth tuple, is used with its most common interpretation, “currently, it is unknown what George’s salary was, from date d6 up to date d8”. The value NULLI, in the last tuple, has another interpretation, that “George ceases being paid on date d15”, probably because he resigned. Finally, a third type of null is used to show that a value is unknown for all points in time. In [4, 51, Clifford proposes a Historical Relational Algebra. In his approach, first normal form is not maintained. In addition, relations, in which many to many relations are recorded, become unduly complicated. Finally, a distinction is necessary between the ordinary SALARY

2. PROBLEMS IN MANIPULATING TEMPORAL DATA Some problems with the manipulation of temporal data using CRA, are now shown. In the example, in

Fig. 1. Stamping temporal data with time-points. 289

NIKOS A. L~RENTZOS

290

Fig. 2. Stamping

temporal

data

with time-intervals.

relations and those ones, which incorporate interpolation functions. Figure 2 shows the dual of the relation shown in Fig. 1. That is, the relation, in Fig. 2, contains the same data as the relation, in Fig. 1. The difference is that, in Fig. 2, temporal data is “stamped” with the time-interval during which it is valid. The attributes Fdate and Tdate denote intervals of time, of the form [Fdate, Tdate), during which temporal data is valid. For presentation reasons, attributes forming intervals are shown, in this figure, by omitting a vertical line, partitioning the values of the tuples, for these attributes. In the third tuple, “Tdate = @” indicates that John’s salary continues at 12 k, until the relation is updated. This notation is fully explained, in Section 3.1. Some examples will highlight the weaknesses of CRA, for manipulating temporal data “stamped” with time-intervals. (1) If the UNION operation is applied, directly, to append the tuple (george, 8 k, d5, dl0) to relation SALARY, in Fig. 2, then duplication of data will occur for certain dates. This has undesirable side effects when data is subsequently deleted, retrieved or aggregate functions are applied. An effective solution, however, should combine the appended tuple with the fourth and fifth one, into a single one, (george, 8k, d3, dll). (2) Assume that John was not employed during the period [d4, d6). Then the information (john, 10 k, d4, d6) has to be deleted. It is observed that this information is not explicitly recorded in a tuple, in relation SALARY. Therefore, if the set-difference operation of CRA is applied to SALARY and another relation, which contains this tuple, the contents of relation SALARY will not be changed. However, the effect of the operation should be that the first tuple, in SALARY, be replaced by two tuples, (john, 10 k, d2, d4) and (john, 10 k, d6, d8). (3) Figure 3 shows that there are 13 distinct ways in which two intervals can be placed relative to each other. Consequently, it is not immediately obvious how a simple query such as “Retrieve the salaries of all the employees during the interval [d4, d 13)” should be expressed, in CRA. In more complicated queries, the difficulties become greater. Approaches to the definition of a TRA, which overcomes the above problems, are reported by Gadia [6], Ahn [7], Tansel [8] and McKenzie and Snodgrass [9]. None of them preserve first normal

and R. G. JOHNSON

form. An evaluation of various proposals for a temporal extension of the relational model are reported in [lo]. In addition, an extension of SQL is defined by Navathe in [1 11. No implementation is reported in these papers. One implementation to manipulate temporal data, “stamped” with timeintervals, is reported by Snodgrass in [12]. However, a formal definition of a TRA is not given in this paper. The authors, at this point, would make two general remarks concerning the implementation of a TRA. (1) Previous implementation efforts lack an underlying formalisation and, as a result, they provide partial, often application dependent, solutions. (2) Approaches to the definition of a TRA do not relate it to the implementation problem. They are all extensions of Non-First Normal Form (NF’). Relational Algebra, [13], and they all exhibit some degree of complexity. No description has yet been published of any implementation of any of them. In contrast, the authors claim that the model proposed in this paper, is simple, since it preserves first normal form and a successful experimental implementation has been developed without difficulty. In addition, the model allows the user to have a dual perception of time. That is, the user may choose, at will, to see time either as a time-point or as a time-interval. To the authors’ knowledge, it is the only model which has this flexibility. 3. DESCRIPTION

OF THE MODEL

3.1 Time domains Time can be measured with a wide range of precisions. For instance, historical events are usually associated only with a year while a railway timetable is measured in minutes. Consider, as an example, the use of day, month and year to form dates. Dates form one “time” data type. They also constitute a totally ordered set. Because of the computer’s physical representation limits, only a finite number of dates can be represented in a database. This does not introduce any problems since, practically, users are not interested in recording data which is valid with respect to the infinite past or future. Therefore, for the purposes of this work, the following definition is given.

Fig. 3. Relative

positions

of two I-dimensional

intervals.

291

Extending relational algebra A Time Domain (TD) is a non-empty, finite, totally ordered set of consecutive elements of the same time type. For instance,

HOUR:MINUTE:SECOND = {OO:OO:OO,.. . , 23:59:59} are TDs. The elements of a TD, D, are called D-points. The implementation of various TDs is reported in [14]. Let D =(d,, d2,..., d,,}, d,
where d, E D and d, E D U { @}. The points d,, d, are called the boundary points of this interval or the start and stop points, respectively. Given that, by definition, an interval is open to the right, it becomes obvious that the element al, E D, as specified above, allows the representation in a database of one interval (d,, @?)={d,,...,d,},

l,
d,,, the greatest

point of a TD D.

3.2 Functions

. ,d,} be a TD and

N “+I = {1,2,.

, n + 1).

Also let N={O, 1,2 ,... }. Then, because of the above properties, some functions, which can be defined, are the following: ord,:

D U { @I~} + N,+, ,

ord,(d,)

= i.

This is a one to one onto function and its inverse is denoted by iord,. As shown in Section 4, the functions ord,,, ord,, and iord,, allow the conversion of one interval over a TD D, into an interval over another TD, D,. succ,:DU{@,}

x N-*DU{@,}, succ,(di,

pred,:DU{@:,j

succ,(di, k) = iord,(ord,(d,)

+ k),

k) = d;+k

x N--+DU{@,}, pred,&

The general format of a Relational Algebra in this paper, is R3 = adopted, operation, OPERATION-NAME [Expression] (R 1) or R3 = OPERATION-NAME [Expression] (R 1, R2), where OPERATION-NAME is any of the primitive operations UNION, DIFF (set difference), SELECT, PROJECT, CP (Cartesian product) or the nonprimitive operation JOIN “Expression” is optional. It may be either a selection formula or attribute names on which a projection occurs. R 1 (R 1 and R2) are the relation (relations) on which the OPERATION-NAME is applied and R3 is the resulting relation. One additional operation, COMPUTE, is also used, SO that functions can be incorporated in the model. For example, the statement R 1 = COMPUTE

V,,, d,)={d,ld,ED,dp~d,
Let D = {d,,

for example

3.3 Relational algebra operations

= (1, 2,. . . , 24},

which contains

functions,

and they are not total.

DATE = (01.01.87, . . . , 31.12.87}, HOUR

They are composite

k) = d,_/,

These allow the user to refer to a point which is k places to the right or to the left of di, respectively.

[Namount = Amount

where SALARY relation

is the relation in Fig. 1, results in the

Rl (Name, a sample

+ 2 k] (SALARY),

Amount,

Date, Namount),

tuple of which is Cjohn, 10 k, d2, 12 k).

An operation, of this type, is provided by all practical relational DBMSs. Some authors use “RENAME” as another operation, which assigns new names to the attributes of a relation, so that ambiguities can be avoided, if two relations, having attributes with common names, are JOINed. In this paper, the symbol “t” is used, for this purpose. In the previous example, the statement R 1 = COMPUTE

[Namount Employee

= Amount

+ 2 k,

+- Name] (SALARY)

gives rise to the relation R l(Employee,

Amount,

Date, Namount).

To assist the user manipulate temporal data, three additional operations are defined. 3.3.1 The FOLD operation. The FOLD operation converts a relation in which temporal data is “stamped” with time-points to an equivalent relation, in which the same data is “stamped” with timeintervals. As an example, for the relation CJSALARY, in Fig. 4, the operation SALARY

= FOLD

[Sdate,

Fdate, Tdate] (USALARY),

results

in the relation

shown

in Fig. 2

292

NIKOS A. LORENTZOS and

JSAL

:Y

;datc YE

Name john john john

d3 d4

Amount 1Ok 1Ok 10k

R. G.

JOHNSON iSAL

.Y

idalc -E

Name john john john

d3 d4

john john john john john john john

10; Ilk Ilk Ilk Ilk 12k

12k

di d8 d8 d8 d8 d12 d12

dti d3 d4 d5 d8 d9 d10 dll

john

gmrge

12i 8k 8k 8k 8k 8k 8k 9k

6 d3 d4 d5 d8 d9 d10 dll

john george gwrge george george george george george

12; 8k 8k 8k 8k 8k 8k 9k

dli d3 d3 d3 d8 d8 d8 dll

@i d6 d6 d6 dll dll dll d15

george

9k

- did

george

9;

dl;

dl;

gmw gmge george gemge george george

Fig. 5. An extended

= D,, . . . , A,=D,,

Q =D)

then the scheme of R is ,,...,

FQ=D,TQ=D)

A,=D,,

and its contents is R = ((a,, . . . , a,, d,, d&l (a,, . . ,anr 4) E UR,

d, < di < d,,

dPml)pUR,

@,,...,a,,

d,)eUR).

Two further operations are now defined, which are refinements of those originally introduced by the authors in [l]. 3.3.2 The EXTEND operation. This operation extracts the time-points of a time-interval. Specifically, let ,,...,

A, = D,, FQ = D, TQ = D),

be a folded relation, where FQ and TQ are used to record intervals of the form [FQ, TQ).

If ER = EXTEND[SQ,

FQ,

TQIWI

FQ=D,

and its contents is ER = {M, aI, . . , a,, d,, d,)l (at,. ..,a,,d,,d,)~R,d~~d,
TQ=D),

relation.

where SALAKY is the relation, in Fig. 2. 3.3.3 The UNFOLD operation. This operation converts a temporal relation whose data is “stamped” with time-intervals to an equivalent one, in which data is “stamped” with time-points. Specifically, let A,=D,,

,,...,

FQ=D,TQ=D),

where FQ, TQ are used to record intervals of the form

[FQ, TQ). If UR = UNFOLD

[SQ, FQ, TQ] (R),

then the scheme of UR is = D, A, = D,, . . . , A, = D,),

UR(SQ

and its contents is UR = {(d,, a,, . . . , anIl (a,, . . .,an,d,,d~)ER,dp~d,
operation,

since

[SQ, FQ, TQJ (R) [on all attributes

except for FQ, TQ] o EXTEND

. . , A, = D,,

J

ESALARY = EXTEND [Sdate, Fdate, Tdate] (SALARY),

then the scheme of ER is = D, A, = D,,

@@

ER is called an extended relation with respect to the attributes FQ and TQ. Figure 5 shows

R(A,=D

UR is called an unfolded relation, with respect to Q having unfolded tuples whereas R is a folded relation, with respect to FQ, TQ, having folded tuples.

ER(SQ

di dl2 dl2 dl2 d12

d; d8 d9 d10 dll dl2 d13

be a relation and let D, the underlying domain of Q, be a TD. If R = FOLD[Q, FQ, TQ] (UR)

R(A,=D

d8 d8 d8

1oi Ilk Ilk Ilk Ilk 12k 12k

Generally, let

(a ,,...,a”,

d2 d2 d2

john john john john john john john

Fig. 4. An unfolded relation.

R(A,=D

1Ok 1Ok 1Ok

di d8 d9 d10 dll d12 d13

- dli

UR(A,

FLY&Jcz

Amount

[SQ, FQ,TQI (RI,

where “0” stands for composition Figure 4 shows

of operations.

USALARY = UNFOLD [Sdate, Fdate, Tdate] (SALARY), where SALARY is in Fig. 2.

Extending

relational

It should be noted that the UNFOLD and FOLD operations are not the same as UNNEST and NEST of NF* relational algebra, defined in [13]. As an example FOLD results in an interval, interpreted as a set of consecutive time-points, whereas NEST results in a set, which is the value of a tuple for a set valued attribute. 4. TEMPORAL

DATA MANIPULATION

A series of examples are now given, to show the expressive power of the model. They have been chosen to show its rich capabilities, in comparison to other proposals. In some of these examples, historical data is updated for correction purposes. Updating such data is not otherwise permitted. Examples 1 and 2 refer to the relation SALARY, in Fig. 2. Example I: “Update SALARY to record that George’s salary was 8 k, during [d5, dl0)“. If R is a relation with one tuple, (george,

8 k, d5, dlO),

the steps are: R 1 = UNFOLD

[Sdate, Fdate,

R2 = UNFOLD

[Sdate,

R3 = UNION R4 = FOLD

(Rl, [Sdate,

Fdate,

Tdate] (SALARY), Tdate] (R),

R2), Fdate,

Tdate] (R3).

R4 contains the tuple (george, 8 k, d3, dll), in place of the fourth and fifth tuple of the SALARY relation. It is observed that appending data, to a folded relation, may result in a decrease of the number of its tuples. In addition, it is observed that this tuple, of R4, was not an explicitly recorded tuple, in any of the relations, from which R4 was derived. Example 2: “John did not work during the period (d4, d6). Update SALARY to show that he was not paid during this period”. Assume that a relation R, set-difference compatible to SALARY, contains exactly one tuple (john, 10 k, d4, d6). This update is carried out as follows. R 1 = UNFOLD

[Sdate, Fdate,

R2 = UNFOLD

[Sdate,

R3 = DIFF R4 = FOLD

(Rl,

Fdate,

Tdate] (SALARY), Tdate] (R),

R2),

[Sdate,

Fdate,

Tdate] (R3).

R4 has the same tuples as SALARY, except that the tuple Cjohn, 10 k, d2, d8) has been replaced, in R4, by the tuples (john, 10 k, d2, d4), Cjohn, 10 k, d6, d8). It is observed that a deletion, in a folded relation, may result in an increase in the number of its tuples, something that can never occur in a CDB. An updating of a folded relation is achieved by a deletion, followed by an append operation, as has been shown in the above examples. Furthermore,

algebra

293

since “@” can be a date in the future, the reader will observe that past, present and future temporal data is inserted, deleted or updated, in a uniform way. In [6] such a uniformity is not achieved and, in [8], future data is not modelled. It should be noted that d,, the greatest time-point of a time domain, need not be a fixed point. In a DBMS, supporting a temporal database, d, can be declared as d, = current-date + 365. In this way, it is possible to record, in a folded relation, temporal data, which will be in effect, during a period [d,, d,). in the future, provided that d, < d,, ,. Example 3: the Cartesian product of two folded relations is a relation, in which the two pairs of attributes, on which time-intervals are recorded, form a 2-dimensional time-interval. To the authors’ knowledge, the importance of this fact has not been previously discussed. The resulting relation is a twice folded one. The following example shows that 2-dimensional time-intervals can be used for the recording of periodic events, in a temporal database. Figure 6 shows a user defined twice-folded relation. Fdate and Z’date form an interval over the TD DATE and Fhour, Thour form another interval, over the TD HOUR. Its unique tuple shows that John works as a guard during the hours h 8-h I2 (h 12 excluded), for every day in the interval [d5, d9). Let a user operation be “Update the GUARDS relation to record that John does not work as a guard during the hours h 10-h 12, in the interval [d7, d9)“. If F is another relation, set-difference compatible with GUARDS, containing the tuple (john, (i7, d9, hl0, h12), the steps are the following: R 1 = UNFOLD

[Sdate,

Fdate,

R2 = UNFOLD

[Shour,

Fhour,

R3 = UNFOLD

[Sdate,

Fdate,

R4 = UNFOLD

[Shour,

Fhour,

R5 = DIFF

Tdate] (GUARDS), Thour]

(RI),

Tdate] (F), Thour]

(R3).

(R2, R4),

R6 = FOLD

[Sdate,

Fdate,

S = FOLD

[Shour,

Fhour,

Tdate] (R5), Thour]

(R6).

S is shown in Fig. 7(a). If R5 is folded firstly on hours and then on dates then relation T, in Fig. 7(b), is derived. S and r, though not equal, are equivalent, in the sense that every piece of information which can be derived from the former is also derivable from the latter and vice versa. Furthermore, a sequence of unfoldings followed by a proper sequence of foldings, leads from either of them to the other. It should be noted, however, that different semantics are associated with

GUARDS Name 1 Fdate john I d5

1 Tdate 1 Fhour1 Thour d91

Fig. 6. A twice folded

h8 relation.

h12

Nxos A. Loa~~~zos and R. G.

294

JOHM~N

OVERTIME

~~ (a) T NiUiW Fhour 1 l-hour Fdate 1 Tdatc h8 hi2 d5 d7 john h8 h10 d7 d9 john

AOVERTIME

Fig. 7. Two equivalent twice folded relations.

each of them. In particular, every tuple of S shows the range of hours during which John worked as a guard, for every specific range of dates. In contrast, every tuple of T shows the range of dates, during which John worked as a guard, for every specific range of hours. Although a generic symbol “a” is used, its value is TD dependent. Specifically, for the TD, DATE, it could have been defined as @ = @nATE = current-date + 366 whereas for the TD, HOUR, @ = @nOuR = h24. Example 4: a folded relation can be interpreted as a stored step function. However, the mode1 incorporates interpolation and approximation functions. The relation, in Fig. 8, shows one possible way of modelling patients’ temperatures, during a 24-hr period. Initially, PATIENT contained only one tuple, (john, 37.0, ho, @I), where @ = @nOuR = h24. Subsequently, new readings were recorded, until the relation, in Fig. 8, was derived. Now, supposing that a patient’s temperature changes linearly between two consecutive readings, consider the query “Retrieve the patients’ hourly temperature”. To answer this query, let PATIENT1

(Namel, Templ,

Fhourl,

(PATIENT,

Thour = Fhourl]

PATIENT]),

R2 = PROJECT pame, Temp, Templ, Fhour, Thour] (Rl), R3 = EXTEND

[Hour, Fhour,

R5 = PROJECT

[Hour, Name, Hourly-temperature] (R4),

give the final result, where f(Hour)

= (Templ-

Thour] (R2),

R4 = COMPUTE [Hourly-temperature =f (Hour)1(R 3) PATIENT

Temp)*(Hour

- Fhour)

/( Thour - Fhour) + Temp The graph of R 5 could be plotted, if such a facility is available. Example 5: it is also possible to convert between intervals defined over arbitrary TDs. Figure 9(a) shows the overtime worked by employees. The underlying domain of Fuhour and Tuhour is a user defined TD, UHOUR, with time-points UHOUR = fUhour]O G Uhour < 100, Uhour = integer). Let ord, be the ordering function Let also ABSOLUTE-HOUR

Thourl)

be a copy of PATIENT. This is done to avoid ambi~ities, when the PATIENT relation is JOINed with itself. Then, the operations R 1 = JOIN [Name = Namel,

Fig. 9. Conversion between arbitrary time-intervals.

on this domain,

= (A hour = Date:Hourl

01.0187 6 Date < 01.01.88, 0 < Hour < 24, Hour = integer) be another TD and ord, be its ordering function. Now, consider the following query: “Convert the intervals of OVERTIME into intervals over the TD ABSOLUTE-HOUR, so that Cihour = 0 to correspond to Ahour = 31.01.87:20”. One way, to answer this, is the following. R I = UNFOLD

[Uhour, Fuhour, 2-uhour] (OVERTIME)

R2 = COMPUTE [A hour = iord, (ord”( Uhour) - ord,(O) + ord,(31.01.87:20))] (Rl) R3 = PROJECT

[A hour, Name] (R2).

Finally, A OVERTIME ~

Fig. 8. One way of modelling patients’ temperature.

= FOLD[Ahour,

Fahour,

Tahour] (R3).

This relation is shown, in Fig. 9(b). It is possible to obtain a more efficient solution to this query, by

295

Extending relational algebra extending the COMPUTE operation to apply to intervals. This is the subject of further work, which is intended to publish in the near future. Example 6: let mod(p, q) be the function which returns the remainder of the integer division p/q, where p, q are positive integers and q ~0. The composite use of it with an ordering function ord, allows the retrieval of temporal data on D-points of an arbitrary periodicity. As an example, let a relation, SALES(Sales-date,

Amount),

be augmented every day by a tuple containing the value of sales, during that day, and let a user query be “Retrieve the amount of sales on every fifth day, in the interval [03.02.87, 03.07.87)“. This is achieved as follows: R 1 = SELECT (SALES),

[03.02.87<

Sales-date

R2 = COMPUTE [Remainder = mod(ord,,,,(Sales-date), R3 = SELECT

[Sales-date

R4 = PROJECT R5 = JOIN

< 03.07.871

5)] (R l),

= 03.02.871 (R2),

[Remainder]

[R2,Remainder

In the current implementation, an EXTEND operation is issued as ESALARY = EXTEND [Sdate, Fdate, Tdate] (DATE, SALARY) and, internally, it is transformed, into a QUEL statement, equivalent to ESALARY = JOIN[Fdate < Sdate < Tdate](DATE, SALARY). It should be noted that, in general, a TD should appear to the user’s view as a relation. For example, in the case of the TD DATE, this allows the user to issue queries of the form “What was the date 3. 18 days ago by usmg the predDATE function. The implementation of the FOLD operation is more complicated. In [l5] it has been proved that it can be expressed as a sequence of CRA operations, including COMPUTE. When SALARY

= FOLD

[Sdate,

Tdate] (USALARY) is issued to the relation in Fig. 4, it is transformed, internally, into a sequence of QUEL statements, equivalent to those which follow. In this sequence, R72 is a copy of R71. That is, relation renaming is used, so that joining R71 with itself does not introduce ambiguity problems. R 1 = COMPUTE (USALARY)

(R3), = R4.Remainder,

R2 = PROJECT

Rem2 + R 2. Remainder,

Fdate,

[Sdatel

[Name,

= Sdate]

Amount,

Sdatel]

(Rl) Rem4 + R4.Remainderl R6 = PROJECT

[Sales-date,

(R2, R4), Amount]

R3 = COMPUTE [Sdate2 = succ,,,,(Sdatel, l)] (R2)

(R 5).

Several other capabilities of the model are described in [ 11and [ 151. These include an effective use of SUCC~, pred, and other functions as well, which allow: the retrieval time-intervals ljoining folded section; . realisation of [l6], between

l

of temporal data or subintervals; relations under

during their

the OUTER-EQUI-JOIN two such relations.

arbitrary time

inter-

operation,

The reader can also verify that operations on “time sequence arrays”, defined by Shoshani in [17], are also supported. 5. IMPLEMENTATION The model described in this paper has been implemented on a PDP-1 l/44 machine, using INGRES for low level data management. The implementation of the UNION, DIFF, CP, PROJECT, SELECT and COMPUTE operations, in terms of QUEL statements, was straightforward. For the EXTEND operation, additional relations are maintained in the database, one for every for the TD DATE = TD. For example, (01.01.87, ,31.12.87}, one relation DATE(Sdate), contains all these dates. It is not necessary for these dates to be in ascending order.

R4 = PROJECT [Name, Amount, Sdatel +- Sdate2] (R3) R5 = DIFF

(R2, R4)

R6 = DIFF

(R4, R2)

Sdate2,

R7 = JOIN [Name t RS.Name = R6.Name, Amount +- R5.Amount = R6.Amount, R5.Sdatel < R6.Sdate1, Fdate +- R5.Sdate1, Tdate + R6.Sdatel] (R5, R6) R8 = JOINlName+R71.Name = R72.Name, Amount+ R71 .Amount = R72Amount, Fdate + R 7 1. Fdate = R 72. Fdate, R71 Tdate < R72. Tdate, M +R71.Tdate, N t R 72. Fdate, Tdate + R72. Tdate] (R71, R72) R9 = PROJECT [Name, Amount, Fdate, Tdate] (R8) SALARY

= DIFF

(R71, R9).

The implementation was achieved without difficulty using QUEL macros. The most complex, FOLD, consists of 42 QUEL lines. Efficiency has been neglected at this stage. The decision to implement EXTEND and FOLD in this way, was based on the necessity to check whether

NIKOS

296

A.

LORENTZOS and

actual results match the theoretical ones. The FOLD operation can be implemented more efficiently and the same appears to be true for EXTEND. 6. CONCLUSIONS This paper has described a model for a Temporal Relational Algebra. Its expressive power has been demonstrated by a series of examples and an experimental implementation has been described. Further research includes a study of the normal forms, a definition of a calculus and the incorporation of optimisation techniques. In the authors’ opinion, a relation in which temporal data is “stamped” with time-intervals can be considered as a special case of a general interval relation. An initial study of the properties of a formally defined Interval Relational Algebra is currently being undertaken [ 181. In conclusion, the authors believe that by using an Extended Relational Algebra, time need not be understood as a means of “stamping” temporal data but can be treated, in all respects, like other data. Acknowledgement-The authors would like to thank the referees for their helpful suggestions. REFERENCES [1] N. A. Lorentzos and R. G. Johnson. TRA: a model for a temporal relational algebra. Proc. Co& Temporal Aspects in Information Systems. (Edited by C. Rolland, F. Bodart and M. Leonard), pp. 95-108. NorthHolland, Amsterdam (1987). [2] J. Bubenko. information modeling in the context of system development. Information Processing (Edited by S. H. Lavington), pp. 395411. North-Holland, Amsterdam (1980). [3] R. Kowalski. Logic as a database language. Proc. 3rd Br. Natn. Conf. on Databases (Edited by J. Longstaff), pp. 1033132 (1984). [4] J. Clifford and A. Tansel. On an algebra for historical relational databases: two views. Proc. ACM SIGMOD Int. [email protected] on Management of Data, pp. 247-265 (1985).

R. G. JOHN~CIN

[5] J. Clifford and A. Croker. The historical relational data model (HRDM) and algebra based on lifespans. Internal Rept, Graduate School of Business Administration, New York University, (1986). [6] S. Gadia and J. Vaishav. A query language for a homogeneous temporal database. Proc. ACM Symp. on Principles of Database Systems, pp. 51-56 (1985). [7] I. Ahn. Towards an implementation of database management systems with temporal support. Proc. Inf. Conf. on Data Engineering. IEEE, pp. 374381 (1986). [S] A. U. Tansel. Adding time dimension to relational model and extending relational algebra. Inform. Sysferns 11 (4), 343-355 (1986). [9] E. McKenzie and R. Snodgrass. Supporting valid time: an historical algebra. Technical Rept TR87-008, Department of Computer Science, University of North Carolina at Chapel Hill. [lo] E. McKenzie and R. Snodgrass. An evaluation of historical algebras. Technical Rept TR87-020, Department of Computer Science, University of North Carolina at Chapel Hill. Ull S. Navathe and R. Ahmed. TSQL: a language interface for history databases. Proc. Conf. Temporal Aspects in Information Systems (Edited by C. Rolland, F. Bodart and M. Leonard), pp. 109-122. North-Holland, Amsterdam (1987). WI R. Snodgrass. The temporal query language TQuel. Proc. 3rd ACM SIGMOD Svmu. on Princioles of Database Systems, Waterloo, - Ontario, dana&

pp. 204212 (1984). P31 G. Jaechke and H. Schek. Remarks on the algebra of non-first normal form relations. Proc. ACM-SIGACTSIGMOD Symp. on Principles of Database Systems, Los Angeles, pp. 124-138 (1982).

P41 M. Adiba, N. Quang and J. Palazzo. Time concepts in generalised data bases. Proc. ACM Annual konj, Denver, Colorado, pp. 214223 (1985). [I51 N. A. Lorentzos. Time in databases. Internal Rept NL/1/86, Department of Computer Science, Birkbeck College, University of London (1986). [I61 E. F. Codd. Extending the database relational model to capture more meaning. ACM Trans. on Database Systems 4(4), 397434

(1979).

1171A. Shoshani and K. Kawagok. Temporal data management. Proc. 12th VLDB Conf. Japan, pp. 79988 (1986). P81 N. A. Lorentzos and R. G. Johnson. An extension of the relational model to support generic intervals. Advances in Database Technology-EDBT 1988 (Edited by .I. W. Schmidt, S. Ceri and M. Missikotl), pp. 528-542. Springer, Berlin (1988).