Applications developed today, including Internet applications, data-mining software, and other general applications, will most likely have database components on the back end. Example applications vary from the online CD database located at CDDB.com to a recipe database, like the one located at FOODTV.COM. The databases used may range from powerful enterprise-scale versions of Oracle, DB2, and Sybase to more personal desktop versions like Microsoft Access. This chapter will focus primarily on standard SQL, not any specific implementation. Regardless of the database platform, one accesses the data utilizing the provided query language. Because most popular databases use the relational model, the query language provided is a derivative of SQL. Currently, most database systems provide users with languages such as SQL to allow them to retrieve or update stored information (Chan, Tan, & Wei, 1999).
Although database technology has been around for quite some time and in various forms, the relational model put forth in 1970 by E. F. Codd has enjoyed the greatest success. The relational model is firmly grounded in relational algebra, which has roots in basic set theory. This actually makes the basics of SQL accessible to most people.
Basic Set Theory
In basic set theory, a universe of the domain is specified. In the database arena, this translates to all of the data included in a database. In our universe, items may be categorized (perhaps multiple times) into the set(s) and then relationships between these sets may be explored, using inclusion and exclusion operations. Likewise, in the relational model, data are often retrieved as result sets. These result sets may then be compared either inclusively or exclusively.
Relational algebra “is a collection of operations that are used to manipulate entire relations. … The result of each operation is a new relation, which can be further manipulated by the relational algebra operations”. This is consistent with the mathematical definition of set operators. Relational operators are often separated into two groups. The first “group includes set operations from mathematical set theory. The others consist of operations developed specifically for relational databases.”
Null Values and Trivalue Logic
One concept that was added to basic set theory was the concept of a null value. An attribute that has a value of null is said to be an attribute that is not applicable, unknown, or undefined for the data row. One must be careful to properly account for the possibility of null values appearing in a database.
Null values, when used in numeric computations, act like infinity; anything we do in conjunction with a null value ends up as null. For example, if we add 3 + null, the result is null. When used in character operations (such as concatenation) the null typically acts as a blank.
Development is currently underway to enhance SQL into a computationally complete language for the definition and management of persistent, complex objects. This includes generalization and specialization hierarchies, multiple inheritances, user-defined data types, triggers and assertions, support for knowledge-based systems, recursive query expressions, and additional data administration tools. It also includes the specification of abstract data types (ADTs), object identifiers, methods, inheritance, polymorphism, encapsulation, and all of the other facilities normally associated with object data management.
Perhaps we want to enlarge the result set or perform a computation, such as computing the total for a shopping cart application or performing what-if scenarios relating to the price of a movie ticket. We can perform operations on the attributes listed in the select statement.
Arranging the ResultSet
SQL provides us with a mechanism for sorting the rows of a result set. This mechanism is the order by clause. When using the order by clause, the query developer specifies which attribute(s) to sort the data on and the direction asc (ascending) or desc (descending). If more than one attribute is specified, then the outer attribute specifies the major ordering and the next attribute(s) indicates the ordering within the primary ordering.
Segregating the ResultSet
Often we want to work on a partitioned result set (i.e., for each movie rating in our result set, we want to count the number of movies that have the rating). Or we want to get the most (or least) expensive book in our inventory. To support these types of queries, SQL has defined a number of aggregate functions? Example aggregate functions include COUNT(), MIN(), MAX(), SUM(), and AVG(). All functions take in column expressions (either column names or column computations).
So far the various options have been demonstrated against a single table, but if we are using a relational database, then the relationships between tables are equally important. To retrieve result sets with data from multiple tables, we need to use a join. There are several types of joins defined: equi, natural, Cartesian, inner, outer, and self. The equi and natural joins are the most common; the inner, outer, self, and Cartesian are special cases. In all cases, all tables that participate in a join must be listed in the from clause