Database Standards
The goal of the ITS Database Standards is to document new databases and applications in a consistent manner across platforms. These methods and requirements should provide a template for better documentation and make it easier to identify objects during daily maintenance and crisis events.
Below is the list of deliverables and standards for any new database development:
- The application and database documentation must be listed in the Global Index Notebook.
- A logical data model must be created with the entities and unique identifiers for each entity listed.
- Each entity must have a definition written in paragraph form.
- Each table must have a definition that may be identical to the corresponding entity.
- Each column in a table must have a definition, a descriptor, a size, a domain, and a datatype.
- The column abbreviations used must be in the Data Management Abbreviation list.
- There must be a physical data model of the database that includes table and column names prior to the creation of the physical database.
- The data stored in a column must correspond to the column definition and domain.
- All OLTP databases must have separate development, test, and production environments.
1. Record and Table Names
Goal: To have record and table names that are consistent, clear, and descriptive of the data maintained in the record or table. A record or table is a data representation of an entity with an existence in the real world. Buildings, employees, stock items, etc. have an existence, and the university has a need to keep data about each occurrence of those entities. A record or table name should indicate what entity that record or table contains data about.
- Name the table (the word table will be used to mean either record or table) to indicate the entity that is represented by that table. A table containing data about a building should be called Building. A table containing data about an employee should be called Employee. A table containing data about a stock item should be called Stock-Item.
- Sometimes there is a need for more than one table to contain data about an entity. For example, we may have a table containing data about an employee's personnel data, a table with payroll data, and a table with historical data. The table name should be expanded. In the above example, we would have the table names: Employee-Pers, Employee-Pay, and Employee-Hist.
- All table names will be singular. A table name for a table representing buildings will be Building. A table name for a table representing employees will be Employee.
- Because a table may be used by more than one application, there will not be a reference to an application in a table name. An FRS account table should always have the same name no matter what application needs to use the data in that table.
- A column or element name may not be a table name. An element or column represents an attribute of an entity, not the entity itself. Employee name is an attribute of employee. The Employee table would have a column called Employee-Name, or properly abbreviated, EMP-NM.
- In a network IDMS database, the record name will have an IDMS internal record ID appended as a suffix to the record type. The record name for university buildings will be Building-303. These suffix numbers are not necessary in databases defined as relational in IDMS, Sybase or Oracle.
2. Data Element and Column Names
Goal: To have data element and table column names that are consistent, clear, and descriptive of the data maintained in the element or column. An element or column should have the same name every place it appears in all databases across all platforms. Customers and programmers should not have to guess what name is used in a specific database on a specific platform. Additionally, standard names will improve productivity, facilitate sharing of programmer resource among groups, and provide coherent table joins.
Process:
- 2.1. Name the column (this will mean element or column) based on what the data is, not how it is used. A column may be used different ways but it should have only one name. Example: Suppose we have an indicator that has two values: 0 means USA, and 1 means foreign or not USA. One program may use the indicator to summarize fields by adding to one counter if the indicator is 0 and to another counter if it is 1. Another program may print only records that are non-USA or value 1. If the column is named for its use, we could have two names for the same column; Summary-Indicator and Print-Indicator. And if there are other uses such as purge, selection etc., there could be many other names for the same column. Furthermore, if a different program does print selection on some other column we would have two columns named Print-Indicator with different values. An appropriate column name could be USA_IND (IND for indicator).
- 2.2. A domain is a of set of values for a column. The domain of the column must be identified, named and included in the column name. Explanation: the set of all Organization Numbers is a domain and thus should be part of the (or possibly the entire) column name, Organization-Number (abbreviated to ORG_NUM). There are times when a domain is so broad it is not useful such as on-hand quantity of a stock item. The values could be from zero to nearly infinity. In this case, the column does not have a domain included in the name, Stock-Item-On-Hand-Quantity (abbreviated STK-ITM-ON-HAND-QTY).
- 2.3. If a domain name appears twice in a table or record, then the role it plays must be identified and added to the column name. Example: an employee works in ITS but is paid by the Cashiers office. The employee table would need to show two organization numbers, possibly called Organization-Number-Works and Organization-Number-Paid. Note this is not in conflict with item #2.1 above. A role differentiates one column from another column when the same domain is represented twice or more in a table. A role does not represent how a column is used. The organization number may still be used for print selection criteria, summary criteria etc. Another example is an Employee table with an employee's person ID and the person ID of the employee's boss; PERS_ID_EMP and PERS_ID_BOSS. Both columns have the same domain.
- 2.4. The column name must be clear without the context of its membership in a table. An employee name should be called Employee-Name (EMP_NM); an organization name called Organization-Name (ORG_NM). Simply calling each column Name is not sufficient. Some software products cannot handle two columns with the same name. Furthermore, two columns called Name would seem to indicate the same domain when in fact, we have two different domains; that is, the entire set of employee names and the entire set of organization names.
- 2.5. Each column must be categorized with a descriptor (class code) identifying the type of data it contains. Initially developed by IBM, these descriptors have now been embraced, with some individual modifications, as nearly an industry wide standard. Normally, the descriptor will be the last component of the column name. There are two exceptions to this location described below in paragraph 2.9.
Descriptor Meaning Description ADDR Address Location of a person or building, or a person's email address. AMT Amount Numeric representation of currency or other amount (e.g. scores, points) CD Code Alphanumeric code which can be translated or has assigned meaning. Has a distinct set of values (e.g., customer_type_cd). Often used as a primary key on a reference table. CNT Count Represents a count of a number of occurrences. Typically used in summary tables. DT Date Represents a date. IND Boolean Indicator Indicates a logical condition which has only two values (e.g., "Y" or "N", 1 or O). May also contain NULL values. (e.g., return_item_ind) ID Code Identifier Alphanumeric code used to represent some type of entity. Generally has no built in meaning. Could also the ID reference from an external system. Often used as the primary key for a transactional table. (e.g., sales_order_id) NM Name Alphanumeric word or phrase that is usually a proper noun (e.g., last_nm) NUM Number A number which describes an attribute but is not a quantity or the identifier for the table. For example, line item numbers associated with line item tables. PCT Percent Numeric representation of percent. RT Rate Numeric values used for factors and multipliers. TXT Text Descriptions Free-form textual descriptions. - 2.6. Abbreviations
- 2.6.1 Abbreviations used at I.T.S. will be created according to the rules defined here, with one exception. Enhancements and interfaces to vendor purchased software can use the vendor's naming standards, rather than these standards.
- 2.6.2 Abbreviations may be used, but are not required in naming data elements/columns.
- 2.6.3 When used, the abbreviation or acronym must be in or added to the standard list of acronym & abbreviations. Standard abbreviations and acronyms (widely used, commonly known) can be used even if the word is not normally eligible for abbreviating given the rules below, e.g., months of the year may be abbreviated at the discretion of the designer, as long as they are the standard abbreviations for the months.
- 2.6.4. Abbreviations and Acronyms are defined based on the following principles:
- 2.6.4.1 All abbreviations & acronyms will be unique.
- 2.6.4.2 Abbreviations will be long enough to clearly indicate the unique meaning.
- 2.6.4.3 One abbreviation will be used for all forms of a single word. Example: Admin = Administrator, Administration, Administrative
- 2.6.4.4 Try not to create abbreviations that are themselves English words. For example, do not use CLASS as an abbreviation for CLASSIFICATION.
- 2.6.4.5. It is best to have an abbreviation begin with the same letter as the word being abbreviated. For example, use EXCPT not XCPT for EXCEPTION.
- 2.6.4.6 No word may be abbreviated to less than 3 characters, unless it is a widely used abbreviation.
- 2.6.4.7. No word may be its own abbreviation.
- 2.6.4.8. Words four characters in length should generally not be abbreviated. Existing four character word abbreviations are to be used only when necessary to fit a column name within the length restrictions of a given vendor's database (Oracle, MS SQL, etc.).
- 2.6.5. A column may have none, some or all of its parts abbreviated. The DBA and developer are to determine how and when abbreviations are to be used for a given application database for new data elements. Once named a data element will be named consistently across all applications.
- 2.6.6. If a word needs abbreviating, submit a request to the DBA assigned to your project. Data Warehouse staff will also assign abbreviations for data warehouse tables as needed. All abbreviations will be documented by the person who made them (who, date, reason for chosen abbreviation). New abbreviations will be broadcast to all Database Management, and Data Warehouse staff. Initial suggestions for abbreviations will become standards after 3 business days.
- 2.6.7 In the event that the desired name can not fit within the length restrictions of the database in question, several options exist:
- 2.6.7.1. Choose a different wording.
- 2.6.7.2. Eliminate words that are not critical.
- 2.6.8 If a column is replicated from an existing legacy system, it should be named according to ITS naming standards. A comment should be included in the data model, and if possible in the database with the column stating where the column was originally taken from. This should be as specific as possible, e.g, This column contains data from the ORGANIZATION-469 record, column UNC-ORG-NUM-469.
- 2.6.9 Resources for Abbreviations & Acronyms. A resource for finding acronyms is:
http://www.ucc.ie/cgi-bin/uncgi/acronym
Links for viewing abbreviations in use at other universities:
http://www.itap.purdue.edu/ea/data/standards/abbrev%20A_B.cfm
http://www.oit.umd.edu/cgi-bin/acrolist.cgi
-
2.7. Participation of an element in a record is shown by a suffix. In the case of an IDMS element, the suffix is the IDMS internal record number. For an Oracle table the element will have a table name as a qualifier prefix. Examples:
IDMS ORG-NUM-469
Oracle ORG.ORG_NUM - 2.8 The syntax for an element/column name is as follows: Domain+Descriptor+Role(If applicable)+Suffix(If applicable). The role may be placed before the domain. For example, contact_person_id and person_id_contact are both acceptable. It is not acceptable to split the domain, e.g., person_contact_id.
- 2.9. Exceptions:
- 2.9.1 If a column exists in two different lengths, such as FRS account number, an integer will be appended to the element name following the descriptor to show the length. Example: ACCT-NUM-6 and ACCT-NUM-10. (Note: This method of naming is used sparingly since a change to the column length could translate to hours of work to replace code in programs.)
- 2.9.2 If a domain ends with a descriptor, e.g., ORGANIZATION_NAME, you may omit the descriptor, since it is included in the column name.
3. Security
UNDER CONSTRUCTION
4. Backup and Recovery Issues, Suggestions, and ITS methods
UNDER CONSTRUCTION
5. Recommended Ways to Load Database
UNDER CONSTRUCTION
6. System Test plan
A System Test plan should be scheduled to allow enough time to test the new database and application before it is put in production. This test will allow customers to use the application before it is production critical. This testing should provide the information the database and programming staff need to know that the security pieces are in place for the customers' access to the database.
7. Oracle Database Design
- 7.1 In-house Oracle database designs should be documented using Computer Associates' ERWIN. It has the capability to create the diagrams needed with textual definition for entities, tables and columns. It can produce the database documentation in an HTML format.
- 7.2 Index Standards
- 7.2.1. Indexes for primary keys should be the same as the constraint name for the primary key.
- 7.2.2. Nonprimary key indexes should begin with IX_. The name can be either the column being indexed, or the table name plus a _1, _2, etc. for each index on the table. Do not let Oracle supply a default name.
- 7.2.3. Indexes on columns which are foreign keys should be the same as the relational constraint name.
- 7.2.4. Indexes should go in separate tablespaces from the data tablespace. Generally the odd number tablespaces contain data, and the even numbered tablespaces contain indexes, e.g., GAFTBS001 is for data; GAFTBS002 is for indexes for tables stored in GAFTBS001.
- 7.3. Constraint Standards
- 7.3.1. Database Integrity is to be enforced via use of constraints with few exceptions. Triggers may be used when use of a constraint is not possible.
- 7.3.2. The primary key constraint name should be the concatenation of the literal "PK_" with the name of the table. If that name exceeds the length allowable by the database vendor, you may shorten the name of the table as you see fit.
- 7.3.3. Relational constraint index names should be the same as the relational constraint name.
- 7.3.4. All relational constraints will be named rather than taking the default Oracle name. Relational constraint names are the concatenation of "FK_" plus the short name of the parent table plus an underscore plus the short name of the child table. So each table will have a short name assigned to it; the short name should be no longer than 13 bytes. So the relational constrain name and the foreign key index name will be the same.
- 7.3.5. If there are multiple constraints, appending a role name, or a number to the end of the constraint name should be done. A role name can be used when the same column in the parent table has two relations with the same child table, such as "manages" and "reports to".
- 7.3.6. There is currently no standard for column constraint names, so you may allow the data modeling software, or the database vendor's default names to be used.
- 7.4 Table Standards
- 7.4.1. Each table will have two date columns: one for the row creation date (row_creat_dt) and row modification date (row_mod_dt). The dates will be populated by use of a trigger on the table.
- 7.4.2. The table name does not require any abbreviation, but if an abbreviation is used, it must come from the Data Management Abbreviation list.
- 7.4.3. The Department table is the canonical source of department definitions including rollup structures. The Organization table is derived from that table. No other department or organization tables are authorized for use by ITS written applications.
- 7.5. Column Standards
- 7.5.1. For alphanumeric fields, the VARCHAR2 data type should always be used instead of the CHAR datatype.
- 7.5.2. Columns containing dates and/or date timestamps should be defined as the Oracle date datatype.
- 7.5.3. Certain data elements which are universal across applications will be defined consistently in terms of name, datatype, and size. A listing of these data elements can be obtained from the DBAs in Enterprise Database Administration. A web document is being prepared and will be available soon.


