
Database quality analysis

15 June 2020

To analyze the properties of databases, it is proposed to distinguish the quality characteristics of the database management system (DBMS) and of the information it contains. The composition of these characteristics is best systematized on the basis of the requirements of the international standard ISO 9126.

Data Quality Concepts


Modern databases are among the objects in the field of informatization that often demand particularly high quality and the ability to evaluate it. But what does database quality mean, what requirements should be imposed on it, what characteristics can describe it, and how can they be evaluated and measured? The methods and standards developed for analyzing complex software can be useful here.

In a comprehensive analysis of database quality it is not always possible to separate cleanly the requirements and quality-characteristic values for each of these objects. One DBMS can process data that differ in structure, composition and content, while the same data can be managed by different DBMSs.

When analyzing database quality, it is reasonable to consider two components: the data-management software system and the set of data ordered by certain rules. Although these components interact closely in any particular database, they are created independently and can be treated throughout their life cycles as two objects that differ in:

  • the nomenclature and content of the quality indicators that define their purpose, functions and consumer properties;
  • the technology and automation tools used to develop them and support their entire life cycle;
  • the categories of specialists who create, operate or apply them;
  • the sets of operational and technological documentation supporting the object's life cycle.

Practically the whole set of characteristics and attributes from the ISO 9126 standard, "Software quality", can be applied to some extent to DBMS requirements. The peculiarity lies in shifting the emphasis when selecting and ranking them. In all cases the most important DBMS quality characteristics are the requirements for the functional suitability of the processes by which administrators form and alter the database's information content, and by which users access data and receive results.

Differences in quality requirements have led to a wide range of local, specialized and distributed DBMSs. Depending on the field of application, priority in quality assessment may go to different constructive characteristics: reliability and security of the application (finance), convenience of use by low-skilled users (social services), or resource efficiency (logistics). In almost all cases, however, the other constructive quality indicators still play some role: for each of them it is necessary to evaluate its priority for the specific application, along with the measures and scales of necessary and acceptable quality values.

Database systems are dominated by the data themselves and by their storage and processing. To assess information quality, a common methodological approach can be adopted: selecting an adequate range of the basic characteristics and subcharacteristics standardized in ISO 9126. Their content, however, needs to be clarified before being applied to database quality. The selected quality indicators should be of practical interest to users and be ordered by priority for practical application. In addition, each selected indicator should lend itself to reliable expert evaluation or measurement, and to comparison with the required value.

When a database is developed, its technical project and specification should formalize a representative set of functional quality requirements adequate to its purpose and field of application, as well as to the needs of the customer and of potential users. As with software systems, information quality characteristics can be divided into functional and constructive ones; their nomenclature, content and subcharacteristics are based on ISO 9126. That standard's nomenclature and descriptions of characteristics can serve as the basis for a standardized formulation of database quality requirements, with the object of quality analysis, "programs", replaced by "database information". The nomenclature of quality indicators, however, cannot always be limited to the characteristics of the information in the database; it should also include indicators reflecting the overall efficiency and functional suitability of its application under real operating conditions.

Measuring the functional suitability of database information and assessing its compliance with requirements can be a complex problem, particularly for large distributed databases in which diverse and complex information about the analyzed objects circulates. A measure of functional-suitability quality is the extent to which the information available to users covers the goals, purpose and functions of the database. As with software systems, it is reasonable to use a group of subcharacteristics defining functional, structural and operational requirements. At the content level, the functional suitability of many databases reflects the following (a small computational sketch follows the list):

  • completeness of the accumulated object descriptions – the ratio of the objects or documents available in the database to the total number of objects on the subject, or to the number of objects in similar databases;
  • identity – the ratio of object descriptions containing no defects or errors to the total number of object descriptions in the database;
  • relevance – the ratio of obsolete object data to the total volume of accumulated and processed data.
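
As a minimal illustration of these ratios, the sketch below computes them from hypothetical counts; in practice the counts would come from queries against the database under analysis and against a reference catalog of the subject domain.

```python
# Illustrative sketch: functional-suitability ratios from hypothetical counts.

def completeness(records_in_db: int, objects_in_domain: int) -> float:
    """Share of real-world objects that have a description in the database."""
    return records_in_db / objects_in_domain

def identity(defect_free_records: int, records_in_db: int) -> float:
    """Share of object descriptions free of defects and errors."""
    return defect_free_records / records_in_db

def obsolescence(obsolete_records: int, records_in_db: int) -> float:
    """Share of obsolete object data among all accumulated data (lower is better)."""
    return obsolete_records / records_in_db

# Example with invented numbers:
print(completeness(9_200, 10_000))   # 0.92
print(identity(8_900, 9_200))        # ~0.967
print(obsolescence(300, 9_200))      # ~0.033
```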

The variety of database functions

The variety of database functions limits the possibility of standardizing requirements for them to general rules of organization, structuring and documentation. Measures and quality scales of functional suitability are as diverse as the purposes and specifics of the information functions of databases, but the constructive characteristics can be standardized to a large extent.

Practically all the standardized quality indicators presented in ISO 9126 can be attributed to the constructive characteristics of information quality. Requirements for database information should therefore also cover reliability, efficient use of computer resources, usability, applicability, maintainability and mobility. The content and attributes of these characteristics differ somewhat from those used for programs, but their essence carries over, and the measures and scales for evaluating constructive characteristics can largely be applied in the same way as in software quality analysis.

Correctness or reliability of data

Correctness, or reliability, of data is the degree to which the data about objects in the database correspond to the real objects at a given moment in time; it is determined by changes in the objects themselves, by incorrect records of their state, or by incorrect calculations of their characteristics. Requirements for data correctness can be selected and set according to how fully the accumulated, current and reliable data cover the state and changes of the external objects they reflect. Several volumetric-temporal characteristics of the stored and processed data can also be attributed to correctness (see the sketch after the list):

  • database volume – the ratio of the object or document descriptions available for storage and processing to the full number of real objects in the external environment;
  • timeliness – the degree to which the dynamics of data changes correspond to the state of the real objects;
  • retrospective depth – the time interval from the date of issue and/or recording in the database of the earliest document to the present;
  • dynamism – the ratio of changed object descriptions to the total number of records in the database over a period determined by how often database versions are released.
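
A similar sketch, again with invented inputs, shows how the volumetric-temporal characteristics above could be computed; in practice the inputs come from database statistics.

```python
# Illustrative sketch of the volumetric-temporal characteristics above.
from datetime import date

def database_volume(stored_records: int, real_objects: int) -> float:
    """Share of real-world objects covered by stored descriptions."""
    return stored_records / real_objects

def retrospective_depth(earliest_record: date, today: date) -> int:
    """Days from the earliest recorded document to the present."""
    return (today - earliest_record).days

def dynamism(changed_records: int, total_records: int) -> float:
    """Share of records changed within one version-release period."""
    return changed_records / total_records

print(database_volume(9_200, 10_000))                            # 0.92
print(retrospective_depth(date(2005, 1, 1), date(2020, 6, 15)))  # days of history
print(dynamism(1_150, 9_200))                                    # 0.125
```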

Information security is implemented by DBMS facilities in combination with the data protection facilities that support them. The goals, purpose and functions of protection are closely tied to the functional suitability of each database. In distributed databases, security indicators are closely connected with integrity characteristics and reflect the degree to which nominally identical data in the memory of remote components actually match.

In practice, the ability to protect database information from negative influences is usually described by the set of means used against external and internal threats. There are, however, attempts to measure and describe the quality of information protection as a whole, through the labor and time an attacker would need to overcome the protection system. The relative share of computing resources consumed directly by the information protection facilities can serve as an indirect indicator of its quality.

In practice, attention focuses mainly on protection against malicious destruction, distortion and theft of database information. Such protection rests on auditing access authorization and on monitoring how access restrictions are organized and how effective they are. In real databases, however, catastrophic consequences and information anomalies affecting the safety of the application are also possible, and are not always taken into account, when their sources are random, unpredictable destabilizing factors. The quality of protection can be characterized by the damage prevented when destabilizing factors appear and specific security threats are realized, and by the mean time between possible occurrences of threats that overcome the data protection.

The reliability of database information can be assessed with the concepts and methods of reliability theory, which yield a number of clear, readily measured integral indicators. A reliable database must first and foremost provide a low probability of losing operability. Reacting rapidly to data loss or corruption, and restoring data integrity and operability in less time than the threshold that separates a recoverable fault from a failure, ensures high reliability.

Classifying faults and failures by recovery time leads to the need to analyze the dynamic characteristics of the subscribers who act as sources and/or consumers of the data. For any consumer of information there is a permissible time of data absence from the database, during which the values, changing by inertia, reach the limit of deviation from what should have been calculated. This tolerable deviation of results after an interruption of database operation depends mainly on the dynamic characteristics of the information sources and consumers.
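
As a hedged illustration of the permissible data-absence time: assuming the consumer's value drifts at a roughly constant rate while fresh data are missing (a simplification not stated in the text), the allowable interruption follows from the tolerated deviation divided by the drift rate. The numbers below are invented.

```python
# Illustrative sketch: permissible data-absence time under a linear-drift assumption.
# max_deviation and drift_rate are hypothetical application-specific parameters.

def permissible_outage(max_deviation: float, drift_rate: float) -> float:
    """Longest outage (seconds) before an inertially drifting value
    exceeds its tolerated deviation from the true state."""
    return max_deviation / drift_rate

# A value tolerating 5.0 units of error while drifting 0.1 units/second
# allows roughly a 50-second interruption.
print(permissible_outage(5.0, 0.1))  # 50.0
```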

Resource utilization (or resource efficiency) is reflected in the standards in terms of CPU, RAM, external and virtual memory, I/O channels, terminals and communication channels. Depending on the specific tasks and features of the database, the choice of quality attributes may be dominated either by the absolute amounts of resources of each type or by their relative utilization during normal operation. Evaluating and using computing resources effectively remains a relevant task.

Practicality (applicability) is difficult to formalize, but it ultimately determines to a large extent the functional suitability and usefulness of a database for particular users. This group includes subcharacteristics reflecting functional clarity, usability, system efficiency and ease of data use from different angles. Some of them can be assessed by economic indicators, i.e., the labor and time professionals spend on particular data-interface functions. The ISO 9126 standard offers the largest number of attributes for this quality characteristic, detailing the properties of software tools; these attributes can also help customers evaluate databases when choosing, mastering and applying them. Practicality assessments depend not only on the databases' own characteristics but also on how well the processes of their operation are organized and documented.

Clarity depends on the quality of the documentation and on the subjective impressions of potential users. It can be described qualitatively by the clarity of the functional concept, the breadth of demonstration capabilities, and the completeness and intelligibility with which the operational documentation presents the available functions and the implementation features of the data. Users should be provided with a correct and complete description of the input and output information and of all the details of using the database.

Ease of use is the ability to operate the database and manipulate its data conveniently and comfortably. It covers manageability, resistance to data defects, and consistency with user expectations and skills. Some attributes of this subcharacteristic can be quantified by measuring the labor and duration of the corresponding operations and of training qualified users.

Learnability can be measured by the labor and duration of user training. It depends on the internal properties and complexity of the database's information structure, as well as on the qualifications of the specific users. Learnability can also be characterized by the volume of operational documentation or by the volume and quality of electronic textbooks.

Maintainability of the information is reflected in how conveniently and efficiently the structure and content of the data descriptions can be corrected, improved or adapted to changes in the external application environment and in the customer's requirements and functional specifications. In general, database maintainability can be assessed by the resources required to provide and implement it. The resources needed to develop and improve database quality depend not only on the internal properties of the data but also on the requests and needs of users, and on the readiness of the customer and the developer to satisfy them. The cost of changes can be estimated from the volume of expected modifications and newly introduced data, taking into account the complexity and novelty of their development; such analysis provides guidelines for predicting the total cost of maintenance and for assessing this quality characteristic in specific projects. The set of maintainability subcharacteristics presented in ISO 9126 for software systems is fully applicable to this aspect of database quality, largely with the same organizational and technological subcharacteristics.

Analyzability of a database depends on the consistency of its architecture, the uniformity of its interfaces, and the completeness and correctness of its technological and operational documentation. Changeability consists in the adaptability of the data structure and content to specified changes and to data configuration management; it depends not only on internal database properties but also on the organization and tooling of the support and configuration-management processes toward which the architecture and the external and internal data interfaces are oriented. Testability depends on the size of the area affected by a change, which must be tested when the structure and content of the data are modified, and on the complexity of the tests that check their characteristics; its attributes depend on the clarity of the structural rules for building the components and the database as a whole, on the unification of inter-module and external interfaces, and on the completeness and correctness of the technological documentation. The changeability and testability subcharacteristics lend themselves to quantitative estimation through the labor and duration of typical data operations under different methods and automation tools. These economic scales can also, if implicitly, reflect analyzability and stability, and can be used for an integral evaluation of maintainability as a whole.

The mobility of databases, like that of programs, is characterized mainly by the duration and labor intensity of installation, adaptation and replacement when moving to other hardware and operating platforms. Information about processes in the external environment can be voluminous and laborious to accumulate and keep current, which makes careful storage and regulated change necessary; situations can arise in which such data are unique and unrecoverable, while the hardware or operating platform on which they were accumulated and processed needs expansion or replacement. Forming and filling a database with information is a complicated and time-consuming process whose technical and economic performance depends heavily on the structure, composition, volume, connectivity and other characteristics of the original information.

Often, however, transferability is not provided for when the database is first created and populated, and the need for it emerges only after a long period of operation. In that case the complexity, labor intensity and duration of the transfer grow significantly and require planning and organization of work approaching that of creating a new database. At the same time, it must be ensured that operation on the new platform is preserved or improved.

To assess quality and set requirements for database mobility, the achieved effect should be compared with the costs of transferring or re-designing the components and refilling the database under the specific conditions, taking all of these factors and costs into account. These tasks are greatly simplified, with a simultaneous reduction in cost, by applying the ideology and concepts of open computer systems supported by the POSIX family of international standards, as well as by modern versions of operating systems and DBMSs acting as de facto standards.

Formalizing the quality characteristics of database information on the basis of the standards developed for software evaluation opens the way to applying methods of systematizing, defining and improving quality that have already been proven on program complexes. Using standardized quality characteristics of database information makes it possible to streamline both the selection of requirements and the assessment of the achieved quality. This should help improve the quality of databases as a whole, taking their software and information components into account, and make it possible to determine their real characteristics reliably during development, testing and certification.

On reliability in terms of ISO 9126

The ISO 9126 standard recommends analyzing and accounting for the reliability of program complexes through four subcharacteristics.

Maturity (completeness) – the ability of a database not to fall into a failure state because of data loss, distortion, errors or defects. This subcharacteristic is affected by losses of operability that may stem from incomplete test coverage of the components and of the system as a whole, as well as from insufficient protection against incompleteness and distortion.

Defect and error resistance (fault tolerance) – the property of a database to maintain a specified level of data quality automatically in the event of defects and errors, or of a violation of the established interface with the external environment.

For this purpose, time and information redundancy should be built into the database to enable rapid detection of information defects and errors, their identification, and automatic restoration of normal operation. The relative share of computing resources used directly for quickly eliminating the consequences of faults and for operational data recovery is reflected in increased reliability and determines the value of the fault-tolerance indicator.
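
One elementary form of information redundancy is storing a checksum alongside each record so that corruption can be detected on read. The record layout below is hypothetical; real DBMSs use page checksums, logging and replication for the same purpose.

```python
# Minimal sketch of information redundancy via per-record checksums.
import hashlib

def with_checksum(payload: bytes) -> tuple[bytes, str]:
    """Store the payload together with its SHA-256 digest."""
    return payload, hashlib.sha256(payload).hexdigest()

def is_intact(payload: bytes, stored_digest: str) -> bool:
    """Detect corruption by recomputing and comparing the digest."""
    return hashlib.sha256(payload).hexdigest() == stored_digest

record, digest = with_checksum(b"object #42: state=OK")
assert is_intact(record, digest)           # clean read
assert not is_intact(b"tampered", digest)  # corruption detected
```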

Recoverability – the property of a database, after a failure, to restore the required level of information quality and to repair damaged data. After a failure the database is inoperable for a period whose duration is determined by the recoverability of its data; computing resources and time are needed to identify the inoperable state, diagnose the cause of the failure, and carry out the recovery processes. The main indicators of the recovery process are its duration and its probabilistic character. Recoverability is also characterized by how fully the normal content is restored.

Availability (or readiness) – the property of a database to be able to perform the required function fully at a given moment in time under the specified conditions of use. Externally, availability can be estimated as the share of time during which the database is operational relative to the total time of its use. Failure and recovery characteristics are generalized by the availability factor, which reflects the probability of having normal data at an arbitrary point in time. The lower limits of the reliability attribute scales can be set at the values below which the functional suitability of the database drops sharply and its use becomes inconvenient and unsafe.
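
The availability factor is conventionally derived from mean time between failures (MTBF) and mean time to repair (MTTR); the hour values in this sketch are invented for illustration.

```python
# Illustrative sketch: availability factor A = MTBF / (MTBF + MTTR).

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Probability of finding the database operational at a random moment."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# 720 hours between failures, 2 hours to recover -> ~0.9972
print(availability(720.0, 2.0))
```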

The efficiency of computer-resource use in real operation shows up in the timing characteristics of the interaction of end users and database administrators with the database. These characteristics depend on the quality of the DBMS as well as on the volume, structure and quality indicators of the information used. For databases, the most important resource is the computer memory occupied by the information, together with its usability. These parameters affect the system's response time to different types of user queries and the throughput of the database.

ISO 9126 highlights two quality subcharacteristics that are recommended to be described mainly by quantitative attributes reflecting the dynamics of the database components. Time efficiency is determined by the duration of the assigned functions and by the waiting time for results in the average and/or worst case, taking task priorities into account. It depends on the volume, structure and processing speed of the data, which directly affect how long a particular computing process takes, and on throughput, i.e., the number of tasks that can be completed on a given computer within a given time interval. It also depends on the functional content of the data and on their constructive implementation, and can therefore be considered one of the internal quality indicators.
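
Response time and throughput of this kind can be sampled by timing a batch of queries; in the sketch below, run_query is a hypothetical stand-in for a real database call.

```python
# Illustrative sketch: sampling mean response time and throughput for a workload.
import time

def run_query() -> None:
    time.sleep(0.01)  # placeholder for real query latency

def measure(n_queries: int) -> tuple[float, float]:
    """Return (mean response time in seconds, throughput in queries/second)."""
    start = time.perf_counter()
    for _ in range(n_queries):
        run_query()
    elapsed = time.perf_counter() - start
    return elapsed / n_queries, n_queries / elapsed

mean_rt, qps = measure(100)
print(f"mean response time: {mean_rt:.4f}s, throughput: {qps:.1f} q/s")
```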

The features and labor intensity of a transfer depend first of all on the compatibility of the architectures and of the content of the information transferred between platforms (a toy check follows the list):

  • Format compatibility – the degree to which the data meet the requirements of standards on data presentation formats for documentary, factographic, dictionary and other databases;
  • Linguistic compatibility – the extent to which the databases in question use common linguistic tools (classifiers, rubricators, dictionaries) formalized by the corresponding standards of these platforms;
  • Physical compatibility – the degree to which the encoding of information meets common standards for machine-readable media.
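
As a toy illustration of a format-compatibility check, the sketch below validates that exported date fields follow one agreed presentation format (ISO 8601 here, chosen only as an example; the sample records are invented).

```python
# Toy sketch of a format-compatibility check on exported record dates.
from datetime import datetime

def conforms(date_field: str) -> bool:
    """True if the field parses as ISO 8601 (YYYY-MM-DD)."""
    try:
        datetime.strptime(date_field, "%Y-%m-%d")
        return True
    except ValueError:
        return False

records = ["2020-06-15", "15/06/2020", "2020-02-30"]
share_compatible = sum(conforms(r) for r in records) / len(records)
print(f"format compatibility: {share_compatible:.0%}")  # 33%
```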
 