In this post, we discuss what are the things we need to keep in mind while dealing with datatypes in SQL Server when it comes to performance and best practice.
Below points are applicable not only for SQL Server but also for other RDBMS SQL also like Oracle, Sybase, Greenplum..etc.
- Always specify the narrowest columns you can. The narrower the column, the less amount of data SQL Server has to store, and the faster SQL Server can read and write data. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be.
- If you need to store large strings of data and less than 8,000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have the extra overhead that drags down performance.
- Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.
- If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable-length columns can greatly reduce I/O reads cache memory used to hold data, improving overall SQL Server performance.
- Another advantage of using VARCHAR over CHAR columns is that sorts performed on VARCHAR columns are generally faster than on CHAR columns.
- This is because the entire width of a CHAR column needs to be sorted.
- If a column’s data does not vary widely in length, consider using a fixed-length CHAR field instead of a VARCHAR. While it may take up a little more space to store the data, processing fixed-length columns are faster in SQL Server than processing variable-length columns.
- Always choose the smallest data type you need to hold the data you need to store in a column. For example, if all you are going to be storing in a column are the numbers 1 through 10, then the TINYINT data type is more appropriate than the INT data type. The same goes for CHAR and VARCHAR data types. Don’t specify more characters in the character columns that you need. This allows you to store more rows in your data and index pages, reducing the amount of I/O needed to read them. It also reduces the amount of data moved from the server to the client, reducing network traffic and latency. And it reduces the amount of wasted space in your buffer cache.
- If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value as does a character data type. This helps to reduce the size of the columns, and can boost performance when the columns are searched (WHERE clause), joined to another column, or sorted.
- Avoid using FLOAT or REAL data types for primary keys, as they add unnecessary overhead that hurts performance. Use one of the integer data types instead.
- When specifying data types during table creation, always specify NULL or NOT NULL for each column. If you don’t, then the column will default to NOT NULL if the ANSI NULL DEFAULT database option is not selected (the default), and will default to NULL if the ANSI NULL DEFAULT database option is selected.
- For best performance, and to reduce potential code bugs, columns should ideally be set to NOT NULL. For example, the use of the IS NULL keywords in the WHERE clause makes that portion of the query non-chargeable, which means that portion of the query cannot make good use of an index.
- If you are using fixed-length columns (CHAR, NCHAR) in your table, consider avoiding storing NULLs in them. If you do, the entire amount of space dedicated to the column will be used up. For example, if you have a fixed-length column of 255 characters and place a NULL in it, then 255 characters have to be stored in the database. This is a large waste of space that will cause SQL Server to have to perform extra disk I/O to read data pages. It also wastes space in the data cache buffer. Both of these contribute to reduced SQL Server performance.
- If you use the CONVERT function to convert a value to a variable-length datatype, such as VARCHAR, always specify the length of the variable datatype. If you do not, SQL Server assumes a default length of 30. Ideally, you should specify the shortest length to accomplish the required task. This helps to reduce memory use and SQL Server resources.
- Generally, using computed columns in a table is not recommended because it does not follow the standard rules of normalization. But, it is sometimes more efficient overall to use computed columns in a table rather than re-computing the same data repeatedly in queries. This is especially true if you are running the same query over and over against your data that performs the same calculations over and over. Performing the calculations in the table can reduce the amount of work performed by a query each time it is run. You have to determine for yourself where the performance bottleneck is, and act accordingly. If the bottleneck is in INSERTS and UPDATES, then using calculated columns may not be a good idea. But if your SELECT statements are the bottleneck, then using calculated columns may pay off.
- Avoid using the bigint data type unless you really need its additional storage capacity. The bigint data type uses 8 bytes of memory versus 4 bytes for the int data type.
- Avoid using the SQL Server sql_variant datatype. Besides being a performance hog, it significantly affects what you can do with the data stored as a sql_variant.
- For example, sql_variant columns cannot be a part of primary or foreign keys, can be used in indexes and unique keys if they are shorter than 900 bytes, cannot have an identity property, cannot be part of a computed column, must convert the data to another datatype when moving data to objects with other datatypes, are automatically converted to nvarchar(4000) when accessed by client applications using the SQL Server 7.0 OLE DB or ODBC providers, are not supported by the LIKE predicate in the WHERE clause, cannot be concatenated, and don’t work with some functions.
- Avoid using date data types as a primary key. From a performance perspective, it is more efficient to use a data type that uses less space. For example, the DATETIME datatype uses 8 bytes of space, while the INT datatype only takes up 4 bytes. The less space used, the smaller the table and index, and the less I/O overhead that is required to access the primary key.
- If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data faster than character data.