Hey guys!
In this article I will demonstrate to you when you should use ORDER BY and when you shouldn't use it at all, because it has no effect in practice and just makes our query take longer and consume more resources.
The main purpose of this article is to break the myth that data is physically ordered in the table when you do INSERT… FROM SELECT and ORDER BY, causing many programmers to insist on using ORDER BY in INSERT operations, a scenario that I encounter a lot among consulting clients and is much more common than I would like.
To start, I'm going to do a very simple test with this, creating a new table and inserting the data into the table from a query in sys.objects:
CREATE TABLE dbo.Post_OrderBy (
Id INT IDENTITY(1,1) NOT NULL,
Nome VARCHAR(900) NOT NULL
)
INSERT INTO dbo.Post_OrderBy
SELECT [name]
FROM sys.objects
ORDER BY name
SELECT * FROM dbo.Post_OrderBy
And when we analyze the results, we see that even using ORDER BY in INSERT, the records are not being ordered:

Well, the myth has been debunked. This theory that it is worth ordering data in INSERT so that you do not need to use ORDER BY in SELECT has fallen apart. In other words, in this type of scenario, you should NEVER use ORDER BY to insert data into a table, unless you have a TOP clause in the SELECT.
Now I will explain why this happens. As I already explained in my article Understanding how indexes work in SQL Server, the index is responsible for ordering these records, since indexes generally use algorithms such as QuickSort, which require the data to be ordered to obtain maximum performance in Seek operations.
To demonstrate how the index orders the data, I will create a clustered and a nonclustered index on the table and then query the data:
CREATE CLUSTERED INDEX SK01_Post_OrderBy ON dbo.Post_OrderBy(Id) WITH(FILLFACTOR=100)
CREATE NONCLUSTERED INDEX SK02_Post_OrderBy ON dbo.Post_OrderBy(Nome) WITH(FILLFACTOR=100)
Did you see how the data was ordered? Pretty cool, right 🙂
So can we trust that the indexes will keep the data in my table always ordered and I no longer need to use ORDER BY even in queries? NO!! Indexes are subject to data fragmentation as they are inserted/deleted/updated, losing order on the index pages.
In the example of the index I created above, I specified the FILLFACTOR of the index as 100%, that is, the index will sort all the data during creation and will store it in an orderly manner, trying to occupy all pages of the index (each page contains 8 KB, between data and headers), without leaving any vacant space for any new data that is inserted/updated in the table. In other words, as SQL Server does not have free space in the index, if I insert a new record in this table, that record will not be inserted in order but at the end of the last page (or a new page will be created for it, if they are all already full).
After creating my index, where it ordered the records, the fragmentation of my indexes should be very close to zero:
SELECT
OBJECT_NAME(B.object_id) AS TableName,
B.name AS IndexName,
A.index_type_desc AS IndexType,
A.avg_fragmentation_in_percent
FROM
sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') A
INNER JOIN sys.indexes B WITH(NOLOCK) ON B.object_id = A.object_id AND B.index_id = A.index_id
WHERE
OBJECT_NAME(B.object_id) = 'Post_OrderBy'
ORDER BY
A.avg_fragmentation_in_percent DESC
And now let's force data fragmentation using the T-SQL command below:
-- Força a fragmentação do índice
DECLARE @Contador INT = 1, @Total INT = (SELECT COUNT(*) FROM dbo.Post_OrderBy)
WHILE(@Contador <= @Total)
BEGIN
UPDATE dbo.Post_OrderBy
SET Nome = REPLICATE(CHAR(65 + (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 57), (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 10)
WHERE Id = @Contador
INSERT INTO dbo.Post_OrderBy
SELECT [name]
FROM sys.objects
WHERE SUBSTRING([name], 1, 1) NOT IN ('D', 'H', 'M', 'T', 'S')
SET @Contador += 1
END
Analyzing the index fragmentation level:

Did my data continue to be ordered with this level of fragmentation?

Vixi.. Now everything is messed up.. To correct this fragmentation problem caused by the script I used above, we can use the REORGANIZE (Recommendation: 5 to 30% fragmentation) and REBUILD (Recommendation: Above 30% fragmentation) operations to reorder the data in the index:
ALTER INDEX ALL ON dbo.Post_OrderBy REBUILD
Or REBUILD/REORGANIZE can also be done on individual indexes:
ALTER INDEX SK01_Post_OrderBy ON dbo.Post_OrderBy REBUILD
ALTER INDEX SK02_Post_OrderBy ON dbo.Post_OrderBy REBUILD
As a result, our fragmentation decreased:

And the data is sorted again. But can we rely on index ordering alone to return sorted data? The answer is: NO! THE UNIQUE A reliable way to return data in an ordered way is through ORDER BY.
But here comes the key question: Do you really need to sort this data in the database? In many scenarios (not to say most), the answer is NO. The data can be perfectly queried in the database without sorting, returned to the application and sorted there, in the user interface. The problem with this approach is that it “gives more work” to the Developer, and therefore, we rarely see this scenario happen on a daily basis, which is a shame, as it is the ideal scenario, as it doesn’t change anything for the end user and the bank is not burdened with several sorting operations in “heavy” queries.
Are there scenarios in which ORDER BY in the database query is justifiable? Yes definitely! Especially in scenarios where there is the TOP operator to return the TOP N largest/smallest records according to some sorting criteria. In this case, ORDER BY is recommended, as it is better to sort a table of 10 million records in the database and return the 10 largest, than to return all 10 million records and filter/sort this in the application.
Well guys, I hope you enjoyed this article and was able to clarify some doubts about indexes, fragmentation, fillfactor (I will create a more detailed post about this later) and when ORDER BY should be used and when it should NEVER be used.
If you are a Developer, I strongly urge you to consider removing ORDER BY from queries and start ordering these records in the application layer instead of the database. The DBA and the database thank you 🙂
Do you like studying Performance Tuning? So don't waste time and start reading my articles in the Series Performance Tuning and I hope you finish these readings with another view of your database.
A big hug to you and see you in the next post!


Comentários (0)
Carregando comentários…