SQL Server and Azure SQL: How to Delete or Update Data in Large Tables
Post Views 1,775 views
Reading time 4minutes
Introduction
Hey guys!
In this article I would like to share with you a small code that I needed to use today to UPDATE a relatively large table (55M+ records) in an Azure SQL Database base and, after 1h and 30 mins waiting, there was a connection error and I had to do it all over again.
Not only in this case, but it can happen to overflow the log and give an error in the operation too, and, as we know, when an error occurs during UPDATE or DELETE, the automatic rollback is started and no row is actually changed. Breaking this single operation into smaller, segmented operations will allow the log to be flushed and thus minimize possible log overflows.
You may also want to break these large, time-consuming operations into smaller chunks to be able to track the progress, or your maintenance window is not long enough to process the UPDATE/DELETE on all the necessary lines and you want to continue in another window.
I've also seen cases of tables with triggers and that can end up generating a very large overhead when executing an UPDATE/DELETE that changes many rows. And we also have to remember locks on the table, which, when broken into smaller parts, can be released quickly while the next batch is starting processing.
There are several reasons that can influence the decision to participate in a large UPDATE/DELETE and in this article I will show some easy ways to do this.
In the example below, I'm using an integer type autoincrement column to assemble the ranges of values that I'm going to update. The @Increase variable defines the number of lines of each batch that will be updated and I defined in this example that the number of lines updated at a time will be 1 million lines.
SET@Msg=CONCAT('Processando dados no intervalo ',@LimiteInferior,'-',@LimiteSuperior,'...')
RAISERROR(@Msg,1,1)WITHNOWAIT
END
Result:
UPDATE partitioned by date field
In the example below, I'm using a date column to assemble the ranges of values that I'm going to update. The @Aumento variable defines the number of days for each batch that will be updated and I defined in this example that this number of days that will be updated for each block will be 30 days.
SET@Msg=CONCAT('Processando dados no intervalo ',CONVERT(VARCHAR(10),@LimiteInferior,103),'-',CONVERT(VARCHAR(10),@LimiteSuperior,103),'...')
RAISERROR(@Msg,1,1)WITHNOWAIT
END
Result:
DELETE TOP(N) partitioned using percentage
In the example below, I am deleting 10% of my table every iteration. As the data is erased, I don't need to control intervals, I just erase every 10%.
DELETE TOP(N) partitioned using percentage
Transact-SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
DECLARE
@MsgVARCHAR(MAX),
@Qt_LinhasINT
WHILE(1=1)
BEGIN
DELETETOP(10)PERCENT
FROMdbo.Tabela
WHERE[Status]=6
SET@Qt_Linhas=@@ROWCOUNT
IF(@Qt_Linhas=0)
BREAK
SET@Msg=CONCAT('Quantidade de Linhas Apagadas: ',@Qt_Linhas)
RAISERROR(@Msg,1,1)WITHNOWAIT
END
Result:
Observation: This solution can have problems with very large tables, as 10% can represent a very large volume of rows. And when few records are left, the 10% can take many iterations to clear.
DELETE TOP(N) partitioned using number of rows
In the example below, I am deleting 500K rows from my table with each iteration. As the data is erased, I don't need to control intervals, I just erase every 500,000 lines until there are no more lines left that meet the filter criteria.
DELETE TOP(N) partitioned using number of rows
Transact-SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
DECLARE
@MsgVARCHAR(MAX),
@Qt_LinhasINT
WHILE(1=1)
BEGIN
DELETETOP(100000)
FROMdbo.Tabela
WHERE[Status]=6
SET@Qt_Linhas=@@ROWCOUNT
IF(@Qt_Linhas=0)
BREAK
SET@Msg=CONCAT('Quantidade de Linhas Apagadas: ',@Qt_Linhas)
RAISERROR(@Msg,1,1)WITHNOWAIT
END
Result:
And that's it, folks!
I hope you liked it and a big hug!
Boas dicas. Mas no caso do delete eu sempre uso o TRUNCATE TABLE. Já que vai deletar tudo mesmo! Inclsuive a perfomance é bem maior, utiliza menos recurso de log, não aciona trigger e ainda limpa todas as páginas da tabela.
Oi Colli, boa ideia, mas a minha ideia no post é pra expurgo de dados, onde não seriam todos os dados apagados, aà no truncate não iria funcionar.
Boas dicas. Mas no caso do delete para não perder tempo fazendo count, eu gosto de no while passar 1=2, e após o delete passar
if @@rowcount = 0 break
Os primeiros counts podem demorar muito caso a tabela seja muito grande, ou se tiver uma cláusula where para apagar apenas parte da tabela.
Boa ideia! Realmente fica mais rápido mesmo.