SQL Server and Azure SQL: How to delete or update data in large tables

Introduction

Hey guys!
In this article I want to share a small piece of code that I needed today to UPDATE a relatively large table (55M+ records) in an Azure SQL Database. After 1 hour and 30 minutes of waiting, there was a connection error and I had to start all over again.

A connection error is not the only risk: a single huge operation can also fill the transaction log and fail. As we know, when an error occurs during an UPDATE or DELETE, an automatic rollback starts and no rows are actually changed. Breaking the single operation into smaller batches, each committed on its own, allows the log to be truncated between batches (after a checkpoint or a log backup, depending on the recovery model) and so reduces the chance of filling it.
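
If you want to watch the log while a long operation runs, one option is to query the log space DMV from a separate session. This is only a minimal sketch: it assumes your login is allowed to read sys.dm_db_log_space_usage, and the column aliases are just illustrative names.

```sql
-- Run in the database being modified, from a separate session.
-- Shows how much of the transaction log is currently in use.
SELECT
    total_log_size_in_bytes / 1048576.0 AS Log_Total_MB,
    used_log_space_in_bytes / 1048576.0 AS Log_Usado_MB,
    used_log_space_in_percent           AS Log_Usado_Pct
FROM
    sys.dm_db_log_space_usage
```

If the used percentage keeps climbing during a single statement and drops again between batches, the batch size is probably doing its job.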

You may also want to break these large, time-consuming operations into smaller chunks so you can track progress, or because your maintenance window is not long enough to process the UPDATE/DELETE on all the necessary rows and you want to continue in another window.

I've also seen tables with triggers, which can generate a very large overhead when an UPDATE/DELETE changes many rows. We also have to remember the locks on the table: when the operation is broken into smaller parts, they are released quickly between one batch and the next.

There are several reasons that can influence the decision to partition a large UPDATE/DELETE, and in this article I will show some easy ways to do this.

Important: If you just want to delete or change the first or last records of a table, you can use the solution I shared in the article SQL Server - How to perform UPDATE and DELETE with TOP x records.

How to delete or update data in large tables

UPDATE partitioned by integer field

In the example below, I'm using an auto-increment integer column to build the ranges of values that I'm going to update. The @Aumento variable defines the width of the Id range for each batch. In this example each batch covers 1 million Ids, so it updates at most 1 million rows at a time (fewer if there are gaps in the sequence).

DECLARE
    @Min INT,
    @Max INT,
    @Contador INT = 0,
    @Aumento INT = 1000000, -- width of the Id_Registro range covered by each batch
    @LimiteInferior INT,
    @LimiteSuperior INT,
    @Msg VARCHAR(MAX)
    

-- Find the range of Ids that has to be covered
SELECT 
    @Min = MIN(Id_Registro),
    @Max = MAX(Id_Registro)
FROM
    dbo.Tabela

-- Start from the smallest Id (also works when the Ids are negative)
SET @LimiteSuperior = @Min


-- <= (not <) so the row with Id_Registro = @Max is not skipped
-- when a range happens to end exactly on it
WHILE(@LimiteSuperior <= @Max)
BEGIN
    

    -- Half-open range for this batch: [@LimiteInferior, @LimiteSuperior)
    SET @LimiteInferior = @Min + (@Contador * @Aumento)
    SET @LimiteSuperior = @LimiteInferior + @Aumento

    
    UPDATE
        A
    SET
        Dt_Registro = (CASE
            WHEN ISNUMERIC(A.Data_Registro_String) = 1 THEN CONVERT(DATE, DATEADD(DAY, TRY_CONVERT(INT, A.Data_Registro_String), '1900-01-01'))
            WHEN LEN(A.Data_Registro_String) = 10 AND TRY_CONVERT(DATE, A.Data_Registro_String, 103) IS NOT NULL THEN CONVERT(DATE, A.Data_Registro_String, 103)
            ELSE CONVERT(DATE, A.Data_Registro_String)
        END)
    FROM
        dbo.Tabela A
    WHERE
        Id_Registro >= @LimiteInferior
        AND Id_Registro < @LimiteSuperior


    SET @Contador += 1


    -- WITH NOWAIT sends the progress message to the client immediately
    SET @Msg = CONCAT('Processing data in range ', @LimiteInferior, '-', @LimiteSuperior, '...')
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
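
The same range loop can be made resumable, for the case mentioned in the introduction where the maintenance window ends before all rows are processed. The sketch below is only one possible way to do it. The control table dbo.Controle_Batch, its columns, the process name and the 60-minute limit are all hypothetical, and the SET expression is simplified for illustration.

```sql
SET XACT_ABORT ON -- any error rolls back the current batch together with its saved position

-- Hypothetical control table: stores the next lower bound to process
IF OBJECT_ID('dbo.Controle_Batch') IS NULL
    CREATE TABLE dbo.Controle_Batch (
        Nm_Processo VARCHAR(100) NOT NULL PRIMARY KEY,
        Proximo_Id INT NOT NULL
    )

DECLARE
    @Max INT,
    @Aumento INT = 1000000,
    @LimiteInferior INT,
    @LimiteSuperior INT,
    @Dt_Fim DATETIME2 = DATEADD(MINUTE, 60, SYSDATETIME()) -- assumed 60-minute window

SELECT @Max = MAX(Id_Registro) FROM dbo.Tabela

-- Resume from the saved position, if a previous run left one
SELECT @LimiteInferior = Proximo_Id
FROM dbo.Controle_Batch
WHERE Nm_Processo = 'Update_Dt_Registro'

IF (@LimiteInferior IS NULL)
BEGIN
    -- First run: start from the smallest Id
    SELECT @LimiteInferior = MIN(Id_Registro) FROM dbo.Tabela

    IF (@LimiteInferior IS NULL)
        RETURN -- empty table, nothing to do

    INSERT INTO dbo.Controle_Batch (Nm_Processo, Proximo_Id)
    VALUES ('Update_Dt_Registro', @LimiteInferior)
END

-- Stop when the last Id is covered or when the time budget runs out
WHILE (@LimiteInferior <= @Max AND SYSDATETIME() < @Dt_Fim)
BEGIN

    SET @LimiteSuperior = @LimiteInferior + @Aumento

    BEGIN TRANSACTION

    UPDATE dbo.Tabela
    SET Dt_Registro = TRY_CONVERT(DATE, Data_Registro_String, 103) -- simplified, illustration only
    WHERE Id_Registro >= @LimiteInferior
        AND Id_Registro < @LimiteSuperior

    -- Save the position in the same transaction as the batch,
    -- so the two can never get out of sync
    UPDATE dbo.Controle_Batch
    SET Proximo_Id = @LimiteSuperior
    WHERE Nm_Processo = 'Update_Dt_Registro'

    COMMIT TRANSACTION

    SET @LimiteInferior = @LimiteSuperior

END
```

Running the script again in the next window picks up from the saved Proximo_Id instead of starting over, which also helps after a dropped connection.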

UPDATE partitioned by date field

In the example below, I'm using a date column to build the ranges of values that I'm going to update. The @Aumento variable defines the number of days in each batch, and in this example each batch covers 30 days.

DECLARE
    @Min DATE,
    @Max DATE,
    @Contador INT = 0,
    @Aumento INT = 30, -- number of days covered by each batch
    @LimiteInferior DATE,
    @LimiteSuperior DATE,
    @Msg VARCHAR(MAX)
    

-- Find the range of dates that has to be covered
-- (rows where Dt_Cadastro is NULL are not touched by this loop)
SELECT 
    @Min = MIN(Dt_Cadastro),
    @Max = MAX(Dt_Cadastro)
FROM
    dbo.Tabela

-- Start from the oldest date
SET @LimiteSuperior = @Min


-- <= (not <) so the rows from the last day (@Max) are not skipped
-- when a range happens to end exactly on it
WHILE(@LimiteSuperior <= @Max)
BEGIN
    

    -- Half-open range for this batch: [@LimiteInferior, @LimiteSuperior)
    SET @LimiteInferior = DATEADD(DAY, (@Contador * @Aumento), @Min)
    SET @LimiteSuperior = DATEADD(DAY, @Aumento, @LimiteInferior)

    
    UPDATE
        A
    SET
        Dt_Registro = (CASE
            WHEN ISNUMERIC(A.Data_Registro_String) = 1 THEN CONVERT(DATE, DATEADD(DAY, TRY_CONVERT(INT, A.Data_Registro_String), '1900-01-01'))
            WHEN LEN(A.Data_Registro_String) = 10 AND TRY_CONVERT(DATE, A.Data_Registro_String, 103) IS NOT NULL THEN CONVERT(DATE, A.Data_Registro_String, 103)
            ELSE CONVERT(DATE, A.Data_Registro_String)
        END)
    FROM
        dbo.Tabela A
    WHERE
        Dt_Cadastro >= @LimiteInferior
        AND Dt_Cadastro < @LimiteSuperior


    SET @Contador += 1


    -- WITH NOWAIT sends the progress message to the client immediately
    SET @Msg = CONCAT('Processing data in range ', CONVERT(VARCHAR(10), @LimiteInferior, 103), '-', CONVERT(VARCHAR(10), @LimiteSuperior, 103), '...')
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
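
Date ranges of the same length can hold very different numbers of rows, so it can be worth previewing the batches before running the UPDATE. The query below is a sketch of such a dry run: it groups the rows into the same 30-day blocks used by the loop and counts the rows in each. The alias Nr_Bloco is an illustrative name, and keep in mind that this scans the whole table, which may itself be slow on a very large one.

```sql
DECLARE
    @Min DATE,
    @Aumento INT = 30 -- same batch size (in days) used by the UPDATE loop

SELECT @Min = MIN(Dt_Cadastro) FROM dbo.Tabela

SELECT
    DATEADD(DAY, B.Nr_Bloco * @Aumento, @Min)       AS LimiteInferior,
    DATEADD(DAY, (B.Nr_Bloco + 1) * @Aumento, @Min) AS LimiteSuperior,
    COUNT_BIG(*)                                    AS Qt_Linhas
FROM
    dbo.Tabela A
    -- Block number = whole days since @Min, integer-divided by the batch size
    CROSS APPLY (
        SELECT DATEDIFF(DAY, @Min, A.Dt_Cadastro) / @Aumento AS Nr_Bloco
    ) B
WHERE
    A.Dt_Cadastro IS NOT NULL
GROUP BY
    B.Nr_Bloco
ORDER BY
    B.Nr_Bloco
```

If one block is far larger than the others, a smaller @Aumento (or the row-count approach shown later) may give more even batches.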

DELETE TOP(N) partitioned using percentage

In the example below, each iteration deletes 10% of the rows that still match the filter. Since the data is being deleted, I don't need to control ranges: I just keep deleting 10% at a time until nothing is left.

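DECLARE 
    @Msg VARCHAR(MAX),
    @Qt_Linhas INT

WHILE (1=1)
BEGIN
    
    -- Deletes 10% of the rows that currently match the WHERE clause
    DELETE TOP(10) PERCENT
    FROM dbo.Tabela
    WHERE [Status] = 6

    -- Capture @@ROWCOUNT right away, before another statement resets it
    SET @Qt_Linhas = @@ROWCOUNT

    -- Nothing left to delete: leave the loop
    IF (@Qt_Linhas = 0)
        BREAK

    SET @Msg = CONCAT('Rows deleted: ', @Qt_Linhas)
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END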

Observation: This solution can have problems with very large tables, as 10% can represent a very large volume of rows. And when few records are left, each 10% is only a handful of rows, so it can take many iterations to finish.
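
To get a feel for how many iterations the percentage approach needs, you can simulate the loop with plain arithmetic, without touching any table. This sketch assumes 1,000,000 rows match the filter (an arbitrary number) and relies on the documented behavior that TOP ... PERCENT rounds a fractional row count up to the next integer. The variable names are illustrative.

```sql
DECLARE
    @Restantes BIGINT = 1000000, -- assumed number of rows matching the filter
    @Apagadas BIGINT,
    @Iteracoes INT = 0

WHILE (@Restantes > 0)
BEGIN

    -- Rows removed by one DELETE TOP(10) PERCENT: 10% of what is left, rounded up
    SET @Apagadas = CEILING(@Restantes * 0.10)

    SET @Restantes -= @Apagadas
    SET @Iteracoes += 1

END

-- Number of DELETE statements that remove rows (the real loop runs one extra,
-- empty iteration to detect that nothing is left)
SELECT @Iteracoes AS Qt_Iteracoes
```

The first batches are large and the last ones remove a single row each, which is exactly the long tail described in the observation above.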

DELETE TOP(N) partitioned using number of rows

In the example below, I am deleting 100,000 rows from my table with each iteration. Since the data is being deleted, I don't need to control ranges: I just keep deleting 100,000 rows at a time until there are no more rows left that meet the filter criteria.

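DECLARE 
    @Msg VARCHAR(MAX),
    @Qt_Linhas INT

WHILE (1=1)
BEGIN
    
    -- Deletes up to 100,000 matching rows per iteration
    DELETE TOP(100000)
    FROM dbo.Tabela
    WHERE [Status] = 6

    -- Capture @@ROWCOUNT right away, before another statement resets it
    SET @Qt_Linhas = @@ROWCOUNT

    -- Nothing left to delete: leave the loop
    IF (@Qt_Linhas = 0)
        BREAK

    SET @Msg = CONCAT('Rows deleted: ', @Qt_Linhas)
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END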
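
DELETE TOP(N) removes an arbitrary set of N matching rows. If you would rather delete in key order and give other sessions a chance to get their locks between batches, a variation like the one below may help. It is only a sketch: it assumes dbo.Tabela has the Id_Registro column from the first example, the CTE name Lote is illustrative, and the 1-second pause is an arbitrary value to tune.

```sql
DECLARE 
    @Msg VARCHAR(MAX),
    @Qt_Linhas INT

WHILE (1=1)
BEGIN

    -- The CTE picks the next 100,000 matching rows in Id_Registro order;
    -- deleting from the CTE deletes those rows from dbo.Tabela
    ;WITH Lote AS (
        SELECT TOP (100000) Id_Registro
        FROM dbo.Tabela
        WHERE [Status] = 6
        ORDER BY Id_Registro
    )
    DELETE FROM Lote

    SET @Qt_Linhas = @@ROWCOUNT

    IF (@Qt_Linhas = 0)
        BREAK

    SET @Msg = CONCAT('Rows deleted: ', @Qt_Linhas)
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

    -- Assumed 1-second pause so other sessions can work between batches
    WAITFOR DELAY '00:00:01'

END
```

The pause makes the total run longer, so it is a trade-off between finishing quickly and staying out of the way of other workloads.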

And that's it, folks!
I hope you liked it. A big hug!

Colli said:

November 9, 2022 at 12:59

Good tips. But for deletes I always use TRUNCATE TABLE, since you're going to delete everything anyway! Performance is also much better, it uses fewer log resources, doesn't fire triggers, and it also clears all of the table's pages.

- Dirceu Resende said:
  
  November 9, 2022 at 13:07
  
  Hi Colli, good idea, but my point in this post is data purging, where not all of the data would be deleted, so TRUNCATE wouldn't work there.
  
Charles said:

November 9, 2022 at 06:52

Good tips. But for deletes, to avoid wasting time doing a count, I like to pass 1=2 in the WHILE, and after the delete use
if @@rowcount = 0 break
The first counts can take a long time if the table is very large, or if there is a WHERE clause to delete only part of the table.

- Dirceu Resende said:
  
  November 9, 2022 at 13:07
  
  Good idea! It really does run faster.
