Hey guys! How are you? Excited for another post???

Introduction

Today I want to chat with you about something that can be a real “gotcha” for those who are migrating from SQL Server On-premises (or even Azure SQL Database) to Azure SQL Managed Instance (MI) and that I learned in practice.

You know that classic DBA routine of looking at a log file that has grown too big after heavy maintenance and thinking: “I'm going to run a Shrink here to free up this unused space”? Well, in Managed Instance, specifically in the General Purpose – Standard tier, this “good practice” can become your worst performance nightmare.

The Scenario: The Shrink Habit

In the On-premises world, keeping data and log files right-sized is almost a mantra, both to save storage and to keep things organized.

In my case, I was working in a very large environment and saw a log file with 900 GB of space allocated and less than 1 GB actually in use. Especially when it comes to log files, it's natural for any SQL Server DBA to immediately think about running a shrink to recover that allocated space, right? This is exactly where the problem lies when the database in question is in an Azure SQL Managed Instance on the General Purpose tier.
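Before deciding anything, it's worth confirming how much of the log is actually in use. A quick way to check is the standard `sys.dm_db_log_space_usage` DMV (a minimal sketch; run it in the context of the database you want to inspect):

```sql
-- Shows total log size vs. space actually used for the current database
SELECT
    DB_NAME() AS [Database Name],
    CAST(total_log_size_in_bytes / 1024.0 / 1024 / 1024 AS DECIMAL(10, 2)) AS [Log Size (GiB)],
    CAST(used_log_space_in_bytes / 1024.0 / 1024 / 1024 AS DECIMAL(10, 2)) AS [Log Used (GiB)],
    CAST(used_log_space_in_percent AS DECIMAL(5, 2)) AS [Log Used (%)]
FROM sys.dm_db_log_space_usage;
```

In the scenario above, this is the kind of query that shows you a 900 GB file with less than 1 GB used, the classic shrink temptation.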

Observation: In the case of a data file, a Shrink operation can greatly increase index and data fragmentation, so it is generally recommended to rebuild the indexes after the Shrink.

In Azure SQL Managed Instance, if you are using the newer tiers such as Next-gen General Purpose or Business Critical, you have greater flexibility and the problem I will detail in this post does not happen. But the vast majority of environments still run on General Purpose (Standard), and that is where this trap catches people.

Unlike a local disk where IOPS is usually fixed by the type of disk, in Managed Instance General Purpose, I/O performance (IOPS and Throughput) is directly proportional to the size you allocate to the file.

This means that Azure reserves performance for you based on the size of the file, whether it is the data file (mdf) or the log file (ldf). If you have a 1.1 TB log file, you get a 7,500 IOPS limit. If you run a shrink and that file drops to 100 GB, you have just reduced the file's IOPS to 500, cutting the performance of that disk (and of your database) by 15x, often without realizing it.

According to the official Microsoft documentation, scaling works in bands.

File size and IOPS ranges (per file, following the bands used in the script further below):

  • Up to 129 GiB: 500 IOPS, 100 MiB/s
  • 130 to 513 GiB: 2,300 IOPS, 150 MiB/s
  • 514 to 1,025 GiB: 5,000 IOPS, 200 MiB/s
  • From 1,026 GiB: 7,500 IOPS, 250 MiB/s

From 1,026 GiB onwards, IOPS and Throughput plateau at 7,500 and 250 MiB/s. This means that, for very large files, you stop gaining proportional performance. But the real danger lies in the first ranges, where a Shrink can drop your IOPS from 7,500 to 500 in the blink of an eye!

The Real Problem: The “Domino Effect”

Imagine that you ran a shrink on the log at 9 PM during a maintenance routine, and the file went from 600 GB to 50 GB. At 8 AM, several jobs are taking much longer to run, the application is slow and everyone is complaining to you. Since the log now has only 500 IOPS (instead of 5,000), every INSERT, UPDATE or DELETE takes far longer to harden its commit to disk.

The result? LOG_RATE_GOVERNOR or WRITELOG wait types at the top, a slow application, users complaining, and when you look at the CPU everything is low. Then you think: "But I didn't change anything, I just cleaned up the log!" lol.
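To confirm this is what is biting you, you can look at the accumulated wait statistics through the standard `sys.dm_os_wait_stats` DMV. A simple sketch (keep in mind these counters are cumulative since the last restart or stats clear):

```sql
-- Log-related waits accumulated on the instance
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS [Avg Wait (ms)]
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('WRITELOG', 'LOG_RATE_GOVERNOR', 'INSTANCE_LOG_RATE_GOVERNOR')
ORDER BY wait_time_ms DESC;
```

If WRITELOG dominates with a high average wait right after a shrink, the log file's IOPS band is the first suspect.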

And now, what to do?
The recommendation here changes the mindset of the traditional DBA a little:

  • Avoid Shrink: If your log file has grown to 200 GB and you know that it periodically needs that space for maintenance or loads, leave it there. The cost of extra storage is much lower than the cost of downtime or application slowness.
  • Monitor the IOPS ranges: You can still run a Shrink, but be careful not to drop into a lower IOPS range, and evaluate whether that reduction in disk speed would have an impact. Before shrinking a file, check the documentation to make sure you won't fall into a lower "performance range".
  • Evaluate Next-gen GP: If disk performance is a constant bottleneck and you need flexibility, consider moving to the Next-gen General Purpose tier, where IOPS is isolated from file size (you can pay for as much IOPS as you want).

To help you identify the speed range of each file in your environment, I prepared a simple script to help you on a daily basis:

SELECT 
    DB_NAME(database_id) AS [Database Name],
    [name] AS [Logical Name],
    [type_desc] AS [File Type],
    CAST(size * 8.0 / 1024 / 1024 AS DECIMAL(10, 2)) AS [Size (GiB)],
    
    -- Logic based on the official IOPS tiers
    CASE 
        WHEN (size * 8.0 / 1024 / 1024) <= 129 THEN 500
        WHEN (size * 8.0 / 1024 / 1024) <= 513 THEN 2300
        WHEN (size * 8.0 / 1024 / 1024) <= 1025 THEN 5000
        ELSE 7500
    END AS [Official IOPS],

    -- Logic based on the official Throughput tiers (MiB/s)
    CASE 
        WHEN (size * 8.0 / 1024 / 1024) <= 129 THEN 100
        WHEN (size * 8.0 / 1024 / 1024) <= 513 THEN 150
        WHEN (size * 8.0 / 1024 / 1024) <= 1025 THEN 200
        ELSE 250
    END AS [Official Throughput (MiB/s)],
    
    physical_name AS [Physical Path]
FROM 
    sys.master_files
WHERE 
    database_id > 4 
ORDER BY 
    [Size (GiB)] DESC;

Conclusion

If you're seeing high WRITELOG waits, take a look at your log file size. If it is 100 GB, you are limited to 500 IOPS. If you grow that file to 514 GB (even if you don't use all that space), Azure gives you 5,000 IOPS instantly!
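Growing a file to land in a higher band is a one-line ALTER DATABASE. A quick sketch (the database and logical file names here are hypothetical placeholders, so swap in your own, and remember the extra allocated storage is billed):

```sql
-- Grows the log file to 514 GB to reach the 5,000 IOPS band
-- (MyDatabase / MyDatabase_log are placeholder names)
ALTER DATABASE [MyDatabase]
MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 514 GB);
```

You can find the logical name to use in NAME with `SELECT name FROM sys.database_files;` inside the database.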

It's worth remembering that this applies to the size of data files too; it's not just the log file.

Sometimes “wasting” a little storage is the best investment you can make in the performance of your database in Azure.

I hope you enjoyed this update, a big hug and see you next time!