Hey everyone! How are you? Ready for another post?
Recently, I had the honor of participating in a sensational live broadcast on CanaldotNET, alongside my great friends Renato Groffe and Thiago Bertuzzi. There were more than three hours of intense conversation about the data ecosystem, changes in the role of the DBA and the AI trends that are knocking on our door.
As the content was extremely dense and rich, I decided to create this post as a practical summary of what we discussed, detailing each relevant topic for those who live their daily lives in the data "trenches". If you want to understand where the market is going and how not to be left behind, this post is for you.
For those who want to check out the full video and practical demonstrations by Renato Groffe and Bertuzzi, the link is here:
The Golden Square of Databases (0:15:20)
We started the conversation by demystifying the emergence of new databases every week. Although technologies like SurrealDB or CockroachDB bring interesting innovations, the mission-critical enterprise market is still dominated by what I call the “Golden Square”: SQL Server, Oracle, MySQL and PostgreSQL.
My view as a DBA is pragmatic: fads emerge all the time, but the solid job market and mission-critical systems revolve around these 4 main DBMSs. NoSQL is often sold as a replacement for relational, but in practice it usually serves as a quick ingestion layer or cache, while the source of truth ends up living in a robust relational database because of ACID consistency.
PostgreSQL has gained enormous prominence, not only because it is open source, but because of its ease of extension. However, SQL Server remains the most complete tool in terms of management, diagnostics and integration with the Microsoft ecosystem.
AI and the Rise of Vector Banks (0:42:15)
One of the highlights of the live was the discussion about Vector Stores and Embeddings. With the advancement of LLMs (such as GPT-4), databases now need to deal with unstructured data transformed into numeric vectors for semantic searches. Generative AI is not just chat. For real applications (RAG – Retrieval-Augmented Generation), we need Vector Stores.
I explained how pgvector gave Postgres a competitive advantage, but I highlighted that SQL Server is already integrating native support for vectors. This will allow us to perform "proximity of meaning" searches directly via T-SQL, without having to export data to niche databases.
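The idea behind these semantic searches can be sketched in a few lines of plain Python. This is a toy illustration with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and a real vector store indexes them for speed), but the ranking logic is the same thing pgvector's distance operators do:

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: closer to 1.0 = closer in meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings" for three stored documents.
documents = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "vacation photo": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # embedding of the user's question

# Rank documents by semantic proximity, as a vector store would.
ranked = sorted(documents,
                key=lambda d: cosine_similarity(query, documents[d]),
                reverse=True)
print(ranked)  # "invoice" and "receipt" come first; the photo comes last
```

Note that "invoice" and "receipt" rank close together even though the strings share no words: proximity is computed on the vectors, not on the text.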
We also discussed using Semantic Kernel to abstract this data layer, allowing you to swap the AI model or the vector database without rewriting all the business logic.
Career: The “End” of the DBA and the Emergence of the Data Engineer (1:05:40)
We discussed at length how the role of the traditional DBA has changed. Today it is not enough to know how to install SQL Server and create users. The modern professional needs to understand Infrastructure as Code (IaC), containers and automation.
The line between DBA and Data Engineer is increasingly blurred. While the DBA focuses on the stability and performance of the engine, the engineer focuses on the fluidity of the data pipeline. But one thing is certain: both need to be masters of SQL.
Architecture for IoT and Data Streaming (1:10:15)
A common question in the chat was about where to store sensor and IoT data. The secret here is not the database, it is the architecture. Throwing millions of events per second directly into a relational table will create a bottleneck and severe WRITELOG waits.
The correct flow generally involves:
- Ingestion via Kafka or Azure Event Hubs.
- Raw storage in a Data Lake (ADLS).
- Processing via Spark (Databricks or Fabric) in micro-batches.
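The micro-batch step deserves a closer look: instead of one write per event, the stream is grouped into fixed time windows and only the aggregates hit the relational database. A minimal sketch of that windowing logic in plain Python (simulated events instead of a real Kafka consumer, and a hypothetical 30-second window):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Simulated sensor events as (timestamp, sensor_id, value) tuples --
# in production these would arrive via Kafka or Azure Event Hubs.
base = datetime(2024, 1, 1, 12, 0, 0)
events = [(base + timedelta(seconds=s), "sensor-1", 20 + s % 3)
          for s in range(0, 90, 10)]

def micro_batches(events, window_seconds=30):
    # Group raw events into fixed time windows, the way a Spark
    # micro-batch job would before writing aggregates to the database.
    batches = defaultdict(list)
    for ts, sensor, value in events:
        window_start = ts.replace(second=(ts.second // window_seconds) * window_seconds)
        batches[window_start].append(value)
    # One aggregated row (the average) per window instead of one row per event.
    return {w: sum(v) / len(v) for w, v in sorted(batches.items())}

for window, avg in micro_batches(events).items():
    print(window.time(), round(avg, 2))
```

Here 9 raw events collapse into 3 aggregated rows: that reduction in write volume is exactly what keeps the relational database out of the hot path.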
Time Series with TimescaleDB (1:25:40)
If your problem is specifically Time Series, TimescaleDB has been highly praised. It is a Postgres extension that optimizes the writing and reading of chronological data, maintaining all the flexibility of SQL.
Other options mentioned were Cosmos DB, Elasticsearch and Redis.
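The core trick behind time-series engines like TimescaleDB is bucketing rows by time (what its `time_bucket` function does in SQL). A plain-Python sketch of the same idea, with fabricated readings every 15 minutes aggregated per hour:

```python
from datetime import datetime, timedelta

# Hypothetical readings every 15 minutes: (timestamp, value) pairs.
start = datetime(2024, 1, 1, 0, 0)
readings = [(start + timedelta(minutes=15 * i), float(i)) for i in range(8)]

def hourly_buckets(rows):
    # Truncate each timestamp to the start of its hour and average the
    # values in each bucket -- the plain-Python analogue of
    # time_bucket('1 hour', ts) with AVG(value) in TimescaleDB.
    buckets = {}
    for ts, value in rows:
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(bucket, []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

print(hourly_buckets(readings))
```

The difference is that TimescaleDB also partitions the physical storage by these time chunks (hypertables), so queries over a recent window never touch old data.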
Cloud Migration and the Cost of IOPS (1:32:10)
We talked about the challenges of migrating On-Premises environments to Azure SQL or AWS RDS. The biggest mistake I see is "Lift and Shift" without performance planning.
In the cloud, IOPS is money. If you have a poorly written query that scans a table with millions of rows, you are not just slowing down the system; you are burning the company's budget. We discussed tools like Azure Database Migration Service (DMS) to facilitate this process with minimal downtime.
Cheap SQL Server on Azure for POCs and MVPs (1:45:10)
A lot of people think Azure SQL is expensive, but I showed on screen how to get started with the DTU model (Basic or S0). You can get a real SQL Server database, with automatic backup and high availability, paying between R$25.00 and R$80.00 per month.
Provisioned vs. Serverless on Azure (2:05:30)
I explained the crucial difference between these two models:
- Provisioned: You reserve the resource and pay 24x7, since Azure SQL Database has no option to turn the database off. Ideal for constant workloads.
- Serverless: The database "pauses" when no one is using it. It's perfect for test environments or BI processes that only run in the early hours of the morning.
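The trade-off between the two models comes down to simple arithmetic: Serverless bills a higher hourly rate, but only while the database is active. With hypothetical rates (real Azure pricing depends on tier and region), the break-even looks like this:

```python
# Hypothetical rates to illustrate the Provisioned vs Serverless trade-off.
provisioned_per_hour = 0.50   # billed 24x7, the database never turns off
serverless_per_hour = 0.75    # higher rate, billed only while active

def monthly_cost(active_hours_per_day, days=30):
    provisioned = provisioned_per_hour * 24 * days
    serverless = serverless_per_hour * active_hours_per_day * days
    return provisioned, serverless

for hours in (2, 8, 20):
    p, s = monthly_cost(hours)
    winner = "serverless" if s < p else "provisioned"
    print(f"{hours:>2}h/day active -> provisioned ${p:.0f} vs serverless ${s:.0f} ({winner})")
```

With these assumed rates, a BI database active 2 hours a night is far cheaper as Serverless, while a system busy 20 hours a day should stay Provisioned.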
Microsoft Fabric and OneLake: The Analytics Revolution (2:05:45)
We went deep into Microsoft Fabric. The proposal to unify storage in a single place (OneLake) using the open Delta/Parquet format is a watershed moment.
The Shortcut concept is phenomenal: you can "point" to data sitting in AWS S3 and query it from within Fabric as if it were a local table, without physically moving a single byte (zero ETL). This solves the nightmare of siloed data.
Diving into Azure Database for PostgreSQL (2:25:15)
I showed Postgres provisioning on Azure. It follows the PaaS (Platform as a Service) model, where you don't worry about the operating system.
We looked at the machine profiles:
- Burstable: The cheapest, for light use and occasional peaks.
- General Purpose: Balance between CPU and memory.
- Memory Optimized: For databases that need a lot of cache (RAM) for performance.
Microservices and Database Choice (2:45:00)
Today, PostgreSQL is the de facto standard for microservices. It is light, robust and scales very well in containers. The golden recommendation here is isolation: try to maintain one database (or at least one isolated schema/user) per microservice to prevent a slow query from one service from bringing down all the others.
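The isolation recommendation can be made concrete with a per-service configuration sketch. All names here are hypothetical, but the pattern is the point: each microservice gets its own database, its own credentials, and its own connection-pool limit, so no service can exhaust another's resources:

```python
# One isolated database (or schema/user) per microservice -- hypothetical
# names. Each service gets its own credentials and pool limit so a slow
# query in one service cannot starve the others.
SERVICES = {
    "orders":  {"db": "orders_db",  "user": "orders_svc",  "max_pool": 20},
    "billing": {"db": "billing_db", "user": "billing_svc", "max_pool": 10},
}

def dsn(service):
    cfg = SERVICES[service]
    # Each service connects only to its own database; no cross access.
    return f"postgresql://{cfg['user']}@db-host/{cfg['db']}"

print(dsn("orders"))   # -> postgresql://orders_svc@db-host/orders_db
```

If separate databases are too heavy operationally, the same idea works with one schema and one login per service inside a shared instance, as mentioned above.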
The Danger of the Raspberry Pi as a Database Server (2:55:20)
The question arose: "Can I use a Raspberry Pi as my company's database server?"
The short answer is: no. For study it's fantastic, but in production you have no physical redundancy, SD cards fail constantly under I/O stress, and you have no managed backup strategy. The cost of Azure SQL Basic is less than the risk of losing your data.
Career and Certifications (3:05:45)
For those who want to stand out in the data area, I recommended two paths from Microsoft:
- DP-900 (Azure Data Fundamentals): For those who want the foundations of everything (Relational, NoSQL, Data Lake).
- DP-300 (Administering Azure SQL Solutions): For those who already have experience and manage databases in the cloud.
Conclusion
The live stream was a true marathon of data content, but the final message is simple: technology evolves quickly, but data fundamentals are eternal. Whether using SQL Server, Oracle, MySQL or PostgreSQL, anyone who understands how data is processed will always have a place in the market.
I hope you liked this tip, a big hug and see you next time!