Introduction to Transactions in SQL
Imagine a banking system that uses SQL as a database. User A wants to deposit some money into User B’s account. What happens if they send the money, we take this money out of their account balance, and we want to deposit this money into User B’s account, but suddenly our database crashes.
Does this mean that the money we took out of User A’s balance disappeared? Well, not with SQL databases because they use SQL transactions for these.
Transactions and ACID Properties
Transactions are a sequence of one or more SQL operations performed as a single atomic unit. The purpose of this is to ensure data consistency in the database. A transaction has the following properties, often referred to by the ACID acronym:
Atomicity: The entire transaction is treated as a single unit, which either completely succeeds or completely fails.
Consistency: The transaction transforms the database from one valid state to another valid state, maintaining database invariants.
Isolation: Modifications made by concurrent transactions are isolated from one another until they are committed.
Durability: Once a transaction has been committed, it remains so, even in the event of a system failure.
Key Commands in SQL Transactions
We mark the beginning of an SQL Transaction with the BEGIN TRANSACTION keyword.
To save all the changes made during the transaction, we COMMIT the changes to the database.
If something goes wrong during the transaction, we can revert all the changes made during the transaction with the ROLLBACK command, returning the database to its state at the beginning of the transaction.
Example
Let's return to our simple banking application example, where you need to transfer $100 from Account A to Account B. This involves two steps:
deducting the amount from Account A
and adding it to Account B.
Both steps need to be completed for the transaction to be successful. Here’s how you might write this as a SQL transaction:
BEGIN TRANSACTION;
-- Deducting 100 from Account A's balance
UPDATE Accounts
SET balance = balance - 100
WHERE account_id = 'A';
-- Check if Account A has enough balance, if not rollback
IF @@ROWCOUNT = 0
ROLLBACK;
-- Adding 100 to Account B's balance
UPDATE Accounts
SET balance = balance + 100
WHERE account_id = 'B';
-- If everything is fine then commit the transaction
COMMIT;
This transaction does the following:
Starts a transaction to ensure the following operations are part of a single atomic process.
Deducts $100 from Account A: It assumes there’s a table named
accounts
with columnsaccount_id
andbalance
.Checks if Account A has sufficient funds: If Account A does not have enough money, the transaction is rolled back using
ROLLBACK TRANSACTION
, canceling all changes.Adds $100 to Account B: If Account A has enough money, then $100 is added to Account B.
Commits the transaction: If both updates are successful, the
COMMIT TRANSACTION
command is executed to permanently apply the changes made during this transaction to the database.
This ensures that either both accounts are updated appropriately, or if there’s an issue at any point, none of the changes are applied, maintaining the integrity of the data.
Isolation Layers of SQL Transactions
Isolation levels in database transactions determine how transaction integrity is maintained and to what extent each transaction is isolated from other transactions.
SQL standards define four isolation levels that trade-off between consistency and performance.
1. Read Uncommitted
Description: The lowest level of isolation. Transactions can see changes made by other transactions even before they are committed.
Impact on Example: While transferring money, if another transaction is updating the balance of Account A or B, this transaction might read these uncommitted values. This can lead to issues like seeing a balance that doesn’t actually exist (if the other transaction fails and rolls back).
2. Read Committed
Description: Ensures that a transaction can only read data that has been committed.
Impact on Example: This level avoids the issues of ‘Read Uncommitted’ by ensuring that only committed balances of Account A and B are read. However, if the balance is read multiple times within the transaction, it might see different values (non-repeatable reads) if other transactions are modifying the data.
3. Repeatable Read
Description: Ensures that if a transaction reads data a second time, it will find the same data values (avoiding non-repeatable reads).
Impact on Example: This level prevents the transaction from seeing changes made by others between multiple reads of the same data within the transaction. It would help in maintaining consistent read results during the balance check and update operations. However, it might not prevent phantom reads (new rows added by other transactions).
4. Serializable
Description: The highest level of isolation. Transactions are completely isolated from each other as if they are executed serially.
Impact on Example: This ensures complete isolation. No other transaction can interfere with the transfer process. It prevents all concurrent issues (dirty reads, non-repeatable reads, and phantom reads) but at the cost of reduced concurrency and potential performance issues due to locking.
With different isolation levels, several phenomena can occur, like Dirty Reads, Non-Repeatable Reads, or Phantom Reads. Let’s see what these terms mean below:
Dirty Reads
A dirty read occurs when a transaction reads data that has been written by a concurrent uncommitted transaction. As a result, if the other transaction rolls back, the first transaction will have read data that was never officially in the database.
Example:
Transaction 1 starts and transfers $100 from Account A to Account B.
Before Transaction 1 is committed, Transaction 2 starts and reads the balance of Account A.
If Transaction 1 fails and rolls back, Transaction 2 has read a balance that was never officially committed.
Non-Repeatable Reads (Read Uncommitted)
This happens when, during the course of a transaction, a row is retrieved twice, and the values within the row differ between reads. Essentially, another transaction modifies the row between the two reads.
Example:
Transaction 1 starts and reads the balance of Account A.
Transaction 2 transfers $100 from Account A to Account B and commits.
Transaction 1 again reads the balance of Account A and sees a different balance than before.
Phantom Reads
A phantom read occurs when, during a transaction, new rows are added (or existing ones deleted) by another transaction to the records being read. This means a subsequent read in the same transaction might return a set of rows that include newly added rows or exclude deleted ones which were not part of the original read.
Example:
Transaction 1 starts a query to count the number of transactions for Account A.
Transaction 2 inserts a new transaction record for Account A and commits.
Transaction 1 counts the number of transactions for Account A again and finds more transactions than before.
Obviously, higher isolation levels reduce the types of phenomena that can occur but at the cost of lower concurrency and potential performance impacts.
If you’d like to learn more about the scaling mechanisms of SQL Databases, then check out my in-depth Database Essentials video, which goes over Database Sharding, Replication, and more:
Hey Simonyan. I just want to tell you that your blogs really helped me in learning system design. but i need to know how to implement them in code, like sharding, partitioning, pubsub messaging etc. etc. i read some books on that as well but that is just theory. if there is a good (paid/free) course or any other resource please list. That will be really helpful. thanks again man. 🌹
This is phenomenal. never find informations like these on the whole internet.