What is UUID/GUID? Understanding Unique Identifiers

Learn about UUID/GUID - universally unique identifiers used in software development, databases, and distributed systems.

What is UUID/GUID?

UUID (Universally Unique Identifier) and GUID (Globally Unique Identifier) are 128-bit values used to identify information in computer systems. They are designed to be unique across space and time without a central registration authority. Uniqueness is probabilistic rather than guaranteed, but with a well-seeded generator the chance of a duplicate is low enough to ignore in practice.

UUID Structure and Format

A UUID is a 128-bit number typically displayed as 32 hexadecimal digits in a specific format.

Standard Format

UUIDs are represented as 5 groups of hexadecimal digits separated by hyphens in the format 8-4-4-4-12.

text
550e8400-e29b-41d4-a716-446655440000

Breakdown:
550e8400 - 8 hex digits (32 bits)
e29b     - 4 hex digits (16 bits)
41d4     - 4 hex digits (16 bits)
a716     - 4 hex digits (16 bits)
446655440000 - 12 hex digits (48 bits)

Total: 36 characters including hyphens
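
As a rough illustration of the layout above, the following sketch parses the example UUID with Python's standard uuid module and checks the sizes quoted in the breakdown. The variable names are chosen here for illustration only.

```python
import uuid

# Parse the canonical 8-4-4-4-12 text form into a UUID object.
example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

text_form = str(example)                     # canonical string with hyphens
groups = text_form.split("-")                # the five hyphen-separated groups
group_lengths = [len(g) for g in groups]     # hex digits per group

print(group_lengths)                         # digits in each group
print(len(text_form))                        # characters including hyphens
print(len(example.hex))                      # hex digits without hyphens
print(len(example.bytes) * 8)                # size of the raw value in bits
```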

Variants

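The variant field occupies the most significant bits of the ninth byte (the first hex digit of the fourth group) and determines how the rest of the UUID is laid out. Nearly all UUIDs in use today carry the RFC 4122 variant. RFC 4122 has since been superseded by RFC 9562, which keeps the same layout.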

text
Variant bits (highlighted in example):
550e8400-e29b-41d4-[a]716-446655440000
                    ^
                    Variant identifier

RFC 4122 variant: 10xx in binary
- a = 1010 (variant 1)
- b = 1011 (variant 1)
- 8 = 1000 (variant 1)
- 9 = 1001 (variant 1)
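
The variant check can be sketched in a few lines of Python. This is only an illustration using the standard uuid module; the names variant_byte and is_rfc_variant are made up for the example.

```python
import uuid

example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

# The variant lives in the top bits of byte 8 (the first byte of the fourth group).
variant_byte = example.bytes[8]
top_two_bits = variant_byte >> 6          # keep only the two most significant bits

is_rfc_variant = top_two_bits == 0b10     # the "10xx" pattern
print(f"{variant_byte:08b}", is_rfc_variant)

# The standard library exposes the same information symbolically.
print(example.variant == uuid.RFC_4122)
```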

UUID Versions

There are several UUID versions, each using different methods for generation:

Version 1 - Time-based UUID

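Generated from the current timestamp and the generating machine's MAC address. Uniqueness holds as long as the MAC address is unique and the clock behaves, but the identifier reveals which machine produced it and when.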

text
// UUID v1 Example
6ba7b810-9dad-11d1-80b4-00c04fd430c8

Components:
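- Timestamp: 60-bit count of 100-nanosecond intervals since Oct 15, 1582 (UTC)
- Clock sequence: 14-bit value that changes if the clock moves backwards or the node changes, to avoid duplicates
- Node: 48-bit MAC address of the generating computer

Pros: Creation time can be recovered from the UUID
Cons: Exposes MAC address and creation time (privacy concern); not sortable as a string, because the low bits of the timestamp come first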
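
To make the privacy concern concrete, here is a minimal sketch that pulls the embedded fields back out of the example UUID above. It relies on the time, clock_seq, and node attributes of Python's uuid.UUID; the helper name decode_v1 is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID v1 clock: 15 October 1582, 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def decode_v1(u: uuid.UUID):
    """Return (creation time, clock sequence, node as a MAC-style string)."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time counts 100-nanosecond ticks; integer-divide by 10 for microseconds.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    node = ":".join(f"{b:02x}" for b in u.node.to_bytes(6, "big"))
    return created, u.clock_seq, node

created, clock_seq, node = decode_v1(
    uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
)
print(created.isoformat(), clock_seq, node)
```

Anyone who sees a version 1 UUID can run the same few lines, which is why this version is a poor fit for identifiers exposed to the public.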

Version 4 - Random UUID

Generated from random or pseudo-random numbers. It is the most commonly used version because it is simple to generate and embeds no machine or time information.

text
// UUID v4 Examples
550e8400-e29b-41d4-a716-446655440000
f47ac10b-58cc-4372-a567-0e02b2c3d479
7c9e6679-7425-40de-944b-e07fc1f90ae7

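Generation:
- 122 random bits
- 6 fixed bits (4 for the version, 2 for the variant)

Collision probability (birthday bound, assuming a good random source):
- 1 billion UUIDs: roughly a 1 in 10^19 chance of any duplicate
- About 103 trillion UUIDs are needed before the chance reaches 1 in a billion
- Practically zero for most applications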
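
The collision odds can be estimated with the standard birthday approximation. The sketch below assumes all 122 random bits are uniformly random, which holds only with a good random source; the function name collision_probability is chosen for this example.

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID

def collision_probability(count: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p is about 1 - exp(-n(n-1) / (2 * 2^bits))."""
    exponent = count * (count - 1) / (2 * 2**bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

p_billion = collision_probability(10**9)
p_103_trillion = collision_probability(103 * 10**12)
print(f"{p_billion:.2e}")        # chance of any duplicate among 1 billion UUIDs
print(f"{p_103_trillion:.2e}")   # chance of any duplicate among 103 trillion UUIDs
```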

Version 5 - Name-based (SHA-1)

Generated by hashing a namespace UUID together with a name using SHA-1. Deterministic: the same namespace and name always produce the same UUID.

text
// UUID v5 Example
Namespace: DNS (6ba7b810-9dad-11d1-80b4-00c04fd430c8)
Name: "www.example.com"
Result: 2ed6657d-e927-568b-95e1-2665a8aea6a2

// Same input = same UUID
Namespace + Name → Always same UUID v5

Use cases:
- Consistent IDs across systems
- Reproducible identifiers
- Idempotent operations
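
For readers curious how the hash becomes a UUID, the following sketch derives a version 5 UUID by hand and compares it with Python's uuid.uuid5. The helper name uuid5_by_hand and the sample name are illustrative only.

```python
import hashlib
import uuid

def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Derive a version 5 UUID step by step."""
    # 1. Hash the namespace's 16 raw bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # 2. Keep the first 16 of the 20 digest bytes.
    raw = bytearray(digest[:16])
    # 3. Overwrite the version nibble (byte 6) and the variant bits (byte 8).
    raw[6] = (raw[6] & 0x0F) | 0x50
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))

by_hand = uuid5_by_hand(uuid.NAMESPACE_DNS, "example.com")
from_stdlib = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(by_hand == from_stdlib, by_hand.version)
```

Because nothing random enters the computation, two systems that agree on the namespace and the name arrive at the same identifier without talking to each other.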

Other Versions

Other UUID versions include v2 (DCE Security), v3 (name-based with MD5), and v6, v7, and v8, which were standardized in RFC 9562 (2024).

text
Version 2 (DCE Security): Rarely used, similar to v1
Version 3 (Name-based MD5): Same scheme as v5 but with MD5; prefer v5 for new work
Version 6: v1 fields reordered so that UUIDs sort by creation time
Version 7: Unix millisecond timestamp followed by random bits; sorts by creation time
Version 8: Custom/vendor-specific layout
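
Since version 7 is the variant most often recommended for database keys, here is a rough sketch of its bit layout built by hand. It is a simplified illustration, not a complete implementation: it makes no attempt to keep IDs ordered within the same millisecond, and the function name uuid7_sketch is hypothetical. Prefer a maintained library in production code.

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version 7 style UUID: 48-bit millisecond timestamp plus random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 of those bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 of those bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version nibble
    value |= rand_a << 64                          # 12 random bits
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b                                # 62 random bits
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, str(earlier) < str(later))
```

Because the timestamp occupies the leading bits, IDs created in later milliseconds compare greater both as strings and as raw bytes.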

UUID vs GUID

UUID and GUID are essentially the same thing with minor differences:

  • UUID: Standard term defined by RFC 4122 (now RFC 9562), common in Unix/Linux environments and most languages
  • GUID: Microsoft's term for the same concept, used in Windows/.NET
  • Format: Both use the same 128-bit structure and the same text format
  • Compatibility: In text form, UUIDs and GUIDs are interchangeable
  • Byte Order: Microsoft's binary GUID layout stores the first three fields in little-endian order, so raw bytes can differ from the RFC's big-endian layout even when the text form is identical
  • Terminology: The terms are often used interchangeably in modern development
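
The byte-order difference is easy to see with Python's uuid module, which exposes both layouts as bytes and bytes_le. This is a small sketch; the variable names are illustrative.

```python
import uuid

example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

rfc_bytes = example.bytes        # big-endian fields, as laid out in the RFC
guid_bytes = example.bytes_le    # first three fields little-endian, as in Microsoft's GUID layout

print(rfc_bytes.hex())
print(guid_bytes.hex())

# Reading the bytes back with the matching keyword restores the same UUID.
round_trip = uuid.UUID(bytes_le=guid_bytes)
# Reading them with the wrong keyword silently yields a different UUID.
misread = uuid.UUID(bytes=guid_bytes)
print(round_trip == example, misread == example)
```

When exchanging binary identifiers between systems, agree on the layout explicitly or exchange the text form instead.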

Common Use Cases

UUIDs are used in various scenarios requiring unique identification:

  • Database Primary Keys: Alternative to auto-incrementing integers, especially in distributed databases
  • Distributed Systems: Generate unique IDs without central coordination
  • File Names: Ensure unique filenames for uploaded files
  • Session IDs: Unique identifiers for user sessions
  • API Resources: Identify resources in RESTful APIs
  • Message Queues: Track messages across distributed systems
  • Transaction IDs: Unique identifiers for financial transactions
  • Software Licensing: Generate unique license keys
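
As one small example from the list above, uploaded files can be renamed with a random UUID while keeping their extension. The helper name unique_upload_name is hypothetical and the sketch ignores storage and validation concerns.

```python
import uuid
from pathlib import PurePosixPath

def unique_upload_name(original_name: str) -> str:
    """Return a collision-resistant file name that keeps the original extension."""
    suffix = PurePosixPath(original_name).suffix.lower()   # e.g. ".png"
    return f"{uuid.uuid4()}{suffix}"

first = unique_upload_name("Holiday Photo.PNG")
second = unique_upload_name("Holiday Photo.PNG")
print(first)
print(second)
```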

Advantages of UUIDs

  • No Central Authority: Generate IDs independently without coordination
  • Globally Unique: Extremely low probability of collisions
  • Merge-Friendly: Easy to merge data from different sources
  • Distributed Generation: Multiple systems can generate IDs simultaneously
  • No Sequence to Manage: There is no shared counter to contend for, and no gaps to explain after rollbacks
  • Privacy: Random (v4) UUIDs don't reveal how many records exist or the order they were created in (unlike sequential IDs)
  • Scalability: Works well in distributed and cloud environments

Disadvantages and Limitations

  • Storage Size: 128 bits (16 bytes) vs 4-8 bytes for integers
  • Index Performance: Larger indexes, random UUIDs don't cluster well
  • Not Human-Readable: Harder to remember and communicate than sequential IDs
  • Sorting: Version 4 UUIDs are not sortable by creation time
  • URL Length: Makes URLs longer when used as identifiers
  • Database Performance: Can impact B-tree index performance with random inserts
  • Debugging: Harder to work with in logs and debugging compared to simple integers
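
The size trade-offs above can be measured directly. The sketch below compares the text form, the raw 16-byte form, and a shorter URL-safe Base64 rendering; the Base64 form is a common workaround, not part of the UUID standard.

```python
import base64
import uuid

example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

as_text = str(example)       # canonical text form
as_binary = example.bytes    # raw value, suited to a binary column
# URL-safe Base64 without the trailing "==" padding.
as_short = base64.urlsafe_b64encode(as_binary).rstrip(b"=").decode("ascii")

print(len(as_text), len(as_binary), len(as_short))

# The short form is reversible once the padding is restored.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(as_short + "=="))
print(restored == example)
```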

Generating UUIDs in Different Languages

Examples of UUID generation across popular programming languages:

JavaScript/Node.js

Use the crypto module for native UUID generation or the uuid npm package.

javascript
// Native (Node.js 14.17+)
const { randomUUID } = require('crypto');
const nativeUuid = randomUUID();

// Using the uuid package (npm install uuid)
const { v4: uuidv4 } = require('uuid');
const packageUuid = uuidv4();

// Both are random v4 UUIDs, e.g. '550e8400-e29b-41d4-a716-446655440000'
console.log(nativeUuid, packageUuid);

Python

Python's built-in uuid module supports versions 1, 3, 4, and 5; Python 3.14 adds versions 6, 7, and 8.

python
import uuid

# Version 4 (random)
uuid4 = uuid.uuid4()
print(uuid4)  # different on every run, e.g. 550e8400-e29b-41d4-a716-446655440000

# Version 5 (name-based)
namespace = uuid.NAMESPACE_DNS
uuid5 = uuid.uuid5(namespace, 'www.example.com')
print(uuid5)  # 2ed6657d-e927-568b-95e1-2665a8aea6a2

Java

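Java provides the UUID class in the java.util package. It can generate random (v4) and MD5 name-based (v3) UUIDs; version 5 requires a third-party library or a manual SHA-1 implementation.

java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Generate a random UUID (v4)
        UUID random = UUID.randomUUID();
        System.out.println(random);

        // Generate a name-based UUID. nameUUIDFromBytes produces version 3 (MD5)
        // and takes no namespace; java.util.UUID has no built-in v5.
        UUID nameBased = UUID.nameUUIDFromBytes("example.com".getBytes(StandardCharsets.UTF_8));
        System.out.println(nameBased + " (version " + nameBased.version() + ")");
    }
}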

Best Practices

  • Use Version 4 for most general-purpose unique identifiers
  • Use Version 5 when you need deterministic, reproducible UUIDs
  • Avoid Version 1 if privacy is a concern (exposes MAC address)
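  • Store UUIDs in a native UUID or 16-byte binary column rather than as a 36-character string
  • In PostgreSQL, use the built-in uuid type; gen_random_uuid() is built in from version 13, and the uuid-ossp extension is only needed to generate other versions such as v1 or v5
  • Consider time-ordered UUIDs (UUIDv6/v7) for better database index performance
  • Always validate UUID format when accepting user input
  • Use lowercase for consistency (though UUIDs are case-insensitive)
  • Don't use UUIDs as secrets: v1 values are predictable, and v4 values are only as unpredictable as the random source behind them. Use a dedicated token generator for session or API tokens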
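
Input validation is one practice that benefits from a concrete example. Python's uuid.UUID constructor is lenient (it also accepts braces, a "urn:uuid:" prefix, and missing hyphens), so the sketch below adds a canonical-form check on top. The function name parse_canonical_uuid is hypothetical, and whether to reject the lenient forms is a policy choice, not a requirement.

```python
import uuid

def parse_canonical_uuid(text: str) -> uuid.UUID:
    """Parse text as a UUID, accepting only the 8-4-4-4-12 form (either case)."""
    try:
        parsed = uuid.UUID(text)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"not a UUID: {text!r}") from exc
    # Compare against the canonical rendering to reject the lenient forms.
    if str(parsed) != text.lower():
        raise ValueError(f"not in canonical form: {text!r}")
    return parsed

for candidate in ("550E8400-E29B-41D4-A716-446655440000",
                  "550e8400e29b41d4a716446655440000",
                  "not-a-uuid"):
    try:
        print("accepted:", parse_canonical_uuid(candidate))
    except ValueError as err:
        print("rejected:", err)
```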

Conclusion

UUIDs/GUIDs are essential tools for generating unique identifiers in distributed systems and modern applications. While they have trade-offs in terms of storage size and index performance, their ability to generate globally unique IDs without coordination makes them invaluable for scalable, distributed architectures. Choose the appropriate UUID version based on your specific requirements for randomness, determinism, and sortability.