What is a Regular Expression (Regex)?
Learn about Regular Expressions (Regex) - powerful pattern matching tools used for searching, validating, and manipulating text.
What is a Regular Expression?
A Regular Expression (regex or regexp) is a sequence of characters that defines a search pattern. It's used to match, search, and manipulate text based on patterns rather than exact strings. Regular expressions are supported in most programming languages and text editors, making them a universal tool for text processing tasks.
Basic Regex Syntax
Regular expressions use special characters and sequences to define patterns.
Literal Characters
The simplest regex is a literal string that matches itself exactly.
Pattern: cat
Matches: "cat", "category", "concatenate"
Does not match: "Cat", "CAT" (case-sensitive by default)
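As a quick check of the behaviour described above, here is a minimal sketch in Python using the standard `re` module (the choice of Python and the helper name `contains_cat` are assumptions for illustration). `re.search` looks for the pattern anywhere in the input, which is why "category" counts as a match.

```python
import re

# The literal pattern from the example above.
pattern = re.compile(r"cat")

def contains_cat(text: str) -> bool:
    """Return True if the literal 'cat' occurs anywhere in text."""
    return pattern.search(text) is not None

for sample in ["cat", "category", "concatenate", "Cat", "CAT"]:
    print(f"{sample!r}: {contains_cat(sample)}")
```

The first three samples report True and the last two report False, because matching is case-sensitive unless a flag says otherwise.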
Pattern: hello
Matches: "hello world", "say hello"
Does not match: "Hello", "HELLO"
Metacharacters
Metacharacters are characters with a special meaning in regex. To match one literally, escape it with a backslash.
Metacharacters: . ^ $ * + ? { } [ ] \ | ( )
Examples:
. (dot) - Matches any single character
^ (caret) - Matches start of string
$ (dollar) - Matches end of string
* (asterisk) - Matches the preceding element 0 or more times
+ (plus) - Matches the preceding element 1 or more times
? (question) - Matches the preceding element 0 or 1 time
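The six metacharacters listed above can be tried out with a short Python sketch. The sample strings are made up for illustration, and `re.findall` is used only because it returns every match as a list.

```python
import re

# . stands for exactly one character, so "ct" is not matched.
dot = re.findall(r"c.t", "cat cot c-t ct")

# *, + and ? apply to the element just before them (here the letter b).
star = re.findall(r"ab*", "a ab abbb")      # zero or more b's
plus = re.findall(r"ab+", "a ab abbb")      # at least one b
optional = re.findall(r"ab?", "a ab abbb")  # at most one b

# ^ and $ match positions, not characters.
starts_with_say = re.search(r"^say", "say hello") is not None
ends_with_hello = re.search(r"hello$", "say hello") is not None

print(dot, star, plus, optional, starts_with_say, ends_with_hello)
```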
Escaping:
\. matches literal dot
\$ matches literal dollar sign
Character Classes
Square brackets define a set of characters to match.
[abc] - Matches 'a', 'b', or 'c'
[a-z] - Matches any lowercase letter
[A-Z] - Matches any uppercase letter
[0-9] - Matches any digit
[a-zA-Z] - Matches any letter
[^abc] - Matches any character EXCEPT a, b, c
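A small Python sketch of the bracket forms listed above; the input strings are invented for the example.

```python
import re

text = "Order A7 shipped to Box 42b"

from_set = re.findall(r"[abc]", "cabbage")     # only a, b or c
digits = re.findall(r"[0-9]", text)            # one digit per match
uppercase = re.findall(r"[A-Z]", text)         # one uppercase letter per match
not_letters = re.findall(r"[^a-zA-Z ]", text)  # anything except letters and spaces

print(from_set, digits, uppercase, not_letters)
```

Note that a caret means negation only when it is the first character inside the brackets.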
Examples:
[0-9]+ matches "123", "42", "999"
[a-z]+ matches "hello", "world"
Common Regex Patterns
Frequently used regex patterns for common tasks:
Predefined Character Classes
Shorthand notations for commonly used character classes.
\d - Digit [0-9]
\D - Not a digit [^0-9]
\w - Word character [a-zA-Z0-9_]
\W - Not a word character
\s - Whitespace (space, tab, newline)
\S - Not whitespace
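The shorthand classes can be sketched in Python as follows. One caveat that is specific to Python 3 and not stated in the list above: for text strings, `\w` and `\d` also match non-ASCII letters and digits unless the `re.ASCII` flag is passed. The sample inputs are assumptions for illustration.

```python
import re

text = "Room 101,\tfloor 3"

digits = re.findall(r"\d", text)          # single digits
words = re.findall(r"\w+", text)          # runs of word characters
pieces = re.split(r"\s+", text)           # split on any run of whitespace
non_digits = re.findall(r"\D+", "a1b22c") # runs of non-digits

# In Python 3, \w is Unicode-aware by default; re.ASCII restricts it to [a-zA-Z0-9_].
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(digits, words, pieces, non_digits, unicode_words, ascii_words)
```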
Examples:
\d{3} - Matches exactly 3 digits: "123"
\w+ - Matches one or more word chars: "hello_world"
\s+ - Matches whitespace: " "
Quantifiers
Specify how many times a pattern should match.
* - 0 or more times
+ - 1 or more times
? - 0 or 1 time
{n} - Exactly n times
{n,} - n or more times
{n,m} - Between n and m times
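The quantifier forms above can be exercised with a brief Python sketch (sample strings are made up). It also shows that a bounded quantifier does not by itself respect word boundaries: a long word is simply cut into pieces.

```python
import re

# ? makes the "u" optional.
color_ok = re.fullmatch(r"colou?r", "color") is not None
colour_ok = re.fullmatch(r"colou?r", "colour") is not None
colouur_ok = re.fullmatch(r"colou?r", "colouur") is not None

three_digits = re.findall(r"\d{3}", "12 345 67890")  # exactly three at a time
two_or_more = re.findall(r"\d{2,}", "1 22 333")      # at least two
runs_of_a = re.findall(r"a+", "a aa aaa")            # one or more

# {3,5} takes up to five characters, then starts over on the rest of the word.
chunks = re.findall(r"\w{3,5}", "hi regex patterns")

print(three_digits, two_or_more, runs_of_a, chunks)
```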
Examples:
colou?r - Matches "color" or "colour"
\d{3} - Matches exactly 3 digits
\w{3,5} - Matches 3 to 5 word characters
a+ - Matches "a", "aa", "aaa", etc.
Anchors
Match positions rather than characters.
^ - Start of string/line
$ - End of string/line
\b - Word boundary
\B - Not a word boundary
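To make the idea of matching positions concrete, here is a minimal Python sketch with invented sample text. None of the anchors consume characters; they only constrain where a match may start or end.

```python
import re

# \b requires a word edge on both sides, so "category" and "concat" are skipped.
whole_words = re.findall(r"\bcat\b", "cat category concat cat.")

# \B requires the opposite: "cat" must continue an existing word.
inside_word = re.findall(r"\Bcat", "cat category concat")

starts = re.search(r"^hello", "hello world") is not None
not_at_start = re.search(r"^hello", "say hello") is not None
ends = re.search(r"world$", "hello world") is not None

# Anchoring both ends rejects longer inputs.
exactly_three = re.search(r"^\d{3}$", "123") is not None
four_digits = re.search(r"^\d{3}$", "1234") is not None

print(whole_words, inside_word, starts, not_at_start, ends, exactly_three, four_digits)
```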
Examples:
^hello - Matches "hello" at start of string
world$ - Matches "world" at end of string
\bcat\b - Matches "cat" as whole word, not "category"
^\d{3}$ - Matches exactly 3 digits, nothing more
Advanced Regex Features
More complex regex capabilities for sophisticated pattern matching:
Groups and Capturing
Parentheses create groups for capturing or applying quantifiers.
(abc)+ - Matches "abc", "abcabc", etc.
(\d{3})-(\d{4}) - Captures the two halves of "555-1234" as groups 1 and 2
Named groups (some languages):
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Matches: "2024-11-18" with named captures
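Reading captured groups back out looks roughly like this in Python. One difference to be aware of: Python's `re` module spells named groups `(?P<name>...)`, so the date pattern above is adapted accordingly. The sample sentences are assumptions for illustration.

```python
import re

# Numbered groups: group(0) is the whole match, groups() holds the captures.
phone = re.search(r"(\d{3})-(\d{4})", "call 555-1234 now")
whole_match = phone.group(0)
parts = phone.groups()

# Named groups, using Python's (?P<name>...) spelling.
date = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "released 2024-11-18",
)
fields = date.groupdict()

# A quantifier after a group repeats the whole group.
repeated = re.fullmatch(r"(abc)+", "abcabc") is not None

print(whole_match, parts, fields, repeated)
```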
Non-capturing group:
(?:abc)+ - Groups but doesn't capture
Alternation
The pipe symbol | acts as an OR operator.
cat|dog - Matches "cat" OR "dog"
gray|grey - Matches "gray" OR "grey"
(Mr|Mrs|Ms)\. - Matches "Mr.", "Mrs.", or "Ms."
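A short Python sketch of alternation, with made-up inputs. It uses non-capturing groups so that `re.findall` returns whole matches, and it illustrates that `|` has the lowest precedence: without a group, anchors bind to only one side of the alternative.

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, bird")
greys = re.findall(r"gr(?:a|e)y", "gray grey groy")
titles = re.findall(r"(?:Mr|Mrs|Ms)\.", "Mr. Smith, Mrs. Jones, Ms. Lee")

# ^cat|dog$ means "starts with cat" OR "ends with dog".
loose = re.search(r"^cat|dog$", "hotdog") is not None
# Grouping applies both anchors to both alternatives.
strict = re.search(r"^(?:cat|dog)$", "hotdog") is not None

print(pets, greys, titles, loose, strict)
```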
With groups:
(https?|ftp):// - Matches "http://", "https://", or "ftp://"
Lookahead and Lookbehind
Assert what comes before or after without including it in the match.
Positive lookahead: (?=...)
\d+(?= dollars) - Matches numbers followed by " dollars"
Negative lookahead: (?!...)
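\d+(?! dollars) - Matches digits NOT followed by " dollars"; because \d+ can backtrack, it still matches "10" in "100 dollars" (use \b\d+\b(?! dollars) to test whole numbers)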
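The two lookahead forms can be compared in a small Python sketch; the sentence is invented for the example. It also shows a subtlety of the negative form: `\d+` is allowed to give back a digit, so a shorter run of digits can satisfy the assertion unless word boundaries pin the number down.

```python
import re

text = "100 dollars and 50 euros"

# Positive lookahead: the number is matched, " dollars" is only checked.
followed = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead: "100" fails, but backtracking to "10" succeeds,
# because "10" is followed by "0", not by " dollars".
naive = re.findall(r"\d+(?! dollars)", text)

# Word boundaries force the whole number to be considered.
whole_numbers = re.findall(r"\b\d+\b(?! dollars)", text)

print(followed, naive, whole_numbers)
```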
Positive lookbehind: (?<=...)
(?<=\$)\d+ - Matches numbers preceded by "$"
Negative lookbehind: (?<!...)
(?<!\$)\d+ - Matches digits NOT preceded by "$"; in "$100" it still matches "00", so use (?<![\$\d])\d+ to skip the whole number
Real-World Regex Examples
Practical regex patterns for common validation and extraction tasks:
Email Validation
A simplified email validation pattern (full RFC compliance is very complex).
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
^ - Start of string
[a-zA-Z0-9._%+-]+ - Username part
@ - Literal @ symbol
[a-zA-Z0-9.-]+ - Domain name
\. - Literal dot
[a-zA-Z]{2,} - TLD (2+ letters)
$ - End of string
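The breakdown above can be wrapped in a small helper, sketched here in Python. The function name `looks_like_email` is hypothetical, and `fullmatch` is a deliberate choice: in Python, `$` also matches just before a trailing newline, so `fullmatch` is the stricter check. Like the pattern itself, this is a plausibility test, not proof that an address exists.

```python
import re

# The simplified pattern from above, unchanged.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(candidate: str) -> bool:
    """Return True if candidate has the rough shape user@domain.tld."""
    return EMAIL_RE.fullmatch(candidate) is not None

for sample in ["user@example.com", "john.doe+tag@domain.co.uk",
               "user@localhost", "no-at-sign.com"]:
    print(f"{sample}: {looks_like_email(sample)}")
```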
Matches: user@example.com, john.doe+tag@domain.co.uk
Phone Number
Pattern for US phone numbers in various formats.
^(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$
Matches:
555-123-4567
(555) 123-4567
555.123.4567
+1-555-123-4567
5551234567
URL Validation
Pattern for matching HTTP/HTTPS URLs.
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Matches:
https://example.com
http://www.example.com/path
https://example.com/path?query=value
Password Strength
Ensure a password contains at least one uppercase letter, one lowercase letter, one digit, and one special character.
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Breakdown:
(?=.*[a-z]) - At least one lowercase
(?=.*[A-Z]) - At least one uppercase
(?=.*\d) - At least one digit
(?=.*[@$!%*?&]) - At least one special char
[A-Za-z\d@$!%*?&]{8,} - Min 8 characters total
Extracting Data
Extract specific information from formatted text.
// Extract dates in YYYY-MM-DD format
\b(\d{4})-(\d{2})-(\d{2})\b
// Extract hashtags
#\w+
// Extract URLs
https?://[^\s]+
// Extract email addresses
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Common Use Cases
- Input Validation: Validate email, phone numbers, URLs, passwords
- Search and Replace: Find and replace patterns in text editors
- Data Extraction: Extract specific information from logs or documents
- String Parsing: Parse structured data formats
- Form Validation: Validate user input in web forms
- Log Analysis: Filter and analyze log files
- URL Routing: Match and route URLs in web frameworks
- Syntax Highlighting: Identify code patterns in IDEs
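Two of the use cases above, log analysis and search and replace, can be sketched in Python as follows. The log format, the sample lines, and the name `LINE_RE` are assumptions made up for this example, not a real logging standard.

```python
import re

# Hypothetical log excerpt: date, time, level, message.
log = """2024-11-18 10:02:11 ERROR disk full on /dev/sda1
2024-11-18 10:02:15 INFO retrying
2024-11-19 08:00:00 ERROR timeout contacting db"""

# re.MULTILINE lets ^ and $ match at every line, so each line is parsed separately.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$",
    re.MULTILINE,
)

# Data extraction: keep the date and message of ERROR lines only.
errors = [(day, message)
          for day, _time, level, message in LINE_RE.findall(log)
          if level == "ERROR"]

# Search and replace: backreferences reorder YYYY-MM-DD into DD/MM/YYYY.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "due 2024-11-18")

print(errors)
print(reordered)
```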
Regex Flags/Modifiers
Flags modify how the regex pattern is interpreted:
- i (case-insensitive): Match regardless of case - /hello/i matches "Hello", "HELLO"
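- g (global): Find all matches, not just the first (a flag in engines such as JavaScript and Perl; other languages expose this as a separate find-all function)
- m (multiline): ^ and $ match the start/end of each line, not just of the whole string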
- s (dotall): Dot matches newline characters too
- u (unicode): Enable full Unicode matching
- x (extended): Ignore whitespace and allow comments (some languages)
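How these flags are spelled depends on the engine. As one illustration, Python's `re` module uses named constants rather than letters after the pattern, and has no global flag: `re.search` returns the first match and `re.findall` returns all of them. The sample strings below are assumptions.

```python
import re

# i: case-insensitive, as a constant or inline as (?i).
ignore_case = re.search(r"hello", "HELLO", re.IGNORECASE) is not None
inline_flag = re.search(r"(?i)hello", "Hello") is not None

# g: no flag in Python; choose the function instead.
first_only = re.search(r"\d+", "a1 b22 c333").group()
all_matches = re.findall(r"\d+", "a1 b22 c333")

# m: ^ matches at the start of every line.
single_line = re.findall(r"^\w+", "one\ntwo")
multi_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

# s: dot also matches a newline.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None

print(ignore_case, inline_flag, first_only, all_matches,
      single_line, multi_line, dot_default, dot_all)
```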
Best Practices and Tips
- Start simple and test incrementally - build complex patterns step by step
- Use online regex testers (like our Regex Tester tool) to test patterns
- Comment complex regex patterns to explain what each part does
- Be specific - avoid overly greedy patterns like .* when possible
- Use non-capturing groups (?:...) when you don't need to capture
- Escape special characters with backslash when matching them literally
- Consider performance - complex regex can be slow on large inputs
- Use raw strings in code to avoid double-escaping backslashes
- Remember regex is not suitable for parsing HTML/XML - use proper parsers
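Several of the tips above (raw strings, commented patterns, escaping, and non-capturing groups) can be combined in one Python sketch. The name `DECIMAL_RE` and the sample inputs are hypothetical.

```python
import re

# Raw strings keep backslashes as typed; the plain string needs them doubled.
same_pattern = r"\d+\.\d+" == "\\d+\\.\\d+"

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
DECIMAL_RE = re.compile(r"""
    (?P<whole>\d+)   # integer part
    \.               # literal dot
    (?P<frac>\d+)    # fractional part
""", re.VERBOSE)
number = DECIMAL_RE.fullmatch("3.14")

# re.escape adds the backslashes needed to match text literally.
literal = re.escape("a.b")
literal_hit = re.search(literal, "a.b") is not None
literal_miss = re.search(literal, "axb") is not None

# With a capturing group, findall returns the group, not the whole match.
non_capturing = re.findall(r"(?:ab)+c", "ababc abc")
capturing = re.findall(r"(ab)+c", "ababc abc")

print(same_pattern, number.groupdict(), literal_hit, literal_miss,
      non_capturing, capturing)
```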
Common Pitfalls
- Catastrophic Backtracking: Nested quantifiers can cause exponential time complexity
- Greedy vs Lazy: .* matches as much as possible; .*? matches as little as possible
- Forgetting Anchors: /\d{3}/ matches "123" in "12345"; use /^\d{3}$/ for exact match
- Not Escaping Metacharacters: Use \. to match literal dot, not any character
- Regex for Everything: Don't use regex for complex parsing (HTML, JSON) - use proper parsers
- Ignoring Edge Cases: Test with empty strings, special characters, and extreme inputs
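Three of these pitfalls are easy to reproduce. The following Python sketch uses made-up inputs to show greedy versus lazy matching, a missing anchor, and an unescaped dot, plus one edge case with an empty string.

```python
import re

text = 'say "one" and "two"'

# Greedy: .* runs to the last quote. Lazy: .*? stops at the first one.
greedy = re.findall(r'".*"', text)
lazy = re.findall(r'".*?"', text)

# Without anchors the pattern matches inside a longer number.
unanchored = re.search(r"\d{3}", "12345").group()
anchored = re.search(r"^\d{3}$", "12345")

# An unescaped dot matches any character, so "125" passes as "1.5".
dot_any = re.search(r"1.5", "125") is not None
dot_literal = re.search(r"1\.5", "125") is not None

# Edge case: * allows zero repetitions, so the empty string matches.
empty_ok = re.fullmatch(r"\d*", "") is not None

print(greedy, lazy, unanchored, anchored, dot_any, dot_literal, empty_ok)
```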
Conclusion
Regular expressions are powerful tools for pattern matching and text manipulation. While they have a steep learning curve, mastering regex will significantly improve your text processing capabilities across various programming tasks. Start with simple patterns, use testing tools to verify your regex, and gradually build more complex expressions as you gain confidence.