What is URL Encoding (Percent Encoding)?

Learn about URL encoding - the process of converting special characters in URLs to a format that can be safely transmitted over the internet.

What is URL Encoding?

URL encoding, also called percent encoding, is a mechanism for representing information in a Uniform Resource Identifier (URI) using only the ASCII characters that are permitted in URLs. Special characters and non-ASCII characters are converted to their byte values, and each byte is written as a percent sign (%) followed by two hexadecimal digits. This ensures URLs can be safely transmitted across different systems and networks.

Why URL Encoding is Necessary

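URLs can only contain a limited set of characters from the ASCII character set. Any other character, and any reserved character that is used as data rather than as a delimiter, must be percent-encoded.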

Safe vs Unsafe Characters

Not all characters are safe to use directly in URLs:

text
Safe Characters (no encoding needed):
- Letters: A-Z, a-z
- Digits: 0-9
- Unreserved: - _ . ~

Reserved Characters (special meaning in URLs):
: / ? # [ ] @ ! $ & ' ( ) * + , ; =

Unsafe Characters (must be encoded):
- Space: %20 (or + in form-encoded query strings only)
- Special characters: < > " { } | \ ^ `
- Non-ASCII: é → %C3%A9, 中 → %E4%B8%AD
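The three groups above can be checked programmatically. The following is a minimal sketch, assuming Python 3.7 or later; the `classify` helper and the two set names are hypothetical and exist only for illustration.

```python
import string
from urllib.parse import quote

# Unreserved characters from the list above: letters, digits, and - _ . ~
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")
# Reserved characters from the list above
RESERVED = set(":/?#[]@!$&'()*+,;=")

def classify(ch: str) -> str:
    """Return the group a single character belongs to (illustrative helper)."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "must encode"

for ch in ["a", "~", "&", " ", "é"]:
    # quote(..., safe="") shows the encoded form when the character is used as data
    print(repr(ch), classify(ch), quote(ch, safe=""))
```

Reserved characters only need encoding when they appear as data, for example an & inside a parameter value.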

Problems Without Encoding

Why we need URL encoding:

text
Without encoding:
Bad: http://example.com/search?q=hello world
Problem: Space breaks URL parsing

Bad: http://example.com/name?user=John&Jane
Problem: & is interpreted as parameter separator

Bad: http://example.com/path?url=http://other.com
Problem: : and / are reserved delimiters, so some parsers may misread the nested URL

With encoding:
Good: http://example.com/search?q=hello%20world
Good: http://example.com/name?user=John%26Jane
Good: http://example.com/path?url=http%3A%2F%2Fother.com
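To see the & problem concretely, here is a small sketch using Python's standard `urllib.parse`. It parses the query string from the example above with and without encoding; the variable names are illustrative.

```python
from urllib.parse import parse_qs, quote

# Unencoded: the & inside the value is treated as a parameter separator
broken = parse_qs("user=John&Jane", keep_blank_values=True)
print(broken)  # {'user': ['John'], 'Jane': ['']}

# Encoded: the & travels as %26 and is restored when the query is parsed
fixed = parse_qs("user=" + quote("John&Jane", safe=""))
print(fixed)  # {'user': ['John&Jane']}
```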

How URL Encoding Works

The encoding process converts characters to percent-encoded format:

Encoding Process

Each character is converted to its bytes (UTF-8 for non-ASCII characters), and each byte is written as % followed by two hexadecimal digits.

text
Step-by-step encoding:

1. Take character: space " "
2. Get ASCII/UTF-8 code: 32 (decimal) = 0x20 (hex)
3. Add % prefix: %20

More examples:
! → ASCII 33 → %21
# → ASCII 35 → %23
$ → ASCII 36 → %24
& → ASCII 38 → %26
= → ASCII 61 → %3D
? → ASCII 63 → %3F
@ → ASCII 64 → %40

UTF-8 multi-byte:
é → UTF-8: C3 A9 → %C3%A9
中 → UTF-8: E4 B8 AD → %E4%B8%AD
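The steps above can be sketched by hand in a few lines. This is an illustration of the process, not a replacement for the standard library; `percent_encode` is a hypothetical name, and the sketch assumes that everything outside the unreserved set should be encoded.

```python
from urllib.parse import quote

# Bytes that never need encoding: letters, digits, and - _ . ~
UNRESERVED = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Encode every byte outside the unreserved set as %XX."""
    out = []
    for byte in text.encode("utf-8"):      # steps 1-2: character -> byte value(s)
        if byte in UNRESERVED:
            out.append(chr(byte))          # safe byte, keep as-is
        else:
            out.append(f"%{byte:02X}")     # step 3: % prefix + two hex digits
    return "".join(out)

print(percent_encode(" "))    # %20
print(percent_encode("é"))    # %C3%A9
print(percent_encode("中"))   # %E4%B8%AD

# The result agrees with the standard library when no character is marked safe
assert percent_encode("a&b=c d") == quote("a&b=c d", safe="")
```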

Common Encoded Characters

Frequently encountered URL-encoded characters:

text
Character  →  Encoded  →  Usage
---------      -------      -----
Space          %20 or +     Spaces in query strings
!              %21          URLs with exclamation
"              %22          Quotes in parameters
#              %23          Fragment delimiter (encode when it appears in data)
$              %24          Special character
&              %26          Query parameter separator
'              %27          Single quote
(              %28          Opening parenthesis
)              %29          Closing parenthesis
*              %2A          Asterisk
+              %2B          Plus sign
,              %2C          Comma
/              %2F          Forward slash (path separator)
:              %3A          Colon (scheme separator)
;              %3B          Semicolon
=              %3D          Equals (key-value separator)
?              %3F          Question mark (query string start)
@              %40          At sign (user info separator)
[              %5B          Opening bracket
]              %5D          Closing bracket

URL Components and Encoding

Different parts of a URL have different encoding rules:

URL Structure

Understanding which parts need encoding:

text
https://user:pass@example.com:8080/path/to/resource?key=value&foo=bar#section
Scheme:    https
User info: user:pass
Hostname:  example.com
Port:      8080
Path:      /path/to/resource
Query:     key=value&foo=bar
Fragment:  section

Encoding rules:
- Scheme: No encoding
- User/Pass: Encode : @ /
- Hostname: Use Punycode for internationalized domains
- Port: No encoding (numbers only)
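- Path: Encode each segment; keep / as the separator and encode a literal / inside a segment as %2F
- Query: Encode everything in keys and values except unreserved chars
- Fragment: Encode like the query; browsers do not send it to the server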
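As an illustration of the structure above, Python's `urlsplit` breaks the same example URL into its components. The last two lines are a sketch of encoding a single path segment; the filename used there is made up for the example.

```python
from urllib.parse import quote, urlsplit

parts = urlsplit(
    "https://user:pass@example.com:8080/path/to/resource?key=value&foo=bar#section"
)
print(parts.scheme)    # https
print(parts.username)  # user
print(parts.password)  # pass
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /path/to/resource
print(parts.query)     # key=value&foo=bar
print(parts.fragment)  # section

# Encoding one path segment: safe="" also encodes a literal / inside the segment
segment = quote("annual report/2024.pdf", safe="")
print(segment)  # annual%20report%2F2024.pdf
```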

Query String Encoding

Special rules for query parameters:

text
Original:
http://example.com/search?q=hello world&category=news & events

Encoded:
http://example.com/search?q=hello%20world&category=news%20%26%20events

Key-Value pairs:
key1=value with spaces → key1=value%20with%20spaces
key2=hello&goodbye     → key2=hello%26goodbye
key3=50%                → key3=50%25

Note: A space can appear as %20 or +, but + means space only in form-encoded query strings
q=hello+world   (application/x-www-form-urlencoded)
q=hello%20world (standard percent encoding)
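The difference between the two space conventions matters most when decoding. A short sketch with Python's standard functions shows that a + is only turned back into a space by the form-style decoder:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Encoding: the two conventions differ only in how a space is written
print(quote("hello world"))       # hello%20world
print(quote_plus("hello world"))  # hello+world

# Decoding: + is only treated as a space by the form-style decoder
print(unquote("hello+world"))       # hello+world
print(unquote_plus("hello+world"))  # hello world

# A literal percent sign must itself be encoded
print(quote_plus("50%"))  # 50%25
```

Decoding form data with the wrong function leaves stray + characters in the result, which is a common source of bugs.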

URL Encoding in Different Languages

Examples of encoding URLs in popular programming languages:

JavaScript

JavaScript provides multiple encoding functions:

javascript
// encodeURI - for full URLs
const url = 'https://example.com/path?q=hello world';
console.log(encodeURI(url));
// https://example.com/path?q=hello%20world

// encodeURIComponent - for URL components (recommended)
const query = 'hello world & stuff';
const encoded = encodeURIComponent(query);
console.log(encoded);
// hello%20world%20%26%20stuff

// Build URL with parameters
const params = new URLSearchParams({
  q: 'hello world',
  category: 'news & events'
});
const fullUrl = `https://example.com/search?${params}`;
// https://example.com/search?q=hello+world&category=news+%26+events

// Decode
const decoded = decodeURIComponent('hello%20world');
console.log(decoded); // hello world

Python

Python's urllib provides URL encoding:

python
from urllib.parse import quote, quote_plus, urlencode, unquote

# quote - standard encoding
encoded = quote('hello world & stuff')
print(encoded)  # hello%20world%20%26%20stuff

# quote_plus - uses + for spaces
encoded = quote_plus('hello world')
print(encoded)  # hello+world

# urlencode - for query parameters
params = {'q': 'hello world', 'category': 'news & events'}
query_string = urlencode(params)
print(query_string)  # q=hello+world&category=news+%26+events

# Decode
decoded = unquote('hello%20world')
print(decoded)  # hello world

PHP

PHP encoding functions:

php
<?php
// urlencode - for query parameters (+ for space)
$encoded = urlencode('hello world & stuff');
echo $encoded; // hello+world+%26+stuff

// rawurlencode - RFC 3986 (%20 for space)
$encoded = rawurlencode('hello world & stuff');
echo $encoded; // hello%20world%20%26%20stuff

// http_build_query - for query strings
$params = ['q' => 'hello world', 'cat' => 'news & events'];
$query = http_build_query($params);
echo $query; // q=hello+world&cat=news+%26+events

// Decode
$decoded = urldecode('hello+world');
echo $decoded; // hello world
?>

Common Use Cases

When and where URL encoding is essential:

  • Query Parameters: Encoding search terms, filters, and user input
  • API Requests: Passing data in GET requests
  • Form Submissions: application/x-www-form-urlencoded data
  • OAuth/Authentication: Encoding redirect URLs and tokens
  • File Paths: URLs containing filenames with special characters
  • Internationalization: Encoding non-ASCII characters in URLs
  • Social Sharing: Pre-filling share text with special characters
  • Email Links: mailto: URLs with subject and body parameters
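As one example from the list above, a mailto: link can be built with the standard library. This is a sketch with a made-up address and message; it uses %20 rather than + for spaces, on the assumption that mail clients do not treat + as a space.

```python
from urllib.parse import quote, urlencode

# Hypothetical recipient and message, for illustration only
params = {
    "subject": "Meeting notes",
    "body": "See you at 10 & bring slides",
}

# quote_via=quote makes urlencode write spaces as %20 instead of +
link = "mailto:someone@example.com?" + urlencode(params, quote_via=quote)
print(link)
# mailto:someone@example.com?subject=Meeting%20notes&body=See%20you%20at%2010%20%26%20bring%20slides
```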

Practical Examples

Real-world URL encoding scenarios:

Search Query

Encoding search terms:

text
Original query: "how to use C++ & Python?"

Unencoded (wrong):
http://example.com/search?q=how to use C++ & Python?

Encoded (correct):
http://example.com/search?q=how%20to%20use%20C%2B%2B%20%26%20Python%3F

Alternate (+):
http://example.com/search?q=how+to+use+C%2B%2B+%26+Python%3F
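Both encoded forms above can be reproduced with Python's standard library. This is a minimal sketch; the base URL is the one from the example.

```python
from urllib.parse import quote, quote_plus

query = "how to use C++ & Python?"

# Percent-encoding with no characters marked safe (%20 for spaces)
percent_form = quote(query, safe="")
print("http://example.com/search?q=" + percent_form)
# http://example.com/search?q=how%20to%20use%20C%2B%2B%20%26%20Python%3F

# Form-style encoding (+ for spaces); a literal + still becomes %2B
plus_form = quote_plus(query)
print("http://example.com/search?q=" + plus_form)
# http://example.com/search?q=how+to+use+C%2B%2B+%26+Python%3F
```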

Multiple Parameters

Building complex query strings:

text
Parameters:
- name: John Doe
- email: john+test@example.com
- message: Hello! How are you?

Encoded URL:
http://example.com/contact?
name=John%20Doe&
email=john%2Btest%40example.com&
message=Hello%21%20How%20are%20you%3F
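The same query string can be produced in one call. The sketch below uses `urlencode` with `quote_via=quote` so that spaces come out as %20, matching the encoded URL above.

```python
from urllib.parse import quote, urlencode

params = {
    "name": "John Doe",
    "email": "john+test@example.com",
    "message": "Hello! How are you?",
}

# urlencode encodes each key and value separately, then joins them with &
query_string = urlencode(params, quote_via=quote)
print("http://example.com/contact?" + query_string)
# http://example.com/contact?name=John%20Doe&email=john%2Btest%40example.com&message=Hello%21%20How%20are%20you%3F
```

Encoding the + in the email address matters: left as-is, a form decoder on the server would read it as a space.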

Redirect URLs

Encoding URLs as parameters:

text
Redirect to: https://example.com/dashboard?tab=settings

Login URL:
https://auth.example.com/login?
redirect=https%3A%2F%2Fexample.com%2Fdashboard%3Ftab%3Dsettings

Note: The redirect URL itself is fully encoded!
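A short round-trip sketch shows why the nested URL must be fully encoded: once it is, the receiving side can recover it unchanged. The URLs are the ones from the example above.

```python
from urllib.parse import parse_qs, quote, urlsplit

target = "https://example.com/dashboard?tab=settings"

# safe="" encodes : / ? = so the nested URL cannot leak into the outer one
login_url = "https://auth.example.com/login?redirect=" + quote(target, safe="")
print(login_url)
# https://auth.example.com/login?redirect=https%3A%2F%2Fexample.com%2Fdashboard%3Ftab%3Dsettings

# The receiving side parses the query and gets the original URL back
recovered = parse_qs(urlsplit(login_url).query)["redirect"][0]
print(recovered == target)  # True
```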

Best Practices

  • Always encode user input before adding to URLs
  • Use built-in functions provided by your programming language
  • Encode components separately - don't encode the entire URL
  • Use encodeURIComponent in JavaScript, not encodeURI for parameters
  • Be consistent with space encoding: %20 is valid everywhere, + only in form-encoded query strings
  • Test with special characters during development
  • Consider international characters - use UTF-8 encoding
  • Validate after encoding - ensure URLs are well-formed
  • Don't double-encode - encode raw values exactly once, at the point where the URL is built
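Double encoding is easy to reproduce. In this sketch the second pass encodes the % produced by the first pass, so a single decode no longer returns the original text:

```python
from urllib.parse import quote, unquote

once = quote("hello world")
print(once)   # hello%20world

# Encoding again turns the % itself into %25
twice = quote(once)
print(twice)  # hello%2520world

# A single decode now yields the encoded form, not the original text
print(unquote(twice))  # hello%20world
```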

Common Mistakes

  • Not encoding user input: Leads to broken URLs and security issues
  • Encoding too much: Running a component encoder over the entire URL also escapes : / ? and breaks it
  • Double encoding: Encoding already-encoded data
  • Wrong function: Using encodeURI instead of encodeURIComponent
  • Forgetting fragments: Not encoding # in parameter values
  • Mixing styles: Inconsistent + vs %20 for spaces
  • Encoding when not needed: Over-encoding safe characters reduces readability

Security Considerations

URL encoding and security:

  • Prevent Injection Attacks: Always encode user input to prevent URL manipulation
  • Open Redirect Prevention: Validate and encode redirect URLs
  • XSS Prevention: Encoding helps prevent some XSS attacks in URLs
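  • Path Traversal: Encoding does not stop ../ attacks, because servers decode %2e%2e%2f back to ../; validate the decoded path
  • SQL Injection: URL encoding is undone before data reaches the database, so it offers no protection; use parameterized queries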
  • Validate Decoded Data: Always validate after decoding user input
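The "validate after decoding" advice can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete defense: `is_safe_relative_path` is a hypothetical helper, and it assumes the value is decoded exactly once and is meant to be a relative POSIX-style path.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote

def is_safe_relative_path(encoded_path: str) -> bool:
    """Decode once, then reject absolute paths and any '..' segment."""
    decoded = unquote(encoded_path)
    if decoded.startswith("/"):
        return False
    # PurePosixPath keeps '..' segments, so they can be detected here
    return ".." not in PurePosixPath(decoded).parts

print(is_safe_relative_path("reports/2024.pdf"))         # True
print(is_safe_relative_path("%2e%2e%2fetc%2fpasswd"))    # False
print(is_safe_relative_path("images/%2e%2e/secret.txt")) # False
```

Checking the raw, still-encoded string for ../ would miss the second and third inputs, which is why the check runs on the decoded value.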

URL Encoding vs Other Encodings

How URL encoding differs from other encoding methods:

  • vs Base64: Base64 encodes binary data; URL encoding handles special characters
  • vs HTML Entities: HTML entities (&amp;amp;) for HTML; URL encoding for URLs
  • vs Unicode Escapes: \u0020 in strings; %20 in URLs
  • vs Punycode: Punycode for domain names; percent encoding for path/query
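To make the comparison concrete, the sketch below runs sample inputs through each scheme using Python's standard library. The sample strings and the domain name are made up for illustration.

```python
import base64
import html
from urllib.parse import quote

text = "a & b"

url_form = quote(text, safe="")                  # percent encoding, for URLs
html_form = html.escape(text)                    # HTML entities, for HTML
b64_form = base64.b64encode(text.encode("utf-8")).decode("ascii")  # Base64

print(url_form)   # a%20%26%20b
print(html_form)  # a &amp; b
print(b64_form)   # YSAmIGI=

# Unicode escape (string literals) vs percent encoding (URLs) of one character
print("中".encode("unicode_escape").decode("ascii"))  # \u4e2d
print(quote("中"))                                     # %E4%B8%AD

# Punycode (via the idna codec) applies to host names, not paths or queries
print("münchen.de".encode("idna").decode("ascii"))    # xn--mnchen-3ya.de
```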

Conclusion

URL encoding is fundamental to web development, ensuring that URLs can safely contain any character while remaining compatible with internet standards. Understanding when and how to properly encode URLs prevents bugs, improves security, and ensures your web applications work correctly across different systems and browsers. Always use your programming language's built-in encoding functions and encode user input before including it in URLs.