F·02DATA STRUCTURE2025.06.282 MIN READ

Trie Data Structure: The Secret Behind Autocomplete

Trie(트라이): 검색어 자동완성은 어떻게 0.01초 만에 나올까?

Why 'SELECT * FROM words WHERE word LIKE "app%"' is too slow. How Tries solve prefix search in O(L) time. Implementation guide and Radix Tree optimization.

codemapo

INTERDISCIPLINARY DEV · SEOUL

13. FAQ

Q1. HashMaps give O(1) lookup. Why use Tries?

HashMaps only find exact key matches. To find all words starting with "app", you must iterate all keys—$O(N)$. Tries do prefix search in $O(L)$. Also, HashMaps are vulnerable to collision attacks, while Tries guarantee deterministic performance.

Q2. Difference between Trie and Suffix Tree?

Tries share prefixes (beginnings) of words. Suffix Trees store all suffixes of a string for substring pattern matching. For "BANANA", a Suffix Tree stores "BANANA", "ANANA", "NANA", "ANA", "NA", "A". Completely different use cases.

Q3. How to handle Unicode, Korean, or emoji?

Unicode codepoints: Convert characters to Unicode values as keys. HashMap approach required.
Jamo separation (Korean): Decompose "한글" → "ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ". Higher search precision but deeper trees.
Normalization: Emoji can be multi-codepoint sequences; normalization needed (NFD, NFC, etc.).

Q4. Do production systems implement Tries from scratch?

Most use search engines (Elasticsearch, Solr), Redis Sorted Sets, or PostgreSQL Full-Text Search. But network equipment (routers), embedded systems, and game engine command parsers often use hand-optimized Tries. In memory- and latency-critical environments, techniques like Double Array Tries are common.

Q5. Array vs HashMap for Trie nodes—how to decide?

Array: Small alphabet (26 letters), most characters actually used. Speed priority.
HashMap: Large alphabet (Unicode) or sparse data. Memory priority.
Hybrid: Upper levels use arrays (common chars), lower levels use HashMaps (rare chars).

Q6. Other ways to reduce Trie space complexity?

Compressed Trie (Radix Tree): Path compression as described
Bitwise Trie: Branch bit-by-bit to increase depth, reduce branching (IP routing)
Level Compression: Merge multiple levels (4-way Trie, etc.)
Cache-Oblivious Trie: Memory layout optimized for cache efficiency
Lazy Deletion: Don't actually delete nodes, just mark inactive (accept fragmentation)

#Data Structure #Algorithm #Search #String #Autocomplete

← Back to List

Trie Data Structure: The Secret Behind Autocomplete

Trie(트라이): 검색어 자동완성은 어떻게 0.01초 만에 나올까?

Why 'SELECT * FROM words WHERE word LIKE "app%"' is too slow. How Tries solve prefix search in O(L) time. Implementation guide and Radix Tree optimization.

codemapo

INTERDISCIPLINARY DEV · SEOUL

13. FAQ

Q1. HashMaps give O(1) lookup. Why use Tries?

Q2. Difference between Trie and Suffix Tree?

Q3. How to handle Unicode, Korean, or emoji?

Unicode codepoints: Convert characters to Unicode values as keys. HashMap approach required.
Jamo separation (Korean): Decompose "한글" → "ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ". Higher search precision but deeper trees.
Normalization: Emoji can be multi-codepoint sequences; normalization needed (NFD, NFC, etc.).

Q4. Do production systems implement Tries from scratch?

Q5. Array vs HashMap for Trie nodes—how to decide?

Array: Small alphabet (26 letters), most characters actually used. Speed priority.
HashMap: Large alphabet (Unicode) or sparse data. Memory priority.
Hybrid: Upper levels use arrays (common chars), lower levels use HashMaps (rare chars).

Q6. Other ways to reduce Trie space complexity?

Compressed Trie (Radix Tree): Path compression as described
Bitwise Trie: Branch bit-by-bit to increase depth, reduce branching (IP routing)
Level Compression: Merge multiple levels (4-way Trie, etc.)
Cache-Oblivious Trie: Memory layout optimized for cache efficiency
Lazy Deletion: Don't actually delete nodes, just mark inactive (accept fragmentation)