
Regular Expression (Regex)
Don't write 100 lines of 'if' to validate email. Use the cryptic power of Regex.


## Prologue: What on Earth is ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$?
The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, $, \w, +, ., []... I had absolutely no idea what these symbols meant.
Then someone told me this validates email addresses. I had been trying to validate emails with if statements. "Does it have @? Does it have a dot? No spaces?" That approach. When I actually coded it, the script grew to over 50 lines. And I still missed most edge cases.
Regular Expressions (Regex) are a mini-language for finding and validating string patterns. At first, they look like alien script, but once I understood them, my entire approach to string processing transformed.
I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.
I actually encountered this in a company codebase:
const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/;
They said it validates URLs. I looked at this and grumbled, "Who's going to maintain this?" Would I even understand my own code six months later?
So I tried processing strings without regex. I wrote a function to extract the digits from a phone number using nothing but a for loop and an if check.
function extractNumbers(phone) {
  let result = '';
  for (let i = 0; i < phone.length; i++) {
    const char = phone[i];
    // Keep only the characters between '0' and '9'
    if (char >= '0' && char <= '9') {
      result += char;
    }
  }
  return result;
}
It works. But... is this really the best way? I felt like I was doing something stupid.
The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:
2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1
I started the manual labor with split() and indexOf(). Split by spaces, find the word after "from"... the code got increasingly complex. Then a senior developer walked over and said:
"Just use regex. It's one line."
const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);
That was it. Two lines extracted all IP addresses. At that moment, I understood. Regex wasn't ugly—it was "compressed." It was a DSL (Domain Specific Language) for expressing patterns.
I decided to accept it this way: "Regex is a language for drawing the shape of strings." \d is the picture of a digit, {1,3} means "between 1 and 3," \. is the symbol for a dot. Combine these drawings, and you get the pattern for "IP address."
This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."
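To see the "sketch" at work, here is a minimal runnable version of the two-liner. The log text is the sample from above plus one bogus address that I added for illustration, and isPlausibleIPv4 is a hypothetical helper of my own: \d{1,3} only draws the shape of an address, so it would also accept something like 999.1.1.1.

```javascript
// Sample log text from above, plus one bogus address (added as an assumption).
const logContent = [
  '2025-04-29 10:23:45 [INFO] User login from 192.168.1.100',
  '2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55',
  '2025-04-29 10:25:03 [INFO] API call from 172.16.0.1',
  '2025-04-29 10:26:40 [WARN] Bad client address 999.1.1.1',
].join('\n');

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;

// match() returns null when nothing matches, so fall back to an empty array.
const candidates = logContent.match(ipPattern) ?? [];

// Hypothetical helper: keep only candidates whose four parts are all 0-255.
function isPlausibleIPv4(candidate) {
  return candidate.split('.').every((part) => Number(part) <= 255);
}

const ips = candidates.filter(isPlausibleIPv4);

console.log(candidates.length); // 4 shape matches, including 999.1.1.1
console.log(ips);               // the three addresses with valid ranges
```

The regex finds the shape, and a small range check afterwards does the arithmetic that a pattern is bad at.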
I started learning regex systematically as a language. First, the alphabet.
- ^: Start of string
- $: End of string
These two represent "position." For example, ^Hello means "a string starting with Hello," and world$ means "a string ending with world."
const startsWithHello = /^Hello/;
startsWithHello.test('Hello world'); // true
startsWithHello.test('Say Hello'); // false
const endsWithWorld = /world$/;
endsWithWorld.test('Hello world'); // true
endsWithWorld.test('world is big'); // false
I thought of this as the first and last pages of a book. ^ is the cover, $ is the back cover. They anchor exactly where the pattern should be.
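One consequence of anchoring is worth a quick sketch: without ^ and $, test() is satisfied by a match anywhere inside the string, so validation patterns usually need both anchors. The pattern names below are my own, chosen for illustration.

```javascript
const containsDigits = /\d+/; // true if digits appear anywhere in the string
const onlyDigits = /^\d+$/;   // true only if the whole string is digits

console.log(containsDigits.test('order 42')); // true
console.log(onlyDigits.test('order 42'));     // false
console.log(onlyDigits.test('42'));           // true
```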
- [abc]: One of a, b, c
- [a-z]: One lowercase letter from a to z
- [0-9]: One digit from 0 to 9
- [^abc]: Any character except a, b, c (negation)
Brackets [] mean "one of these."
const vowel = /[aeiou]/;
vowel.test('apple'); // true
vowel.test('sky'); // false (no a, e, i, o, or u anywhere in the word)
const notDigit = /[^0-9]/;
notDigit.test('123'); // false (all digits)
notDigit.test('12a3'); // true (a is not digit)
I understood this as a vending machine. [abc] means "you can press any button: a, b, or c."
- \d: Digit = [0-9]
- \w: Word character = [a-zA-Z0-9_]
- \s: Whitespace = space, tab, newline, etc.
- \D: Not digit = [^0-9]
- \W: Not word character
- \S: Not whitespace
The uppercase form is the opposite of the lowercase one. Easy to remember.
const hasNumber = /\d/;
hasNumber.test('abc123'); // true
const noSpaces = /^\S+$/;
noSpaces.test('hello'); // true
noSpaces.test('hello world'); // false (has space)
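With \D in hand, the loop-based extractNumbers from earlier shrinks to one line. This is a sketch under my own naming (extractDigits is hypothetical), and it borrows the g flag, which means "replace every match, not just the first."

```javascript
// Hypothetical regex version of the earlier extractNumbers loop.
function extractDigits(phone) {
  // \D matches any character that is not a digit; g applies it to every match.
  return phone.replace(/\D/g, '');
}

console.log(extractDigits('010-1234-5678')); // '01012345678'
console.log(extractDigits('(02) 555 0199')); // '025550199'
```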
- *: Zero or more
- +: One or more
- ?: Zero or one (optional)
- {n}: Exactly n times
- {n,}: n or more times
- {n,m}: Between n and m times
Quantifiers specify how many times the preceding pattern repeats.
const optionalS = /^cats?$/; // exactly "cat" or "cats"
optionalS.test('cat'); // true
optionalS.test('cats'); // true
optionalS.test('catttt'); // false (the anchors reject the extra t's; unanchored /cats?/ would still find "cat")
const phoneNumber = /\d{3}-\d{4}-\d{4}/; // 010-1234-5678 format
phoneNumber.test('010-1234-5678'); // true
phoneNumber.test('010-123-5678'); // false (wrong digit count)
I understood quantifiers as item counts in a game. "Must have 1 or more swords (+)," "shield is optional (?)."
- (): Grouping
- |: OR condition
Parentheses bundle multiple characters into a single unit. Pipe | means "or."
const catOrDog = /cat|dog/;
catOrDog.test('I have a cat'); // true
catOrDog.test('I have a dog'); // true
catOrDog.test('I have a bird'); // false
const repeatingGroup = /^(ha)+$/; // ha, haha, hahaha...
repeatingGroup.test('hahaha'); // true
repeatingGroup.test('haa'); // false (the trailing "a" is not a full "ha"; unanchored /(ha)+/ would still find "ha")
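Parentheses do more than bundle characters: in JavaScript each group also captures what it matched, so the pieces can be read back out of the match() result. Here is a rough sketch against one of the log lines from earlier. The pattern and variable names are my own assumptions, and it only knows the two levels (INFO and ERROR) that appear in the sample.

```javascript
const line = '2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55';

// Three capturing groups: the date, the log level, and the rest of the message.
const logLine = /^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} \[(INFO|ERROR)\] (.+)$/;

const result = line.match(logLine);
if (result) {
  // result[0] is the whole match; captured groups start at index 1.
  const [, date, level, message] = result;
  console.log(date);    // '2025-04-29'
  console.log(level);   // 'ERROR'
  console.log(message); // 'Failed request from 10.0.0.55'
}
```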
- .: Any character except newline
A single dot is the joker card.
const threeChars = /a.c/; // a + anything + c
threeChars.test('abc'); // true
threeChars.test('a9c'); // true
threeChars.test('ac'); // false (no middle character)
Caution: to match a literal dot, escape it with a backslash (\.).
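A small sketch of the difference, using two illustrative patterns of my own: the unescaped dot accepts any character in that position, while the escaped one accepts only a real dot.

```javascript
const anyMiddle = /^a.c$/;   // dot is the joker: any single character
const literalDot = /^a\.c$/; // escaped dot: only an actual "."

console.log(anyMiddle.test('abc'));  // true
console.log(anyMiddle.test('a.c'));  // true
console.log(literalDot.test('abc')); // false
console.log(literalDot.test('a.c')); // true
```

This is the same reason the IP pattern earlier writes \. between the number groups.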
Regex has options you append after the pattern.
- g: Global (find all matches, not just the first)
- i: Ignore case (case insensitive)
- m: Multiline (^ and $ apply to each line)
- s: Dotall (. includes newline)
- u: Unicode
- y: Sticky
const findAllDigits = /\d/g;
'a1b2c3'.match(findAllDigits); // ['1', '2', '3'] (finds all)
const caseInsensitive = /hello/i;
caseInsensitive.test('HELLO'); // true
caseInsensitive.test('HeLLo'); // true
I once forgot the g flag and got confused when only one result appeared. Without g, match() returns only the first match, so whenever I want every match I need the g flag.
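The difference is easy to see side by side. A minimal sketch (the variable names are mine): without g, match() returns the first match along with details such as its index; with g, it returns a plain array of every matched string. When I need both, matchAll() with a g-flagged pattern yields each match with its index.

```javascript
const text = 'a1b2c3';

const first = text.match(/\d/); // no g: the first match only, with details
const all = text.match(/\d/g);  // g: an array of every matched string

console.log(first[0]);    // '1'
console.log(first.index); // 1
console.log(all);         // ['1', '2', '3']

// matchAll() requires the g flag and yields every match with its index.
const positions = [...text.matchAll(/\d/g)].map((m) => m.index);
console.log(positions);   // [1, 3, 5]
```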
Theory alone is useless. I needed to actually use it. Here are patterns I use frequently.
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;
function isValidEmail(email) {
return emailPattern.test(email);
}
isValidEmail('user@example.com'); // true
isValidEmail('user.name@company.co.kr'); // true
isValidEmail('invalid@'); // false
isValidEmail('@invalid.com'); // false
Breaking it down:
- ^[\w-\.]+: Start with word chars/hyphens/dots, one or more
- @: @ symbol required
- ([\w-]+\.)+: Domain (word-hyphen + dot) repeated one or more times
- [\w-]{2,4}$: End with 2 to 4 word characters or hyphens (the top-level domain)
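The breakdown also shows where this pattern is strict. A sketch of two addresses it turns away: a top-level domain longer than four characters, and a + in the local part. The looserEmail pattern is a hypothetical alternative of mine, not a standard. It only checks for "something@something.tld" with no spaces, and it is not a full validator either.

```javascript
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

console.log(emailPattern.test('user@example.com'));       // true
console.log(emailPattern.test('curator@gallery.museum')); // false: "museum" is longer than {2,4}
console.log(emailPattern.test('first+tag@example.com'));  // false: + is not in [\w-\.]

// Hypothetical looser check: non-space, non-@ text around one @, then a dot and 2+ characters.
const looserEmail = /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/;

console.log(looserEmail.test('curator@gallery.museum'));  // true
console.log(looserEmail.test('first+tag@example.com'));   // true
console.log(looserEmail.test('invalid@'));                // false
console.log(looserEmail.test('@invalid.com'));            // false
```

Which one to use depends on how strict the form needs to be. Either way, the only real proof that an address works is sending mail to it.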
The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, , \w, +, ., []... I had absolutely no idea what these symbols meant.
Then someone told me this validates email addresses. I had been trying to validate emails with if statements. "Does it have @? Does it have a dot? No spaces?" That approach. When I actually coded it, the script grew to over 50 lines. And I still missed most edge cases.
Regular Expressions (Regex) are a mini-language for finding and validating string patterns. At first, they look like alien script, but once I understood them, my entire approach to string processing transformed.
I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.
I actually encountered this in a company codebase:
const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\�CB0�'\(\)\*\+,;=.]+$/;
They said it validates URLs. I looked at this and grumbled, "Who's going to maintain this?" Won't I fail to understand my own code 6 months later?
So I tried processing strings without regex. I wrote a function to extract numbers from phone numbers using pure for and if loops.
function extractNumbers(phone) {
let result = '';
for (let i = 0; i < phone.length; i++) {
const char = phone[i];
if (char >= '0' && char <= '9') {
result += char;
}
}
return result;
}
It works. But... is this really the best way? I felt like I was doing something stupid.
The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:
```
2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1
```
I started the manual labor with split() and indexOf(). Split by spaces, find the word after "from"... the code got increasingly complex. Then a senior developer walked over and said:
"Just use regex. It's one line."
```javascript
const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);
```
That was it. Two lines extracted all IP addresses. At that moment, I understood. Regex wasn't ugly—it was "compressed." It was a DSL (Domain Specific Language) for expressing patterns.
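A minimal runnable sketch of those two lines. Here `logContent` is a stand-in built from the three sample log lines, and `isValidIp` is a hypothetical helper added for illustration, because the pattern only checks the shape of an address and would also accept something like 999.1.1.1.

```javascript
// Stand-in log content built from the sample lines (an assumption for this sketch).
const logContent = [
  '2025-04-29 10:23:45 [INFO] User login from 192.168.1.100',
  '2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55',
  '2025-04-29 10:25:03 [INFO] API call from 172.16.0.1',
].join('\n');

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;

// match() returns null when nothing matches, so fall back to an empty array.
const ips = logContent.match(ipPattern) || [];

// Hypothetical helper: check the 0-255 range of each octet in plain code,
// since the pattern itself only describes "1 to 3 digits".
function isValidIp(ip) {
  return ip.split('.').every((octet) => Number(octet) <= 255);
}

console.log(ips); // ['192.168.1.100', '10.0.0.55', '172.16.0.1']
console.log(isValidIp('999.1.1.1')); // false
```

Keeping the range check outside the pattern is a design choice: the regex stays short, and the numeric rule lives in code that is easier to read.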
I decided to accept it this way: "Regex is a language for drawing the shape of strings." \d is the picture of a digit, {1,3} means "between 1 and 3," \. is the symbol for a dot. Combine these drawings, and you get the pattern for "IP address."
This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."
I started learning regex systematically as a language. First, the alphabet.
- `^`: Start of string
- `$`: End of string

These two represent "position." For example, `^Hello` means "string starting with Hello," and `world$` means "string ending with world."

```javascript
const startsWithHello = /^Hello/;
startsWithHello.test('Hello world'); // true
startsWithHello.test('Say Hello'); // false

const endsWithWorld = /world$/;
endsWithWorld.test('Hello world'); // true
endsWithWorld.test('world is big'); // false
```

I thought of this as the first and last pages of a book. `^` is the cover, `$` is the back cover. They anchor exactly where the pattern should be.
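Using both anchors together is what turns a "contains" check into a "whole string" check. A small sketch, with variable names chosen only for illustration:

```javascript
// Without anchors, test() succeeds if the pattern appears anywhere.
const containsHello = /Hello/;
// With both anchors, the entire string has to be exactly "Hello".
const exactlyHello = /^Hello$/;

console.log(containsHello.test('Say Hello!')); // true
console.log(exactlyHello.test('Say Hello!')); // false
console.log(exactlyHello.test('Hello')); // true
```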
- `[abc]`: One of a, b, c
- `[a-z]`: One lowercase letter from a to z
- `[0-9]`: One digit from 0 to 9
- `[^abc]`: Any character except a, b, c (negation)

Brackets `[]` mean "one of these."

```javascript
const vowel = /[aeiou]/;
vowel.test('apple'); // true
vowel.test('sky'); // false (no a, e, i, o, or u in the word)

const notDigit = /[^0-9]/;
notDigit.test('123'); // false (all digits)
notDigit.test('12a3'); // true (a is not a digit)
```
I understood this as a vending machine. [abc] means "you can press any button: a, b, or c."
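Two details of the brackets are easy to miss: several ranges can share one pair of brackets, and a hyphen placed last is a literal hyphen rather than a range. A short sketch (the pattern names are illustrative):

```javascript
// Several ranges can sit in one pair of brackets.
const hexDigit = /[0-9a-fA-F]/;
// A hyphen placed last in the brackets is a literal hyphen, not a range.
const letterOrHyphen = /[a-z-]/;

console.log(hexDigit.test('f')); // true
console.log(hexDigit.test('g')); // false
console.log(letterOrHyphen.test('-')); // true
console.log(letterOrHyphen.test('_')); // false
```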
- `\d`: Digit = `[0-9]`
- `\w`: Word character = `[a-zA-Z0-9_]`
- `\s`: Whitespace = space, tab, newline, etc.
- `\D`: Not digit = `[^0-9]`
- `\W`: Not word character
- `\S`: Not whitespace

Uppercase is the opposite of lowercase. Easy to remember.

```javascript
const hasNumber = /\d/;
hasNumber.test('abc123'); // true

const noSpaces = /^\S+$/;
noSpaces.test('hello'); // true
noSpaces.test('hello world'); // false (has space)
```
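These shorthand classes also work with `split()` and `replace()`, not only `test()`. A small sketch with made-up input strings:

```javascript
// \s+ treats any run of spaces, tabs, or newlines as one separator.
const words = 'alpha  beta\tgamma\ndelta'.split(/\s+/);

// \D matches every non-digit, so replacing it with '' keeps digits only.
const digits = '(010) 1234-5678'.replace(/\D/g, '');

console.log(words); // ['alpha', 'beta', 'gamma', 'delta']
console.log(digits); // '01012345678'
```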
- `*`: Zero or more
- `+`: One or more
- `?`: Zero or one (optional)
- `{n}`: Exactly n times
- `{n,}`: n or more times
- `{n,m}`: Between n and m times

Quantifiers specify how many times the preceding pattern repeats.

```javascript
const optionalS = /^cats?$/; // exactly "cat" or "cats"
optionalS.test('cat'); // true
optionalS.test('cats'); // true
optionalS.test('catttt'); // false (the anchors reject the extra t's)

const phoneNumber = /\d{3}-\d{4}-\d{4}/; // 010-1234-5678 format
phoneNumber.test('010-1234-5678'); // true
phoneNumber.test('010-123-5678'); // false (wrong digit count)
```
I understood quantifiers as item counts in a game. "Must have 1 or more swords (+)," "shield is optional (?)."
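The counted forms `{n,}` and `{n,m}` and the star behave the same way. A brief sketch, anchored so that the counts apply to the whole string (pattern names are illustrative):

```javascript
const twoToFourDigits = /^\d{2,4}$/; // between 2 and 4 digits
const atLeastThreeAs = /^a{3,}$/; // three or more a's
const aThenAnyBs = /^ab*$/; // a, ab, abb, ...

console.log(twoToFourDigits.test('7')); // false (too short)
console.log(twoToFourDigits.test('42')); // true
console.log(twoToFourDigits.test('12345')); // false (too long)
console.log(atLeastThreeAs.test('aa')); // false
console.log(atLeastThreeAs.test('aaaa')); // true
console.log(aThenAnyBs.test('a')); // true (zero b's is allowed)
```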
- `()`: Grouping
- `|`: OR condition

Parentheses bundle multiple characters into a single unit. Pipe `|` means "or."

```javascript
const catOrDog = /cat|dog/;
catOrDog.test('I have a cat'); // true
catOrDog.test('I have a dog'); // true
catOrDog.test('I have a bird'); // false

const repeatingGroup = /^(ha)+$/; // ha, haha, hahaha...
repeatingGroup.test('hahaha'); // true
repeatingGroup.test('haa'); // false (haa is not a whole number of "ha")
```
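Grouping also controls how far the pipe reaches. Without parentheses, `|` splits the entire pattern, anchors included. A sketch using a gray/grey example of my own choosing:

```javascript
// The group limits the alternation to the single letter in the middle.
const grayOrGrey = /^gr(a|e)y$/;
// Without the group, | splits the whole pattern: "^gra" OR "ey$".
const ungrouped = /^gra|ey$/;

console.log(grayOrGrey.test('gray')); // true
console.log(grayOrGrey.test('grey')); // true
console.log(grayOrGrey.test('grape')); // false
console.log(ungrouped.test('grape')); // true (it starts with "gra")
```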
- `.`: Any character except newline

A single dot is the joker card.

```javascript
const threeChars = /a.c/; // a + anything + c
threeChars.test('abc'); // true
threeChars.test('a9c'); // true
threeChars.test('ac'); // false (no middle character)
```

Caution: To find a literal dot, escape it as `\.`.
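A quick sketch of what goes wrong when the escape is forgotten. The file name is a made-up example:

```javascript
const unescaped = /^file.txt$/; // the dot matches any single character
const escaped = /^file\.txt$/; // the dot must be a literal "."

console.log(unescaped.test('file.txt')); // true
console.log(unescaped.test('fileXtxt')); // true (probably not intended)
console.log(escaped.test('file.txt')); // true
console.log(escaped.test('fileXtxt')); // false
```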
Regex has options you append after the pattern.
- `g`: Global (find all matches, not just the first)
- `i`: Ignore case (case insensitive)
- `m`: Multiline (`^` and `$` apply to each line)
- `s`: Dotall (`.` includes newline)
- `u`: Unicode
- `y`: Sticky

```javascript
const findAllDigits = /\d/g;
'a1b2c3'.match(findAllDigits); // ['1', '2', '3'] (finds all)

const caseInsensitive = /hello/i;
caseInsensitive.test('HELLO'); // true
caseInsensitive.test('HeLLo'); // true
```
I once forgot the g flag and got confused when only one result appeared. When using match(), I almost always need the g flag.
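The difference is easy to see side by side. This sketch compares `match()` with and without `g`, and adds `matchAll()`, which needs `g` but keeps the position of each match:

```javascript
const text = 'a1b2c3';

// Without g: only the first match, plus details such as its index.
const first = text.match(/\d/);
// With g: an array of every matched string, without index or groups.
const all = text.match(/\d/g);
// matchAll() requires g and yields a full match object for each hit.
const positions = [...text.matchAll(/\d/g)].map((m) => m.index);

console.log(first[0], first.index); // '1' 1
console.log(all); // ['1', '2', '3']
console.log(positions); // [1, 3, 5]
```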
Theory alone is useless. I needed to actually use it. Here are patterns I use frequently.
```javascript
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

function isValidEmail(email) {
  return emailPattern.test(email);
}

isValidEmail('user@example.com'); // true
isValidEmail('user.name@company.co.kr'); // true
isValidEmail('invalid@'); // false
isValidEmail('@invalid.com'); // false
```
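Breaking it down:

- `^[\w-\.]+`: Start with word chars/hyphens/dots, one or more
- `@`: @ symbol required
- `([\w-]+\.)+`: Domain (word-hyphen + dot) repeated one or more times
- `[\w-]{2,4}$`: End with a top-level domain of 2 to 4 word chars/hyphens

Not perfect. Satisfying 100% of the RFC spec would be extremely complex, and this pattern rejects top-level domains longer than four characters. But in practice, this was sufficient.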
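To make the "not perfect" part concrete, here is a sketch of two addresses the pattern rejects. `relaxedPattern` is a hypothetical variant, not something from the original codebase, that only loosens the length of the last label:

```javascript
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;
// Hypothetical variant: same shape, but the last label may be 2 or more chars.
const relaxedPattern = /^[\w.-]+@([\w-]+\.)+[\w-]{2,}$/;

// A six-letter top-level domain fails the {2,4} limit.
console.log(emailPattern.test('user@example.museum')); // false
console.log(relaxedPattern.test('user@example.museum')); // true

// A plus sign in the local part is outside [\w-\.] in both patterns.
console.log(emailPattern.test('first+tag@example.com')); // false
console.log(relaxedPattern.test('first+tag@example.com')); // false
```

Whether to loosen the pattern further depends on how strict the form needs to be; sending a confirmation email is the only check that proves an address works.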
```javascript
// Loose shape check for an already formatted number (not used by formatPhone)
const phonePattern = /\d{2,3}-\d{3,4}-\d{4}/;

function formatPhone(phone) {
  // Extract digits only
  const numbers = phone.replace(/\D/g, '');

  if (numbers.length === 11) {
    // 11 digits: 3-4-4 split, e.g. 010-1234-5678
    return numbers.replace(/(\d{3})(\d{4})(\d{4})/, '$1-$2-$3');
  } else if (numbers.length === 10) {
    // 10 digits: assumes a 2-digit area code (2-4-4 split). A greedy
    // (\d{2,3}) here would take three digits and give 101-234-5678.
    return numbers.replace(/(\d{2})(\d{4})(\d{4})/, '$1-$2-$3');
  }
  return phone; // Return original if format doesn't match
}

formatPhone('01012345678'); // '010-1234-5678'
formatPhone('010-1234-5678'); // '010-1234-5678'
formatPhone('10-1234-5678'); // '10-1234-5678' (area code)
```
In replace(), $1, $2, $3 reference groups captured by parentheses. Once I learned this, string transformation became incredibly easy.
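The same references can reorder the captured pieces, and `replace()` also accepts a function when the replacement needs logic. A sketch with a date string chosen for illustration:

```javascript
const isoDate = '2025-04-29';
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Groups are numbered by their opening parenthesis, left to right.
const dayFirst = isoDate.replace(datePattern, '$3/$2/$1');

// A replacer function receives the full match, then each captured group.
const withoutZeros = isoDate.replace(
  datePattern,
  (_match, year, month, day) => `${Number(day)}.${Number(month)}.${year}`
);

console.log(dayFirst); // '29/04/2025'
console.log(withoutZeros); // '29.4.2025'
```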
Extracting links from markdown documents.
```javascript
const urlPattern = /https?:\/\/[^\s]+/g;

function extractUrls(markdown) {
  return markdown.match(urlPattern) || [];
}

const text = `
Check out https://example.com and http://test.org
Visit https://github.com/user/repo for more info.
`;

extractUrls(text);
// ['https://example.com', 'http://test.org', 'https://github.com/user/repo']
```
https? means "http or https" (s is optional). [^\s]+ means "one or more non-whitespace characters."
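One caveat: "non-whitespace" also swallows a closing parenthesis from markdown link syntax and any punctuation that ends a sentence. The variant below is a sketch under that assumption, not the original function. It stops at `)` and trims trailing punctuation:

```javascript
// Variant pattern: stop at whitespace or a closing parenthesis.
const urlPattern = /https?:\/\/[^\s)]+/g;

function extractUrls(markdown) {
  const found = markdown.match(urlPattern) || [];
  // Drop punctuation that ends the sentence rather than the URL.
  return found.map((url) => url.replace(/[.,;:!?]+$/, ''));
}

const sample =
  'See [the docs](https://example.com/docs), or visit https://test.org.';

console.log(extractUrls(sample));
// ['https://example.com/docs', 'https://test.org']
```

The trade-off is that URLs which legitimately contain parentheses get cut short, so this suits markdown-style text more than arbitrary input.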
```javascript
function isStrongPassword(password) {
  const minLength = /.{8,}/; // Minimum 8 chars
  const hasUpper = /[A-Z]/; // Has uppercase
  const hasLower = /[a-z]/; // Has lowercase
  const hasNumber = /\d/; // Has digit
  const hasSpecial = /[!@#$%^&*(),.?":{}|<>]/; // Has special char

  return (
    minLength.test(password) &&
    hasUpper.test(password) &&
    hasLower.test(password) &&
    hasNumber.test(password) &&
    hasSpecial.test(password)
  );
}

isStrongPassword('Pass12!'); // false (only 7 chars)
isStrongPassword('Password123!'); // true
isStrongPassword('password123!'); // false (no uppercase)
```
Breaking one complex pattern into multiple simple patterns made readability much better.
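Splitting the rules has a second benefit: each rule can report its own failure. The sketch below is one possible shape; `passwordRules` and `missingRequirements` are illustrative names rather than part of the original code:

```javascript
// Each rule pairs one simple pattern with a human-readable message.
const passwordRules = [
  { pattern: /.{8,}/, message: 'at least 8 characters' },
  { pattern: /[A-Z]/, message: 'an uppercase letter' },
  { pattern: /[a-z]/, message: 'a lowercase letter' },
  { pattern: /\d/, message: 'a digit' },
  { pattern: /[!@#$%^&*(),.?":{}|<>]/, message: 'a special character' },
];

// Returns the messages of every rule the password does not satisfy.
function missingRequirements(password) {
  return passwordRules
    .filter((rule) => !rule.pattern.test(password))
    .map((rule) => rule.message);
}

console.log(missingRequirements('password123!')); // ['an uppercase letter']
console.log(missingRequirements('Password123!')); // []
```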
This was harder. "Look ahead or behind but don't actually consume" patterns.
- `(?=...)`: Positive lookahead (if ... follows)
- `(?!...)`: Negative lookahead (if ... doesn't follow)
- `(?<=...)`: Positive lookbehind (if ... precedes)
- `(?<!...)`: Negative lookbehind (if ... doesn't precede)

For example, to extract only digits after a dollar sign:

```javascript
const pricePattern = /(?<=\$)\d+/g;
'Item costs $100 and $250'.match(pricePattern); // ['100', '250']
```
(?<=\$) means "check if $ precedes but don't include $."
Password with "minimum 8 chars and contains digit" in one pattern:
```javascript
const strongPassword = /^(?=.*\d).{8,}$/;
// (?=.*\d): Check if digit exists somewhere (lookahead)
// .{8,}: Actually match 8 or more chars
```
Honestly, this part still confuses me sometimes. But understanding it as "check condition without moving cursor" helped.
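Because a lookahead does not move the cursor, several of them can be stacked at the same position, each checking one condition. A sketch that extends the idea (the extra rules are my own example):

```javascript
// Three lookaheads all start from position 0; .{8,} does the real matching.
const strongPassword = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$/;

// Negative lookahead: fail if whitespace appears anywhere ahead.
const noWhitespace = /^(?!.*\s).+$/;

console.log(strongPassword.test('Password123')); // true
console.log(strongPassword.test('password123')); // false (no uppercase)
console.log(strongPassword.test('Pass1')); // false (too short)
console.log(noWhitespace.test('hello')); // true
console.log(noWhitespace.test('hello world')); // false
```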
Regex's biggest problem is readability. "Write Once, Read Never" isn't just a joke. Six months later, looking at my own regex, I think "What is this?"
Solutions:
```javascript
// Good: Comments and variable names clarify intent
const EMAIL_PATTERN = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/; // Basic email format validation

// Bad: Complex pattern without explanation
const x = /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$/i;
```
Most regex engines, including JavaScript's, match by backtracking. If the pattern is ambiguous and the input is long, matching time can grow exponentially.

Dangerous pattern:

```javascript
const dangerous = /(a+)+b/;
// Feeding input like 'aaaaaaaaaaaaaaaaaaaaac' to this pattern
// makes the engine try exponentially many ways to split the a's
// before it can conclude that there is no 'b'
```

Nested quantifiers like `(a+)+` are the problem. For each a, the engine must decide whether it belongs to the inner `+` or starts a new repetition of the outer `+`, so the number of combinations roughly doubles with every extra a.
Solutions:
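One mitigation, sketched here as an illustration rather than a complete list, is to flatten the nested quantifier and to cap input length before matching. `safeTest` and `MAX_INPUT_LENGTH` are hypothetical names, and the limit of 1000 is an arbitrary assumption:

```javascript
// (a+)+b and a+b accept exactly the same strings, but the flat version
// gives the engine only one way to divide up a run of a's.
const flattened = /a+b/;

// Hypothetical guard: refuse very long input before running any pattern.
const MAX_INPUT_LENGTH = 1000; // illustrative limit, tune for your data

function safeTest(pattern, input) {
  if (input.length > MAX_INPUT_LENGTH) return false;
  return pattern.test(input);
}

console.log(safeTest(flattened, 'aaab')); // true
console.log(safeTest(flattened, 'a'.repeat(30) + 'c')); // false
console.log(safeTest(flattened, 'a'.repeat(5000) + 'b')); // false (over the cap)
```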
Don't use regex for HTML parsing. Seriously. There's a famous Stack Overflow answer:
"The moment you try to parse HTML with regex, you have two problems."
HTML has nested structure. Regex can't handle nesting properly. Use a DOM parser or library (cheerio, jsdom, etc.).
Other cases where a built-in string method is enough:

- `string.includes()` for a plain substring check
- `startsWith()`, `endsWith()` are faster

Debugging regex is genuinely hard. I use these tools:

- `match()` and `exec()` console logs: Check intermediate results.

```javascript
const pattern = /(\d+)-(\d+)-(\d+)/;
const result = pattern.exec('2025-04-29');
console.log(result);
// [
//   '2025-04-29', // Full match
//   '2025', // First group
//   '04', // Second group
//   '29', // Third group
//   index: 0,
//   input: '2025-04-29',
//   groups: undefined
// ]
```
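The `groups: undefined` entry in that output is filled in once the pattern uses named groups, which can make logged results easier to read. A short sketch with group names of my own choosing:

```javascript
// (?<name>...) captures like a normal group but also records a name.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = datePattern.exec('2025-04-29');

console.log(match.groups.year); // '2025'
console.log(match.groups.month); // '04'
console.log(match.groups.day); // '29'

// exec() returns null when nothing matches, so guard before reading groups.
const noMatch = datePattern.exec('not a date');
console.log(noMatch === null); // true
```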
Regex is powerful but double-edged. Used in the right place, code becomes concise and productivity rises. Overused, it becomes maintenance hell.
I summarized it this way:
At first it looked like alien language, but now it's an essential tool for string processing. Complex patterns still give me headaches, but they're better than 100 lines of if statements.
In the end, regex is a "pattern language." Learn it like learning a new language, invest time, and you can perform magic in the world of strings.