Regular Expression (Regex)

Prologue: What on Earth Is `^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$`?

The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, $, \w, +, ., []... I had absolutely no idea what these symbols meant.

Then someone told me this validates email addresses. I had been trying to validate emails with if statements. "Does it have @? Does it have a dot? No spaces?" That approach. When I actually coded it, the script grew to over 50 lines. And I still missed most edge cases.

Regular Expressions (Regex) are a mini-language for finding and validating string patterns. At first, they look like alien script, but once I understood them, my entire approach to string processing transformed.

The Struggle: "Why Is This So Ugly?"

I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.

I actually encountered this in a company codebase:

const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/;

They said it validates URLs. I looked at it and grumbled, "Who's going to maintain this?" Would I even understand my own code six months later?

So I tried processing strings without regex. I wrote a function that extracts the digits from a phone number using nothing but a for loop and an if statement.

function extractNumbers(phone) {
  let result = '';
  for (let i = 0; i < phone.length; i++) {
    const char = phone[i];
    if (char >= '0' && char <= '9') {
      result += char;
    }
  }
  return result;
}

It works. But... is this really the best way? I felt like I was doing something stupid.

The Aha Moment: "It's All About Patterns"

The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:

2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1

I started doing it by hand with split() and indexOf(): split on spaces, find the word after "from"... and the code kept getting more complex. Then a senior developer walked over and said:

"Just use regex. It's one line."

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);

That was it. Two lines extracted every IP address. At that moment, I understood: regex wasn't ugly, it was "compressed." It was a domain-specific language (DSL) for expressing patterns.

I decided to accept it this way: "Regex is a language for drawing the shape of strings." \d is the picture of a digit, {1,3} means "between 1 and 3," \. is the symbol for a dot. Combine these drawings, and you get the pattern for "IP address."

This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."
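To check the senior developer's snippet end to end, I turned it into a small self-contained script. The three sample log lines above stand in for logContent (in practice it would hold the real log file's contents), and the ?? [] fallback is my own addition, because match() returns null when nothing matches.

```javascript
// The three sample log lines from above, joined into one string
const logContent = [
  '2025-04-29 10:23:45 [INFO] User login from 192.168.1.100',
  '2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55',
  '2025-04-29 10:25:03 [INFO] API call from 172.16.0.1',
].join('\n');

// Same pattern as the one-liner: four groups of 1-3 digits separated by literal dots
const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;

// With the g flag, match() returns every match (or null if there are none)
const ips = logContent.match(ipPattern) ?? [];
console.log(ips); // [ '192.168.1.100', '10.0.0.55', '172.16.0.1' ]

// \d{1,3} only counts digits; it never checks that each part is 0-255
const looksLikeIp = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;
console.log(looksLikeIp.test('999.999.999.999')); // true
```

That last line is the caveat the two-liner glosses over. For pulling addresses out of trusted logs it's fine, but real validation needs a range check on each part.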

Basic Syntax: Learning the Alphabet

I started learning regex systematically as a language. First, the alphabet.

Anchors: Start and End

^: Start of string
$: End of string

These two match a "position" rather than a character. For example, ^Hello means "string starting with Hello," and world$ means "string ending with world."

const startsWithHello = /^Hello/;
startsWithHello.test('Hello world'); // true
startsWithHello.test('Say Hello'); // false

const endsWithWorld = /world$/;
endsWithWorld.test('Hello world'); // true
endsWithWorld.test('world is big'); // false

I thought of this as the first and last pages of a book. ^ is the cover, $ is the back cover. They anchor exactly where the pattern should be.
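Putting ^ and $ together pins a pattern to the whole string, which is usually what validation needs. A quick comparison of the three variants, reusing the Hello example:

```javascript
const anywhere = /Hello/;  // "Hello" may appear anywhere in the string
const atStart = /^Hello/;  // the string must begin with "Hello"
const exactly = /^Hello$/; // "Hello" must be the entire string

for (const input of ['Hello', 'Hello world', 'Say Hello']) {
  console.log(input, '|', anywhere.test(input), atStart.test(input), exactly.test(input));
}
// Hello | true true true
// Hello world | true true false
// Say Hello | true false false
```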

Character Classes: Options

[abc]: One of a, b, c
[a-z]: One lowercase letter from a to z
[0-9]: One digit from 0 to 9
[^abc]: Any character except a, b, c (negation)

Brackets [] mean "one of these."

const vowel = /[aeiou]/;
vowel.test('apple'); // true
vowel.test('sky'); // false (y is not in the class, and 'sky' has no a, e, i, o, or u)

const notDigit = /[^0-9]/;
notDigit.test('123'); // false (all digits)
notDigit.test('12a3'); // true (a is not digit)

I understood this as a vending machine. [abc] means "you can press any button: a, b, or c."
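One thing that tripped me up: ^ means different things depending on where it sits. A small sketch of the three cases:

```javascript
const startsWithDigit = /^[0-9]/; // ^ outside the brackets: start-of-string anchor
const hasNonDigit = /[^0-9]/;     // ^ first inside the brackets: negation
const digitOrCaret = /[0-9^]/;    // ^ anywhere else inside: a literal caret

console.log(startsWithDigit.test('7up'), startsWithDigit.test('up7')); // true false
console.log(hasNonDigit.test('7up'), hasNonDigit.test('777'));         // true false
console.log(digitOrCaret.test('x^y'), digitOrCaret.test('xyz'));       // true false
```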

Shorthand Character Classes: Common Abbreviations

\d: Digit = [0-9]
\w: Word character = [a-zA-Z0-9_]
\s: Whitespace = space, tab, newline, etc.
\D: Not digit = [^0-9]
\W: Not word character
\S: Not whitespace

Each uppercase version matches exactly what its lowercase counterpart does not. Easy to remember.

const hasNumber = /\d/;
hasNumber.test('abc123'); // true

const noSpaces = /^\S+$/;
noSpaces.test('hello'); // true
noSpaces.test('hello world'); // false (has space)
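With \D, the extractNumbers loop from earlier collapses into a single replace() call. This sketch also uses the g flag (described under Flags) so that every non-digit is removed, not just the first one:

```javascript
// Regex version of the extractNumbers loop.
// \D matches any character that is not 0-9; g makes replace() remove all of them.
function extractNumbers(phone) {
  return phone.replace(/\D/g, '');
}

console.log(extractNumbers('010-1234-5678')); // 01012345678
console.log(extractNumbers('(02) 555-0199')); // 025550199
```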

Quantifiers: How Many?

*: Zero or more
+: One or more
?: Zero or one (optional)
{n}: Exactly n times
{n,}: n or more times
{n,m}: Between n and m times

Quantifiers specify how many times the preceding character, class, or group repeats.

const optionalS = /^cats?$/; // exactly "cat" or "cats" (anchored, so extra letters fail)
optionalS.test('cat'); // true
optionalS.test('cats'); // true
optionalS.test('catttt'); // false

const phoneNumber = /^\d{3}-\d{4}-\d{4}$/; // exactly the 010-1234-5678 format
phoneNumber.test('010-1234-5678'); // true
phoneNumber.test('010-123-5678'); // false (wrong digit count)

I understood quantifiers as item counts in a game. "Must have 1 or more swords (+)," "shield is optional (?)."
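To see all six quantifiers side by side, here's a small sketch that applies each one to the letter a and tests it against strings of different lengths. The ^ and $ anchors force the whole string to match, so the counts are exact:

```javascript
// Each quantifier applied to "a", anchored so the whole string must match
const patterns = {
  '*': /^a*$/,
  '+': /^a+$/,
  '?': /^a?$/,
  '{2}': /^a{2}$/,
  '{2,}': /^a{2,}$/,
  '{1,3}': /^a{1,3}$/,
};
const inputs = ['', 'a', 'aa', 'aaaa'];

for (const [quantifier, re] of Object.entries(patterns)) {
  // One true/false per input, in the same order as `inputs`
  console.log(`${quantifier}: ${inputs.map((s) => re.test(s)).join(' ')}`);
}
// *: true true true true
// +: false true true true
// ?: true true false false
// {2}: false false true false
// {2,}: false false true true
// {1,3}: false true true false
```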

Groups and Alternation

(): Grouping
|: OR condition

Parentheses bundle multiple characters into a single unit. Pipe | means "or."

const catOrDog = /cat|dog/;
catOrDog.test('I have a cat'); // true
catOrDog.test('I have a dog'); // true
catOrDog.test('I have a bird'); // false

const repeatingGroup = /^(ha)+$/; // ha, haha, hahaha... and nothing else
repeatingGroup.test('hahaha'); // true
repeatingGroup.test('haa'); // false (the trailing a is not a complete "ha")
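Two details about grouping are easy to miss: without parentheses, a quantifier applies only to the single character before it, and | has the lowest precedence of anything in a pattern. A short sketch of both:

```javascript
// Without a group, + repeats only the single character right before it
const hThenAs = /^ha+$/;   // "h" followed by one or more "a"
const haUnits = /^(ha)+$/; // the two-letter unit "ha", one or more times

console.log(hThenAs.test('haaa'), haUnits.test('haaa'));     // true false
console.log(hThenAs.test('hahaha'), haUnits.test('hahaha')); // false true

// | has the lowest precedence: /^cat|dog$/ means (^cat) OR (dog$)
const loose = /^cat|dog$/;
const strict = /^(cat|dog)$/;
console.log(loose.test('catalog'), strict.test('catalog')); // true false
console.log(loose.test('hotdog'), strict.test('hotdog'));   // true false
```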

Wildcard: Anything

.: Any character except newline

A single dot is the joker card.

const threeChars = /a.c/; // a + anything + c
threeChars.test('abc'); // true
threeChars.test('a9c'); // true
threeChars.test('ac'); // false (no middle character)

Caution: to match a literal dot, escape it with a backslash (\.).
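Forgetting that escape is an easy bug to miss, because the unescaped version still matches the input you expected. A sketch comparing two versions of a simple decimal-number pattern:

```javascript
const looseDecimal = /^\d+.\d+$/;   // unescaped "." matches ANY character
const strictDecimal = /^\d+\.\d+$/; // "\." matches only a literal dot

console.log(looseDecimal.test('3.14'), strictDecimal.test('3.14')); // true true
console.log(looseDecimal.test('3x14'), strictDecimal.test('3x14')); // true false
console.log(looseDecimal.test('314'), strictDecimal.test('314'));   // true false
```

The last line surprised me: '314' passes the loose pattern because . is allowed to consume the 1.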

Flags: Search Options

Regex has options you append after the pattern.

g: Global (find all matches, not just the first)
i: Ignore case (case insensitive)
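m: Multiline (^ and $ match at the start and end of each line)
s: dotAll (. also matches newline characters)
u: Unicode (treats the pattern as Unicode code points and enables \u{...} escapes)
y: Sticky (matches only at the exact position given by lastIndex)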

const findAllDigits = /\d/g;
'a1b2c3'.match(findAllDigits); // ['1', '2', '3'] (finds all)

const caseInsensitive = /hello/i;
caseInsensitive.test('HELLO'); // true
caseInsensitive.test('HeLLo'); // true

I once forgot the g flag and got confused when only one result appeared. When using match(), I almost always need the g flag.
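Two of these flags change what match() returns in ways worth seeing directly: m changes where ^ can match, and leaving off g makes match() stop at the first hit. A small sketch, using a made-up three-line log string:

```javascript
const log = 'INFO start\nERROR disk full\nINFO done';

// Without m, ^ matches only at the very start of the whole string
console.log(log.match(/^ERROR.*/g));  // null
// With m, ^ also matches right after each newline
console.log(log.match(/^ERROR.*/gm)); // [ 'ERROR disk full' ]

// Without g, match() returns only the first match, plus details such as its index
const first = 'a1b2c3'.match(/\d/);
console.log(first[0], first.index);   // 1 1
```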

Real-World Patterns: Actually Using It

Theory alone is useless. I needed to actually use it. Here are patterns I use frequently.

Email Validation

const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

function isValidEmail(email) {
  return emailPattern.test(email);
}

isValidEmail('user@example.com'); // true
isValidEmail('user.name@company.co.kr'); // true
isValidEmail('invalid@'); // false
isValidEmail('@invalid.com'); // false

Breaking it down:

^[\w-\.]+: Start with word chars/hyphens/dots, one or more
@: @ symbol required
([\w-]+\.)+: Domain labels (word characters or hyphens, each followed by a dot), repeated one or more times
[\w-]{2,4}$: A final label of 2 to 4 characters (the top-level domain), then end of string
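To see what the {2,4} and ([\w-]+\.)+ parts actually enforce, I ran the same pattern against a few extra inputs of my own (they are illustrative, not from the examples above). It works as a quick sanity check, but it is not a complete email validator:

```javascript
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

const samples = [
  'user@example.com',        // accepted
  'user@example.technology', // rejected: last label is longer than 4 characters
  'user@localhost',          // rejected: the domain needs at least one dot
  '-@-.--',                  // accepted, though no real mailbox looks like this
];

for (const s of samples) {
  console.log(s, emailPattern.test(s));
}
```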

The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, , \w, +, ., []... I had absolutely no idea what these symbols meant.

The Struggle: "Why Is This So Ugly?"

I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.

I actually encountered this in a company codebase:

const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\�CB0�'\(\)\*\+,;=.]+$/;

They said it validates URLs. I looked at this and grumbled, "Who's going to maintain this?" Won't I fail to understand my own code 6 months later?

So I tried processing strings without regex. I wrote a function to extract numbers from phone numbers using pure for and if loops.

function extractNumbers(phone) {
  let result = '';
  for (let i = 0; i < phone.length; i++) {
    const char = phone[i];
    if (char >= '0' && char <= '9') {
      result += char;
    }
  }
  return result;
}

It works. But... is this really the best way? I felt like I was doing something stupid.

The Aha Moment: "It's All About Patterns"

The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:

2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1

I started the manual labor with split() and indexOf(). Split by spaces, find the word after "from"... the code got increasingly complex. Then a senior developer walked over and said:

"Just use regex. It's one line."

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);

That was it. Two lines extracted all IP addresses. At that moment, I understood. Regex wasn't ugly—it was "compressed." It was a DSL (Domain Specific Language) for expressing patterns.

This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."

Basic Syntax: Learning the Alphabet

I started learning regex systematically as a language. First, the alphabet.

Anchors: Start and End

^: Start of string
## Prologue: ^[\w-.]+@([\w-]+.)+[\w-]## Prologue: What on Earth

The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, , \w, +, ., []... I had absolutely no idea what these symbols meant.

The Struggle: "Why Is This So Ugly?"

I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.

I actually encountered this in a company codebase:

const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\�CB0�'\(\)\*\+,;=.]+$/;

They said it validates URLs. I looked at this and grumbled, "Who's going to maintain this?" Won't I fail to understand my own code 6 months later?

So I tried processing strings without regex. I wrote a function to extract numbers from phone numbers using pure for and if loops.

function extractNumbers(phone) {
  let result = '';
  for (let i = 0; i < phone.length; i++) {
    const char = phone[i];
    if (char >= '0' && char <= '9') {
      result += char;
    }
  }
  return result;
}

It works. But... is this really the best way? I felt like I was doing something stupid.

The Aha Moment: "It's All About Patterns"

The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:

2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1

I started the manual labor with split() and indexOf(). Split by spaces, find the word after "from"... the code got increasingly complex. Then a senior developer walked over and said:

"Just use regex. It's one line."

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);

That was it. Two lines extracted all IP addresses. At that moment, I understood. Regex wasn't ugly—it was "compressed." It was a DSL (Domain Specific Language) for expressing patterns.

This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."

Basic Syntax: Learning the Alphabet

I started learning regex systematically as a language. First, the alphabet.

Anchors: Start and End

^: Start of string
: End of string

These two represent "position." For example, ^Hello means "string starting with Hello," and world## Prologue: ^[\w-.]+@([\w-]+.)+[\w-]## Prologue: What on Earth

The first time I saw this string, I was genuinely confused. I thought it was some kind of joke cipher developers made up. "This is... code?" It looked like a cat walked across the keyboard. ^, , \w, +, ., []... I had absolutely no idea what these symbols meant.

The Struggle: "Why Is This So Ugly?"

I initially rejected regex. It was too ugly. I learned that code should be readable by humans, and regex seemed to violate that principle directly.

I actually encountered this in a company codebase:

const urlPattern = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\�CB0�'\(\)\*\+,;=.]+$/;

They said it validates URLs. I looked at this and grumbled, "Who's going to maintain this?" Won't I fail to understand my own code 6 months later?

So I tried processing strings without regex. I wrote a function to extract numbers from phone numbers using pure for and if loops.

function extractNumbers(phone) {
  let result = '';
  for (let i = 0; i < phone.length; i++) {
    const char = phone[i];
    if (char >= '0' && char <= '9') {
      result += char;
    }
  }
  return result;
}

It works. But... is this really the best way? I felt like I was doing something stupid.

The Aha Moment: "It's All About Patterns"

The turning point came during a log parsing project. I needed to extract IP addresses from server logs. The log format looked like this:

2025-04-29 10:23:45 [INFO] User login from 192.168.1.100
2025-04-29 10:24:12 [ERROR] Failed request from 10.0.0.55
2025-04-29 10:25:03 [INFO] API call from 172.16.0.1

I started the manual labor with split() and indexOf(). Split by spaces, find the word after "from"... the code got increasingly complex. Then a senior developer walked over and said:

"Just use regex. It's one line."

const ipPattern = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g;
const ips = logContent.match(ipPattern);

That was it. Two lines extracted all IP addresses. At that moment, I understood. Regex wasn't ugly—it was "compressed." It was a DSL (Domain Specific Language) for expressing patterns.

This metaphor clicked for me completely. Regex wasn't regular code—it was "pattern sketching."

Basic Syntax: Learning the Alphabet

I started learning regex systematically as a language. First, the alphabet.

Anchors: Start and End

^: Start of string
$: End of string

These two match positions, not characters. For example, ^Hello means "string starting with Hello," and world$ means "string ending with world."

const startsWithHello = /^Hello/;
startsWithHello.test('Hello world'); // true
startsWithHello.test('Say Hello'); // false

const endsWithWorld = /world$/;
endsWithWorld.test('Hello world'); // true
endsWithWorld.test('world is big'); // false

I thought of these as the first and last pages of a book: ^ is the front cover and $ is the back cover. They pin the pattern to exactly where it must appear.
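A small sketch of what happens when you use both anchors at once: the pattern then has to cover the entire string, not just appear somewhere inside it.

```javascript
// No anchors: "Hello" may appear anywhere in the string
const contains = /Hello/;
// Both anchors: the whole string must be exactly "Hello"
const exact = /^Hello$/;

contains.test('Say Hello world'); // true
exact.test('Say Hello world'); // false
exact.test('Hello'); // true
```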

Character Classes: Options

[abc]: One of a, b, c
[a-z]: One lowercase letter from a to z
[0-9]: One digit from 0 to 9
[^abc]: Any character except a, b, c (negation)

Brackets [] mean "one of these."

const vowel = /[aeiou]/;
vowel.test('apple'); // true
vowel.test('sky'); // false (s, k, and y are not in the set)

const notDigit = /[^0-9]/;
notDigit.test('123'); // false (all digits)
notDigit.test('12a3'); // true (a is not digit)

I understood this as a vending machine. [abc] means "you can press any button: a, b, or c."
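A few more bracket rules, sketched with made-up inputs: ranges can be combined in one bracket, a hyphen at the edge is literal, and ^ only negates when it comes first.

```javascript
// Several ranges can share one bracket: any letter, upper- or lowercase
const letter = /[a-zA-Z]/;
letter.test('123'); // false
letter.test('12b'); // true

// A hyphen placed first or last is a literal '-', not a range
const sign = /[+-]/;
sign.test('-5'); // true
sign.test('5'); // false

// ^ only negates when it comes first; elsewhere it is a literal caret
const caretOrA = /[a^]/;
caretOrA.test('2^3'); // true
caretOrA.test('b'); // false
```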

Shorthand Character Classes: Common Abbreviations

\d: Digit = [0-9]
\w: Word character = [a-zA-Z0-9_]
\s: Whitespace = space, tab, newline, etc.
\D: Not digit = [^0-9]
\W: Not word character
\S: Not whitespace

Uppercase is the opposite of lowercase. Easy to remember.

const hasNumber = /\d/;
hasNumber.test('abc123'); // true

const noSpaces = /^\S+$/;
noSpaces.test('hello'); // true
noSpaces.test('hello world'); // false (has space)
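Two everyday uses worth knowing, shown as a sketch: \s+ makes a forgiving word splitter, and \w is ASCII-only in JavaScript, so accented letters don't count as word characters.

```javascript
// \s+ treats any run of spaces, tabs, or newlines as one separator
const words = 'hello   world\tfoo\nbar'.split(/\s+/);
// ['hello', 'world', 'foo', 'bar']

// \w covers only [a-zA-Z0-9_]: an accented letter like é is not included
/^\w+$/.test('cafe'); // true
/^\w+$/.test('café'); // false
```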

Quantifiers: How Many?

*: Zero or more
+: One or more
?: Zero or one (optional)
{n}: Exactly n times
{n,}: n or more times
{n,m}: Between n and m times

Quantifiers specify how many times the preceding pattern repeats.

const optionalS = /^cats?$/; // exactly "cat" or "cats"
optionalS.test('cat'); // true
optionalS.test('cats'); // true
optionalS.test('catttt'); // false (the anchors rule out the extra t's)

const phoneNumber = /\d{3}-\d{4}-\d{4}/; // 010-1234-5678 format
phoneNumber.test('010-1234-5678'); // true
phoneNumber.test('010-123-5678'); // false (wrong digit count)

I understood quantifiers as item counts in a game. "Must have 1 or more swords (+)," "shield is optional (?)."
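The difference between * and +, and what {n,} does, is easiest to see on edge cases. This sketch anchors every pattern so the quantifier has to account for the whole string.

```javascript
// + needs at least one digit; * also accepts none at all
/^\d+$/.test(''); // false
/^\d*$/.test(''); // true

// {n,} sets only a minimum: two or more digits
const atLeastTwo = /^\d{2,}$/;
atLeastTwo.test('7'); // false
atLeastTwo.test('12345'); // true

// A quantifier applies only to the single token right before it
/^ab+$/.test('abbb'); // true (b repeats)
/^ab+$/.test('abab'); // false (ab as a whole does not repeat)
```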

Groups and Alternation

(): Grouping
|: OR condition

Parentheses bundle multiple characters into a single unit. Pipe | means "or."

const catOrDog = /cat|dog/;
catOrDog.test('I have a cat'); // true
catOrDog.test('I have a dog'); // true
catOrDog.test('I have a bird'); // false

const repeatingGroup = /^(ha)+$/; // ha, haha, hahaha...
repeatingGroup.test('hahaha'); // true
repeatingGroup.test('haa'); // false (the trailing a is not a full "ha")
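One trap when combining | with anchors: | has the lowest precedence, so it splits the whole pattern unless parentheses contain it. A short sketch:

```javascript
// | splits the entire pattern, so this reads as (^cat) OR (dog$)
const loose = /^cat|dog$/;
// Parentheses keep the alternation between the anchors
const strict = /^(cat|dog)$/;

loose.test('catalog'); // true (it starts with "cat")
strict.test('catalog'); // false
strict.test('dog'); // true

// A capturing group also remembers what it matched
'I have a dog'.match(/(cat|dog)/)[1]; // 'dog'
```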

Wildcard: Anything

.: Any character except newline

A single dot is the joker card.

const threeChars = /a.c/; // a + anything + c
threeChars.test('abc'); // true
threeChars.test('a9c'); // true
threeChars.test('ac'); // false (no middle character)

Caution: to match a literal dot, escape it with a backslash: \.
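A quick sketch of why the escape matters, using a version-number-style string as an example:

```javascript
const anyMiddle = /^1.5$/; // . matches any single character
const literalDot = /^1\.5$/; // \. matches only an actual dot

anyMiddle.test('1x5'); // true
literalDot.test('1x5'); // false
literalDot.test('1.5'); // true
```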

Flags: Search Options

Regex has options you append after the pattern.

g: Global (find all matches, not just the first)
i: Ignore case (case insensitive)
m: Multiline (^ and $ apply to each line)
s: Dotall (. includes newline)
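u: Unicode (treat the pattern and input as Unicode code points)
y: Sticky (match only at the position stored in lastIndex)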

const findAllDigits = /\d/g;
'a1b2c3'.match(findAllDigits); // ['1', '2', '3'] (finds all)

const caseInsensitive = /hello/i;
caseInsensitive.test('HELLO'); // true
caseInsensitive.test('HeLLo'); // true

I once forgot the g flag and got confused when only one result appeared. When using match(), I almost always need the g flag.
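The g flag has a flip side worth knowing: with test(), a g regex is stateful because it resumes from lastIndex. This sketch also shows the m flag letting ^ and $ match at each line.

```javascript
// With g, test() remembers where the last match ended (lastIndex)
const digitG = /\d/g;
digitG.test('a1'); // true  (lastIndex is now 2)
digitG.test('a1'); // false (search starts at index 2; lastIndex resets to 0)
digitG.test('a1'); // true again

// m makes ^ and $ match at the start and end of every line
'12\nab\n34'.match(/^\d+$/gm); // ['12', '34']
```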

Real-World Patterns: Actually Using It

Theory alone is useless. I needed to actually use it. Here are patterns I use frequently.

Email Validation

const emailPattern = /^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/;

function isValidEmail(email) {
  return emailPattern.test(email);
}

isValidEmail('user@example.com'); // true
isValidEmail('user.name@company.co.kr'); // true
isValidEmail('invalid@'); // false
isValidEmail('@invalid.com'); // false

Breaking it down:

^[\w.-]+: Starts with one or more word characters, dots, or hyphens
@: The @ symbol is required
([\w-]+\.)+: Domain labels (word characters/hyphens followed by a dot), repeated one or more times
[\w-]{2,4}$: The final domain extension, 2 to 4 characters, then the end of the string

It's not perfect. Fully satisfying the RFC 5322 spec would be extremely complex. But in practice, this was sufficient.
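To make "not perfect" concrete, here is a sketch of two cases this pattern gets wrong in opposite directions. The addresses are made-up examples.

```javascript
const emailPattern = /^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/;

// Rejected: the {2,4} limit excludes longer top-level domains
emailPattern.test('dev@startup.technology'); // false

// Accepted: consecutive dots in the local part, which RFC 5322
// does not allow in an unquoted address
emailPattern.test('a..b@example.com'); // true
```

For sign-up forms, a loose check like this plus a confirmation email is usually more practical than chasing the full spec.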

Phone Number Formatting

const phonePattern = /\d{2,3}-\d{3,4}-\d{4}/;

function formatPhone(phone) {
  // Extract digits only
  const numbers = phone.replace(/\D/g, '');

  // 11 digits (mobile): 010-1234-5678
  if (numbers.length === 11) {
    return numbers.replace(/(\d{3})(\d{4})(\d{4})/, '$1-$2-$3');
  }

  // 10 digits: Seoul's two-digit area code (02) splits 2-4-4, others 3-3-4
  if (numbers.length === 10) {
    if (numbers.startsWith('02')) {
      return numbers.replace(/(\d{2})(\d{4})(\d{4})/, '$1-$2-$3');
    }
    return numbers.replace(/(\d{3})(\d{3})(\d{4})/, '$1-$2-$3');
  }

  return phone; // Return original if format doesn't match
}

formatPhone('01012345678'); // '010-1234-5678'
formatPhone('010-1234-5678'); // '010-1234-5678'
formatPhone('02-1234-5678'); // '02-1234-5678' (Seoul area code)

In replace(), $1, $2, $3 reference groups captured by parentheses. Once I learned this, string transformation became incredibly easy.
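The same trick works for any reshuffling. A sketch that reorders an ISO date, plus $&, which inserts the entire match:

```javascript
// Swap the parts of an ISO date: $1 = year, $2 = month, $3 = day
const swapped = '2025-04-29'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');
// '29/04/2025'

// $& inserts the whole match, handy for wrapping it
const wrapped = 'Total: 42'.replace(/\d+/, '[$&]');
// 'Total: [42]'
```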

URL Extraction

Extracting links from markdown documents.

const urlPattern = /https?:\/\/[^\s]+/g;

function extractUrls(markdown) {
  return markdown.match(urlPattern) || [];
}

const text = `
Check out https://example.com and http://test.org
Visit https://github.com/user/repo for more info.
`;

extractUrls(text);
// ['https://example.com', 'http://test.org', 'https://github.com/user/repo']

https? means "http or https" (s is optional). [^\s]+ means "one or more non-whitespace characters."
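One catch with [^\s]+: punctuation that ends the sentence, or a closing parenthesis around the link, gets swept into the match. A minimal sketch of a cleanup step, assuming trailing . , ! ? ) are never part of the URLs you care about:

```javascript
const urlPattern = /https?:\/\/[^\s]+/g;
const text = 'Read https://example.com/docs. Also see (https://test.org)';

const raw = text.match(urlPattern) || [];
// ['https://example.com/docs.', 'https://test.org)']

// Hypothetical cleanup: strip trailing punctuation from each match
const cleaned = raw.map((url) => url.replace(/[.,!?)]+$/, ''));
// ['https://example.com/docs', 'https://test.org']
```

The trade-off: this also strips a closing parenthesis that genuinely belongs to a URL, as in some Wikipedia links.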

Password Strength Check

function isStrongPassword(password) {
  const minLength = /.{8,}/; // Minimum 8 chars
  const hasUpper = /[A-Z]/; // Has uppercase
  const hasLower = /[a-z]/; // Has lowercase
  const hasNumber = /\d/; // Has digit
  const hasSpecial = /[!@#$%^&*(),.?":{}|<>]/; // Has special char

  return (
    minLength.test(password) &&
    hasUpper.test(password) &&
    hasLower.test(password) &&
    hasNumber.test(password) &&
    hasSpecial.test(password)
  );
}

isStrongPassword('Pass12!'); // false (only 7 chars)
isStrongPassword('Password123!'); // true
isStrongPassword('password123!'); // false (no uppercase)

Breaking one complex pattern into multiple simple patterns made readability much better.

Advanced Techniques: Lookahead and Lookbehind

This was harder: patterns that look ahead of or behind the current position without actually consuming any characters.

(?=...): Positive lookahead (if ... follows)
(?!...): Negative lookahead (if ... doesn't follow)
(?<=...): Positive lookbehind (if ... precedes)
(?<!...): Negative lookbehind (if ... doesn't precede)

For example, to extract only digits after a dollar sign:

const pricePattern = /(?<=\$)\d+/g;

'Item costs $100 and $250'.match(pricePattern); // ['100', '250']

(?<=\$) means "check if $ precedes but don't include $."
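The other three lookarounds follow the same idea. A sketch with made-up inputs; the \b in the second pattern stops the engine from matching the tail of $100 (the "00") once the lookbehind rejects the leading 1.

```javascript
// Positive lookahead: digits followed by "px", without including "px"
const pixels = 'width: 100px; height: 50%'.match(/\d+(?=px)/g);
// ['100']

// Negative lookbehind: whole numbers NOT preceded by $
const nonPrices = 'Item costs $100 and ships in 3 days'.match(/(?<!\$)\b\d+\b/g);
// ['3']
```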

Password with "minimum 8 chars and contains digit" in one pattern:

const strongPassword = /^(?=.*\d).{8,}$/;
// (?=.*\d): Check if digit exists somewhere (lookahead)
// .{8,}: Actually match 8 or more chars

Honestly, this part still confuses me sometimes. But understanding it as "check condition without moving cursor" helped.
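With that mental model, the five separate checks from isStrongPassword can be folded into one pattern. This is a sketch that reuses the same special-character set. I still find the split version easier to read.

```javascript
// Each (?=...) checks one rule from the start without moving the cursor;
// .{8,} then does the actual matching and enforces the length
const strongPassword =
  /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*(),.?":{}|<>]).{8,}$/;

strongPassword.test('Pass12!'); // false (7 characters)
strongPassword.test('Password123!'); // true
strongPassword.test('password123!'); // false (no uppercase)
```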

Pitfalls and Warnings

The Curse of Readability

Regex's biggest problem is readability. "Write Once, Read Never" isn't just a joke. Six months later, looking at my own regex, I think "What is this?"

Solutions:

Add comments. State what the pattern finds.
If complex, split into multiple patterns.
Write test cases.

// Good: Comments and variable names clarify intent
const EMAIL_PATTERN = /^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/; // Basic email format validation

// Bad: Complex pattern without explanation
const x = /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$/i;

ReDoS (Regular Expression Denial of Service)

Most regex engines, including JavaScript's, match by backtracking. If the pattern is poorly structured and the input is long, matching time can grow exponentially.

Dangerous pattern:

const dangerous = /(a+)+b/;

// Feeding input like 'aaaaaaaaaaaaaaaaaaaaac' to this pattern
// forces the engine to try exponentially many ways of splitting the a's before it gives up on finding 'b'

Nested quantifiers like (a+)+ are the problem. For each a, the engine must decide "include in first +? or second +?" causing combinatorial explosion.

Solutions:

Avoid nested quantifiers
Use simpler patterns when possible
Limit input length
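Applying the first and last points to the example above gives this sketch. /a+b/ finds a match in exactly the same strings as /(a+)+b/ but has no nested quantifier to explode on. MAX_INPUT_LENGTH and testWithLimit are hypothetical names; pick a limit that fits your own inputs.

```javascript
// Same matches as /(a+)+b/, without the nested quantifier
const safe = /a+b/;
const input = 'a'.repeat(30) + 'c';
safe.test(input); // false, and it returns almost immediately

// Hypothetical guard: refuse oversized input before running a pattern
const MAX_INPUT_LENGTH = 1000;
function testWithLimit(pattern, str) {
  if (str.length > MAX_INPUT_LENGTH) {
    throw new RangeError(`Input longer than ${MAX_INPUT_LENGTH} characters`);
  }
  return pattern.test(str);
}

testWithLimit(safe, 'aaab'); // true
```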

When NOT to Use Regex

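Don't use regex for HTML parsing. Seriously. There's a famous Stack Overflow answer that opens with "You can't parse [X]HTML with regex," and an even older quip from Jamie Zawinski sums up the mood:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."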

HTML can be nested arbitrarily deep, and regex can't reliably match arbitrary nesting. Use a DOM parser or a library (cheerio, jsdom, etc.).

Other cases:

Simple string comparison: Use string.includes()
Fixed prefixes or suffixes: startsWith() and endsWith() are simpler and usually faster
Complex business logic: Regular functions are better for maintenance
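A small side-by-side sketch with a made-up filename: both versions give the same answers, but the string methods need no escaping and say what they mean.

```javascript
const filename = 'report-2025.pdf';

// Plain string methods state the intent directly
filename.endsWith('.pdf'); // true
filename.includes('2025'); // true

// The regex equivalents work too, but need escaping and are harder to scan
/\.pdf$/.test(filename); // true
/2025/.test(filename); // true
```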

Debugging Tools

Debugging regex is genuinely hard. I use these tools:

regex101.com: Enter a pattern and it explains step-by-step. Highlights matching parts. Essential tool.
regexr.com: Visually test patterns.
JavaScript match() and exec() console logs: Check intermediate results.

const pattern = /(\d+)-(\d+)-(\d+)/;
const result = pattern.exec('2025-04-29');
console.log(result);
// [
//   '2025-04-29',  // Full match
//   '2025',        // First group
//   '04',          // Second group
//   '29',          // Third group
//   index: 0,
//   input: '2025-04-29',
//   groups: undefined
// ]
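When a pattern can match several times, matchAll() (which requires the g flag) gives you one exec()-style result per match, including index and groups. A sketch with a made-up sentence:

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/g;
const text = 'Released 2025-04-29, patched 2025-05-02';

// Collect the position, full match, and each group for every match
const found = [...text.matchAll(datePattern)].map((m) => ({
  index: m.index,
  full: m[0],
  year: m[1],
  month: m[2],
  day: m[3],
}));
// [
//   { index: 9, full: '2025-04-29', year: '2025', month: '04', day: '29' },
//   { index: 29, full: '2025-05-02', year: '2025', month: '05', day: '02' },
// ]
```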

Wrapping Up

Regex is powerful but double-edged. Used in the right place, code becomes concise and productivity rises. Overused, it becomes maintenance hell.

I summarized it this way:

Simple pattern matching: Regex is best
Complex validation: Split into multiple regex or use regular logic
Parsing: Use dedicated libraries
Always comment and test: For future me

At first it looked like alien language, but now it's an essential tool for string processing. Complex patterns still give me headaches, but they're better than 100 lines of if statements.

In the end, regex is a "pattern language." Learn it like learning a new language, invest time, and you can perform magic in the world of strings.
