Regular Expression Guide

A regular expression (regex) is a sequence of characters that forms a pattern used to search for character combinations in string values.

This article serves as a guide on how to write regex for some Strato features that may need regex values.

This article is not an exhaustive list of regular expression syntax. For an exhaustive guide, consult the official Java 8 regex syntax.

To test a pattern before using it in your Strato instance, use a site such as regex101 and set the flavor to Java 8.


Tokens and Characters

  • To get a specific character, type the exact character.
  • To get only a specific range of characters, type the range in square brackets [].
    • Examples:
      • [a-z] targets lowercase characters.
      • [A-Z] targets uppercase characters.
      • [a-zA-Z] targets both lowercase and uppercase characters.
      • [0-9] targets digits from 0 to 9.
  • There are specific shorthands that target certain character classes:
    • \d targets digits from 0 to 9. Equivalent to [0-9].
    • \w targets word characters. This includes lowercase and uppercase characters, digits, and underscores. Equivalent to [a-zA-Z0-9_].
    • \s targets whitespace characters. This includes tabs, spaces, and new line characters.
    • \D is the opposite of \d and targets anything that is not a digit.
    • \W is the opposite of \w and targets anything that is not a word character.
    • \S is the opposite of \s and targets anything that is not a whitespace character.
  • A period . targets any character except line breaks.
  • To get a special character as a literal character, type a backslash \ before it.
    • Examples:
      • \. targets a period.
      • \\ targets a backslash.
      • \( targets an open parenthesis.
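The tokens above can be tried out in plain Java, which uses the same regex flavor. The following is a minimal sketch, not Strato code: the class name TokenExamples and the helper matchesAll are made up for illustration. Note that inside Java source code every regex backslash must be doubled (\d is written "\\d"); when typing a pattern into a regex field, a single backslash is normally what you want.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class TokenExamples {
    // Returns true when the whole input matches the pattern.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("[a-z]", "q"));    // true: lowercase character
        System.out.println(matchesAll("[a-z]", "Q"));    // false: uppercase character
        System.out.println(matchesAll("[a-zA-Z]", "Q")); // true
        System.out.println(matchesAll("\\d", "7"));      // true: digit
        System.out.println(matchesAll("\\D", "7"));      // false: \D rejects digits
        System.out.println(matchesAll("\\w", "_"));      // true: underscore is a word character
        System.out.println(matchesAll("\\s", "\t"));     // true: tab is whitespace
        System.out.println(matchesAll(".", "x"));        // true: any character
        System.out.println(matchesAll(".", "\n"));       // false: line break is excluded
        System.out.println(matchesAll("\\.", "."));      // true: literal period
        System.out.println(matchesAll("\\.", "x"));      // false
        System.out.println(matchesAll("\\\\", "\\"));    // true: literal backslash
        System.out.println(matchesAll("\\(", "("));      // true: literal open parenthesis
    }
}
```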


Quantifiers

Quantifiers determine how many times the character or token before them should be matched.

Specific

You can specify numbers in a quantifier using curly brackets {}.

  • To get an exact number of characters, type a number inside the curly brackets.
    • Example: \D{5} targets five non-digit characters.
  • To get between X and Y characters, type two numbers inside the curly brackets separated by a comma.
    • Example: \w{5,9} targets five to nine word characters.
  • To get X or more characters, type a number and a comma inside the curly brackets.
    • Example: \d{8,} targets eight or more digits.
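As a rough check of the three curly-bracket forms, the sketch below runs the example patterns against a few sample values in plain Java. The class name QuantifierExamples, the helper matchesAll, and the sample inputs are assumptions chosen for illustration. Backslashes are doubled only because the patterns sit inside Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample values are illustrative only.
class QuantifierExamples {
    // Returns true when the whole input matches the pattern.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Exactly five non-digit characters.
        System.out.println(matchesAll("\\D{5}", "Sales"));        // true
        System.out.println(matchesAll("\\D{5}", "Sale"));         // false: only four

        // Five to nine word characters.
        System.out.println(matchesAll("\\w{5,9}", "Malaysia"));   // true: eight characters
        System.out.println(matchesAll("\\w{5,9}", "Asia"));       // false: only four

        // Eight or more digits.
        System.out.println(matchesAll("\\d{8,}", "1234567890"));  // true: ten digits
        System.out.println(matchesAll("\\d{8,}", "1234567"));     // false: only seven
    }
}
```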


Greedy

Greedy quantifiers will select as many characters as possible, and output the longest match.

  • To get one or more characters, type a plus + after the character.
  • To get zero or more characters, type an asterisk * after the character.
  • To get zero or one character, type a question mark ? after the character.

For example, in the string Sales Representative (Malaysia) (Asia), the regex pattern \(.+\) will output (Malaysia) (Asia).
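The greedy example above can be reproduced in plain Java with the sketch below. The class name GreedyExample and the helper findAll are hypothetical; the helper simply collects every match the pattern finds in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyExample {
    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String title = "Sales Representative (Malaysia) (Asia)";
        // The greedy .+ runs to the last closing parenthesis, so there is one long match.
        System.out.println(findAll("\\(.+\\)", title)); // prints [(Malaysia) (Asia)]
    }
}
```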


Lazy

Lazy quantifiers will select as few characters as possible, and output the shortest match. To make a quantifier lazy, add a question mark ? after any of the specific or greedy quantifiers.

For example, in the string Sales Representative (Malaysia) (Asia), the regex pattern \(.+?\) will output (Malaysia) and (Asia) separately.
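The lazy behavior can be sketched in plain Java as follows. The class name LazyExample, the helper findAll, and the sample value 12345 are assumptions for illustration. The second pair of calls shows the same effect on a specific quantifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample values are illustrative only.
class LazyExample {
    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String title = "Sales Representative (Malaysia) (Asia)";
        // The lazy .+? stops at the first closing parenthesis, giving two short matches.
        System.out.println(findAll("\\(.+?\\)", title)); // prints [(Malaysia), (Asia)]

        // Greedy {2,4} takes four digits; lazy {2,4}? takes only two at a time.
        System.out.println(findAll("\\d{2,4}", "12345"));  // prints [1234]
        System.out.println(findAll("\\d{2,4}?", "12345")); // prints [12, 34]
    }
}
```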


Groups and Anchors

  • To create an extraction or capturing group, type your regex pattern inside parentheses (). This will output only what is inside the parentheses.
    • Example: In the string Sales Representative (Asia), the regex pattern (.*?)\s*\( will output Sales Representative.


  • To set the position of your search at the beginning of the string, type a caret ^ at the beginning of your regex pattern.
    • Example: In the string Sales Representative, the regex pattern ^(.+?)\s will output Sales.

  • To set the position of your search at the end of the string, type a dollar sign $ at the end of your regex pattern.
    • Example: In the string Sales Representative, the regex pattern \s(.+?)$ will output Representative.
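The three examples above can be checked in plain Java with the sketch below. It assumes that the value of interest is the first capturing group of the first match; how Strato itself applies the pattern is not covered here. The class name GroupAndAnchorExamples and the helper firstGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupAndAnchorExamples {
    // Returns the text captured by the first group of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Capturing group: everything before the optional whitespace and the open parenthesis.
        System.out.println(firstGroup("(.*?)\\s*\\(", "Sales Representative (Asia)")); // prints Sales Representative

        // Caret: start at the beginning of the string and capture up to the first whitespace.
        System.out.println(firstGroup("^(.+?)\\s", "Sales Representative")); // prints Sales

        // Dollar sign: capture from after the first whitespace through the end of the string.
        System.out.println(firstGroup("\\s(.+?)$", "Sales Representative")); // prints Representative
    }
}
```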