# Regular Expressions and Finite Automata (refa)
A library for regular expressions (RE) and finite automata (FA) in the context of [JavaScript RegExp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp).
## About
refa is a general library for [DFA](https://en.wikipedia.org/wiki/Deterministic_finite_automaton), [NFA](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton), and REs of [formal regular languages](https://en.wikipedia.org/wiki/Induction_of_regular_languages). It also includes methods to easily convert from JS RegExp to the internal RE AST and vice versa.
## Installation
Get [refa from NPM](https://www.npmjs.com/package/refa):
```
npm i --save refa
```
or
```
yarn add refa
```
## Features
- Conversions
    * RE AST to NFA and ENFA (_assertions are not implemented yet_)
    * DFA, NFA, and ENFA can all be converted into each other
    * DFA, NFA, and ENFA to RE AST
- DFA, NFA, and ENFA operations
    * Construction from other FA, the intersection of two FA, or a finite set of words
    * Print as [DOT](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) or [Mermaid](https://mermaid.js.org/)
    * Test whether a word is accepted
    * Test whether the accepted language is the empty set/a finite set
    * Accept all prefixes/suffixes of a language
- DFA specific operations
    * Minimization
    * Complement
    * Structural equality
- NFA and ENFA specific operations
    * Union and Concatenation with other FA
    * Quantification
    * Reverse
- AST transformations
    * Simplify and change the AST of a regex
    * Remove assertions
- JavaScript RegExp
    * RegExp to RE AST and RE AST to RegExp
        * All flags are fully supported
        * Unicode properties
        * Change flags
        * Limited support for simple backreferences
See the [API documentation](https://rundevelopment.github.io/refa/docs/latest/) for a complete list of all currently implemented operations.
### RE AST format
refa uses its own AST format to represent regular expressions. The RE AST format is language agnostic and relatively simple.
It supports:
- Concatenation (e.g. `ab`)
- Alternation (e.g. `a|b`)
- Quantifiers (greedy and lazy) (e.g. `a{4,6}`, `a{2,}?`, `a?`, `a*`)
- Assertions (e.g. `(?=a)`, `(?<!a)`)

```ts
// ...
// => false
console.log(nfa.test(Words.fromStringToUTF16("123")));
// => true
console.log(nfa.test(Words.fromStringToUTF16("abc123")));
// => true
console.log(nfa.test(Words.fromStringToUTF16("123abc")));
// => false
```
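The calls above pass strings through `Words.fromStringToUTF16` because automata operate on sequences of character numbers rather than on strings. The following is a minimal sketch of that idea in plain TypeScript, assuming a word is simply an array of numbers. The names `Word`, `toUTF16Word`, and `toCodePointWord` are hypothetical and only illustrate the concept; they are not refa functions.

```ts
// Assumption of this sketch: a word is a plain array of character numbers.
type Word = number[];

// One entry per UTF-16 code unit.
function toUTF16Word(s: string): Word {
    const word: Word = [];
    for (let i = 0; i < s.length; i++) {
        word.push(s.charCodeAt(i));
    }
    return word;
}

// One entry per Unicode code point (a surrogate pair becomes one number).
function toCodePointWord(s: string): Word {
    const word: Word = [];
    for (let i = 0; i < s.length; ) {
        const cp = s.codePointAt(i)!;
        word.push(cp);
        i += cp > 0xffff ? 2 : 1;
    }
    return word;
}

console.log(toUTF16Word("a1"));
// => [ 97, 49 ]

// U+1F600 is stored as two UTF-16 code units but is a single code point.
const smiley = "\uD83D\uDE00";
console.log(toUTF16Word(smiley).length, toCodePointWord(smiley).length);
// => 2 1
```

Which representation is appropriate depends on the maximum character the automaton was built for, so the word encoding and the automaton have to agree.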
### Finding the intersection of two JS RegExps
```ts
const regex1 = /a+B+c+/i;
const regex2 = /Ab*C\d?/;
const intersection = NFA.fromIntersection(toNFA(regex1), toNFA(regex2));
console.log(toRegExp(intersection));
// => /Ab+C/
```
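To give an intuition for what an intersection of two automata involves, here is a self-contained sketch of the classic product construction on two tiny DFAs. It is not refa's implementation: the `SimpleDFA` type and the `accepts` and `intersect` functions are hypothetical names made up for this illustration, and characters are plain strings to keep it short.

```ts
// Hypothetical minimal DFA: transitions[state] maps a character to the next state.
interface SimpleDFA {
    initial: number;
    finals: Set<number>;
    transitions: Map<string, number>[];
}

function accepts(dfa: SimpleDFA, word: string): boolean {
    let state = dfa.initial;
    for (let i = 0; i < word.length; i++) {
        const next = dfa.transitions[state].get(word[i]);
        if (next === undefined) return false; // missing transition: reject
        state = next;
    }
    return dfa.finals.has(state);
}

// Product construction: each result state stands for a pair (state of a, state of b).
function intersect(a: SimpleDFA, b: SimpleDFA): SimpleDFA {
    const ids = new Map<string, number>();
    const result: SimpleDFA = { initial: 0, finals: new Set<number>(), transitions: [] };
    const queue: [number, number][] = [];

    const getId = (sa: number, sb: number): number => {
        const key = sa + "," + sb;
        let id = ids.get(key);
        if (id === undefined) {
            id = result.transitions.length;
            ids.set(key, id);
            result.transitions.push(new Map<string, number>());
            // A pair is final only if both components are final.
            if (a.finals.has(sa) && b.finals.has(sb)) result.finals.add(id);
            queue.push([sa, sb]);
        }
        return id;
    };

    result.initial = getId(a.initial, b.initial);
    while (queue.length > 0) {
        const [sa, sb] = queue.shift()!;
        const from = getId(sa, sb);
        a.transitions[sa].forEach((nextA, ch) => {
            const nextB = b.transitions[sb].get(ch);
            // Keep a transition only if both automata can take it.
            if (nextB !== undefined) {
                result.transitions[from].set(ch, getId(nextA, nextB));
            }
        });
    }
    return result;
}

// a+b*
const aPlusBStar: SimpleDFA = {
    initial: 0,
    finals: new Set<number>([1, 2]),
    transitions: [
        new Map<string, number>([["a", 1]]),
        new Map<string, number>([["a", 1], ["b", 2]]),
        new Map<string, number>([["b", 2]]),
    ],
};
// ab+
const aBPlus: SimpleDFA = {
    initial: 0,
    finals: new Set<number>([2]),
    transitions: [
        new Map<string, number>([["a", 1]]),
        new Map<string, number>([["b", 2]]),
        new Map<string, number>([["b", 2]]),
    ],
};

const both = intersect(aPlusBStar, aBPlus);
console.log(accepts(both, "abb"), accepts(both, "a"), accepts(both, "aab"));
// => true false false
```

Only pairs reachable from the initial pair are created, which keeps the result smaller than the full cross product of both state sets.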
### Finding the complement of a JS RegExp
```ts
const regex = /a+b*/i;
const dfa = toDFA(regex);
dfa.complement();
console.log(toRegExp(dfa));
// => /(?:(?:[^A]|A+(?:[^AB]|B+[^B]))[^]*)?/i
```
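The textbook way to complement a DFA is to make it complete (every state has a transition for every character) and then swap final and non-final states. The sketch below shows this on a tiny DFA for `a+b*` over the alphabet `{a, b}`. It is an illustration under stated assumptions, not refa's implementation: `SimpleDFA`, `accepts`, and `complement` are hypothetical names, and the alphabet is passed explicitly.

```ts
// Hypothetical minimal DFA over an explicit alphabet.
interface SimpleDFA {
    alphabet: string[];
    initial: number;
    finals: Set<number>;
    transitions: Map<string, number>[];
}

function accepts(dfa: SimpleDFA, word: string): boolean {
    let state = dfa.initial;
    for (let i = 0; i < word.length; i++) {
        const next = dfa.transitions[state].get(word[i]);
        if (next === undefined) return false; // character outside the alphabet
        state = next;
    }
    return dfa.finals.has(state);
}

function complement(dfa: SimpleDFA): SimpleDFA {
    // Copy the transitions so the input is left untouched.
    const transitions = dfa.transitions.map(t => {
        const copy = new Map<string, number>();
        t.forEach((to, ch) => copy.set(ch, to));
        return copy;
    });

    // Step 1: complete the DFA by routing all missing transitions to a trap state.
    const trap = transitions.length;
    transitions.push(new Map<string, number>());
    transitions.forEach(t => {
        dfa.alphabet.forEach(ch => {
            if (!t.has(ch)) t.set(ch, trap);
        });
    });

    // Step 2: swap final and non-final states.
    const finals = new Set<number>();
    for (let s = 0; s < transitions.length; s++) {
        if (!dfa.finals.has(s)) finals.add(s);
    }
    return { alphabet: dfa.alphabet, initial: dfa.initial, finals, transitions };
}

// a+b* over {a, b}
const aPlusBStar: SimpleDFA = {
    alphabet: ["a", "b"],
    initial: 0,
    finals: new Set<number>([1, 2]),
    transitions: [
        new Map<string, number>([["a", 1]]),
        new Map<string, number>([["a", 1], ["b", 2]]),
        new Map<string, number>([["b", 2]]),
    ],
};

const not = complement(aPlusBStar);
console.log(accepts(not, ""), accepts(not, "b"), accepts(not, "aba"));
// => true true true
console.log(accepts(not, "a"), accepts(not, "aab"));
// => false false
```

Skipping the completion step would give a wrong result: words that the original DFA rejects by running into a missing transition would still be rejected after swapping the final states.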
### Converting a JS RegExp to an NFA
In the above examples, we have been using the `toNFA` helper function to parse and convert RegExps. This function assumes that the given RegExp is a pure regular expression without assertions and backreferences and will throw an error if the assumption is not met.
However, the JS parser and `NFA.fromRegex` provide some options to work around and even solve this problem.
#### Backreferences
Firstly, the parser will automatically resolve simple backreferences. Even `toNFA` will do this since it's on by default:
```ts
console.log(toRegExp(toNFA(/("|').*?\1/)));
// => /".*"|'.*'/i
```
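A backreference like this one can be resolved because the capturing group can only ever match one of finitely many words (`"` or `'`), so `\1` can be replaced by the concrete word in each alternative. The following sketch uses only native `RegExp` (no refa API) to check on a few samples that the original and the resolved pattern accept the same whole strings. The sample list is an arbitrary choice for illustration, and agreement on samples is of course not a proof of equivalence.

```ts
// Anchored so that both patterns must match the whole string.
const withBackreference = /^(?:("|').*?\1)$/;
const resolved = /^(?:".*"|'.*')$/;

const samples = ['"abc"', "'abc'", "\"abc'", "'it\"s'", '""', '"', "abc", '"a"b"', "'a\""];

// Collect every sample on which the two patterns disagree.
const disagreements = samples.filter(s => withBackreference.test(s) !== resolved.test(s));

console.log(disagreements.length);
// => 0
```

Laziness (`.*?` versus `.*`) does not affect which whole strings are accepted, which is why the resolved pattern can drop it.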
But it will throw an error for non-trivial backreferences that cannot be resolved:
```ts
toNFA(/(#+).*\1|foo/);
// Error: Backreferences are not supported.
```
The only way to parse the RegExp despite unresolvable backreferences is to remove the backreferences. This means that the result will be imperfect but it might still be useful.
```ts
const regex = /(#+).*\1|foo/;
const { expression } =
    JS.Parser.fromLiteral(regex).parse({ backreferences: "disable" });
console.log(JS.toLiteral(expression));
// => { source: 'foo', flags: '' }
```
Note that the `foo` alternative is kept because it is completely unaffected by the unresolvable backreferences.
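To make "imperfect" concrete: dropping the alternative with the unresolvable backreference leaves a pattern that accepts only part of the original language. The sketch below checks this on a handful of hand-picked samples using native `RegExp` only. The names `original`, `simplified`, and `samples` are made up for this illustration.

```ts
// Anchored so that both patterns must match the whole string.
const original = /^(?:(#+).*\1|foo)$/;
const simplified = /^(?:foo)$/;

const samples = ["foo", "#a#", "##x##", "#", "bar", ""];

// Everything the simplified pattern accepts should also be accepted by the original.
const isSubset = samples.every(s => !simplified.test(s) || original.test(s));

// Words the original accepts but the simplified pattern no longer does.
const lost = samples.filter(s => original.test(s) && !simplified.test(s));

console.log(isSubset);
// => true
console.log(lost);
// => [ '#a#', '##x##' ]
```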
#### Assertions
While the parser and AST format can handle assertions, the NFA construction cannot.
```ts
const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } = JS.Parser.fromLiteral(regex).parse();
console.log(JS.toLiteral(expression));
// => { source: '\\b(?!\\d)\\w+\\b|->', flags: 'i' }
NFA.fromRegex(expression, { maxCharacter });
// Error: Assertions are not supported yet.
```
Similarly to backreferences, we can let the parser remove them:
```ts
const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } =
    JS.Parser.fromLiteral(regex).parse({ assertions: "disable" });
console.log(JS.toLiteral(expression));
// => { source: '->', flags: 'i' }
const nfa = NFA.fromRegex(expression, { maxCharacter });
console.log(toRegExp(nfa));
// => /->/i
```