# parse-statements ✂️

[![NPM version][npm-image]][npm-url]
[![dependencies: none][dependencies-none-image]][dependencies-none-url]
[![minzipped size][size-image]][size-url]
[![code style: prettier][prettier-image]][prettier-url]
[![Conventional Commits][conventional-commits-image]][conventional-commits-url]
[![License MIT][license-image]][license-url]

Fast and easy parser of statements in source code in any language.

`parse-statements` ✂️ allows you to parse statements consisting of a sequence of tokens
with arbitrary text between them. Statements cannot overlap.

In addition to statements, you can describe the comments of the language, which may also appear
inside statements (between neighboring tokens).

Tokens are described by strings, from which regexps with the `gmu` flags are generated
(therefore, each backslash in these strings must be escaped, that is, doubled).

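As a rough illustration of the escaping rule, the sketch below compiles a token string the way the paragraph above describes. It assumes that the string is passed to the `RegExp` constructor with the `gmu` flags; `tokenToRegExp` is a hypothetical helper and not part of the package.

```ts
// Hypothetical helper: assumes a raw token string is compiled with the `gmu` flags.
const tokenToRegExp = (rawToken: string): RegExp => new RegExp(rawToken, 'gmu');

// In a string literal, `\\b` is the two characters `\b`, which the regexp reads as a word boundary.
const fromToken = tokenToRegExp('\\bfrom\\b');

// With a single backslash, `\b` in a string literal is a backspace character instead.
const brokenToken = tokenToRegExp('\bfrom\b');

const source = "import foo from './foo';";

const fromFound: boolean = fromToken.test(source);
const brokenFound: boolean = brokenToken.test(source);

console.log(fromFound, brokenFound);
```

Only the doubled form finds the `from` keyword; the single-backslash form looks for literal backspace characters around `from` and finds nothing.
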
For each parsed statement, the optional `onParse` callback is called with the context,
the source code (a string), and the parsed tokens of the statement as separate arguments
(each token carries an array of the comments found between it and the next token, if any).

If the token sequence of a statement is not completed, the `onError` callback is called
instead of `onParse`. It has the same signature and receives the incomplete sequence
of already parsed tokens of the statement.

Similar optional callbacks can be set for comments.

Callbacks for statements (only for statements, not for comments) can return a number
instead of `undefined`. This number is then used as the index in the source code
from which the parser starts looking for the next statement.

In effect, this index is interpreted as the end of the statement. By default,
the end of the statement coincides with the end of its last token,
but sometimes we need to go beyond the boundaries of the found tokens
(or, conversely, shorten the statement, that is, reduce its end index).

With such manual parsing, if we increase the end index of the statement,
we must remember to manually parse any comments that may appear
in the added part of the statement, because the parser will not do this itself.
It will continue to work from the new end of the statement as usual.

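The following sketch shows one way a callback might compute such an end index. It is only an illustration under stated assumptions: `findQuotedEnd` is a hypothetical helper, not part of the package, and the idea is that a statement's `onParse` callback could return its result to move the end of an `import` statement past the quoted module path.

```ts
// Hypothetical helper: returns the index just after the first quoted string
// that starts at or after `fromIndex`, or `undefined` if there is none.
const findQuotedEnd = (source: string, fromIndex: number): number | undefined => {
  for (let index = fromIndex; index < source.length; index += 1) {
    const char = source[index];

    if (char === "'" || char === '"') {
      // Look for the matching closing quote of the same kind.
      const closeIndex = source.indexOf(char, index + 1);

      return closeIndex === -1 ? undefined : closeIndex + 1;
    }
  }

  return undefined;
};

const source = "import foo from './foo'; // comment";

// Assume the last parsed token is `from`, so its `end` is the index right after it.
const lastTokenEnd = source.indexOf('from') + 'from'.length;

// A statement callback could return this value as the new end of the statement.
const newEnd = findQuotedEnd(source, lastTokenEnd);

console.log(newEnd, source.slice(0, newEnd));
```

Note that this helper does not look for comments between the `from` token and the quoted path; as described above, that part would have to be handled manually.
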
## Basic example

Below is a simplified example ([see on TypeScript Playground](https://www.typescriptlang.org/play?#code/JYWwDg9gTgLgBAbwMZQKYEMaoArqgZ1QDEBXAOyRmAjIF84AzKCEOAcjD0IFp8ZNUIVGRj42AbgBQk0JFhwYATzCpEAeTIBhFkJEBRKMygAaOBu0hdMXAVSmNNwvSYt2nW735YrYqZKQ0fAoAFswA7gZGcAC8cAAUQvj46ADmqABccHxQwGQpAJSZAG4QwAAmMQB8iJJwIeFwZKhhcJHQCahJqaj5UrR+Sipw2iKoAB7wsQBKGGU0ADaKADwItXCohtD4meQA1mQQYWQA2gC6UnXjcqKZx1fQN1kwOXmmAHQfAZbCj9m5KWdTmcLnBZA9tnBjmDYJk-q84B83l8fLDnv9AcC1iASPMqPNcqgLCini8Aec1vh-vNUPimkSfhC4WS+pU-AEyEE0jB6SIAMoQEhQJCqWJxCkCoUZEn-YxrTjAKCZNDoOZkRaQhDCMqZMgkEAAIw2tFMCD4eBgOr1hqgtFOssK0ryVSyEuFb3w+OFcXlUGOAAZTm8taYfccAIyBs2wXrSdlBGg8mBtRVmLQ6H7JpYjLATaqigD67JzMFM+FddkQUZgtHyztWdRgoUOybiZcFbo9wC9VfyMf6scC8AT6ZEjil5hH1i4qCzNGLefipqpNIJifwxpd7YrmrIZQ3pv4sBrdYpy9phMn+DeYBI+GCrfL7s9qDiwae5t7fTZg7gNGTmQcadZ1GXMYkXDYjHXUty3eD4YAgXZhHwWtomqet1k2Ahr1ve820lJ8uxfeDEI5f1TgAQndQ8SwUBCkOOYikLeak8kbOBuDgCNKK1OAAGo4AAZj9T9JH7fwfz-MZrjHACyDHYDi1MASFzFOpNSk8FjXFLdZUuDTYF5ajdPWfSYAAGWAPg9F3Yz7lgayyntE86jjeA7JgQzzTXMD3M82AkUvAB+N4QHQMA4m9dAFRQ6ouUTfktwfHS4B9ESXJ-dyLL4bzYkyyyk13ALvhEfBgtC8LIui504snBLJSSyUQyiqARLWdyrxvO84mONY6jwjtn1fUy-JgIMbJM64HKoj9jLqREhuuEbvIAH2WyFTnyWaEQ+BaHiy7lLzgVb1s2tYNq-AcOSHMgAElwAeGTU3k7Nxho5SwNUxBoVELS+pgtZvpG4zvqIZgQGB+77JsyQYpqdKrtBSGPOonLEcW6iip8Mqwoi1LqtQA7io88sGuFJropjeGghBsHUZplhMYZbGKrx1C4Bqom6u7GCUua1q6m+jqcO63rN3wzsvUBjG32+qaey2+apa8w7jrOU66jmnb6ZAFa1rV4zztE78EfcQg7uufA9FM-AwJQDAsDHUgKCoGgFNeyo4nQ5EGVuUX0I138yH-QPE2TLa6hoR7hyJsdw9okiIWONgAB1k4AelTtO2FMNgABJU7IQK2DtUXfo1-2Nb-TDMmjqww9FiO5OnTJPexXFgHPNcN367ctX3KtjzZiuA7bvFV0vbCup7givTfHtKYDuAy4DxiOVuFP09TgAqbP2G3zPi62suS8b4O4kLOdXtMfMe9MRJkjSWHG3CFt7+6dX328H3IT9hu4CQdAt0KDzBIGUC8RMITPBIHYP+VcjBx0js3QO5sHrTjgOgG2gFbBx1XonNgAA9b6qd9S7w3vqFwIBiGkPzsnQuh8-53gFPMMovIMBCmCAAIVQAwaA4DiRQJgQHZecNF4AKAUgEBYC1yZAEQgoOmE5FRyDqZMc6DMFN2wX-XB698HuSoTnVOtBqEFyLifRejCcQsLYUgTh3DeHSIUFAaBR8DayhrMbamSNLbW0yC9CYYF0IQS2LcMx7UQkQwtuErEOIx50kvFEvqZ5x4QKiWJU2qAUGwG8RbT6gsrYW2MgAAxkEjRAPCIDODBuwN4adykSBKdcOA+o8CMCqWwGpzSoBsGkGnNOcAAAqw
RLKghtugf+k5pDfTgGnLeLohDjKJnALefTOmtNcJ0vwbB0DzDLGjB4azWAbO6Q0-ZqyKHVLTs0gAXscqZYRgBsSCVAKQUzekLKsKCMgez5D3MbAKeAFC-DuTKRACp0zZnbN2d7EQ9BllSGBTM95PxPkTQeLCvpCBOm0EkIU+0HiIDUmYhAFIcQ8nWxjEAA))
of parsing `import` and `export` statements in ECMAScript
(a complete example can be found [here](https://github.com/joomcode/parse-imports-exports/blob/main/src/index.ts)):

```ts
import {createParseFunction} from 'parse-statements';

import type {OnCommentError, OnCommentParse, OnParse} from 'parse-statements';

const throwError = (message: string): void => {
  throw new Error(message);
};

type Context = Readonly<{
  errors: unknown[];
  exports: [exports: string, ...comments: string[]][];
  imports: [import: string, ...comments: string[]][];
  multilineComments: string[];
  singlelineComments: string[];
}>;

const getCommentSource = (
  source: string,
  pair: readonly [{end: number}, {start: number}],
): string => source.slice(pair[0].end, pair[1].start);

const onCommentError: OnCommentError<Context> = (_context, source, {start}) => {
  throwError(source.slice(start));
};

const onCommentParse: OnCommentParse<Context> = ({singlelineComments}, source, {end}, {start}) => {
  singlelineComments.push(source.slice(end, start));
};

const onError: OnParse<Context> = ({errors}, source, ...tokens) => {
  errors.push(source.slice(tokens[0]!.start, tokens[tokens.length - 1]!.end + 30));
};

const onExportParse: OnParse<Context, 3> = (
  {exports},
  source,
  exportStart,
  exportListEnd,
  exportEnd,
) => {
  const exportStartComments = exportStart.comments?.map((pair) => getCommentSource(source, pair));
  const exportListComments = exportListEnd.comments?.map((pair) => getCommentSource(source, pair));

  exports.push([
    source.slice(exportStart.end, exportEnd.start),
    ...(exportStartComments || []),
    ...(exportListComments || []),
  ]);
};

const onImportParse: OnParse<Context, 3> = (
  {imports},
  source,
  importStart,
  importFrom,
  importEnd,
) => {
  const importStartComments = importStart.comments?.map((pair) => getCommentSource(source, pair));
  const importFromComments = importFrom.comments?.map((pair) => getCommentSource(source, pair));

  imports.push([
    source.slice(importStart.end, importEnd.start),
    ...(importStartComments || []),
    ...(importFromComments || []),
  ]);
};

const parseImportsExports = createParseFunction<Context>({
  comments: [
    {
      onError: onCommentError,
      onParse: onCommentParse,
      tokens: ['\\/\\/', '$\\n?'],
    },
    {
      onError: onCommentError,
      onParse: ({multilineComments}, source, {end}, {start}) => {
        multilineComments.push(source.slice(end, start));
      },
      tokens: ['\\/\\*', '\\*\\/'],
    },
  ],
  onError: (_context, _source, message) => throwError(message),
  statements: [
    {
      canIncludeComments: true,
      onError,
      onParse: onImportParse as OnParse,
      tokens: ['^import\\b', '\\bfrom\\b', '$\\n?'],
      shouldSearchBeforeComments: true,
    },
    {
      canIncludeComments: true,
      onError,
      onParse: onExportParse as OnParse,
      tokens: ['^export\\b', '\\}', '$\\n?'],
      shouldSearchBeforeComments: true,
    },
  ],
});

const importsExports: Context = {
  errors: [],
  exports: [],
  imports: [],
  multilineComments: [],
  singlelineComments: [],
};

parseImportsExports(
  importsExports,
  `
import {foo} from './foo';
import bar from './bar'

// This is a comment

import /* some comment */ bar from bar;

'also import from bar;'

import bar from './baz'

import with error;
import // comment in import without from;

export {foo} /* also comment} */;
export /* comment in export} */ {bar}
`,
);

console.log(importsExports);
```

## Install

Requires [Node.js](https://nodejs.org/en/) version 10 or higher:

```sh
npm install parse-statements
```

`parse-statements` ✂️ works in any environment that supports ES2018
(because the package uses [RegExp Named Capture Groups](https://github.com/tc39/proposal-regexp-named-groups)).

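For reference, the sketch below shows the ES2018 feature in question. How the package uses named groups internally is not described here, so joining several token patterns into one alternation is only an assumption made for illustration, and the group names `comment` and `importToken` are hypothetical.

```ts
// Assumption for illustration: two token patterns joined into one regexp,
// each wrapped in its own named capture group (hypothetical names).
const tokensRegExp = new RegExp('(?<comment>\\/\\/)|(?<importToken>^import\\b)', 'gmu');

const source = "// header\nimport foo from './foo';";
const found: string[] = [];

let match = tokensRegExp.exec(source);

while (match !== null) {
  const groups = match.groups || {};
  // Only one alternative matches at a time, so only one named group is defined.
  const groupName = Object.keys(groups).find((key) => groups[key] !== undefined);

  found.push(`${groupName}@${match.index}`);
  match = tokensRegExp.exec(source);
}

console.log(found);
```

An environment without named capture groups throws a `SyntaxError` when such a regexp is constructed, which is why ES2018 support is required.
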
## API

`parse-statements` ✂️ exports one runtime value, the `createParseFunction` function:

```ts
import {createParseFunction} from 'parse-statements';

type Context = ...; // some type

const parse = createParseFunction<Context>(options);

const context: Context = ...;

parse(context, 'some source code (as string)');
```

The `options` object defines comments, statements, and a global error callback handler
(all of these fields are optional):

```ts
import type {Comment, OnGlobalError, Options, ParsedToken, Statement} from 'parse-statements';

const options: Options<Context> = {
  comments, // an optional array of comments
  onError, // an optional callback handler for global parsing errors
  statements, // an optional array of statements
};

const comments: readonly Comment<Context>[] = [
  {
    onError(
      context: Context,
      source: string,
      parsedToken: {start: number; end: number; match: RegExpExecArray; token: string},
    ) {
      // An optional callback handler that is called if, after the opening comment token,
      // its closing token was not found.
      // Parsing continues from the point immediately after the opening token.
    },
    onParse(
      context: Context,
      source: string,
      openParsedToken: {start: number; end: number; match: RegExpExecArray; token: string},
      closeParsedToken: {start: number; end: number; match: RegExpExecArray; token: string},
    ) {
      // An optional callback handler of the comment for putting something in the context.
      // The handler is called when the parsing of the comment is completed,
      // that is, when the closing comment token has been parsed.
      // The handler receives the opening parsed token and the closing parsed token.
      // Parsing continues from the point immediately after the closing token.
    },
    // Opening and closing tokens of the comment
    // (which are converted to regexps using the `RegExp` constructor).
    tokens: ['open raw token', 'close raw token'],
  },
];

const onError: OnGlobalError<Context> = (
  context: Context,
  source: string,
  message: string,
  index: number,
) => {
  // An optional callback handler that is called when there are global parsing errors.
};

const statements: readonly Statement<Context>[] = [
  {
    // If `true`, then comments are parsed inside the statement (between its tokens).
    canIncludeComments: true,
    onError(
      context: Context,
      source: string,
      firstParsedToken: ParsedToken & {comments?: [ParsedToken, ParsedToken][]},
      secondParsedToken: ParsedToken & {comments?: [ParsedToken, ParsedToken][]},
      // ...,
      lastParsedToken: ParsedToken,
    ) {
      // An optional callback handler that is called if parsing the statement failed, that is,
      // parsing started with the first statement token, but some next token was not found.
      // The handler receives all already parsed statement tokens.
      // If there were comments between a token and its next token, they are passed
      // to the parsed token object as a separate `comments` property
      // (thus, the last parsed token cannot have comments).
      // Parsing continues from the point immediately after the last parsed token.
    },
    onParse(
      context: Context,
      source: string,
      firstParsedToken: ParsedToken & {comments?: [ParsedToken, ParsedToken][]},
      secondParsedToken: ParsedToken & {comments?: [ParsedToken, ParsedToken][]},
      // ...,
      lastParsedToken: ParsedToken,
    ) {
      // An optional callback handler of the statement for putting something in the context.
      // The handler is called when the parsing of the statement is completed,
      // that is, when the last statement token has been parsed.
      // The handler receives all parsed statement tokens.
      // If there were comments between a token and its next token, they are passed
      // to the parsed token object as a separate `comments` property
      // (thus, the last parsed token cannot have comments).
      // Parsing continues from the point immediately after the last statement token.
    },
    // Non-empty array of raw statement tokens
    // (which are converted to regexps using the `RegExp` constructor).
    // A statement can have any positive number of tokens.
    tokens: ['first raw token', 'second raw token'],
    // If `true`, then the first token of the statement is searched for before the comment tokens,
    // otherwise after them. This can affect parsing because if several different tokens
    // (first tokens of statements or opening comment tokens) are found
    // at some position in the source, only the first one will be selected and parsed.
    shouldSearchBeforeComments: true,
  },
];
```

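To make the shape of a parsed token more concrete, here is a small sketch that builds such an object from a regexp match. It assumes that `start` and `end` are the bounds of the match in the source and that `token` is the matched text; the `ParsedTokenSketch` type and the `toParsedToken` helper are hypothetical and only mirror the fields shown above.

```ts
// Hypothetical type that mirrors the fields of a parsed token shown above.
type ParsedTokenSketch = Readonly<{
  start: number;
  end: number;
  match: RegExpExecArray;
  token: string;
}>;

// Hypothetical helper: finds the first match of a raw token at or after `fromIndex`.
const toParsedToken = (
  source: string,
  rawToken: string,
  fromIndex: number,
): ParsedTokenSketch | undefined => {
  const regExp = new RegExp(rawToken, 'gmu');

  regExp.lastIndex = fromIndex;

  const match = regExp.exec(source);

  if (match === null) {
    return undefined;
  }

  // Assumption: `token` is the matched text, `start` and `end` are its bounds in the source.
  return {start: match.index, end: match.index + match[0].length, match, token: match[0]};
};

const source = 'export /* comment */ {bar}';
const openToken = toParsedToken(source, '\\/\\*', 0);
const closeToken = openToken && toParsedToken(source, '\\*\\/', openToken.end);

// Text between the opening and closing comment tokens, sliced as in the basic example.
const commentText =
  openToken && closeToken ? source.slice(openToken.end, closeToken.start) : undefined;

console.log(commentText);
```

The slice from the `end` of the opening token to the `start` of the closing token is the same operation that the comment callbacks in the basic example perform.
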
`parse-statements` ✂️ also exports all types included in the API:

```ts
export type {
  /**
   * Description of comment as the callback handlers and open and close tokens.
   */
  Comment,
  /**
   * Pair of the comment open and close tokens (raw or parsed).
   */
  CommentPair,
  /**
   * `onError` callback handler for error on comment parsing.
   */
  OnCommentError,
  /**
   * `onParse` callback handler of comment.
   */
  OnCommentParse,
  /**
   * Global `onError` callback handler for error on parsing.
   */
  OnGlobalError,
  /**
   * `onParse` callback handler of statement with concrete length (number of tokens).
   */
  OnParse,
  /**
   * Options of `createParseFunction` function.
   */
  Options,
  /**
   * Parse function.
   */
  Parse,
  /**
   * The result of parsing the token.
   */
  ParsedToken,
  /**
   * Description of statement as the callback handlers and a sequence of tokens.
   */
  Statement,
};
```

## License

[MIT][license-url]

[conventional-commits-image]: https://img.shields.io/badge/Conventional_Commits-1.0.0-yellow.svg 'The Conventional Commits specification'
[conventional-commits-url]: https://www.conventionalcommits.org/en/v1.0.0/
[dependencies-none-image]: https://img.shields.io/badge/dependencies-none-success.svg 'No dependencies'
[dependencies-none-url]: https://github.com/joomcode/parse-statements/blob/main/package.json
[license-image]: https://img.shields.io/badge/license-MIT-blue.svg 'The MIT License'
[license-url]: LICENSE
[npm-image]: https://img.shields.io/npm/v/parse-statements.svg 'parse-statements'
[npm-url]: https://www.npmjs.com/package/parse-statements
[prettier-image]: https://img.shields.io/badge/code_style-prettier-ff69b4.svg 'Prettier code formatter'
[prettier-url]: https://prettier.io/
[size-image]: https://img.shields.io/bundlephobia/minzip/parse-statements 'parse-statements'
[size-url]: https://bundlephobia.com/package/parse-statements