Commit 1324bc4

ParseXS: refactor: update POD and comments
Update the POD at the top of Node.pm and the code comments at the top of ParseXS.pm to reflect the changes in this branch which have extended the AST from representing just an XSUB to representing the whole XS file.
1 parent e93d277 commit 1324bc4

2 files changed: +109 -36 lines changed

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

Lines changed: 18 additions & 6 deletions
@@ -9,12 +9,24 @@ use warnings;
 # to be used for example by Module::Build without having to shell out to
 # xsubpp. It also makes it easier to test the individual components.
 #
-# The bulk of this file is taken up with the process_file() method which
-# does the whole job of reading in a .xs file and outputting a .c file. It
-# in turn relies on fetch_para() to read chunks of lines from the input,
-# and on various ExtUtils::ParseXS::Node::FOO::parse() methods which build
-# up an AST representing the parsed XS file. Then a bunch of as_code()
-# methods walk that tree, emitting C code.
+# The main function in this file is process_file(), which oversees the
+# whole job of reading in a .xs file, parsing it into an Abstract Syntax
+# Tree (AST), then walking the tree to generate C code and output it to a
+# .c file.
+#
+# Most of the actual logic is in the ExtUtils::ParseXS::Node::FOO
+# subclasses, which hold the nodes of the AST. The parse() methods of
+# these subclasses do a top-down recursive-descent parse of the input
+# file, building the AST; while the as_code() methods walk the tree,
+# emitting C code.
+#
+# The main parsing loop is contained in the Node::cpp_scope::parse()
+# method, which in turn relies on fetch_para() to read a paragraph's worth
+# of lines from the input while stripping out any POD or XS comments. It
+# is fetch_para() which decides where an XSUB, BOOT or TYPEMAP block ends,
+# mainly by using a blank line followed by character in column 1 as the
+# delimiter (except for TYPEMAP, where it looks for the matching EOF-style
+# string).
 #
 # The remainder of this file mainly consists of helper functions and
 # functions to help with outputting stuff.
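
As an illustration of the delimiter rule described in the new comment (a paragraph ends at a blank line that is followed by a character in column 1), here is a minimal Perl sketch. `split_paragraphs` is a hypothetical helper written for this example. It is not the real `fetch_para()`, and it ignores POD, XS comments, and the TYPEMAP terminator string that the comment mentions.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the real fetch_para(): split a list of lines
# into paragraphs. A paragraph ends at a blank line which is followed
# by a line with a non-whitespace character in column 1.
sub split_paragraphs {
    my @lines = @_;
    my (@paras, @cur);
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        my $next = $i < $#lines ? $lines[$i + 1] : undef;
        if ($line =~ /^\s*$/) {
            next unless @cur;              # skip leading blank lines
            if (defined $next && $next =~ /^\S/) {
                push @paras, [@cur];       # delimiter found: close paragraph
                @cur = ();
                next;
            }
        }
        push @cur, $line;                  # blank lines inside a paragraph are kept
    }
    push @paras, [@cur] if @cur;
    return @paras;
}

my @paras = split_paragraphs(
    "int",
    "foo(a)",
    "    int a",
    "",                 # blank, but next line is indented: same paragraph
    "  CODE:",
    "    RETVAL = a;",
    "",                 # blank, and next line starts in column 1: new paragraph
    "void",
    "bar()",
);
printf "%d paragraphs\n", scalar @paras;   # prints "2 paragraphs"
```

The first blank line does not end the paragraph because the line after it is indented; only the second one does.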

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

Lines changed: 91 additions & 30 deletions
@@ -11,24 +11,20 @@ ExtUtils::ParseXS::Node - Classes for nodes of an Abstract Syntax Tree
 
 =head1 SYNOPSIS
 
-# Create a node to represent the Foo part of an XSUB; then
+# Create a node to represent the Foo part of an XS file; then
 # top-down parse it into a subtree; then top-down emit the
 # contents of the subtree as C code.
 
 my $foo = ExtUtils::ParseXS::Node::Foo->new();
 $foo->parse(...)
 or die;
 $foo->as_code(...);
-$foo->as_concise(1);
+print STDERR $foo->as_concise(1); # for debugging
 
 =head1 DESCRIPTION
 
 This API is currently private and subject to change.
 
-Node that as of May 2025, this is a Work In Progress. An AST is created
-for each parsed XSUB, but those nodes aren't yet linked into a
-higher-level tree representing the whole XS file.
-
 The C<ExtUtils::ParseXS::Node> class, and its various subclasses, hold the
 state for the nodes of an Abstract Syntax Tree (AST), which represents the
 parsed state of an XS file.
@@ -51,29 +47,42 @@ children; however, both C<INPUT_line> and C<OUTPUT_line> have an
 C<ioparam> field which points to the C<IO_Param> object associated with
 this line, which is located elsewhere in the tree.
 
-The various C<foo_part> nodes divide the parsing of the main body of the
+The various C<foo_part> nodes divide the parsing of the main body of an
 XSUB into sections where different sets of keywords are allowable, and
 where various bits of code can be conveniently emitted.
 
 =head2 Methods
 
-There are two main methods, in addition to new(), which are present in all
-subclasses. First, parse() consumes lines from the source to satisfy the
-construct being parsed. It may itself create objects of lower-level
-constructs and call parse on them. For example, C<Node::xbody::parse()>
-may create a C<Node::input_part> node and call parse() on it, which will
-create C<Node::INPUT> or C<Node::PREINIT> nodes as appropriate, and so on.
+There are two main methods in addition to C<new()>, which are present in
+all subclasses. First, C<parse()> consumes lines from the source to
+satisfy the construct being parsed. It may itself create objects of
+lower-level constructs and call parse on them. For example,
+C<Node::xbody::parse()> may create a C<Node::input_part> node and call
+C<parse()> on it, which will create C<Node::INPUT> or C<Node::PREINIT>
+nodes as appropriate, and so on.
+
+Secondly, C<as_code()> descends its sub-tree, outputting the tree as C
+code.
+
+The C<as_concise()> method returns a line-per-node string representation
+of the node and any children. Most node classes just inherit this method
+from the base C<Node> class. It is intended mainly for debugging.
 
-Secondly, as_code() descends its sub-tree, outputting the tree as C code.
+Some nodes also have an C<as_boot_code()> method for adding any code to
+the boot XSUB. This returns two array refs, one containing a list of code
+lines to be inserted early into the boot XSUB, and a second for later
+lines.
 
-Some nodes also have an as_boot_code() method for adding any code to
-the boot XSUB. This returns two array refs, one containing a list of code lines to be inserted early into the boot XSUB, and a second for later lines.
+Finally, in the IO_Param subclass, C<as_code()> is replaced with
+C<as_input_code> and C<as_output_code()>, since that node may need to
+generate I<two> sets of C code; one to assign a Perl argument to a C
+variable, and the other to return the value of a variable to Perl.
 
 Note that parsing and code-generation are done as two separate phases;
-parse() should only build a tree and never emit code.
+C<parse()> should only build a tree and never emit code.
 
-In addition to C<$self>, both these methods are always provided with
-these three parameters:
+In addition to C<$self>, methods may commonly have some of these
+parameters:
 
 =over
 
@@ -85,22 +94,21 @@ lines read in from the source file for the current paragraph.
 
 =item C<$xsub>
 
-The current C<ExtUtils::ParseXS::xsub> node being processed.
+For nodes related to parsing an XSUB, the current
+C<ExtUtils::ParseXS::xsub> node being processed.
 
 =item C<$xbody>
 
-The current C<ExtUtils::ParseXS::xbody> node being processed. Note that
-in the presence of a C<CASE> keyword, an XSUB can have multiple bodies.
+For nodes related to parsing an XSUB, the current
+C<ExtUtils::ParseXS::xbody> node being processed. Note that in the
+presence of a C<CASE> keyword, an XSUB can have multiple bodies.
 
 =back
 
-The parse() and as_code() methods for some subclasses may have additional
-parameters.
+The C<parse()> and C<as_code()> methods for some subclasses may have
+parameters in addition to those.
 
-Some subclasses may have additional helper methods.
-
-The as_concise() method returns a line-per-node string representation of
-the node and any children. It is intended mainly for debugging.
+Some subclasses may also have additional helper methods.
 
 =head2 Class Hierachy
 
@@ -112,6 +120,18 @@ next keyword, and emit that code, possibly wrapped in C<#line> directives.
 This common behaviour is provided by the C<codeblock> class.
 
 Node
+XS_file
+preamble
+C_part
+C_part_POD
+C_part_code
+C_part_postamble
+cpp_scope
+global_cpp_line
+BOOT
+TYPEMAP
+pre_boot
+boot_xsub
 xsub
 xsub_decl
 ReturnType
@@ -126,6 +146,12 @@ This common behaviour is provided by the C<codeblock> class.
 cleanup_part
 autocall
 oneline
+MODULE
+REQUIRE
+FALLBACK
+include
+INCLUDE
+INCLUDE_COMMAND
 NOT_IMPLEMENTED_YET
 CASE
 enable
@@ -160,9 +186,44 @@ This common behaviour is provided by the C<codeblock> class.
 
 =head2 Abstract Syntax Tree structure
 
+A typical XS file might compile to a tree with a node structure similar to
+the following. Note that this is unrelated to the inheritance hierarchy
+shown above. In this example, the XS file includes another file, and has a
+couple of XSUBs within a C<#if/#else/#endif>. Note that a C<cpp_scope>
+node is the parent of all the nodes within the same branch of an C<#if>,
+or in the absence of C<#if>, within the same file.
+
+XS_file
+preamble
+C_part
+C_part_POD
+C_part_code
+C_part_postamble
+cpp_scope: type="main"
+MODULE
+PROTOTYPES
+BOOT
+TYPEMAP
+INCLUDE
+cpp_scope: type="include"
+xsub
+...
+global_cpp_line: directive="ifdef"
+cpp_scope: type="if"
+xsub
+...
+global_cpp_line: directive="else"
+cpp_scope: type="if"
+xsub
+...
+global_cpp_line: directive="endif"
+xsub
+...
+pre_boot
+boot_xsub
+
 A typical XSUB might compile to a tree with a structure similar to the
-following. Note that this is unrelated to the inheritance hierarchy
-shown above.
+following.
 
 xsub
 xsub_decl
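
The two-phase design described in the updated POD (`parse()` only builds the tree, `as_code()` later walks it, and `as_concise()` dumps one line per node for debugging) can be sketched as a small Perl example. Every class here (`My::Node`, `My::Node::Block`, `My::Node::Line`) is hypothetical and written only for this illustration; none of it is the real `ExtUtils::ParseXS::Node` API, and the `$depth` argument to `as_concise()` is an assumption of this sketch.

```perl
use strict;
use warnings;

package My::Node;    # hypothetical base class for this sketch

sub new {
    my ($class, %args) = @_;
    return bless { kids => [], %args }, $class;
}

# Default as_concise(): one line per node, children indented below it.
sub as_concise {
    my ($self, $depth) = @_;
    $depth //= 0;
    my $name = ref($self) =~ s/^.*:://r;
    my $out  = ('  ' x $depth) . "$name\n";
    $out .= $_->as_concise($depth + 1) for @{ $self->{kids} };
    return $out;
}

# Default as_code(): emit nothing itself, just descend into the children.
sub as_code {
    my ($self) = @_;
    return join '', map { $_->as_code } @{ $self->{kids} };
}

package My::Node::Line;    # leaf node: holds one source line
our @ISA = ('My::Node');

# parse() consumes one line from the input and stores it; it emits no code.
sub parse {
    my ($self, $lines) = @_;
    return unless @$lines;
    $self->{text} = shift @$lines;
    return 1;
}

sub as_code {
    my ($self) = @_;
    return "/* $self->{text} */\n";
}

package My::Node::Block;   # interior node: creates and parses child nodes
our @ISA = ('My::Node');

sub parse {
    my ($self, $lines) = @_;
    while (@$lines) {
        my $kid = My::Node::Line->new;
        $kid->parse($lines) or return;
        push @{ $self->{kids} }, $kid;
    }
    return 1;
}

package main;

my $block = My::Node::Block->new;
$block->parse([ 'first', 'second' ]) or die "parse failed";
print STDERR $block->as_concise;    # for debugging
print $block->as_code;
```

Because `My::Node::Block` inherits the default `as_code()` and `as_concise()`, only the leaf class needs to know how to emit anything, which mirrors the POD's remark that most node classes just inherit `as_concise()` from the base class.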
