1<?xml version="1.0" encoding="utf-8"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5 <head>
6 <!--
7 __ __ _
8 ___\ \/ /_ __ __ _| |_
9 / _ \\ /| '_ \ / _` | __|
10 | __// \| |_) | (_| | |_
11 \___/_/\_\ .__/ \__,_|\__|
12 |_| XML parser
13
14 Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net>
15 Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
16 Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net>
17 Copyright (c) 2017-2026 Sebastian Pipping <sebastian@pipping.org>
18 Copyright (c) 2017 Jakub Wilk <jwilk@jwilk.net>
19 Copyright (c) 2021 Tomas Korbar <tkorbar@redhat.com>
20 Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
21 Copyright (c) 2022 Thijs Schreijer <thijs@thijsschreijer.nl>
22 Copyright (c) 2023-2025 Hanno Böck <hanno@gentoo.org>
23 Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com>
24 Licensed under the MIT license:
25
26 Permission is hereby granted, free of charge, to any person obtaining
27 a copy of this software and associated documentation files (the
28 "Software"), to deal in the Software without restriction, including
29 without limitation the rights to use, copy, modify, merge, publish,
30 distribute, sublicense, and/or sell copies of the Software, and to permit
31 persons to whom the Software is furnished to do so, subject to the
32 following conditions:
33
34 The above copyright notice and this permission notice shall be included
35 in all copies or substantial portions of the Software.
36
37 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
38 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
39 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
40 NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
41 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
42 OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
43 USE OR OTHER DEALINGS IN THE SOFTWARE.
44-->
45
46 <title>
47 Expat XML Parser
48 </title>
49 <meta name="author" content="Clark Cooper, coopercc@netheaven.com" />
50 <link href="ok.min.css" rel="stylesheet" />
51 <link href="style.css" rel="stylesheet" />
52 </head>
53 <body>
54 <div>
55 <h1>
56 The Expat XML Parser <small>Release 2.7.4</small>
57 </h1>
58 </div>
59
60 <div class="content">
61 <p>
62 Expat is a library, written in C, for parsing XML documents. It's the underlying
63 XML parser for the open source Mozilla project, Perl's <code>XML::Parser</code>,
64 Python's <code>xml.parsers.expat</code>, and other open-source XML parsers.
65 </p>
66
    <p>
      This library is the creation of James Clark, who has also given us groff (an nroff
      look-alike), Jade (an implementation of ISO's DSSSL stylesheet language for
      SGML), XP (a Java XML parser package), and XT (a Java XSL engine). James was also the
      technical lead on the XML Working Group at W3C that produced the XML
      specification.
    </p>
74
75 <p>
76 This is free software, licensed under the <a href="../COPYING">MIT/X Consortium
77 license</a>. You may download it from <a href="https://libexpat.github.io/">the
78 Expat home page</a>.
79 </p>
80
81 <p>
82 The bulk of this document was originally commissioned as an article by <a href=
83 "https://www.xml.com/">XML.com</a>. They graciously allowed Clark Cooper to
84 retain copyright and to distribute it with Expat. This version has been
85 substantially extended to include documentation on features which have been added
86 since the original article was published, and additional information on using the
87 original interface.
88 </p>
89
90 <hr />
91
92 <h2>
93 Table of Contents
94 </h2>
95
96 <ul>
97 <li>
98 <a href="#overview">Overview</a>
99 </li>
100
101 <li>
102 <a href="#building">Building and Installing</a>
103 </li>
104
105 <li>
106 <a href="#using">Using Expat</a>
107 </li>
108
109 <li>
110 <a href="#reference">Reference</a>
111 <ul>
112 <li>
113 <a href="#creation">Parser Creation Functions</a>
114 <ul>
115 <li>
116 <a href="#XML_ParserCreate">XML_ParserCreate</a>
117 </li>
118
119 <li>
120 <a href="#XML_ParserCreateNS">XML_ParserCreateNS</a>
121 </li>
122
123 <li>
124 <a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a>
125 </li>
126
127 <li>
128 <a href=
129 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a>
130 </li>
131
132 <li>
133 <a href="#XML_ParserFree">XML_ParserFree</a>
134 </li>
135
136 <li>
137 <a href="#XML_ParserReset">XML_ParserReset</a>
138 </li>
139 </ul>
140 </li>
141
142 <li>
143 <a href="#parsing">Parsing Functions</a>
144 <ul>
145 <li>
146 <a href="#XML_Parse">XML_Parse</a>
147 </li>
148
149 <li>
150 <a href="#XML_ParseBuffer">XML_ParseBuffer</a>
151 </li>
152
153 <li>
154 <a href="#XML_GetBuffer">XML_GetBuffer</a>
155 </li>
156
157 <li>
158 <a href="#XML_StopParser">XML_StopParser</a>
159 </li>
160
161 <li>
162 <a href="#XML_ResumeParser">XML_ResumeParser</a>
163 </li>
164
165 <li>
166 <a href="#XML_GetParsingStatus">XML_GetParsingStatus</a>
167 </li>
168 </ul>
169 </li>
170
171 <li>
172 <a href="#setting">Handler Setting Functions</a>
173 <ul>
174 <li>
175 <a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a>
176 </li>
177
178 <li>
179 <a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a>
180 </li>
181
182 <li>
183 <a href="#XML_SetElementHandler">XML_SetElementHandler</a>
184 </li>
185
186 <li>
187 <a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a>
188 </li>
189
190 <li>
191 <a href=
192 "#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a>
193 </li>
194
195 <li>
196 <a href="#XML_SetCommentHandler">XML_SetCommentHandler</a>
197 </li>
198
199 <li>
200 <a href=
201 "#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a>
202 </li>
203
204 <li>
205 <a href=
206 "#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a>
207 </li>
208
209 <li>
210 <a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a>
211 </li>
212
213 <li>
214 <a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a>
215 </li>
216
217 <li>
218 <a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a>
219 </li>
220
221 <li>
222 <a href=
223 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a>
224 </li>
225
226 <li>
227 <a href=
228 "#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a>
229 </li>
230
231 <li>
232 <a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a>
233 </li>
234
235 <li>
236 <a href=
237 "#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a>
238 </li>
239
240 <li>
241 <a href=
242 "#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a>
243 </li>
244
245 <li>
246 <a href=
247 "#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a>
248 </li>
249
250 <li>
251 <a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a>
252 </li>
253
254 <li>
255 <a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a>
256 </li>
257
258 <li>
259 <a href=
260 "#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a>
261 </li>
262
263 <li>
264 <a href=
265 "#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a>
266 </li>
267
268 <li>
269 <a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a>
270 </li>
271
272 <li>
273 <a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a>
274 </li>
275
276 <li>
277 <a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a>
278 </li>
279
280 <li>
281 <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a>
282 </li>
283
284 <li>
285 <a href=
286 "#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a>
287 </li>
288
289 <li>
290 <a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a>
291 </li>
292
293 <li>
294 <a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a>
295 </li>
296 </ul>
297 </li>
298
299 <li>
300 <a href="#position">Parse Position and Error Reporting Functions</a>
301 <ul>
302 <li>
303 <a href="#XML_GetErrorCode">XML_GetErrorCode</a>
304 </li>
305
306 <li>
307 <a href="#XML_ErrorString">XML_ErrorString</a>
308 </li>
309
310 <li>
311 <a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a>
312 </li>
313
314 <li>
315 <a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a>
316 </li>
317
318 <li>
319 <a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a>
320 </li>
321
322 <li>
323 <a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a>
324 </li>
325
326 <li>
327 <a href="#XML_GetInputContext">XML_GetInputContext</a>
328 </li>
329 </ul>
330 </li>
331
332 <li>
333 <a href="#attack-protection">Attack Protection</a>
334 <ul>
335 <li>
336 <a href=
337 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a>
338 </li>
339
340 <li>
341 <a href=
342 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a>
343 </li>
344
345 <li>
346 <a href=
347 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a>
348 </li>
349
350 <li>
351 <a href=
352 "#XML_SetAllocTrackerActivationThreshold">XML_SetAllocTrackerActivationThreshold</a>
353 </li>
354
355 <li>
356 <a href=
357 "#XML_SetReparseDeferralEnabled">XML_SetReparseDeferralEnabled</a>
358 </li>
359 </ul>
360 </li>
361
362 <li>
363 <a href="#miscellaneous">Miscellaneous Functions</a>
364 <ul>
365 <li>
366 <a href="#XML_SetUserData">XML_SetUserData</a>
367 </li>
368
369 <li>
370 <a href="#XML_GetUserData">XML_GetUserData</a>
371 </li>
372
373 <li>
374 <a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a>
375 </li>
376
377 <li>
378 <a href="#XML_SetBase">XML_SetBase</a>
379 </li>
380
381 <li>
382 <a href="#XML_GetBase">XML_GetBase</a>
383 </li>
384
385 <li>
386 <a href=
387 "#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a>
388 </li>
389
390 <li>
391 <a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a>
392 </li>
393
394 <li>
395 <a href="#XML_GetAttributeInfo">XML_GetAttributeInfo</a>
396 </li>
397
398 <li>
399 <a href="#XML_SetEncoding">XML_SetEncoding</a>
400 </li>
401
402 <li>
403 <a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a>
404 </li>
405
406 <li>
407 <a href="#XML_SetHashSalt">XML_SetHashSalt</a>
408 </li>
409
410 <li>
411 <a href="#XML_UseForeignDTD">XML_UseForeignDTD</a>
412 </li>
413
414 <li>
415 <a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a>
416 </li>
417
418 <li>
419 <a href="#XML_DefaultCurrent">XML_DefaultCurrent</a>
420 </li>
421
422 <li>
423 <a href="#XML_ExpatVersion">XML_ExpatVersion</a>
424 </li>
425
426 <li>
427 <a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a>
428 </li>
429
430 <li>
431 <a href="#XML_GetFeatureList">XML_GetFeatureList</a>
432 </li>
433
434 <li>
435 <a href="#XML_FreeContentModel">XML_FreeContentModel</a>
436 </li>
437
438 <li>
439 <a href="#XML_MemMalloc">XML_MemMalloc</a>
440 </li>
441
442 <li>
443 <a href="#XML_MemRealloc">XML_MemRealloc</a>
444 </li>
445
446 <li>
447 <a href="#XML_MemFree">XML_MemFree</a>
448 </li>
449 </ul>
450 </li>
451 </ul>
452 </li>
453 </ul>
454
455 <hr />
456
457 <h2>
458 <a id="overview" name="overview">Overview</a>
459 </h2>
460
    <p>
      Expat is a stream-oriented parser. You register callback (or handler) functions
      with the parser and then start feeding it the document. As the parser recognizes
      parts of the document, it will call the appropriate handler for that part (if
      you've registered one). The document is fed to the parser in pieces, so you can
      start parsing before you have the whole document. This also allows you to parse
      really huge documents that won't fit into memory.
    </p>
469
470 <p>
471 Expat can be intimidating due to the many kinds of handlers and options you can
472 set. But you only need to learn four functions in order to do 90% of what you'll
473 want to do with it:
474 </p>
475
476 <dl>
477 <dt>
478 <code><a href="#XML_ParserCreate">XML_ParserCreate</a></code>
479 </dt>
480
481 <dd>
482 Create a new parser object.
483 </dd>
484
485 <dt>
486 <code><a href="#XML_SetElementHandler">XML_SetElementHandler</a></code>
487 </dt>
488
489 <dd>
490 Set handlers for start and end tags.
491 </dd>
492
493 <dt>
494 <code><a href=
495 "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>
496 </dt>
497
498 <dd>
499 Set handler for text.
500 </dd>
501
502 <dt>
503 <code><a href="#XML_Parse">XML_Parse</a></code>
504 </dt>
505
      <dd>
        Pass a buffer of document data to the parser.
      </dd>
509 </dl>
510
511 <p>
512 These functions and others are described in the <a href=
513 "#reference">reference</a> part of this document. The reference section also
514 describes in detail the parameters passed to the different types of handlers.
515 </p>
516
    <p>
      Let's look at a very simple example program that uses only three of the above
      functions (it doesn't need to set a character data handler). The program <a href=
      "../examples/outline.c">outline.c</a> prints an element outline, indenting child
      elements to distinguish them from the parent element that contains them. The
      start handler does all the work. It prints two indenting spaces for every level
      of ancestor elements, then it prints the element and attribute information.
      Finally it increments the global <code>Depth</code> variable.
    </p>
526
<pre class="eg">
int Depth;

void XMLCALL
start(void *data, const char *el, const char **attr) {
  int i;

  for (i = 0; i < Depth; i++)
    printf("  ");

  printf("%s", el);

  for (i = 0; attr[i]; i += 2) {
    printf(" %s='%s'", attr[i], attr[i + 1]);
  }

  printf("\n");
  Depth++;
}  /* End of start handler */
</pre>
    <p>
      The end tag handler simply does the bookkeeping work of decrementing <code>Depth</code>.
    </p>
550
<pre class="eg">
void XMLCALL
end(void *data, const char *el) {
  Depth--;
}  /* End of end handler */
</pre>
    <p>
      Note the <code>XMLCALL</code> annotation used for the callbacks. This is used to
      ensure that Expat and the callbacks are using the same calling convention in
      case the compiler options used for Expat itself and the client code are
      different. Expat tries not to care what the default calling convention is, though
      it may require that it be compiled with a default convention of "cdecl" on some
      platforms. For code which uses Expat, however, the calling convention is
      specified by the <code>XMLCALL</code> annotation on most platforms; callbacks
      should be defined using this annotation.
    </p>
567
568 <p>
569 The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but existing
570 working Expat applications don't need to add it (since they are already using the
571 "cdecl" calling convention, or they wouldn't be working). The annotation is only
572 needed if the default calling convention may be something other than "cdecl". To
573 use the annotation safely with older versions of Expat, you can conditionally
574 define it <em>after</em> including Expat's header file:
575 </p>
576
<pre class="eg">
#include <expat.h>

#ifndef XMLCALL
#if defined(_MSC_VER) && !defined(__BEOS__) && !defined(__CYGWIN__)
#define XMLCALL __cdecl
#elif defined(__GNUC__)
#define XMLCALL __attribute__((cdecl))
#else
#define XMLCALL
#endif
#endif
</pre>
590 <p>
591 After creating the parser, the main program just has the job of shoveling the
592 document to the parser so that it can do its work.
593 </p>
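    <p>
      The "shoveling" loop can be sketched without reference to any particular input
      source. The following is only an illustration under stated assumptions: the
      names <code>feed_fn</code>, <code>feed_in_chunks</code>, <code>feed_stats</code>
      and <code>count_feed</code> are invented for this sketch and are not part of
      Expat. In a real program the callback would typically wrap a call to
      <code>XML_Parse</code>, passing a non-zero final flag only with the last piece.
    </p>

```c
#include <stddef.h>

/* Hypothetical callback type: receives one chunk plus a flag that is
   non-zero only for the last chunk. Returns 0 to abort. In a real program
   this would wrap XML_Parse(parser, buf, (int)len, is_final). */
typedef int (*feed_fn)(void *ctx, const char *buf, size_t len, int is_final);

/* Feed doc to the callback in pieces of at most chunk bytes.
   Returns the number of chunks delivered, or -1 on abort or bad arguments.
   An empty document still produces one (empty) final chunk. */
int
feed_in_chunks(const char *doc, size_t len, size_t chunk,
               feed_fn feed, void *ctx) {
  size_t off = 0;
  int calls = 0;

  if (chunk == 0)
    return -1;

  do {
    size_t n = (len - off < chunk) ? len - off : chunk;
    int is_final = (off + n == len);

    if (! feed(ctx, doc + off, n, is_final))
      return -1;
    off += n;
    calls++;
  } while (off < len);

  return calls;
}

/* Demonstration callback: tallies bytes seen and final flags received. */
struct feed_stats {
  size_t bytes;
  int finals;
};

int
count_feed(void *ctx, const char *buf, size_t len, int is_final) {
  struct feed_stats *st = (struct feed_stats *)ctx;

  (void)buf; /* a real callback would hand buf to the parser */
  st->bytes += len;
  st->finals += is_final ? 1 : 0;
  return 1;
}
```

    <p>
      Keeping the chunking separate from the parser call makes it easy to see that
      exactly one call carries the final flag, however the document happens to be
      split.
    </p>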
594
595 <hr />
596
597 <h2>
598 <a id="building" name="building">Building and Installing Expat</a>
599 </h2>
600
    <p>
      The Expat distribution comes as a compressed (with GNU gzip) tar file. You may
      download the latest version from <a href=
      "https://sourceforge.net/projects/expat/">SourceForge</a>. After unpacking this,
      cd into the directory. Then follow either the Win32 directions or Unix directions
      below.
    </p>
608
609 <h3>
610 Building under Win32
611 </h3>
612
    <p>
      If you're using the GNU compiler under Cygwin, follow the Unix directions in the
      next section. Otherwise, if you have Microsoft Visual Studio installed, you
      can use CMake to generate a <code>.sln</code> file, e.g. <code>cmake -G"Visual
      Studio 17 2022" -DCMAKE_BUILD_TYPE=RelWithDebInfo .</code>, and then build Expat
      using <code>msbuild /m expat.sln</code>.
    </p>
620
621 <p>
622 Alternatively, you may download the Win32 binary package that contains the
623 "expat.h" include file and a pre-built DLL.
624 </p>
625
626 <h3>
627 Building under Unix (or GNU)
628 </h3>
629
630 <p>
631 First you'll need to run the configure shell script in order to configure the
632 Makefiles and headers for your system.
633 </p>
634
635 <p>
636 If you're happy with all the defaults that configure picks for you, and you have
637 permission on your system to install into /usr/local, you can install Expat with
638 this sequence of commands:
639 </p>
640
<pre class="eg">
./configure
make
make install
</pre>
646 <p>
647 There are some options that you can provide to this script, but the only one
648 we'll mention here is the <code>--prefix</code> option. You can find out all the
649 options available by running configure with just the <code>--help</code> option.
650 </p>
651
652 <p>
653 By default, the configure script sets things up so that the library gets
654 installed in <code>/usr/local/lib</code> and the associated header file in
655 <code>/usr/local/include</code>. But if you were to give the option,
656 <code>--prefix=/home/me/mystuff</code>, then the library and header would get
657 installed in <code>/home/me/mystuff/lib</code> and
658 <code>/home/me/mystuff/include</code> respectively.
659 </p>
660
661 <h3>
662 Configuring Expat Using the Pre-Processor
663 </h3>
664
665 <p>
666 Expat's feature set can be configured using a small number of pre-processor
667 definitions. The symbols are:
668 </p>
669
670 <dl class="cpp-symbols">
671 <dt>
672 <a id="XML_GE" name="XML_GE">XML_GE</a>
673 </dt>
674
675 <dd>
676 Added in Expat 2.6.0. Include support for <a href=
677 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct">general
678 entities</a> (syntax <code>&e1;</code> to reference and syntax
679 <code><!ENTITY e1 'value1'></code> (an internal general entity) or
680 <code><!ENTITY e2 SYSTEM 'file2'></code> (an external general entity) to
681 declare). With <code>XML_GE</code> enabled, general entities will be replaced
682 by their declared replacement text; for this to work for <em>external</em>
683 general entities, in addition an <code><a href=
684 "#XML_SetExternalEntityRefHandler">XML_ExternalEntityRefHandler</a></code> must
685 be set using <code><a href=
686 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>.
687 Also, enabling <code>XML_GE</code> makes the functions <code><a href=
688 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code>
689 and <code><a href=
690 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code>
691 available.<br />
692 With <code>XML_GE</code> disabled, Expat has a smaller memory footprint and can
693 be faster, but will not load external general entities and will replace all
694 general entities (except the <a href=
695 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">predefined
696 five</a>: <code>amp</code>, <code>apos</code>, <code>gt</code>,
697 <code>lt</code>, <code>quot</code>) with a self-reference: for example,
698 referencing an entity <code>e1</code> via <code>&e1;</code> will be
699 replaced by text <code>&e1;</code>.
700 </dd>
701
702 <dt>
703 <a id="XML_DTD" name="XML_DTD">XML_DTD</a>
704 </dt>
705
706 <dd>
707 Include support for using and reporting DTD-based content. If this is defined,
708 default attribute values from an external DTD subset are reported and attribute
709 value normalization occurs based on the type of attributes defined in the
710 external subset. Without this, Expat has a smaller memory footprint and can be
711 faster, but will not load external parameter entities or process conditional
712 sections. If defined, makes the functions <code><a href=
713 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code>
714 and <code><a href=
715 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code>
716 available.
717 </dd>
718
719 <dt>
720 <a id="XML_NS" name="XML_NS">XML_NS</a>
721 </dt>
722
723 <dd>
724 When defined, support for the <cite><a href=
725 "https://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a></cite>
726 specification is included.
727 </dd>
728
729 <dt>
730 <a id="XML_UNICODE" name="XML_UNICODE">XML_UNICODE</a>
731 </dt>
732
733 <dd>
734 When defined, character data reported to the application is encoded in UTF-16
735 using wide characters of the type <code>XML_Char</code>. This is implied if
736 <code>XML_UNICODE_WCHAR_T</code> is defined.
737 </dd>
738
739 <dt>
740 <a id="XML_UNICODE_WCHAR_T" name="XML_UNICODE_WCHAR_T">XML_UNICODE_WCHAR_T</a>
741 </dt>
742
743 <dd>
744 If defined, causes the <code>XML_Char</code> character type to be defined using
745 the <code>wchar_t</code> type; otherwise, <code>unsigned short</code> is used.
746 Defining this implies <code>XML_UNICODE</code>.
747 </dd>
748
749 <dt>
750 <a id="XML_LARGE_SIZE" name="XML_LARGE_SIZE">XML_LARGE_SIZE</a>
751 </dt>
752
753 <dd>
754 If defined, causes the <code>XML_Size</code> and <code>XML_Index</code> integer
755 types to be at least 64 bits in size. This is intended to support processing of
756 very large input streams, where the return values of <code><a href=
757 "#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></code>, <code><a href=
758 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and
759 <code><a href=
760 "#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code> could
761 overflow. It may not be supported by all compilers, and is turned off by
762 default.
763 </dd>
764
765 <dt>
766 <a id="XML_CONTEXT_BYTES" name="XML_CONTEXT_BYTES">XML_CONTEXT_BYTES</a>
767 </dt>
768
769 <dd>
770 The number of input bytes of markup context which the parser will ensure are
771 available for reporting via <code><a href=
772 "#XML_GetInputContext">XML_GetInputContext</a></code>. This is normally set to
773 1024, and must be set to a positive integer to enable. If this is set to zero,
774 the input context will not be available and <code><a href=
775 "#XML_GetInputContext">XML_GetInputContext</a></code> will always report
776 <code>NULL</code>. Without this, Expat has a smaller memory footprint and can
777 be faster.
778 </dd>
779
780 <dt>
781 <a id="XML_STATIC" name="XML_STATIC">XML_STATIC</a>
782 </dt>
783
784 <dd>
785 On Windows, this should be set if Expat is going to be linked statically with
786 the code that calls it; this is required to get all the right MSVC magic
787 annotations correct. This is ignored on other platforms.
788 </dd>
789
790 <dt>
791 <a id="XML_ATTR_INFO" name="XML_ATTR_INFO">XML_ATTR_INFO</a>
792 </dt>
793
794 <dd>
795 If defined, makes the additional function <code><a href=
796 "#XML_GetAttributeInfo">XML_GetAttributeInfo</a></code> available for reporting
797 attribute byte offsets.
798 </dd>
799 </dl>
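    <p>
      The interplay of <code>XML_UNICODE</code> and <code>XML_UNICODE_WCHAR_T</code>
      described above can be pictured with a few lines of pre-processor logic. This is
      a sketch of the selection rule as stated in this document, not a copy of Expat's
      headers; <code>Sketch_Char</code> and <code>sketch_len</code> are names invented
      here, and the sketch assumes that without either symbol the library reports
      UTF-8 in plain <code>char</code> units.
    </p>

```c
#include <stddef.h> /* size_t, wchar_t */

/* Defining XML_UNICODE_WCHAR_T implies XML_UNICODE. */
#if defined(XML_UNICODE_WCHAR_T) && !defined(XML_UNICODE)
#define XML_UNICODE
#endif

#ifdef XML_UNICODE
#ifdef XML_UNICODE_WCHAR_T
typedef wchar_t Sketch_Char;        /* UTF-16 stored in wchar_t units */
#else
typedef unsigned short Sketch_Char; /* UTF-16 stored in unsigned short units */
#endif
#else
typedef char Sketch_Char;           /* assumed default: UTF-8 in char units */
#endif

/* Length of a 0-terminated string, counted in Sketch_Char units.
   Written against the typedef so it compiles for every variant. */
size_t
sketch_len(const Sketch_Char *s) {
  size_t n = 0;

  while (s[n] != 0)
    n++;
  return n;
}
```

    <p>
      Application code that is written against the character type rather than against
      <code>char</code>, as <code>sketch_len</code> is, keeps working whichever way the
      library was configured.
    </p>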
800
801 <hr />
802
803 <h2>
804 <a id="using" name="using">Using Expat</a>
805 </h2>
806
807 <h3>
808 Compiling and Linking Against Expat
809 </h3>
810
811 <p>
812 Unless you installed Expat in a location not expected by your compiler and
813 linker, all you have to do to use Expat in your programs is to include the Expat
814 header (<code>#include <expat.h></code>) in your files that make calls to
815 it and to tell the linker that it needs to link against the Expat library. On
816 Unix systems, this would usually be done with the <code>-lexpat</code> argument.
817 Otherwise, you'll need to tell the compiler where to look for the Expat header
818 and the linker where to find the Expat library. You may also need to take steps
819 to tell the operating system where to find this library at run time.
820 </p>
821
822 <p>
823 On a Unix-based system, here's what a Makefile might look like when Expat is
824 installed in a standard location:
825 </p>
826
<pre class="eg">
CC=cc
LDFLAGS=
LIBS= -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
834 <p>
835 If you installed Expat in, say, <code>/home/me/mystuff</code>, then the Makefile
836 would look like this:
837 </p>
838
<pre class="eg">
CC=cc
CFLAGS= -I/home/me/mystuff/include
LDFLAGS=
LIBS= -L/home/me/mystuff/lib -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
847 <p>
848 You'd also have to set the environment variable <code>LD_LIBRARY_PATH</code> to
849 <code>/home/me/mystuff/lib</code> (or to
850 <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if LD_LIBRARY_PATH already
851 has some directories in it) in order to run your application.
852 </p>
853
854 <h3>
855 Expat Basics
856 </h3>
857
    <p>
      As we saw in the example in the overview, the first step in parsing an XML
      document with Expat is to create a parser object. There are <a href=
      "#creation">four functions</a> in the Expat API for creating a parser object.
      However, only three of these (<code><a href=
      "#XML_ParserCreate">XML_ParserCreate</a></code>, <code><a href=
      "#XML_ParserCreateNS">XML_ParserCreateNS</a></code> and <code><a href=
      "#XML_ParserCreate_MM">XML_ParserCreate_MM</a></code>) can be used for constructing
      a parser for a top-level document. The object returned by these functions is an
      opaque pointer (i.e. "expat.h" declares it as void *) to data with further
      internal structure. In order to free the memory associated with this object you
      must call <code><a href="#XML_ParserFree">XML_ParserFree</a></code>. Note that if
      you have provided any <a href="#userdata">user data</a> that gets stored in the
      parser, then your application is responsible for freeing it prior to calling
      <code>XML_ParserFree</code>.
    </p>
873
874 <p>
875 The objects returned by the parser creation functions are good for parsing only
876 one XML document or external parsed entity. If your application needs to parse
877 many XML documents, then it needs to create a parser object for each one. The
878 best way to deal with this is to create a higher level object that contains all
879 the default initialization you want for your parser objects.
880 </p>
881
882 <p>
883 Walking through a document hierarchy with a stream oriented parser will require a
884 good stack mechanism in order to keep track of current context. For instance, to
885 answer the simple question, "What element does this text belong to?" requires a
886 stack, since the parser may have descended into other elements that are children
887 of the current one and has encountered this text on the way out.
888 </p>
889
    <p>
      The things you're likely to want to keep on a stack are the currently opened
      element and its attributes. You push this information onto the stack in the
      start handler and you pop it off in the end handler.
    </p>
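    <p>
      One possible shape for such a stack is sketched here. It is an illustration
      only: <code>ElemStack</code>, <code>MAX_DEPTH</code>, <code>stack_start</code>,
      <code>stack_end</code> and <code>stack_current</code> are hypothetical names, the
      sketch uses plain <code>char</code> strings, and it defines <code>XMLCALL</code>
      as empty so that it builds without Expat's header. Names are copied because the
      sketch does not rely on the handler's string arguments staying valid after the
      handler returns.
    </p>

```c
#include <stdlib.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL /* empty here so the sketch builds without expat.h */
#endif

#define MAX_DEPTH 64 /* arbitrary limit chosen for this sketch */

/* Hypothetical user-data structure holding the stack of open elements. */
typedef struct {
  char *names[MAX_DEPTH];
  int depth;    /* number of recorded open elements */
  int overflow; /* open elements that could not be recorded */
} ElemStack;

/* Start handler: push a private copy of the element name. */
void XMLCALL
stack_start(void *data, const char *el, const char **attr) {
  ElemStack *st = (ElemStack *)data;
  size_t n = strlen(el) + 1;
  char *copy;

  (void)attr; /* attributes could be copied in the same way */

  if (st->overflow > 0 || st->depth == MAX_DEPTH) {
    st->overflow++;
    return;
  }
  copy = (char *)malloc(n);
  if (copy == NULL) {
    st->overflow++;
    return;
  }
  memcpy(copy, el, n);
  st->names[st->depth++] = copy;
}

/* End handler: pop the entry pushed by the matching start handler call. */
void XMLCALL
stack_end(void *data, const char *el) {
  ElemStack *st = (ElemStack *)data;

  (void)el;
  if (st->overflow > 0) {
    st->overflow--;
    return;
  }
  if (st->depth > 0)
    free(st->names[--st->depth]);
}

/* Name of the innermost open element, or NULL when no element is open
   or the innermost one could not be recorded. */
const char *
stack_current(const ElemStack *st) {
  if (st->overflow > 0 || st->depth == 0)
    return NULL;
  return st->names[st->depth - 1];
}
```

    <p>
      A character data handler that receives the same structure as its user data can
      call <code>stack_current</code> to answer the question "What element does this
      text belong to?".
    </p>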
895
    <p>
      For some tasks, it is sufficient to just keep information on what the depth of
      the stack is (or would be if you had one). The outline program shown above
      presents one example. Another such task would be skipping over a complete
      element. When you see the start tag for the element you want to skip, you set a
      skip flag and record the depth at which the element started. When the end tag
      handler encounters the same depth, the skipped element has ended and the flag may
      be cleared. If you follow the convention that the root element starts at 1, then
      you can use the same variable for skip flag and skip depth.
    </p>
906
<pre class="eg">
void
init_info(Parseinfo *info) {
  info->skip = 0;
  info->depth = 1;
  /* Other initializations here */
}  /* End of init_info */

void XMLCALL
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *) data;

  if (! inf->skip) {
    if (should_skip(inf, el, attr)) {
      inf->skip = inf->depth;
    }
    else
      start(inf, el, attr);     /* This does rest of start handling */
  }

  inf->depth++;
}  /* End of rawstart */

void XMLCALL
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *) data;

  inf->depth--;

  if (! inf->skip)
    end(inf, el);              /* This does rest of end handling */

  if (inf->skip == inf->depth)
    inf->skip = 0;
}  /* End rawend */
</pre>
    <p>
      Notice in the above example the difference in how depth is manipulated in the
      start and end handlers. The end tag handler should be the mirror image of the
      start tag handler. This is necessary to properly model containment. Since the
      start tag handler increments depth <em>after</em> the main body of its code, the
      end tag handler needs to decrement it <em>before</em> its main body. If we'd
      decided to increment it first thing in the start handler, then we'd have had to
      decrement it last thing in the end handler.
    </p>
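    <p>
      To try the skip logic without a parser, the pieces that the example leaves open
      can be filled in with stand-ins. Everything added here is an assumption made for
      illustration: the layout of <code>Parseinfo</code>, the rule in
      <code>should_skip</code> (skip any element named <code>skipme</code>), and the
      counters updated by <code>demo_start</code> and <code>demo_end</code>, which take
      the place of the real start and end handling. <code>XMLCALL</code> is defined as
      empty so the sketch builds without Expat's header.
    </p>

```c
#include <string.h>

#ifndef XMLCALL
#define XMLCALL /* empty here so the sketch builds without expat.h */
#endif

/* Assumed layout; the original example does not show it. */
typedef struct {
  int skip;    /* 0, or the depth at which the skipped element started */
  int depth;   /* depth of the next element to start; the root is 1 */
  int started; /* demo counters: elements that reached the real handlers */
  int ended;
} Parseinfo;

void
init_info(Parseinfo *info) {
  info->skip = 0;
  info->depth = 1;
  info->started = 0;
  info->ended = 0;
}

/* Hypothetical rule: skip every element named "skipme". */
int
should_skip(Parseinfo *inf, const char *el, const char **attr) {
  (void)inf;
  (void)attr;
  return strcmp(el, "skipme") == 0;
}

/* Stand-ins for the rest of the start and end handling. */
void
demo_start(Parseinfo *inf, const char *el, const char **attr) {
  (void)el;
  (void)attr;
  inf->started++;
}

void
demo_end(Parseinfo *inf, const char *el) {
  (void)el;
  inf->ended++;
}

void XMLCALL
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *)data;

  if (! inf->skip) {
    if (should_skip(inf, el, attr))
      inf->skip = inf->depth; /* remember where the skipped element began */
    else
      demo_start(inf, el, attr);
  }
  inf->depth++; /* incremented after the main body */
}

void XMLCALL
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *)data;

  inf->depth--; /* decremented before the main body */
  if (! inf->skip)
    demo_end(inf, el);
  if (inf->skip == inf->depth)
    inf->skip = 0; /* the skipped element has just ended */
}
```

    <p>
      Calling the two handlers by hand for a document shaped like
      <code>root</code> containing <code>skipme</code> (with a child) followed by
      <code>keep</code> should leave both counters at two, since only
      <code>root</code> and <code>keep</code> reach the real handlers.
    </p>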
952
953 <h3 id="userdata">
954 Communicating between handlers
955 </h3>
956
    <p>
      In order to be able to pass information between different handlers without using
      globals, you'll need to define a data structure to hold the shared variables. You
      can then tell Expat (with the <code><a href=
      "#XML_SetUserData">XML_SetUserData</a></code> function) to pass a pointer to this
      structure to the handlers. This is the first argument received by most handlers.
      In the <a href="#reference">reference section</a>, an argument to a callback
      function is named <code>userData</code> and has type <code>void *</code> if the
      user data is passed; it will have the type <code>XML_Parser</code> if the parser
      itself is passed. When the parser is passed, the user data may be retrieved using
      <code><a href="#XML_GetUserData">XML_GetUserData</a></code>.
    </p>
969
    <p>
      One common case where multiple calls to a single handler may need to communicate
      using an application data structure is the case when content passed to the
      character data handler (set by <code><a href=
      "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>) needs to
      be accumulated. A common first-time mistake with any of the event-oriented
      interfaces to an XML parser is to expect all the text contained in an element to
      be reported by a single call to the character data handler. Expat, like many
      other XML parsers, reports such data as a sequence of calls; there's no way to
      know when the end of the sequence is reached until a different callback is made.
      A buffer referenced by the user data structure proves to be both an effective and
      a convenient place to accumulate character data.
    </p>
983 <!-- XXX example needed here -->
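    <p>
      A minimal sketch of such an accumulation buffer follows. The names
      <code>TextBuf</code>, <code>chardata</code> and <code>textbuf_reset</code> are
      invented for this illustration. The sketch assumes a build in which character
      data arrives as plain <code>char</code>, and it defines <code>XMLCALL</code> as
      empty so that it compiles without Expat's header. The handler takes the same
      three arguments as a character data handler: the user data, a pointer to the
      text (which is not zero-terminated), and its length.
    </p>

```c
#include <stdlib.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL /* empty here so the sketch builds without expat.h */
#endif

/* Hypothetical user-data structure: a growable text buffer. */
typedef struct {
  char *text; /* accumulated bytes, 0-terminated once allocated */
  size_t len; /* bytes used, not counting the terminator */
  size_t cap; /* bytes allocated */
  int failed; /* set when an allocation fails */
} TextBuf;

/* Character data handler: append the len bytes at s to the buffer. */
void XMLCALL
chardata(void *userData, const char *s, int len) {
  TextBuf *buf = (TextBuf *)userData;
  size_t need;

  if (buf->failed || len <= 0)
    return;

  need = buf->len + (size_t)len + 1; /* +1 for the terminator */
  if (need > buf->cap) {
    size_t cap = buf->cap ? buf->cap : 64;
    char *p;

    while (cap < need)
      cap *= 2;
    p = (char *)realloc(buf->text, cap);
    if (p == NULL) {
      buf->failed = 1; /* keep what we have; report via the flag */
      return;
    }
    buf->text = p;
    buf->cap = cap;
  }
  memcpy(buf->text + buf->len, s, (size_t)len);
  buf->len += (size_t)len;
  buf->text[buf->len] = '\0';
}

/* Empty the buffer but keep its storage for the next run of text. */
void
textbuf_reset(TextBuf *buf) {
  buf->len = 0;
  if (buf->text != NULL)
    buf->text[0] = '\0';
}
```

    <p>
      The start and end element handlers are natural places to consume the
      accumulated text and then call <code>textbuf_reset</code>, because a call to
      either of them means the current run of character data is complete.
    </p>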
984
985 <h3>
986 XML Version
987 </h3>
988
989 <p>
990 Expat is an XML 1.0 parser, and as such never complains based on the value of the
991 <code>version</code> pseudo-attribute in the XML declaration, if present.
992 </p>
993
994 <p>
995 If an application needs to check the version number (to support alternate
996 processing), it should use the <code><a href=
997 "#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></code> function to set a
998 handler that uses the information in the XML declaration to determine what to do.
999 This example shows how to check that only a version number of <code>"1.0"</code>
1000 is accepted:
1001 </p>
1002
    <pre class="eg">
static int wrong_version;
static XML_Parser parser;

static void XMLCALL
xmldecl_handler(void *userData,
                const XML_Char *version,
                const XML_Char *encoding,
                int standalone)
{
  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};

  size_t i;

  /* version is NULL for a text declaration, which has nothing to check */
  if (version == NULL)
    return;

  /* compare every character, including the terminating 0 */
  for (i = 0; i < (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
    if (version[i] != Version_1_0[i]) {
      wrong_version = 1;
      /* also clear all other handlers: */
      XML_SetCharacterDataHandler(parser, NULL);
      ...
      return;
    }
  }
  ...
}
</pre>
1029 <h3>
1030 Namespace Processing
1031 </h3>
1032
1033 <p>
    When the parser is created using the <code><a href=
    "#XML_ParserCreateNS">XML_ParserCreateNS</a></code> function, Expat performs
1036 namespace processing. Under namespace processing, Expat consumes
1037 <code>xmlns</code> and <code>xmlns:...</code> attributes, which declare
1038 namespaces for the scope of the element in which they occur. This means that your
1039 start handler will not see these attributes. Your application can still be
1040 informed of these declarations by setting namespace declaration handlers with
1041 <a href=
1042 "#XML_SetNamespaceDeclHandler"><code>XML_SetNamespaceDeclHandler</code></a>.
1043 </p>
1044
1045 <p>
1046 Element type and attribute names that belong to a given namespace are passed to
1047 the appropriate handler in expanded form. By default this expanded form is a
1048 concatenation of the namespace URI, the separator character (which is the 2nd
1049 argument to <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>),
1050 and the local name (i.e. the part after the colon). Names with undeclared
1051 prefixes are not well-formed when namespace processing is enabled, and will
1052 trigger an error. Unprefixed attribute names are never expanded, and unprefixed
1053 element names are only expanded when they are in the scope of a default
1054 namespace.
1055 </p>
1056
1057 <p>
    However, if <code><a href=
1059 "#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></code> has been called with
1060 a non-zero <code>do_nst</code> parameter, then the expanded form for names with
1061 an explicit prefix is a concatenation of: URI, separator, local name, separator,
1062 prefix.
1063 </p>
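    <p>
    Because the expanded form is a plain concatenation, an application usually has
    to split it again. The following sketch shows one way to do that. The
    <code>NameParts</code> structure and <code>split_expanded_name</code> function
    are hypothetical helpers, not part of Expat, and the code assumes that
    <code>XML_Char</code> is <code>char</code>, that the separator is not the null
    character, and that the separator does not occur inside the namespace URI.
    </p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: each part points into the original name. */
typedef struct {
  const char *uri;    /* NULL when the name is in no namespace */
  size_t uri_len;
  const char *local;
  size_t local_len;
  const char *prefix; /* NULL unless a triplet with a prefix was returned */
  size_t prefix_len;
} NameParts;

/* Split "URI sep local" or "URI sep local sep prefix" into its parts. */
void
split_expanded_name(const char *name, char sep, NameParts *out)
{
  const char *first = strchr(name, sep);
  const char *second;

  out->uri = NULL;
  out->uri_len = 0;
  out->prefix = NULL;
  out->prefix_len = 0;

  if (first == NULL) { /* not expanded: the whole name is the local part */
    out->local = name;
    out->local_len = strlen(name);
    return;
  }
  out->uri = name;
  out->uri_len = (size_t)(first - name);
  out->local = first + 1;

  second = strchr(out->local, sep);
  if (second == NULL) { /* URI and local name only */
    out->local_len = strlen(out->local);
    return;
  }
  out->local_len = (size_t)(second - out->local);
  out->prefix = second + 1;
  out->prefix_len = strlen(out->prefix);
}
```

    <p>
    The parts are returned as pointer and length pairs so that the name passed to
    the handler does not have to be copied or modified.
    </p>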
1064
1065 <p>
1066 You can set handlers for the start of a namespace declaration and for the end of
1067 a scope of a declaration with the <code><a href=
1068 "#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></code> function.
1069 The StartNamespaceDeclHandler is called prior to the start tag handler and the
1070 EndNamespaceDeclHandler is called after the corresponding end tag that ends the
1071 namespace's scope. The namespace start handler gets passed the prefix and URI for
1072 the namespace. For a default namespace declaration (xmlns='...'), the prefix will
1073 be <code>NULL</code>. The URI will be <code>NULL</code> for the case where the
1074 default namespace is being unset. The namespace end handler just gets the prefix
1075 for the closing scope.
1076 </p>
1077
1078 <p>
1079 These handlers are called for each declaration. So if, for instance, a start tag
1080 had three namespace declarations, then the StartNamespaceDeclHandler would be
1081 called three times before the start tag handler is called, once for each
1082 declaration.
1083 </p>
1084
1085 <h3>
1086 Character Encodings
1087 </h3>
1088
1089 <p>
    While XML is based on Unicode, and every XML processor is required to recognize
    UTF-8 and UTF-16 (encodings of Unicode built on 8-bit and 16-bit code units),
    other encodings may be declared in XML documents or entities. For the main
    document, an XML declaration may contain an encoding declaration:
1094 </p>
1095
    <pre>
<?xml version="1.0" encoding="ISO-8859-2"?>
</pre>
1099 <p>
1100 External parsed entities may begin with a text declaration, which looks like an
1101 XML declaration with just an encoding declaration:
1102 </p>
1103
    <pre>
<?xml encoding="Big5"?>
</pre>
1107 <p>
1108 With Expat, you may also specify an encoding at the time of creating a parser.
1109 This is useful when the encoding information may come from a source outside the
    document itself (such as a higher-level protocol).
1111 </p>
1112
1113 <p>
1114 <a id="builtin_encodings" name="builtin_encodings"></a>There are four built-in
1115 encodings in Expat:
1116 </p>
1117
1118 <ul>
1119 <li>UTF-8
1120 </li>
1121
1122 <li>UTF-16
1123 </li>
1124
1125 <li>ISO-8859-1
1126 </li>
1127
1128 <li>US-ASCII
1129 </li>
1130 </ul>
1131
1132 <p>
    Anything else discovered in an encoding declaration, or in the protocol encoding
    specified in the parser constructor, triggers a call to the
1135 <code>UnknownEncodingHandler</code>. This handler gets passed the encoding name
1136 and a pointer to an <code>XML_Encoding</code> data structure. Your handler must
1137 fill in this structure and return <code>XML_STATUS_OK</code> if it knows how to
1138 deal with the encoding. Otherwise the handler should return
1139 <code>XML_STATUS_ERROR</code>. The handler also gets passed a pointer to an
1140 optional application data structure that you may indicate when you set the
1141 handler.
1142 </p>
1143
1144 <p>
    Expat places restrictions on the character encodings that it can support by
    filling in the <code>XML_Encoding</code> structure:
1147 </p>
1148
1149 <ol>
      <li>Every ASCII character that can appear in a well-formed XML document must be
      represented by a single byte, and that byte must correspond to its ASCII
      encoding (except for the characters $@\^'{}~).
      </li>

      <li>Characters must be encoded in 4 bytes or fewer.
      </li>

      <li>All characters encoded must have Unicode scalar values less than or equal to
      65535 (0xFFFF). <em>This does not apply to the built-in support for UTF-16 and
      UTF-8.</em>
      </li>

      <li>No character may be encoded by more than one distinct sequence of bytes.
      </li>
1165 </ol>
1166
1167 <p>
1168 <code>XML_Encoding</code> contains an array of integers that correspond to the
1169 1st byte of an encoding sequence. If the value in the array for a byte is zero or
1170 positive, then the byte is a single byte encoding that encodes the Unicode scalar
1171 value contained in the array. A -1 in this array indicates a malformed byte. If
1172 the value is -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
1173 sequence respectively. Multi-byte sequences are sent to the convert function
1174 pointed at in the <code>XML_Encoding</code> structure. This function should
1175 return the Unicode scalar value for the sequence or -1 if the sequence is
1176 malformed.
1177 </p>
1178
1179 <p>
1180 One pitfall that novice Expat users are likely to fall into is that although
1181 Expat may accept input in various encodings, the strings that it passes to the
1182 handlers are always encoded in UTF-8 or UTF-16 (depending on how Expat was
1183 compiled). Your application is responsible for any translation of these strings
1184 into other encodings.
1185 </p>
1186
1187 <h3>
1188 Handling External Entity References
1189 </h3>
1190
1191 <p>
1192 Expat does not read or parse external entities directly. Note that any external
1193 DTD is a special case of an external entity. If you've set no
1194 <code>ExternalEntityRefHandler</code>, then external entity references are
    silently ignored. Otherwise, Expat calls your handler with the information needed to
1196 read and parse the external entity.
1197 </p>
1198
1199 <p>
1200 Your handler isn't actually responsible for parsing the entity, but it is
1201 responsible for creating a subsidiary parser with <code><a href=
1202 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> that
1203 will do the job. This returns an instance of <code>XML_Parser</code> that has
1204 handlers and other data structures initialized from the parent parser. You may
1205 then use <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href=
1206 "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this parser. Since
    external entities may refer to other external entities, your handler should be
1208 prepared to be called recursively.
1209 </p>
1210
1211 <h3>
1212 Parsing DTDs
1213 </h3>
1214
1215 <p>
1216 In order to parse parameter entities, before starting the parse, you must call
1217 <code><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>
1218 with one of the following arguments:
1219 </p>
1220
1221 <dl>
1222 <dt>
1223 <code>XML_PARAM_ENTITY_PARSING_NEVER</code>
1224 </dt>
1225
1226 <dd>
      Don't parse parameter entities or the external subset.
1228 </dd>
1229
1230 <dt>
1231 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code>
1232 </dt>
1233
1234 <dd>
1235 Parse parameter entities and the external subset unless <code>standalone</code>
1236 was set to "yes" in the XML declaration.
1237 </dd>
1238
1239 <dt>
1240 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code>
1241 </dt>
1242
1243 <dd>
      Always parse parameter entities and the external subset.
1245 </dd>
1246 </dl>
1247
1248 <p>
1249 In order to read an external DTD, you also have to set an external entity
1250 reference handler as described above.
1251 </p>
1252
1253 <h3 id="stop-resume">
1254 Temporarily Stopping Parsing
1255 </h3>
1256
1257 <p>
    Expat 1.95.8 introduced a feature that makes it possible to stop parsing
    temporarily from within a handler function, even if more data has already been
    passed into the parser. Applications for this include:
1261 </p>
1262
1263 <ul>
1264 <li>Supporting the <a href="https://www.w3.org/TR/xinclude/">XInclude</a>
1265 specification.
1266 </li>
1267
1268 <li>Delaying further processing until additional information is available from
1269 some other source.
1270 </li>
1271
1272 <li>Adjusting processor load as task priorities shift within an application.
1273 </li>
1274
1275 <li>Stopping parsing completely (simply free or reset the parser instead of
1276 resuming in the outer parsing loop). This can be useful if an application-domain
1277 error is found in the XML being parsed or if the result of the parse is
1278 determined not to be useful after all.
1279 </li>
1280 </ul>
1281
1282 <p>
1283 To take advantage of this feature, the main parsing loop of an application needs
1284 to support this specifically. It cannot be supported with a parsing loop
1285 compatible with Expat 1.95.7 or earlier (though existing loops will continue to
1286 work without supporting the stop/resume feature).
1287 </p>
1288
1289 <p>
1290 An application that uses this feature for a single parser will have the rough
1291 structure (in pseudo-code):
1292 </p>
1293
    <pre class="pseudocode">
fd = open_input()
p = create_parser()

if parse_xml(p, fd) {
  /* suspended */

  int suspended = 1;

  while (suspended) {
    do_something_else()
    if ready_to_resume() {
      suspended = continue_parsing(p, fd);
    }
  }
}
</pre>
1311 <p>
1312 An application that may resume any of several parsers based on input (either from
1313 the XML being parsed or some other source) will certainly have more interesting
1314 control structures.
1315 </p>
1316
1317 <p>
1318 This C function could be used for the <code>parse_xml</code> function mentioned
1319 in the pseudo-code above:
1320 </p>
1321
    <pre class="eg">
#define BUFF_SIZE 10240

/* Parse a document from the open file descriptor 'fd' until the parse
   is complete (the document has been completely parsed, or there's
   been an error), or the parse is stopped.  Return non-zero when
   the parse is merely suspended.
*/
int
parse_xml(XML_Parser p, int fd)
{
  for (;;) {
    int bytes_read;
    enum XML_Status status;

    void *buff = XML_GetBuffer(p, BUFF_SIZE);
    if (buff == NULL) {
      /* handle error... */
      return 0;
    }
    bytes_read = (int)read(fd, buff, BUFF_SIZE);
    if (bytes_read < 0) {
      /* handle error... */
      return 0;
    }
    /* a read of zero bytes marks the final (empty) chunk */
    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
    switch (status) {
    case XML_STATUS_ERROR:
      /* handle error... */
      return 0;
    case XML_STATUS_SUSPENDED:
      return 1;
    case XML_STATUS_OK:
      break;
    }
    if (bytes_read == 0)
      return 0;
  }
}
</pre>
1361 <p>
1362 The corresponding <code>continue_parsing</code> function is somewhat simpler,
1363 since it only need deal with the return code from <code><a href=
1364 "#XML_ResumeParser">XML_ResumeParser</a></code>; it can delegate the input
1365 handling to the <code>parse_xml</code> function:
1366 </p>
1367
    <pre class="eg">
/* Continue parsing a document which had been suspended.  The 'p' and
   'fd' arguments are the same as passed to parse_xml().  Return
   non-zero when the parse is suspended.
*/
int
continue_parsing(XML_Parser p, int fd)
{
  enum XML_Status status = XML_ResumeParser(p);
  switch (status) {
  case XML_STATUS_ERROR:
    /* XML_ERROR_NOT_SUSPENDED is an error code, not a status value */
    if (XML_GetErrorCode(p) == XML_ERROR_NOT_SUSPENDED) {
      /* handle error: the parser was not suspended... */
    } else {
      /* handle parse error... */
    }
    return 0;
  case XML_STATUS_SUSPENDED:
    return 1;
  case XML_STATUS_OK:
    break;
  }
  return parse_xml(p, fd);
}
</pre>
1390 <p>
1391 Now that we've seen what a mess the top-level parsing loop can become, what have
1392 we gained? Very simply, we can now use the <code><a href=
1393 "#XML_StopParser">XML_StopParser</a></code> function to stop parsing, without
1394 having to go to great lengths to avoid additional processing that we're expecting
1395 to ignore. As a bonus, we get to stop parsing <em>temporarily</em>, and come back
1396 to it when we're ready.
1397 </p>
1398
1399 <p>
1400 To stop parsing from a handler function, use the <code><a href=
1401 "#XML_StopParser">XML_StopParser</a></code> function. This function takes two
    arguments: the parser being stopped and a flag indicating whether the parse can
1403 be resumed in the future.
1404 </p>
1405 <!-- XXX really need more here -->
1406
1407 <hr />
1408 <!-- ================================================================ -->
1409
1410 <h2>
1411 <a id="reference" name="reference">Expat Reference</a>
1412 </h2>
1413
1414 <h3>
1415 <a id="creation" name="creation">Parser Creation</a>
1416 </h3>
1417
1418 <h4 id="XML_ParserCreate">
1419 XML_ParserCreate
1420 </h4>
1421
    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate(const XML_Char *encoding);
</pre>
1426 <div class="fcndef">
1427 <p>
1428 Construct a new parser. If encoding is non-<code>NULL</code>, it specifies a
1429 character encoding to use for the document. This overrides the document
1430 encoding declaration. There are four built-in encodings:
1431 </p>
1432
1433 <ul>
1434 <li>US-ASCII
1435 </li>
1436
1437 <li>UTF-8
1438 </li>
1439
1440 <li>UTF-16
1441 </li>
1442
1443 <li>ISO-8859-1
1444 </li>
1445 </ul>
1446
1447 <p>
1448 Any other value will invoke a call to the UnknownEncodingHandler.
1449 </p>
1450 </div>
1451
1452 <h4 id="XML_ParserCreateNS">
1453 XML_ParserCreateNS
1454 </h4>
1455
    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encoding,
                   XML_Char sep);
</pre>
1461 <div class="fcndef">
1462 Constructs a new parser that has namespace processing in effect. Namespace
1463 expanded element names and attribute names are returned as a concatenation of the
1464 namespace URI, <em>sep</em>, and the local part of the name. This means that you
    should pick a character for <em>sep</em> that can't be part of a URI. Since
1466 Expat does not check namespace URIs for conformance, the only safe choice for a
1467 namespace separator is a character that is illegal in XML. For instance,
1468 <code>'\xFF'</code> is not legal in UTF-8, and <code>'\xFFFF'</code> is not legal
1469 in UTF-16. There is a special case when <em>sep</em> is the null character
1470 <code>'\0'</code>: the namespace URI and the local part will be concatenated
1471 without any separator - this is intended to support RDF processors. It is a
1472 programming error to use the null separator with <a href=
1473 "#XML_SetReturnNSTriplet">namespace triplets</a>.
1474 </div>
1475
1476 <p>
1477 <strong>Note:</strong> Expat does not validate namespace URIs (beyond encoding)
1478 against RFC 3986 today (and is not required to do so with regard to the XML 1.0
1479 namespaces specification) but it may start doing that in future releases. Before
1480 that, an application using Expat must be ready to receive namespace URIs
1481 containing non-URI characters.
1482 </p>
1483
1484 <h4 id="XML_ParserCreate_MM">
1485 XML_ParserCreate_MM
1486 </h4>
1487
    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate_MM(const XML_Char *encoding,
                    const XML_Memory_Handling_Suite *ms,
                    const XML_Char *sep);
</pre>

    <pre class="signature">
typedef struct {
  void *(XMLCALL *malloc_fcn)(size_t size);
  void *(XMLCALL *realloc_fcn)(void *ptr, size_t size);
  void (XMLCALL *free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
</pre>
1502 <div class="fcndef">
1503 <p>
1504 Construct a new parser using the suite of memory handling functions specified
1505 in <code>ms</code>. If <code>ms</code> is <code>NULL</code>, then use the
1506 standard set of memory management functions. If <code>sep</code> is
1507 non-<code>NULL</code>, then namespace processing is enabled in the created
1508 parser and the character pointed at by sep is used as the separator between the
1509 namespace URI and the local part of the name.
1510 </p>
1511 </div>
1512
1513 <h4 id="XML_ExternalEntityParserCreate">
1514 XML_ExternalEntityParserCreate
1515 </h4>
1516
    <pre class="fcndec">
XML_Parser XMLCALL
XML_ExternalEntityParserCreate(XML_Parser p,
                               const XML_Char *context,
                               const XML_Char *encoding);
</pre>
1523 <div class="fcndef">
1524 <p>
      Construct a new <code>XML_Parser</code> object for parsing an external general
      entity. <code>context</code> is the context argument passed in a call to an
      ExternalEntityRefHandler. Other state information, such as handlers, user data,
      and namespace processing, is inherited from the parser passed as the 1st
      argument, so you shouldn't need to call any of the behavior-changing functions
      on this parser (unless you want it to act differently than the parent parser).
1531 </p>
1532
1533 <p>
1534 <strong>Note:</strong> Please be sure to free subparsers created by
1535 <code><a href=
1536 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>
1537 <em>prior to</em> freeing their related parent parser, as subparsers reference
1538 and use parts of their respective parent parser, internally. Parent parsers
1539 must outlive subparsers.
1540 </p>
1541 </div>
1542
1543 <h4 id="XML_ParserFree">
1544 XML_ParserFree
1545 </h4>
1546
    <pre class="fcndec">
void XMLCALL
XML_ParserFree(XML_Parser p);
</pre>
1551 <div class="fcndef">
1552 <p>
1553 Free memory used by the parser.
1554 </p>
1555
1556 <p>
1557 <strong>Note:</strong> Your application is responsible for freeing any memory
1558 associated with <a href="#userdata">user data</a>.
1559 </p>
1560
1561 <p>
1562 <strong>Note:</strong> Please be sure to free subparsers created by
1563 <code><a href=
1564 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>
1565 <em>prior to</em> freeing their related parent parser, as subparsers reference
1566 and use parts of their respective parent parser, internally. Parent parsers
1567 must outlive subparsers.
1568 </p>
1569 </div>
1570
1571 <h4 id="XML_ParserReset">
1572 XML_ParserReset
1573 </h4>
1574
    <pre class="fcndec">
XML_Bool XMLCALL
XML_ParserReset(XML_Parser p,
                const XML_Char *encoding);
</pre>
1580 <div class="fcndef">
1581 Clean up the memory structures maintained by the parser so that it may be used
    again. After this has been called, <code>p</code> is ready to start parsing
1583 a new document. All handlers are cleared from the parser, except for the
1584 unknownEncodingHandler. The parser's external state is re-initialized except for
1585 the values of ns and ns_triplets. This function may not be used on a parser
1586 created using <code><a href=
1587 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>; it
1588 will return <code>XML_FALSE</code> in that case. Returns <code>XML_TRUE</code> on
1589 success. Your application is responsible for dealing with any memory associated
1590 with <a href="#userdata">user data</a>.
1591 </div>
1592
1593 <h3>
1594 <a id="parsing" name="parsing">Parsing</a>
1595 </h3>
1596
1597 <p>
1598 To state the obvious: the three parsing functions <code><a href=
1599 "#XML_Parse">XML_Parse</a></code>, <code><a href=
1600 "#XML_ParseBuffer">XML_ParseBuffer</a></code> and <code><a href=
1601 "#XML_GetBuffer">XML_GetBuffer</a></code> must not be called from within a
1602 handler unless they operate on a separate parser instance, that is, one that did
1603 not call the handler. For example, it is OK to call the parsing functions from
1604 within an <code>XML_ExternalEntityRefHandler</code>, if they apply to the parser
1605 created by <code><a href=
1606 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>.
1607 </p>
1608
1609 <p>
1610 Note: The <code>len</code> argument passed to these functions should be
1611 considerably less than the maximum value for an integer, as it could create an
1612 integer overflow situation if the added lengths of a buffer and the unprocessed
1613 portion of the previous buffer exceed the maximum integer value. Input data at
1614 the end of a buffer will remain unprocessed if it is part of an XML token for
1615 which the end is not part of that buffer.
1616 </p>
1617
1618 <p>
1619 <a id="isFinal" name="isFinal"></a>The application <em>must</em> make a
1620 concluding <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href=
1621 "#XML_ParseBuffer">XML_ParseBuffer</a></code> call with <code>isFinal</code> set
1622 to <code>XML_TRUE</code>.
1623 </p>
1624
1625 <h4 id="XML_Parse">
1626 XML_Parse
1627 </h4>
1628
    <pre class="fcndec">
enum XML_Status XMLCALL
XML_Parse(XML_Parser p,
          const char *s,
          int len,
          int isFinal);
</pre>

    <pre class="signature">
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1,
  XML_STATUS_SUSPENDED = 2
};
</pre>
1643 <div class="fcndef">
1644 <p>
1645 Parse some more of the document. The string <code>s</code> is a buffer
1646 containing part (or perhaps all) of the document. The number of bytes of s that
1647 are part of the document is indicated by <code>len</code>. This means that
1648 <code>s</code> doesn't have to be null-terminated. It also means that if
1649 <code>len</code> is larger than the number of bytes in the block of memory that
1650 <code>s</code> points at, then a memory fault is likely. Negative values for
1651 <code>len</code> are rejected since Expat 2.2.1. The <code>isFinal</code>
1652 parameter informs the parser that this is the last piece of the document.
      Frequently, the last piece is empty (i.e. <code>len</code> is zero).
1654 </p>
1655
1656 <p>
      If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>. Otherwise
      it returns <code>XML_STATUS_OK</code>. Note that regardless of the return
1659 value, there is no guarantee that all provided input has been parsed; only
1660 after <a href="#isFinal">the concluding call</a> will all handler callbacks and
1661 parsing errors have happened.
1662 </p>
1663
1664 <p>
1665 Simplified, <code>XML_Parse</code> can be considered a convenience wrapper that
      pairs calls to <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> and
1667 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> (when Expat is
1668 built with macro <code>XML_CONTEXT_BYTES</code> defined to a positive value,
1669 which is both common and default). <code>XML_Parse</code> is then functionally
1670 equivalent to calling <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code>,
1671 <code>memcpy</code>, and <code><a href=
1672 "#XML_ParseBuffer">XML_ParseBuffer</a></code>.
1673 </p>
1674
1675 <p>
1676 To avoid double copying of the input, direct use of functions <code><a href=
1677 "#XML_GetBuffer">XML_GetBuffer</a></code> and <code><a href=
1678 "#XML_ParseBuffer">XML_ParseBuffer</a></code> is advised for most production
1679 use, e.g. if you're using <code>read</code> or similar functionality to fill
1680 your buffers, fill directly into the buffer from <code><a href=
1681 "#XML_GetBuffer">XML_GetBuffer</a></code>, then parse with <code><a href=
1682 "#XML_ParseBuffer">XML_ParseBuffer</a></code>.
1683 </p>
1684 </div>
1685
1686 <h4 id="XML_ParseBuffer">
1687 XML_ParseBuffer
1688 </h4>
1689
    <pre class="fcndec">
enum XML_Status XMLCALL
XML_ParseBuffer(XML_Parser p,
                int len,
                int isFinal);
</pre>
1696 <div class="fcndef">
1697 <p>
1698 This is just like <code><a href="#XML_Parse">XML_Parse</a></code>, except in
1699 this case Expat provides the buffer. By obtaining the buffer from Expat with
1700 the <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> function, the
1701 application can avoid double copying of the input.
1702 </p>
1703
1704 <p>
1705 Negative values for <code>len</code> are rejected since Expat 2.6.3.
1706 </p>
1707 </div>
1708
1709 <h4 id="XML_GetBuffer">
1710 XML_GetBuffer
1711 </h4>
1712
    <pre class="fcndec">
void * XMLCALL
XML_GetBuffer(XML_Parser p,
              int len);
</pre>
1718 <div class="fcndef">
1719 Obtain a buffer of size <code>len</code> to read a piece of the document into. A
1720 <code>NULL</code> value is returned if Expat can't allocate enough memory for
1721 this buffer. A <code>NULL</code> value may also be returned if <code>len</code>
1722 is zero. This has to be called prior to every call to <code><a href=
1723 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. A typical use would look like
1724 this:
1725
    <pre class="eg">
for (;;) {
  int bytes_read;
  void *buff = XML_GetBuffer(p, BUFF_SIZE);
  if (buff == NULL) {
    /* handle error */
  }

  bytes_read = read(docfd, buff, BUFF_SIZE);
  if (bytes_read < 0) {
    /* handle error */
  }

  if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) {
    /* handle parse error */
  }

  if (bytes_read == 0)
    break;
}
</pre>
1747 </div>
1748
1749 <h4 id="XML_StopParser">
1750 XML_StopParser
1751 </h4>
1752
    <pre class="fcndec">
enum XML_Status XMLCALL
XML_StopParser(XML_Parser p,
               XML_Bool resumable);
</pre>
1758 <div class="fcndef">
1759 <p>
1760 Stops parsing, causing <code><a href="#XML_Parse">XML_Parse</a></code> or
1761 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> to return. Must be
1762 called from within a call-back handler, except when aborting (when
1763 <code>resumable</code> is <code>XML_FALSE</code>) an already suspended parser.
1764 Some call-backs may still follow because they would otherwise get lost,
1765 including
1766 </p>
1767
1768 <ul>
1769 <li>the end element handler for empty elements when stopped in the start
1770 element handler,
1771 </li>
1772
1773 <li>the end namespace declaration handler when stopped in the end element
1774 handler,
1775 </li>
1776
1777 <li>the character data handler when stopped in the character data handler while
1778 making multiple call-backs on a contiguous chunk of characters,
1779 </li>
1780 </ul>
1781
1782 <p>
1783 and possibly others.
1784 </p>
1785
1786 <p>
1787 This can be called from most handlers, including DTD related call-backs, except
1788 when parsing an external parameter entity and <code>resumable</code> is
1789 <code>XML_TRUE</code>. Returns <code>XML_STATUS_OK</code> when successful,
1790 <code>XML_STATUS_ERROR</code> otherwise. The possible error codes are:
1791 </p>
1792
1793 <dl>
1794 <dt>
1795 <code>XML_ERROR_NOT_STARTED</code>
1796 </dt>
1797
1798 <dd>
1799 when stopping or suspending a parser before it has started, added in Expat
1800 2.6.4.
1801 </dd>
1802
1803 <dt>
1804 <code>XML_ERROR_SUSPENDED</code>
1805 </dt>
1806
1807 <dd>
1808 when suspending an already suspended parser.
1809 </dd>
1810
1811 <dt>
1812 <code>XML_ERROR_FINISHED</code>
1813 </dt>
1814
1815 <dd>
1816 when the parser has already finished.
1817 </dd>
1818
1819 <dt>
1820 <code>XML_ERROR_SUSPEND_PE</code>
1821 </dt>
1822
1823 <dd>
1824 when suspending while parsing an external PE.
1825 </dd>
1826 </dl>
1827
1828 <p>
1829 Since the stop/resume feature requires application support in the outer parsing
1830 loop, it is an error to call this function for a parser not being handled
1831 appropriately; see <a href="#stop-resume">Temporarily Stopping Parsing</a> for
1832 more information.
1833 </p>
1834
1835 <p>
1836 When <code>resumable</code> is <code>XML_TRUE</code> then parsing is
1837 <em>suspended</em>, that is, <code><a href="#XML_Parse">XML_Parse</a></code>
1838 and <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> return
1839 <code>XML_STATUS_SUSPENDED</code>. Otherwise, parsing is <em>aborted</em>, that
1840 is, <code><a href="#XML_Parse">XML_Parse</a></code> and <code><a href=
1841 "#XML_ParseBuffer">XML_ParseBuffer</a></code> return
1842 <code>XML_STATUS_ERROR</code> with error code <code>XML_ERROR_ABORTED</code>.
1843 </p>
1844
1845 <p>
1846 <strong>Note:</strong> This will be applied to the current parser instance
1847 only, that is, if there is a parent parser then it will continue parsing when
1848 the external entity reference handler returns. It is up to the implementation
1849 of that handler to call <code><a href=
1850 "#XML_StopParser">XML_StopParser</a></code> on the parent parser (recursively),
1851 if one wants to stop parsing altogether.
1852 </p>
1853
1854 <p>
1855 When suspended, parsing can be resumed by calling <code><a href=
1856 "#XML_ResumeParser">XML_ResumeParser</a></code>.
1857 </p>
1858
1859 <p>
1860 New in Expat 1.95.8.
1861 </p>
1862 </div>
1863
1864 <h4 id="XML_ResumeParser">
1865 XML_ResumeParser
1866 </h4>
1867
    <pre class="fcndec">
enum XML_Status XMLCALL
XML_ResumeParser(XML_Parser p);
</pre>
1872 <div class="fcndef">
1873 <p>
1874 Resumes parsing after it has been suspended with <code><a href=
1875 "#XML_StopParser">XML_StopParser</a></code>. Must not be called from within a
1876 handler call-back. Returns same status codes as <code><a href=
1877 "#XML_Parse">XML_Parse</a></code> or <code><a href=
1878 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. An additional error code,
1879 <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the parser was not
1880 currently suspended.
1881 </p>
1882
1883 <p>
1884 <strong>Note:</strong> This must be called on the most deeply nested child
1885 parser instance first, and on its parent parser only after the child parser has
1886 finished, to be applied recursively until the document entity's parser is
1887 restarted. That is, the parent parser will not resume by itself and it is up to
1888 the application to call <code><a href=
1889 "#XML_ResumeParser">XML_ResumeParser</a></code> on it at the appropriate
1890 moment.
1891 </p>
1892
1893 <p>
1894 New in Expat 1.95.8.
1895 </p>
1896 </div>
1897
1898 <h4 id="XML_GetParsingStatus">
1899 XML_GetParsingStatus
1900 </h4>
1901
    <pre class="fcndec">
void XMLCALL
XML_GetParsingStatus(XML_Parser p,
                     XML_ParsingStatus *status);
</pre>

    <pre class="signature">
enum XML_Parsing {
  XML_INITIALIZED,
  XML_PARSING,
  XML_FINISHED,
  XML_SUSPENDED
};

typedef struct {
  enum XML_Parsing parsing;
  XML_Bool finalBuffer;
} XML_ParsingStatus;
</pre>
1921 <div class="fcndef">
1922 <p>
1923 Returns status of parser with respect to being initialized, parsing, finished,
1924 or suspended, and whether the final buffer is being processed. The
1925 <code>status</code> parameter <em>must not</em> be <code>NULL</code>.
1926 </p>
1927
1928 <p>
1929 New in Expat 1.95.8.
1930 </p>
1931 </div>
1932
1933 <h3>
1934 <a id="setting" name="setting">Handler Setting</a>
1935 </h3>
1936
1937 <p>
1938 Although handlers are typically set prior to parsing and left alone, an
1939 application may choose to set or change the handler for a parsing event while the
1940 parse is in progress. For instance, your application may choose to ignore all
1941 text not descended from a <code>para</code> element. One way it could do this is
1942 to set the character handler when a para start tag is seen, and unset it for the
1943 corresponding end tag.
1944 </p>
1945
1946 <p>
1947 A handler may be <em>unset</em> by providing a <code>NULL</code> pointer to the
1948 appropriate handler setter. None of the handler setting functions have a return
1949 value.
1950 </p>
1951
1952 <p>
Your handlers will be receiving strings in arrays of type <code>XML_Char</code>.
This type is conditionally defined in <code>expat.h</code> as either <code>char</code>,
<code>wchar_t</code> or <code>unsigned short</code>. The first implies UTF-8
encoding, the latter two imply UTF-16 encoding. Note that you'll receive them in
this form independent of the original encoding of the document.
1958 </p>
1959
1960 <div class="handler">
1961 <h4 id="XML_SetStartElementHandler">
1962 XML_SetStartElementHandler
1963 </h4>
1964
1965 <pre class="setter">
1966void XMLCALL
1967XML_SetStartElementHandler(XML_Parser p,
1968 XML_StartElementHandler start);
1969</pre>
1970
1971 <pre class="signature">
1972typedef void
1973(XMLCALL *XML_StartElementHandler)(void *userData,
1974 const XML_Char *name,
1975 const XML_Char **atts);
1976</pre>
1977 <p>
1978 Set handler for start (and empty) tags. Attributes are passed to the start
1979 handler as a pointer to a vector of char pointers. Each attribute seen in a
1980 start (or empty) tag occupies 2 consecutive places in this vector: the
1981 attribute name followed by the attribute value. These pairs are terminated by a
1982 <code>NULL</code> pointer.
1983 </p>
1984
1985 <p>
1986 Note that an empty tag generates a call to both start and end handlers (in that
1987 order).
1988 </p>
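<p>
As a rough illustration of the pair layout described above, the following C
sketch looks up one attribute by name. It assumes a UTF-8 build in which
<code>XML_Char</code> is <code>char</code>; <code>find_attr</code> is a
hypothetical helper name and not part of Expat.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Look up one attribute in the name/value vector handed to a start handler.
   Assumption: XML_Char is char (UTF-8 build). find_attr is a hypothetical
   helper, not an Expat function. */
static const char *
find_attr(const char **atts, const char *name)
{
    /* Entries come in pairs: atts[i] is a name, atts[i + 1] is its value.
       A NULL pointer in a name slot marks the end of the vector. */
    for (size_t i = 0; atts[i] != NULL; i += 2) {
        if (strcmp(atts[i], name) == 0)
            return atts[i + 1];
    }
    return NULL; /* the attribute is not present on this tag */
}
```

<p>
A start handler could call <code>find_attr(atts, "id")</code> and treat a
<code>NULL</code> result as "attribute absent". The returned pointer should be
copied if it is needed after the handler returns.
</p>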
1989 </div>
1990
1991 <div class="handler">
1992 <h4 id="XML_SetEndElementHandler">
1993 XML_SetEndElementHandler
1994 </h4>
1995
1996 <pre class="setter">
1997void XMLCALL
1998XML_SetEndElementHandler(XML_Parser p,
                         XML_EndElementHandler end);
2000</pre>
2001
2002 <pre class="signature">
2003typedef void
2004(XMLCALL *XML_EndElementHandler)(void *userData,
2005 const XML_Char *name);
2006</pre>
2007 <p>
2008 Set handler for end (and empty) tags. As noted above, an empty tag generates a
2009 call to both start and end handlers.
2010 </p>
2011 </div>
2012
2013 <div class="handler">
2014 <h4 id="XML_SetElementHandler">
2015 XML_SetElementHandler
2016 </h4>
2017
2018 <pre class="setter">
2019void XMLCALL
2020XML_SetElementHandler(XML_Parser p,
2021 XML_StartElementHandler start,
2022 XML_EndElementHandler end);
2023</pre>
2024 <p>
2025 Set handlers for start and end tags with one call.
2026 </p>
2027 </div>
2028
2029 <div class="handler">
2030 <h4 id="XML_SetCharacterDataHandler">
2031 XML_SetCharacterDataHandler
2032 </h4>
2033
2034 <pre class="setter">
2035void XMLCALL
2036XML_SetCharacterDataHandler(XML_Parser p,
                            XML_CharacterDataHandler charhndl);
2038</pre>
2039
2040 <pre class="signature">
2041typedef void
2042(XMLCALL *XML_CharacterDataHandler)(void *userData,
2043 const XML_Char *s,
2044 int len);
2045</pre>
2046 <p>
2047 Set a text handler. The string your handler receives is <em>NOT
2048 null-terminated</em>. You have to use the length argument to deal with the end
2049 of the string. A single block of contiguous text free of markup may still
2050 result in a sequence of calls to this handler. In other words, if you're
2051 searching for a pattern in the text, it may be split across calls to this
2052 handler. Note: Setting this handler to <code>NULL</code> may <em>NOT
2053 immediately</em> terminate call-backs if the parser is currently processing
2054 such a single block of contiguous markup-free text, as the parser will continue
2055 calling back until the end of the block is reached.
2056 </p>
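<p>
Because the text arrives in chunks that are not null-terminated and may be
split at arbitrary points, one common approach is to append each chunk to a
growing buffer and inspect the buffer once the enclosing end tag is seen. The
sketch below shows only the accumulation step. It assumes a UTF-8 build
(<code>XML_Char</code> is <code>char</code>); <code>struct text_buf</code> and
<code>text_buf_append</code> are hypothetical names.
</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; a pointer to it could serve as the user data. */
struct text_buf {
    char  *data; /* NUL-terminated once non-NULL */
    size_t len;  /* bytes stored, excluding the terminator */
};

/* Append one chunk as delivered to a character data handler. The chunk s is
   NOT NUL-terminated, so exactly len bytes are copied. Returns 0 on success
   and -1 if memory runs out (the buffer is left unchanged in that case). */
static int
text_buf_append(struct text_buf *buf, const char *s, int len)
{
    if (len <= 0)
        return 0;
    char *grown = realloc(buf->data, buf->len + (size_t)len + 1);
    if (grown == NULL)
        return -1;
    memcpy(grown + buf->len, s, (size_t)len);
    buf->len += (size_t)len;
    grown[buf->len] = '\0'; /* keep the accumulated text usable as a C string */
    buf->data = grown;
    return 0;
}
```

<p>
A character data handler would forward its <code>s</code> and
<code>len</code> arguments to <code>text_buf_append</code>. A pattern search
then runs over <code>buf.data</code> as a whole, so a match split across two
call-backs is still found.
</p>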
2057 </div>
2058
2059 <div class="handler">
2060 <h4 id="XML_SetProcessingInstructionHandler">
2061 XML_SetProcessingInstructionHandler
2062 </h4>
2063
2064 <pre class="setter">
2065void XMLCALL
2066XML_SetProcessingInstructionHandler(XML_Parser p,
                                    XML_ProcessingInstructionHandler proc);
2068</pre>
2069
2070 <pre class="signature">
2071typedef void
2072(XMLCALL *XML_ProcessingInstructionHandler)(void *userData,
2073 const XML_Char *target,
2074 const XML_Char *data);
2075
2076</pre>
2077 <p>
2078 Set a handler for processing instructions. The target is the first word in the
2079 processing instruction. The data is the rest of the characters in it after
2080 skipping all whitespace after the initial word.
2081 </p>
2082 </div>
2083
2084 <div class="handler">
2085 <h4 id="XML_SetCommentHandler">
2086 XML_SetCommentHandler
2087 </h4>
2088
2089 <pre class="setter">
2090void XMLCALL
2091XML_SetCommentHandler(XML_Parser p,
                      XML_CommentHandler cmnt);
2093</pre>
2094
2095 <pre class="signature">
2096typedef void
2097(XMLCALL *XML_CommentHandler)(void *userData,
2098 const XML_Char *data);
2099</pre>
2100 <p>
2101 Set a handler for comments. The data is all text inside the comment delimiters.
2102 </p>
2103 </div>
2104
2105 <div class="handler">
2106 <h4 id="XML_SetStartCdataSectionHandler">
2107 XML_SetStartCdataSectionHandler
2108 </h4>
2109
2110 <pre class="setter">
2111void XMLCALL
2112XML_SetStartCdataSectionHandler(XML_Parser p,
2113 XML_StartCdataSectionHandler start);
2114</pre>
2115
2116 <pre class="signature">
2117typedef void
2118(XMLCALL *XML_StartCdataSectionHandler)(void *userData);
2119</pre>
2120 <p>
2121 Set a handler that gets called at the beginning of a CDATA section.
2122 </p>
2123 </div>
2124
2125 <div class="handler">
2126 <h4 id="XML_SetEndCdataSectionHandler">
2127 XML_SetEndCdataSectionHandler
2128 </h4>
2129
2130 <pre class="setter">
2131void XMLCALL
2132XML_SetEndCdataSectionHandler(XML_Parser p,
2133 XML_EndCdataSectionHandler end);
2134</pre>
2135
2136 <pre class="signature">
2137typedef void
2138(XMLCALL *XML_EndCdataSectionHandler)(void *userData);
2139</pre>
2140 <p>
2141 Set a handler that gets called at the end of a CDATA section.
2142 </p>
2143 </div>
2144
2145 <div class="handler">
2146 <h4 id="XML_SetCdataSectionHandler">
2147 XML_SetCdataSectionHandler
2148 </h4>
2149
2150 <pre class="setter">
2151void XMLCALL
2152XML_SetCdataSectionHandler(XML_Parser p,
                           XML_StartCdataSectionHandler start,
                           XML_EndCdataSectionHandler end);
2155</pre>
2156 <p>
2157 Sets both CDATA section handlers with one call.
2158 </p>
2159 </div>
2160
2161 <div class="handler">
2162 <h4 id="XML_SetDefaultHandler">
2163 XML_SetDefaultHandler
2164 </h4>
2165
2166 <pre class="setter">
2167void XMLCALL
2168XML_SetDefaultHandler(XML_Parser p,
                      XML_DefaultHandler hndl);
2170</pre>
2171
2172 <pre class="signature">
2173typedef void
2174(XMLCALL *XML_DefaultHandler)(void *userData,
2175 const XML_Char *s,
2176 int len);
2177</pre>
2178 <p>
2179 Sets a handler for any characters in the document which wouldn't otherwise be
2180 handled. This includes both data for which no handlers can be set (like some
2181 kinds of DTD declarations) and data which could be reported but which currently
2182 has no handler set. The characters are passed exactly as they were present in
2183 the XML document except that they will be encoded in UTF-8 or UTF-16. Line
2184 boundaries are not normalized. Note that a byte order mark character is not
2185 passed to the default handler. There are no guarantees about how characters are
2186 divided between calls to the default handler: for example, a comment might be
2187 split between multiple calls. Setting the handler with this call has the side
2188 effect of turning off expansion of references to internally defined general
2189 entities. Instead these references are passed to the default handler.
2190 </p>
2191
2192 <p>
2193 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.
2194 </p>
2195 </div>
2196
2197 <div class="handler">
2198 <h4 id="XML_SetDefaultHandlerExpand">
2199 XML_SetDefaultHandlerExpand
2200 </h4>
2201
2202 <pre class="setter">
2203void XMLCALL
2204XML_SetDefaultHandlerExpand(XML_Parser p,
                            XML_DefaultHandler hndl);
2206</pre>
2207
2208 <pre class="signature">
2209typedef void
2210(XMLCALL *XML_DefaultHandler)(void *userData,
2211 const XML_Char *s,
2212 int len);
2213</pre>
2214 <p>
2215 This sets a default handler, but doesn't inhibit the expansion of internal
2216 entity references. The entity reference will not be passed to the default
2217 handler.
2218 </p>
2219
2220 <p>
2221 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.
2222 </p>
2223 </div>
2224
2225 <div class="handler">
2226 <h4 id="XML_SetExternalEntityRefHandler">
2227 XML_SetExternalEntityRefHandler
2228 </h4>
2229
2230 <pre class="setter">
2231void XMLCALL
2232XML_SetExternalEntityRefHandler(XML_Parser p,
                                XML_ExternalEntityRefHandler hndl);
2234</pre>
2235
2236 <pre class="signature">
2237typedef int
2238(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p,
2239 const XML_Char *context,
2240 const XML_Char *base,
2241 const XML_Char *systemId,
2242 const XML_Char *publicId);
2243</pre>
2244 <p>
2245 Set an external entity reference handler. This handler is also called for
2246 processing an external DTD subset if parameter entity parsing is in effect.
2247 (See <a href=
2248 "#XML_SetParamEntityParsing"><code>XML_SetParamEntityParsing</code></a>.)
2249 </p>
2250
2251 <p>
2252 <strong>Warning:</strong> Using an external entity reference handler can lead
2253 to <a href="https://libexpat.github.io/doc/xml-security/#external-entities">XXE
2254 vulnerabilities</a>. It should only be used in applications that do not parse
2255 untrusted XML input.
2256 </p>
2257
2258 <p>
2259 The <code>context</code> parameter specifies the parsing context in the format
2260 expected by the <code>context</code> argument to <code><a href=
2261 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>.
<code>context</code> is valid only until the handler returns, so if the referenced
2263 entity is to be parsed later, it must be copied. <code>context</code> is
2264 <code>NULL</code> only when the entity is a parameter entity, which is how one
2265 can differentiate between general and parameter entities.
2266 </p>
2267
2268 <p>
2269 The <code>base</code> parameter is the base to use for relative system
2270 identifiers. It is set by <code><a href="#XML_SetBase">XML_SetBase</a></code>
2271 and may be <code>NULL</code>. The <code>publicId</code> parameter is the public
2272 id given in the entity declaration and may be <code>NULL</code>.
2273 <code>systemId</code> is the system identifier specified in the entity
2274 declaration and is never <code>NULL</code>.
2275 </p>
2276
2277 <p>
2278 There are a couple of ways in which this handler differs from others. First,
2279 this handler returns a status indicator (an integer).
2280 <code>XML_STATUS_OK</code> should be returned for successful handling of the
2281 external entity reference. Returning <code>XML_STATUS_ERROR</code> indicates
2282 failure, and causes the calling parser to return an
2283 <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error.
2284 </p>
2285
2286 <p>
2287 Second, instead of having the user data as its first argument, it receives the
2288 parser that encountered the entity reference. This, along with the context
2289 parameter, may be used as arguments to a call to <code><a href=
2290 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>.
2291 Using the returned parser, the body of the external entity can be recursively
2292 parsed.
2293 </p>
2294
2295 <p>
2296 Since this handler may be called recursively, it should not be saving
2297 information into global or static variables.
2298 </p>
2299 </div>
2300
2301 <h4 id="XML_SetExternalEntityRefHandlerArg">
2302 XML_SetExternalEntityRefHandlerArg
2303 </h4>
2304
2305 <pre class="fcndec">
2306void XMLCALL
2307XML_SetExternalEntityRefHandlerArg(XML_Parser p,
                                   void *arg);
2309</pre>
2310 <div class="fcndef">
2311 <p>
2312 Set the argument passed to the ExternalEntityRefHandler. If <code>arg</code> is
2313 not <code>NULL</code>, it is the new value passed to the handler set using
2314 <code><a href=
2315 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>;
2316 if <code>arg</code> is <code>NULL</code>, the argument passed to the handler
2317 function will be the parser object itself.
2318 </p>
2319
2320 <p>
2321 <strong>Note:</strong> The type of <code>arg</code> and the type of the first
2322 argument to the ExternalEntityRefHandler do not match. This function takes a
2323 <code>void *</code> to be passed to the handler, while the handler accepts an
2324 <code>XML_Parser</code>. This is a historical accident, but will not be
2325 corrected before Expat 2.0 (at the earliest) to avoid causing compiler warnings
2326 for code that's known to work with this API. It is the responsibility of the
2327 application code to know the actual type of the argument passed to the handler
2328 and to manage it properly.
2329 </p>
2330 </div>
2331
2332 <div class="handler">
2333 <h4 id="XML_SetSkippedEntityHandler">
2334 XML_SetSkippedEntityHandler
2335 </h4>
2336
2337 <pre class="setter">
2338void XMLCALL
2339XML_SetSkippedEntityHandler(XML_Parser p,
                            XML_SkippedEntityHandler handler);
2341</pre>
2342
2343 <pre class="signature">
2344typedef void
2345(XMLCALL *XML_SkippedEntityHandler)(void *userData,
2346 const XML_Char *entityName,
2347 int is_parameter_entity);
2348</pre>
2349 <p>
2350 Set a skipped entity handler. This is called in two situations:
2351 </p>
2352
2353 <ol>
2354 <li>An entity reference is encountered for which no declaration has been read
2355 <em>and</em> this is not an error.
2356 </li>
2357
2358 <li>An internal entity reference is read, but not expanded, because <a href=
2359 "#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a> has been
2360 called.
2361 </li>
2362 </ol>
2363
2364 <p>
2365 The <code>is_parameter_entity</code> argument will be non-zero for a parameter
2366 entity and zero for a general entity.
2367 </p>
2368
2369 <p>
2370 Note: Skipped parameter entities in declarations and skipped general entities
2371 in attribute values cannot be reported, because the event would be out of sync
with the reporting of the declarations or attribute values.
2373 </p>
2374 </div>
2375
2376 <div class="handler">
2377 <h4 id="XML_SetUnknownEncodingHandler">
2378 XML_SetUnknownEncodingHandler
2379 </h4>
2380
2381 <pre class="setter">
2382void XMLCALL
2383XML_SetUnknownEncodingHandler(XML_Parser p,
                              XML_UnknownEncodingHandler enchandler,
                              void *encodingHandlerData);
2386</pre>
2387
2388 <pre class="signature">
2389typedef int
2390(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData,
2391 const XML_Char *name,
2392 XML_Encoding *info);
2393
2394typedef struct {
2395 int map[256];
2396 void *data;
2397 int (XMLCALL *convert)(void *data, const char *s);
2398 void (XMLCALL *release)(void *data);
2399} XML_Encoding;
2400</pre>
2401 <p>
2402 Set a handler to deal with encodings other than the <a href=
2403 "#builtin_encodings">built in set</a>. This should be done before
2404 <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href=
2405 "#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on the given
2406 parser.
2407 </p>
2408
2409 <p>
2410 If the handler knows how to deal with an encoding with the given name, it
2411 should fill in the <code>info</code> data structure and return
2412 <code>XML_STATUS_OK</code>. Otherwise it should return
2413 <code>XML_STATUS_ERROR</code>. The handler will be called at most once per
2414 parsed (external) entity. The optional application data pointer
2415 <code>encodingHandlerData</code> will be passed back to the handler.
2416 </p>
2417
2418 <p>
The <code>map</code> array contains information for every possible leading byte in a byte
sequence. If the corresponding value is &gt;= 0, then it's a single byte
sequence and the byte encodes that Unicode value. If the value is -1, then that
byte is invalid as the initial byte in a sequence. If the value is -n, where n
is an integer &gt; 1, then n is the number of bytes in the sequence and the
actual conversion is accomplished by a call to the function pointed at by
<code>convert</code>. This function may return -1 if the sequence itself is invalid. The
<code>convert</code> pointer may be <code>NULL</code> if there are only single byte codes.
The <code>data</code> parameter passed to the convert function is the data pointer from
<code>XML_Encoding</code>. The string <code>s</code> is <em>NOT</em> null-terminated and
points at the sequence of bytes to be converted.
2430 </p>
2431
2432 <p>
2433 The function pointed at by <code>release</code> is called by the parser when it
2434 is finished with the encoding. It may be <code>NULL</code>.
2435 </p>
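<p>
The map conventions above can be sketched in C as follows. The example fills
the map for ISO-8859-15, a single-byte encoding taken here as an example of one
that is not among the built-in encodings, and shows how one map entry is
interpreted. <code>fill_latin9_map</code> and <code>lead_byte_length</code> are
hypothetical helper names, and only the <code>map</code> member of
<code>XML_Encoding</code> is touched.
</p>

```c
/* Fill the 256-entry map for ISO-8859-15. It agrees with ISO-8859-1
   (byte value == Unicode code point) except at the eight positions below.
   fill_latin9_map is a hypothetical helper, not an Expat function. */
static void
fill_latin9_map(int map[256])
{
    for (int i = 0; i < 256; i++)
        map[i] = i;
    map[0xA4] = 0x20AC; /* EURO SIGN */
    map[0xA6] = 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    map[0xA8] = 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    map[0xB4] = 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    map[0xB8] = 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    map[0xBC] = 0x0152; /* LATIN CAPITAL LIGATURE OE */
    map[0xBD] = 0x0153; /* LATIN SMALL LIGATURE OE */
    map[0xBE] = 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
}

/* Number of bytes in the sequence introduced by lead byte `lead`, following
   the rules described above; 0 means the byte cannot start a sequence. */
static int
lead_byte_length(const int map[256], unsigned char lead)
{
    int v = map[lead];
    if (v >= 0)
        return 1; /* single byte; v is the Unicode value */
    if (v == -1)
        return 0; /* invalid as an initial byte */
    return -v;    /* -n with n > 1: an n-byte sequence, decoded by convert */
}
```

<p>
Inside an unknown encoding handler that recognizes the name, one would
presumably call <code>fill_latin9_map(info-&gt;map)</code>, set
<code>data</code>, <code>convert</code> and <code>release</code> to
<code>NULL</code> (only single byte codes are involved), and return
<code>XML_STATUS_OK</code>.
</p>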
2436 </div>
2437
2438 <div class="handler">
2439 <h4 id="XML_SetStartNamespaceDeclHandler">
2440 XML_SetStartNamespaceDeclHandler
2441 </h4>
2442
2443 <pre class="setter">
2444void XMLCALL
2445XML_SetStartNamespaceDeclHandler(XML_Parser p,
2446 XML_StartNamespaceDeclHandler start);
2447</pre>
2448
2449 <pre class="signature">
2450typedef void
2451(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData,
2452 const XML_Char *prefix,
2453 const XML_Char *uri);
2454</pre>
2455 <p>
2456 Set a handler to be called when a namespace is declared. Namespace declarations
2457 occur inside start tags. But the namespace declaration start handler is called
2458 before the start tag handler for each namespace declared in that start tag.
2459 </p>
2460 </div>
2461
2462 <div class="handler">
2463 <h4 id="XML_SetEndNamespaceDeclHandler">
2464 XML_SetEndNamespaceDeclHandler
2465 </h4>
2466
2467 <pre class="setter">
2468void XMLCALL
2469XML_SetEndNamespaceDeclHandler(XML_Parser p,
2470 XML_EndNamespaceDeclHandler end);
2471</pre>
2472
2473 <pre class="signature">
2474typedef void
2475(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData,
2476 const XML_Char *prefix);
2477</pre>
2478 <p>
2479 Set a handler to be called when leaving the scope of a namespace declaration.
2480 This will be called, for each namespace declaration, after the handler for the
2481 end tag of the element in which the namespace was declared.
2482 </p>
2483 </div>
2484
2485 <div class="handler">
2486 <h4 id="XML_SetNamespaceDeclHandler">
2487 XML_SetNamespaceDeclHandler
2488 </h4>
2489
2490 <pre class="setter">
2491void XMLCALL
2492XML_SetNamespaceDeclHandler(XML_Parser p,
                            XML_StartNamespaceDeclHandler start,
                            XML_EndNamespaceDeclHandler end);
2495</pre>
2496 <p>
2497 Sets both namespace declaration handlers with a single call.
2498 </p>
2499 </div>
2500
2501 <div class="handler">
2502 <h4 id="XML_SetXmlDeclHandler">
2503 XML_SetXmlDeclHandler
2504 </h4>
2505
2506 <pre class="setter">
2507void XMLCALL
2508XML_SetXmlDeclHandler(XML_Parser p,
2509 XML_XmlDeclHandler xmldecl);
2510</pre>
2511
2512 <pre class="signature">
2513typedef void
2514(XMLCALL *XML_XmlDeclHandler)(void *userData,
2515 const XML_Char *version,
2516 const XML_Char *encoding,
2517 int standalone);
2518</pre>
2519 <p>
2520 Sets a handler that is called for XML declarations and also for text
2521 declarations discovered in external entities. The way to distinguish is that
2522 the <code>version</code> parameter will be <code>NULL</code> for text
2523 declarations. The <code>encoding</code> parameter may be <code>NULL</code> for
2524 an XML declaration. The <code>standalone</code> argument will contain -1, 0, or
2525 1 indicating respectively that there was no standalone parameter in the
2526 declaration, that it was given as no, or that it was given as yes.
2527 </p>
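<p>
The two conventions above (a <code>NULL</code> <code>version</code> for text
declarations, and the three values of <code>standalone</code>) can be captured
in a pair of small C helpers. This is only a sketch assuming a UTF-8 build;
<code>standalone_text</code> and <code>is_text_declaration</code> are
hypothetical names.
</p>

```c
#include <stddef.h>

/* Readable form of the standalone argument: -1, 0 or 1 as described above. */
static const char *
standalone_text(int standalone)
{
    switch (standalone) {
    case 1:
        return "yes";
    case 0:
        return "no";
    default:
        return "unspecified"; /* -1: no standalone parameter was given */
    }
}

/* Non-zero if the call-back reports a text declaration from an external
   entity rather than the document's XML declaration. */
static int
is_text_declaration(const char *version)
{
    return version == NULL;
}
```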
2528 </div>
2529
2530 <div class="handler">
2531 <h4 id="XML_SetStartDoctypeDeclHandler">
2532 XML_SetStartDoctypeDeclHandler
2533 </h4>
2534
2535 <pre class="setter">
2536void XMLCALL
2537XML_SetStartDoctypeDeclHandler(XML_Parser p,
2538 XML_StartDoctypeDeclHandler start);
2539</pre>
2540
2541 <pre class="signature">
2542typedef void
2543(XMLCALL *XML_StartDoctypeDeclHandler)(void *userData,
2544 const XML_Char *doctypeName,
2545 const XML_Char *sysid,
2546 const XML_Char *pubid,
2547 int has_internal_subset);
2548</pre>
2549 <p>
2550 Set a handler that is called at the start of a DOCTYPE declaration, before any
2551 external or internal subset is parsed. Both <code>sysid</code> and
2552 <code>pubid</code> may be <code>NULL</code>. The
<code>has_internal_subset</code> argument will be non-zero if the DOCTYPE declaration
2554 has an internal subset.
2555 </p>
2556 </div>
2557
2558 <div class="handler">
2559 <h4 id="XML_SetEndDoctypeDeclHandler">
2560 XML_SetEndDoctypeDeclHandler
2561 </h4>
2562
2563 <pre class="setter">
2564void XMLCALL
2565XML_SetEndDoctypeDeclHandler(XML_Parser p,
2566 XML_EndDoctypeDeclHandler end);
2567</pre>
2568
2569 <pre class="signature">
2570typedef void
2571(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData);
2572</pre>
2573 <p>
2574 Set a handler that is called at the end of a DOCTYPE declaration, after parsing
2575 any external subset.
2576 </p>
2577 </div>
2578
2579 <div class="handler">
2580 <h4 id="XML_SetDoctypeDeclHandler">
2581 XML_SetDoctypeDeclHandler
2582 </h4>
2583
2584 <pre class="setter">
2585void XMLCALL
2586XML_SetDoctypeDeclHandler(XML_Parser p,
2587 XML_StartDoctypeDeclHandler start,
2588 XML_EndDoctypeDeclHandler end);
2589</pre>
2590 <p>
2591 Set both doctype handlers with one call.
2592 </p>
2593 </div>
2594
2595 <div class="handler">
2596 <h4 id="XML_SetElementDeclHandler">
2597 XML_SetElementDeclHandler
2598 </h4>
2599
2600 <pre class="setter">
2601void XMLCALL
2602XML_SetElementDeclHandler(XML_Parser p,
2603 XML_ElementDeclHandler eldecl);
2604</pre>
2605
2606 <pre class="signature">
2607typedef void
2608(XMLCALL *XML_ElementDeclHandler)(void *userData,
2609 const XML_Char *name,
2610 XML_Content *model);
2611</pre>
2612
2613 <pre class="signature">
2614enum XML_Content_Type {
2615 XML_CTYPE_EMPTY = 1,
2616 XML_CTYPE_ANY,
2617 XML_CTYPE_MIXED,
2618 XML_CTYPE_NAME,
2619 XML_CTYPE_CHOICE,
2620 XML_CTYPE_SEQ
2621};
2622
2623enum XML_Content_Quant {
2624 XML_CQUANT_NONE,
2625 XML_CQUANT_OPT,
2626 XML_CQUANT_REP,
2627 XML_CQUANT_PLUS
2628};
2629
2630typedef struct XML_cp XML_Content;
2631
2632struct XML_cp {
2633 enum XML_Content_Type type;
2634 enum XML_Content_Quant quant;
2635 const XML_Char * name;
2636 unsigned int numchildren;
2637 XML_Content * children;
2638};
2639</pre>
2640 <p>
2641 Sets a handler for element declarations in a DTD. The handler gets called with
2642 the name of the element in the declaration and a pointer to a structure that
contains the element model. It's the user code's responsibility to free the model
when finished with it via a call to <code><a href=
"#XML_FreeContentModel">XML_FreeContentModel</a></code>. There is no need to
free the model from within the handler; it can be kept around and freed at a later
stage.
2648 </p>
2649
2650 <p>
2651 The <code>model</code> argument is the root of a tree of
2652 <code>XML_Content</code> nodes. If <code>type</code> equals
2653 <code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then
2654 <code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other fields
2655 will be zero or <code>NULL</code>. If <code>type</code> is
2656 <code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be
2657 <code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and
2658 <code>numchildren</code> will contain the number of elements that are allowed
2659 to be mixed in and <code>children</code> points to an array of
<code>XML_Content</code> structures that will all have type <code>XML_CTYPE_NAME</code> with
2661 no quantification. Only the root node can be type <code>XML_CTYPE_EMPTY</code>,
2662 <code>XML_CTYPE_ANY</code>, or <code>XML_CTYPE_MIXED</code>.
2663 </p>
2664
2665 <p>
2666 For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field points to the
2667 name and the <code>numchildren</code> and <code>children</code> fields will be
2668 zero and <code>NULL</code>. The <code>quant</code> field will indicate any
2669 quantifiers placed on the name.
2670 </p>
2671
2672 <p>
2673 Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code> indicate a
2674 choice or sequence respectively. The <code>numchildren</code> field indicates
2675 how many nodes in the choice or sequence and <code>children</code> points to
2676 the nodes.
2677 </p>
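<p>
Since the model is a tree, it is usually processed recursively. The sketch
below counts the element names mentioned anywhere in a model. So that it
compiles on its own, it repeats the declarations shown above inside an
<code>#ifndef</code> that is skipped when <code>expat.h</code> has already been
included (assuming the header's include guard is named
<code>Expat_INCLUDED</code>). <code>count_names</code> is a hypothetical helper
name.
</p>

```c
#include <stddef.h>

#ifndef Expat_INCLUDED
/* Stand-ins mirroring the declarations shown above, used only when
   <expat.h> has not been included first. Assumption: XML_Char is char. */
typedef char XML_Char;
enum XML_Content_Type {
    XML_CTYPE_EMPTY = 1, XML_CTYPE_ANY, XML_CTYPE_MIXED,
    XML_CTYPE_NAME, XML_CTYPE_CHOICE, XML_CTYPE_SEQ
};
enum XML_Content_Quant {
    XML_CQUANT_NONE, XML_CQUANT_OPT, XML_CQUANT_REP, XML_CQUANT_PLUS
};
typedef struct XML_cp XML_Content;
struct XML_cp {
    enum XML_Content_Type type;
    enum XML_Content_Quant quant;
    const XML_Char *name;
    unsigned int numchildren;
    XML_Content *children;
};
#endif

/* Count the XML_CTYPE_NAME nodes in a content model tree. */
static unsigned int
count_names(const XML_Content *node)
{
    if (node->type == XML_CTYPE_NAME)
        return 1; /* leaf: numchildren is zero, children is NULL */

    /* EMPTY and ANY have no children, so the loop body never runs for them;
       MIXED, CHOICE and SEQ contribute the names found below them. */
    unsigned int total = 0;
    for (unsigned int i = 0; i < node->numchildren; i++)
        total += count_names(&node->children[i]);
    return total;
}
```

<p>
An element declaration handler could call <code>count_names(model)</code> and
must still release the model with <code>XML_FreeContentModel</code> when it is
no longer needed.
</p>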
2678 </div>
2679
2680 <div class="handler">
2681 <h4 id="XML_SetAttlistDeclHandler">
2682 XML_SetAttlistDeclHandler
2683 </h4>
2684
2685 <pre class="setter">
2686void XMLCALL
2687XML_SetAttlistDeclHandler(XML_Parser p,
2688 XML_AttlistDeclHandler attdecl);
2689</pre>
2690
2691 <pre class="signature">
2692typedef void
2693(XMLCALL *XML_AttlistDeclHandler)(void *userData,
2694 const XML_Char *elname,
2695 const XML_Char *attname,
2696 const XML_Char *att_type,
2697 const XML_Char *dflt,
2698 int isrequired);
2699</pre>
2700 <p>
2701 Set a handler for attlist declarations in the DTD. This handler is called for
2702 <em>each</em> attribute. So a single attlist declaration with multiple
2703 attributes declared will generate multiple calls to this handler. The
2704 <code>elname</code> parameter returns the name of the element for which the
2705 attribute is being declared. The attribute name is in the <code>attname</code>
2706 parameter. The attribute type is in the <code>att_type</code> parameter. It is
2707 the string representing the type in the declaration with whitespace removed.
2708 </p>
2709
2710 <p>
2711 The <code>dflt</code> parameter holds the default value. It will be
2712 <code>NULL</code> in the case of "#IMPLIED" or "#REQUIRED" attributes. You can
2713 distinguish these two cases by checking the <code>isrequired</code> parameter,
2714 which will be true in the case of "#REQUIRED" attributes. Attributes which are
"#FIXED" will also have a true <code>isrequired</code>, but they will have
2716 the non-<code>NULL</code> fixed value in the <code>dflt</code> parameter.
2717 </p>
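<p>
The combinations of <code>dflt</code> and <code>isrequired</code> described
above can be decoded as in the following C sketch. The fourth case (a
non-<code>NULL</code> default with a false <code>isrequired</code>) is inferred
here to be an ordinary default value; the enum and function names are
hypothetical, and a UTF-8 build is assumed.
</p>

```c
#include <stddef.h>

/* Hypothetical classification of an attribute's default declaration. */
enum att_default {
    ATT_IMPLIED,  /* #IMPLIED:  dflt == NULL, isrequired false */
    ATT_REQUIRED, /* #REQUIRED: dflt == NULL, isrequired true */
    ATT_FIXED,    /* #FIXED:    dflt != NULL, isrequired true */
    ATT_DEFAULTED /* plain default value (inferred remaining case) */
};

static enum att_default
classify_att_default(const char *dflt, int isrequired)
{
    if (dflt == NULL)
        return isrequired ? ATT_REQUIRED : ATT_IMPLIED;
    return isrequired ? ATT_FIXED : ATT_DEFAULTED;
}
```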
2718 </div>
2719
2720 <div class="handler">
2721 <h4 id="XML_SetEntityDeclHandler">
2722 XML_SetEntityDeclHandler
2723 </h4>
2724
2725 <pre class="setter">
2726void XMLCALL
2727XML_SetEntityDeclHandler(XML_Parser p,
2728 XML_EntityDeclHandler handler);
2729</pre>
2730
2731 <pre class="signature">
2732typedef void
2733(XMLCALL *XML_EntityDeclHandler)(void *userData,
2734 const XML_Char *entityName,
2735 int is_parameter_entity,
2736 const XML_Char *value,
2737 int value_length,
2738 const XML_Char *base,
2739 const XML_Char *systemId,
2740 const XML_Char *publicId,
2741 const XML_Char *notationName);
2742</pre>
2743 <p>
2744 Sets a handler that will be called for all entity declarations. The
2745 <code>is_parameter_entity</code> argument will be non-zero in the case of
2746 parameter entities and zero otherwise.
2747 </p>
2748
2749 <p>
For internal entities (<code>&lt;!ENTITY foo "bar"&gt;</code>),
2751 <code>value</code> will be non-<code>NULL</code> and <code>systemId</code>,
2752 <code>publicId</code>, and <code>notationName</code> will all be
2753 <code>NULL</code>. The value string is <em>not</em> null-terminated; the length
2754 is provided in the <code>value_length</code> parameter. Do not use
2755 <code>value_length</code> to test for internal entities, since it is legal to
2756 have zero-length values. Instead check for whether or not <code>value</code> is
2757 <code>NULL</code>.
2758 </p>
2759
2760 <p>
2761 The <code>notationName</code> argument will have a non-<code>NULL</code> value
2762 only for unparsed entity declarations.
2763 </p>
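<p>
The rules above suggest a simple way to tell the kinds of entity declaration
apart inside the handler. The following C sketch tests the pointers rather than
<code>value_length</code>, as recommended; the enum and function names are
hypothetical, and a UTF-8 build is assumed.
</p>

```c
#include <stddef.h>

/* Hypothetical classification of an entity declaration. */
enum entity_kind {
    ENTITY_INTERNAL,        /* value is non-NULL */
    ENTITY_EXTERNAL_PARSED, /* no value and no notation name */
    ENTITY_UNPARSED         /* notationName is non-NULL (NDATA) */
};

static enum entity_kind
classify_entity(const char *value, const char *notationName)
{
    /* Test the pointer, not value_length: an internal entity may
       legally have a zero-length value. */
    if (value != NULL)
        return ENTITY_INTERNAL;
    if (notationName != NULL)
        return ENTITY_UNPARSED;
    return ENTITY_EXTERNAL_PARSED;
}
```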
2764 </div>
2765
2766 <div class="handler">
2767 <h4 id="XML_SetUnparsedEntityDeclHandler">
2768 XML_SetUnparsedEntityDeclHandler
2769 </h4>
2770
2771 <pre class="setter">
2772void XMLCALL
2773XML_SetUnparsedEntityDeclHandler(XML_Parser p,
                                 XML_UnparsedEntityDeclHandler h);
2775</pre>
2776
2777 <pre class="signature">
2778typedef void
2779(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData,
2780 const XML_Char *entityName,
2781 const XML_Char *base,
2782 const XML_Char *systemId,
2783 const XML_Char *publicId,
2784 const XML_Char *notationName);
2785</pre>
2786 <p>
2787 Set a handler that receives declarations of unparsed entities. These are entity
2788 declarations that have a notation (NDATA) field:
2789 </p>
2790
2791 <div id="eg">
2792 <pre>
&lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt;
2794</pre>
2795 </div>
2796
2797 <p>
This handler is obsolete and is provided for backwards compatibility. Use
<code><a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a></code> instead.
2800 </p>
2801 </div>
2802
2803 <div class="handler">
2804 <h4 id="XML_SetNotationDeclHandler">
2805 XML_SetNotationDeclHandler
2806 </h4>
2807
2808 <pre class="setter">
2809void XMLCALL
2810XML_SetNotationDeclHandler(XML_Parser p,
                           XML_NotationDeclHandler h);
2812</pre>
2813
2814 <pre class="signature">
2815typedef void
2816(XMLCALL *XML_NotationDeclHandler)(void *userData,
2817 const XML_Char *notationName,
2818 const XML_Char *base,
2819 const XML_Char *systemId,
2820 const XML_Char *publicId);
2821</pre>
2822 <p>
2823 Set a handler that receives notation declarations.
2824 </p>
2825 </div>
2826
2827 <div class="handler">
2828 <h4 id="XML_SetNotStandaloneHandler">
2829 XML_SetNotStandaloneHandler
2830 </h4>
2831
2832 <pre class="setter">
2833void XMLCALL
2834XML_SetNotStandaloneHandler(XML_Parser p,
                            XML_NotStandaloneHandler h);
2836</pre>
2837
2838 <pre class="signature">
2839typedef int
2840(XMLCALL *XML_NotStandaloneHandler)(void *userData);
2841</pre>
2842 <p>
Set a handler that is called if the document is not "standalone". This happens
when there is an external subset or a reference to a parameter entity, but the
document does not have standalone set to "yes" in an XML declaration. If this handler returns
<code>XML_STATUS_ERROR</code>, then the parser will report an
<code>XML_ERROR_NOT_STANDALONE</code> error.
2848 </p>
2849 </div>
2850
2851 <h3>
2852 <a id="position" name="position">Parse position and error reporting functions</a>
2853 </h3>
2854
2855 <p>
2856 These are the functions you'll want to call when the parse functions return
2857 <code>XML_STATUS_ERROR</code> (a parse error has occurred), although the position
2858 reporting functions are useful outside of errors. The position reported is the
2859 byte position (in the original document or entity encoding) of the first of the
2860 sequence of characters that generated the current event (or the error that caused
the parse functions to return <code>XML_STATUS_ERROR</code>). The exceptions are
callbacks triggered by declarations in the document prologue, in which case the
exact position reported is somewhere in the relevant markup, but not necessarily
as meaningful as for other events.
2865 </p>
2866
2867 <p>
2868 The position reporting functions are accurate only outside of the DTD. In other
2869 words, they usually return bogus information when called from within a DTD
2870 declaration handler.
2871 </p>
2872
2873 <h4 id="XML_GetErrorCode">
2874 XML_GetErrorCode
2875 </h4>
2876
2877 <pre class="fcndec">
2878enum XML_Error XMLCALL
2879XML_GetErrorCode(XML_Parser p);
2880</pre>
2881 <div class="fcndef">
2882 Return what type of error has occurred.
2883 </div>
2884
2885 <h4 id="XML_ErrorString">
2886 XML_ErrorString
2887 </h4>
2888
2889 <pre class="fcndec">
2890const XML_LChar * XMLCALL
2891XML_ErrorString(enum XML_Error code);
2892</pre>
2893 <div class="fcndef">
Return a string describing the error corresponding to <code>code</code>. The
value should be one of the enum values that can be returned from <code><a href=
"#XML_GetErrorCode">XML_GetErrorCode</a></code>.
2897 </div>
2898
2899 <h4 id="XML_GetCurrentByteIndex">
2900 XML_GetCurrentByteIndex
2901 </h4>
2902
2903 <pre class="fcndec">
2904XML_Index XMLCALL
2905XML_GetCurrentByteIndex(XML_Parser p);
2906</pre>
2907 <div class="fcndef">
2908 Return the byte offset of the position. This always corresponds to the values
2909 returned by <code><a href=
2910 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and
2911 <code><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code>.
2912 </div>
2913
2914 <h4 id="XML_GetCurrentLineNumber">
2915 XML_GetCurrentLineNumber
2916 </h4>
2917
2918 <pre class="fcndec">
2919XML_Size XMLCALL
2920XML_GetCurrentLineNumber(XML_Parser p);
2921</pre>
2922 <div class="fcndef">
2923 Return the line number of the position. The first line is reported as
2924 <code>1</code>.
2925 </div>
2926
2927 <h4 id="XML_GetCurrentColumnNumber">
2928 XML_GetCurrentColumnNumber
2929 </h4>
2930
2931 <pre class="fcndec">
2932XML_Size XMLCALL
2933XML_GetCurrentColumnNumber(XML_Parser p);
2934</pre>
2935 <div class="fcndef">
2936 Return the <em>offset</em>, from the beginning of the current line, of the
2937 position. The first column is reported as <code>0</code>.
2938 </div>
2939
2940 <h4 id="XML_GetCurrentByteCount">
2941 XML_GetCurrentByteCount
2942 </h4>
2943
2944 <pre class="fcndec">
2945int XMLCALL
2946XML_GetCurrentByteCount(XML_Parser p);
2947</pre>
2948 <div class="fcndef">
Return the number of bytes in the current event. Returns <code>0</code> if the
event is inside a reference to an internal entity and for the end-tag event for
empty element tags (the latter can be used to distinguish empty-element tags from
empty elements using separate start and end tags).
2953 </div>
2954
2955 <h4 id="XML_GetInputContext">
2956 XML_GetInputContext
2957 </h4>
2958
2959 <pre class="fcndec">
2960const char * XMLCALL
2961XML_GetInputContext(XML_Parser p,
2962 int *offset,
2963 int *size);
2964</pre>
2965 <div class="fcndef">
2966 <p>
Returns the parser's input buffer, sets the integer pointed at by
<code>offset</code> to the offset within this buffer of the current parse
position, and sets the integer pointed at by <code>size</code> to the size of
the returned buffer.
2971 </p>
2972
2973 <p>
2974 This should only be called from within a handler during an active parse and the
2975 returned buffer should only be referred to from within the handler that made
2976 the call. This input buffer contains the untranslated bytes of the input.
2977 </p>
2978
2979 <p>
2980 Only a limited amount of context is kept, so if the event triggering a call
2981 spans over a very large amount of input, the actual parse position may be
2982 before the beginning of the buffer.
2983 </p>
2984
2985 <p>
2986 If <code>XML_CONTEXT_BYTES</code> is zero, this will always return
2987 <code>NULL</code>.
2988 </p>
2989 </div>
2990
2991 <h3>
2992 <a id="attack-protection" name="attack-protection">Attack Protection</a><a id=
2993 "billion-laughs" name="billion-laughs"></a>
2994 </h3>
2995
2996 <h4 id="XML_SetBillionLaughsAttackProtectionMaximumAmplification">
2997 XML_SetBillionLaughsAttackProtectionMaximumAmplification
2998 </h4>
2999
3000 <pre class="fcndec">
3001/* Added in Expat 2.4.0. */
3002XML_Bool XMLCALL
3003XML_SetBillionLaughsAttackProtectionMaximumAmplification(XML_Parser p,
3004 float maximumAmplificationFactor);
3005</pre>
3006 <div class="fcndef">
3007 <p>
3008 Sets the maximum tolerated amplification factor for protection against <a href=
3009 "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs
3010 attacks</a> (default: <code>100.0</code>) of parser <code>p</code> to
3011 <code>maximumAmplificationFactor</code>, and returns <code>XML_TRUE</code> upon
3012 success and <code>XML_FALSE</code> upon error.
3013 </p>
3014
3015 <p>
Once the <a href=
"#XML_SetBillionLaughsAttackProtectionActivationThreshold">threshold for
activation</a> is reached, the amplification factor is calculated during parsing
as:
</p>

<pre>amplification := (direct + indirect) / direct</pre>
<p>
Here, <code>direct</code> is the number of bytes read from the primary document
in parsing and <code>indirect</code> is the number of bytes added by expanding
entities and by reading external DTD files, combined.
3026 </p>
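<p>
To make the formula concrete, here is a small sketch of the documented
calculation. It is an illustration only and not Expat's internal code: the
function names are hypothetical, the handling of <code>direct == 0</code> is an
assumption, and so is the choice to treat protection as inactive while the
total output is strictly below the activation threshold.
</p>

```c
#include <stdbool.h>

/* Documented formula: (direct + indirect) / direct. */
static float
amplification(unsigned long long direct, unsigned long long indirect) {
  if (direct == 0)
    return 1.0f; /* assumption: nothing read yet means no amplification */
  return (float)(direct + indirect) / (float)direct;
}

/* Returns true if input would be tolerated under the given limits. */
static bool
input_tolerated(unsigned long long direct, unsigned long long indirect,
                unsigned long long activationThresholdBytes,
                float maximumAmplificationFactor) {
  /* Assumption: below the threshold the factor is not checked at all. */
  if (direct + indirect < activationThresholdBytes)
    return true;
  return amplification(direct, indirect) <= maximumAmplificationFactor;
}
```

<p>
Under this sketch and the default limits (factor <code>100.0</code>, threshold
8 MiB), a 1,000-byte document whose entities have expanded to 9,999,000
additional bytes has an amplification of 10,000 and would be rejected, while
the same document with only 999,000 additional bytes stays below the
activation threshold and is accepted.
</p>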
3027
3028 <p>
3029 For a call to
3030 <code>XML_SetBillionLaughsAttackProtectionMaximumAmplification</code> to
3031 succeed:
3032 </p>
3033
3034 <ul>
3035 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
3036 any parent parsers) and
3037 </li>
3038
3039 <li>
3040 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and
3041 greater than or equal to <code>1.0</code>.
3042 </li>
3043 </ul>
3044
3045 <p>
3046 <strong>Note:</strong> If you ever need to increase this value for non-attack
3047 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
3048 bug report</a>.
3049 </p>
3050
3051 <p>
3052 <strong>Note:</strong> Peak amplifications of factor 15,000 for the entire
3053 payload and of factor 30,000 in the middle of parsing have been observed with
3054 small benign files in practice. So if you do reduce the maximum allowed
3055 amplification, please make sure that the activation threshold is still big
3056 enough to not end up with undesired false positives (i.e. benign files being
3057 rejected).
3058 </p>
3059 </div>
3060
3061 <h4 id="XML_SetBillionLaughsAttackProtectionActivationThreshold">
3062 XML_SetBillionLaughsAttackProtectionActivationThreshold
3063 </h4>
3064
3065 <pre class="fcndec">
3066/* Added in Expat 2.4.0. */
3067XML_Bool XMLCALL
3068XML_SetBillionLaughsAttackProtectionActivationThreshold(XML_Parser p,
3069 unsigned long long activationThresholdBytes);
3070</pre>
3071 <div class="fcndef">
3072 <p>
Sets the number of output bytes (including amplification from entity expansion
and reading DTD files) needed to activate protection against <a href=
"https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs
attacks</a> (default: <code>8 MiB</code>) of parser <code>p</code> to
<code>activationThresholdBytes</code>, and returns <code>XML_TRUE</code> upon
success and <code>XML_FALSE</code> upon error.
3079 </p>
3080
3081 <p>
3082 For a call to
3083 <code>XML_SetBillionLaughsAttackProtectionActivationThreshold</code> to
3084 succeed:
3085 </p>
3086
3087 <ul>
3088 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
3089 any parent parsers).
3090 </li>
3091 </ul>
3092
3093 <p>
3094 <strong>Note:</strong> If you ever need to increase this value for non-attack
3095 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
3096 bug report</a>.
3097 </p>
3098
3099 <p>
3100 <strong>Note:</strong> Activation thresholds below 4 MiB are known to break
3101 support for <a href=
3102 "https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a>
3103 1.3 payload and are hence not recommended.
3104 </p>
3105 </div>
3106
3107 <h4 id="XML_SetAllocTrackerMaximumAmplification">
3108 XML_SetAllocTrackerMaximumAmplification
3109 </h4>
3110
3111 <pre class="fcndec">
/* Added in Expat 2.7.2. */
XML_Bool XMLCALL
XML_SetAllocTrackerMaximumAmplification(XML_Parser p,
                                        float maximumAmplificationFactor);
3116</pre>
3117 <div class="fcndef">
3118 <p>
3119 Sets the maximum tolerated amplification factor between direct input and bytes
3120 of dynamic memory allocated (default: <code>100.0</code>) of parser
3121 <code>p</code> to <code>maximumAmplificationFactor</code>, and returns
3122 <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error.
3123 </p>
3124
3125 <p>
3126 <strong>Note:</strong> There are three types of allocations that intentionally
3127 bypass tracking and limiting:
3128 </p>
3129
3130 <ul>
3131 <li>application calls to functions <code><a href=
3132 "#XML_MemMalloc">XML_MemMalloc</a></code> and <code><a href="#XML_MemRealloc">
3133 XML_MemRealloc</a></code> — <em>healthy</em> use of these two functions
3134 continues to be a responsibility of the application using Expat —,
3135 </li>
3136
3137 <li>the main character buffer used by functions <code><a href="#XML_GetBuffer">
3138 XML_GetBuffer</a></code> and <code><a href=
3139 "#XML_ParseBuffer">XML_ParseBuffer</a></code> (and thus also by plain
3140 <code><a href="#XML_Parse">XML_Parse</a></code>), and
3141 </li>
3142
3143 <li>the <a href="#XML_SetElementDeclHandler">content model memory</a> (that is
3144 passed to the <a href="#XML_SetElementDeclHandler">element declaration
3145 handler</a> and freed by a call to <code><a href=
3146 "#XML_FreeContentModel">XML_FreeContentModel</a></code>).
3147 </li>
3148 </ul>
3149
3150 <p>
Once the <a href="#XML_SetAllocTrackerActivationThreshold">threshold for
activation</a> is reached, the amplification factor is calculated during parsing
as:
</p>

<pre>amplification := allocated / direct</pre>
<p>
Here, <code>direct</code> is the number of bytes read from the primary document
in parsing and <code>allocated</code> is the number of bytes of dynamic memory
allocated in the parser hierarchy.
3160 </p>
3161
3162 <p>
3163 For a call to <code>XML_SetAllocTrackerMaximumAmplification</code> to succeed:
3164 </p>
3165
3166 <ul>
3167 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
3168 any parent parsers) and
3169 </li>
3170
3171 <li>
3172 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and
3173 greater than or equal to <code>1.0</code>.
3174 </li>
3175 </ul>
3176
3177 <p>
3178 <strong>Note:</strong> If you ever need to increase this value for non-attack
3179 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
3180 bug report</a>.
3181 </p>
3182
3183 <p>
<strong>Note:</strong> Amplification factors greater than <code>100.0</code>
can be observed near the start of parsing even with benign files in practice.
So if you do reduce the maximum allowed amplification, please make sure that
the activation threshold is still big enough to not end up with undesired false
positives (i.e. benign files being rejected).
3189 </p>
3190 </div>
3191
3192 <h4 id="XML_SetAllocTrackerActivationThreshold">
3193 XML_SetAllocTrackerActivationThreshold
3194 </h4>
3195
3196 <pre class="fcndec">
/* Added in Expat 2.7.2. */
XML_Bool XMLCALL
XML_SetAllocTrackerActivationThreshold(XML_Parser p,
                                       unsigned long long activationThresholdBytes);
3201</pre>
3202 <div class="fcndef">
3203 <p>
Sets the number of allocated bytes of dynamic memory needed to activate
protection against disproportionate use of RAM (default: <code>64 MiB</code>) of
parser <code>p</code> to <code>activationThresholdBytes</code>, and returns
<code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error.
3208 </p>
3209
3210 <p>
3211 <strong>Note:</strong> For types of allocations that intentionally bypass
3212 tracking and limiting, please see <code><a href=
3213 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a></code>
3214 above.
3215 </p>
3216
3217 <p>
3218 For a call to <code>XML_SetAllocTrackerActivationThreshold</code> to succeed:
3219 </p>
3220
3221 <ul>
3222 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
3223 any parent parsers).
3224 </li>
3225 </ul>
3226
3227 <p>
3228 <strong>Note:</strong> If you ever need to increase this value for non-attack
3229 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
3230 bug report</a>.
3231 </p>
3232 </div>
3233
3234 <h4 id="XML_SetReparseDeferralEnabled">
3235 XML_SetReparseDeferralEnabled
3236 </h4>
3237
3238 <pre class="fcndec">
3239/* Added in Expat 2.6.0. */
3240XML_Bool XMLCALL
3241XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled);
3242</pre>
3243 <div class="fcndef">
3244 <p>
3245 Large tokens may require many parse calls before enough data is available for
3246 Expat to parse it in full. If Expat retried parsing the token on every parse
3247 call, parsing could take quadratic time. To avoid this, Expat only retries once
3248 a significant amount of new data is available. This function allows disabling
3249 this behavior.
3250 </p>
3251
3252 <p>
3253 The <code>enabled</code> argument should be <code>XML_TRUE</code> or
3254 <code>XML_FALSE</code>.
3255 </p>
3256
3257 <p>
3258 Returns <code>XML_TRUE</code> on success, and <code>XML_FALSE</code> on error.
3259 </p>
3260 </div>
3261
3262 <h3>
3263 <a id="miscellaneous" name="miscellaneous">Miscellaneous functions</a>
3264 </h3>
3265
3266 <p>
3267 The functions in this section either obtain state information from the parser or
3268 can be used to dynamically set parser options.
3269 </p>
3270
3271 <h4 id="XML_SetUserData">
3272 XML_SetUserData
3273 </h4>
3274
3275 <pre class="fcndec">
3276void XMLCALL
3277XML_SetUserData(XML_Parser p,
3278 void *userData);
3279</pre>
3280 <div class="fcndef">
3281 This sets the user data pointer that gets passed to handlers. It overwrites any
3282 previous value for this pointer. Note that the application is responsible for
3283 freeing the memory associated with <code>userData</code> when it is finished with
3284 the parser. So if you call this when there's already a pointer there, and you
3285 haven't freed the memory associated with it, then you've probably just leaked
3286 memory.
3287 </div>
3288
3289 <h4 id="XML_GetUserData">
3290 XML_GetUserData
3291 </h4>
3292
3293 <pre class="fcndec">
3294void * XMLCALL
3295XML_GetUserData(XML_Parser p);
3296</pre>
3297 <div class="fcndef">
3298 This returns the user data pointer that gets passed to handlers. It is actually
3299 implemented as a macro.
3300 </div>
3301
3302 <h4 id="XML_UseParserAsHandlerArg">
3303 XML_UseParserAsHandlerArg
3304 </h4>
3305
3306 <pre class="fcndec">
3307void XMLCALL
3308XML_UseParserAsHandlerArg(XML_Parser p);
3309</pre>
3310 <div class="fcndef">
3311 After this is called, handlers receive the parser in their <code>userData</code>
3312 arguments. The user data can still be obtained using the <code><a href=
3313 "#XML_GetUserData">XML_GetUserData</a></code> function.
3314 </div>
3315
3316 <h4 id="XML_SetBase">
3317 XML_SetBase
3318 </h4>
3319
3320 <pre class="fcndec">
3321enum XML_Status XMLCALL
3322XML_SetBase(XML_Parser p,
3323 const XML_Char *base);
3324</pre>
3325 <div class="fcndef">
3326 Set the base to be used for resolving relative URIs in system identifiers. The
3327 return value is <code>XML_STATUS_ERROR</code> if there's no memory to store base,
3328 otherwise it's <code>XML_STATUS_OK</code>.
3329 </div>
3330
3331 <h4 id="XML_GetBase">
3332 XML_GetBase
3333 </h4>
3334
3335 <pre class="fcndec">
3336const XML_Char * XMLCALL
3337XML_GetBase(XML_Parser p);
3338</pre>
3339 <div class="fcndef">
3340 Return the base for resolving relative URIs.
3341 </div>
3342
3343 <h4 id="XML_GetSpecifiedAttributeCount">
3344 XML_GetSpecifiedAttributeCount
3345 </h4>
3346
3347 <pre class="fcndec">
3348int XMLCALL
3349XML_GetSpecifiedAttributeCount(XML_Parser p);
3350</pre>
3351 <div class="fcndef">
3352 When attributes are reported to the start handler in the atts vector, attributes
3353 that were explicitly set in the element occur before any attributes that receive
3354 their value from default information in an ATTLIST declaration. This function
3355 returns the number of attributes that were explicitly set times two, thus giving
3356 the offset in the <code>atts</code> array passed to the start tag handler of the
3357 first attribute set due to defaults. It supplies information for the last call to
3358 a start handler. If called inside a start handler, then that means the current
3359 call.
3360 </div>
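<p>
Because the value returned by <code>XML_GetSpecifiedAttributeCount</code> is
already an index into the <code>atts</code> array, walking only the defaulted
attributes is a short loop. The sketch below uses plain <code>char</code>
strings (an assumption that <code>XML_Char</code> is <code>char</code>) and a
hypothetical helper name; inside a real start handler,
<code>specifiedCount</code> would come from
<code>XML_GetSpecifiedAttributeCount(parser)</code>.
</p>

```c
#include <stddef.h>

/* Hypothetical helper: count attributes that came from ATTLIST defaults.
   atts is the NULL-terminated name/value vector given to the start handler;
   specifiedCount is twice the number of explicitly specified attributes,
   that is, the index of the first defaulted attribute name. */
static int
count_defaulted_attributes(const char **atts, int specifiedCount) {
  int n = 0;
  int i;
  /* Each attribute occupies two slots: name at i, value at i + 1. */
  for (i = specifiedCount; atts[i] != NULL; i += 2)
    n++;
  return n;
}
```

<p>
If no attribute was defaulted, <code>atts[specifiedCount]</code> is the
terminating <code>NULL</code> and the loop body never runs.
</p>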
3361
3362 <h4 id="XML_GetIdAttributeIndex">
3363 XML_GetIdAttributeIndex
3364 </h4>
3365
3366 <pre class="fcndec">
3367int XMLCALL
3368XML_GetIdAttributeIndex(XML_Parser p);
3369</pre>
3370 <div class="fcndef">
3371 Returns the index of the ID attribute passed in the atts array in the last call
3372 to <code><a href="#XML_StartElementHandler">XML_StartElementHandler</a></code>,
3373 or -1 if there is no ID attribute. If called inside a start handler, then that
3374 means the current call.
3375 </div>
3376
3377 <h4 id="XML_GetAttributeInfo">
3378 XML_GetAttributeInfo
3379 </h4>
3380
3381 <pre class="fcndec">
3382const XML_AttrInfo * XMLCALL
3383XML_GetAttributeInfo(XML_Parser parser);
3384</pre>
3385
3386 <pre class="signature">
3387typedef struct {
3388 XML_Index nameStart; /* Offset to beginning of the attribute name. */
3389 XML_Index nameEnd; /* Offset after the attribute name's last byte. */
3390 XML_Index valueStart; /* Offset to beginning of the attribute value. */
3391 XML_Index valueEnd; /* Offset after the attribute value's last byte. */
3392} XML_AttrInfo;
3393</pre>
3394 <div class="fcndef">
3395 Returns an array of <code>XML_AttrInfo</code> structures for the attribute/value
3396 pairs passed in the last call to the <code>XML_StartElementHandler</code> that
3397 were specified in the start-tag rather than defaulted. Each attribute/value pair
3398 counts as 1; thus the number of entries in the array is
3399 <code>XML_GetSpecifiedAttributeCount(parser) / 2</code>.
3400 </div>
3401
3402 <h4 id="XML_SetEncoding">
3403 XML_SetEncoding
3404 </h4>
3405
3406 <pre class="fcndec">
3407enum XML_Status XMLCALL
3408XML_SetEncoding(XML_Parser p,
3409 const XML_Char *encoding);
3410</pre>
3411 <div class="fcndef">
3412 Set the encoding to be used by the parser. It is equivalent to passing a
3413 non-<code>NULL</code> encoding argument to the parser creation functions. It must
3414 not be called after <code><a href="#XML_Parse">XML_Parse</a></code> or
3415 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on
3416 the given parser. Returns <code>XML_STATUS_OK</code> on success or
3417 <code>XML_STATUS_ERROR</code> on error.
3418 </div>
3419
3420 <h4 id="XML_SetParamEntityParsing">
3421 XML_SetParamEntityParsing
3422 </h4>
3423
3424 <pre class="fcndec">
3425int XMLCALL
3426XML_SetParamEntityParsing(XML_Parser p,
3427 enum XML_ParamEntityParsing code);
3428</pre>
3429 <div class="fcndef">
3430 This enables parsing of parameter entities, including the external parameter
3431 entity that is the external DTD subset, according to <code>code</code>. The
3432 choices for <code>code</code> are:
3433 <ul>
3434 <li>
3435 <code>XML_PARAM_ENTITY_PARSING_NEVER</code>
3436 </li>
3437
3438 <li>
3439 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code>
3440 </li>
3441
3442 <li>
3443 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code>
3444 </li>
3445 </ul>
3446 <b>Note:</b> If <code>XML_SetParamEntityParsing</code> is called after
3447 <code>XML_Parse</code> or <code>XML_ParseBuffer</code>, then it has no effect and
3448 will always return 0.
3449 </div>
3450
3451 <h4 id="XML_SetHashSalt">
3452 XML_SetHashSalt
3453 </h4>
3454
3455 <pre class="fcndec">
3456int XMLCALL
3457XML_SetHashSalt(XML_Parser p,
3458 unsigned long hash_salt);
3459</pre>
3460 <div class="fcndef">
3461 Sets the hash salt to use for internal hash calculations. Helps in preventing DoS
3462 attacks based on predicting hash function behavior. In order to have an effect
3463 this must be called before parsing has started. Returns 1 if successful, 0 when
3464 called after <code>XML_Parse</code> or <code>XML_ParseBuffer</code>.
3465 <p>
3466 <b>Note:</b> This call is optional, as the parser will auto-generate a new
3467 random salt value if no value has been set at the start of parsing.
3468 </p>
3469
3470 <p>
3471 <b>Note:</b> One should not call <code>XML_SetHashSalt</code> with a hash salt
3472 value of 0, as this value is used as sentinel value to indicate that
3473 <code>XML_SetHashSalt</code> has <b>not</b> been called. Consequently such a
3474 call will have no effect, even if it returns 1.
3475 </p>
3476 </div>
3477
3478 <h4 id="XML_UseForeignDTD">
3479 XML_UseForeignDTD
3480 </h4>
3481
3482 <pre class="fcndec">
3483enum XML_Error XMLCALL
3484XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
3485</pre>
3486 <div class="fcndef">
3487 <p>
3488 This function allows an application to provide an external subset for the
3489 document type declaration for documents which do not specify an external subset
3490 of their own. For documents which specify an external subset in their DOCTYPE
3491 declaration, the application-provided subset will be ignored. If the document
3492 does not contain a DOCTYPE declaration at all and <code>useDTD</code> is true,
3493 the application-provided subset will be parsed, but the
3494 <code>startDoctypeDeclHandler</code> and <code>endDoctypeDeclHandler</code>
3495 functions, if set, will not be called. The setting of parameter entity parsing,
3496 controlled using <code><a href=
3497 "#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>, will be
3498 honored.
3499 </p>
3500
3501 <p>
3502 The application-provided external subset is read by calling the external entity
3503 reference handler set via <code><a href=
3504 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>
3505 with both <code>publicId</code> and <code>systemId</code> set to
3506 <code>NULL</code>.
3507 </p>
3508
3509 <p>
3510 If this function is called after parsing has begun, it returns
3511 <code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores
3512 <code>useDTD</code>. If called when Expat has been compiled without DTD
3513 support, it returns <code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise,
3514 it returns <code>XML_ERROR_NONE</code>.
3515 </p>
3516
3517 <p>
3518 <b>Note:</b> For the purpose of checking WFC: Entity Declared, passing
3519 <code>useDTD == XML_TRUE</code> will make the parser behave as if the document
3520 had a DTD with an external subset. This holds true even if the external entity
3521 reference handler returns without action.
3522 </p>
3523 </div>
3524
3525 <h4 id="XML_SetReturnNSTriplet">
3526 XML_SetReturnNSTriplet
3527 </h4>
3528
3529 <pre class="fcndec">
3530void XMLCALL
3531XML_SetReturnNSTriplet(XML_Parser parser,
3532 int do_nst);
3533</pre>
3534 <div class="fcndef">
3535 <p>
This function only has an effect when using a parser created with
<code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, i.e. when
namespace processing is in effect. The <code>do_nst</code> argument sets whether
or not prefixes are returned with names qualified with a namespace prefix. If
this function is called with <code>do_nst</code> non-zero, then afterwards
namespace qualified names (that is, names qualified with a prefix as opposed to
belonging to a default namespace) are returned as a triplet with the three parts
separated by the namespace separator specified when the parser was created. The
order of returned parts is URI, local name, and prefix.
</p>

<p>
If <code>do_nst</code> is zero, then namespaces are reported in the default
manner: URI, then local name, separated by the namespace separator.
3550 </p>
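<p>
A handler that receives such names has to split them again. The following is
one possible sketch, with a hypothetical <code>struct ns_name</code> and helper
name. It assumes that <code>XML_Char</code> is <code>char</code>, that the
separator is a non-zero character, and that the separator does not occur
inside the namespace URI.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the original name plus lengths.
   uri and prefix are NULL when the corresponding part is absent. */
struct ns_name {
  const char *uri;
  size_t uri_len;
  const char *local;
  size_t local_len;
  const char *prefix;
  size_t prefix_len;
};

/* Split "URI<sep>local<sep>prefix", "URI<sep>local" or plain "local". */
static void
split_ns_name(const char *name, char sep, struct ns_name *out) {
  const struct ns_name empty = {0};
  const char *first = strchr(name, sep);
  const char *second;

  *out = empty;
  if (first == NULL) { /* name is not in any namespace */
    out->local = name;
    out->local_len = strlen(name);
    return;
  }
  out->uri = name;
  out->uri_len = (size_t)(first - name);
  out->local = first + 1;

  second = strchr(out->local, sep);
  if (second == NULL) { /* default namespace or do_nst == 0: no prefix part */
    out->local_len = strlen(out->local);
    return;
  }
  out->local_len = (size_t)(second - out->local);
  out->prefix = second + 1;
  out->prefix_len = strlen(out->prefix);
}
```

<p>
Choosing a separator that cannot appear in a URI or in an XML name keeps this
kind of splitting unambiguous.
</p>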
3551 </div>
3552
3553 <h4 id="XML_DefaultCurrent">
3554 XML_DefaultCurrent
3555 </h4>
3556
3557 <pre class="fcndec">
3558void XMLCALL
3559XML_DefaultCurrent(XML_Parser parser);
3560</pre>
3561 <div class="fcndef">
3562 This can be called within a handler for a start element, end element, processing
3563 instruction or character data. It causes the corresponding markup to be passed to
3564 the default handler set by <code><a href=
3565 "#XML_SetDefaultHandler">XML_SetDefaultHandler</a></code> or <code><a href=
3566 "#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></code>. It does
3567 nothing if there is not a default handler.
3568 </div>
3569
3570 <h4 id="XML_ExpatVersion">
3571 XML_ExpatVersion
3572 </h4>
3573
3574 <pre class="fcndec">
const XML_LChar * XMLCALL
XML_ExpatVersion(void);
3577</pre>
3578 <div class="fcndef">
3579 Return the library version as a string (e.g. <code>"expat_1.95.1"</code>).
3580 </div>
3581
3582 <h4 id="XML_ExpatVersionInfo">
3583 XML_ExpatVersionInfo
3584 </h4>
3585
3586 <pre class="fcndec">
XML_Expat_Version XMLCALL
XML_ExpatVersionInfo(void);
3589</pre>
3590
3591 <pre class="signature">
3592typedef struct {
3593 int major;
3594 int minor;
3595 int micro;
3596} XML_Expat_Version;
3597</pre>
3598 <div class="fcndef">
3599 Return the library version information as a structure. Some macros are also
3600 defined that support compile-time tests of the library version:
3601 <ul>
3602 <li>
3603 <code>XML_MAJOR_VERSION</code>
3604 </li>
3605
3606 <li>
3607 <code>XML_MINOR_VERSION</code>
3608 </li>
3609
3610 <li>
3611 <code>XML_MICRO_VERSION</code>
3612 </li>
3613 </ul>
3614 Testing these constants is currently the best way to determine if particular
3615 parts of the Expat API are available.
3616 </div>
3617
3618 <h4 id="XML_GetFeatureList">
3619 XML_GetFeatureList
3620 </h4>
3621
3622 <pre class="fcndec">
3623const XML_Feature * XMLCALL
3624XML_GetFeatureList();
3625</pre>
3626
3627 <pre class="signature">
3628enum XML_FeatureEnum {
3629 XML_FEATURE_END = 0,
3630 XML_FEATURE_UNICODE,
3631 XML_FEATURE_UNICODE_WCHAR_T,
3632 XML_FEATURE_DTD,
3633 XML_FEATURE_CONTEXT_BYTES,
3634 XML_FEATURE_MIN_SIZE,
3635 XML_FEATURE_SIZEOF_XML_CHAR,
3636 XML_FEATURE_SIZEOF_XML_LCHAR,
3637 XML_FEATURE_NS,
3638 XML_FEATURE_LARGE_SIZE
3639};
3640
3641typedef struct {
3642 enum XML_FeatureEnum feature;
3643 XML_LChar *name;
3644 long int value;
3645} XML_Feature;
3646</pre>
3647 <div class="fcndef">
3648 <p>
3649 Returns a list of "feature" records, providing details on how Expat was
3650 configured at compile time. Most applications should not need to worry about
3651 this, but this information is otherwise not available from Expat. This function
3652 allows code that does need to check these features to do so at runtime.
3653 </p>
3654
3655 <p>
3656 The return value is an array of <code>XML_Feature</code>, terminated by a
3657 record with a <code>feature</code> of <code>XML_FEATURE_END</code> and
3658 <code>name</code> of <code>NULL</code>, identifying the feature-test macros
3659 Expat was compiled with. Since an application that requires this kind of
3660 information needs to determine the type of character the <code>name</code>
3661 points to, records for the <code>XML_FEATURE_SIZEOF_XML_CHAR</code> and
3662 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the beginning of
3663 the list, followed by <code>XML_FEATURE_UNICODE</code> and
3664 <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at all.
3665 </p>
3666
3667 <p>
3668 Some features have an associated value. If there isn't an associated value, the
3669 <code>value</code> field is set to 0. At this time, the following features have
3670 been defined to have values:
3671 </p>
3672
3673 <dl>
3674 <dt>
3675 <code>XML_FEATURE_SIZEOF_XML_CHAR</code>
3676 </dt>
3677
3678 <dd>
3679 The number of bytes occupied by one <code>XML_Char</code> character.
3680 </dd>
3681
3682 <dt>
3683 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code>
3684 </dt>
3685
3686 <dd>
3687 The number of bytes occupied by one <code>XML_LChar</code> character.
3688 </dd>
3689
3690 <dt>
3691 <code>XML_FEATURE_CONTEXT_BYTES</code>
3692 </dt>
3693
3694 <dd>
The maximum number of bytes of context which can be reported by
<code><a href="#XML_GetInputContext">XML_GetInputContext</a></code>.
3697 </dd>
3698 </dl>
3699 </div>
3700
3701 <h4 id="XML_FreeContentModel">
3702 XML_FreeContentModel
3703 </h4>
3704
3705 <pre class="fcndec">
3706void XMLCALL
3707XML_FreeContentModel(XML_Parser parser, XML_Content *model);
3708</pre>
3709 <div class="fcndef">
Function to deallocate the <code>model</code> argument passed to the
<code>XML_ElementDeclHandler</code> callback set using <code><a href=
"#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a></code>. This function
should not be used for any other purpose.
3714 </div>
3715
3716 <p>
3717 The following functions allow external code to share the memory allocator an
3718 <code>XML_Parser</code> has been configured to use. This is especially useful for
3719 third-party libraries that interact with a parser object created by application
3720 code, or heavily layered applications. This can be essential when using
3721 dynamically loaded libraries which use different C standard libraries (this can
3722 happen on Windows, at least).
3723 </p>
3724
3725 <h4 id="XML_MemMalloc">
3726 XML_MemMalloc
3727 </h4>
3728
3729 <pre class="fcndec">
3730void * XMLCALL
3731XML_MemMalloc(XML_Parser parser, size_t size);
3732</pre>
3733 <div class="fcndef">
3734 Allocate <code>size</code> bytes of memory using the allocator the
3735 <code>parser</code> object has been configured to use. Returns a pointer to the
3736 memory or <code>NULL</code> on failure. Memory allocated in this way must be
3737 freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>.
3738 </div>
3739
3740 <h4 id="XML_MemRealloc">
3741 XML_MemRealloc
3742 </h4>
3743
3744 <pre class="fcndec">
3745void * XMLCALL
3746XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
3747</pre>
3748 <div class="fcndef">
3749 Allocate <code>size</code> bytes of memory using the allocator the
3750 <code>parser</code> object has been configured to use. <code>ptr</code> must
3751 point to a block of memory allocated by <code><a href=
3752 "#XML_MemMalloc">XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be
3753 <code>NULL</code>. This function tries to expand the block pointed to by
3754 <code>ptr</code> if possible. Returns a pointer to the memory or
3755 <code>NULL</code> on failure. On success, the original block has either been
3756 expanded or freed. On failure, the original block has not been freed; the caller
3757 is responsible for freeing the original block. Memory allocated in this way must
3758 be freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>.
3759 </div>
3760
3761 <h4 id="XML_MemFree">
3762 XML_MemFree
3763 </h4>
3764
3765 <pre class="fcndec">
3766void XMLCALL
3767XML_MemFree(XML_Parser parser, void *ptr);
3768</pre>
3769 <div class="fcndef">
3770 Free a block of memory pointed to by <code>ptr</code>. The block must have been
3771 allocated by <code><a href="#XML_MemMalloc">XML_MemMalloc</a></code> or
3772 <code>XML_MemRealloc</code>, or be <code>NULL</code>.
3773 </div>
3774
3775 <hr />
3776
3777 <div class="footer">
3778 Found a bug in the documentation? <a href=
3779 "https://github.com/libexpat/libexpat/issues">Please file a bug report.</a>
3780 </div>
3781 </div>
3782 </body>
3783</html>