<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!--
                                __  __            _
                             ___\ \/ /_ __   __ _| |_
                            / _ \\  /| '_ \ / _` | __|
                           |  __//  \| |_) | (_| | |_
                            \___/_/\_\ .__/ \__,_|\__|
                                     |_| XML parser

       Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net>
       Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
       Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net>
       Copyright (c) 2017-2026 Sebastian Pipping <sebastian@pipping.org>
       Copyright (c) 2017 Jakub Wilk <jwilk@jwilk.net>
       Copyright (c) 2021 Tomas Korbar <tkorbar@redhat.com>
       Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
       Copyright (c) 2022 Thijs Schreijer <thijs@thijsschreijer.nl>
       Copyright (c) 2023-2025 Hanno Böck <hanno@gentoo.org>
       Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com>
       Licensed under the MIT license:

       Permission is hereby granted, free of charge, to any person obtaining
       a copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to permit
       persons to whom the Software is furnished to do so, subject to the
       following conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
       EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
       IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
       DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
       OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
       USE OR OTHER DEALINGS IN THE SOFTWARE.
-->

    <title>
      Expat XML Parser
    </title>
    <meta name="author" content="Clark Cooper, coopercc@netheaven.com" />
    <link href="ok.min.css" rel="stylesheet" />
    <link href="style.css" rel="stylesheet" />
  </head>
  <body>
    <div>
      <h1>
        The Expat XML Parser <small>Release 2.7.4</small>
      </h1>
    </div>

    <div class="content">
      <p>
        Expat is a library, written in C, for parsing XML documents. It's the underlying
        XML parser for the open source Mozilla project, Perl's <code>XML::Parser</code>,
        Python's <code>xml.parsers.expat</code>, and other open-source XML parsers.
      </p>

      <p>
        This library is the creation of James Clark, who's also given us groff (an nroff
        look-alike), Jade (an implementation of ISO's DSSSL stylesheet language for
        SGML), XP (a Java XML parser package), and XT (a Java XSL engine). James was also
        the technical lead on the XML Working Group at W3C that produced the XML
        specification.
      </p>

      <p>
        This is free software, licensed under the <a href="../COPYING">MIT/X Consortium
        license</a>. You may download it from <a href="https://libexpat.github.io/">the
        Expat home page</a>.
      </p>

      <p>
        The bulk of this document was originally commissioned as an article by <a href=
        "https://www.xml.com/">XML.com</a>. They graciously allowed Clark Cooper to
        retain copyright and to distribute it with Expat. This version has been
        substantially extended to include documentation on features which have been added
        since the original article was published, and additional information on using the
        original interface.
      </p>

      <hr />

      <h2>
        Table of Contents
      </h2>

      <ul>
        <li>
          <a href="#overview">Overview</a>
        </li>

        <li>
          <a href="#building">Building and Installing</a>
        </li>

        <li>
          <a href="#using">Using Expat</a>
        </li>

        <li>
          <a href="#reference">Reference</a>
          <ul>
            <li>
              <a href="#creation">Parser Creation Functions</a>
              <ul>
                <li>
                  <a href="#XML_ParserCreate">XML_ParserCreate</a>
                </li>

                <li>
                  <a href="#XML_ParserCreateNS">XML_ParserCreateNS</a>
                </li>

                <li>
                  <a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a>
                </li>

                <li>
                  <a href="#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a>
                </li>

                <li>
                  <a href="#XML_ParserFree">XML_ParserFree</a>
                </li>

                <li>
                  <a href="#XML_ParserReset">XML_ParserReset</a>
                </li>
              </ul>
            </li>

            <li>
              <a href="#parsing">Parsing Functions</a>
              <ul>
                <li>
                  <a href="#XML_Parse">XML_Parse</a>
                </li>

                <li>
                  <a href="#XML_ParseBuffer">XML_ParseBuffer</a>
                </li>

                <li>
                  <a href="#XML_GetBuffer">XML_GetBuffer</a>
                </li>

                <li>
                  <a href="#XML_StopParser">XML_StopParser</a>
                </li>

                <li>
                  <a href="#XML_ResumeParser">XML_ResumeParser</a>
                </li>

                <li>
                  <a href="#XML_GetParsingStatus">XML_GetParsingStatus</a>
                </li>
              </ul>
            </li>

            <li>
              <a href="#setting">Handler Setting Functions</a>
              <ul>
                <li>
                  <a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a>
                </li>

                <li>
                  <a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a>
                </li>

                <li>
                  <a href="#XML_SetElementHandler">XML_SetElementHandler</a>
                </li>

                <li>
                  <a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a>
                </li>

                <li>
                  <a href="#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a>
                </li>

                <li>
                  <a href="#XML_SetCommentHandler">XML_SetCommentHandler</a>
                </li>

                <li>
                  <a href="#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a>
                </li>

                <li>
                  <a href="#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a>
                </li>

                <li>
                  <a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a>
                </li>

                <li>
                  <a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a>
                </li>

                <li>
                  <a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a>
                </li>

                <li>
                  <a href="#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a>
                </li>

                <li>
                  <a href="#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a>
                </li>

                <li>
                  <a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a>
                </li>

                <li>
                  <a href="#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a>
                </li>

                <li>
                  <a href="#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a>
                </li>

                <li>
                  <a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a>
                </li>
              </ul>
            </li>

            <li>
              <a href="#position">Parse Position and Error Reporting Functions</a>
              <ul>
                <li>
                  <a href="#XML_GetErrorCode">XML_GetErrorCode</a>
                </li>

                <li>
                  <a href="#XML_ErrorString">XML_ErrorString</a>
                </li>

                <li>
                  <a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a>
                </li>

                <li>
                  <a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a>
                </li>

                <li>
                  <a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a>
                </li>

                <li>
                  <a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a>
                </li>

                <li>
                  <a href="#XML_GetInputContext">XML_GetInputContext</a>
                </li>
              </ul>
            </li>

            <li>
              <a href="#attack-protection">Attack Protection</a>
              <ul>
                <li>
                  <a href="#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a>
                </li>

                <li>
                  <a href="#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a>
                </li>

                <li>
                  <a href="#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a>
                </li>

                <li>
                  <a href="#XML_SetAllocTrackerActivationThreshold">XML_SetAllocTrackerActivationThreshold</a>
                </li>

                <li>
                  <a href="#XML_SetReparseDeferralEnabled">XML_SetReparseDeferralEnabled</a>
                </li>
              </ul>
            </li>

            <li>
              <a href="#miscellaneous">Miscellaneous Functions</a>
              <ul>
                <li>
                  <a href="#XML_SetUserData">XML_SetUserData</a>
                </li>

                <li>
                  <a
                  href="#XML_GetUserData">XML_GetUserData</a>
                </li>

                <li>
                  <a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a>
                </li>

                <li>
                  <a href="#XML_SetBase">XML_SetBase</a>
                </li>

                <li>
                  <a href="#XML_GetBase">XML_GetBase</a>
                </li>

                <li>
                  <a href="#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a>
                </li>

                <li>
                  <a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a>
                </li>

                <li>
                  <a href="#XML_GetAttributeInfo">XML_GetAttributeInfo</a>
                </li>

                <li>
                  <a href="#XML_SetEncoding">XML_SetEncoding</a>
                </li>

                <li>
                  <a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a>
                </li>

                <li>
                  <a href="#XML_SetHashSalt">XML_SetHashSalt</a>
                </li>

                <li>
                  <a href="#XML_UseForeignDTD">XML_UseForeignDTD</a>
                </li>

                <li>
                  <a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a>
                </li>

                <li>
                  <a href="#XML_DefaultCurrent">XML_DefaultCurrent</a>
                </li>

                <li>
                  <a href="#XML_ExpatVersion">XML_ExpatVersion</a>
                </li>

                <li>
                  <a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a>
                </li>

                <li>
                  <a href="#XML_GetFeatureList">XML_GetFeatureList</a>
                </li>

                <li>
                  <a href="#XML_FreeContentModel">XML_FreeContentModel</a>
                </li>

                <li>
                  <a href="#XML_MemMalloc">XML_MemMalloc</a>
                </li>

                <li>
                  <a href="#XML_MemRealloc">XML_MemRealloc</a>
                </li>

                <li>
                  <a href="#XML_MemFree">XML_MemFree</a>
                </li>
              </ul>
            </li>
          </ul>
        </li>
      </ul>

      <hr />

      <h2>
        <a id="overview" name="overview">Overview</a>
      </h2>

      <p>
        Expat is a stream-oriented parser. You register callback (or handler) functions
        with the parser and then start feeding it the document.
        As the parser recognizes
        parts of the document, it will call the appropriate handler for that part (if
        you've registered one). The document is fed to the parser in pieces, so you can
        start parsing before you have the whole document. This also allows you to parse
        really huge documents that won't fit into memory.
      </p>

      <p>
        Expat can be intimidating due to the many kinds of handlers and options you can
        set. But you only need to learn four functions in order to do 90% of what you'll
        want to do with it:
      </p>

      <dl>
        <dt>
          <code><a href="#XML_ParserCreate">XML_ParserCreate</a></code>
        </dt>

        <dd>
          Create a new parser object.
        </dd>

        <dt>
          <code><a href="#XML_SetElementHandler">XML_SetElementHandler</a></code>
        </dt>

        <dd>
          Set handlers for start and end tags.
        </dd>

        <dt>
          <code><a href=
          "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>
        </dt>

        <dd>
          Set handler for text.
        </dd>

        <dt>
          <code><a href="#XML_Parse">XML_Parse</a></code>
        </dt>

        <dd>
          Pass a buffer full of document to the parser.
        </dd>
      </dl>

      <p>
        These functions and others are described in the <a href=
        "#reference">reference</a> part of this document. The reference section also
        describes in detail the parameters passed to the different types of handlers.
      </p>

      <p>
        Let's look at a very simple example program that only uses 3 of the above
        functions (it doesn't need to set a character handler). The program <a href=
        "../examples/outline.c">outline.c</a> prints an element outline, indenting child
        elements to distinguish them from the parent element that contains them. The
        start handler does all the work. It prints two indenting spaces for every level
        of ancestor elements, then it prints the element and attribute information.
        Finally it increments the global <code>Depth</code> variable.
      </p>

<pre class="eg">
int Depth;

void XMLCALL
start(void *data, const char *el, const char **attr) {
  int i;

  /* Two spaces of indentation per ancestor element */
  for (i = 0; i &lt; Depth; i++)
    printf("  ");

  printf("%s", el);

  /* attr is a NULL-terminated list of name/value pairs */
  for (i = 0; attr[i]; i += 2) {
    printf(" %s='%s'", attr[i], attr[i + 1]);
  }

  printf("\n");
  Depth++;
}  /* End of start handler */
</pre>
      <p>
        The end tag handler simply does the bookkeeping work of decrementing
        <code>Depth</code>.
      </p>

<pre class="eg">
void XMLCALL
end(void *data, const char *el) {
  Depth--;
}  /* End of end handler */
</pre>
      <p>
        Note the <code>XMLCALL</code> annotation used for the callbacks. This is used to
        ensure that Expat and the callbacks are using the same calling convention in
        case the compiler options used for Expat itself and the client code are
        different. Expat tries not to care what the default calling convention is, though
        it may require that it be compiled with a default convention of "cdecl" on some
        platforms. For code which uses Expat, however, the calling convention is
        specified by the <code>XMLCALL</code> annotation on most platforms; callbacks
        should be defined using this annotation.
      </p>

      <p>
        The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but existing
        working Expat applications don't need to add it (since they are already using the
        "cdecl" calling convention, or they wouldn't be working). The annotation is only
        needed if the default calling convention may be something other than "cdecl".
        To use the annotation safely with older versions of Expat, you can conditionally
        define it <em>after</em> including Expat's header file:
      </p>

<pre class="eg">
#include &lt;expat.h&gt;

#ifndef XMLCALL
#if defined(_MSC_VER) &amp;&amp; !defined(__BEOS__) &amp;&amp; !defined(__CYGWIN__)
#define XMLCALL __cdecl
#elif defined(__GNUC__)
#define XMLCALL __attribute__((cdecl))
#else
#define XMLCALL
#endif
#endif
</pre>
      <p>
        After creating the parser, the main program just has the job of shoveling the
        document to the parser so that it can do its work.
      </p>

      <hr />

      <h2>
        <a id="building" name="building">Building and Installing Expat</a>
      </h2>

      <p>
        The Expat distribution comes as a compressed (with GNU gzip) tar file. You may
        download the latest version from <a href=
        "https://sourceforge.net/projects/expat/">SourceForge</a>. After unpacking this,
        cd into the directory. Then follow either the Win32 directions or Unix directions
        below.
      </p>

      <h3>
        Building under Win32
      </h3>

      <p>
        If you're using the GNU compiler under Cygwin, follow the Unix directions in the
        next section. Otherwise, if you have Microsoft Visual Studio installed, you
        can use CMake to generate a <code>.sln</code> file, e.g. <code>cmake -G"Visual
        Studio 17 2022" -DCMAKE_BUILD_TYPE=RelWithDebInfo .</code>, and then build Expat
        using <code>msbuild /m expat.sln</code>.
      </p>

      <p>
        Alternatively, you may download the Win32 binary package that contains the
        "expat.h" include file and a pre-built DLL.
      </p>

      <h3>
        Building under Unix (or GNU)
      </h3>

      <p>
        First you'll need to run the configure shell script in order to configure the
        Makefiles and headers for your system.
      </p>

      <p>
        If you're happy with all the defaults that configure picks for you, and you have
        permission on your system to install into /usr/local, you can install Expat with
        this sequence of commands:
      </p>

<pre class="eg">
./configure
make
make install
</pre>
      <p>
        There are some options that you can provide to this script, but the only one
        we'll mention here is the <code>--prefix</code> option. You can find out all the
        options available by running configure with just the <code>--help</code> option.
      </p>

      <p>
        By default, the configure script sets things up so that the library gets
        installed in <code>/usr/local/lib</code> and the associated header file in
        <code>/usr/local/include</code>. But if you were to give the option,
        <code>--prefix=/home/me/mystuff</code>, then the library and header would get
        installed in <code>/home/me/mystuff/lib</code> and
        <code>/home/me/mystuff/include</code> respectively.
      </p>

      <h3>
        Configuring Expat Using the Pre-Processor
      </h3>

      <p>
        Expat's feature set can be configured using a small number of pre-processor
        definitions. The symbols are:
      </p>

      <dl class="cpp-symbols">
        <dt>
          <a id="XML_GE" name="XML_GE">XML_GE</a>
        </dt>

        <dd>
          Added in Expat 2.6.0. Include support for <a href=
          "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct">general
          entities</a> (syntax <code>&amp;e1;</code> to reference and syntax
          <code>&lt;!ENTITY e1 'value1'&gt;</code> (an internal general entity) or
          <code>&lt;!ENTITY e2 SYSTEM 'file2'&gt;</code> (an external general entity) to
          declare).
          With <code>XML_GE</code> enabled, general entities will be replaced
          by their declared replacement text; for this to work for <em>external</em>
          general entities, in addition an <code><a href=
          "#XML_SetExternalEntityRefHandler">XML_ExternalEntityRefHandler</a></code> must
          be set using <code><a href=
          "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>.
          Also, enabling <code>XML_GE</code> makes the functions <code><a href=
          "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code>
          and <code><a href=
          "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code>
          available.<br />
          With <code>XML_GE</code> disabled, Expat has a smaller memory footprint and can
          be faster, but will not load external general entities and will replace all
          general entities (except the <a href=
          "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">predefined
          five</a>: <code>amp</code>, <code>apos</code>, <code>gt</code>,
          <code>lt</code>, <code>quot</code>) with a self-reference: for example,
          referencing an entity <code>e1</code> via <code>&amp;e1;</code> will be
          replaced by text <code>&amp;e1;</code>.
        </dd>

        <dt>
          <a id="XML_DTD" name="XML_DTD">XML_DTD</a>
        </dt>

        <dd>
          Include support for using and reporting DTD-based content. If this is defined,
          default attribute values from an external DTD subset are reported and attribute
          value normalization occurs based on the type of attributes defined in the
          external subset. Without this, Expat has a smaller memory footprint and can be
          faster, but will not load external parameter entities or process conditional
          sections.
          If defined, makes the functions <code><a href=
          "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code>
          and <code><a href=
          "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code>
          available.
        </dd>

        <dt>
          <a id="XML_NS" name="XML_NS">XML_NS</a>
        </dt>

        <dd>
          When defined, support for the <cite><a href=
          "https://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a></cite>
          specification is included.
        </dd>

        <dt>
          <a id="XML_UNICODE" name="XML_UNICODE">XML_UNICODE</a>
        </dt>

        <dd>
          When defined, character data reported to the application is encoded in UTF-16
          using wide characters of the type <code>XML_Char</code>. This is implied if
          <code>XML_UNICODE_WCHAR_T</code> is defined.
        </dd>

        <dt>
          <a id="XML_UNICODE_WCHAR_T" name="XML_UNICODE_WCHAR_T">XML_UNICODE_WCHAR_T</a>
        </dt>

        <dd>
          If defined, causes the <code>XML_Char</code> character type to be defined using
          the <code>wchar_t</code> type; otherwise, <code>unsigned short</code> is used.
          Defining this implies <code>XML_UNICODE</code>.
        </dd>

        <dt>
          <a id="XML_LARGE_SIZE" name="XML_LARGE_SIZE">XML_LARGE_SIZE</a>
        </dt>

        <dd>
          If defined, causes the <code>XML_Size</code> and <code>XML_Index</code> integer
          types to be at least 64 bits in size. This is intended to support processing of
          very large input streams, where the return values of <code><a href=
          "#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></code>, <code><a href=
          "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and
          <code><a href=
          "#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code> could
          overflow. It may not be supported by all compilers, and is turned off by
          default.
        </dd>

        <dt>
          <a id="XML_CONTEXT_BYTES" name="XML_CONTEXT_BYTES">XML_CONTEXT_BYTES</a>
        </dt>

        <dd>
          The number of input bytes of markup context which the parser will ensure are
          available for reporting via <code><a href=
          "#XML_GetInputContext">XML_GetInputContext</a></code>. This is normally set to
          1024, and must be set to a positive integer to enable this feature. If this is
          set to zero, the input context will not be available and <code><a href=
          "#XML_GetInputContext">XML_GetInputContext</a></code> will always report
          <code>NULL</code>. Without this, Expat has a smaller memory footprint and can
          be faster.
        </dd>

        <dt>
          <a id="XML_STATIC" name="XML_STATIC">XML_STATIC</a>
        </dt>

        <dd>
          On Windows, this should be set if Expat is going to be linked statically with
          the code that calls it; this is required to get all the MSVC magic
          annotations right. This is ignored on other platforms.
        </dd>

        <dt>
          <a id="XML_ATTR_INFO" name="XML_ATTR_INFO">XML_ATTR_INFO</a>
        </dt>

        <dd>
          If defined, makes the additional function <code><a href=
          "#XML_GetAttributeInfo">XML_GetAttributeInfo</a></code> available for reporting
          attribute byte offsets.
        </dd>
      </dl>

      <hr />

      <h2>
        <a id="using" name="using">Using Expat</a>
      </h2>

      <h3>
        Compiling and Linking Against Expat
      </h3>

      <p>
        Unless you installed Expat in a location not expected by your compiler and
        linker, all you have to do to use Expat in your programs is to include the Expat
        header (<code>#include &lt;expat.h&gt;</code>) in your files that make calls to
        it and to tell the linker that it needs to link against the Expat library. On
        Unix systems, this would usually be done with the <code>-lexpat</code> argument.
        Otherwise, you'll need to tell the compiler where to look for the Expat header
        and the linker where to find the Expat library.
        You may also need to take steps
        to tell the operating system where to find this library at run time.
      </p>

      <p>
        On a Unix-based system, here's what a Makefile might look like when Expat is
        installed in a standard location:
      </p>

<pre class="eg">
CC=cc
LDFLAGS=
LIBS= -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
      <p>
        If you installed Expat in, say, <code>/home/me/mystuff</code>, then the Makefile
        would look like this:
      </p>

<pre class="eg">
CC=cc
CFLAGS= -I/home/me/mystuff/include
LDFLAGS=
LIBS= -L/home/me/mystuff/lib -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>
      <p>
        You'd also have to set the environment variable <code>LD_LIBRARY_PATH</code> to
        <code>/home/me/mystuff/lib</code> (or to
        <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if LD_LIBRARY_PATH already
        has some directories in it) in order to run your application.
      </p>

      <h3>
        Expat Basics
      </h3>

      <p>
        As we saw in the example in the overview, the first step in parsing an XML
        document with Expat is to create a parser object. There are <a href=
        "#creation">three functions</a> in the Expat API for creating a parser object.
        However, only two of these (<code><a href=
        "#XML_ParserCreate">XML_ParserCreate</a></code> and <code><a href=
        "#XML_ParserCreateNS">XML_ParserCreateNS</a></code>) can be used for constructing
        a parser for a top-level document. The object returned by these functions is an
        opaque pointer (i.e. "expat.h" declares it as void *) to data with further
        internal structure. In order to free the memory associated with this object you
        must call <code><a href="#XML_ParserFree">XML_ParserFree</a></code>.
        Note that if
        you have provided any <a href="#userdata">user data</a> that gets stored in the
        parser, then your application is responsible for freeing it prior to calling
        <code>XML_ParserFree</code>.
      </p>

      <p>
        The objects returned by the parser creation functions are good for parsing only
        one XML document or external parsed entity. If your application needs to parse
        many XML documents, then it needs to create a parser object for each one. The
        best way to deal with this is to create a higher level object that contains all
        the default initialization you want for your parser objects.
      </p>

      <p>
        Walking through a document hierarchy with a stream oriented parser will require a
        good stack mechanism in order to keep track of current context. For instance, to
        answer the simple question, "What element does this text belong to?" requires a
        stack, since the parser may have descended into other elements that are children
        of the current one and has encountered this text on the way out.
      </p>

      <p>
        The things you're likely to want to keep on a stack are the currently opened
        element and its attributes. You push this information onto the stack in the
        start handler and you pop it off in the end handler.
      </p>

      <p>
        For some tasks, it is sufficient to just keep information on what the depth of
        the stack is (or would be if you had one). The outline program shown above
        presents one example. Another such task would be skipping over a complete
        element. When you see the start tag for the element you want to skip, you set a
        skip flag and record the depth at which the element started. When the end tag
        handler encounters the same depth, the skipped element has ended and the flag may
        be cleared. If you follow the convention that the root element starts at 1, then
        you can use the same variable for skip flag and skip depth.
      </p>

<pre class="eg">
void
init_info(Parseinfo *info) {
  info-&gt;skip = 0;
  info-&gt;depth = 1;
  /* Other initializations here */
}  /* End of init_info */

void XMLCALL
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *) data;

  if (! inf-&gt;skip) {
    if (should_skip(inf, el, attr)) {
      inf-&gt;skip = inf-&gt;depth;
    }
    else
      start(inf, el, attr);     /* This does rest of start handling */
  }

  inf-&gt;depth++;
}  /* End of rawstart */

void XMLCALL
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *) data;

  inf-&gt;depth--;

  if (! inf-&gt;skip)
    end(inf, el);              /* This does rest of end handling */

  if (inf-&gt;skip == inf-&gt;depth)
    inf-&gt;skip = 0;
}  /* End rawend */
</pre>
      <p>
        Notice in the above example the difference in how depth is manipulated in the
        start and end handlers. The end tag handler should be the mirror image of the
        start tag handler. This is necessary to properly model containment. Since, in the
        start tag handler, we incremented depth <em>after</em> the main body of start tag
        code, then in the end handler, we need to manipulate it <em>before</em> the main
        body. If we'd decided to increment it first thing in the start handler, then we'd
        have had to decrement it last thing in the end handler.
      </p>

      <h3 id="userdata">
        Communicating between handlers
      </h3>

      <p>
        In order to be able to pass information between different handlers without using
        globals, you'll need to define a data structure to hold the shared variables. You
        can then tell Expat (with the <code><a href=
        "#XML_SetUserData">XML_SetUserData</a></code> function) to pass a pointer to this
        structure to the handlers. This is the first argument received by most handlers.
        In the <a href="#reference">reference section</a>, an argument to a callback
        function is named <code>userData</code> and has type <code>void *</code> if the
        user data is passed; it will have the type <code>XML_Parser</code> if the parser
        itself is passed. When the parser is passed, the user data may be retrieved using
        <code><a href="#XML_GetUserData">XML_GetUserData</a></code>.
      </p>

      <p>
        One common case where multiple calls to a single handler may need to communicate
        using an application data structure is the case when content passed to the
        character data handler (set by <code><a href=
        "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>) needs to
        be accumulated. A common first-time mistake with any of the event-oriented
        interfaces to an XML parser is to expect all the text contained in an element to
        be reported by a single call to the character data handler. Expat, like many
        other XML parsers, reports such data as a sequence of calls; there's no way to
        know when the end of the sequence is reached until a different callback is made.
        A buffer referenced by the user data structure proves to be both an effective
        and a convenient place to accumulate character data.
      </p>
      <!-- XXX example needed here -->

      <h3>
        XML Version
      </h3>

      <p>
        Expat is an XML 1.0 parser, and as such never complains based on the value of the
        <code>version</code> pseudo-attribute in the XML declaration, if present.
      </p>

      <p>
        If an application needs to check the version number (to support alternate
        processing), it should use the <code><a href=
        "#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></code> function to set a
        handler that uses the information in the XML declaration to determine what to do.
      This example shows how to check that only a version number of <code>"1.0"</code>
      is accepted:
    </p>

    <pre class="eg">
static int wrong_version;
static XML_Parser parser;

static void XMLCALL
xmldecl_handler(void *userData,
                const XML_Char *version,
                const XML_Char *encoding,
                int standalone)
{
  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};

  size_t i; /* unsigned, to match the sizeof-based bound below */

  for (i = 0; i &lt; (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
    if (version[i] != Version_1_0[i]) {
      wrong_version = 1;
      /* also clear all other handlers: */
      XML_SetCharacterDataHandler(parser, NULL);
      ...
      return;
    }
  }
  ...
}
</pre>
    <h3>
      Namespace Processing
    </h3>

    <p>
      When the parser is created using the <code><a href=
      "#XML_ParserCreateNS">XML_ParserCreateNS</a></code> function, Expat performs
      namespace processing. Under namespace processing, Expat consumes
      <code>xmlns</code> and <code>xmlns:...</code> attributes, which declare
      namespaces for the scope of the element in which they occur. This means that your
      start handler will not see these attributes. Your application can still be
      informed of these declarations by setting namespace declaration handlers with
      <a href=
      "#XML_SetNamespaceDeclHandler"><code>XML_SetNamespaceDeclHandler</code></a>.
    </p>

    <p>
      Element type and attribute names that belong to a given namespace are passed to
      the appropriate handler in expanded form. By default this expanded form is a
      concatenation of the namespace URI, the separator character (which is the 2nd
      argument to <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>),
      and the local name (i.e. the part after the colon). Names with undeclared
      prefixes are not well-formed when namespace processing is enabled, and will
      trigger an error.
      Unprefixed attribute names are never expanded, and unprefixed
      element names are only expanded when they are in the scope of a default
      namespace.
    </p>

    <p>
      However, if <code><a href=
      "#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></code> has been called with
      a non-zero <code>do_nst</code> parameter, then the expanded form for names with
      an explicit prefix is a concatenation of: URI, separator, local name, separator,
      prefix.
    </p>

    <p>
      You can set handlers for the start of a namespace declaration and for the end of
      a scope of a declaration with the <code><a href=
      "#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></code> function.
      The StartNamespaceDeclHandler is called prior to the start tag handler and the
      EndNamespaceDeclHandler is called after the corresponding end tag that ends the
      namespace's scope. The namespace start handler gets passed the prefix and URI for
      the namespace. For a default namespace declaration (xmlns='...'), the prefix will
      be <code>NULL</code>. The URI will be <code>NULL</code> for the case where the
      default namespace is being unset. The namespace end handler just gets the prefix
      for the closing scope.
    </p>

    <p>
      These handlers are called for each declaration. So if, for instance, a start tag
      had three namespace declarations, then the StartNamespaceDeclHandler would be
      called three times before the start tag handler is called, once for each
      declaration.
    </p>

    <h3>
      Character Encodings
    </h3>

    <p>
      While XML is based on Unicode, and every XML processor is required to recognize
      UTF-8 and UTF-16 (encodings of Unicode built on 8-bit and 16-bit code units),
      other encodings may be declared in XML documents or entities.
      For the main document, an XML declaration
      may contain an encoding declaration:
    </p>

    <pre>
&lt;?xml version="1.0" encoding="ISO-8859-2"?&gt;
</pre>
    <p>
      External parsed entities may begin with a text declaration, which looks like an
      XML declaration with just an encoding declaration:
    </p>

    <pre>
&lt;?xml encoding="Big5"?&gt;
</pre>
    <p>
      With Expat, you may also specify an encoding at the time of creating a parser.
      This is useful when the encoding information may come from a source outside the
      document itself (such as a higher-level protocol).
    </p>

    <p>
      <a id="builtin_encodings" name="builtin_encodings"></a>There are four built-in
      encodings in Expat:
    </p>

    <ul>
      <li>UTF-8
      </li>

      <li>UTF-16
      </li>

      <li>ISO-8859-1
      </li>

      <li>US-ASCII
      </li>
    </ul>

    <p>
      Anything else discovered in an encoding declaration, or in the protocol encoding
      specified in the parser constructor, triggers a call to the
      <code>UnknownEncodingHandler</code>. This handler gets passed the encoding name
      and a pointer to an <code>XML_Encoding</code> data structure. Your handler must
      fill in this structure and return <code>XML_STATUS_OK</code> if it knows how to
      deal with the encoding. Otherwise the handler should return
      <code>XML_STATUS_ERROR</code>. The handler also gets passed a pointer to an
      optional application data structure that you may indicate when you set the
      handler.
    </p>

    <p>
      Expat places restrictions on character encodings that it can support by filling
      in the <code>XML_Encoding</code> structure.
    </p>

    <ol>
      <li>Every ASCII character that can appear in a well-formed XML document must be
      represented by a single byte, and that byte must correspond to its ASCII
      encoding (except for the characters $@\^'{}~).
      </li>

      <li>Characters must be encoded in 4 bytes or fewer.
      </li>

      <li>All characters encoded must have Unicode scalar values less than or equal to
      65535 (0xFFFF). <em>This does not apply to the built-in support for UTF-16 and
      UTF-8.</em>
      </li>

      <li>No character may be encoded by more than one distinct sequence of bytes.
      </li>
    </ol>

    <p>
      <code>XML_Encoding</code> contains an array of integers that correspond to the
      1st byte of an encoding sequence. If the value in the array for a byte is zero or
      positive, then the byte is a single byte encoding that encodes the Unicode scalar
      value contained in the array. A -1 in this array indicates a malformed byte. If
      the value is -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
      sequence respectively. Multi-byte sequences are sent to the convert function
      pointed at in the <code>XML_Encoding</code> structure. This function should
      return the Unicode scalar value for the sequence or -1 if the sequence is
      malformed.
    </p>

    <p>
      One pitfall that novice Expat users are likely to fall into is that although
      Expat may accept input in various encodings, the strings that it passes to the
      handlers are always encoded in UTF-8 or UTF-16 (depending on how Expat was
      compiled). Your application is responsible for any translation of these strings
      into other encodings.
    </p>

    <h3>
      Handling External Entity References
    </h3>

    <p>
      Expat does not read or parse external entities directly. Note that any external
      DTD is a special case of an external entity.
      If you've set no
      <code>ExternalEntityRefHandler</code>, then external entity references are
      silently ignored. Otherwise, Expat calls your handler with the information needed
      to read and parse the external entity.
    </p>

    <p>
      Your handler isn't actually responsible for parsing the entity, but it is
      responsible for creating a subsidiary parser with <code><a href=
      "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> that
      will do the job. This returns an instance of <code>XML_Parser</code> that has
      handlers and other data structures initialized from the parent parser. You may
      then use <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href=
      "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this parser. Since
      external entities may refer to other external entities, your handler should be
      prepared to be called recursively.
    </p>

    <h3>
      Parsing DTDs
    </h3>

    <p>
      In order to parse parameter entities, before starting the parse, you must call
      <code><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>
      with one of the following arguments:
    </p>

    <dl>
      <dt>
        <code>XML_PARAM_ENTITY_PARSING_NEVER</code>
      </dt>

      <dd>
        Don't parse parameter entities or the external subset.
      </dd>

      <dt>
        <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code>
      </dt>

      <dd>
        Parse parameter entities and the external subset unless <code>standalone</code>
        was set to "yes" in the XML declaration.
      </dd>

      <dt>
        <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code>
      </dt>

      <dd>
        Always parse parameter entities and the external subset.
      </dd>
    </dl>

    <p>
      In order to read an external DTD, you also have to set an external entity
      reference handler as described above.
    </p>

    <h3 id="stop-resume">
      Temporarily Stopping Parsing
    </h3>

    <p>
      Expat 1.95.8 introduces a new feature: it's now possible to stop parsing
      temporarily from within a handler function, even if more data has already been
      passed into the parser. Applications for this include:
    </p>

    <ul>
      <li>Supporting the <a href="https://www.w3.org/TR/xinclude/">XInclude</a>
      specification.
      </li>

      <li>Delaying further processing until additional information is available from
      some other source.
      </li>

      <li>Adjusting processor load as task priorities shift within an application.
      </li>

      <li>Stopping parsing completely (simply free or reset the parser instead of
      resuming in the outer parsing loop). This can be useful if an application-domain
      error is found in the XML being parsed or if the result of the parse is
      determined not to be useful after all.
      </li>
    </ul>

    <p>
      To take advantage of this feature, the main parsing loop of an application needs
      to support this specifically. It cannot be supported with a parsing loop
      compatible with Expat 1.95.7 or earlier (though existing loops will continue to
      work without supporting the stop/resume feature).
    </p>

    <p>
      An application that uses this feature for a single parser will have the rough
      structure (in pseudo-code):
    </p>

    <pre class="pseudocode">
fd = open_input()
p = create_parser()

if parse_xml(p, fd) {
  /* suspended */

  int suspended = 1;

  while (suspended) {
    do_something_else()
    if ready_to_resume() {
      suspended = continue_parsing(p, fd);
    }
  }
}
</pre>
    <p>
      An application that may resume any of several parsers based on input (either from
      the XML being parsed or some other source) will certainly have more interesting
      control structures.
    </p>

    <p>
      This C function could be used for the <code>parse_xml</code> function mentioned
      in the pseudo-code above:
    </p>

    <pre class="eg">
#include &lt;unistd.h&gt; /* read() */
#include &lt;expat.h&gt;

#define BUFF_SIZE 10240

/* Parse a document from the open file descriptor 'fd' until the parse
   is complete (the document has been completely parsed, or there's
   been an error), or the parse is stopped.  Return non-zero when
   the parse is merely suspended.
*/
int
parse_xml(XML_Parser p, int fd)
{
  for (;;) {
    int last_chunk;
    int bytes_read;
    enum XML_Status status;

    void *buff = XML_GetBuffer(p, BUFF_SIZE);
    if (buff == NULL) {
      /* handle error... */
      return 0;
    }
    bytes_read = read(fd, buff, BUFF_SIZE);
    if (bytes_read &lt; 0) {
      /* handle error... */
      return 0;
    }
    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
    switch (status) {
      case XML_STATUS_ERROR:
        /* handle error... */
        return 0;
      case XML_STATUS_SUSPENDED:
        return 1;
    }
    if (bytes_read == 0)
      return 0;
  }
}
</pre>
    <p>
      The corresponding <code>continue_parsing</code> function is somewhat simpler,
      since it only needs to deal with the return code from <code><a href=
      "#XML_ResumeParser">XML_ResumeParser</a></code>; it can delegate the input
      handling to the <code>parse_xml</code> function:
    </p>

    <pre class="eg">
/* Continue parsing a document which had been suspended.  The 'p' and
   'fd' arguments are the same as passed to parse_xml().  Return
   non-zero when the parse is suspended.
*/
int
continue_parsing(XML_Parser p, int fd)
{
  enum XML_Status status = XML_ResumeParser(p);
  switch (status) {
    case XML_STATUS_ERROR:
      /* handle error...  XML_ERROR_NOT_SUSPENDED is an error code, not a
         status: it is reported by XML_GetErrorCode(p) when the parser
         was not suspended */
      return 0;
    case XML_STATUS_SUSPENDED:
      return 1;
    case XML_STATUS_OK:
      break;
  }
  return parse_xml(p, fd);
}
</pre>
    <p>
      Now that we've seen what a mess the top-level parsing loop can become, what have
      we gained? Very simply, we can now use the <code><a href=
      "#XML_StopParser">XML_StopParser</a></code> function to stop parsing, without
      having to go to great lengths to avoid additional processing that we're expecting
      to ignore. As a bonus, we get to stop parsing <em>temporarily</em>, and come back
      to it when we're ready.
    </p>

    <p>
      To stop parsing from a handler function, use the <code><a href=
      "#XML_StopParser">XML_StopParser</a></code> function. This function takes two
      arguments: the parser being stopped and a flag indicating whether the parse can
      be resumed in the future.
    </p>
    <!-- XXX really need more here -->

    <hr />
    <!-- ================================================================ -->

    <h2>
      <a id="reference" name="reference">Expat Reference</a>
    </h2>

    <h3>
      <a id="creation" name="creation">Parser Creation</a>
    </h3>

    <h4 id="XML_ParserCreate">
      XML_ParserCreate
    </h4>

    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate(const XML_Char *encoding);
</pre>
    <div class="fcndef">
      <p>
        Construct a new parser. If <code>encoding</code> is non-<code>NULL</code>, it
        specifies a character encoding to use for the document. This overrides the
        document encoding declaration. There are four built-in encodings:
      </p>

      <ul>
        <li>US-ASCII
        </li>

        <li>UTF-8
        </li>

        <li>UTF-16
        </li>

        <li>ISO-8859-1
        </li>
      </ul>

      <p>
        Any other value will invoke a call to the UnknownEncodingHandler.
      </p>
    </div>

    <h4 id="XML_ParserCreateNS">
      XML_ParserCreateNS
    </h4>

    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encoding,
                   XML_Char sep);
</pre>
    <div class="fcndef">
      Constructs a new parser that has namespace processing in effect. Namespace
      expanded element names and attribute names are returned as a concatenation of the
      namespace URI, <em>sep</em>, and the local part of the name. This means that you
      should pick a character for <em>sep</em> that can't be part of a URI. Since
      Expat does not check namespace URIs for conformance, the only safe choice for a
      namespace separator is a character that is illegal in XML. For instance,
      <code>'\xFF'</code> is not legal in UTF-8, and <code>'\xFFFF'</code> is not legal
      in UTF-16. There is a special case when <em>sep</em> is the null character
      <code>'\0'</code>: the namespace URI and the local part will be concatenated
      without any separator - this is intended to support RDF processors. It is a
      programming error to use the null separator with <a href=
      "#XML_SetReturnNSTriplet">namespace triplets</a>.
    </div>

    <p>
      <strong>Note:</strong> Expat does not validate namespace URIs (beyond encoding)
      against RFC 3986 today (and is not required to do so with regard to the XML 1.0
      namespaces specification) but it may start doing that in future releases. Until
      then, an application using Expat must be ready to receive namespace URIs
      containing non-URI characters.
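    </p>

    <p>
      As a rough illustration of how an application might take an expanded name apart
      again, the sketch below splits it at the last occurrence of the separator. The
      helper <code>split_expanded_name</code> is hypothetical and not part of Expat;
      the sketch assumes that <code>XML_Char</code> is <code>char</code>, that
      namespace triplets are not enabled, that the separator is not the null
      character, and that the separator cannot occur in a local name.
    </p>

    <pre class="eg">
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper, not an Expat function: split "URI sep local" at
   the LAST occurrence of sep.  Returns 1 and reports the URI length and
   the start of the local part when a separator is present; returns 0
   (name is in no namespace) otherwise. */
static int
split_expanded_name(const char *name, char sep,
                    size_t *uri_len, const char **local) {
  const char *p = strrchr(name, sep);

  if (p == NULL) {
    *uri_len = 0;
    *local = name;
    return 0;
  }
  *uri_len = (size_t)(p - name);
  *local = p + 1;
  return 1;
}
</pre>
    <p>
      Searching from the end rather than from the start is a defensive choice here,
      given that Expat does not check the URI part against RFC 3986.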
    </p>

    <h4 id="XML_ParserCreate_MM">
      XML_ParserCreate_MM
    </h4>

    <pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate_MM(const XML_Char *encoding,
                    const XML_Memory_Handling_Suite *ms,
                    const XML_Char *sep);
</pre>

    <pre class="signature">
typedef struct {
  void *(XMLCALL *malloc_fcn)(size_t size);
  void *(XMLCALL *realloc_fcn)(void *ptr, size_t size);
  void (XMLCALL *free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
</pre>
    <div class="fcndef">
      <p>
        Construct a new parser using the suite of memory handling functions specified
        in <code>ms</code>. If <code>ms</code> is <code>NULL</code>, then use the
        standard set of memory management functions. If <code>sep</code> is
        non-<code>NULL</code>, then namespace processing is enabled in the created
        parser and the character pointed at by <code>sep</code> is used as the
        separator between the namespace URI and the local part of the name.
      </p>
    </div>

    <h4 id="XML_ExternalEntityParserCreate">
      XML_ExternalEntityParserCreate
    </h4>

    <pre class="fcndec">
XML_Parser XMLCALL
XML_ExternalEntityParserCreate(XML_Parser p,
                               const XML_Char *context,
                               const XML_Char *encoding);
</pre>
    <div class="fcndef">
      <p>
        Construct a new <code>XML_Parser</code> object for parsing an external general
        entity. <code>context</code> is the context argument passed in a call to an
        ExternalEntityRefHandler. Other state information, such as handlers, user data,
        and namespace processing, is inherited from the parser passed as the 1st
        argument. So you shouldn't need to call any of the behavior changing functions
        on this parser (unless you want it to act differently than the parent parser).
1531 </p> 1532 1533 <p> 1534 <strong>Note:</strong> Please be sure to free subparsers created by 1535 <code><a href= 1536 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1537 <em>prior to</em> freeing their related parent parser, as subparsers reference 1538 and use parts of their respective parent parser, internally. Parent parsers 1539 must outlive subparsers. 1540 </p> 1541 </div> 1542 1543 <h4 id="XML_ParserFree"> 1544 XML_ParserFree 1545 </h4> 1546 1547 <pre class="fcndec"> 1548void XMLCALL 1549XML_ParserFree(XML_Parser p); 1550</pre> 1551 <div class="fcndef"> 1552 <p> 1553 Free memory used by the parser. 1554 </p> 1555 1556 <p> 1557 <strong>Note:</strong> Your application is responsible for freeing any memory 1558 associated with <a href="#userdata">user data</a>. 1559 </p> 1560 1561 <p> 1562 <strong>Note:</strong> Please be sure to free subparsers created by 1563 <code><a href= 1564 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1565 <em>prior to</em> freeing their related parent parser, as subparsers reference 1566 and use parts of their respective parent parser, internally. Parent parsers 1567 must outlive subparsers. 1568 </p> 1569 </div> 1570 1571 <h4 id="XML_ParserReset"> 1572 XML_ParserReset 1573 </h4> 1574 1575 <pre class="fcndec"> 1576XML_Bool XMLCALL 1577XML_ParserReset(XML_Parser p, 1578 const XML_Char *encoding); 1579</pre> 1580 <div class="fcndef"> 1581 Clean up the memory structures maintained by the parser so that it may be used 1582 again. After this has been called, <code>parser</code> is ready to start parsing 1583 a new document. All handlers are cleared from the parser, except for the 1584 unknownEncodingHandler. The parser's external state is re-initialized except for 1585 the values of ns and ns_triplets. 
This function may not be used on a parser 1586 created using <code><a href= 1587 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>; it 1588 will return <code>XML_FALSE</code> in that case. Returns <code>XML_TRUE</code> on 1589 success. Your application is responsible for dealing with any memory associated 1590 with <a href="#userdata">user data</a>. 1591 </div> 1592 1593 <h3> 1594 <a id="parsing" name="parsing">Parsing</a> 1595 </h3> 1596 1597 <p> 1598 To state the obvious: the three parsing functions <code><a href= 1599 "#XML_Parse">XML_Parse</a></code>, <code><a href= 1600 "#XML_ParseBuffer">XML_ParseBuffer</a></code> and <code><a href= 1601 "#XML_GetBuffer">XML_GetBuffer</a></code> must not be called from within a 1602 handler unless they operate on a separate parser instance, that is, one that did 1603 not call the handler. For example, it is OK to call the parsing functions from 1604 within an <code>XML_ExternalEntityRefHandler</code>, if they apply to the parser 1605 created by <code><a href= 1606 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 1607 </p> 1608 1609 <p> 1610 Note: The <code>len</code> argument passed to these functions should be 1611 considerably less than the maximum value for an integer, as it could create an 1612 integer overflow situation if the added lengths of a buffer and the unprocessed 1613 portion of the previous buffer exceed the maximum integer value. Input data at 1614 the end of a buffer will remain unprocessed if it is part of an XML token for 1615 which the end is not part of that buffer. 1616 </p> 1617 1618 <p> 1619 <a id="isFinal" name="isFinal"></a>The application <em>must</em> make a 1620 concluding <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 1621 "#XML_ParseBuffer">XML_ParseBuffer</a></code> call with <code>isFinal</code> set 1622 to <code>XML_TRUE</code>. 
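    </p>

    <p>
      One way to respect both points above (moderate <code>len</code> values and a
      concluding call) is sketched below. To keep the sketch free of dependencies, the
      parsing function is passed in as a callback with the same shape as
      <code>XML_Parse</code>. The names <code>feed_fn</code>, <code>feed_all</code>,
      <code>struct feed_log</code> and <code>counting_feed</code> are hypothetical,
      and <code>counting_feed</code> is only a stand-in that records what it was
      given.
    </p>

    <pre class="eg">
#include &lt;stddef.h&gt;

/* Hypothetical callback with the same shape as XML_Parse;
   returns 0 on error and non-zero otherwise. */
typedef int (*feed_fn)(void *parser, const char *s, int len, int isFinal);

/* Feed 'total' bytes in pieces of at most 'chunk' bytes (chunk &gt; 0) and
   always finish with a call that has isFinal set, even if that last
   piece is empty.  Returns 0 on the first error, 1 otherwise. */
static int
feed_all(void *parser, feed_fn feed, const char *data, size_t total,
         int chunk) {
  size_t off = 0;

  while (total - off &gt; (size_t)chunk) {
    if (! feed(parser, data + off, chunk, 0))
      return 0;
    off += (size_t)chunk;
  }
  /* concluding call: the remainder is at most 'chunk' bytes */
  return feed(parser, data + off, (int)(total - off), 1) != 0;
}

/* Stand-in for a real parser: only records what it was given. */
struct feed_log {
  int calls;
  int finals;
  size_t bytes;
};

static int
counting_feed(void *parser, const char *s, int len, int isFinal) {
  struct feed_log *log = (struct feed_log *)parser;

  (void)s;
  log-&gt;calls++;
  log-&gt;finals += isFinal;
  log-&gt;bytes += (size_t)len;
  return 1;
}
</pre>
    <p>
      With Expat, the callback would presumably be a thin wrapper around
      <code><a href="#XML_Parse">XML_Parse</a></code>, and <code>chunk</code> would be
      chosen well below the maximum value for an integer.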
1623 </p> 1624 1625 <h4 id="XML_Parse"> 1626 XML_Parse 1627 </h4> 1628 1629 <pre class="fcndec"> 1630enum XML_Status XMLCALL 1631XML_Parse(XML_Parser p, 1632 const char *s, 1633 int len, 1634 int isFinal); 1635</pre> 1636 1637 <pre class="signature"> 1638enum XML_Status { 1639 XML_STATUS_ERROR = 0, 1640 XML_STATUS_OK = 1 1641}; 1642</pre> 1643 <div class="fcndef"> 1644 <p> 1645 Parse some more of the document. The string <code>s</code> is a buffer 1646 containing part (or perhaps all) of the document. The number of bytes of s that 1647 are part of the document is indicated by <code>len</code>. This means that 1648 <code>s</code> doesn't have to be null-terminated. It also means that if 1649 <code>len</code> is larger than the number of bytes in the block of memory that 1650 <code>s</code> points at, then a memory fault is likely. Negative values for 1651 <code>len</code> are rejected since Expat 2.2.1. The <code>isFinal</code> 1652 parameter informs the parser that this is the last piece of the document. 1653 Frequently, the last piece is empty (i.e. <code>len</code> is zero.) 1654 </p> 1655 1656 <p> 1657 If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>. Otherwise 1658 it returns <code>XML_STATUS_OK</code> value. Note that regardless of the return 1659 value, there is no guarantee that all provided input has been parsed; only 1660 after <a href="#isFinal">the concluding call</a> will all handler callbacks and 1661 parsing errors have happened. 1662 </p> 1663 1664 <p> 1665 Simplified, <code>XML_Parse</code> can be considered a convenience wrapper that 1666 is pairing calls to <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> and 1667 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> (when Expat is 1668 built with macro <code>XML_CONTEXT_BYTES</code> defined to a positive value, 1669 which is both common and default). 
<code>XML_Parse</code> is then functionally 1670 equivalent to calling <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code>, 1671 <code>memcpy</code>, and <code><a href= 1672 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1673 </p> 1674 1675 <p> 1676 To avoid double copying of the input, direct use of functions <code><a href= 1677 "#XML_GetBuffer">XML_GetBuffer</a></code> and <code><a href= 1678 "#XML_ParseBuffer">XML_ParseBuffer</a></code> is advised for most production 1679 use, e.g. if you're using <code>read</code> or similar functionality to fill 1680 your buffers, fill directly into the buffer from <code><a href= 1681 "#XML_GetBuffer">XML_GetBuffer</a></code>, then parse with <code><a href= 1682 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1683 </p> 1684 </div> 1685 1686 <h4 id="XML_ParseBuffer"> 1687 XML_ParseBuffer 1688 </h4> 1689 1690 <pre class="fcndec"> 1691enum XML_Status XMLCALL 1692XML_ParseBuffer(XML_Parser p, 1693 int len, 1694 int isFinal); 1695</pre> 1696 <div class="fcndef"> 1697 <p> 1698 This is just like <code><a href="#XML_Parse">XML_Parse</a></code>, except in 1699 this case Expat provides the buffer. By obtaining the buffer from Expat with 1700 the <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> function, the 1701 application can avoid double copying of the input. 1702 </p> 1703 1704 <p> 1705 Negative values for <code>len</code> are rejected since Expat 2.6.3. 1706 </p> 1707 </div> 1708 1709 <h4 id="XML_GetBuffer"> 1710 XML_GetBuffer 1711 </h4> 1712 1713 <pre class="fcndec"> 1714void * XMLCALL 1715XML_GetBuffer(XML_Parser p, 1716 int len); 1717</pre> 1718 <div class="fcndef"> 1719 Obtain a buffer of size <code>len</code> to read a piece of the document into. A 1720 <code>NULL</code> value is returned if Expat can't allocate enough memory for 1721 this buffer. A <code>NULL</code> value may also be returned if <code>len</code> 1722 is zero. 
This has to be called prior to every call to <code><a href= 1723 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. A typical use would look like 1724 this: 1725 1726 <pre class="eg"> 1727for (;;) { 1728 int bytes_read; 1729 void *buff = XML_GetBuffer(p, BUFF_SIZE); 1730 if (buff == NULL) { 1731 /* handle error */ 1732 } 1733 1734 bytes_read = read(docfd, buff, BUFF_SIZE); 1735 if (bytes_read &lt; 0) { 1736 /* handle error */ 1737 } 1738 1739 if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) { 1740 /* handle parse error */ 1741 } 1742 1743 if (bytes_read == 0) 1744 break; 1745} 1746</pre> 1747 </div> 1748 1749 <h4 id="XML_StopParser"> 1750 XML_StopParser 1751 </h4> 1752 1753 <pre class="fcndec"> 1754enum XML_Status XMLCALL 1755XML_StopParser(XML_Parser p, 1756 XML_Bool resumable); 1757</pre> 1758 <div class="fcndef"> 1759 <p> 1760 Stops parsing, causing <code><a href="#XML_Parse">XML_Parse</a></code> or 1761 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> to return. Must be 1762 called from within a call-back handler, except when aborting (when 1763 <code>resumable</code> is <code>XML_FALSE</code>) an already suspended parser. 1764 Some call-backs may still follow because they would otherwise get lost, 1765 including 1766 </p> 1767 1768 <ul> 1769 <li>the end element handler for empty elements when stopped in the start 1770 element handler, 1771 </li> 1772 1773 <li>the end namespace declaration handler when stopped in the end element 1774 handler, 1775 </li> 1776 1777 <li>the character data handler when stopped in the character data handler while 1778 making multiple call-backs on a contiguous chunk of characters, 1779 </li> 1780 </ul> 1781 1782 <p> 1783 and possibly others. 1784 </p> 1785 1786 <p> 1787 This can be called from most handlers, including DTD related call-backs, except 1788 when parsing an external parameter entity and <code>resumable</code> is 1789 <code>XML_TRUE</code>. 
Returns <code>XML_STATUS_OK</code> when successful, 1790 <code>XML_STATUS_ERROR</code> otherwise. The possible error codes are: 1791 </p> 1792 1793 <dl> 1794 <dt> 1795 <code>XML_ERROR_NOT_STARTED</code> 1796 </dt> 1797 1798 <dd> 1799 when stopping or suspending a parser before it has started, added in Expat 1800 2.6.4. 1801 </dd> 1802 1803 <dt> 1804 <code>XML_ERROR_SUSPENDED</code> 1805 </dt> 1806 1807 <dd> 1808 when suspending an already suspended parser. 1809 </dd> 1810 1811 <dt> 1812 <code>XML_ERROR_FINISHED</code> 1813 </dt> 1814 1815 <dd> 1816 when the parser has already finished. 1817 </dd> 1818 1819 <dt> 1820 <code>XML_ERROR_SUSPEND_PE</code> 1821 </dt> 1822 1823 <dd> 1824 when suspending while parsing an external PE. 1825 </dd> 1826 </dl> 1827 1828 <p> 1829 Since the stop/resume feature requires application support in the outer parsing 1830 loop, it is an error to call this function for a parser not being handled 1831 appropriately; see <a href="#stop-resume">Temporarily Stopping Parsing</a> for 1832 more information. 1833 </p> 1834 1835 <p> 1836 When <code>resumable</code> is <code>XML_TRUE</code> then parsing is 1837 <em>suspended</em>, that is, <code><a href="#XML_Parse">XML_Parse</a></code> 1838 and <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1839 <code>XML_STATUS_SUSPENDED</code>. Otherwise, parsing is <em>aborted</em>, that 1840 is, <code><a href="#XML_Parse">XML_Parse</a></code> and <code><a href= 1841 "#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1842 <code>XML_STATUS_ERROR</code> with error code <code>XML_ERROR_ABORTED</code>. 1843 </p> 1844 1845 <p> 1846 <strong>Note:</strong> This will be applied to the current parser instance 1847 only, that is, if there is a parent parser then it will continue parsing when 1848 the external entity reference handler returns. 
It is up to the implementation 1849 of that handler to call <code><a href= 1850 "#XML_StopParser">XML_StopParser</a></code> on the parent parser (recursively), 1851 if one wants to stop parsing altogether. 1852 </p> 1853 1854 <p> 1855 When suspended, parsing can be resumed by calling <code><a href= 1856 "#XML_ResumeParser">XML_ResumeParser</a></code>. 1857 </p> 1858 1859 <p> 1860 New in Expat 1.95.8. 1861 </p> 1862 </div> 1863 1864 <h4 id="XML_ResumeParser"> 1865 XML_ResumeParser 1866 </h4> 1867 1868 <pre class="fcndec"> 1869enum XML_Status XMLCALL 1870XML_ResumeParser(XML_Parser p); 1871</pre> 1872 <div class="fcndef"> 1873 <p> 1874 Resumes parsing after it has been suspended with <code><a href= 1875 "#XML_StopParser">XML_StopParser</a></code>. Must not be called from within a 1876 handler call-back. Returns same status codes as <code><a href= 1877 "#XML_Parse">XML_Parse</a></code> or <code><a href= 1878 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. An additional error code, 1879 <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the parser was not 1880 currently suspended. 1881 </p> 1882 1883 <p> 1884 <strong>Note:</strong> This must be called on the most deeply nested child 1885 parser instance first, and on its parent parser only after the child parser has 1886 finished, to be applied recursively until the document entity's parser is 1887 restarted. That is, the parent parser will not resume by itself and it is up to 1888 the application to call <code><a href= 1889 "#XML_ResumeParser">XML_ResumeParser</a></code> on it at the appropriate 1890 moment. 1891 </p> 1892 1893 <p> 1894 New in Expat 1.95.8. 
1895 </p> 1896 </div> 1897 1898 <h4 id="XML_GetParsingStatus"> 1899 XML_GetParsingStatus 1900 </h4> 1901 1902 <pre class="fcndec"> 1903void XMLCALL 1904XML_GetParsingStatus(XML_Parser p, 1905 XML_ParsingStatus *status); 1906</pre> 1907 1908 <pre class="signature"> 1909enum XML_Parsing { 1910 XML_INITIALIZED, 1911 XML_PARSING, 1912 XML_FINISHED, 1913 XML_SUSPENDED 1914}; 1915 1916typedef struct { 1917 enum XML_Parsing parsing; 1918 XML_Bool finalBuffer; 1919} XML_ParsingStatus; 1920</pre> 1921 <div class="fcndef"> 1922 <p> 1923 Returns status of parser with respect to being initialized, parsing, finished, 1924 or suspended, and whether the final buffer is being processed. The 1925 <code>status</code> parameter <em>must not</em> be <code>NULL</code>. 1926 </p> 1927 1928 <p> 1929 New in Expat 1.95.8. 1930 </p> 1931 </div> 1932 1933 <h3> 1934 <a id="setting" name="setting">Handler Setting</a> 1935 </h3> 1936 1937 <p> 1938 Although handlers are typically set prior to parsing and left alone, an 1939 application may choose to set or change the handler for a parsing event while the 1940 parse is in progress. For instance, your application may choose to ignore all 1941 text not descended from a <code>para</code> element. One way it could do this is 1942 to set the character handler when a para start tag is seen, and unset it for the 1943 corresponding end tag. 1944 </p> 1945 1946 <p> 1947 A handler may be <em>unset</em> by providing a <code>NULL</code> pointer to the 1948 appropriate handler setter. None of the handler setting functions have a return 1949 value. 1950 </p> 1951 1952 <p> 1953 Your handlers will be receiving strings in arrays of type <code>XML_Char</code>. 1954 This type is conditionally defined in expat.h as either <code>char</code>, 1955 <code>wchar_t</code> or <code>unsigned short</code>. The former implies UTF-8 1956 encoding, the latter two imply UTF-16 encoding. 
Note that you'll receive them in 1957 this form independent of the original encoding of the document. 1958 </p> 1959 1960 <div class="handler"> 1961 <h4 id="XML_SetStartElementHandler"> 1962 XML_SetStartElementHandler 1963 </h4> 1964 1965 <pre class="setter"> 1966void XMLCALL 1967XML_SetStartElementHandler(XML_Parser p, 1968 XML_StartElementHandler start); 1969</pre> 1970 1971 <pre class="signature"> 1972typedef void 1973(XMLCALL *XML_StartElementHandler)(void *userData, 1974 const XML_Char *name, 1975 const XML_Char **atts); 1976</pre> 1977 <p> 1978 Set handler for start (and empty) tags. Attributes are passed to the start 1979 handler as a pointer to a vector of char pointers. Each attribute seen in a 1980 start (or empty) tag occupies 2 consecutive places in this vector: the 1981 attribute name followed by the attribute value. These pairs are terminated by a 1982 <code>NULL</code> pointer. 1983 </p> 1984 1985 <p> 1986 Note that an empty tag generates a call to both start and end handlers (in that 1987 order). 1988 </p> 1989 </div> 1990 1991 <div class="handler"> 1992 <h4 id="XML_SetEndElementHandler"> 1993 XML_SetEndElementHandler 1994 </h4> 1995 1996 <pre class="setter"> 1997void XMLCALL 1998XML_SetEndElementHandler(XML_Parser p, 1999 XML_EndElementHandler); 2000</pre> 2001 2002 <pre class="signature"> 2003typedef void 2004(XMLCALL *XML_EndElementHandler)(void *userData, 2005 const XML_Char *name); 2006</pre> 2007 <p> 2008 Set handler for end (and empty) tags. As noted above, an empty tag generates a 2009 call to both start and end handlers. 2010 </p> 2011 </div> 2012 2013 <div class="handler"> 2014 <h4 id="XML_SetElementHandler"> 2015 XML_SetElementHandler 2016 </h4> 2017 2018 <pre class="setter"> 2019void XMLCALL 2020XML_SetElementHandler(XML_Parser p, 2021 XML_StartElementHandler start, 2022 XML_EndElementHandler end); 2023</pre> 2024 <p> 2025 Set handlers for start and end tags with one call. 
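      </p>

      <p>
        As a minimal sketch of how a start handler might walk the attribute vector
        described under <code>XML_SetStartElementHandler</code>, the helpers below step
        through <code>atts</code> two entries at a time until the terminating
        <code>NULL</code>. The names <code>count_attributes</code> and
        <code>find_attribute</code> are hypothetical, and the local
        <code>XML_Char</code> typedef only stands in for the one from
        <code>expat.h</code> in a UTF-8 build.
      </p>

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the expat.h typedef in a UTF-8 build (assumption). */
typedef char XML_Char;

/* Hypothetical helper: count the name/value pairs in a
   NULL-terminated attribute vector. */
size_t
count_attributes(const XML_Char **atts) {
  size_t n = 0;
  while (atts[2 * n] != NULL)
    n++;
  return n;
}

/* Hypothetical helper: return the value of attribute `name`,
   or NULL if the element does not carry it. */
const XML_Char *
find_attribute(const XML_Char **atts, const XML_Char *name) {
  size_t i;
  /* Even slots hold names, odd slots hold the matching values. */
  for (i = 0; atts[i] != NULL; i += 2) {
    if (strcmp(atts[i], name) == 0)
      return atts[i + 1];
  }
  return NULL;
}
```

      <p>
        A start handler could call <code>find_attribute(atts, "id")</code> to look up a
        single attribute without depending on the order in which attributes were
        written.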
2026 </p> 2027 </div> 2028 2029 <div class="handler"> 2030 <h4 id="XML_SetCharacterDataHandler"> 2031 XML_SetCharacterDataHandler 2032 </h4> 2033 2034 <pre class="setter"> 2035void XMLCALL 2036XML_SetCharacterDataHandler(XML_Parser p, 2037 XML_CharacterDataHandler charhndl) 2038</pre> 2039 2040 <pre class="signature"> 2041typedef void 2042(XMLCALL *XML_CharacterDataHandler)(void *userData, 2043 const XML_Char *s, 2044 int len); 2045</pre> 2046 <p> 2047 Set a text handler. The string your handler receives is <em>NOT 2048 null-terminated</em>. You have to use the length argument to deal with the end 2049 of the string. A single block of contiguous text free of markup may still 2050 result in a sequence of calls to this handler. In other words, if you're 2051 searching for a pattern in the text, it may be split across calls to this 2052 handler. Note: Setting this handler to <code>NULL</code> may <em>NOT 2053 immediately</em> terminate call-backs if the parser is currently processing 2054 such a single block of contiguous markup-free text, as the parser will continue 2055 calling back until the end of the block is reached. 2056 </p> 2057 </div> 2058 2059 <div class="handler"> 2060 <h4 id="XML_SetProcessingInstructionHandler"> 2061 XML_SetProcessingInstructionHandler 2062 </h4> 2063 2064 <pre class="setter"> 2065void XMLCALL 2066XML_SetProcessingInstructionHandler(XML_Parser p, 2067 XML_ProcessingInstructionHandler proc) 2068</pre> 2069 2070 <pre class="signature"> 2071typedef void 2072(XMLCALL *XML_ProcessingInstructionHandler)(void *userData, 2073 const XML_Char *target, 2074 const XML_Char *data); 2075 2076</pre> 2077 <p> 2078 Set a handler for processing instructions. The target is the first word in the 2079 processing instruction. The data is the rest of the characters in it after 2080 skipping all whitespace after the initial word. 
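      </p>

      <p>
        Because the text passed to a character data handler (see
        <code>XML_SetCharacterDataHandler</code>) is not null-terminated and may arrive
        in several pieces, applications often accumulate it before using it. The
        following is a rough sketch under stated assumptions: <code>TextBuffer</code>,
        <code>TEXT_CAPACITY</code> and <code>append_text</code> are hypothetical names,
        the buffer has a fixed size, and the <code>XML_Char</code> typedef stands in for
        the one from <code>expat.h</code> in a UTF-8 build.
      </p>

```c
#include <string.h>

/* Stand-in for the expat.h typedef in a UTF-8 build (assumption). */
typedef char XML_Char;

#define TEXT_CAPACITY 256

/* Hypothetical accumulator that would be passed as user data. */
typedef struct {
  XML_Char data[TEXT_CAPACITY]; /* always kept null-terminated */
  size_t length;                /* bytes stored, excluding the terminator */
  int truncated;                /* set when input did not fit */
} TextBuffer;

/* Has the parameter list of XML_CharacterDataHandler; a real handler
   would also carry the XMLCALL calling-convention macro. */
void
append_text(void *userData, const XML_Char *s, int len) {
  TextBuffer *buf = (TextBuffer *)userData;
  size_t room = TEXT_CAPACITY - 1 - buf->length;
  size_t n;

  if (len <= 0)
    return;
  n = (size_t)len;
  if (n > room) {
    n = room;
    buf->truncated = 1;
  }
  /* Copy exactly n bytes; s is NOT null-terminated. */
  memcpy(buf->data + buf->length, s, n);
  buf->length += n;
  buf->data[buf->length] = '\0';
}
```

      <p>
        With the real library, a pointer to such a buffer would be registered as user
        data, and pattern matching would be done on the accumulated text rather than on
        the individual pieces.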
2081 </p> 2082 </div> 2083 2084 <div class="handler"> 2085 <h4 id="XML_SetCommentHandler"> 2086 XML_SetCommentHandler 2087 </h4> 2088 2089 <pre class="setter"> 2090void XMLCALL 2091XML_SetCommentHandler(XML_Parser p, 2092 XML_CommentHandler cmnt) 2093</pre> 2094 2095 <pre class="signature"> 2096typedef void 2097(XMLCALL *XML_CommentHandler)(void *userData, 2098 const XML_Char *data); 2099</pre> 2100 <p> 2101 Set a handler for comments. The data is all text inside the comment delimiters. 2102 </p> 2103 </div> 2104 2105 <div class="handler"> 2106 <h4 id="XML_SetStartCdataSectionHandler"> 2107 XML_SetStartCdataSectionHandler 2108 </h4> 2109 2110 <pre class="setter"> 2111void XMLCALL 2112XML_SetStartCdataSectionHandler(XML_Parser p, 2113 XML_StartCdataSectionHandler start); 2114</pre> 2115 2116 <pre class="signature"> 2117typedef void 2118(XMLCALL *XML_StartCdataSectionHandler)(void *userData); 2119</pre> 2120 <p> 2121 Set a handler that gets called at the beginning of a CDATA section. 2122 </p> 2123 </div> 2124 2125 <div class="handler"> 2126 <h4 id="XML_SetEndCdataSectionHandler"> 2127 XML_SetEndCdataSectionHandler 2128 </h4> 2129 2130 <pre class="setter"> 2131void XMLCALL 2132XML_SetEndCdataSectionHandler(XML_Parser p, 2133 XML_EndCdataSectionHandler end); 2134</pre> 2135 2136 <pre class="signature"> 2137typedef void 2138(XMLCALL *XML_EndCdataSectionHandler)(void *userData); 2139</pre> 2140 <p> 2141 Set a handler that gets called at the end of a CDATA section. 2142 </p> 2143 </div> 2144 2145 <div class="handler"> 2146 <h4 id="XML_SetCdataSectionHandler"> 2147 XML_SetCdataSectionHandler 2148 </h4> 2149 2150 <pre class="setter"> 2151void XMLCALL 2152XML_SetCdataSectionHandler(XML_Parser p, 2153 XML_StartCdataSectionHandler start, 2154 XML_EndCdataSectionHandler end) 2155</pre> 2156 <p> 2157 Sets both CDATA section handlers with one call. 
2158 </p> 2159 </div> 2160 2161 <div class="handler"> 2162 <h4 id="XML_SetDefaultHandler"> 2163 XML_SetDefaultHandler 2164 </h4> 2165 2166 <pre class="setter"> 2167void XMLCALL 2168XML_SetDefaultHandler(XML_Parser p, 2169 XML_DefaultHandler hndl) 2170</pre> 2171 2172 <pre class="signature"> 2173typedef void 2174(XMLCALL *XML_DefaultHandler)(void *userData, 2175 const XML_Char *s, 2176 int len); 2177</pre> 2178 <p> 2179 Sets a handler for any characters in the document which wouldn't otherwise be 2180 handled. This includes both data for which no handlers can be set (like some 2181 kinds of DTD declarations) and data which could be reported but which currently 2182 has no handler set. The characters are passed exactly as they were present in 2183 the XML document except that they will be encoded in UTF-8 or UTF-16. Line 2184 boundaries are not normalized. Note that a byte order mark character is not 2185 passed to the default handler. There are no guarantees about how characters are 2186 divided between calls to the default handler: for example, a comment might be 2187 split between multiple calls. Setting the handler with this call has the side 2188 effect of turning off expansion of references to internally defined general 2189 entities. Instead these references are passed to the default handler. 2190 </p> 2191 2192 <p> 2193 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>. 2194 </p> 2195 </div> 2196 2197 <div class="handler"> 2198 <h4 id="XML_SetDefaultHandlerExpand"> 2199 XML_SetDefaultHandlerExpand 2200 </h4> 2201 2202 <pre class="setter"> 2203void XMLCALL 2204XML_SetDefaultHandlerExpand(XML_Parser p, 2205 XML_DefaultHandler hndl) 2206</pre> 2207 2208 <pre class="signature"> 2209typedef void 2210(XMLCALL *XML_DefaultHandler)(void *userData, 2211 const XML_Char *s, 2212 int len); 2213</pre> 2214 <p> 2215 This sets a default handler, but doesn't inhibit the expansion of internal 2216 entity references. 
        The entity reference will not be passed to the default
        handler.
      </p>

      <p>
        See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetExternalEntityRefHandler">
        XML_SetExternalEntityRefHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetExternalEntityRefHandler(XML_Parser p,
                                XML_ExternalEntityRefHandler hndl);
</pre>

      <pre class="signature">
typedef int
(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p,
                                        const XML_Char *context,
                                        const XML_Char *base,
                                        const XML_Char *systemId,
                                        const XML_Char *publicId);
</pre>
      <p>
        Set an external entity reference handler. This handler is also called for
        processing an external DTD subset if parameter entity parsing is in effect.
        (See <a href=
        "#XML_SetParamEntityParsing"><code>XML_SetParamEntityParsing</code></a>.)
      </p>

      <p>
        <strong>Warning:</strong> Using an external entity reference handler can lead
        to <a href="https://libexpat.github.io/doc/xml-security/#external-entities">XXE
        vulnerabilities</a>. It should only be used in applications that do not parse
        untrusted XML input.
      </p>

      <p>
        The <code>context</code> parameter specifies the parsing context in the format
        expected by the <code>context</code> argument to <code><a href=
        "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>.
        <code>context</code> is valid only until the handler returns, so if the
        referenced entity is to be parsed later, it must be copied.
        <code>context</code> is <code>NULL</code> only when the entity is a parameter
        entity, which is how one can differentiate between general and parameter
        entities.
      </p>

      <p>
        The <code>base</code> parameter is the base to use for relative system
        identifiers.
It is set by <code><a href="#XML_SetBase">XML_SetBase</a></code> 2271 and may be <code>NULL</code>. The <code>publicId</code> parameter is the public 2272 id given in the entity declaration and may be <code>NULL</code>. 2273 <code>systemId</code> is the system identifier specified in the entity 2274 declaration and is never <code>NULL</code>. 2275 </p> 2276 2277 <p> 2278 There are a couple of ways in which this handler differs from others. First, 2279 this handler returns a status indicator (an integer). 2280 <code>XML_STATUS_OK</code> should be returned for successful handling of the 2281 external entity reference. Returning <code>XML_STATUS_ERROR</code> indicates 2282 failure, and causes the calling parser to return an 2283 <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error. 2284 </p> 2285 2286 <p> 2287 Second, instead of having the user data as its first argument, it receives the 2288 parser that encountered the entity reference. This, along with the context 2289 parameter, may be used as arguments to a call to <code><a href= 2290 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 2291 Using the returned parser, the body of the external entity can be recursively 2292 parsed. 2293 </p> 2294 2295 <p> 2296 Since this handler may be called recursively, it should not be saving 2297 information into global or static variables. 2298 </p> 2299 </div> 2300 2301 <h4 id="XML_SetExternalEntityRefHandlerArg"> 2302 XML_SetExternalEntityRefHandlerArg 2303 </h4> 2304 2305 <pre class="fcndec"> 2306void XMLCALL 2307XML_SetExternalEntityRefHandlerArg(XML_Parser p, 2308 void *arg) 2309</pre> 2310 <div class="fcndef"> 2311 <p> 2312 Set the argument passed to the ExternalEntityRefHandler. 
If <code>arg</code> is 2313 not <code>NULL</code>, it is the new value passed to the handler set using 2314 <code><a href= 2315 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>; 2316 if <code>arg</code> is <code>NULL</code>, the argument passed to the handler 2317 function will be the parser object itself. 2318 </p> 2319 2320 <p> 2321 <strong>Note:</strong> The type of <code>arg</code> and the type of the first 2322 argument to the ExternalEntityRefHandler do not match. This function takes a 2323 <code>void *</code> to be passed to the handler, while the handler accepts an 2324 <code>XML_Parser</code>. This is a historical accident, but will not be 2325 corrected before Expat 2.0 (at the earliest) to avoid causing compiler warnings 2326 for code that's known to work with this API. It is the responsibility of the 2327 application code to know the actual type of the argument passed to the handler 2328 and to manage it properly. 2329 </p> 2330 </div> 2331 2332 <div class="handler"> 2333 <h4 id="XML_SetSkippedEntityHandler"> 2334 XML_SetSkippedEntityHandler 2335 </h4> 2336 2337 <pre class="setter"> 2338void XMLCALL 2339XML_SetSkippedEntityHandler(XML_Parser p, 2340 XML_SkippedEntityHandler handler) 2341</pre> 2342 2343 <pre class="signature"> 2344typedef void 2345(XMLCALL *XML_SkippedEntityHandler)(void *userData, 2346 const XML_Char *entityName, 2347 int is_parameter_entity); 2348</pre> 2349 <p> 2350 Set a skipped entity handler. This is called in two situations: 2351 </p> 2352 2353 <ol> 2354 <li>An entity reference is encountered for which no declaration has been read 2355 <em>and</em> this is not an error. 2356 </li> 2357 2358 <li>An internal entity reference is read, but not expanded, because <a href= 2359 "#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a> has been 2360 called. 
        </li>
      </ol>

      <p>
        The <code>is_parameter_entity</code> argument will be non-zero for a parameter
        entity and zero for a general entity.
      </p>

      <p>
        Note: Skipped parameter entities in declarations and skipped general entities
        in attribute values cannot be reported, because the event would be out of sync
        with the reporting of the declarations or attribute values.
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetUnknownEncodingHandler">
        XML_SetUnknownEncodingHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetUnknownEncodingHandler(XML_Parser p,
                              XML_UnknownEncodingHandler enchandler,
                              void *encodingHandlerData);
</pre>

      <pre class="signature">
typedef int
(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData,
                                      const XML_Char *name,
                                      XML_Encoding *info);

typedef struct {
  int map[256];
  void *data;
  int (XMLCALL *convert)(void *data, const char *s);
  void (XMLCALL *release)(void *data);
} XML_Encoding;
</pre>
      <p>
        Set a handler to deal with encodings other than the <a href=
        "#builtin_encodings">built in set</a>. This should be done before
        <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href=
        "#XML_ParseBuffer">XML_ParseBuffer</a></code> has been called on the given
        parser.
      </p>

      <p>
        If the handler knows how to deal with an encoding with the given name, it
        should fill in the <code>info</code> data structure and return
        <code>XML_STATUS_OK</code>. Otherwise it should return
        <code>XML_STATUS_ERROR</code>. The handler will be called at most once per
        parsed (external) entity. The optional application data pointer
        <code>encodingHandlerData</code> will be passed back to the handler.
      </p>

      <p>
        The map array contains information for every possible leading byte in a byte
        sequence.
If the corresponding value is &gt;= 0, then it's a single byte 2421 sequence and the byte encodes that Unicode value. If the value is -1, then that 2422 byte is invalid as the initial byte in a sequence. If the value is -n, where n 2423 is an integer &gt; 1, then n is the number of bytes in the sequence and the 2424 actual conversion is accomplished by a call to the function pointed at by 2425 convert. This function may return -1 if the sequence itself is invalid. The 2426 convert pointer may be <code>NULL</code> if there are only single byte codes. 2427 The data parameter passed to the convert function is the data pointer from 2428 <code>XML_Encoding</code>. The string s is <em>NOT</em> null-terminated and 2429 points at the sequence of bytes to be converted. 2430 </p> 2431 2432 <p> 2433 The function pointed at by <code>release</code> is called by the parser when it 2434 is finished with the encoding. It may be <code>NULL</code>. 2435 </p> 2436 </div> 2437 2438 <div class="handler"> 2439 <h4 id="XML_SetStartNamespaceDeclHandler"> 2440 XML_SetStartNamespaceDeclHandler 2441 </h4> 2442 2443 <pre class="setter"> 2444void XMLCALL 2445XML_SetStartNamespaceDeclHandler(XML_Parser p, 2446 XML_StartNamespaceDeclHandler start); 2447</pre> 2448 2449 <pre class="signature"> 2450typedef void 2451(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData, 2452 const XML_Char *prefix, 2453 const XML_Char *uri); 2454</pre> 2455 <p> 2456 Set a handler to be called when a namespace is declared. Namespace declarations 2457 occur inside start tags. But the namespace declaration start handler is called 2458 before the start tag handler for each namespace declared in that start tag. 
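      </p>

      <p>
        The encoding of the <code>map</code> array described under
        <code>XML_SetUnknownEncodingHandler</code> can be illustrated with a small
        sketch. It does not call Expat; <code>sequence_length</code> and
        <code>fill_ascii_only_map</code> are hypothetical helpers that operate on a
        plain <code>int[256]</code> laid out the way the <code>map</code> member is
        documented.
      </p>

```c
/* Hypothetical helper: interpret one map entry for a leading byte.
   Returns 1 for a single-byte code, n for an n-byte sequence
   (map value -n), and 0 if the byte cannot start a sequence (-1). */
int
sequence_length(const int map[256], unsigned char lead) {
  int v = map[lead];
  if (v >= 0)
    return 1; /* v itself is the Unicode value */
  if (v == -1)
    return 0; /* invalid as an initial byte */
  return -v;  /* multi-byte: convert() would do the decoding */
}

/* Hypothetical example encoding: bytes 0x00..0x7F map to the same
   code points, every other byte is rejected. */
void
fill_ascii_only_map(int map[256]) {
  int i;
  for (i = 0; i < 256; i++)
    map[i] = (i < 128) ? i : -1;
}
```

      <p>
        An encoding with only single-byte codes, like this one, needs no
        <code>convert</code> function.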
2459 </p> 2460 </div> 2461 2462 <div class="handler"> 2463 <h4 id="XML_SetEndNamespaceDeclHandler"> 2464 XML_SetEndNamespaceDeclHandler 2465 </h4> 2466 2467 <pre class="setter"> 2468void XMLCALL 2469XML_SetEndNamespaceDeclHandler(XML_Parser p, 2470 XML_EndNamespaceDeclHandler end); 2471</pre> 2472 2473 <pre class="signature"> 2474typedef void 2475(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData, 2476 const XML_Char *prefix); 2477</pre> 2478 <p> 2479 Set a handler to be called when leaving the scope of a namespace declaration. 2480 This will be called, for each namespace declaration, after the handler for the 2481 end tag of the element in which the namespace was declared. 2482 </p> 2483 </div> 2484 2485 <div class="handler"> 2486 <h4 id="XML_SetNamespaceDeclHandler"> 2487 XML_SetNamespaceDeclHandler 2488 </h4> 2489 2490 <pre class="setter"> 2491void XMLCALL 2492XML_SetNamespaceDeclHandler(XML_Parser p, 2493 XML_StartNamespaceDeclHandler start, 2494 XML_EndNamespaceDeclHandler end) 2495</pre> 2496 <p> 2497 Sets both namespace declaration handlers with a single call. 2498 </p> 2499 </div> 2500 2501 <div class="handler"> 2502 <h4 id="XML_SetXmlDeclHandler"> 2503 XML_SetXmlDeclHandler 2504 </h4> 2505 2506 <pre class="setter"> 2507void XMLCALL 2508XML_SetXmlDeclHandler(XML_Parser p, 2509 XML_XmlDeclHandler xmldecl); 2510</pre> 2511 2512 <pre class="signature"> 2513typedef void 2514(XMLCALL *XML_XmlDeclHandler)(void *userData, 2515 const XML_Char *version, 2516 const XML_Char *encoding, 2517 int standalone); 2518</pre> 2519 <p> 2520 Sets a handler that is called for XML declarations and also for text 2521 declarations discovered in external entities. The way to distinguish is that 2522 the <code>version</code> parameter will be <code>NULL</code> for text 2523 declarations. The <code>encoding</code> parameter may be <code>NULL</code> for 2524 an XML declaration. 
The <code>standalone</code> argument will contain -1, 0, or 2525 1 indicating respectively that there was no standalone parameter in the 2526 declaration, that it was given as no, or that it was given as yes. 2527 </p> 2528 </div> 2529 2530 <div class="handler"> 2531 <h4 id="XML_SetStartDoctypeDeclHandler"> 2532 XML_SetStartDoctypeDeclHandler 2533 </h4> 2534 2535 <pre class="setter"> 2536void XMLCALL 2537XML_SetStartDoctypeDeclHandler(XML_Parser p, 2538 XML_StartDoctypeDeclHandler start); 2539</pre> 2540 2541 <pre class="signature"> 2542typedef void 2543(XMLCALL *XML_StartDoctypeDeclHandler)(void *userData, 2544 const XML_Char *doctypeName, 2545 const XML_Char *sysid, 2546 const XML_Char *pubid, 2547 int has_internal_subset); 2548</pre> 2549 <p> 2550 Set a handler that is called at the start of a DOCTYPE declaration, before any 2551 external or internal subset is parsed. Both <code>sysid</code> and 2552 <code>pubid</code> may be <code>NULL</code>. The 2553 <code>has_internal_subset</code> will be non-zero if the DOCTYPE declaration 2554 has an internal subset. 2555 </p> 2556 </div> 2557 2558 <div class="handler"> 2559 <h4 id="XML_SetEndDoctypeDeclHandler"> 2560 XML_SetEndDoctypeDeclHandler 2561 </h4> 2562 2563 <pre class="setter"> 2564void XMLCALL 2565XML_SetEndDoctypeDeclHandler(XML_Parser p, 2566 XML_EndDoctypeDeclHandler end); 2567</pre> 2568 2569 <pre class="signature"> 2570typedef void 2571(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData); 2572</pre> 2573 <p> 2574 Set a handler that is called at the end of a DOCTYPE declaration, after parsing 2575 any external subset. 2576 </p> 2577 </div> 2578 2579 <div class="handler"> 2580 <h4 id="XML_SetDoctypeDeclHandler"> 2581 XML_SetDoctypeDeclHandler 2582 </h4> 2583 2584 <pre class="setter"> 2585void XMLCALL 2586XML_SetDoctypeDeclHandler(XML_Parser p, 2587 XML_StartDoctypeDeclHandler start, 2588 XML_EndDoctypeDeclHandler end); 2589</pre> 2590 <p> 2591 Set both doctype handlers with one call. 
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetElementDeclHandler">
        XML_SetElementDeclHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetElementDeclHandler(XML_Parser p,
                          XML_ElementDeclHandler eldecl);
</pre>

      <pre class="signature">
typedef void
(XMLCALL *XML_ElementDeclHandler)(void *userData,
                                  const XML_Char *name,
                                  XML_Content *model);
</pre>

      <pre class="signature">
enum XML_Content_Type {
  XML_CTYPE_EMPTY = 1,
  XML_CTYPE_ANY,
  XML_CTYPE_MIXED,
  XML_CTYPE_NAME,
  XML_CTYPE_CHOICE,
  XML_CTYPE_SEQ
};

enum XML_Content_Quant {
  XML_CQUANT_NONE,
  XML_CQUANT_OPT,
  XML_CQUANT_REP,
  XML_CQUANT_PLUS
};

typedef struct XML_cp XML_Content;

struct XML_cp {
  enum XML_Content_Type   type;
  enum XML_Content_Quant  quant;
  const XML_Char *        name;
  unsigned int            numchildren;
  XML_Content *           children;
};
</pre>
      <p>
        Sets a handler for element declarations in a DTD. The handler gets called with
        the name of the element in the declaration and a pointer to a structure that
        contains the element model. It's the user code's responsibility to free the
        model when finished with it, via a call to <code><a href=
        "#XML_FreeContentModel">XML_FreeContentModel</a></code>. There is no need to
        free the model from within the handler; it can be kept around and freed at a
        later stage.
      </p>

      <p>
        The <code>model</code> argument is the root of a tree of
        <code>XML_Content</code> nodes. If <code>type</code> equals
        <code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then
        <code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other fields
        will be zero or <code>NULL</code>.
If <code>type</code> is 2656 <code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be 2657 <code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and 2658 <code>numchildren</code> will contain the number of elements that are allowed 2659 to be mixed in and <code>children</code> points to an array of 2660 <code>XML_Content</code> structures that will all have type XML_CTYPE_NAME with 2661 no quantification. Only the root node can be type <code>XML_CTYPE_EMPTY</code>, 2662 <code>XML_CTYPE_ANY</code>, or <code>XML_CTYPE_MIXED</code>. 2663 </p> 2664 2665 <p> 2666 For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field points to the 2667 name and the <code>numchildren</code> and <code>children</code> fields will be 2668 zero and <code>NULL</code>. The <code>quant</code> field will indicate any 2669 quantifiers placed on the name. 2670 </p> 2671 2672 <p> 2673 Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code> indicate a 2674 choice or sequence respectively. The <code>numchildren</code> field indicates 2675 how many nodes in the choice or sequence and <code>children</code> points to 2676 the nodes. 2677 </p> 2678 </div> 2679 2680 <div class="handler"> 2681 <h4 id="XML_SetAttlistDeclHandler"> 2682 XML_SetAttlistDeclHandler 2683 </h4> 2684 2685 <pre class="setter"> 2686void XMLCALL 2687XML_SetAttlistDeclHandler(XML_Parser p, 2688 XML_AttlistDeclHandler attdecl); 2689</pre> 2690 2691 <pre class="signature"> 2692typedef void 2693(XMLCALL *XML_AttlistDeclHandler)(void *userData, 2694 const XML_Char *elname, 2695 const XML_Char *attname, 2696 const XML_Char *att_type, 2697 const XML_Char *dflt, 2698 int isrequired); 2699</pre> 2700 <p> 2701 Set a handler for attlist declarations in the DTD. This handler is called for 2702 <em>each</em> attribute. So a single attlist declaration with multiple 2703 attributes declared will generate multiple calls to this handler. 
        The <code>elname</code> parameter gives the name of the element for which the
        attribute is being declared. The attribute name is in the <code>attname</code>
        parameter. The attribute type is in the <code>att_type</code> parameter. It is
        the string representing the type in the declaration with whitespace removed.
      </p>

      <p>
        The <code>dflt</code> parameter holds the default value. It will be
        <code>NULL</code> in the case of "#IMPLIED" or "#REQUIRED" attributes. You can
        distinguish these two cases by checking the <code>isrequired</code> parameter,
        which will be true in the case of "#REQUIRED" attributes. Attributes which are
        "#FIXED" will also have a true <code>isrequired</code>, but they will have
        the non-<code>NULL</code> fixed value in the <code>dflt</code> parameter.
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetEntityDeclHandler">
        XML_SetEntityDeclHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetEntityDeclHandler(XML_Parser p,
                         XML_EntityDeclHandler handler);
</pre>

      <pre class="signature">
typedef void
(XMLCALL *XML_EntityDeclHandler)(void *userData,
                                 const XML_Char *entityName,
                                 int is_parameter_entity,
                                 const XML_Char *value,
                                 int value_length,
                                 const XML_Char *base,
                                 const XML_Char *systemId,
                                 const XML_Char *publicId,
                                 const XML_Char *notationName);
</pre>
      <p>
        Sets a handler that will be called for all entity declarations. The
        <code>is_parameter_entity</code> argument will be non-zero in the case of
        parameter entities and zero otherwise.
      </p>

      <p>
        For internal entities (<code>&lt;!ENTITY foo "bar"&gt;</code>),
        <code>value</code> will be non-<code>NULL</code> and <code>systemId</code>,
        <code>publicId</code>, and <code>notationName</code> will all be
        <code>NULL</code>.
The value string is <em>not</em> null-terminated; the length 2754 is provided in the <code>value_length</code> parameter. Do not use 2755 <code>value_length</code> to test for internal entities, since it is legal to 2756 have zero-length values. Instead check for whether or not <code>value</code> is 2757 <code>NULL</code>. 2758 </p> 2759 2760 <p> 2761 The <code>notationName</code> argument will have a non-<code>NULL</code> value 2762 only for unparsed entity declarations. 2763 </p> 2764 </div> 2765 2766 <div class="handler"> 2767 <h4 id="XML_SetUnparsedEntityDeclHandler"> 2768 XML_SetUnparsedEntityDeclHandler 2769 </h4> 2770 2771 <pre class="setter"> 2772void XMLCALL 2773XML_SetUnparsedEntityDeclHandler(XML_Parser p, 2774 XML_UnparsedEntityDeclHandler h) 2775</pre> 2776 2777 <pre class="signature"> 2778typedef void 2779(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData, 2780 const XML_Char *entityName, 2781 const XML_Char *base, 2782 const XML_Char *systemId, 2783 const XML_Char *publicId, 2784 const XML_Char *notationName); 2785</pre> 2786 <p> 2787 Set a handler that receives declarations of unparsed entities. These are entity 2788 declarations that have a notation (NDATA) field: 2789 </p> 2790 2791 <div id="eg"> 2792 <pre> 2793&lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt; 2794</pre> 2795 </div> 2796 2797 <p> 2798 This handler is obsolete and is provided for backwards compatibility. Use 2799 instead <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a>. 
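      </p>

      <p>
        When moving from this handler to
        <code>XML_SetEntityDeclHandler</code>, the kind of entity has to be derived from
        which arguments are <code>NULL</code>. The sketch below follows the rules given
        for that handler; <code>entity_kind</code> and <code>classify_entity</code> are
        hypothetical names, and the <code>XML_Char</code> typedef stands in for the one
        from <code>expat.h</code> in a UTF-8 build.
      </p>

```c
#include <stddef.h>

/* Stand-in for the expat.h typedef in a UTF-8 build (assumption). */
typedef char XML_Char;

enum entity_kind {
  ENTITY_INTERNAL,        /* <!ENTITY foo "bar"> */
  ENTITY_EXTERNAL_PARSED, /* external, no NDATA */
  ENTITY_UNPARSED         /* external with NDATA notation */
};

/* Hypothetical helper: classify a declaration from two of the
   arguments an entity declaration handler receives. */
enum entity_kind
classify_entity(const XML_Char *value, const XML_Char *notationName) {
  /* Test the pointer, not value_length: empty values are legal. */
  if (value != NULL)
    return ENTITY_INTERNAL;
  /* notationName is non-NULL only for unparsed entities. */
  if (notationName != NULL)
    return ENTITY_UNPARSED;
  return ENTITY_EXTERNAL_PARSED;
}
```

      <p>
        An application that previously relied on the unparsed entity handler would act
        on the <code>ENTITY_UNPARSED</code> case only.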
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetNotationDeclHandler">
        XML_SetNotationDeclHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetNotationDeclHandler(XML_Parser p,
                           XML_NotationDeclHandler h);
</pre>

      <pre class="signature">
typedef void
(XMLCALL *XML_NotationDeclHandler)(void *userData,
                                   const XML_Char *notationName,
                                   const XML_Char *base,
                                   const XML_Char *systemId,
                                   const XML_Char *publicId);
</pre>
      <p>
        Set a handler that receives notation declarations.
      </p>
    </div>

    <div class="handler">
      <h4 id="XML_SetNotStandaloneHandler">
        XML_SetNotStandaloneHandler
      </h4>

      <pre class="setter">
void XMLCALL
XML_SetNotStandaloneHandler(XML_Parser p,
                            XML_NotStandaloneHandler h);
</pre>

      <pre class="signature">
typedef int
(XMLCALL *XML_NotStandaloneHandler)(void *userData);
</pre>
      <p>
        Set a handler that is called if the document is not "standalone". This happens
        when there is an external subset or a reference to a parameter entity, but the
        document does not have standalone set to "yes" in an XML declaration. If this
        handler returns <code>XML_STATUS_ERROR</code>, then the parser will report an
        <code>XML_ERROR_NOT_STANDALONE</code> error.
      </p>
    </div>

    <h3>
      <a id="position" name="position">Parse position and error reporting functions</a>
    </h3>

    <p>
      These are the functions you'll want to call when the parse functions return
      <code>XML_STATUS_ERROR</code> (a parse error has occurred), although the position
      reporting functions are useful outside of errors. The position reported is the
      byte position (in the original document or entity encoding) of the first of the
      sequence of characters that generated the current event (or the error that caused
      the parse functions to return <code>XML_STATUS_ERROR</code>.)
      The exceptions are
      callbacks triggered by declarations in the document prologue, in which case the
      exact position reported is somewhere in the relevant markup, but not necessarily
      as meaningful as for other events.
    </p>

    <p>
      The position reporting functions are accurate only outside of the DTD. In other
      words, they usually return bogus information when called from within a DTD
      declaration handler.
    </p>

    <h4 id="XML_GetErrorCode">
      XML_GetErrorCode
    </h4>

    <pre class="fcndec">
enum XML_Error XMLCALL
XML_GetErrorCode(XML_Parser p);
</pre>
    <div class="fcndef">
      Return what type of error has occurred.
    </div>

    <h4 id="XML_ErrorString">
      XML_ErrorString
    </h4>

    <pre class="fcndec">
const XML_LChar * XMLCALL
XML_ErrorString(enum XML_Error code);
</pre>
    <div class="fcndef">
      Return a string describing the error corresponding to code. The code should be
      one of the enums that can be returned from <code><a href=
      "#XML_GetErrorCode">XML_GetErrorCode</a></code>.
    </div>

    <h4 id="XML_GetCurrentByteIndex">
      XML_GetCurrentByteIndex
    </h4>

    <pre class="fcndec">
XML_Index XMLCALL
XML_GetCurrentByteIndex(XML_Parser p);
</pre>
    <div class="fcndef">
      Return the byte offset of the position. This always corresponds to the values
      returned by <code><a href=
      "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and
      <code><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code>.
    </div>

    <h4 id="XML_GetCurrentLineNumber">
      XML_GetCurrentLineNumber
    </h4>

    <pre class="fcndec">
XML_Size XMLCALL
XML_GetCurrentLineNumber(XML_Parser p);
</pre>
    <div class="fcndef">
      Return the line number of the position. The first line is reported as
      <code>1</code>.
    </div>

    <h4 id="XML_GetCurrentColumnNumber">
      XML_GetCurrentColumnNumber
    </h4>

    <pre class="fcndec">
XML_Size XMLCALL
XML_GetCurrentColumnNumber(XML_Parser p);
</pre>
    <div class="fcndef">
      Return the <em>offset</em>, from the beginning of the current line, of the
      position. The first column is reported as <code>0</code>.
    </div>

    <h4 id="XML_GetCurrentByteCount">
      XML_GetCurrentByteCount
    </h4>

    <pre class="fcndec">
int XMLCALL
XML_GetCurrentByteCount(XML_Parser p);
</pre>
    <div class="fcndef">
      Return the number of bytes in the current event. Returns <code>0</code> if the
      event is inside a reference to an internal entity and for the end-tag event for
      empty element tags (the latter can be used to distinguish empty-element tags from
      empty elements using separate start and end tags).
    </div>

    <h4 id="XML_GetInputContext">
      XML_GetInputContext
    </h4>

    <pre class="fcndec">
const char * XMLCALL
XML_GetInputContext(XML_Parser p,
                    int *offset,
                    int *size);
</pre>
    <div class="fcndef">
      <p>
        Returns the parser's input buffer, sets the integer pointed at by
        <code>offset</code> to the offset within this buffer of the current parse
        position, and sets the integer pointed at by <code>size</code> to the size of
        the returned buffer.
      </p>

      <p>
        This should only be called from within a handler during an active parse and the
        returned buffer should only be referred to from within the handler that made
        the call. This input buffer contains the untranslated bytes of the input.
      </p>

      <p>
        Only a limited amount of context is kept, so if the event triggering a call
        spans over a very large amount of input, the actual parse position may be
        before the beginning of the buffer.
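      </p>

      <p>
        One possible use of the returned buffer is to show the line of input around the
        parse position in a diagnostic. The helper below is only a sketch and does not
        call Expat: <code>context_line</code> is a hypothetical name, and it takes the
        buffer, offset and size in the form <code>XML_GetInputContext</code> hands them
        back. It refuses a <code>NULL</code> buffer and an offset that falls outside
        the buffer.
      </p>

```c
#include <stddef.h>

/* Hypothetical helper: locate the line containing buf[offset].
   On success stores the line's start index and length (newline
   excluded) and returns 1; returns 0 if nothing can be reported. */
int
context_line(const char *buf, int offset, int size, int *start, int *length) {
  int begin, end;

  if (buf == NULL || offset < 0 || offset > size)
    return 0;

  /* Walk back to just after the previous newline. */
  begin = offset;
  while (begin > 0 && buf[begin - 1] != '\n')
    begin--;

  /* Walk forward to the next newline or the end of the buffer. */
  end = offset;
  while (end < size && buf[end] != '\n')
    end++;

  *start = begin;
  *length = end - begin;
  return 1;
}
```

      <p>
        The bytes found this way are untranslated input, so they are in the encoding of
        the document and not necessarily in the encoding used for <code>XML_Char</code>.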
        </p>

        <p>
          If <code>XML_CONTEXT_BYTES</code> is zero, this will always return
          <code>NULL</code>.
        </p>
      </div>

      <h3>
        <a id="attack-protection" name="attack-protection">Attack Protection</a><a id=
        "billion-laughs" name="billion-laughs"></a>
      </h3>

      <h4 id="XML_SetBillionLaughsAttackProtectionMaximumAmplification">
        XML_SetBillionLaughsAttackProtectionMaximumAmplification
      </h4>

      <pre class="fcndec">
/* Added in Expat 2.4.0. */
XML_Bool XMLCALL
XML_SetBillionLaughsAttackProtectionMaximumAmplification(XML_Parser p,
                                                         float maximumAmplificationFactor);
</pre>
      <div class="fcndef">
        <p>
          Sets the maximum tolerated amplification factor for protection against <a href=
          "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs
          attacks</a> (default: <code>100.0</code>) of parser <code>p</code> to
          <code>maximumAmplificationFactor</code>, and returns <code>XML_TRUE</code> upon
          success and <code>XML_FALSE</code> upon error.
        </p>

        <p>
          Once the <a href=
          "#XML_SetBillionLaughsAttackProtectionActivationThreshold">threshold for
          activation</a> is reached, the amplification factor is calculated during
          parsing as:
        </p>

        <pre>amplification := (direct + indirect) / direct</pre>
        <p>
          Here, <code>direct</code> is the number of bytes read from
          the primary document in parsing and <code>indirect</code> is the number of
          bytes added by expanding entities and reading of external DTD files, combined.
        </p>

        <p>
          For a call to
          <code>XML_SetBillionLaughsAttackProtectionMaximumAmplification</code> to
          succeed:
        </p>

        <ul>
          <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
          any parent parsers) and
          </li>

          <li>
            <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and
            greater than or equal to <code>1.0</code>.
          </li>
        </ul>

        <p>
          <strong>Note:</strong> If you ever need to increase this value for non-attack
          payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
          bug report</a>.
        </p>

        <p>
          <strong>Note:</strong> Peak amplifications of factor 15,000 for the entire
          payload and of factor 30,000 in the middle of parsing have been observed with
          small benign files in practice. So if you do reduce the maximum allowed
          amplification, please make sure that the activation threshold is still big
          enough to not end up with undesired false positives (i.e. benign files being
          rejected).
        </p>
      </div>

      <h4 id="XML_SetBillionLaughsAttackProtectionActivationThreshold">
        XML_SetBillionLaughsAttackProtectionActivationThreshold
      </h4>

      <pre class="fcndec">
/* Added in Expat 2.4.0. */
XML_Bool XMLCALL
XML_SetBillionLaughsAttackProtectionActivationThreshold(XML_Parser p,
                                                        unsigned long long activationThresholdBytes);
</pre>
      <div class="fcndef">
        <p>
          Sets the number of output bytes (including amplification from entity expansion
          and reading DTD files) needed to activate protection against <a href=
          "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs
          attacks</a> (default: <code>8 MiB</code>) of parser <code>p</code> to
          <code>activationThresholdBytes</code>, and returns <code>XML_TRUE</code> upon
          success and <code>XML_FALSE</code> upon error.
        </p>

        <p>
          For a call to
          <code>XML_SetBillionLaughsAttackProtectionActivationThreshold</code> to
          succeed:
        </p>

        <ul>
          <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
          any parent parsers).
          </li>
        </ul>

        <p>
          <strong>Note:</strong> If you ever need to increase this value for non-attack
          payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
          bug report</a>.
        </p>

        <p>
          <strong>Note:</strong> Activation thresholds below 4 MiB are known to break
          support for <a href=
          "https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a>
          1.3 payload and are hence not recommended.
        </p>
      </div>

      <h4 id="XML_SetAllocTrackerMaximumAmplification">
        XML_SetAllocTrackerMaximumAmplification
      </h4>

      <pre class="fcndec">
/* Added in Expat 2.7.2. */
XML_Bool
XML_SetAllocTrackerMaximumAmplification(XML_Parser p,
                                        float maximumAmplificationFactor);
</pre>
      <div class="fcndef">
        <p>
          Sets the maximum tolerated amplification factor between direct input and bytes
          of dynamic memory allocated (default: <code>100.0</code>) of parser
          <code>p</code> to <code>maximumAmplificationFactor</code>, and returns
          <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error.
        </p>

        <p>
          <strong>Note:</strong> There are three types of allocations that intentionally
          bypass tracking and limiting:
        </p>

        <ul>
          <li>application calls to functions <code><a href=
          "#XML_MemMalloc">XML_MemMalloc</a></code> and <code><a href="#XML_MemRealloc">
          XML_MemRealloc</a></code> (<em>healthy</em> use of these two functions
          continues to be a responsibility of the application using Expat),
          </li>

          <li>the main character buffer used by functions <code><a href="#XML_GetBuffer">
          XML_GetBuffer</a></code> and <code><a href=
          "#XML_ParseBuffer">XML_ParseBuffer</a></code> (and thus also by plain
          <code><a href="#XML_Parse">XML_Parse</a></code>), and
          </li>

          <li>the <a href="#XML_SetElementDeclHandler">content model memory</a> (that is
          passed to the <a href="#XML_SetElementDeclHandler">element declaration
          handler</a> and freed by a call to <code><a href=
          "#XML_FreeContentModel">XML_FreeContentModel</a></code>).
          </li>
        </ul>

        <p>
          Once the <a href="#XML_SetAllocTrackerActivationThreshold">threshold for
          activation</a> is reached, the amplification factor is calculated during
          parsing as:
        </p>

        <pre>amplification := allocated / direct</pre>
        <p>
          Here, <code>direct</code> is the number of bytes read from
          the primary document in parsing and <code>allocated</code> is the number of
          bytes of dynamic memory allocated in the parser hierarchy.
        </p>

        <p>
          For a call to <code>XML_SetAllocTrackerMaximumAmplification</code> to succeed:
        </p>

        <ul>
          <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
          any parent parsers) and
          </li>

          <li>
            <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and
            greater than or equal to <code>1.0</code>.
          </li>
        </ul>

        <p>
          <strong>Note:</strong> If you ever need to increase this value for non-attack
          payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
          bug report</a>.
        </p>

        <p>
          <strong>Note:</strong> Amplification factors greater than <code>100.0</code>
          can be observed near the start of parsing even with benign files in practice.
          So if you do reduce the maximum allowed amplification, please make sure that
          the activation threshold is still big enough to not end up with undesired false
          positives (i.e. benign files being rejected).
        </p>
      </div>

      <h4 id="XML_SetAllocTrackerActivationThreshold">
        XML_SetAllocTrackerActivationThreshold
      </h4>

      <pre class="fcndec">
/* Added in Expat 2.7.2. */
XML_Bool
XML_SetAllocTrackerActivationThreshold(XML_Parser p,
                                       unsigned long long activationThresholdBytes);
</pre>
      <div class="fcndef">
        <p>
          Sets the number of allocated bytes of dynamic memory needed to activate
          protection against disproportionate use of RAM (default: <code>64 MiB</code>)
          of parser <code>p</code> to <code>activationThresholdBytes</code>, and returns
          <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error.
        </p>

        <p>
          <strong>Note:</strong> For types of allocations that intentionally bypass
          tracking and limiting, please see <code><a href=
          "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a></code>
          above.
        </p>

        <p>
          For a call to <code>XML_SetAllocTrackerActivationThreshold</code> to succeed:
        </p>

        <ul>
          <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without
          any parent parsers).
          </li>
        </ul>

        <p>
          <strong>Note:</strong> If you ever need to increase this value for non-attack
          payload, please <a href="https://github.com/libexpat/libexpat/issues">file a
          bug report</a>.
        </p>
      </div>

      <h4 id="XML_SetReparseDeferralEnabled">
        XML_SetReparseDeferralEnabled
      </h4>

      <pre class="fcndec">
/* Added in Expat 2.6.0. */
XML_Bool XMLCALL
XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled);
</pre>
      <div class="fcndef">
        <p>
          Large tokens may require many parse calls before enough data is available for
          Expat to parse them in full. If Expat retried parsing the token on every parse
          call, parsing could take quadratic time. To avoid this, Expat only retries once
          a significant amount of new data is available. This function allows disabling
          this behavior.
        </p>

        <p>
          The <code>enabled</code> argument should be <code>XML_TRUE</code> or
          <code>XML_FALSE</code>.
        </p>

        <p>
          Returns <code>XML_TRUE</code> on success, and <code>XML_FALSE</code> on error.
        </p>
      </div>

      <h3>
        <a id="miscellaneous" name="miscellaneous">Miscellaneous functions</a>
      </h3>

      <p>
        The functions in this section either obtain state information from the parser or
        can be used to dynamically set parser options.
      </p>

      <h4 id="XML_SetUserData">
        XML_SetUserData
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_SetUserData(XML_Parser p,
                void *userData);
</pre>
      <div class="fcndef">
        This sets the user data pointer that gets passed to handlers. It overwrites any
        previous value for this pointer. Note that the application is responsible for
        freeing the memory associated with <code>userData</code> when it is finished with
        the parser.
        So if you call this when there's already a pointer there, and you
        haven't freed the memory associated with it, then you've probably just leaked
        memory.
      </div>

      <h4 id="XML_GetUserData">
        XML_GetUserData
      </h4>

      <pre class="fcndec">
void * XMLCALL
XML_GetUserData(XML_Parser p);
</pre>
      <div class="fcndef">
        This returns the user data pointer that gets passed to handlers. It is actually
        implemented as a macro.
      </div>

      <h4 id="XML_UseParserAsHandlerArg">
        XML_UseParserAsHandlerArg
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_UseParserAsHandlerArg(XML_Parser p);
</pre>
      <div class="fcndef">
        After this is called, handlers receive the parser in their <code>userData</code>
        arguments. The user data can still be obtained using the <code><a href=
        "#XML_GetUserData">XML_GetUserData</a></code> function.
      </div>

      <h4 id="XML_SetBase">
        XML_SetBase
      </h4>

      <pre class="fcndec">
enum XML_Status XMLCALL
XML_SetBase(XML_Parser p,
            const XML_Char *base);
</pre>
      <div class="fcndef">
        Set the base to be used for resolving relative URIs in system identifiers. The
        return value is <code>XML_STATUS_ERROR</code> if there's no memory to store base,
        otherwise it's <code>XML_STATUS_OK</code>.
      </div>

      <h4 id="XML_GetBase">
        XML_GetBase
      </h4>

      <pre class="fcndec">
const XML_Char * XMLCALL
XML_GetBase(XML_Parser p);
</pre>
      <div class="fcndef">
        Return the base for resolving relative URIs.
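        <p>
          As a rough sketch of how the user data and base functions above might be
          used together (not part of the original reference): the
          <code>struct counters</code> type, the <code>count_start</code> handler and
          the example URI are hypothetical, and a default build where
          <code>XML_Char</code> is <code>char</code> is assumed.
        </p>

        <pre>
#include &lt;expat.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical application state; not part of Expat. */
struct counters {
  int elements;
};

static void XMLCALL
count_start(void *userData, const XML_Char *name, const XML_Char **atts) {
  struct counters *c = (struct counters *)userData;
  (void)name;
  (void)atts;
  c-&gt;elements++;
}

int
main(void) {
  const char *doc = "&lt;a&gt;&lt;b/&gt;&lt;/a&gt;";
  struct counters *c = calloc(1, sizeof(*c));
  XML_Parser p = XML_ParserCreate(NULL);
  int rc = 1;

  if (c != NULL &amp;&amp; p != NULL
      &amp;&amp; XML_SetBase(p, "http://example.org/docs/") == XML_STATUS_OK) {
    XML_SetUserData(p, c);
    XML_SetStartElementHandler(p, count_start);
    if (XML_Parse(p, doc, (int)strlen(doc), XML_TRUE) == XML_STATUS_OK) {
      struct counters *seen = (struct counters *)XML_GetUserData(p);
      printf("base: %s, start tags seen: %d\n", XML_GetBase(p), seen-&gt;elements);
      rc = 0;
    }
  }

  if (p != NULL)
    XML_ParserFree(p);
  free(c); /* Expat never frees the user data; the application must. */
  return rc;
}
</pre>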
      </div>

      <h4 id="XML_GetSpecifiedAttributeCount">
        XML_GetSpecifiedAttributeCount
      </h4>

      <pre class="fcndec">
int XMLCALL
XML_GetSpecifiedAttributeCount(XML_Parser p);
</pre>
      <div class="fcndef">
        When attributes are reported to the start handler in the atts vector, attributes
        that were explicitly set in the element occur before any attributes that receive
        their value from default information in an ATTLIST declaration. This function
        returns the number of attributes that were explicitly set times two, thus giving
        the offset in the <code>atts</code> array passed to the start tag handler of the
        first attribute set due to defaults. It supplies information for the last call to
        a start handler. If called inside a start handler, then that means the current
        call.
      </div>

      <h4 id="XML_GetIdAttributeIndex">
        XML_GetIdAttributeIndex
      </h4>

      <pre class="fcndec">
int XMLCALL
XML_GetIdAttributeIndex(XML_Parser p);
</pre>
      <div class="fcndef">
        Returns the index of the ID attribute passed in the atts array in the last call
        to <code><a href="#XML_StartElementHandler">XML_StartElementHandler</a></code>,
        or -1 if there is no ID attribute. If called inside a start handler, then that
        means the current call.
      </div>

      <h4 id="XML_GetAttributeInfo">
        XML_GetAttributeInfo
      </h4>

      <pre class="fcndec">
const XML_AttrInfo * XMLCALL
XML_GetAttributeInfo(XML_Parser parser);
</pre>

      <pre class="signature">
typedef struct {
  XML_Index  nameStart;  /* Offset to beginning of the attribute name. */
  XML_Index  nameEnd;    /* Offset after the attribute name's last byte. */
  XML_Index  valueStart; /* Offset to beginning of the attribute value. */
  XML_Index  valueEnd;   /* Offset after the attribute value's last byte. */
} XML_AttrInfo;
</pre>
      <div class="fcndef">
        Returns an array of <code>XML_AttrInfo</code> structures for the attribute/value
        pairs passed in the last call to the <code>XML_StartElementHandler</code> that
        were specified in the start-tag rather than defaulted. Each attribute/value pair
        counts as 1; thus the number of entries in the array is
        <code>XML_GetSpecifiedAttributeCount(parser) / 2</code>.
      </div>

      <h4 id="XML_SetEncoding">
        XML_SetEncoding
      </h4>

      <pre class="fcndec">
enum XML_Status XMLCALL
XML_SetEncoding(XML_Parser p,
                const XML_Char *encoding);
</pre>
      <div class="fcndef">
        Set the encoding to be used by the parser. It is equivalent to passing a
        non-<code>NULL</code> encoding argument to the parser creation functions. It must
        not be called after <code><a href="#XML_Parse">XML_Parse</a></code> or
        <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on
        the given parser. Returns <code>XML_STATUS_OK</code> on success or
        <code>XML_STATUS_ERROR</code> on error.
      </div>

      <h4 id="XML_SetParamEntityParsing">
        XML_SetParamEntityParsing
      </h4>

      <pre class="fcndec">
int XMLCALL
XML_SetParamEntityParsing(XML_Parser p,
                          enum XML_ParamEntityParsing code);
</pre>
      <div class="fcndef">
        This enables parsing of parameter entities, including the external parameter
        entity that is the external DTD subset, according to <code>code</code>.
        The choices for <code>code</code> are:
        <ul>
          <li>
            <code>XML_PARAM_ENTITY_PARSING_NEVER</code>
          </li>

          <li>
            <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code>
          </li>

          <li>
            <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code>
          </li>
        </ul>
        <b>Note:</b> If <code>XML_SetParamEntityParsing</code> is called after
        <code>XML_Parse</code> or <code>XML_ParseBuffer</code>, then it has no effect and
        will always return 0.
      </div>

      <h4 id="XML_SetHashSalt">
        XML_SetHashSalt
      </h4>

      <pre class="fcndec">
int XMLCALL
XML_SetHashSalt(XML_Parser p,
                unsigned long hash_salt);
</pre>
      <div class="fcndef">
        Sets the hash salt to use for internal hash calculations. Helps in preventing DoS
        attacks based on predicting hash function behavior. In order to have an effect
        this must be called before parsing has started. Returns 1 if successful, 0 when
        called after <code>XML_Parse</code> or <code>XML_ParseBuffer</code>.
        <p>
          <b>Note:</b> This call is optional, as the parser will auto-generate a new
          random salt value if no value has been set at the start of parsing.
        </p>

        <p>
          <b>Note:</b> One should not call <code>XML_SetHashSalt</code> with a hash salt
          value of 0, as this value is used as sentinel value to indicate that
          <code>XML_SetHashSalt</code> has <b>not</b> been called. Consequently such a
          call will have no effect, even if it returns 1.
        </p>
      </div>

      <h4 id="XML_UseForeignDTD">
        XML_UseForeignDTD
      </h4>

      <pre class="fcndec">
enum XML_Error XMLCALL
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
</pre>
      <div class="fcndef">
        <p>
          This function allows an application to provide an external subset for the
          document type declaration for documents which do not specify an external subset
          of their own.
          For documents which specify an external subset in their DOCTYPE
          declaration, the application-provided subset will be ignored. If the document
          does not contain a DOCTYPE declaration at all and <code>useDTD</code> is true,
          the application-provided subset will be parsed, but the
          <code>startDoctypeDeclHandler</code> and <code>endDoctypeDeclHandler</code>
          functions, if set, will not be called. The setting of parameter entity parsing,
          controlled using <code><a href=
          "#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>, will be
          honored.
        </p>

        <p>
          The application-provided external subset is read by calling the external entity
          reference handler set via <code><a href=
          "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>
          with both <code>publicId</code> and <code>systemId</code> set to
          <code>NULL</code>.
        </p>

        <p>
          If this function is called after parsing has begun, it returns
          <code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores
          <code>useDTD</code>. If called when Expat has been compiled without DTD
          support, it returns <code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise,
          it returns <code>XML_ERROR_NONE</code>.
        </p>

        <p>
          <b>Note:</b> For the purpose of checking WFC: Entity Declared, passing
          <code>useDTD == XML_TRUE</code> will make the parser behave as if the document
          had a DTD with an external subset. This holds true even if the external entity
          reference handler returns without action.
        </p>
      </div>

      <h4 id="XML_SetReturnNSTriplet">
        XML_SetReturnNSTriplet
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_SetReturnNSTriplet(XML_Parser parser,
                       int do_nst);
</pre>
      <div class="fcndef">
        <p>
          This function only has an effect when using a parser created with
          <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, i.e. when
          namespace processing is in effect. The <code>do_nst</code> argument sets
          whether or not prefixes are returned with names qualified with a namespace
          prefix. If this function is called with <code>do_nst</code> non-zero, then
          afterwards namespace qualified names (that is, qualified with a prefix as
          opposed to belonging to a default namespace) are returned as a triplet with the
          three parts separated by the namespace separator specified when the parser was
          created. The order of returned parts is URI, local name, and prefix.
        </p>

        <p>
          If <code>do_nst</code> is zero, then namespaces are reported in the default
          manner, URI then local name, separated by the namespace separator.
        </p>
      </div>

      <h4 id="XML_DefaultCurrent">
        XML_DefaultCurrent
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_DefaultCurrent(XML_Parser parser);
</pre>
      <div class="fcndef">
        This can be called within a handler for a start element, end element, processing
        instruction or character data. It causes the corresponding markup to be passed to
        the default handler set by <code><a href=
        "#XML_SetDefaultHandler">XML_SetDefaultHandler</a></code> or <code><a href=
        "#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></code>. It does
        nothing if there is not a default handler.
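        <p>
          A minimal sketch of forwarding start tags to the default handler might look
          as follows. It is an illustration only, not part of the original reference:
          the handler names <code>echo</code> and <code>forward_start</code> and the
          sample document are hypothetical, and it relies on
          <code>XML_UseParserAsHandlerArg</code> so that the start handler receives
          the parser in its <code>userData</code> argument.
        </p>

        <pre>
#include &lt;expat.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Default handler: writes the raw markup or text it is given. */
static void XMLCALL
echo(void *userData, const XML_Char *s, int len) {
  (void)userData;
  fwrite(s, sizeof(XML_Char), (size_t)len, stdout);
}

/* Start handler: hands the current start tag on to the default handler. */
static void XMLCALL
forward_start(void *userData, const XML_Char *name, const XML_Char **atts) {
  (void)name;
  (void)atts;
  XML_DefaultCurrent((XML_Parser)userData);
}

int
main(void) {
  const char *doc = "&lt;a x='1'&gt;&lt;b/&gt;text&lt;/a&gt;";
  XML_Parser p = XML_ParserCreate(NULL);
  enum XML_Status status;

  if (p == NULL)
    return 1;

  XML_UseParserAsHandlerArg(p); /* handlers now get the parser as userData */
  XML_SetStartElementHandler(p, forward_start);
  XML_SetDefaultHandler(p, echo);

  status = XML_Parse(p, doc, (int)strlen(doc), XML_TRUE);
  XML_ParserFree(p);
  return status == XML_STATUS_OK ? 0 : 1;
}
</pre>
        <p>
          Events that have no handler of their own, such as the character data and end
          tags in this sketch, reach the default handler without any call to
          <code>XML_DefaultCurrent</code>.
        </p>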
      </div>

      <h4 id="XML_ExpatVersion">
        XML_ExpatVersion
      </h4>

      <pre class="fcndec">
XML_LChar * XMLCALL
XML_ExpatVersion();
</pre>
      <div class="fcndef">
        Return the library version as a string (e.g. <code>"expat_1.95.1"</code>).
      </div>

      <h4 id="XML_ExpatVersionInfo">
        XML_ExpatVersionInfo
      </h4>

      <pre class="fcndec">
struct XML_Expat_Version XMLCALL
XML_ExpatVersionInfo();
</pre>

      <pre class="signature">
typedef struct {
  int major;
  int minor;
  int micro;
} XML_Expat_Version;
</pre>
      <div class="fcndef">
        Return the library version information as a structure. Some macros are also
        defined that support compile-time tests of the library version:
        <ul>
          <li>
            <code>XML_MAJOR_VERSION</code>
          </li>

          <li>
            <code>XML_MINOR_VERSION</code>
          </li>

          <li>
            <code>XML_MICRO_VERSION</code>
          </li>
        </ul>
        Testing these constants is currently the best way to determine if particular
        parts of the Expat API are available.
      </div>

      <h4 id="XML_GetFeatureList">
        XML_GetFeatureList
      </h4>

      <pre class="fcndec">
const XML_Feature * XMLCALL
XML_GetFeatureList();
</pre>

      <pre class="signature">
enum XML_FeatureEnum {
  XML_FEATURE_END = 0,
  XML_FEATURE_UNICODE,
  XML_FEATURE_UNICODE_WCHAR_T,
  XML_FEATURE_DTD,
  XML_FEATURE_CONTEXT_BYTES,
  XML_FEATURE_MIN_SIZE,
  XML_FEATURE_SIZEOF_XML_CHAR,
  XML_FEATURE_SIZEOF_XML_LCHAR,
  XML_FEATURE_NS,
  XML_FEATURE_LARGE_SIZE
};

typedef struct {
  enum XML_FeatureEnum  feature;
  XML_LChar            *name;
  long int              value;
} XML_Feature;
</pre>
      <div class="fcndef">
        <p>
          Returns a list of "feature" records, providing details on how Expat was
          configured at compile time.
          Most applications should not need to worry about
          this, but this information is otherwise not available from Expat. This function
          allows code that does need to check these features to do so at runtime.
        </p>

        <p>
          The return value is an array of <code>XML_Feature</code>, terminated by a
          record with a <code>feature</code> of <code>XML_FEATURE_END</code> and
          <code>name</code> of <code>NULL</code>, identifying the feature-test macros
          Expat was compiled with. Since an application that requires this kind of
          information needs to determine the type of character the <code>name</code>
          points to, records for the <code>XML_FEATURE_SIZEOF_XML_CHAR</code> and
          <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the beginning of
          the list, followed by <code>XML_FEATURE_UNICODE</code> and
          <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at all.
        </p>

        <p>
          Some features have an associated value. If there isn't an associated value, the
          <code>value</code> field is set to 0. At this time, the following features have
          been defined to have values:
        </p>

        <dl>
          <dt>
            <code>XML_FEATURE_SIZEOF_XML_CHAR</code>
          </dt>

          <dd>
            The number of bytes occupied by one <code>XML_Char</code> character.
          </dd>

          <dt>
            <code>XML_FEATURE_SIZEOF_XML_LCHAR</code>
          </dt>

          <dd>
            The number of bytes occupied by one <code>XML_LChar</code> character.
          </dd>

          <dt>
            <code>XML_FEATURE_CONTEXT_BYTES</code>
          </dt>

          <dd>
            The maximum number of characters of context which can be reported by
            <code><a href="#XML_GetInputContext">XML_GetInputContext</a></code>.
          </dd>
        </dl>
      </div>

      <h4 id="XML_FreeContentModel">
        XML_FreeContentModel
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_FreeContentModel(XML_Parser parser, XML_Content *model);
</pre>
      <div class="fcndef">
        Function to deallocate the <code>model</code> argument passed to the
        <code>XML_ElementDeclHandler</code> callback set using <code><a href=
        "#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a></code>. This function
        should not be used for any other purpose.
      </div>

      <p>
        The following functions allow external code to share the memory allocator an
        <code>XML_Parser</code> has been configured to use. This is especially useful for
        third-party libraries that interact with a parser object created by application
        code, or heavily layered applications. This can be essential when using
        dynamically loaded libraries which use different C standard libraries (this can
        happen on Windows, at least).
      </p>

      <h4 id="XML_MemMalloc">
        XML_MemMalloc
      </h4>

      <pre class="fcndec">
void * XMLCALL
XML_MemMalloc(XML_Parser parser, size_t size);
</pre>
      <div class="fcndef">
        Allocate <code>size</code> bytes of memory using the allocator the
        <code>parser</code> object has been configured to use. Returns a pointer to the
        memory or <code>NULL</code> on failure. Memory allocated in this way must be
        freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>.
      </div>

      <h4 id="XML_MemRealloc">
        XML_MemRealloc
      </h4>

      <pre class="fcndec">
void * XMLCALL
XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
</pre>
      <div class="fcndef">
        Allocate <code>size</code> bytes of memory using the allocator the
        <code>parser</code> object has been configured to use.
        <code>ptr</code> must
        point to a block of memory allocated by <code><a href=
        "#XML_MemMalloc">XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be
        <code>NULL</code>. This function tries to expand the block pointed to by
        <code>ptr</code> if possible. Returns a pointer to the memory or
        <code>NULL</code> on failure. On success, the original block has either been
        expanded or freed. On failure, the original block has not been freed; the caller
        is responsible for freeing the original block. Memory allocated in this way must
        be freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>.
      </div>

      <h4 id="XML_MemFree">
        XML_MemFree
      </h4>

      <pre class="fcndec">
void XMLCALL
XML_MemFree(XML_Parser parser, void *ptr);
</pre>
      <div class="fcndef">
        Free a block of memory pointed to by <code>ptr</code>. The block must have been
        allocated by <code><a href="#XML_MemMalloc">XML_MemMalloc</a></code> or
        <code>XML_MemRealloc</code>, or be <code>NULL</code>.
      </div>

      <hr />

      <div class="footer">
        Found a bug in the documentation? <a href=
        "https://github.com/libexpat/libexpat/issues">Please file a bug report.</a>
      </div>
    </div>
  </body>
</html>