PHP XPath Generator

In this post I’ll explain a clever way to automatically generate a list of XPath expressions that can be used to select any element and attribute in an XML document. We’ll use PHP’s SimpleXML extension to easily parse and traverse the XML document and to build a simple XPath generator.

I’ll start with a simple example XML document and the resulting output. Say that we are trying to import an XML feed with books:

<?xml version="1.0" encoding="UTF-8"?>
	<book type="paperback">
		<title lang="english">Snowcrash</title>
		<price currency="euro">3,99</price>
		<price currency="dollar">5,49</price>
	</book>

Running this through our XPath generator function results in the following list of XPath expressions:


Generally you will only need the selectors for one particular element, in this case the book element. We’ll use an XPath expression to select this context element to limit the scope of our XPath generator.
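As a minimal sketch of that context selection: SimpleXML's xpath() method returns an array of matching elements, from which the first can be taken as the context element. The `<books>` root element and the `//book` expression below are assumptions made for this illustration; they are not taken from the original feed.

```php
<?php
// Hypothetical feed: the <books> wrapper is an assumption made for this sketch.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book type="paperback">
    <title lang="english">Snowcrash</title>
    <price currency="euro">3,99</price>
    <price currency="dollar">5,49</price>
  </book>
</books>
XML;

$feed = new SimpleXMLElement($xml);

// xpath() returns an array of SimpleXMLElement objects (empty when nothing matches).
$matches = $feed->xpath('//book');

$contextName = null;
$contextType = null;
if (!empty($matches)) {
    $context = $matches[0];                     // first matching element becomes the context
    $contextName = $context->getName();         // element name
    $contextType = (string) $context['type'];   // attribute access via array syntax
    echo $contextName . ' (' . $contextType . ")\n";   // prints "book (paperback)"
}
```

Checking for an empty result before indexing avoids an error when the user-supplied context expression matches nothing.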

An example of where this type of functionality might come in handy is when unknown XML data needs to be mapped into predefined fields, such as columns of a database table or content types in a CMS. For example, the Drupal Feeds module allows you to import XML data into existing Drupal content types. The source XML fields are mapped to the Drupal content type fields by means of user-defined XPath expressions. Currently the user has to specify these expressions by hand for each field. Wouldn’t it be nice to have a dropdown box next to each field that contains a list of XPath expressions for all XML elements and attributes (especially when the user is not a programmer or XML expert)?

Here’s the full code; the comments should make it self-explanatory. If you are not familiar with the concept of recursion, take a look at this article. In the code below, the XML and the context expression are supplied through a form using the POST method. You can try out a slightly modified version here.

<?php
$simpleXML = new SimpleXMLElement($_POST['XML']);          //create a SimpleXMLElement object from the XML document
$contextElements = $simpleXML->xpath($_POST['context']);   //select the context nodes
if (empty($contextElements)) {                             //invalid expression or no matching node
  exit("No context element found\n");
}
$contextElement = $contextElements[0];                     //grab the first context node
$xpathList = XMLElementToXPath($contextElement->getName(), $contextElement);

foreach ($xpathList as $xpath) {
  echo $xpath."\n";
}

//recursive function that takes a SimpleXMLElement object and returns a list of XPath expressions
function XMLElementToXPath($currentXPath, $myXMLElement)
{
  $xpathList = array();
  $attributes = $myXMLElement->attributes();              //grab attributes from current root element

  foreach ($attributes as $att_key => $att_value) {       //process element attributes (if any)
    $xpathList[] = ($currentXPath!=''?$currentXPath.'/':'').'@'.$att_key;  //create XPath for attribute
    $xpathList[] = ($currentXPath!=''?$currentXPath:'').'[@'.$att_key.'=\''.$att_value.'\']'; //create XPath expression for element with certain attribute value
  }

  foreach ($myXMLElement->children() as $childElement) {  //process children (if any)
    $childXPath = ($currentXPath!=''?$currentXPath.'/':'').(string) $childElement->getName();
    $xpathList[] = $childXPath;                            //create XPath expression for element
    //children() only yields SimpleXMLElement objects, so we can always go into recursion
    $xpathList = array_merge($xpathList, XMLElementToXPath($childXPath, $childElement));
  }
  return $xpathList;
}
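One thing to be aware of: sibling elements with the same name (such as the two price elements in the example) make the function emit the same expression more than once. A possible way to clean this up, sketched here as an optional post-processing step rather than as part of the original script, is to run the result through array_unique(). The list below is hard-coded to keep the sketch self-contained; it mirrors what the recursion produces for the two price siblings.

```php
<?php
// Expressions as the generator emits them for two <price> siblings:
// the element and attribute selectors repeat, the value predicates differ.
$xpathList = array(
    'book/price',
    'book/price/@currency',
    "book/price[@currency='euro']",
    'book/price',
    'book/price/@currency',
    "book/price[@currency='dollar']",
);

// array_unique() keeps the first occurrence of each string;
// array_values() reindexes the result from 0.
$uniqueXPaths = array_values(array_unique($xpathList));

foreach ($uniqueXPaths as $xpath) {
    echo $xpath . "\n";
}
```

This leaves four expressions: the element selector, the attribute selector, and one predicate per currency.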

As mentioned, you can see a slightly modified version of the script in action here and you can download the source here. The generated list of XPath expressions isn’t complete: for example, it doesn’t produce expressions that combine multiple attributes. However, this should be sufficient in most cases, and the function can easily be extended when other types of XPath expressions are needed. Feel free to drop any questions/comments below!
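To give an idea of what such an extension could look like, here is a rough sketch of a helper that combines all attributes of an element into a single predicate. The function name attributeCombinationXPath and the vat attribute are made up for this illustration, and the sketch assumes attribute values contain no single quotes.

```php
<?php
// Hypothetical helper (not part of the original script): builds one XPath
// expression that requires every attribute of the element to match.
function attributeCombinationXPath($currentXPath, SimpleXMLElement $element)
{
    $conditions = array();
    foreach ($element->attributes() as $name => $value) {
        // Assumption: attribute values contain no single quotes.
        $conditions[] = '@' . $name . "='" . $value . "'";
    }
    if (count($conditions) < 2) {
        return null;   // single attributes are already covered by the generator
    }
    return $currentXPath . '[' . implode(' and ', $conditions) . ']';
}

// The vat attribute is invented for this example.
$price = new SimpleXMLElement('<price currency="euro" vat="incl">3,99</price>');
$combined = attributeCombinationXPath('book/price', $price);
echo $combined . "\n";   // prints "book/price[@currency='euro' and @vat='incl']"
```

Inside the recursive function, a call like this could be added after the attribute loop, appending the result to the list when it is not null.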
