Site icon R-bloggers

Omegahat Statistical Computing » R 2010-03-16 19:28:40

[This article was first published on Omegahat Statistical Computing » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. There are namespace prefixes (prfx:nodeName) with no corresponding namespace declarations (xmlns:prefix=”uri”). How do we fix these? Well, the XML parser can read this but raises errors. We can do nice things to catch these errors and then post-process them. Then we can fix up the errors, add namespace declarations to the document and then re-parse the resulting document. Here is the code. It will make it into the XML package.

fixXMLNamespaces =
  #
  #  call as
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
  #  or
  #   dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org", 'data-type' = "http://www.morecrap.org")
  #
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...)) 
{
    # collect the error messages
  e = xmlErrorCumulator(, FALSE)
  doc = xmlParse(doc, error = e)

  if(length(e) == 0)
     return(doc)

     # find the ones that refer to prefixes that are not defined
  ns = grep("^Namespace prefix .* not defined", unique(environment(e)$messages), val = TRUE)
  ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", ns))

    # now set those name spaces on the root of the document
  if(is(.namespaces, "list"))
    .namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))

  uris = .namespaces[ns]
  if(length(uris)) {
     mapply(function(id, uri)
              newXMLNamespace(xmlRoot(doc), uri, id),
            names(uris), uris)
     xmlParse(saveXML(doc), asText = TRUE)
  } else
     doc
}

(I’ve made some minor changes thanks to Hin-Tak’s suggestions, but haven’t tested them.)


To leave a comment for the author, please follow the link and comment on their blog: Omegahat Statistical Computing » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.