[This article was first published on Omegahat Statistical Computing » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Hin-Tak Leung mailed me about a problem with certain malformed XML documents from FlowJo. Some nodes use namespace prefixes (prefix:nodeName) that have no corresponding namespace declaration (xmlns:prefix="uri"). How do we fix these? The XML parser can still read such a document, but it raises an error for each undeclared prefix. We can collect these errors, work out which prefixes are missing, add the corresponding namespace declarations to the root of the document, and then re-parse the result. Here is the code. It will make it into the XML package.
fixXMLNamespaces =
  #
  #  call as
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
  #  or
  #    dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org",
  #                          'data-type' = "http://www.morecrap.org")
  #
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...))
{
    # collect the error messages rather than reporting them immediately
  e = xmlErrorCumulator(, FALSE)
  doc = xmlParse(doc, error = e)

    # e is a function, so length(e) is always 1; check the collected messages instead
  msgs = unique(environment(e)$messages)
  if(length(msgs) == 0)
     return(doc)

    # find the ones that refer to prefixes that are not defined
  ns = grep("^Namespace prefix .* not defined", msgs, value = TRUE)
  ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", ns))

    # convert a list of namespaces to a named character vector
  if(is(.namespaces, "list"))
    .namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))

    # look up the URIs, dropping any prefix the caller gave no URI for
  uris = .namespaces[ns]
  uris = uris[!is.na(uris)]

    # now set those namespaces on the root of the document and re-parse
  if(length(uris)) {
     mapply(function(id, uri)
              newXMLNamespace(xmlRoot(doc), uri, id),
            names(uris), uris)
     xmlParse(saveXML(doc), asText = TRUE)
  } else
     doc
}
(I’ve made some minor changes thanks to Hin-Tak’s suggestions, but haven’t tested them.)
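To see what the prefix-extraction step does on its own, here is a minimal sketch in base R that does not need the XML package. The error messages, the element names (PolygonGate, fcs-dimension) and the example.org URI are hypothetical stand-ins; in the function above the messages come from environment(e)$messages and the URIs from the .namespaces argument.

```r
# Hypothetical messages in the style the parser uses for undeclared prefixes.
# Duplicates and unrelated errors are included on purpose.
messages <- c(
  "Namespace prefix gating on PolygonGate is not defined",
  "Namespace prefix data-type on fcs-dimension is not defined",
  "Namespace prefix gating on PolygonGate is not defined",
  "Opening and ending tag mismatch: a line 3 and b"
)

# Keep only the complaints about undeclared prefixes.
undefined <- grep("^Namespace prefix .* not defined", unique(messages), value = TRUE)

# Pull out the prefix itself: the first token after "Namespace prefix ".
prefixes <- unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", undefined))

# Hypothetical prefix-to-URI table, playing the role of .namespaces.
known <- c(gating = "http://example.org/gating")

# Look up the URIs and drop prefixes for which no URI was supplied.
uris <- known[prefixes]
uris <- uris[!is.na(uris)]

print(prefixes)
print(uris)
```

Indexing a named character vector by a name it does not contain gives NA, which is why the lookup is filtered before anything is added to the document.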
To leave a comment for the author, please follow the link and comment on their blog: Omegahat Statistical Computing » R.