regex - Generic solution for removing XML declaration using Perl
Hi, I want to remove the declaration in an XML file. The problem is that the declaration is embedded with the root element.
The XML looks as follows:
Case 1:
<?xml version="1.0" encoding="utf-8"?> <document> document root <child>----</child> </document>
Case 2:
<?xml version="1.0" encoding="utf-8"?>
<document> document root <child>----</child> </document>
The function should work for case 1 as well as when the root node is on the next line.
My function only works for case 2.
    sub getxmldata {
        my ($xml) = @_;
        my @data = ();
        open(FILE, "<$xml");
        while (<FILE>) {
            chomp;
            # Skips the whole line, so a root element on the same line is lost too
            if (/\<\?xml\sversion/) { next; }
            push(@data, $_);
        }
        close(FILE);
        return join("\n", @data);
    }
*** Please note that the encoding is not always constant.
OK, the problem here is that you're trying to parse XML line by line, and that doesn't work. You should avoid doing it, because it makes for brittle code that will one day break, as you've noted, on perfectly valid changes to the source XML. Both of your documents are semantically identical, so the fact that your code handles one and not the other is an example of why handling XML this way is a bad idea.
More importantly, though: why are you trying to remove the XML declaration from your XML? What are you trying to accomplish?
Generically, reformatting XML can be done like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        pretty_print => 'indented',
    );
    $twig->parsefile('your_xml_file');
    $twig->print;
This will parse your XML and reformat it in one of the valid ways XML may be formatted. I would urge you not to discard your XML declaration, and instead carry on using XML::Twig to process it. (Open a new question with what you're trying to accomplish, and I'll happily give you a solution that doesn't trip over different valid formats of XML.)
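If the declaration-free text is genuinely required, a parser can drop it without caring where the root element starts or which encoding is declared. The following is a minimal sketch, assuming XML::Twig is installed. The sub name `xml_without_declaration` is hypothetical, and the sample strings are simplified versions of the two cases from the question. It relies on `sprint` being called on the root element, which serializes the element tree only and so leaves the prolog out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return the document as a string without the XML declaration.
sub xml_without_declaration {
    my ($xml_string) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml_string);     # dies if the XML is not well-formed
    return $twig->root->sprint;    # serializes the root element only, no prolog
}

# Case 1: declaration and root element on the same line.
my $case1 = qq{<?xml version="1.0" encoding="utf-8"?><document>document root<child>----</child></document>};
# Case 2: root element on the next line.
my $case2 = qq{<?xml version="1.0" encoding="utf-8"?>\n<document>document root<child>----</child></document>};

print xml_without_declaration($_), "\n" for $case1, $case2;
```

Because the parser sees both inputs as the same document, the line break after the declaration makes no difference to the result.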
When it comes to merging XML documents, XML::Twig can do this too, and still check and validate the XML as it goes.
So you might do (extending the above):
    foreach my $file (@file_list) {
        my $child = XML::Twig->new();
        $child->parsefile($file);              # parse (and check) each input file
        my $child_doc = $child->root->cut;     # detach its root element
        $child_doc->paste($twig->root);        # attach it under the main document's root
    }
    $twig->print;
Exactly what you'd need to do depends a little on your desired output structure: you'd need to 'wrap' everything in a root element anyway. Open a new question with sample input and desired output, and I'll happily take a crack at it.
As an example, if I feed the above your sample input twice, I get:
<?xml version="1.0" encoding="utf-8"?> <document><document> document root <child>----</child></document> document root <child>----</child></document>
Which I know isn't what you want, but it illustrates a parser-based way of XML restructuring.
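The 'wrap in a root element' idea mentioned earlier could look something like the sketch below. It assumes XML::Twig is installed. The sub name `merge_under_wrapper` and the wrapper element name `documents` are hypothetical, chosen only for illustration, since the desired output structure isn't given in the question. Each input is parsed on its own, its root is detached, and it is pasted as the last child of the wrapper so that input order is preserved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: merge several XML strings under a new wrapper root.
# The wrapper name 'documents' is an assumption, not taken from the question.
sub merge_under_wrapper {
    my (@xml_strings) = @_;
    my $merged = XML::Twig->new( pretty_print => 'indented' );
    $merged->parse('<documents/>');

    for my $xml (@xml_strings) {
        my $child = XML::Twig->new;
        $child->parse($xml);              # dies if this input is not well-formed
        my $child_doc = $child->root;
        $child_doc->cut;                  # detach the root from its own twig
        $child_doc->paste( last_child => $merged->root );    # append in input order
    }
    return $merged->sprint;
}

my $sample = qq{<?xml version="1.0" encoding="utf-8"?>\n<document>document root<child>----</child></document>};
print merge_under_wrapper( $sample, $sample );
```

The declarations of the individual inputs are consumed by the parser, so nothing has to be stripped by hand before merging.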