amazon ec2 - How to composite large images stored on S3 in ImageMagick from EC2 instances?
I have an ongoing list of image processing tasks to do, using ImageMagick to composite large individual graphic files (20 MB each). These images are stored on S3 (approximately 2.5 GB in total).
I am thinking of using multiple EC2 instances to process the tasks, composite the images, and upload the output file to S3.
The problem with this setup is that ImageMagick needs the file library to be local (on the machine). The images are on S3, which means each instance would need to download a copy of the images from S3, slowing down the whole process.
What's the best way to share this image library with all the nodes?
Consider the following points:
- You can do the processing of ImageMagick files in memory by "saving" the input image in the special format mpr: (magick pixel register). For details see the answer "ImageMagick multiple operations in single invocation".
- ImageMagick can access remote images via http:// URLs.
- You can put a lot of ImageMagick's operations into one single command line which can produce multiple output files, and you can segment this command line into sub- or side-processes using the parentheses syntax: ... \( IM side process \) ...
How you can streamline your overall process depends a lot on what exactly you want to do. However,
- the mpr: / mpc: technique can be very useful, and it may avoid or minimize the need to use multiple EC2 instances;
- you cannot get around the step of somehow shipping the input pixels to the instance of ImageMagick which should process them (so "downloading a copy" will have to occur);
- you can minimize the number of downloads by storing each input under a series of labels such as mpr:xy1, mpr:xy2 etc. in memory, and then access these multiple times very fast from a long, constructed ImageMagick command line that performs the number of compositions you want.
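As a rough illustration of what "a long, constructed command line" could look like in practice, the following sketch only prints such a command, it does not run ImageMagick. The function name build_mpr_command and the "output=index,index,..." spec format are made up for this example, and file names containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: print (not run) a convert command that loads each input once into
# an mpr: register and then builds several outputs from those registers.
# build_mpr_command and the "out=idx,idx" spec format are hypothetical.

# The function body is a subshell "( ... )" so its variables stay local.
build_mpr_command() (
  inputs=$1; shift            # space-separated list of input files
  cmd="convert"
  i=1

  # Load every input exactly once and park it in a numbered mpr: register.
  for f in $inputs; do
    cmd="$cmd $f +write mpr:t$i +delete"
    i=$((i + 1))
  done

  # One parenthesized side process per output spec.
  for spec in "$@"; do
    out=${spec%%=*}                                     # part before "="
    idxs=$(printf '%s' "${spec#*=}" | tr ',' ' ')       # "1,3" -> "1 3"
    cmd="$cmd \\("
    for idx in $idxs; do
      cmd="$cmd mpr:t$idx"
    done
    cmd="$cmd +write $out \\)"
  done

  # null: discards whatever is left in the image list at the end.
  printf '%s null:\n' "$cmd"
)

build_mpr_command "1.tif 2.tif 3.tif" "a.pdf=1,3" "b.pdf=2,3"
```

The printed line keeps the backslash-escaped parentheses, so it can be pasted into a shell or piped to sh once it looks right.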
Example
To give an example: consider you have 10 TIFFs, and you want to create 3 different PDF files from these TIFFs, each PDF containing a different set of pages made from the 10 TIFFs. You could run 3 commands:
    convert 1.tif 3.tif 4.tif 8.tif 9.tif 10.tif -compress jpeg -quality 70 1out1.pdf
    convert 2.tif 3.tif 4.tif 7.tif 8.tif 9.tif  -compress jpeg -quality 70 1out2.pdf
    convert 3.tif 4.tif 5.tif 7.tif 8.tif 10.tif -compress jpeg -quality 70 1out3.pdf
These 3 commands have to load 6 TIFF files each (some TIFFs, such as 3.tif, are used in all 3 commands). That is 18 I/O events.
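The count can be checked with a few lines of shell. This is only a sketch that works on the page numbers from the three commands above; the variable names are arbitrary.

```shell
#!/bin/sh
# Page lists taken from the three separate convert commands.
set1="1 3 4 8 9 10"
set2="2 3 4 7 8 9"
set3="3 4 5 7 8 10"

# Every listed page is one file load when the commands run separately.
total=$(printf '%s\n' $set1 $set2 $set3 | wc -l | tr -d ' ')

# Distinct pages: how many files are needed at all.
distinct=$(printf '%s\n' $set1 $set2 $set3 | sort -u | wc -l | tr -d ' ')

echo "file loads across the three commands: $total"
echo "distinct files actually needed:       $distinct"
```

This reports 18 loads but only 9 distinct files, because 6.tif is not used by any of the three PDFs. A single command that reads all ten TIFFs still performs 10 loads.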
Now consider this command as an alternative, which should run faster (I believe):
    convert \
      1.tif  +write mpr:t1  +delete \
      2.tif  +write mpr:t2  +delete \
      3.tif  +write mpr:t3  +delete \
      4.tif  +write mpr:t4  +delete \
      5.tif  +write mpr:t5  +delete \
      6.tif  +write mpr:t6  +delete \
      7.tif  +write mpr:t7  +delete \
      8.tif  +write mpr:t8  +delete \
      9.tif  +write mpr:t9  +delete \
      10.tif +write mpr:t10 +delete \
      \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10 \
         -compress jpeg -quality 70 +write 2out1.pdf \) \
      \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9 \
         -compress jpeg -quality 70 +write 2out2.pdf \) \
      \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10 \
         -compress jpeg -quality 70 +write 2out3.pdf \) \
      null:
This command loads each of the 10 TIFFs only once (10 I/O events in total). It then writes each TIFF to an mpr: register with an appropriate label, and deletes the initial TIFF from the image sequence.
After this initial preparation ImageMagick runs 3 different, parenthesized side-processing pipelines in sequence, each loading the required output pages from the mpr: images and creating a PDF from them.
The above example may be too limited to demonstrate a measurable advantage from using the mpr: technique, because the same results can also be achieved with this command:
    convert \
      1.tif  \
      2.tif  \
      3.tif  \
      4.tif  \
      5.tif  \
      6.tif  \
      7.tif  \
      8.tif  \
      9.tif  \
      10.tif \
      \( -clone 0,2-3,7-9 -compress jpeg -quality 70 +write 3out1.pdf \) \
      \( -clone 1-3,6-8   -compress jpeg -quality 70 +write 3out2.pdf \) \
      \( -clone 2-4,6-7,9 -compress jpeg -quality 70 +write 3out3.pdf \) \
      null:
However, there is one more hook where a performance win may be acquired: -compress jpeg -quality 70 is applied 3 times, to 6 (cloned, original) images each.
There may be some CPU cycles saved if we apply this operation to the TIFFs before they are written to the mpr: registers. That way we apply the operation only to 10 TIFFs. Later we do not need to apply it any more when we write out the PDFs:
    # Inside \( ... \) a bare file name would be read as an input,
    # so each PDF is written explicitly with +write.
    convert \
      -respect-parentheses \
      1.tif  -compress jpeg -quality 70 +write mpr:t1  +delete \
      2.tif  -compress jpeg -quality 70 +write mpr:t2  +delete \
      3.tif  -compress jpeg -quality 70 +write mpr:t3  +delete \
      4.tif  -compress jpeg -quality 70 +write mpr:t4  +delete \
      5.tif  -compress jpeg -quality 70 +write mpr:t5  +delete \
      6.tif  -compress jpeg -quality 70 +write mpr:t6  +delete \
      7.tif  -compress jpeg -quality 70 +write mpr:t7  +delete \
      8.tif  -compress jpeg -quality 70 +write mpr:t8  +delete \
      9.tif  -compress jpeg -quality 70 +write mpr:t9  +delete \
      10.tif -compress jpeg -quality 70 +write mpr:t10 +delete \
      \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10 +write 4out1.pdf \) \
      \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9  +write 4out2.pdf \) \
      \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10 +write 4out3.pdf \) \
      null:
Update
Mark Setchell's comment is spot on. I had overlooked this before he mentioned it. It is probably faster (and less to type) to run the command like this:
    # -compress and -quality are set once, up front, for all images.
    # Inside \( ... \) a bare file name would be read as an input,
    # so each PDF is written explicitly with +write.
    convert \
      -respect-parentheses \
      -compress jpeg -quality 70 \
      1.tif  +write mpr:t1  +delete \
      2.tif  +write mpr:t2  +delete \
      3.tif  +write mpr:t3  +delete \
      4.tif  +write mpr:t4  +delete \
      5.tif  +write mpr:t5  +delete \
      6.tif  +write mpr:t6  +delete \
      7.tif  +write mpr:t7  +delete \
      8.tif  +write mpr:t8  +delete \
      9.tif  +write mpr:t9  +delete \
      10.tif +write mpr:t10 +delete \
      \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10 +write 5out1.pdf \) \
      \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9  +write 5out2.pdf \) \
      \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10 +write 5out3.pdf \) \
      null:
You'll have to run your own benchmarks, with your own images, in your own environment, though, if you want to decide which of the proposed commands you should prefer.
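One simple way to run such a comparison is a small timing helper like the sketch below. The function name time_runs and the script names in the usage comment are placeholders, not anything ImageMagick ships. It only has whole-second resolution, so use enough repetitions for the difference to show.

```shell
#!/bin/sh
# time_runs N cmd [args...]: run cmd N times and print the elapsed whole seconds.
# Hypothetical helper for comparing the command variants on your own images.
time_runs() (
  n=$1; shift
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1      # output is discarded; only the time matters
    i=$((i + 1))
  done
  end=$(date +%s)
  echo $((end - start))
)

# Usage, assuming each variant was saved to its own (hypothetical) script:
#   time_runs 5 sh ./variant_separate.sh
#   time_runs 5 sh ./variant_mpr.sh
#   time_runs 5 sh ./variant_clone.sh
time_runs 3 true
```

Running each variant several times also smooths out the effect of the file system cache, which otherwise tends to favour whichever command runs second.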