Python: Overwriting previously extracted files instead of creating new ones


There are a few libraries for extracting archive files with Python, such as gzip, zipfile, rarfile, tarfile, patool, etc. I found one of them (patool) especially useful because of its cross-format support: it can extract many archive types, including the popular ones such as ZIP, GZIP, TAR and RAR.

Extracting an archive file with patool is as easy as this:

```python
import patoolib

patoolib.extract_archive("archive.zip", outdir="folder1")
```

where "archive.zip" is the path of the archive file and "folder1" is the path of the directory where the extracted file will be stored.

The extraction works fine. The problem is that if I run the same code again on the exact same archive file, an identical extracted file is stored in the same folder under a different name (filename on the first run, filename1 on the second, filename11 on the third, and so on).

Instead, I need the code to overwrite the extracted file if a file with the same name already exists in the directory.

The extract_archive function looks very minimal: it only has these two parameters, plus a verbosity parameter and a program parameter that specifies which program to extract archives with.

Edit: Nizam Mohamed's answer pointed out that the extract_archive function is documented as overwriting the output. I found that this is only partially true: the function overwrites ZIP files, but not GZ files, which are what I am after. For GZ files, the function still generates new files.

Edit: Padraic Cunningham's answer suggested using the master source. So I downloaded that code and replaced my old patool library scripts with the scripts from the link. Here is the result:

```
In [11]: os.listdir()
Out[11]: ['a.gz']

In [12]: patoolib.extract_archive("a.gz", verbosity=1, outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[12]: '.'

In [13]: patoolib.extract_archive("a.gz", verbosity=1, outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[13]: '.'

In [14]: patoolib.extract_archive("a.gz", verbosity=1, outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[14]: '.'

In [15]: os.listdir()
Out[15]: ['a', 'a.gz', 'a1', 'a2']
```

So, again, the extract_archive function creates a new file every time it is executed. (For the record, the file compressed inside a.gz actually has a name other than a.)
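For illustration only, here is a minimal sketch of naming logic that yields exactly the `a`, `a1`, `a2` sequence from the session above. It is reconstructed from the observed output rather than copied from patool's source, and `next_free_name` is a hypothetical helper. The point it shows is that a "pick the next unused name" rule can never overwrite anything.

```python
import os


def next_free_name(directory: str, stem: str) -> str:
    """Return a path for `stem` inside `directory` that does not exist yet.

    Hypothetical helper: appends 1, 2, ... to the stem until the name is
    free, which reproduces the a, a1, a2 pattern from the session above.
    """
    candidate = os.path.join(directory, stem)
    counter = 1
    while os.path.exists(candidate):
        # Name taken: try the stem with the next number appended.
        candidate = os.path.join(directory, f"{stem}{counter}")
        counter += 1
    return candidate
```

Calling it three times for `"a"` in an empty directory, creating each returned file before the next call, gives `a`, then `a1`, then `a2`.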

As you've stated, patoolib is intended to be a generic archive tool.

Various archive types can be created, extracted, tested, listed, compared, searched and repacked with patool. The advantage of patool is its simplicity in handling archive files without having to remember a myriad of programs and options.

Generic extract behaviour vs specific extract behaviour

The problem here is that extract_archive does not expose much ability to modify the underlying default behaviour of the archive tool.

For the .zip extension, patoolib will use unzip. You can get the desired behaviour of extracting the archive by passing the -o option to its command line interface, i.e. `unzip -o ...`. However, this is a command line option specific to unzip, and it changes for each archive utility.

For example, tar offers an overwrite option, but without the same shortened command line equivalent as unzip: `tar --overwrite` works, whereas `tar -o` does not have the intended effect.
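If the archive is a tar file, one way around the per-tool flags is to skip the external program and use Python's standard-library `tarfile` module, which writes each regular member with `open(path, "wb")` and therefore replaces a same-named file instead of renaming it. A minimal sketch: `extract_tar_overwriting` is a hypothetical name, and the `filter="data"` argument is only passed on Python versions that support extraction filters.

```python
import os
import tarfile


def extract_tar_overwriting(archive: str, outdir: str) -> None:
    """Extract a (possibly gz/bz2/xz-compressed) tar archive into outdir.

    Regular files are written in "wb" mode, so an existing file with the
    same name is replaced rather than kept alongside a renamed copy.
    """
    os.makedirs(outdir, exist_ok=True)
    with tarfile.open(archive) as tar:  # default mode "r" detects compression
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and security backports): reject unsafe members.
            tar.extractall(outdir, filter="data")
        else:
            tar.extractall(outdir)
```

Running it twice on the same archive leaves a single copy of each member in `outdir`, with the archive's contents.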

To fix this, you could make a feature request to the author, or use an alternative library. Unfortunately, given patoolib's mantra, this would require extending the extract utility functions to implement each underlying extractor's own overwrite command options.
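As one possible alternative library, the standard library already covers ZIP and GZ with overwrite-on-extract semantics: `zipfile.ZipFile.extractall` replaces existing files, and decompressing a `.gz` with `gzip.open` into a file opened in `"wb"` mode truncates whatever was there. The sketch below assumes the archive is either a ZIP file or a single-file `.gz`; `extract_overwriting` is a hypothetical helper, and it names the `.gz` output by stripping the extension, matching the `a.gz` to `a` naming in the question.

```python
import gzip
import os
import shutil
import zipfile


def extract_overwriting(archive: str, outdir: str) -> str:
    """Extract a .zip or single-file .gz archive into outdir, replacing
    same-named files instead of creating renamed copies.

    Returns outdir for ZIP archives, or the output file path for .gz.
    """
    os.makedirs(outdir, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(outdir)  # existing files are overwritten silently
        return outdir
    if archive.endswith(".gz"):
        # Output name is the archive name minus ".gz" (a.gz -> a).
        target = os.path.join(outdir, os.path.basename(archive)[:-3])
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # "wb" truncates any existing file
        return target
    raise ValueError(f"unsupported archive type: {archive}")
```

Note that a `.tar.gz` would only be decompressed to a `.tar` by the `.gz` branch, not unpacked.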

Example changes to patoolib

In patoolib.programs.unzip:

```python
def extract_zip(archive, compression, cmd, verbosity, outdir, overwrite=False):
    """Extract a ZIP archive."""
    cmdlist = [cmd]
    if verbosity > 1:
        cmdlist.append('-v')
    if overwrite:
        cmdlist.append('-o')  # unzip: overwrite existing files without prompting
    cmdlist.extend(['--', archive, '-d', outdir])
    return cmdlist
```

In patoolib.programs.tar:

```python
def extract_tar(archive, compression, cmd, verbosity, outdir, overwrite=False):
    """Extract a TAR archive."""
    cmdlist = [cmd, '--extract']
    if overwrite:
        cmdlist.append('--overwrite')  # GNU tar long option; -o means something else
    # add_tar_opts is the helper already defined in this module
    add_tar_opts(cmdlist, compression, verbosity)
    cmdlist.extend(["--file", archive, '--directory', outdir])
    return cmdlist
```

Updating every program is not a trivial change, since each program is different!
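To make "each program is different" concrete, a hypothetical way to thread an overwrite switch through would be a per-program flag table consulted while the command list is built. `OVERWRITE_FLAGS` and `build_extract_cmd` are illustrative names, not part of patoolib, and only the two flags discussed above are filled in; every other extractor would need its own entry checked against that tool's documentation.

```python
from typing import Dict, List

# Hypothetical table: the overwrite switch differs per extractor program.
OVERWRITE_FLAGS: Dict[str, List[str]] = {
    "unzip": ["-o"],         # overwrite without prompting
    "tar": ["--overwrite"],  # GNU tar long option
}


def build_extract_cmd(program: str, base_args: List[str], overwrite: bool) -> List[str]:
    """Build a command list, inserting the program's overwrite flag after its name."""
    cmdlist = [program]
    if overwrite:
        if program not in OVERWRITE_FLAGS:
            # Fail loudly rather than silently extracting without overwrite.
            raise NotImplementedError(f"no known overwrite flag for {program!r}")
        cmdlist.extend(OVERWRITE_FLAGS[program])
    cmdlist.extend(base_args)
    return cmdlist
```

For example, `build_extract_cmd("unzip", ["--", "archive.zip", "-d", "folder1"], True)` gives `["unzip", "-o", "--", "archive.zip", "-d", "folder1"]`.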

Monkey patching the overwrite behaviour

So you've decided not to improve the patoolib source code... You can override the behaviour of extract_archive so that it checks for an existing directory, removes it, and then calls the original extract_archive.

You could include this code in your modules, or, if many modules require it, perhaps stick it in __init__.py:

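```python
import os
import shutil

import patoolib

# Keep a reference to the original before patching; otherwise the wrapper
# would call itself and recurse forever.
_original_extract_archive = patoolib.extract_archive


def overwrite_then_extract_archive(archive, verbosity=0, outdir=None, program=None):
    """Delete an existing outdir, then delegate to patool's extract_archive."""
    if outdir and os.path.exists(outdir):
        shutil.rmtree(outdir)
        os.makedirs(outdir)  # recreate it empty in case the extractor expects it
    return _original_extract_archive(archive, verbosity=verbosity,
                                     outdir=outdir, program=program)


patoolib.extract_archive = overwrite_then_extract_archive
```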

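Now when you call extract_archive(), you get the functionality of overwrite_then_extract_archive(). Be aware that this deletes the whole outdir first, so only use it with a directory dedicated to the extracted files: with outdir="." as in the question's session, it would wipe the current directory, archive included.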
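A gentler variant, using a different technique from the monkey patch, is to extract into a fresh temporary subdirectory and then replace only the entries that collide, leaving unrelated files in outdir alone. This is a sketch under assumptions: `extract_replacing` is a hypothetical helper, and `extract` is any callable that unpacks `archive` into a given directory.

```python
import os
import shutil
import tempfile
from typing import Callable


def extract_replacing(extract: Callable[[str, str], object], archive: str, outdir: str) -> None:
    """Run extract(archive, tmpdir), then move each extracted entry into
    outdir, deleting only same-named entries that already exist there."""
    os.makedirs(outdir, exist_ok=True)
    tmpdir = tempfile.mkdtemp(dir=outdir)  # same filesystem, so moves are renames
    try:
        extract(archive, tmpdir)
        for name in os.listdir(tmpdir):
            dest = os.path.join(outdir, name)
            if os.path.isdir(dest) and not os.path.islink(dest):
                shutil.rmtree(dest)  # replace a colliding directory
            elif os.path.lexists(dest):
                os.remove(dest)  # replace a colliding file or symlink
            shutil.move(os.path.join(tmpdir, name), dest)
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)
```

With patool, `extract` could presumably be `lambda a, d: patoolib.extract_archive(a, outdir=d)`. Because patool then sees an empty directory, it should name the `.gz` output `a`, which is moved over any existing `a`; leftovers from earlier runs such as `a1` and `a2` are not touched.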

