How to read Python package metadata without installation?
I have a Python program that is sort of a wrapper around pip, and I use it to assist in the development of Python packages. The problem I face is how to read the metadata, such as the name and version, of a package (generally '.tar.gz' and '.whl' archives) without installing it. Can distutils or some other tool do this?
Just a few notes... The code is written for Python 3, but I'm working with all sorts of Python packages, such as sdist and bdist_wheel, for both py2 and py3. I'm only concerned with local packages that I have a path to, not theoretical packages available on PyPI.
What I'm doing works fine, but it seems pretty messy, and I'm wondering if there is a better tool that can abstract this. Right now I'm reading the metadata text file within the archive and manually parsing out the fields I need. If that fails, I'm stripping the name and version out of the package's file name (really terrible). Is there a better way to do this? Here are the 2 functions I'm using to parse the package name and version.
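For the "manually parsing out the fields" step: the PKG-INFO file in an sdist and the METADATA file in a wheel both use RFC 822-style headers, so the standard library's `email.parser` can do the line splitting, continuation handling, and repeated fields. This is a minimal sketch for illustration, not the code from this post; `parse_metadata_text` is a hypothetical helper name.

```python
from email.parser import HeaderParser


def parse_metadata_text(text):
    """Parse PKG-INFO / METADATA text into {field name: value}.

    Fields that appear more than once (e.g. Classifier) map to a list of values.
    """
    msg = HeaderParser().parsestr(text)
    fields = {}
    for key in msg.keys():
        if key in fields:  # keys() lists a repeated header once per occurrence
            continue
        values = msg.get_all(key)
        fields[key] = values[0] if len(values) == 1 else values
    return fields


if __name__ == '__main__':
    sample = (
        'Metadata-Version: 1.1\n'
        'Name: example-pkg\n'
        'Version: 0.3\n'
        'Classifier: Programming Language :: Python :: 2\n'
        'Classifier: Programming Language :: Python :: 3\n'
    )
    info = parse_metadata_text(sample)
    print(info['Name'], info['Version'])  # example-pkg 0.3
    print(len(info['Classifier']))  # 2
```

The header parser splits only at the first colon, so values such as `Programming Language :: Python :: 3` come through intact.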
Update
Simeon, thank you for the suggestion to use the metadata.json file contained within wheel archives. I'm not familiar with all of the files contained within the archives, but I had hoped there was a nice way to parse some of them. metadata.json certainly meets the criteria for wheels. I'm going to leave the question open a little longer to see if there are any other suggestions before accepting.
Anyway, in case anyone else encounters this issue in the future, I've attached my updated code. It could probably be written as a cleaner class, but this is what I have for now. It isn't super ruggedized for edge cases, so buyer beware and all that.
    import tarfile
    import zipfile


    def getmetapath(afo):
        """
        Return the path to the metadata file within a tarfile or zipfile object.

        tarfile: PKG-INFO
        zipfile: metadata.json
        """
        if isinstance(afo, tarfile.TarFile):
            pkgname = afo.fileobj.name
            for path in afo.getnames():
                if path.endswith('/PKG-INFO'):
                    return path
        elif isinstance(afo, zipfile.ZipFile):
            pkgname = afo.filename
            for path in afo.namelist():
                if path.endswith('.dist-info/metadata.json'):
                    return path
        try:
            # pkgname is unbound (NameError) if afo was neither a tar nor a zip archive
            raise AttributeError("Unable to identify metadata file for '{0}'".format(pkgname))
        except NameError:
            raise AttributeError("Unable to identify archive's metadata file")


    def getmetafield(pkgpath, field):
        """
        Return the value of a field from a package's metadata file.
        Whenever possible, version fields are returned as a version object.

        i.e. getmetafield('/path/to/archive-0.3.tar.gz', 'name') ==> 'archive'
        """
        wrapper = str
        if field.casefold() == 'version':
            try:
                # attempt to use a version object (able to perform comparisons);
                # falls back to str where distutils is unavailable (e.g. Python 3.12+)
                from distutils.version import LooseVersion as wrapper
            except ImportError:
                pass

        # package is a tar archive
        if pkgpath.endswith('.tar.gz'):
            with tarfile.open(pkgpath) as tfo:
                with tfo.extractfile(getmetapath(tfo)) as mfo:
                    metalines = mfo.read().decode().splitlines()
            for line in metalines:
                if line.startswith(field.capitalize() + ': '):
                    # split on the first ': ' only, so values containing ': ' survive
                    return wrapper(line.split(': ', 1)[1])

        # package is a wheel (zip) archive
        elif pkgpath.endswith('.whl'):
            import json
            with zipfile.ZipFile(pkgpath) as zfo:
                metadata = json.loads(zfo.read(getmetapath(zfo)).decode())
            try:
                return wrapper(metadata[field.lower()])
            except KeyError:
                pass

        raise Exception("Unable to extract field '{0}' from package '{1}'".format(field, pkgpath))
The situation is not great, and that's why wheel files were created. If you only had to support wheel files, the code could be clean, but your approach will remain a bit messy as long as you have to support *.tar.gz source packages.
The file format of wheels is specified in PEP 427, so you can both parse the file name for information and read the contents of the <package>-<version>.dist-info directory inside. In particular, metadata.json and METADATA are useful. In fact, reading metadata.json should be sufficient and would lead to clean code that can access this information without installing the package.
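A minimal sketch of both halves of that advice, assuming the standard PEP 427 file name layout and a single `.dist-info` directory at the archive root. It reads `METADATA` rather than `metadata.json`, since `METADATA` is the file PEP 427 requires, while `metadata.json` may be missing from wheels built by newer tools. `parse_wheel_filename` and `read_wheel_metadata` are hypothetical helper names.

```python
import os
import zipfile
from email.parser import HeaderParser


def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    base = os.path.basename(filename)
    if not base.endswith('.whl'):
        raise ValueError('not a wheel file name: {0}'.format(base))
    parts = base[:-len('.whl')].split('-')
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError('unexpected wheel file name: {0}'.format(base))
    return {'name': name, 'version': version, 'build': build,
            'python': python_tag, 'abi': abi_tag, 'platform': platform_tag}


def read_wheel_metadata(path):
    """Return the .dist-info/METADATA headers of a wheel without installing it."""
    with zipfile.ZipFile(path) as zfo:
        # Only consider a .dist-info directory sitting at the archive root.
        candidates = [n for n in zfo.namelist()
                      if n.count('/') == 1 and n.endswith('.dist-info/METADATA')]
        if not candidates:
            raise ValueError('no .dist-info/METADATA found in {0}'.format(path))
        text = zfo.read(candidates[0]).decode('utf-8')
    return HeaderParser().parsestr(text)
```

For example, `parse_wheel_filename('example_pkg-0.3-py2.py3-none-any.whl')['version']` gives `'0.3'`, and `read_wheel_metadata(path)['Name']` returns the `Name` header of the wheel at `path`. PEP 427 escapes file name components (dashes in the project name become underscores), so prefer `METADATA` when the exact project name matters.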
I would refactor the code to work with metadata.json and implement a best-effort approach for the PKG-INFO of source packages. The long-term plan would be to convert all tar.gz source packages to wheels and remove the outdated PKG-INFO parsing code.
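For the best-effort PKG-INFO side, here is a sketch assuming a gzip-compressed sdist. Unlike the `getmetapath` above, which takes the first `PKG-INFO` in archive order, it tries the shallowest path first, because setuptools sdists also carry a copy under `<name>.egg-info/`. It returns `None` when nothing usable is found so the caller can choose its own fallback. `read_sdist_metadata` is a hypothetical name.

```python
import tarfile
from email.parser import HeaderParser


def read_sdist_metadata(path):
    """Best-effort read of the PKG-INFO headers in a .tar.gz source distribution.

    Returns an email.message.Message, or None when no usable PKG-INFO is found
    (the caller can then fall back to something else, such as the file name).
    """
    with tarfile.open(path, 'r:gz') as tfo:
        # sdists usually hold <name>-<version>/PKG-INFO; setuptools also ships a
        # copy under <name>.egg-info/, so try the shallowest path first.
        candidates = sorted(
            (n for n in tfo.getnames() if n == 'PKG-INFO' or n.endswith('/PKG-INFO')),
            key=lambda n: n.count('/'),
        )
        for name in candidates:
            fileobj = tfo.extractfile(name)
            if fileobj is None:  # not a regular file (e.g. a directory)
                continue
            with fileobj:
                # Old PKG-INFO files are not guaranteed to be UTF-8; replace bad bytes.
                text = fileobj.read().decode('utf-8', errors='replace')
            return HeaderParser().parsestr(text)
    return None
```

Returning `None` instead of raising keeps the "best effort" policy in the caller, which is where a decision like falling back to the file name belongs.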