utils_hash

Utilities for hashing strings and files.

disseminate.builders.deciders.utils_hash.hash_file(filepath, chunk_size, hashfunc)

Create a unique hash for the file contents of the given filepath.

Parameters
filepath : pathlib.Path

The path of the file whose contents will be hashed.

chunk_size : Optional[int]

The number of bytes to read from the file at a time.

hashfunc : Optional[func]

The hash function to use, such as hashlib.md5.

Returns
hash : bytes

The hash bytes.
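
The chunked reading described above can be illustrated with a minimal sketch. This is not the library's implementation: the name hash_file_sketch and the default values for chunk_size and hashfunc are assumptions made for the example, and only the standard library is used.

```python
import hashlib
import pathlib
import tempfile


def hash_file_sketch(filepath, chunk_size=4096, hashfunc=hashlib.md5):
    """Hash a file's contents by reading it chunk_size bytes at a time.

    Hypothetical stand-in for hash_file; defaults are assumed.
    """
    hasher = hashfunc()
    with open(filepath, "rb") as f:
        # iter() with a sentinel calls f.read until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()  # raw bytes, matching the documented return type


# Usage: the chunk size changes memory use, not the resulting hash.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "example.txt"
    path.write_bytes(b"hello world" * 1000)
    chunked = hash_file_sketch(path, chunk_size=64)
    whole = hashlib.md5(path.read_bytes()).digest()

print(chunked == whole)
```

Reading in chunks keeps memory use bounded for large files, because only chunk_size bytes are held at a time.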

disseminate.builders.deciders.utils_hash.hash_items(*items, chunk_size=4096, hashfunc=hashlib.md5, sort=True)

Create a unique text string hash from the given item objects.

Parameters
*items : Tuple[obj, str, bytes, pathlib.Path]

Items to use in calculating the hash.

chunk_size : Optional[int]

When an item is a pathlib.Path, the number of bytes to read from the file at a time.

hashfunc : Optional[func]

The hash function to use. The default is MD5.

sort : Optional[bool]

If True, sort the items before calculating the hash. Enabling this option ensures that the order of the items does not change the hash.

Returns
hashdigest : str

The hash digest string.
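
The effect of the sort option can be shown with a small sketch. How the library converts and orders items internally is not stated here, so this example assumes each item is first reduced to bytes (file contents for pathlib.Path items, the str() form for other objects) and that the byte strings are sorted. The name hash_items_sketch is hypothetical.

```python
import hashlib
import pathlib


def hash_items_sketch(*items, chunk_size=4096, hashfunc=hashlib.md5, sort=True):
    """Hypothetical stand-in for hash_items; conversion rules are assumed."""
    parts = []
    for item in items:
        if isinstance(item, pathlib.Path):
            # Assumption: a path contributes the hash of its file contents
            file_hasher = hashfunc()
            with item.open("rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    file_hasher.update(chunk)
            parts.append(file_hasher.digest())
        elif isinstance(item, bytes):
            parts.append(item)
        else:
            # Assumption: strings and other objects contribute str(item)
            parts.append(str(item).encode("utf-8"))
    if sort:
        # Sorting the byte forms avoids comparing items of mixed types
        parts.sort()
    hasher = hashfunc()
    for part in parts:
        hasher.update(part)
    return hasher.hexdigest()


a = hash_items_sketch("one", b"two", 3)
b = hash_items_sketch(3, b"two", "one")
c = hash_items_sketch("one", b"two", 3, sort=False)
d = hash_items_sketch(3, b"two", "one", sort=False)

print(a == b)  # sorted: the order of the arguments does not matter
print(c == d)  # unsorted: the order of the arguments changes the hash
```

Sorting is useful when the items come from an unordered source, such as a set of dependencies, and the hash should only change when the items themselves change.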

disseminate.builders.deciders.utils_hash.hash_pdf(filepath, chunk_size, hashfunc, startswith=(b'/CreationDate', b'/ModDate', b'/ID'))

Create a unique hash for the file contents of the given pdf filepath.

PDF files are handled differently because they contain metadata, such as the creation and modification dates, that can change without any change in the actual content of the file.

Parameters
filepath : pathlib.Path

The path of the PDF file whose contents will be hashed.

chunk_size : Optional[int]

The number of bytes to read from the file at a time.

hashfunc : Optional[func]

The hash function to use, such as hashlib.md5.

startswith : Optional[Tuple[bytes]]

Lines that start with any of the given byte strings are excluded from the hash.

Returns
hash : bytes

The hash bytes.
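
The idea of skipping metadata lines can be sketched as follows. This is an illustration only, not the library's implementation: hash_pdf_sketch is a hypothetical name, the sketch iterates over lines split on b"\n" rather than reading in chunks, and the two "PDF" files are simplified stand-ins rather than valid documents.

```python
import hashlib
import pathlib
import tempfile

# Same prefixes as the documented default for startswith
IGNORED = (b"/CreationDate", b"/ModDate", b"/ID")


def hash_pdf_sketch(filepath, hashfunc=hashlib.md5, startswith=IGNORED):
    """Hash a file line by line, skipping lines with the given prefixes."""
    hasher = hashfunc()
    with open(filepath, "rb") as f:
        for line in f:  # binary mode: lines are split on b"\n"
            if line.startswith(startswith):  # accepts a tuple of prefixes
                continue
            hasher.update(line)
    return hasher.digest()


with tempfile.TemporaryDirectory() as tmp:
    first = pathlib.Path(tmp) / "first.pdf"
    second = pathlib.Path(tmp) / "second.pdf"
    body = b"%PDF-1.5\n/Title (Example)\n"
    # The two files differ only in their creation date
    first.write_bytes(body + b"/CreationDate (D:20200101000000Z)\n%%EOF\n")
    second.write_bytes(body + b"/CreationDate (D:20210615120000Z)\n%%EOF\n")
    same_hash = hash_pdf_sketch(first) == hash_pdf_sketch(second)
    same_bytes = first.read_bytes() == second.read_bytes()

print(same_hash, same_bytes)  # → True False
```

With this approach, regenerating a PDF with identical content but a new timestamp does not register as a change.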

disseminate.builders.deciders.utils_hash.line_splitter(file, newline, chunk_size, tail=None)

Given a file object, read its contents in chunks and split them into lines, so that lines are not broken at chunk boundaries.
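
One way to split chunked input into whole lines is to carry the unfinished end of each chunk over to the next one. The sketch below assumes a generator interface and default argument values; line_splitter_sketch is a hypothetical name, and the real function's use of its tail parameter may differ.

```python
import io


def line_splitter_sketch(file, newline=b"\n", chunk_size=4096):
    """Yield complete lines from a binary file object read in chunks."""
    tail = b""  # incomplete line carried over from the previous chunk
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        lines = (tail + chunk).split(newline)
        # The last piece has no newline yet, so hold it back
        tail = lines.pop()
        for line in lines:
            yield line + newline
    if tail:
        # Final line of a file that does not end with a newline
        yield tail


data = b"first line\nsecond line\nthird"
# A chunk size of 4 forces every line to span several chunks
lines = list(line_splitter_sketch(io.BytesIO(data), chunk_size=4))
print(lines)  # → [b'first line\n', b'second line\n', b'third']
```

Because the tail is prepended before splitting, a newline sequence that straddles two chunks is still recognized.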