util_hash
Utilities for hashing strings and files.
- disseminate.builders.deciders.utils_hash.hash_file(filepath, chunk_size, hashfunc)
Create a unique hash for the file contents of the given filepath.
- Parameters
  - filepath : pathlib.Path
    The filepath of the file whose contents will be hashed.
  - chunk_size : Optional[int]
    When reading the contents of the file, read it in chunks of the given number of bytes.
  - hashfunc : Optional[func]
    The type of hash to use.
- Returns
  - hash : bytes
    The hash bytes.
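The chunked reading described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name `sketch_hash_file` and the default values for `chunk_size` and `hashfunc` are assumptions (the defaults are borrowed from the `hash_items` signature).

```python
import hashlib
import pathlib
import tempfile


def sketch_hash_file(filepath, chunk_size=4096, hashfunc=hashlib.md5):
    """Hash a file's contents by reading it in fixed-size chunks.

    Hypothetical stand-in for illustration; not the library's code.
    """
    hasher = hashfunc()
    with open(filepath, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()  # raw bytes, as in the documented return type


# Usage: the chunk size changes memory use, not the resulting hash.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "example.txt"
    path.write_bytes(b"hello world" * 1000)
    chunked = sketch_hash_file(path, chunk_size=64)
    whole = hashlib.md5(path.read_bytes()).digest()
    assert chunked == whole
```

Reading in chunks keeps memory use bounded for large files, since only `chunk_size` bytes are held at a time.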
- disseminate.builders.deciders.utils_hash.hash_items(*items, chunk_size=4096, hashfunc=<built-in function openssl_md5>, sort=True)
Create a unique text string hash from the given item objects.
- Parameters
  - *items : Tuple[obj, str, bytes, pathlib.Path]
    Items to use in calculating the hash.
  - chunk_size : Optional[int]
    When reading the contents of files (from pathlib.Path items), read the files in chunks of the given number of bytes.
  - hashfunc : Optional[func]
    The type of hash to use.
  - sort : Optional[bool]
    If True, sort the items before calculating the hash. Enabling this option ensures that the order of the items does not change the hash.
- Returns
  - hashdigest : str
    The hash digest string.
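A rough sketch of how sorting makes the hash independent of item order is shown below. The function name `sketch_hash_items`, the sort key (`repr`) and the way non-bytes items are encoded are all assumptions made for illustration; the library may sort and encode items differently.

```python
import hashlib
import pathlib


def sketch_hash_items(*items, chunk_size=4096, hashfunc=hashlib.md5, sort=True):
    """Combine several items into one hex digest (hypothetical sketch)."""
    hasher = hashfunc()
    # Assumption: sort on repr() so that items of mixed types are comparable
    ordered = sorted(items, key=repr) if sort else items
    for item in ordered:
        if isinstance(item, pathlib.Path):
            # Path items contribute their file contents, read in chunks
            with open(item, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    hasher.update(chunk)
        elif isinstance(item, bytes):
            hasher.update(item)
        else:
            # Assumption: other objects are hashed via their string form
            hasher.update(str(item).encode("utf-8"))
    return hasher.hexdigest()


# Usage: with sort=True the argument order does not matter.
assert sketch_hash_items("b", b"a", 3) == sketch_hash_items(3, "b", b"a")
# With sort=False the same items in a different order hash differently.
assert (sketch_hash_items("b", "a", sort=False)
        != sketch_hash_items("a", "b", sort=False))
```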
- disseminate.builders.deciders.utils_hash.hash_pdf(filepath, chunk_size, hashfunc, startswith=(b'/CreationDate', b'/ModDate', b'/ID'))
Create a unique hash for the file contents of the given pdf filepath.
PDF files are hashed differently because they contain metadata, such as the creation and modification dates, that can change even when the actual content of the file does not.
- Parameters
  - filepath : pathlib.Path
    The filepath of the file whose contents will be hashed.
  - chunk_size : Optional[int]
    When reading the contents of the file, read it in chunks of the given number of bytes.
  - hashfunc : Optional[func]
    The type of hash to use.
  - startswith : Optional[Tuple[bytes]]
    Lines that start with the given bytes will be ignored in the hash.
- Returns
  - hash : bytes
    The hash bytes.
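The idea of skipping metadata lines can be illustrated with the sketch below. It is a simplification under stated assumptions: `sketch_hash_pdf` is a hypothetical name, it iterates over the file line by line instead of using `chunk_size`, the `hashfunc` default is assumed, and the sample files are short stand-ins rather than valid PDFs.

```python
import hashlib
import pathlib
import tempfile


def sketch_hash_pdf(filepath, hashfunc=hashlib.md5,
                    startswith=(b"/CreationDate", b"/ModDate", b"/ID")):
    """Hash a file while ignoring lines that begin with metadata keys.

    Hypothetical sketch; not the library's implementation.
    """
    hasher = hashfunc()
    with open(filepath, "rb") as f:
        for line in f:  # binary files are iterated line by line on b"\n"
            # bytes.startswith accepts a tuple of prefixes
            if line.startswith(startswith):
                continue
            hasher.update(line)
    return hasher.digest()


# Usage: two files that differ only in their creation date hash the same.
body = b"%PDF-1.4\n1 0 obj\n(Hello)\nendobj\n"
with tempfile.TemporaryDirectory() as tmp:
    first = pathlib.Path(tmp) / "first.pdf"
    second = pathlib.Path(tmp) / "second.pdf"
    first.write_bytes(body + b"/CreationDate (D:20200101000000)\n")
    second.write_bytes(body + b"/CreationDate (D:20210615120000)\n")
    assert sketch_hash_pdf(first) == sketch_hash_pdf(second)
```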
- disseminate.builders.deciders.utils_hash.line_splitter(file, newline, chunk_size, tail=None)
Given a file object, read its contents in chunks and split them into complete lines, so that no line is broken at a chunk boundary.
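One way to split chunked reads into whole lines is to carry the unfinished end of each chunk over to the next read. The generator below is a sketch of that technique only: `sketch_line_splitter` is a hypothetical name, the defaults are assumed, and the carried-over remainder is kept in a local variable rather than the documented `tail` parameter, whose exact role is not described here.

```python
import io


def sketch_line_splitter(file, newline=b"\n", chunk_size=4096):
    """Yield complete lines from a binary file object read in chunks.

    Hypothetical sketch; not the library's implementation.
    """
    tail = b""  # unfinished line carried over from the previous chunk
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split(newline)
        # The last part has no newline yet, so hold it for the next chunk
        tail = parts.pop()
        for part in parts:
            yield part + newline
    if tail:
        yield tail  # final line without a trailing newline


# Usage: a chunk size smaller than a line still yields whole lines.
lines = list(sketch_line_splitter(io.BytesIO(b"one\ntwo\nthree"), chunk_size=3))
assert lines == [b"one\n", b"two\n", b"three"]
```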