Download utilities
copy_configs_tocwd
copy_configs_tocwd ()
Copy configs directory from package to current directory.
find_filepaths
find_filepaths (root_folder:str)
Recursively finds all files.
Name | Type | Details |
---|---|---|
root_folder | str | directory |
Returns | Tuple | sorted filepaths and length of filepaths |
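The behaviour described above (recursive search, sorted paths plus their count) can be sketched with the standard library. This is a minimal illustration, not the package's own code; the name `find_filepaths_sketch` is hypothetical.

```python
import os
from typing import List, Tuple


def find_filepaths_sketch(root_folder: str) -> Tuple[List[str], int]:
    """Walk root_folder recursively and return (sorted file paths, count).

    Hypothetical stand-in for find_filepaths; the real implementation may differ.
    """
    filepaths = []
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            filepaths.append(os.path.join(dirpath, name))
    filepaths.sort()
    return filepaths, len(filepaths)
```

Calling `paths, n = find_filepaths_sketch("data/1_extracted")` would unpack the tuple in the same shape the Returns row describes.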
get_new_name
get_new_name (dir_list:list)
Return dict with old name and new name of files in multiple directories.
{'data/1_extracted/dataset1/Basalt/14.jpg': 'data/2_processed/Basalt/dataset1_01_Basalt_14.jpg'}
Name | Type | Details |
---|---|---|
dir_list | list | list of dir paths |
Returns | dict | {old_name: new_name} |
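The mapping in the example can be reproduced with plain string handling. The sketch below is an assumption-laden illustration: it works on a single file path rather than a list of directories, and because the source does not say what the `01` segment encodes, it is taken as a caller-supplied `tag`. The names `new_name_sketch`, `tag` and `dest_root` are hypothetical.

```python
import posixpath


def new_name_sketch(old_path: str, tag: str = "01",
                    dest_root: str = "data/2_processed") -> str:
    """Build a new path of the form <dest_root>/<class>/<dataset>_<tag>_<class>_<file>.

    Assumes old_path ends in <dataset>/<class>/<file> and uses forward slashes.
    The meaning of `tag` is not documented in the source; it is a placeholder here.
    """
    parts = old_path.split("/")
    dataset, class_name, file_name = parts[-3], parts[-2], parts[-1]
    new_file = f"{dataset}_{tag}_{class_name}_{file_name}"
    return posixpath.join(dest_root, class_name, new_file)


old_paths = ["data/1_extracted/dataset1/Basalt/14.jpg"]
mapping = {old: new_name_sketch(old) for old in old_paths}
```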
move_to_processed
move_to_processed ()
Combine files of the same subclass and move them into that subclass's folder under data/2_processed.
Uses get_new_name to create the new file names, then renames the files and copies them to data/2_processed.
move_bad_files
move_bad_files (txt_file, dest, text)
Move files in txt_file to dest.
Name | Type | Details |
---|---|---|
txt_file | file | text file with paths of bad images |
dest | type | target destination |
text | | |
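One way to move the files listed in a text file is sketched below, assuming one path per line. The `text` parameter is undocumented in the source, so the sketch leaves it out; `move_listed_files_sketch` is a hypothetical name and not the package's function.

```python
import os
import shutil
from typing import List


def move_listed_files_sketch(txt_file: str, dest: str) -> List[str]:
    """Move every existing file listed in txt_file (one path per line) into dest.

    Assumption: blank lines and paths that no longer exist are skipped silently.
    Returns the list of destination paths.
    """
    os.makedirs(dest, exist_ok=True)
    moved = []
    with open(txt_file, encoding="utf-8") as fh:
        for line in fh:
            path = line.strip()
            if not path or not os.path.isfile(path):
                continue  # skip blank lines and missing files
            target = os.path.join(dest, os.path.basename(path))
            moved.append(shutil.move(path, target))
    return moved
```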
timer_func
timer_func (func)
Show the execution time of the function object passed.
Name | Type | Details |
---|---|---|
func | function | function |
timer_func.<locals>.wrap_func
timer_func.<locals>.wrap_func (*args, **kwargs)
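A timing decorator with an inner `wrap_func (*args, **kwargs)` typically looks like the sketch below. It is an illustration only: the output format and the use of `functools.wraps` and `time.perf_counter` are choices of this sketch, not confirmed details of the package.

```python
import time
from functools import wraps


def timer_func_sketch(func):
    """Return a wrapper that prints how long each call to func takes."""

    @wraps(func)  # keep the wrapped function's name and docstring (sketch's choice)
    def wrap_func(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__!r} executed in {elapsed:.4f}s")
        return result

    return wrap_func


@timer_func_sketch
def add(a, b):
    return a + b
```

Calling `add(2, 3)` prints one timing line and still returns the function's own result.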
remove_unsupported_images
remove_unsupported_images (root_folder:str)
Remove unsupported images.
Name | Type | Details |
---|---|---|
root_folder | str | Root Folder. |
clean_images
clean_images (cfg)
Remove bad, misclassified, duplicate, corrupted and unsupported images.
Name | Type | Details |
---|---|---|
cfg | omegaconf.DictConfig | Hydra Configuration |
move_and_rename
move_and_rename (class_dir:str)
Move files from class_dir to tmp, rename them there based on count, and move them back to 2_processed.
class_dir: a class directory of one of the supported classes (Marble, Coal, …), which contains image files.
Name | Type | Details |
---|---|---|
class_dir | str | Class directory that contains folders of classes containing images |
move_files
move_files (src_dir:str, dest_dir:str='data/2_processed/tmp')
Move files to tmp directory in 2_processed.
src_dir: directory of rock subclass with files [Basalt, Marble, Coal, …]
Name | Type | Default | Details |
---|---|---|---|
src_dir | str | | Source Directory path |
dest_dir | str | data/2_processed/tmp | Destination Directory path, by default "data/2_processed/tmp" |
rename_files
rename_files (source_dir:str='data/2_processed/tmp')
Rename files in classes and move them to 2_processed.
Name | Type | Default | Details |
---|---|---|---|
source_dir | str | data/2_processed/tmp | Directory, by default "data/2_processed/tmp" |
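The move-to-tmp, rename-by-count, move-back idea can be sketched as a single pass that numbers the files while moving them out of the temporary directory. The `<prefix>_<count><ext>` naming scheme, the function name and its parameters are assumptions for illustration; the source does not state the exact file name pattern.

```python
import os
import shutil
from typing import List


def rename_by_count_sketch(tmp_dir: str, dest_dir: str, prefix: str) -> List[str]:
    """Move files from tmp_dir to dest_dir, naming them <prefix>_<count><ext>.

    Assumptions: files are numbered from 1 in sorted name order, and dest_dir
    holds no files that would clash with the new names.
    """
    os.makedirs(dest_dir, exist_ok=True)
    files = sorted(
        name for name in os.listdir(tmp_dir)
        if os.path.isfile(os.path.join(tmp_dir, name))
    )
    new_paths = []
    for count, name in enumerate(files, start=1):
        ext = os.path.splitext(name)[1]
        target = os.path.join(dest_dir, f"{prefix}_{count}{ext}")
        shutil.move(os.path.join(tmp_dir, name), target)
        new_paths.append(target)
    return new_paths
```

Renaming on the way out of a separate directory avoids collisions between old and new names, which is one plausible reason for routing files through tmp.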
get_tfds_from_dir
get_tfds_from_dir (cfg)
Convert directory of images to tfds dataset.
Name | Type | Details |
---|---|---|
cfg | omegaconf.DictConfig | Hydra Configuration |
Returns | Tuple | Tuple containing 3 TensorFlow Datasets |
prepare
prepare (ds, cfg, shuffle=False, augment=False)
Prepare dataset using augment, preprocess, cache, shuffle and prefetch.
Name | Type | Default | Details |
---|---|---|---|
ds | Dataset | | TensorFlow Dataset |
cfg | omegaconf.DictConfig | | Hydra Configuration |
shuffle | bool | False | shuffle parameter, by default False |
augment | bool | False | augment parameter, by default False |
Returns | Dataset | | TensorFlow Dataset preprocessed, shuffled, augmented and batched |
get_preprocess
get_preprocess (cfg)
Return preprocess function for particular model.
Name | Type | Details |
---|---|---|
cfg | omegaconf.DictConfig | Hydra Configuration |
Returns | type | description |
get_value_counts
get_value_counts (dataset_path:str, column:str='file_type')
Get value counts of passed column.
Name | Type | Default | Details |
---|---|---|---|
dataset_path | str | | directory with subclasses |
column | str | file_type | column name |
Returns | None | | |
get_df
get_df (root:str='data/2_processed')
Return df with classes, image paths and file names.
Name | Type | Default | Details |
---|---|---|---|
root | str | data/2_processed | directory to scan for image files, by default "data/2_processed" |
Returns | DataFrame | | with columns file_name, class and file_path |
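The columns listed above can be gathered with the standard library alone. The sketch assumes a `<root>/<class>/<file>` layout one level deep and builds plain dict rows; passing such a list to `pandas.DataFrame` would give a frame with the same three columns. `collect_rows_sketch` and `value_counts_sketch` are hypothetical names, and the counting helper only mirrors the idea behind get_value_counts.

```python
import os
from collections import Counter
from typing import Dict, List


def collect_rows_sketch(root: str) -> List[Dict[str, str]]:
    """Return one row per file with keys file_name, class and file_path.

    Assumes root contains one folder per class, each holding the image files.
    """
    rows = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files directly under root
        for file_name in sorted(os.listdir(class_dir)):
            file_path = os.path.join(class_dir, file_name)
            if os.path.isfile(file_path):
                rows.append({"file_name": file_name,
                             "class": class_name,
                             "file_path": file_path})
    return rows


def value_counts_sketch(rows: List[Dict[str, str]], column: str) -> Counter:
    """Count how often each value of `column` occurs across the rows."""
    return Counter(row[column] for row in rows)
```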
sampling
sampling (cfg)
Apply oversampling, undersampling or no sampling to the data and split it into train, val and test.
Name | Type | Details |
---|---|---|
cfg | omegaconf.DictConfig | Hydra Configuration |
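The three sampling modes can be illustrated on an in-memory mapping of class name to file list. This is a sketch under stated assumptions: random oversampling with replacement up to the largest class, random undersampling down to the smallest, and no train/val/test split. The function name, the `mode` strings and the `seed` parameter are hypothetical; the package reads its settings from `cfg` instead.

```python
import random
from typing import Dict, List


def resample_sketch(items_by_class: Dict[str, List[str]],
                    mode: str = "none", seed: int = 0) -> Dict[str, List[str]]:
    """Balance class sizes by random over- or undersampling.

    Assumes every class has at least one item. mode is one of
    "oversample", "undersample" or "none".
    """
    if mode == "none":
        return {name: list(items) for name, items in items_by_class.items()}
    sizes = [len(items) for items in items_by_class.values()]
    if mode == "oversample":
        target = max(sizes)
    elif mode == "undersample":
        target = min(sizes)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = random.Random(seed)
    balanced = {}
    for name, items in items_by_class.items():
        if len(items) >= target:
            # Draw without replacement down to the target size.
            balanced[name] = rng.sample(items, target)
        else:
            # Keep all originals and add random duplicates up to the target.
            balanced[name] = list(items) + rng.choices(items, k=target - len(items))
    return balanced
```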