Whats this rock!

Download utilities

Utilities for downloading and preprocessing data.

copy_configs_tocwd

 copy_configs_tocwd ()

Copy configs directory from package to current directory.


find_filepaths

 find_filepaths (root_folder:str)

Recursively finds all files.

             Type   Details
root_folder  str    Directory
Returns      Tuple  Sorted filepaths and length of filepaths
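
As a rough illustration of the behaviour described for find_filepaths, the sketch below walks a folder recursively and returns the sorted file paths together with their count. The name find_filepaths_sketch and the use of os.walk are assumptions made for this example; the package's own implementation may differ.

```python
import os


def find_filepaths_sketch(root_folder: str) -> tuple:
    """Hypothetical stand-in: return (sorted file paths, number of paths)."""
    filepaths = []
    # os.walk visits root_folder and every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            filepaths.append(os.path.join(dirpath, name))
    filepaths.sort()
    return filepaths, len(filepaths)
```

Calling find_filepaths_sketch("data/1_extracted") would return every file below that folder, in sorted order, plus the total count.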

get_new_name

 get_new_name (dir_list:list)

Return dict with old name and new name of files in multiple directories.

{'data/1_extracted/dataset1/Basalt/14.jpg': 'data/2_processed/Basalt/dataset1_01_Basalt_14.jpg'}

          Type  Details
dir_list  list  List of dir paths
Returns   dict  {old_name: new_name}
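
The mapping in the example above can be reproduced with a small sketch. Several things here are assumptions, not taken from the package: the function name get_new_name_sketch, the dest_root parameter, the directory layout <dataset>/<class>/<file>, and above all the meaning of the two-digit segment (01). The page does not say what that segment encodes, so this sketch simply uses the 1-based position of the directory in dir_list.

```python
import os


def get_new_name_sketch(dir_list: list, dest_root: str = "data/2_processed") -> dict:
    """Hypothetical stand-in: map old file paths to flattened new names.

    ASSUMPTION: each entry of dir_list looks like <...>/<dataset>/<class>.
    ASSUMPTION: the two-digit segment is the directory's position in dir_list.
    """
    mapping = {}
    for index, class_dir in enumerate(dir_list, start=1):
        class_dir = class_dir.rstrip("/")
        class_name = os.path.basename(class_dir)
        dataset = os.path.basename(os.path.dirname(class_dir))
        for file_name in sorted(os.listdir(class_dir)):
            if not os.path.isfile(os.path.join(class_dir, file_name)):
                continue  # ignore nested directories
            old_name = f"{class_dir}/{file_name}"
            new_name = (
                f"{dest_root}/{class_name}/"
                f"{dataset}_{index:02d}_{class_name}_{file_name}"
            )
            mapping[old_name] = new_name
    return mapping
```

The function only computes names; copying or moving the files is left to the caller, in line with the description of move_to_processed.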

move_to_processed

 move_to_processed ()

Combine files of the same subclass and move them to that subclass directory under data/2_processed.

Uses get_new_name to create the new file names, then renames the files and copies them to data/2_processed.


move_bad_files

 move_bad_files (txt_file, dest, text)

Move files in txt_file to dest.

          Type  Details
txt_file  file  Text file with path of bad images
dest      type  Target destination
text
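
A minimal sketch of the idea behind move_bad_files, assuming the text file lists one image path per line. The name move_bad_files_sketch is hypothetical, and the undocumented text parameter is left out because the page does not describe it.

```python
import os
import shutil


def move_bad_files_sketch(txt_file: str, dest: str) -> list:
    """Hypothetical stand-in: move every path listed in txt_file into dest.

    ASSUMPTION: txt_file holds one file path per line.
    """
    os.makedirs(dest, exist_ok=True)
    moved = []
    with open(txt_file) as handle:
        for line in handle:
            path = line.strip()
            if not path or not os.path.isfile(path):
                continue  # skip blank lines and paths that no longer exist
            target = os.path.join(dest, os.path.basename(path))
            moved.append(shutil.move(path, target))
    return moved
```

Returning the list of moved paths is a choice made for this sketch so the result can be inspected; it is not documented behaviour.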

timer_func

 timer_func (func)

Show the execution time of the function object passed.

      Type      Details
func  function  Function to time

timer_func.<locals>.wrap_func

 timer_func.<locals>.wrap_func (*args, **kwargs)
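
For readers unfamiliar with timing decorators, here is a minimal sketch of the pattern that timer_func and its inner wrap_func suggest. The name timer_func_sketch, the message format, and the use of functools.wraps and time.perf_counter are assumptions for illustration.

```python
import functools
import time


def timer_func_sketch(func):
    """Hypothetical stand-in: print how long each call to func takes."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrap_func(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Function {func.__name__!r} executed in {elapsed:.4f}s")
        return result

    return wrap_func


@timer_func_sketch
def add(a, b):
    """Toy function used only to demonstrate the decorator."""
    return a + b
```

Each call to add(2, 3) prints the elapsed time and still returns the function's own result.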

remove_unsupported_images

 remove_unsupported_images (root_folder:str)

Remove unsupported images.

             Type  Details
root_folder  str   Root folder

clean_images

 clean_images (cfg)

Remove bad, misclassified, duplicate, corrupted and unsupported images.

     Type                  Details
cfg  omegaconf.DictConfig  Hydra configuration

move_and_rename

 move_and_rename (class_dir:str)

Move files from class_dir to tmp, rename them there based on count, and move them back to 2_processed.

class_dir: a class directory of one of the supported classes (Marble, Coal, …), which contains image files.

           Type  Details
class_dir  str   Class directory that contains folders of classes containing images

move_files

 move_files (src_dir:str, dest_dir:str='data/2_processed/tmp')

Move files to tmp directory in 2_processed.

src_dir: directory of rock subclass with files [Basalt, Marble, Coal, …]

          Type  Default               Details
src_dir   str                         Source directory path
dest_dir  str   data/2_processed/tmp  Destination directory path
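
The move step can be sketched with the standard library as follows. The name move_files_sketch and the returned count are assumptions for this example; only the parameters and the default destination come from the signature above.

```python
import os
import shutil


def move_files_sketch(src_dir: str, dest_dir: str = "data/2_processed/tmp") -> int:
    """Hypothetical stand-in: move every regular file in src_dir to dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    count = 0
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if os.path.isfile(src_path):  # leave sub-directories where they are
            shutil.move(src_path, os.path.join(dest_dir, name))
            count += 1
    return count
```

A call such as move_files_sketch("data/2_processed/Basalt") would leave the Basalt directory empty of files and place them all in the tmp directory.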

rename_files

 rename_files (source_dir:str='data/2_processed/tmp')

Rename files in classes and move them to 2_processed.

            Type  Default               Details
source_dir  str   data/2_processed/tmp  Directory

get_tfds_from_dir

 get_tfds_from_dir (cfg)

Convert directory of images to tfds dataset.

         Type                  Details
cfg      omegaconf.DictConfig  Hydra configuration
Returns  Tuple                 Tuple containing 3 TensorFlow Datasets

prepare

 prepare (ds, cfg, shuffle=False, augment=False)

Prepare dataset using augment, preprocess, cache, shuffle and prefetch.

         Type                  Default  Details
ds       Dataset                        TensorFlow Dataset
cfg      omegaconf.DictConfig           Hydra configuration
shuffle  bool                  False    Shuffle parameter
augment  bool                  False    Augment parameter
Returns  Dataset                        TensorFlow Dataset preprocessed, shuffled, augmented and batched

get_preprocess

 get_preprocess (cfg)

Return preprocess function for particular model.

         Type                  Details
cfg      omegaconf.DictConfig  Hydra configuration
Returns  function              Preprocess function for the particular model

get_value_counts

 get_value_counts (dataset_path:str, column:str='file_type')

Get value counts of passed column.

              Type  Default    Details
dataset_path  str              Directory with subclasses
column        str   file_type  Column name
Returns       None

get_df

 get_df (root:str='data/2_processed')

Return df with classes, image paths and file names.

         Type       Default           Details
root     str        data/2_processed  Directory to scan for image files
Returns  DataFrame                    Columns file_name, class and file_path
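
To show what the three columns might contain, the sketch below builds one record per file using only the standard library and then counts the values of a column, in the spirit of get_value_counts. It is a stand-in, not the package's code: the names get_rows_sketch and count_column_sketch are hypothetical, a list of dicts replaces the DataFrame, and the class is assumed to be the name of the file's parent directory.

```python
from collections import Counter
from pathlib import Path


def get_rows_sketch(root: str = "data/2_processed") -> list:
    """Hypothetical stand-in: one dict per file with the documented columns.

    ASSUMPTION: a file's class is the name of its parent directory.
    """
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            rows.append(
                {
                    "file_name": path.name,
                    "class": path.parent.name,
                    "file_path": str(path),
                }
            )
    return rows


def count_column_sketch(rows: list, column: str = "class") -> Counter:
    """Count how often each value of the given column occurs."""
    return Counter(row[column] for row in rows)
```

The list returned by get_rows_sketch could be passed to a DataFrame constructor if a tabular object is needed.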

sampling

 sampling (cfg)

Oversample, undersample, or apply no sampling to the data, and split it into train, val and test.

     Type                  Details
cfg  omegaconf.DictConfig  Hydra configuration
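
The difference between the three sampling modes can be illustrated with a small sketch. This is one plausible reading, not the package's method: the name resample_sketch, the mode strings, and the strategy (oversampling duplicates random files up to the largest class, undersampling keeps a random subset the size of the smallest class) are all assumptions, and the train/val/test split is not shown.

```python
import random


def resample_sketch(files_by_class: dict, mode: str = "none", seed: int = 0) -> dict:
    """Hypothetical stand-in: balance class sizes.

    ASSUMPTION: every class has at least one file.
    """
    if mode == "none" or not files_by_class:
        return {name: list(files) for name, files in files_by_class.items()}
    sizes = [len(files) for files in files_by_class.values()]
    if mode == "oversample":
        target = max(sizes)
    elif mode == "undersample":
        target = min(sizes)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = random.Random(seed)  # seeded so the result is reproducible
    balanced = {}
    for name, files in files_by_class.items():
        files = list(files)
        if len(files) >= target:
            # Keep a random subset of exactly `target` files.
            balanced[name] = rng.sample(files, target)
        else:
            # Pad with randomly chosen duplicates up to `target`.
            balanced[name] = files + rng.choices(files, k=target - len(files))
    return balanced
```

With four Basalt files and two Coal files, oversampling yields four of each and undersampling yields two of each.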