Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util package

Submodules

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download module

class Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.googleimagesdownload

Bases: object

Class for Google image downloads

build_search_url(search_term, params, url, similar_images, specific_site, safe_search)

Builds the main search URL

Parameters:
  • search_term – Search terms included

  • params – Additional parameters

  • url – URL to search

  • similar_images – Check for similar images

  • specific_site – specifies a specific site to search

  • safe_search – Boolean that determines whether safe search is on

Returns:
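
The way these parameters combine into a single URL can be illustrated with a minimal sketch. It is not the package's implementation: the function name build_search_url_sketch, the base URL, and the query keys (q, tbm, safe) are assumptions chosen for illustration, and only a subset of the documented parameters is modelled.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url_sketch(search_term, safe_search=False, specific_site=None):
    """Hypothetical stand-in: assemble an image-search URL from its parts."""
    query = search_term
    if specific_site:
        # Restrict results to one site with the "site:" search operator.
        query = f"{query} site:{specific_site}"
    # Assumed query keys: q carries the search text, tbm=isch asks for images.
    params = {"q": query, "tbm": "isch"}
    if safe_search:
        params["safe"] = "active"
    # urlencode percent-escapes reserved characters and joins the pairs with "&".
    return "https://www.google.com/search?" + urlencode(params)


url = build_search_url_sketch(
    "steel microstructure", safe_search=True, specific_site="example.com"
)
query = parse_qs(urlsplit(url).query)  # decode the query string again
print(url)
```

Building the query with urlencode rather than string concatenation keeps spaces and colons in the search term correctly escaped.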

build_url_parameters(arguments)

Building URL parameters

Parameters:

arguments – Arguments used to construct the URL parameters for the search

Returns:

create_directories(main_directory, dir_name, thumbnail, thumbnail_only)

Function to make directories

Parameters:
  • main_directory – Main directory where files will be stored

  • dir_name – sub directory name

  • thumbnail – thumbnail of image

  • thumbnail_only – Selects if only a thumbnail is stored

Returns:

download(arguments)

Bulk download of files based on arguments

Parameters:

arguments – Dictionary of arguments to consider

Returns:

download_executor(arguments)

Function that downloads files based on arguments

Parameters:

arguments – Dictionary of arguments to consider

Returns:

download_extended_page(url, chromedriver)

Downloads the page for more than 100 images

Parameters:
  • url – Webpage to download

  • chromedriver – path to the chromedriver executable

Returns:

webpage information as JSON

download_image(image_url, image_format, main_directory, dir_name, count, print_urls, socket_timeout, prefix, print_size, no_numbering, no_download, save_source, img_src, silent_mode, thumbnail_only, format, ignore_urls)

Function to download images

Parameters:
  • image_url – URL where the images are located

  • image_format – Format that the image is saved as

  • main_directory – main directory where folders are located

  • dir_name – subdirectory where images are saved

  • count – number of images to save

  • print_urls – Boolean; selects if the URLs are printed

  • socket_timeout – time before the process times out

  • prefix – prefix to add to the files

  • print_size – prints the file size

  • no_numbering – selects if you number the files

  • no_download – selects if you download the files

  • save_source – selects if you save the webpage source

  • img_src – save the direction to the image source

  • silent_mode – sets the download operation to silent mode

  • thumbnail_only – saves the thumbnail only

  • format – Sets the format that files will be saved

  • ignore_urls – Sets what URLS to ignore based on keywords

Returns:

Information about the download

download_image_thumbnail(image_url, main_directory, dir_name, return_image_name, print_urls, socket_timeout, print_size, no_download, save_source, img_src, ignore_urls)

Function to download image thumbnails

Parameters:
  • image_url – URL for file

  • main_directory – Main directory where files will be stored

  • dir_name – sub directory where files will be stored

  • return_image_name – name of image that will be saved

  • print_urls – Selects if you print the URLs when scraping

  • socket_timeout – Time before a process times out

  • print_size – Boolean to print the size of the files

  • no_download – Function to check the process without downloading files

  • save_source – Save the source file

  • img_src – Source of the image

  • ignore_urls – Specific urls to ignore

Returns:

download_page(url)

Downloads the entire content of a webpage

Parameters:

url – URL where the webpage is located

Returns:

Webpage information

file_size(file_path)

Measures the size of a file

Parameters:

file_path – Path where data is stored

Returns:

Size of the files
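
As an illustration of measuring a file's size, the sketch below reads the size in bytes and converts it to a readable string. The name file_size_sketch, the unit labels, and the string return format are assumptions; the documented method may report the size differently.

```python
import os
import tempfile


def file_size_sketch(file_path):
    """Hypothetical stand-in: return a file's size as a readable string."""
    size = float(os.path.getsize(file_path))  # size on disk in bytes
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024.0:
            return f"{size:.1f} {unit}"
        size /= 1024.0  # step up to the next unit
    return f"{size:.1f} TB"


# Demonstration on a throwaway 2048-byte file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as handle:
        handle.write(b"\0" * 2048)
    readable = file_size_sketch(sample)
print(readable)
```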

format_object(object)

Formats the object in a readable format

Parameters:

object – Raw object from web

Returns:

Dictionary containing formatted object

get_all_tabs(page)

Finding ‘Next Image’ from the given raw page

Parameters:

page – URL for the page

Returns:

the tabs that are looked at

get_next_tab(s)

Finding ‘Next Image’ from the given raw page

Parameters:

s – image id

Returns:

information about the image

keywords_from_file(file_name)

Keywords from file

Parameters:

file_name – Name of the file to search

Returns:

types of files to include in the search

repair(brokenjson)

Function that helps repair malformed JSON

Parameters:

brokenjson – JSON file

Returns:

Fixed JSON file

replace_with_byte(match)

Corrects the escape characters for Python 2

Parameters:

match – characters to fix

Returns:

fixed character
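
One plausible way repair and replace_with_byte fit together is sketched below, assuming the "bad JSON" in question contains octal escapes such as \75, which the JSON grammar does not allow. The function names ending in _sketch and the regular expression are illustrative assumptions, not the package's confirmed code.

```python
import json
import re

# A backslash followed by one to three octal digits is not a valid JSON escape.
INVALID_ESCAPE = re.compile(r"\\[0-7]{1,3}")


def replace_with_byte_sketch(match):
    """Hypothetical stand-in: decode one octal escape into its character."""
    return chr(int(match.group(0)[1:], 8))  # drop the backslash, parse base 8


def repair_sketch(broken_json):
    """Hypothetical stand-in: replace every octal escape in a JSON string."""
    return INVALID_ESCAPE.sub(replace_with_byte_sketch, broken_json)


broken = r'{"label": "a\75b"}'  # \75 is octal for "="
fixed = repair_sketch(broken)
parsed = json.loads(fixed)
print(parsed)
```

This sketch is deliberately narrow: it does not distinguish an escaped backslash followed by digits, and an escape that decodes to a double quote would still leave the string unparseable.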

similar_images(similar_images)

Function that deals with similar images

Parameters:

similar_images – urls

Returns:

list without similar images

single_image(image_url)

Function to download a single image

Parameters:

image_url – URL for image

Returns:

Encoded image

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.main()

Main program

Returns:

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.user_input()

Parser for the user inputs

Returns:

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation module

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation.image_collection(path, pattern='*.jpg')

Tool to search folders for image files to project.

Parameters:
  • path – sets the path where to search for images

  • pattern – sets the pattern to search for. Can use wildcards

Returns:
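
The wildcard search described by path and pattern can be sketched with the standard glob module. Whether the package's function searches subfolders, and what type it returns, is not stated here, so the recursive search and the sorted list of paths below are assumptions, as is the name image_collection_sketch.

```python
import glob
import os
import tempfile


def image_collection_sketch(path, pattern="*.jpg"):
    """Hypothetical stand-in: list files under `path` that match `pattern`."""
    # "**" with recursive=True matches the folder itself and all subfolders.
    return sorted(glob.glob(os.path.join(path, "**", pattern), recursive=True))


# Demonstration on a throwaway folder tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("a.jpg", os.path.join("sub", "b.jpg"), "c.png"):
        open(os.path.join(root, name), "w").close()
    found = [os.path.relpath(p, root) for p in image_collection_sketch(root)]
print(found)
```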

class Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation.image_dataset(images, transform=None, viz=Compose(Resize(size=(224, 224), interpolation=bilinear), ToTensor()))

Bases: Dataset

Builds a PyTorch Dataset

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_scraping module

Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_scraping.download_images_from_google(names, path, num=25, verbose=True)

Tool to download files from Google image search based on search criteria

Parameters:
  • names – list of strings to search

  • path – path where files will be saved

  • num – number of images to download in each category

  • verbose – True makes the function print intermediate actions

Returns:
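
To show how names, path, num, and verbose might translate into the arguments dictionaries consumed by the downloader class, here is a minimal sketch. The helper build_download_arguments is hypothetical, and the keys (keywords, limit, output_directory, print_urls) are assumed to follow the upstream google_images_download convention; the bundled module may differ.

```python
def build_download_arguments(names, path, num=25, verbose=True):
    """Hypothetical helper: build one arguments dictionary per search term."""
    return [
        {
            "keywords": name,          # assumed key: the search term
            "limit": num,              # assumed key: images per term
            "output_directory": path,  # assumed key: where files are saved
            "print_urls": verbose,     # assumed key: echo each image URL
        }
        for name in names
    ]


args = build_download_arguments(
    ["pearlite", "martensite"], "./images", num=5, verbose=False
)
for entry in args:
    print(entry)
```

Keeping one dictionary per search term means each term gets its own download limit, which matches the description of num as a per-category count.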

Module contents