Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util package
Submodules
Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download module
- class Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.googleimagesdownload
Bases:
object
Class for Google image downloads
- build_search_url(search_term, params, url, similar_images, specific_site, safe_search)
Builds the main search URL
- Parameters:
search_term – Search terms to include
params – Additional parameters
url – URL to search
similar_images – Check for similar images
specific_site – Specific site to search
safe_search – Boolean that determines whether safe search is on
- Returns:
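For illustration, the sketch below shows one way such a search URL could be assembled with the standard library. It is not the package's implementation: the function name `build_search_url_sketch`, the Google endpoint, and the `tbm=isch` and `safe=active` query parameters are assumptions, and only the `search_term`, `specific_site`, and `safe_search` arguments are modeled.

```python
from urllib.parse import urlencode


def build_search_url_sketch(search_term, specific_site=None, safe_search=False):
    """Assemble an image-search URL (hypothetical helper, not the package's code)."""
    query = search_term
    if specific_site:
        # Restrict results to one site with the "site:" search operator.
        query = f"{query} site:{specific_site}"
    # Assumed query parameters: "tbm=isch" requests image results.
    params = {"q": query, "tbm": "isch"}
    if safe_search:
        # Assumed parameter for turning safe search on.
        params["safe"] = "active"
    # urlencode percent-encodes the values and joins them with "&".
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url_sketch("gold nanoparticles", safe_search=True))
```

Encoding through `urlencode` keeps spaces and reserved characters in the search term from corrupting the query string.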
- build_url_parameters(arguments)
Builds the URL parameters
- Parameters:
arguments – Arguments used to construct the URL parameters for the search
- Returns:
- create_directories(main_directory, dir_name, thumbnail, thumbnail_only)
Function to make directories
- Parameters:
main_directory – Main directory where files will be stored
dir_name – Subdirectory name
thumbnail – thumbnail of image
thumbnail_only – Selects if only a thumbnail is stored
- Returns:
- download(arguments)
Bulk download of files based on arguments
- Parameters:
arguments – Dictionary of arguments to consider
- Returns:
- download_executor(arguments)
Function that downloads files based on arguments
- Parameters:
arguments – Dictionary of arguments to consider
- Returns:
- download_extended_page(url, chromedriver)
Downloads the page when more than 100 images are requested
- Parameters:
url – Webpage to download
chromedriver – Version of the Chrome driver
- Returns:
webpage information as JSON
- download_image(image_url, image_format, main_directory, dir_name, count, print_urls, socket_timeout, prefix, print_size, no_numbering, no_download, save_source, img_src, silent_mode, thumbnail_only, format, ignore_urls)
Function to download images
- Parameters:
image_url – URL where the images are located
image_format – Format that the image is saved as
main_directory – main directory where folders are located
dir_name – subdirectory where images are saved
count – number of images to save
print_urls – Boolean to print URLs
socket_timeout – Time before the process times out
prefix – prefix to add to the files
print_size – prints the file size
no_numbering – selects if you number the files
no_download – selects if you download the files
save_source – selects if you save the webpage source
img_src – save the direction to the image source
silent_mode – sets the download operation to silent mode
thumbnail_only – saves the thumbnail only
format – Sets the format that files will be saved
ignore_urls – Sets which URLs to ignore based on keywords
- Returns:
Information about the download
- download_image_thumbnail(image_url, main_directory, dir_name, return_image_name, print_urls, socket_timeout, print_size, no_download, save_source, img_src, ignore_urls)
Function to download image thumbnails
- Parameters:
image_url – URL for file
main_directory – Main directory where files will be stored
dir_name – Subdirectory where files will be stored
return_image_name – name of image that will be saved
print_urls – Selects if you print the URLs when scraping
socket_timeout – Time before a process times out
print_size – Boolean to print the size of the files
no_download – Function to check the process without downloading files
save_source – Save the source file
img_src – Source of the image
ignore_urls – Specific urls to ignore
- Returns:
- download_page(url)
Downloads the entire content of a webpage
- Parameters:
url – URL where the webpage is located
- Returns:
Webpage information
- file_size(file_path)
Measures the file size
- Parameters:
file_path – Path where data is stored
- Returns:
Size of the file
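A minimal sketch of a file-size helper of this kind is given below. The name `file_size_sketch`, the human-readable string output, and the 1024-based units are assumptions; the documentation above does not state the format the package returns.

```python
import os


def file_size_sketch(file_path):
    """Return the size of one file as a human-readable string (hypothetical helper)."""
    size = float(os.path.getsize(file_path))  # size on disk in bytes
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024.0:
            return f"{size:.1f} {unit}"
        size /= 1024.0  # move up to the next unit
    return f"{size:.1f} TB"
```

Calling `file_size_sketch` on a 2048-byte file would give `"2.0 KB"` under these assumptions.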
- format_object(object)
Formats the object in a readable format
- Parameters:
object – Raw object from web
- Returns:
Dictionary containing formatted object
- get_all_tabs(page)
Finding ‘Next Image’ from the given raw page
- Parameters:
page – URL for the page
- Returns:
the tabs that are looked at
- get_next_tab(s)
Finding ‘Next Image’ from the given raw page
- Parameters:
s – image id
- Returns:
information about the image
- keywords_from_file(file_name)
Reads keywords from a file
- Parameters:
file_name – Name of the file to search
- Returns:
types of files to include in the search
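As a rough illustration, a keyword file could be read as follows. The sketch assumes a plain-text file with one keyword per line; the file layout, the name `keywords_from_file_sketch`, and the list return type are assumptions, not details confirmed by the documentation above.

```python
def keywords_from_file_sketch(file_name):
    """Read one keyword per line from a text file (hypothetical helper)."""
    keywords = []
    with open(file_name, "r", encoding="utf-8") as handle:
        for line in handle:
            keyword = line.strip()
            if keyword:  # skip blank lines
                keywords.append(keyword)
    return keywords
```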
- repair(brokenjson)
Function that helps repair malformed JSON files
- Parameters:
brokenjson – JSON file
- Returns:
Fixed JSON file
- replace_with_byte(match)
Corrects the escape characters for Python 2
- Parameters:
match – Characters to fix
- Returns:
fixed character
- similar_images(similar_images)
Function that deals with similar images
- Parameters:
similar_images – urls
- Returns:
list without similar images
- single_image(image_url)
Function to download a single image
- Parameters:
image_url – URL for image
- Returns:
Encoded image
- Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.main()
Main program
- Returns:
- Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.google_images_download.user_input()
Parser for the user inputs
- Returns:
Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation module
- Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation.image_collection(path, pattern='*.jpg')
Tool to search folders for image files to project.
- Parameters:
path – Path in which to search for images
pattern – Filename pattern to search for; wildcards are allowed
- Returns:
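For illustration, a folder search with a wildcard pattern can be sketched with the standard `glob` module. The name `image_collection_sketch`, the recursive descent into subfolders, and the sorted list of paths it returns are assumptions; the documentation above does not specify how `image_collection` behaves in these respects.

```python
import glob
import os


def image_collection_sketch(path, pattern="*.jpg"):
    """Collect image files under `path` matching `pattern` (hypothetical helper)."""
    # "**" with recursive=True matches `path` itself and every subfolder.
    search = os.path.join(path, "**", pattern)
    return sorted(glob.glob(search, recursive=True))
```

Passing a different wildcard, for example `pattern="*.png"`, would switch the file type collected.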
- class Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_collation.image_dataset(images, transform=None, viz=Compose( Resize(size=(224, 224), interpolation=bilinear) ToTensor() ))
Bases:
Dataset
Builds a PyTorch Dataset
Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_scraping module
- Recursive_Symmetry_Aware_Materials_Microstructure_Explorer.util.image_scraping.download_images_from_google(names, path, num=25, verbose=True)
Tool to download files from Google image search based on search criteria
- Parameters:
names – list of strings to search
path – path where files will be saved
num – number of images to download in each category
verbose – True makes the function print intermediate actions
- Returns: