These images are not all the same size, because the blood smears and cell images vary with the patient, the test method, and the orientation of each photograph. Let's compute summary statistics over our training dataset to decide on an optimal image size (remembering that we don't touch the test dataset at all).
import cv2
import numpy as np
from concurrent import futures
import threading

def get_img_shape_parallel(idx, img, total_imgs):
    if idx % 5000 == 0 or idx == (total_imgs - 1):
        print('{}: working on img num: {}'.format(threading.current_thread().name, idx))
    return cv2.imread(img).shape

ex = futures.ThreadPoolExecutor(max_workers=None)
data_inp = [(idx, img, len(train_files)) for idx, img in enumerate(train_files)]
print('Starting Img shape computation:')
train_img_dims_map = ex.map(get_img_shape_parallel,
                            [record[0] for record in data_inp],
                            [record[1] for record in data_inp],
                            [record[2] for record in data_inp])
train_img_dims = list(train_img_dims_map)
print('Min Dimensions:', np.min(train_img_dims, axis=0))
print('Avg Dimensions:', np.mean(train_img_dims, axis=0))
print('Median Dimensions:', np.median(train_img_dims, axis=0))
print('Max Dimensions:', np.max(train_img_dims, axis=0))
# Output
Starting Img shape computation:
ThreadPoolExecutor-0_0: working on img num: 0
ThreadPoolExecutor-0_17: working on img num: 5000
ThreadPoolExecutor-0_15: working on img num: 10000
ThreadPoolExecutor-0_1: working on img num: 15000
ThreadPoolExecutor-0_7: working on img num: 17360
Min Dimensions: [46 46 3]
Avg Dimensions: [132.77311215 132.45757733 3.]
Median Dimensions: [130. 130. 3.]
Max Dimensions: [385 394 3]
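The call pattern used above, where `ThreadPoolExecutor.map` receives one iterable per function argument and pairs them up element-wise, can be illustrated with a minimal standalone sketch (the `area` function and the `dims` data here are hypothetical, purely to show the mechanics):

```python
from concurrent import futures

def area(idx, w, h):
    # each call receives one element from each iterable passed to map()
    return w * h

ex = futures.ThreadPoolExecutor(max_workers=4)
dims = [(0, 2, 3), (1, 4, 5)]
results = list(ex.map(area,
                      [d[0] for d in dims],
                      [d[1] for d in dims],
                      [d[2] for d in dims]))
print(results)  # [6, 20]
ex.shutdown()
```

Because `map` returns a lazy iterator, wrapping it in `list(...)` is what actually forces all the work to complete, exactly as done with `train_img_dims` above.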
We use parallel processing to speed up image reads, and based on the summary statistics we will resize each image to 125x125 pixels. Let's load all of our images and resize them to these fixed dimensions.
IMG_DIMS = (125, 125)

def get_img_data_parallel(idx, img, total_imgs):
    if idx % 5000 == 0 or idx == (total_imgs - 1):
        print('{}: working on img num: {}'.format(threading.current_thread().name, idx))
    img = cv2.imread(img)
    img = cv2.resize(img, dsize=IMG_DIMS, interpolation=cv2.INTER_CUBIC)
    img = np.array(img, dtype=np.float32)
    return img

ex = futures.ThreadPoolExecutor(max_workers=None)
train_data_inp = [(idx, img, len(train_files)) for idx, img in enumerate(train_files)]
val_data_inp = [(idx, img, len(val_files)) for idx, img in enumerate(val_files)]
test_data_inp = [(idx, img, len(test_files)) for idx, img in enumerate(test_files)]

print('Loading Train Images:')
train_data_map = ex.map(get_img_data_parallel,
                        [record[0] for record in train_data_inp],
                        [record[1] for record in train_data_inp],
                        [record[2] for record in train_data_inp])
train_data = np.array(list(train_data_map))

print('\nLoading Validation Images:')
val_data_map = ex.map(get_img_data_parallel,
                      [record[0] for record in val_data_inp],
                      [record[1] for record in val_data_inp],
                      [record[2] for record in val_data_inp])
val_data = np.array(list(val_data_map))

print('\nLoading Test Images:')
test_data_map = ex.map(get_img_data_parallel,
                       [record[0] for record in test_data_inp],
                       [record[1] for record in test_data_inp],
                       [record[2] for record in test_data_inp])
test_data = np.array(list(test_data_map))

train_data.shape, val_data.shape, test_data.shape
# Output
Loading Train Images:
ThreadPoolExecutor-1_0: working on img num: 0
ThreadPoolExecutor-1_12: working on img num: 5000
ThreadPoolExecutor-1_6: working on img num: 10000
ThreadPoolExecutor-1_10: working on img num: 15000
ThreadPoolExecutor-1_3: working on img num: 17360

Loading Validation Images:
ThreadPoolExecutor-1_13: working on img num: 0
ThreadPoolExecutor-1_18: working on img num: 1928

Loading Test Images:
ThreadPoolExecutor-1_5: working on img num: 0
ThreadPoolExecutor-1_19: working on img num: 5000
ThreadPoolExecutor-1_8: working on img num: 8267

((17361, 125, 125, 3), (1929, 125, 125, 3), (8268, 125, 125, 3))
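The uniform 4-D shapes above are exactly why resizing to a fixed `IMG_DIMS` matters: `np.array` (or `np.stack`) can only build one dense batch tensor from images that all share the same shape. A minimal sketch with hypothetical zero-filled stand-ins for the real images:

```python
import numpy as np

# images resized to a common 125x125x3 shape stack into a single 4-D batch
uniform = [np.zeros((125, 125, 3), dtype=np.float32) for _ in range(4)]
batch = np.array(uniform)
print(batch.shape)  # (4, 125, 125, 3)

# without resizing, differently sized images cannot form a dense batch
ragged = [np.zeros((46, 46, 3)), np.zeros((130, 130, 3))]
try:
    np.stack(ragged)
except ValueError as e:
    print('cannot stack:', e)
```

This is the same reason the min (46x46) and max (385x394) images in the training set had to be brought to one common size before loading.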