I am attempting to train StyleGAN3 on Google Cloud using the following setup:
2 vCPUs
13 GB of RAM
Nvidia T4
PyTorch 1.13
CUDA 11.3
Python 3.10
My dataset consists of approximately 14,000 JPG images of dress sketches, which I converted to PNG format using the following Pillow script:
import os
from PIL import Image


def convert_images(input_folder, output_folder):
    os.makedirs(output_folder, exist_ok=True)
    for filename in os.listdir(input_folder):
        if filename.endswith(".jpg"):
            image_path = os.path.join(input_folder, filename)
            # Build the PNG output path from the original file name
            new_filename = os.path.splitext(filename)[0] + ".png"
            output_path = os.path.join(output_folder, new_filename)
            # Open the image, save it as PNG, then release the file handle
            with Image.open(image_path) as image:
                image.save(output_path, "PNG")
            print(f"Converted {filename} to {new_filename}")
    print("Conversion completed.")


input_path = "/home/..../img"
output_path = "/home/..../converted_img"
convert_images(input_path, output_path)
I built the dataset using the following command:
python dataset_tool.py --source /home/..../converted_img --dest /home/..../dataset.zip --resolution=256x256
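To rule out a problem with the archive itself, a quick sanity check can be sketched as below. This is a minimal sketch, not part of the StyleGAN3 tooling: the function name `summarize_dataset_zip` is hypothetical, and it assumes the archive should contain roughly one PNG entry per converted image plus a `dataset.json` metadata file at the top level.

```python
import zipfile
from collections import Counter


def summarize_dataset_zip(zip_path):
    """Return (PNG entry count, whether dataset.json exists, entries per extension).

    Hypothetical helper: it only inspects entry names and does not decode images.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries, which end with a slash
        names = [n for n in archive.namelist() if not n.endswith("/")]
    extensions = Counter(
        n.rsplit(".", 1)[-1].lower() if "." in n else "" for n in names
    )
    # Assumption: the metadata file sits at the archive root
    has_metadata = "dataset.json" in names
    return extensions.get("png", 0), has_metadata, dict(extensions)


# Example call (path elided as in the commands above):
# png_count, has_metadata, by_extension = summarize_dataset_zip("/home/..../dataset.zip")
```

If the PNG count is far below the roughly 14,000 source images, the conversion or packing step would be the first place to look.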
The training process starts with the following command:
python train.py --data=/home/..../dataset.zip --outdir=/home/..../training-runs --cfg=stylegan3-t --gpus=1 --batch=32 --gamma=2 --batch-gpu=16 --snap=10 --mirror=1 --workers=2
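For reference, the arithmetic implied by these flags can be sketched as follows. This is a rough sketch under stated assumptions: the function name `training_schedule` is hypothetical, and the tick length of 4 kimg is assumed to be the default of `--tick`, which I did not set explicitly.

```python
def training_schedule(batch, batch_gpu, gpus, snap, tick_kimg=4):
    """Derive gradient-accumulation rounds and snapshot cadence from the flags.

    tick_kimg=4 is an assumption about the default tick length, not a value
    taken from the command above.
    """
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("batch must be a multiple of gpus * batch_gpu")
    # Each GPU handles batch // gpus images, split into chunks of batch_gpu
    accumulation_rounds = (batch // gpus) // batch_gpu
    # A snapshot is taken every `snap` ticks
    kimg_per_snapshot = snap * tick_kimg
    # Optimizer steps between two snapshots
    iterations_per_snapshot = kimg_per_snapshot * 1000 // batch
    return accumulation_rounds, kimg_per_snapshot, iterations_per_snapshot


rounds, kimg, iters = training_schedule(batch=32, batch_gpu=16, gpus=1, snap=10)
print(rounds, kimg, iters)  # 2 40 1250
```

Under these assumptions, each step accumulates gradients over 2 rounds of 16 images, and a snapshot is due every 40 kimg, that is, every 1,250 steps.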
Initially, everything seems to work fine, but training stops earlier than expected and the evaluation phase begins. I have been stuck in the evaluation phase for about an hour, probably because of a warning message.
I also tried --workers=1, but the metrics still seemed to freeze, so I had to disable them.
Additionally, I would like to share the training result I obtained.
The model has clearly collapsed.
Could you please provide guidance on how to best present the training result for further analysis and troubleshooting?
Thank you for your assistance.