Introduction
Typically, when I prepare to compose my next Substack post, the procedure involves formulating an idea or concept, evaluating whether it provides the variety I aim to offer my subscribers, and rejecting it if not. I then test a practical example of the technical concept, discarding it if it does not prove effective. Finally, I determine whether the idea is engaging and set it aside if it is not. It is rare for an idea to clear all of these hurdles, but this one did.
In a previous post, I shared some code about video/image negatives with alpha. Initially, I wanted to explore why a “ray tracing” concept wouldn’t work for up-scaling an image. Although I understood theoretically why it wouldn’t work, I wanted a practical example to illustrate the limitations. Subsequently, I focused on algorithmic challenges related to converting brightness into alpha and using that data for image re-composition. Unfortunately, this did not allow me to delve into upscaling techniques.
For this post, I decided to focus on AI up-scaling, as my research for the previous post indicated that this is the standard approach today. I consulted ChatGPT to identify an AI model that balances ease of installation and configuration with effective results. It suggested Real-ESRGAN (pronounced real-ee-sir-guhn) as a strong candidate. I then created some sample code and tested it.
I was amazed at how robust the results were, even with questionable-quality source material. This approach seems like an economical way to re-monetize decades of excellent content, including television series, music videos, and films, by reintroducing it to a generation that may have little tolerance for poor video quality or black bars on the sides of the image. However, I realize that the preference for watching long-form content on a large screen now competes with smaller form factors. Read on to see some results, or put this to work for yourself using Python.
Choosing Source Material
I decided to test this with a clip from one of Star Trek: Deep Space Nine’s later seasons. The science-fiction genre tends to have the most vocal communities, which regularly petition the studios that own these properties to remaster and re-release older material from their libraries. The studios rarely move forward on this due to the uncertain return on investment.
For some shows, this may be easy to do. Consider M*A*S*H, which has been a syndication staple for years. I own a DVD box set of this series, and my favorite feature is the ability to turn off the laugh track. The producers and directors of the show were never fans of CBS’s mandate to include a laugh track, so being able to watch episodes without canned laughter significantly enhances the viewing experience.
The M*A*S*H episode "Our Finest Hour" comes with a note explaining issues with the master copy, whether due to its heavy circulation in syndication or the way it was produced. (This episode did not look great on the DVD copy.) In 2018, M*A*S*H was quietly remastered to HD (reframed for 16:9) from the original film and made available for streaming on Hulu. “Our Finest Hour” was restored to its original quality during this remaster. In my opinion, the improved video quality has greatly increased the rewatchability of the series for me, even though I’ve probably seen every episode several times already. This is very subjective, as many loyalists argue that the reframing for 16:9 takes away from the intent of the original directors.
The Twilight Zone (1959 – 1964) also received a remaster treatment in 2010. These episodes were not reframed from their original 4:3 to 16:9, but the film artifacts such as dirt, degradation, distortion, and film jitter, that some may recall from the syndicated version, were removed.
M*A*S*H and The Twilight Zone may have been a bit easier to resurrect, as the special effects were generally of a practical nature.
Transitioning to science fiction genres, each property has its distinct concerns and considerations. For example, fans of the original Star Wars (now known as A New Hope) often highlight the difficulty of obtaining a high-quality version of the film as it appeared in theaters in 1977. This means a version without the addition of "Episode IV" in the opening prologue and without the various CGI special-effect insertions. Similarly, the original late 1960s Star Trek series faced criticism from younger audiences for its dated and sometimes unconvincing special effects.
In 2007, Paramount saw a revenue opportunity in remastering the original Star Trek series and issuing Blu-ray box sets featuring both the original effects and new CGI effects. The decent sales of this remaster encouraged Paramount to begin remastering Star Trek: The Next Generation. However, despite being a more recent series, its film masters were in disarray. Additionally, replacing the effects shots with new CGI would have created a mismatched aesthetic, negating the benefit that approach had brought to the original series. After significant effort to remaster the filmed scenes and effects, The Next Generation was eventually re-released, with all seasons arriving on Blu-ray box sets starting in 2012.
Many Star Trek fans have wondered why similar remastering efforts have not been applied to Deep Space Nine or Voyager, both of which were produced near the end of the “standard definition” era. Deep Space Nine (DS9) presents a unique challenge: the early seasons used physical models for special effects, while the later seasons relied on CGI. The studio argues that DS9 and Voyager were not as commercially successful as The Next Generation. Additionally, investing in remastering these series would be a financial gamble, as the payoff would be measured in new Paramount+ subscribers watching the remastered series rather than in sales of physical or digital media to cover the costs (as was the case in the past).
Due to these challenges, dedicated fans are constantly searching for affordable options to use AI for up-scaling the best-quality masters available. There are numerous fan-uploaded videos on YouTube showcasing various efforts. Simply search for "Star Trek Remastered" (or any science-fiction property, to be honest) on YouTube to see examples. Comparing my results to the countless others that have made attempts seems like a sensible thing to do.
To date, the studios have been reluctant to employ AI to enhance the show for today's large screens. Historically, the tools available were not of sufficient quality to justify the effort. The studios also acknowledge that the post-production masters might not be good enough to start such a project. AI up-scaling still involves significant cost and effort. Lastly, there may be some creative restraint: the original creators of the show would want any such effort done correctly rather than see a poorly executed upscale become the target of fan ire.
My Results With Real-ESRGAN
I started with a 720p clip of the finale of Star Trek: Deep Space Nine (S7 Episode 27, “What You Leave Behind”). The Python script wasn’t exactly built for speed, though there is some potential for speeding the process up. Each frame took about ten seconds to upscale on my mid-range workstation in my original test. The clip I had was 45 seconds, and the full render took about three hours and forty-five minutes. (The YouTube clip provided below is shorter than the full 45 seconds, as I want to satisfy any copyright considerations.)
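For the curious, the back-of-the-envelope arithmetic behind that estimate looks like this (a minimal sketch; the 30 fps figure is an assumption about the source clip):

# Hypothetical render-time estimate
clip_seconds = 45        # length of the source clip
fps = 30                 # assumed frame rate of the source
seconds_per_frame = 10   # measured upscale time per frame on my workstation

total_frames = clip_seconds * fps
total_minutes = total_frames * seconds_per_frame / 60
print(f"{total_frames} frames, roughly {total_minutes:.0f} minutes")  # ~225 minutes, about 3h45m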
I realized later that I had configured insufficient “tiling” (which breaks the frames into smaller portions for more efficient processing) in my original code. I changed the code to cut the frames into 256-pixel tiles, and this shortened the time to three seconds per frame utilizing the GPU. This means my 45 seconds of video should have taken about 1 hour and 10 minutes. With this change, Real-ESRGAN displays slightly annoying iterative “Tiling” updates which dirty up your terminal output during the process. This can be commented out in Real-ESRGAN\realesrgan\utils.py at line 163.
print(f'\tTile {tile_idx}/{tiles_x * tiles_y}')
I was able to see the individual frames as the image was rendering.
You’ll notice more sharpness in Colm Meaney’s hands (this may be difficult to see in the small form factor of some Substack views). Additionally, it’s interesting to note that the text originally on the LCARS display is likely impossible to make out in the source, and Real-ESRGAN doesn’t spend a lot of time or effort trying to figure it out, rightfully so.
Meanwhile, on exterior effects shots, you can see the ship’s name receiving higher detail and the general blurriness replaced with a sharper image. Additionally, the upscaler handled the explosions well. (Again, this may be difficult to notice on smaller screens on Substack.)
Finally, if you can do so, I recommend watching this clip on a larger screen so you can see how impressive the results are when compared side by side.
The Code
The code can be found in the "video-upscaler-resrgan" directory in this GitHub repository. I have also included code, not currently in use, to shell out to ffmpeg, which may handle the reassembly of the up-scaled frames a bit better than cv2 can.
I had several problems with setting this up which I’ll detail here to hopefully save you some hassle.
your-project-folder/
├── video_upscale.py              ← Your main script (runs from here)
├── input.mp4                     ← Your input video (example command line)
├── Real-ESRGAN/                  ← Cloned Git repo
│   ├── realesrgan/               ← Real-ESRGAN Python package
│   ├── basicsr/                  ← basicsr package
│   ├── weights/                  ← Model weights directory
│   │   └── RealESRGAN_x4plus.pth ← Your downloaded model file
│   ├── setup.py                  ← Required for `python setup.py develop`
│   └── ...                       ← Rest of Real-ESRGAN repo files
├── scratch/                      ← Example workdir (command line)
│   ├── origframes/               ← Extracted frames go here
│   └── upscaledframes/           ← AI-upscaled frames go here
└── output.mp4                    ← Final upscaled video (example command line)
While this wasn’t a problem in itself, it’s worth spelling out so you know where to clone Real-ESRGAN relative to video_upscale.py.
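If you want to double-check the layout before kicking off a long run, a small optional script (a hypothetical check_layout.py, run from your-project-folder) can confirm the paths the main script expects:

# check_layout.py -- optional sanity check for the expected directory layout
import os

expected = [
    "video_upscale.py",
    "Real-ESRGAN/realesrgan",
    "Real-ESRGAN/weights/RealESRGAN_x4plus.pth",
]
for path in expected:
    status = "OK" if os.path.exists(path) else "MISSING"
    print(f"{status:8s} {path}")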
requirements.txt with CUDA
Use this requirements file to install the packages if you wish to offload processing to a GPU.
# Core AI/ML framework (CUDA 12.1 builds)
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.5.1+cu121
torchvision==0.20.1+cu121
torchaudio==2.5.1+cu121
# Real-ESRGAN dependencies
tqdm
opencv-python
numpy
realesrgan @ git+https://github.com/xinntao/Real-ESRGAN.git
CUDA can be checked with the following command:
nvidia-smi
…provided you have NVIDIA drivers installed. I’m running CUDA 12.8 on my machine.
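You can also confirm from the Python side that the installed PyTorch build can actually see the GPU; if the availability check prints False, Real-ESRGAN will fall back to the CPU:

# Quick check that the installed PyTorch build can use the GPU
import torch

print(torch.__version__)             # installed PyTorch version
print(torch.version.cuda)            # CUDA version the build targets (None for CPU-only builds)
print(torch.cuda.is_available())     # True if a usable GPU was found
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU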
requirements.txt CPU only
If you can’t run on a GPU-enabled device, try this requirements.txt instead. Be warned: it will run significantly slower. I have not yet tested CPU-only mode.
# PyTorch for CPU (no CUDA)
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.5.1
torchvision==0.20.1
torchaudio==2.5.1
# Rest of your stack
tqdm
opencv-python
numpy
realesrgan @ git+https://github.com/xinntao/Real-ESRGAN.git
The Real-ESRGAN library will fall back to the CPU if the hardware is not equipped with a compatible GPU, but I believe pip will still need to be able to find a non-GPU build of PyTorch.
PyTorch State Dictionary
Download RealESRGAN_x4plus.pth at this link and place it in the weights directory shown above. (These are the pre-trained weights, the already-established “training,” for the AI model.)
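If you want to confirm the download is intact before starting a long render, you can load the file with torch and peek at its keys; a minimal sketch, with the caveat that the exact key layout (I believe it is something like 'params_ema') is an assumption on my part:

# Optional: verify the downloaded weights file loads cleanly
import torch

state = torch.load("Real-ESRGAN/weights/RealESRGAN_x4plus.pth", map_location="cpu")
print(type(state))         # expected to be a dict
print(list(state.keys()))  # likely something like ['params_ema'] (assumption)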
basicsr Reference of Older Location of rgb_to_grayscale
You may see the error:
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'
Your Python interpreter’s traceback will tell you where the offending degradation.py code is located. (torchvision moved the rgb_to_grayscale function in recent versions.) Find this file and edit this line near the top from
from torchvision.transforms.functional_tensor import rgb_to_grayscale
to
from torchvision.transforms.functional import rgb_to_grayscale
and save the file.
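An alternative, if you would rather not edit the installed package, is to alias the old module name before basicsr gets imported, for example at the very top of video_upscale.py. This is a sketch I have not tested on my setup:

# Workaround: recreate the old torchvision.transforms.functional_tensor module name
# so basicsr's import still resolves. Must run before any "from basicsr..." imports.
import sys
import types
import torchvision.transforms.functional as F

shim = types.ModuleType("torchvision.transforms.functional_tensor")
shim.rgb_to_grayscale = F.rgb_to_grayscale
sys.modules["torchvision.transforms.functional_tensor"] = shim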
video_upscale.py code
# video_upscale.py
import os
import cv2
import glob
import shutil
import argparse
import subprocess
import numpy as np
import time
from tqdm import tqdm
from realesrgan import RealESRGANer
from basicsr.archs.rrdbnet_arch import RRDBNet
import torchvision.transforms.functional as TF  # imported but not used in this version
from torchvision.transforms.functional import to_tensor  # imported but not used in this version
import torch


# class PatchedRealESRGANer(RealESRGANer):
#     def enhance(self, img, outscale=None, alpha_upsampler='realesrgan'):
#         if not isinstance(img, np.ndarray):
#             raise TypeError("Input must be a NumPy array")
#         if img.dtype != np.float32:
#             img = img.astype(np.float32)
#         if np.max(img) > 1.0:
#             img /= 255.0  # normalize
#         return super().enhance(img, outscale, alpha_upsampler)


def extract_frames(input_video, orig_dir):
    """
    Extract frames from a video file and save them as images.
    :param input_video: path to the source video file
    :param orig_dir: directory to write the extracted PNG frames into
    :return: (fps, total frame count) of the source video
    """
    os.makedirs(orig_dir, exist_ok=True)
    cap = cv2.VideoCapture(input_video)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for i in tqdm(range(total), desc="Extracting frames"):
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imwrite(os.path.join(orig_dir, f"frame_{i:05d}.png"), frame)
    cap.release()
    return fps, total


def upscale_frames(orig_dir, upscale_dir, model_path):
    """
    Upscale images in a directory using Real-ESRGAN.
    :param orig_dir: directory containing the extracted frames
    :param upscale_dir: directory to write the upscaled frames into
    :param model_path: path to the RealESRGAN_x4plus.pth model weights
    :return: None
    """
    os.makedirs(upscale_dir, exist_ok=True)
    model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                    num_block=23, num_grow_ch=32, scale=4)
    # use_cuda = torch.cuda.is_available()
    upsampler = RealESRGANer(
        scale=4,
        model_path=model_path,
        model=model,
        tile=256,  # force tiling
        tile_pad=10,
        pre_pad=0,
        half=True if torch.cuda.is_available() else False,
        device='cuda' if torch.cuda.is_available() else 'cpu'
    )
    # print(f"[DEBUG] Using device: {torch.cuda.current_device()}")
    # print(f"[DEBUG] Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
    # print(f"[DEBUG] GPU memory allocated: {torch.cuda.memory_allocated()} bytes")
    # print(f"[DEBUG] GPU memory reserved: {torch.cuda.memory_reserved()} bytes")
    frame_files = sorted(f for f in os.listdir(orig_dir) if f.endswith(".png"))
    for i, fname in enumerate(tqdm(frame_files, desc="Upscaling frames")):
        img_path = os.path.join(orig_dir, fname)
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        # Normalize to float32 in [0, 1], but keep it NumPy
        if img.dtype != np.float32:
            img = img.astype(np.float32) / 255.0
        start = time.time()
        output, _ = upsampler.enhance(img, outscale=1.5)
        end = time.time()
        # Debug output
        # print(f"[TIMING] Inference time: {end - start:.2f} sec")
        # print(f"[DEBUG] Frame: {fname}")
        # print(f"[DEBUG] Image dtype: {img.dtype}, shape: {img.shape}")
        # print(f"[DEBUG] CUDA is available: {torch.cuda.is_available()}")
        # print(f"[DEBUG] Current device: {torch.cuda.current_device()}")
        # print(f"[DEBUG] Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
        # print(f"[DEBUG] Memory allocated: {torch.cuda.memory_allocated()} bytes")
        # print(f"[DEBUG] Memory reserved: {torch.cuda.memory_reserved()} bytes")
        # print(f"[DEBUG] Model is on device: {next(model.parameters()).device}")
        cv2.imwrite(os.path.join(upscale_dir, fname), output)


def frames_to_video(input_dir, prefix, output_path, fps=30):
    """
    Convert a series of images to a video file using OpenCV.
    :param input_dir: directory containing the frames to assemble
    :param prefix: filename prefix of the frames (e.g. "frame")
    :param output_path: path of the video file to write
    :param fps: frames per second for the output video
    :return: True on success, False if no frames were found
    """
    pattern = os.path.join(input_dir, f"{prefix}_*.png")
    images = sorted(glob.glob(pattern))
    if not images:
        print(f"Error: No images found with prefix {prefix}")
        return False
    first_frame = cv2.imread(images[0])
    height, width, _ = first_frame.shape
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    for img_path in images:
        frame = cv2.imread(img_path)
        if frame is None:
            print(f"Warning: Skipping unreadable frame {img_path}")
            continue
        out.write(frame)
    out.release()
    return True


def reassemble_video_alt(upscale_dir, output_file, fps):
    """
    Old ffmpeg method, kept for fallback or alternate use.
    """
    if not os.path.exists(upscale_dir):
        print(f"[ERROR] Upscale directory '{upscale_dir}' not found.")
        return
    cmd = [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", os.path.join(upscale_dir, "frame_%05d.png"),
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output_file
    ]
    subprocess.run(cmd, check=True)


def cleanup_dirs(*dirs):
    """
    Clean up temporary directories.
    :param dirs: directories to remove if they exist
    :return: None
    """
    for d in dirs:
        if os.path.exists(d):
            shutil.rmtree(d)


def main():
    """
    Main function to handle command line arguments and orchestrate the video upscaling process.
    :return: None
    """
    parser = argparse.ArgumentParser(description="Video Upscaler with Real-ESRGAN")
    parser.add_argument("--inputfile", required=True, help="Input source video file")
    parser.add_argument("--outputfile", required=True, help="Output 1080p video file")
    parser.add_argument("--workdir", required=True, help="Temporary work directory")
    parser.add_argument("--cleanup", action="store_true", help="Delete work directories after processing")
    args = parser.parse_args()

    orig_dir = os.path.join(args.workdir, "origframes")
    upscale_dir = os.path.join(args.workdir, "upscaledframes")
    model_path = "Real-ESRGAN/weights/RealESRGAN_x4plus.pth"

    print(">>> Step 1: Extracting frames...")
    fps, total = extract_frames(args.inputfile, orig_dir)

    print(">>> Step 2: Upscaling frames...")
    upscale_frames(orig_dir, upscale_dir, model_path)

    print(">>> Step 3: Reassembling video (OpenCV)...")
    success = frames_to_video(upscale_dir, prefix="frame", output_path=args.outputfile, fps=fps)
    if not success:
        print("[FAIL] Video assembly failed.")
        return

    if args.cleanup:
        print(">>> Cleaning up temporary directories...")
        cleanup_dirs(orig_dir, upscale_dir)

    print(f"\nDONE. Upscaled video saved as: {args.outputfile}")


if __name__ == "__main__":
    main()
The code is hard-coded to handle 480p-to-720p or 720p-to-1080p upscaling. You can adjust the outscale value in this line; it is simply the target height divided by the source height (1080 ÷ 720 = 1.5 and 720 ÷ 480 = 1.5, while 480p to 1080p would need 2.25).
output, _ = upsampler.enhance(img, outscale=1.5)
If we accepted this value as an argument, we’d have to add code to assess the current input material and make sure the user was passing a valid command line argument. This could easily be done, but was skipped over for this test code.
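If you do want to parameterize it, the math is just that ratio of target height to source height. A hypothetical helper (not part of the script above) might look like this:

# Hypothetical helper: derive the outscale factor from source and target heights
def compute_outscale(source_height: int, target_height: int) -> float:
    return target_height / source_height

print(compute_outscale(480, 720))    # 1.5
print(compute_outscale(720, 1080))   # 1.5
print(compute_outscale(480, 1080))   # 2.25
print(compute_outscale(1080, 2160))  # 2.0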
Executing the Script
python video_upscale.py --inputfile inputvideo1.mp4 --outputfile outputupscale.mp4 --workdir workdir
There’s an optional --cleanup flag if you’d like to have the still image files removed at the end of operation.
python video_upscale.py --inputfile inputvideo1.mp4 --outputfile outputupscale.mp4 --workdir workdir --cleanup
I recommend starting with a small, one-second clip of your source material to make sure all is configured well and that your hardware can offer feasible results. Note, this is not an effort for the impatient.
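To carve out a short test clip without extra tools, a quick cv2 snippet along these lines should work; it's a sketch, and the make_test_clip.py name and filenames are placeholders:

# make_test_clip.py -- hypothetical helper: copy roughly the first second of a video
import cv2

src = cv2.VideoCapture("inputvideo1.mp4")
fps = src.get(cv2.CAP_PROP_FPS) or 30
width = int(src.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(src.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("test_clip.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

for _ in range(int(round(fps))):  # roughly one second of frames
    ret, frame = src.read()
    if not ret:
        break
    out.write(frame)

src.release()
out.release()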
Conclusion
For me, the feasibility of AI upscaling as an interim solution is more appealing now than ever. Between the falling cost of compute power and the availability of models like Real-ESRGAN, we have crossed into territory where individual creators, and maybe eventually studios, can start giving old footage a second life without breaking the bank. Additionally, while I did not discuss it in great detail here, it would be nice if music videos (or promos) from the 1970s and 1980s could be restored in this manner.
And it’s not just sci-fi that stands to benefit. There’s a deep well of classic television that’s only ever been lightly polished for reruns or streaming. So many of these shows were shot on film and then bottlenecked into standard definition formats. With AI upscaling, we have a way to undo some of that compression—maybe not perfectly, but impressively enough to matter.
As streaming services race to churn out forgettable reality shows and half-baked scripted content, I think they’re overlooking one of their greatest assets: their own archives. I’d gladly keep paying for a service that regularly updates its back catalog with thoughtful, high-quality remasters. It beats trying to get excited about another dumb reality series where a bunch of people are running around an island looking for a briefcase or some nonsense like that.