AI & Infrastructure · Ongoing

On-Prem AI Compute Platform

GPU Inference & Product Processing

Role

Architect & Lead Developer

Status

In active use powering AI workloads for VidCutAI and Nulite products

Stack

Nvidia RTX 4080 · Nvidia RTX 3070 Ti · Nvidia RTX 3060 · Docker · Redis · Transcription Models · Diarization Models · Local LLMs

The Challenge

AI products like VidCutAI and Transcription Pro require consistent, high-throughput GPU compute for transcription, diarization, and language-model inference, without the cost and latency of cloud-only solutions.

The Solution

I built dedicated GPU-backed machines integrated into the homelab network, separating production and testing workloads. These systems expose internal APIs for AI processing and support both local and hybrid (API-assisted) workflows.
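The local-versus-hybrid decision described above can be sketched as a simple routing function: prefer the least-loaded GPU node in the requested environment, and fall back to a cloud API only when every node is saturated and hybrid mode is enabled. Node names, queue limits, and the routing policy here are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class GpuNode:
    name: str
    env: str          # "production" or "testing"
    queue_depth: int  # jobs currently waiting on this node
    max_queue: int    # soft limit before we consider the node saturated

def route_job(nodes: list[GpuNode], env: str, allow_cloud: bool) -> str:
    """Pick the least-loaded node in the requested environment; fall back
    to a cloud API only when hybrid mode is enabled and no node has room."""
    candidates = [n for n in nodes if n.env == env and n.queue_depth < n.max_queue]
    if candidates:
        return min(candidates, key=lambda n: n.queue_depth).name
    if allow_cloud:
        return "cloud-api"
    raise RuntimeError(f"no capacity in '{env}' and cloud fallback disabled")

nodes = [
    GpuNode("rtx4080-prod", "production", queue_depth=2, max_queue=8),
    GpuNode("rtx3070ti-prod", "production", queue_depth=5, max_queue=8),
    GpuNode("rtx3060-test", "testing", queue_depth=0, max_queue=4),
]
print(route_job(nodes, "production", allow_cloud=True))  # rtx4080-prod
```

The key design point is that fallback is opt-in per job, so latency-sensitive production work can stay on-prem while overflow batch work is allowed to spill to an external API.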

Key Features

  • Dedicated GPU Inference Nodes
  • Production and Testing Environments
  • Self-hosted Transcription, Diarization, and LLM Workloads
  • Internal APIs for AI Processing
  • Tight Integration with Desktop and Web Products

Description

A dedicated on-prem AI compute layer built on top of the homelab infrastructure, designed to run GPU-intensive inference and batch processing workloads for AI-driven products.