Capacity-Controlled Multi-View Stylization of 3D Gaussian Splatting

Abstract

While 3D Gaussian Splatting (3DGS) provides an efficient and explicit representation for novel view synthesis, enforcing stylistic coherence across viewpoints remains challenging. Existing 3D stylization methods typically apply 2D feature-matching losses independently per rendered view, which leads to unstable style allocation, many-to-one feature reuse, and limited cross-view consistency. We propose a capacity-controlled framework for multi-view stylization of 3DGS, grounded in optimal transport. Specifically, we reformulate local style matching as a semi-balanced optimal transport problem. By introducing explicit column-capacity constraints with tunable strength, our formulation mitigates many-to-one matching and enables controllable allocation of style features. This transport-based objective provides a principled mechanism for balancing feature coverage and stylistic diversity while maintaining stable correspondences across viewpoints. To further enhance cross-view coherence, we incorporate a novel cross-view matching guidance to constrain correspondences between scene content and style patterns. In addition, we introduce several geometric regularizations to enhance the vanilla 3DGS, thereby enabling optimized Gaussian primitives to represent finer-grained textures during stylization. Extensive experiments demonstrate that our approach significantly improves multi-view stylistic consistency and produces stable, expressive 3D stylizations while preserving the core semantic structure of the scene.

Method

We introduce an error-driven densification strategy. We first optimize 2D Gaussians in image space to reduce residual errors (left). We then use depth maps to back-project these 2D Gaussians into world space, applying clustering, WPCA, and scale calibration to minimize projection errors. Each new primitive is initialized with a view-dependent opacity lobe oriented toward its corresponding camera (middle). Finally, we jointly optimize the augmented Gaussians with the original scene to recover challenging view-dependent color and improve photometric consistency (right). Our post-enhancement is plug-and-play with existing 3DGS frameworks and achieves higher quality with fewer SH parameters, especially for complex reflections.
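The back-projection step can be illustrated with a minimal sketch. It assumes a pinhole camera, z-depth values, and a camera-to-world pose given as a rotation `R` and translation `t`; the function name `backproject` and all parameter names are hypothetical and are not taken from the method itself. Clustering, WPCA, and scale calibration are not shown.

```python
def backproject(u, v, depth, fx, fy, cx, cy, R, t):
    """Lift one pixel (u, v) with z-depth `depth` to a world-space point.

    Assumptions (not from the paper): pinhole intrinsics (fx, fy, cx, cy),
    R is a 3x3 camera-to-world rotation as nested lists, t is its translation.
    """
    # Camera-space point under the pinhole model.
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # World-space point: R @ cam + t.
    return tuple(sum(R[i][j] * cam[j] for j in range(3)) + t[i] for i in range(3))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [1.0, 0.0, 0.0]
# A pixel at the principal point lands on the optical axis.
center = backproject(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0, identity, shift)
# A pixel 500 px to the right lands one focal-length ratio off-axis.
offset = backproject(820.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0, identity, shift)
```

In this sketch only the center of a 2D Gaussian is lifted; carrying its 2D covariance into 3D would need the additional calibration steps described above.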

Capacity-Controlled Feature Transport

Inspired by classical Phong shading, we model view-dependent opacity with a cosine-weighted function whose shape is controlled by two parameters: β, which governs the lobe's sharpness, and T, which determines its angular extent. Together with the central orientation of the lobe, each new kernel adds five learnable parameters to a standard Gaussian primitive.
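The text does not give the exact form of the lobe, so the following is only one plausible sketch. It assumes the weight is 1 along the lobe axis, falls to 0 at angle T from the axis, and is raised to the power β, in the spirit of a Phong exponent. The function `lobe_weight` and this particular remapping are assumptions for illustration, not the authors' formula. Storing the orientation as a 3-vector alongside β and T would be consistent with the stated count of five parameters.

```python
import math


def lobe_weight(view_dir, lobe_axis, beta, T):
    """Hypothetical cosine lobe: 1 along lobe_axis, 0 at angle T or beyond.

    Assumed form (not from the paper):
        w = max(0, (cos(theta) - cos(T)) / (1 - cos(T))) ** beta
    with 0 < T <= pi and beta > 0.
    """
    def norm(x):
        return math.sqrt(sum(c * c for c in x))

    # Cosine of the angle between the viewing direction and the lobe axis.
    dot = sum(a * b for a, b in zip(view_dir, lobe_axis))
    cos_theta = dot / (norm(view_dir) * norm(lobe_axis))
    cos_T = math.cos(T)
    # Remap so the weight reaches zero exactly at the angular extent T.
    x = max(0.0, (cos_theta - cos_T) / (1.0 - cos_T))
    # A larger beta narrows the lobe around its axis.
    return x ** beta


axis = (0.0, 0.0, 1.0)
tilted = (math.sin(math.pi / 6), 0.0, math.cos(math.pi / 6))  # 30 degrees off-axis
on_axis = lobe_weight(axis, axis, 4.0, math.pi / 3)
outside = lobe_weight((1.0, 0.0, 0.0), axis, 4.0, math.pi / 3)
soft = lobe_weight(tilted, axis, 1.0, math.pi / 3)
sharp = lobe_weight(tilted, axis, 8.0, math.pi / 3)
```

Under these assumptions the per-view opacity would be the base opacity multiplied by this weight, so a primitive contributes mainly to views near the camera it was created for.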

Cross-view Matching Guidance

Results

Our Gaussian supplementation strategy significantly enhances rendering quality across existing GS scenes. Notably, within the MCMC framework, second-order SH (sh=2) uses 21 fewer parameters per primitive than third-order SH (sh=3) yet achieves comparable rendering quality. Both configurations outperform the state-of-the-art implicit method [Barron et al. 2023], demonstrating that our method facilitates the adaptation of Gaussian splatting to low-end hardware platforms. Compared with DBS [Liu et al. 2025], the state-of-the-art explicit approach, our method achieves better results on real-world datasets. Compared with the specular-aware methods VoD-3DGS and Spec-Gaussian, our method achieves better results on almost all datasets.
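The saving of 21 parameters per primitive follows from the standard count of spherical-harmonic coefficients: a basis of maximum degree d has (d + 1)^2 functions, and each is stored once per RGB channel. The short check below uses a hypothetical helper name, `sh_color_params`.

```python
def sh_color_params(degree, channels=3):
    """Number of SH color coefficients per primitive for a given maximum degree."""
    return channels * (degree + 1) ** 2


third_order = sh_color_params(3)    # 3 * 16 = 48
second_order = sh_color_params(2)   # 3 * 9 = 27
saved = third_order - second_order  # 48 - 27 = 21
```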