Journal of Indian Acad. Math.
ISSN: 0970-5120
Vol. 48, No. 1 (2026) pp. 1–9.
MATHEMATICAL PROPERTIES OF ACTIVATION
FUNCTIONS IN ARTIFICIAL INTELLIGENCE
DEVELOPMENTS
Analysis and Implications for Deep Neural Architectures
Massimiliano Ferrara¹ and Celeste Ciccia²
Abstract. Activation functions govern the expressive power and training dynamics of deep neural networks through their analytical properties. This paper provides a rigorous mathematical analysis of six fundamental activation functions – Linear, Sigmoid, Hyperbolic Tangent, ReLU, Parametric ReLU, and Exponential Linear Unit – examining how regularity, gradient structure, and spectral properties influence representational capacity, gradient flow stability, and convergence behavior in deep architectures. We establish formal results on the representational collapse of linear activations, derive sharp gradient decay bounds for saturating functions, prove gradient preservation theorems for piecewise-linear activations, and characterize the convergence advantages of smooth non-saturating units. Our analysis yields a unified mathematical framework connecting activation function properties to network trainability, with direct implications for the design of deep learning architectures in sequential decision-making, continuous control, and safety-critical applications.
Keywords: Activation functions, deep neural networks, gradient flow, vanishing gradients, convergence analysis, ReLU, ELU, representational capacity.
2010 AMS Subject Classification: 68T07, 65K10, 90C26, 41A25, 60H35.
1. Introduction
Deep neural networks derive their approximation power from the composition of parameterized affine maps with nonlinear activation functions. While the universal approximation theorem [1] establishes existence results for shallow networks, the practical trainability and generalization of deep architectures depend critically on the analytical properties of the chosen activation. Despite extensive empirical work surveying activation function performance in supervised learning [2, 3], a unified mathematical treatment connecting regularity, gradient structure, and convergence guarantees in deep architectures remains incomplete.
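To fix notation, we recall the standard definitions of the six activations analyzed in this paper. This is only a notational sketch: the symbol φ for the Linear map is our own shorthand, α > 0 denotes the negative-region slope of PReLU (learnable, with per-channel indexing suppressed) and the scale parameter of ELU:

\begin{align*}
\text{Linear:} \quad & \varphi(x) = x,\\
\text{Sigmoid:} \quad & \sigma(x) = \frac{1}{1 + e^{-x}},\\
\text{TanH:} \quad & \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},\\
\text{ReLU:} \quad & \operatorname{ReLU}(x) = \max(0, x),\\
\text{PReLU:} \quad & \operatorname{PReLU}_{\alpha}(x) = \begin{cases} x, & x > 0,\\ \alpha x, & x \le 0,\end{cases}\\
\text{ELU:} \quad & \operatorname{ELU}_{\alpha}(x) = \begin{cases} x, & x > 0,\\ \alpha\,(e^{x} - 1), & x \le 0.\end{cases}
\end{align*}

Among these, only the Sigmoid and TanH are bounded on the whole real line, the distinction underlying the gradient decay bounds for saturating functions established later in the paper.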
This paper addresses this gap by providing a rigorous analysis of six canonical activation functions that represent the major paradigms in neural network design: the Linear function, the Sigmoid, the Hyperbolic Tangent (TanH), the Rectified Linear Unit (ReLU) [4], the Parametric ReLU (PReLU) [5], and the Exponential Linear Unit (ELU) [6]. We focus on four mathematical dimensions: (i) representational capacity through composition, (ii) gradient magnitude propagation across depth, (iii) regularity and Lipschitz properties,