
Universal Approximation by Neural Networks

This page illustrates the ability of a neural network to approximate any continuous function with an error as small as desired. The simulation shows the approximation of a polynomial in one variable, but this basic case generalizes well to higher dimensions.
Each time the "Improve" button is pressed, the neural network measures the approximation error it made at the last step and applies a simple gradient descent update to reduce it.
In the simulation below, enter a polynomial function to fit and click "Improve" to perform one backpropagation round of the neural network. For larger numbers of hidden layers, the model may have trouble converging.



[Interactive simulation: choose the number of hidden layers and a polynomial to fit, then click "Draw" and "Improve"; the current approximation error is displayed.]
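The training loop behind the "Improve" button can be sketched as follows. This is an illustrative re-implementation, not the page's actual code: a one-hidden-layer tanh network trained by full-batch gradient descent on the mean squared error, with the polynomial, hidden width, and learning rate all chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

poly = lambda x: x**3 - 2 * x          # example polynomial to fit
x = np.linspace(-1, 1, 200)[:, None]   # training grid
y = poly(x)

H = 16                                  # hidden units
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)
lr = 0.05

def improve():
    """One backpropagation round, as triggered by the 'Improve' button."""
    global W1, b1, W2, b2
    h = np.tanh(x @ W1 + b1)           # forward pass
    pred = h @ W2 + b2
    err = pred - y
    mse = np.mean(err**2)
    # backward pass: gradients of the mean squared error
    g_pred = 2 * err / len(x)
    gW2 = h.T @ g_pred; gb2 = g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (1 - h**2) # tanh'(z) = 1 - tanh(z)^2
    gW1 = x.T @ g_h;    gb1 = g_h.sum(0)
    # gradient descent step
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    return mse

losses = [improve() for _ in range(1000)]
print("initial MSE:", losses[0], "final MSE:", losses[-1])
```

Repeated clicks on "Improve" correspond to repeated calls of `improve()`; the error displayed by the widget is the value this function returns.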



Given a continuous function on a compact set and a desired level of precision, one can construct a neural network with a finite number of neurons that approximates the function to that precision. More formally, we have the following theorem.

Theorem: (Stone-Weierstrass)

Let $X$ be a compact space and $C(X)$ the algebra of continuous functions from $X$ to $\mathbb{R}$. A subalgebra of $C(X)$ is dense in $C(X)$ if and only if it separates the points of $X$ and, for every $x\in X$, contains a function $f$ such that $f(x)\neq 0$.
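A minimal numerical illustration of this density (under arbitrary choices of target function and degrees): fit $|x|$, which is continuous but not smooth, on $[-1,1]$ with polynomials of increasing degree, and watch the sup-norm error shrink, as the theorem predicts for the polynomial subalgebra.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 1001)
f = np.abs(x)  # continuous target with a kink at 0

errs = []
for deg in (2, 8, 32):
    # least-squares polynomial fit in the Chebyshev basis
    coeffs = np.polynomial.chebyshev.chebfit(x, f, deg)
    approx = np.polynomial.chebyshev.chebval(x, coeffs)
    errs.append(np.max(np.abs(approx - f)))
    print(f"degree {deg:2d}: sup error = {errs[-1]:.4f}")
```

The Chebyshev basis is used only for numerical stability; any polynomial basis spans the same subalgebra.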


Applying the theorem to the subalgebra of polynomials on a compact set $X\subseteq\mathbb{R}^n$ (which separates points and contains the nonvanishing constant function $1$) shows that polynomials are dense in $C(X)$; since a neural network with a suitable activation can approximate any polynomial on a compact set, one obtains the following corollary:

Corollary:

Let $m,n$ be positive integers, $K\subseteq\mathbb{R}^n$ be a compact set and $C(K,\mathbb{R}^m)$ be the set of continuous functions from $K$ to $\mathbb{R}^m$. Then, for any $f\in C(K,\mathbb{R}^m)$ and $\epsilon>0$, there exists a neural network $\hat{f}$ with $n$ inputs, $m$ outputs and a continuous non-polynomial activation function such that $\sup _{x\in K} \|f(x)-\hat{f}(x)\|<\epsilon$.
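In one dimension the corollary can even be made constructive, without any training: a one-hidden-layer ReLU network can reproduce the piecewise-linear interpolant of $f$ at a grid of knots, and the interpolant's sup error tends to $0$ as the number of hidden units grows. A minimal sketch (the target function and knot counts are arbitrary example choices):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, a, b, n_hidden):
    """One-hidden-layer ReLU network whose output equals the piecewise-linear
    interpolant of f at n_hidden + 1 equally spaced knots on [a, b]."""
    knots = np.linspace(a, b, n_hidden + 1)
    vals = f(knots)
    slopes = np.diff(vals) / np.diff(knots)
    # Each hidden unit switches on at a knot and contributes the slope change there.
    c = np.concatenate(([slopes[0]], np.diff(slopes)))  # output-layer weights
    bias = vals[0]                                      # output-layer bias
    def net(x):
        return bias + relu(x[:, None] - knots[:-1][None, :]) @ c
    return net

f = lambda x: x**3 - x              # example continuous target on [-1, 1]
x = np.linspace(-1, 1, 2001)

errors = []
for n in (4, 16, 64):
    net = relu_interpolant(f, -1, 1, n)
    errors.append(np.max(np.abs(net(x) - f(x))))
    print(f"{n:3d} hidden units: sup error = {errors[-1]:.5f}")
```

For a twice-differentiable target the sup error of this construction decays like $O(1/n^2)$ in the number of hidden units $n$.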


Note that the previous results do not guarantee convergence of the optimization process; they only prove that an approximation exists.