A (relatively) short introduction to the Malliavin calculus Jul 05, 2024

The integral calculus for Brownian motion

The integral calculus for Brownian motion, Ito calculus, starts with the idea that even though single paths of Brownian motion are too rough to form a Riemann limit, a suitable in-probability limit can still exist and be meaningfully referred to as the integral. For instance, suppose you wish to integrate \(f(t) = t\) against Brownian motion \(B_t\) on \([0, 1]\), i.e.,

$$ \int_0^1 t \d B_t. $$

The way (at least the Ito one) to define this integral is to consider the prelimit

$$ S_n \df \sum_{j = 0}^{n - 1} f(j/n) \Rnd{B_{(j + 1)/n} - B_{j/n}}. $$

The core choice here is that the integrand is evaluated at the left endpoint of the interval \([j/n, (j + 1)/n]\) when considering the Brownian increment \(B_{(j + 1)/n} - B_{j/n}\). This choice is very important when the integrand is random (for instance when the integrand is adapted, this Ito choice makes the integral a martingale).

It may be shown, using the properties of Brownian motion, that for reasonable \(f\), \(\Var(S_n - S_m) \to 0\) as \(n, m \to \infty\), and thus \(S_n\) has an \(L^2\) limit, i.e., there is a random variable \(S\) such that \(S_n \lto S\). This \(S\) is referred to as the Ito integral.
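As a quick illustration of this convergence (a minimal numerical sketch of my own, not part of the original argument; the resolution and seed are arbitrary), one can generate a single Brownian path at a fine resolution and evaluate the prelimit sums \(S_n\) along nested dyadic partitions; the printed values should visibly stabilize as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# One Brownian path on [0, 1] at a fine resolution N (independent increments of variance 1/N).
N = 2**14
dB = rng.normal(scale=np.sqrt(1.0 / N), size=N)
B = np.concatenate(([0.0], np.cumsum(dB)))  # B[k] approximates B_{k/N}

def prelimit_sum(n):
    """S_n = sum_j f(j/n) (B_{(j+1)/n} - B_{j/n}) with f(t) = t, for n dividing N."""
    step = N // n
    Bn = B[::step]                # the path sampled on the coarser grid {j/n}
    j = np.arange(n)
    return np.sum((j / n) * (Bn[1:] - Bn[:-1]))

for n in [2**k for k in range(2, 13)]:
    print(n, prelimit_sum(n))     # the values should settle down as n grows
```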

The theory of this integral then extends to defining the process

$$ I_t \df \int_0^t f(s) \d B_s, \quad t > 0, $$

jointly for all \(t\), for functions \(f\), and later, for suitable random processes.

Going in the opposite direction

We will now ask the opposite question. Given some integral as a random variable, can we find the integrand? This is a "differentiation-type" question, and is one of the core goals of the Malliavin calculus. The basic idea is to probe how a random variable changes under perturbations of the noise, but setting this up requires a significant amount of machinery.

To set the stage, we define the noise underlying our objects. An isonormal Gaussian space comes with three components: a (nice) Hilbert space \(H\), a probability space \(\cP = (\Om, \cF, \P)\) hosting our random variables, and a map \(W\) from \(H\) to random variables in our space, i.e., \(W(h)\) is in \(L^2(\cP)\) for all \(h \in H\).

Further, we require two conditions on these random variables. Firstly, \(W(\cdot)\) has to be Gaussian, jointly in the indexing vector (one would say that \(W(\cdot)\) is a Gaussian process). Secondly, and most importantly, we also impose the isonormal property,

$$ \E[W(h)W(g)] = \Inn{h, g}. $$

where the inner product on the right is the one on \(H\). Observe that this essentially specifies the law of the Gaussian process \(W(\cdot)\). For instance, this forces \(W\) to be linear. To see why, note that we may apply the isonormal property to obtain

$$ \begin{align} \E\Rnd{W(h + g) - W(h) - W(g)}^2 &= \Inn{h + g, h + g} - \Inn{h + g, h} - \Inn{h + g, g} \\ &\phantom{==} - \Inn{h, h + g} + \Inn{h, h} + \Inn{h, g} \\ &\phantom{==} - \Inn{g, h + g} + \Inn{h, g} + \Inn{g, g} \\ &= 0. \end{align} $$

Thus, \(W(h + g) = W(h) + W(g)\), a.s.

Perhaps the most important example of this setup is given by Brownian motion. Let \(H\) be the space of \(L^2\) functions on \([0, 1]\), and let \(W(h) = \int_0^1 h \d B_t\). Then, standard results from Ito calculus verify the isonormal property in this case. Observe that given the map \(W\), it is easy to recover \(B\) by setting \(B_t = W(\1_{[0, t]})\).
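As a quick numerical sanity check (my own sketch, not from the post; the choices \(h \equiv 1\), \(g(t) = t\), and the discretization are arbitrary), one can verify \(\E[W(h)W(g)] = \Inn{h, g} = \int_0^1 t \d t = 1/2\) by simulating the Ito integrals as sums against Brownian increments.

```python
import numpy as np

rng = np.random.default_rng(1)
N, paths = 200, 50_000
t = (np.arange(N) + 0.5) / N                               # midpoints of the time grid
dB = rng.normal(scale=np.sqrt(1.0 / N), size=(paths, N))   # Brownian increments, one row per path

W_h = dB.sum(axis=1)          # W(h) = int_0^1 1 dB_t (= B_1)
W_g = (t * dB).sum(axis=1)    # W(g) = int_0^1 t dB_t

print(np.mean(W_h * W_g))     # should be close to <h, g> = 1/2
```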

At this point, it is also useful to assert that \(\cP\) does not have any more information than what is provided by these random variables \(W(\cdot)\), i.e., let \(\cF\) be the \(\sig\)-algebra generated by \(W(\cdot)\). (For convenience, we also close this \(\sig\)-algebra, with respect to \(\P\), but this is a minor point and safely ignored at this stage.)

The Malliavin \(D\)-operator

To define a suitable notion of derivative, \(DX\), for random variables \(X\), we start by considering cases where this representation is easier to obtain. Then, we extend the operator, using functional analytic tools, to cases where it is less obvious.

Let us start by considering a very simple space, \(H = \R^2\). An isonormal Gaussian space here can be defined by sampling two independent standard Gaussians \(\xi_1, \xi_2\), and letting

$$ W(h = (h_1, h_2)) = h_1 \xi_1 + h_2 \xi_2, \quad \forall h \in H = \R^2. $$

This is clearly isonormal, since

$$ \E[(h_1 \xi_1 + h_2 \xi_2)(g_1 \xi_1 + g_2 \xi_2)] = h_1g_1 + h_2g_2 = \Inn{h, g}. $$

Now consider some random variable on this space, say \(X = \xi_1^2 \xi_2\). Such a random variable may be represented as \(X = f(\xi_1, \xi_2)\) where \(f : \R^2 \to \R\). In our case, \(f(x_1, x_2) = x_1^2 x_2\). The Malliavin derivative \(DX\) is then simply

$$ DX = e_1 \dd_1 f(\xi_1, \xi_2) + e_2 \dd_2 f(\xi_1, \xi_2) = 2 e_1 \xi_1 \xi_2 + e_2 \xi_1^2. $$

where \(e_1 = (1, 0), e_2 = (0, 1)\) and \(\dd_i f\) is the partial derivative of \(f\) w.r.t. the \(i\)th coordinate. Observe that this is an \(\R^2\)-valued random variable.

Why such a definition? Observe that if \(\xi = (\xi_1, \xi_2)\) is replaced by some \(\xi' = (\xi'_1, \xi'_2)\) infinitesimally close to \(\xi\) (which can happen due to small changes in the sampled noise), the change in \(X\) is simply

$$ \Delta X \approx (\xi'_1 - \xi_1) \dd_1 f(\xi) + (\xi'_2 - \xi_2) \dd_2 f(\xi) = \Inn{DX, \xi' - \xi}. $$

Thus, the Malliavin derivative encodes the infinitesimal change in \(X\) resulting from a corresponding infinitesimal change in the noise \(\xi\). The reason it is an \(\R^2\)-valued (and in general an \(H\)-valued) random variable is that it needs to encode all possible directions in which the noise can change.
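A small numerical sketch (mine; the seed and perturbation size are arbitrary) makes this concrete: nudge the sampled noise \(\xi\) by a tiny amount and compare the actual change in \(X = \xi_1^2 \xi_2\) with the linear prediction \(\Inn{DX, \xi' - \xi}\).

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.normal(size=2)                       # the sampled noise (xi_1, xi_2)
delta = 1e-6 * rng.normal(size=2)             # a tiny perturbation of the noise
xi_p = xi + delta

f = lambda x: x[0]**2 * x[1]                  # X = f(xi_1, xi_2) = xi_1^2 xi_2
DX = np.array([2 * xi[0] * xi[1], xi[0]**2])  # the Malliavin derivative, as a vector in R^2

print(f(xi_p) - f(xi))                        # actual change in X
print(DX @ delta)                             # predicted change <DX, xi' - xi>
```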

Let us now describe a formula concerning the \(D\)-operator which will be useful in the sequel. Observe that for any vector \(h \in \R^2\),

$$ \E[\Inn{DX, h}] = \E\Box{\sum_i h_i \dd_i f(\xi)} = \E\Box{\sum_i h_i \xi_i f(\xi)} = \E[X W(h)] $$

using the (extremely powerful but simple) fact that for a standard Gaussian \(Z\), \(\E[f'(Z)] = \E[Z f(Z)]\). This is the adjoint formula for \(D\).
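Here is a Monte Carlo check of the adjoint formula in the same toy example (my own sketch; \(h\) and the sample size are arbitrary). With \(X = \xi_1^2 \xi_2\), both sides work out to \(h_2\), and the two sample averages below should agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(3)
xi1, xi2 = rng.normal(size=(2, 1_000_000))
h1, h2 = 0.7, -1.3

X = xi1**2 * xi2
W_h = h1 * xi1 + h2 * xi2                  # W(h)
DX_h = h1 * (2 * xi1 * xi2) + h2 * xi1**2  # <DX, h>

print(np.mean(DX_h), np.mean(X * W_h))     # both should be close to h_2 = -1.3
```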

For a general \(H\), and a general \(X\), there are many things that can go wrong with what we did above. Firstly, \(X\) is perhaps not such a simple, smooth, function of the noise. Secondly, in general, a representation as simple as the one for \(\R^2\) is not obvious.

To solve the first problem, let us initially restrict ourselves to only random variables \(X\) that are indeed smooth functions of some finitely many \(W(h)\), i.e., let \(\cS\) be the set of all random variables of the form

$$ X = f(W(h_1), \ldots, W(h_n)) $$

where \(f : \R^n \to \R\) is smooth. If we define

$$ DX \df \sum_i h_i \cdot \dd_i f(W(h_1), \ldots, W(h_n))\,, $$

an \(H\)-valued random variable, it is easy to check that the adjoint formula continues to hold. We take this as our definition of \(D\) on \(\cS\).

Although the definitions above work for \(\cS\), in order to ensure that expectations exist, let us restrict \(\cS\) to random variables produced from functions \(f\) which are not only smooth, but whose derivatives of all orders (and \(f\) itself) have at most polynomial growth at infinity.

Before we proceed, it is first necessary to convince ourselves that \(D\) does not depend on the particular representation of \(X\) as \(f(W(h_1), \ldots, W(h_n))\), since a given \(X\) potentially has many such representations.

To see this, let us first orthonormalize the set \(\{h_1, \ldots, h_n\}\) to obtain \(\{e_1, \ldots, e_m\}\). \(X\) then continues to be a smooth function of \(W(e_1), \ldots, W(e_m)\), simply by setting \(X = F(W(e_1), \ldots, W(e_m))\), where \(F = f \circ T\), and \(T\) is a linear transformation (observe that this works because \(W\) is forced to be linear). A simple check using the chain rule shows that \(DX\) using the \(h\)-representation, and \(DX\) using this orthonormal representation, are the same.

In general however, we will have two arbitrary sets \(\{h_1, \ldots, h_n\}\) and \(\{g_1, \ldots, g_\ell\}\), neither of which need be orthonormal. But we may use the fact we just proved to reduce to that case. Let us jointly orthonormalize the entire collection \(\{h_1, \ldots, h_n\} \cup \{g_1, \ldots, g_\ell\}\) to obtain \(\{e_1, \ldots, e_m\}\), and say

$$ X = F(W(e_1), \ldots, W(e_m)) = G(W(e_1), \ldots, W(e_m)) $$

are the two representations, in this new basis (note that certain variables may not be used by \(F\) or \(G\), e.g., \(F(x, y) = x\), but this does not affect the computations we did earlier since the corresponding \(\dd_i\) is just 0).

But by orthonormality, \(W(e_1), \ldots, W(e_m)\) are all i.i.d. standard Gaussians (since joint Gaussians are independent when uncorrelated). Perhaps now it is intuitively clear that \(F = G\): the vector \((W(e_1), \ldots, W(e_m))\) takes values in every part of \(\R^m\) with positive probability, so two smooth (hence continuous) functions that agree almost surely must agree everywhere, and so must their derivatives. This finishes the proof, i.e., \(DX\) is well-defined, at least on \(\cS\).

Some properties of \(D\)

Before we move forward, let us observe some basic facts about the \(D\)-operator. The most important fact is the adjoint formula we mentioned earlier, i.e.,

$$ \E[\Inn{DX, h}] = \E[X W(h)], \quad h \in H, $$

which continues to hold under our new definition of \(D\) on \(\cS\).

The second fact is that if \(X, Y \in \cS\),

$$ D(XY) = X \cdot DY + Y \cdot DX, $$

which may be checked easily by assuming that \(X, Y\) are both functions of some finite collection $W(h_1), \ldots, W(h_n)$ (which we can by representation independence of \(D\)), and then applying the product rule for usual partial derivatives.

This product rule may be combined with the adjoint formula to obtain

$$ \E[XY W(h)] = \E[\Inn{D(XY), h}] = \E[X \Inn{DY, h}] + \E[Y \Inn{DX, h}] $$

which yields

$$ \E[X \Inn{DY, h}] = \E[XY W(h)] - \E[Y\Inn{DX, h}] $$

which is the integration-by-parts formula for \(D\).
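The integration-by-parts formula can also be checked by simulation in the \(H = \R^2\) example (a sketch of my own; the choices \(X = \sin(\xi_1)\), \(Y = \xi_1 \xi_2\), and \(h\) are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(4)
xi1, xi2 = rng.normal(size=(2, 1_000_000))
h1, h2 = 0.5, 2.0

X, Y = np.sin(xi1), xi1 * xi2
W_h = h1 * xi1 + h2 * xi2
DX_h = h1 * np.cos(xi1)            # <DX, h>, since DX = (cos(xi_1), 0)
DY_h = h1 * xi2 + h2 * xi1         # <DY, h>, since DY = (xi_2, xi_1)

lhs = np.mean(X * DY_h)
rhs = np.mean(X * Y * W_h) - np.mean(Y * DX_h)
print(lhs, rhs)                    # the two estimates should agree up to Monte Carlo error
```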

Extending the operator

Now it is time for us to extend the \(D\)-operator to wilder random variables \(X\). Since this will involve some nontrivial functional-analytic machinery, we should be precise about our objects. Firstly, \(\cS\) is a linear space, and it is a fact that \(\cS\) is dense in \(L^p(\cP)\) for any \(p \geq 1\) (this requires a proof, but we omit it).

Secondly, for any \(p \geq 1\), \(D\) (on \(\cS\)) takes values in \(L^p(\cP; H)\), which denotes \(L^p\) random variables taking values in \(H\) (the norm is \(\E[\Norm{\cdot}_H^p]^{1/p}\)). It is to ensure this property that we imposed polynomial growth on the derivatives as well.

Thus as it stands, \(D\) is a linear operator from \(\cS \to L^p(\cP; H)\), which we will now try to extend to a larger class of random variables \(X\). Since we will only consider \(X \in L^p(\cP)\) (of which \(\cS\) is a subclass, for any \(p \geq 1\)), the (new) domain of definition of \(D\), \(\dom_D\), will be called \(\bD^{1, p}\) (the \(1\) indicates that we are only taking the first derivative of \(X\)). The strategy is now a generic tool employed repeatedly in functional analysis:

  1. Show that \(D\) is closeable, that is, if \(X_n \to 0\) in \(L^p(\cP)\) and \(DX_n \to Y\) in \(L^p(\cP; H)\), then $Y = 0$.
  2. Let \(\bD^{1, p}\) be the domain of definition of the closure of \(D\), which exists since \(D\) is closeable.

The reason this works is simple. Intuitively, we wish to extend \(D\) as follows: whenever \(X_n \to X\) and \(DX_n \to Y\) (in the appropriate spaces), we wish to set \(DX = Y\) (even if \(X \notin \cS\)). But when is this extension well defined? Well if we have two sequences \(X_n \to X\) and \(X'_n \to X\) such that \(DX_n \to Y\) but \(DX'_n \to Y'\) for \(Y' \neq Y\), i.e., the operator limit is sequence-dependent, then we are in trouble, since there is no canonical way to define \(DX\). But as long as \(\cS\) is a linear space and \(D\) is linear (which are both true in our case), this says that \(X_n - X'_n \to 0\), but \(D(X_n - X'_n) \to Y - Y' \neq 0\). This is exactly the scenario that being closeable avoids, allowing a well-defined extension.

To show that \(D\) is closeable, let us now take \(X_n \in \cS\) such that \(X_n \to 0\) in \(L^p(\cP)\) and \(DX_n \to Y\) in \(L^p(\cP; H)\). Is \(Y = 0\) forced? Fix \(h \in H\), and let us show that \(\Inn{Y, h} = 0\) a.s. For this, it suffices to show that if \(V\) is a compactly supported smooth random variable (referred to as \(V \in \cS_0\)), i.e., $V = f(W(h_1), \ldots, W(h_n))$ for some compactly supported smooth \(f\), then \(\E[V \Inn{Y, h}] = 0\). Since \(DX_n \to Y\) in \(L^p(\cP; H)\), we have \(\Inn{DX_n, h} \to \Inn{Y, h}\) in \(L^p(\cP)\). Since \(V\) is bounded, \(V\Inn{DX_n, h} \to V\Inn{Y, h}\) in \(L^p(\cP)\), and thus it suffices to show that \(\E[V\Inn{DX_n, h}] \to 0\).

By the integration-by-parts formula, we have

$$ \E[V\Inn{DX_n, h}] = \E[X_n V W(h)] - \E[X_n \Inn{DV, h}]. $$

Since \(V \in \cS_0\), one may check that \(\Inn{DV, h} \in \cS_0\) as well, and thus \(\E[X_n \Inn{DV, h}] \to 0\) as \(X_n \to 0\). Similar arguments show that \(\E[X_n V W(h)] \to 0\) as well, finishing the proof.

As described earlier, the domain of \(D\) can then be extended to \(\bD^{1, p}\), which is the closure of \(\cS\) under the graph norm

$$ \Norm{X}_{1, p} \df \Rnd{\E|X|^p + \E\Norm{DX}_H^p}^{1/p}, $$

and thus is a Banach space by standard functional-analytic arguments. Further, \(\domd\) (i.e., when \(p = 2\)) is a Hilbert space.

The adjoint formula for \(D\) on \(\cS\) continues to hold for \(D\) on \(\bD^{1, p}\). If \(X \in \bD^{1, p}\), there is a sequence \(X_n \to X\) under \(\Norm{\cdot}_{1, p}\), implying that \(X_n \to X\) in \(L^p(\cP)\) and \(DX_n \to DX\) in \(L^p(\cP; H)\). Taking limits on both sides of the adjoint equation but for \(X_n\), i.e.,

$$ \E[\Inn{DX_n, h}] = \E[X_n W(h)] $$

we obtain the equation for \(X\).

The reason we do not extend the product rule is that if \(X \in L^2\), then \(X^2\) need not be in \(L^2\). Thus products do not respect the \(L^p\) structure. However, as long as we disallow this kind of behavior, we can prove a chain rule for \(D\).

The chain rule

Given a smooth function \(\phi\) with bounded partial derivatives, we wish to understand \(D \phi(X)\). The reason for this condition on \(\phi\) is so that \(\phi(X) \in \cS\) if \(X \in \cS\), and to disallow the \(X^2\) type behavior we discussed at the end of the previous section.

If \(X = f(W(h_1), \ldots, W(h_n))\) for \(f\) smooth (and some choices of \(h_1, \ldots, h_n\)), then

$$ \begin{align} D\phi(X) &= \sum_{i = 1}^n h_i \cdot \dd_i (\phi \circ f) (W(h_1), \ldots, W(h_n)) \\ &= \sum_{i = 1}^n h_i \cdot \phi'(X) \cdot \dd_i f(W(h_1), \ldots, W(h_n)) \\ &= \phi'(X) D X. \end{align} $$

To extend this to \(\bD^{1, p}\), observe that if \(X_n \to X\) and \(DX_n \to DX\) (in the appropriate spaces), then \(\phi(X_n) \to \phi(X)\) (since derivatives are bounded) and \(D\phi(X_n) = \phi'(X_n) DX_n \to \phi'(X) DX\) (since \(\phi'\) is bounded). Thus \(D\phi(X) = \phi'(X) DX\).

The chain rule continues to hold for multivariate \(\phi\). For instance if \(Y = \phi(X_1, X_2)\), then

$$ DY = DX_1 \cdot \dd_1\phi(X_1, X_2) + DX_2 \cdot \dd_2 \phi(X_1, X_2). $$

The proof is very similar.

Examples

We now have the tools to consider some non-trivial examples. Let us restrict to the Brownian case, i.e., where $W(h) = \int_0^1 h_t \d B_t$.

Since \(f(x) = x\) is smooth, if \(\g : [0, 1] \to \R\) is \(L^2\),

$$ X \df \int_{0}^1 \g \d B_t = W(\g) $$

is a smooth variable, and so \(DX = \g\). In particular, \(DB_t = \1_{[0, t]}\). As a check, the adjoint formula here says that

$$ \E\Box{B_t \int_0^1 h_s \d B_s} = \E[\Inn{\1_{[0, t]}, h}] = \int_0^t h_s ds $$

which can be verified from, say, the isometry of the Ito integral.

Now let us consider \(X = \max(B_1, 0)\) which is not smooth. However, we can show that \(X \in \bD^{1, 2}\). Let \(\phi_\eps\) be a compactly supported mollifier of "bandwidth" \(\eps\), and observe that \(f \star \phi_\eps\) has bounded derivatives where \(f(x) = \max(x, 0)\). Thus, \(X_\eps \df f \star \phi_\eps(B_1) \in \cS\). The chain rule implies

$$ DX_\eps = t \mapsto (f' \star \phi_\eps)(B_1) $$

where \(f'(x) = \1_{x > 0}\) is a weak derivative. As \(\eps \to 0\), \(X_\eps \to X\) in \(L^2(\cP)\) and $DX_\eps \to (t \mapsto f'(B_1))$ in \(L^2(\cP; H)\). The former is clear as long as the mollifier is well behaved, and the latter follows from the fact that for random variables in \(L^2(\cP; H)\) which are (random) multiples of the constant function, the norm is simply that of the multiplier.

Thus, \(DX = t \mapsto \1_{B_1 > 0}\).
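As a sanity check on this computation (my own numerical sketch; taking \(h = \1_{[0, 1]}\) for simplicity), the adjoint formula says \(\E[X W(h)] = \E[\max(B_1, 0)\, B_1]\) should equal \(\E[\Inn{DX, h}] = \P(B_1 > 0)\), and indeed both are \(1/2\).

```python
import numpy as np

rng = np.random.default_rng(5)
B1 = rng.normal(size=1_000_000)     # B_1 is a standard Gaussian

X = np.maximum(B1, 0.0)
# With h = 1_{[0,1]}: W(h) = B_1 and <DX, h> = 1_{B_1 > 0} <1_{[0,1]}, 1_{[0,1]}> = 1_{B_1 > 0}.
print(np.mean(X * B1))              # estimate of E[X W(h)], should be close to 1/2
print(np.mean(B1 > 0))              # estimate of E[<DX, h>] = P(B_1 > 0), also close to 1/2
```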

\(D\) as a directional derivative

It turns out that the Malliavin derivative \(D\) has a very nice interpretation as a directional derivative, extending the intuition implicit in our simple \(H = \R^2\) example earlier. Consider a smooth random variable \(X = f(W(u))\) for some \(u \in H\). Then, \(DX = f'(W(u)) u\), and thus

$$ \Inn{DX, h} = f'(W(u))\Inn{h, u}. $$

This can also be written as

$$ \Inn{DX, h} = \lim_{\eps \to 0} \fr{f(W(u) + \eps \Inn{h, u}) - f(W(u))}{\eps}. $$

If one formally thinks of \(W(u) = \Inn{u, W}\) for some formal "noise vector" \(W\), then \(W(u) + \eps \Inn{h, u} = \Inn{u, W + \eps h}\), i.e., the noise received a boost in direction \(h\). Note that this formal noise, although not rigorous, is not particularly misleading. For our \(H = \R^2\) example, this representation was literally true with \(W = (\xi_1, \xi_2)\). For Brownian motion, this holds with \(W\) being the white-noise functional.

In the case of Brownian motion and \(u = \1_{[0, s]}\) so that \(X = f(W(u)) = f(B_s)\), the above reduces to

$$ \Inn{DX, h} = \lim_{\eps \to 0} \fr{f\Rnd{B_s + \eps \int_0^s h(t) \d t} - f(B_s)}{\eps}. $$

This too makes sense: pushing the white noise in direction \(h\) pushes \(B_s\) by \(\int_0^s h_t \d t\), since \(B\) is the integral of the white noise.
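The sketch below (again mine; \(f = \sin\), \(h(t) = t\), \(s = 1/2\), and the resolution are arbitrary choices) illustrates this numerically: it shifts the driving increments by \(\eps h(t) \d t\) and compares the resulting change in \(f(B_s)\), rescaled by \(\eps\), with \(f'(B_s) \int_0^s h_t \d t\).

```python
import numpy as np

rng = np.random.default_rng(6)
N = 10_000
dt = 1.0 / N
t = (np.arange(N) + 0.5) * dt
h = t                                       # direction of the shift, h(t) = t
eps = 1e-4
s_idx = N // 2                              # s = 1/2

dB = rng.normal(scale=np.sqrt(dt), size=N)
B = np.cumsum(dB)                           # B[k-1] approximates B_{k/N}
B_shift = np.cumsum(dB + eps * h * dt)      # path driven by the shifted noise

f, fprime = np.sin, np.cos
Bs, Bs_shift = B[s_idx - 1], B_shift[s_idx - 1]

print((f(Bs_shift) - f(Bs)) / eps)          # observed change, rescaled by eps
print(fprime(Bs) * np.sum(h[:s_idx] * dt))  # prediction f'(B_s) * int_0^s h_t dt
```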

The divergence operator

A key tool that appears repeatedly in Malliavin calculus is the adjoint of the \(D\)-operator, also called the divergence operator \(\del\). Since \(D\) is unbounded, \(\del\) is the adjoint of an unbounded operator, and thus some care is required for its definition. As we will see later, \(\del\) will be an extension of the usual Ito integral.

Let us briefly recall the general setup for adjoints of unbounded operators. Suppose \(H_1, H_2\) are two Hilbert spaces, and \(A : H_1 \to H_2\) is an unbounded operator, with dense domain \(\dom_A \subseteq H_1\). The adjoint \(A^*\) is defined for all \(u \in H_2\) (the collection of which forms the domain \(\dom_{A^*}\) of \(A^*\)) such that

$$ \ell_u(x) \df \Inn{Ax, u}, \quad x \in \dom_A $$

is a bounded linear functional on \(\dom_A\). For such \(u\), since \(\dom_A\) is dense, \(\ell_u\) extends uniquely by continuity to a bounded linear functional on all of \(H_1\), and thus by the Riesz representation theorem, there is a unique \(z \in H_1\) such that \(\Inn{x, z} = \ell_u(x)\) for all \(x \in \dom_A\). The adjoint at \(u\) is then simply defined to be \(z\), i.e., \(A^* u = z\). That is, the adjoint \(A^* : \dom_{A^*} \subseteq H_2 \to H_1\) satisfies the following duality equation:

$$ \Inn{Ax, u} = \Inn{x, A^* u}, \quad \forall x \in \dom_A, u \in \dom_{A^*}. $$

Returning to our setup, let us consider the case \(p = 2\), so that \(D\) is defined on \(\bD^{1, 2} \subseteq H_1 = L^2(\cP)\), a Hilbert space. The image space of \(D\) is \(H_2 = L^2(\cP; H)\) (the \(H_1, H_2\) of the abstract setup above should not be confused with the isonormal space \(H\)), which is again a Hilbert space. The adjoint is then an (unbounded) operator \(\del : L^2(\cP; H) \to L^2(\cP)\) satisfying

$$ \E[X \del(\Phi)] = \E[\Inn{DX, \Phi}], \quad \forall \Phi \in \bD_{1, 2}, X \in \bD^{1, 2}. $$

where \(\bD_{1, 2} \subseteq L^2(\cP; H)\) is the domain \(\dom_\del\) of \(\del\).

If \(\Phi\) is deterministic, i.e., \(\Phi = h\) a.s., for some \(h \in H\), then \(\del(\Phi) = W(h)\), simply via the usual adjoint formula for the \(D\)-operator. Thus, \(\del\) extends (at least) the Ito integral for deterministic integrands.

Just as smooth variables were the simple case for \(D\), elementary functions \(\Phi = \sum_j F_j h_j\) for $F_j \in \bD^{1, 2}$ and \(h_j \in H\) are the simple case for \(\del\). First we need to check that such \(\Phi\) are indeed in the domain of \(\del\), for which it suffices to check that for each \(X \in \bD^{1, 2}\),

$$ \Abs{\E[\Inn{DX, \Phi}]} \leq C \Norm{X} $$

for some \(C\) potentially depending on \(\Phi\). A direct application of the integration-by-parts formula yields

$$ \E[\Inn{DX, \Phi}] = \E\Box{\sum_j F_j \Inn{DX, h_j}} = \E\Box{X\Rnd{\sum_j F_j W(h_j) - \Inn{DF_j, h_j}}}, $$

establishing the claim, and also providing us a candidate for \(\del(\Phi)\), i.e.,

$$ \del(\Phi) = \del\Rnd{\sum_j F_j h_j} = \sum_j F_j W(h_j) - \Inn{DF_j, h_j}. $$
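As a quick illustration of this formula (an example of my own in the Brownian setting), take a single term \(\Phi = F h\) with \(F = B_1 = W(\1_{[0, 1]})\) and \(h = \1_{[0, 1]}\). Since \(DF = \1_{[0, 1]}\),

$$ \del\Rnd{B_1 \1_{[0, 1]}} = B_1 W(\1_{[0, 1]}) - \Inn{\1_{[0, 1]}, \1_{[0, 1]}} = B_1^2 - 1. $$

Note that this integrand is not adapted (its value at times \(t < 1\) already involves \(B_1\)), so \(\del\) genuinely goes beyond the Ito integral here.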

Armed with this formula, we now turn to exploring the relationship of \(\del\) with the Ito integral.

The relationship with the Ito integral

Restricting to the Brownian case, we will first try to understand how the progression of time affects the \(D\)-operator. Since time is an important factor in the Ito integral (via requiring adaptedness for the integrand and such), this will be crucial to our understanding of the relationship between these notions.

Define the natural filtration \(\cF_t = \sig\{B_s : s \leq t\}\), and let \(X \in \cS\) be some smooth random variable which is also \(\cF_t\)-measurable. Then, it is perhaps not hard to see that \(X = f(W(h_1), W(h_2), \ldots, W(h_n))\) for some vectors \(h_1, \ldots, h_n \in H = L^2([0, 1])\), satisfying \(\supp h_i \sse [0, t]\) (i.e., they are zero outside \([0, t]\)). Therefore, \(\Inn{DX, h} = 0\) a.s., for any \(h\) with \(\supp h \cap [0, t] = \emptyset\).

We will use the fact above to explicitly express the action of \(\del\) on adapted integrands. Note that an element \(\Phi \in L^2(\cP; H)\), with \(H = L^2([0, 1])\), can equivalently be thought of as a process \(\Phi_t(\om) = \Phi(\om)(t)\) (we will not deal with the measure-theoretic considerations here). Such a process is called \(L^2\) adapted if, further, \(\Phi_t\) is \(\cF_t\)-measurable for each \(t\).

Suppose \(\Phi\) is of the elementary adapted form with smooth coefficients, i.e.,

$$ \Phi = \sum_j F_j \1_{(t_j, t_{j + 1}]}, \quad F_j \in L^2(\cF_{t_j}) \cap \cS, $$

for some sequence of points \(t_1 < t_2 < \ldots < t_{n + 1}\). Then, using our formula for \(\del(\Phi)\), we get

$$ \del(\Phi) = \sum_j F_j W(\1_{(t_j, t_{j + 1}]}) - \Inn{DF_j, \1_{(t_j, t_{j + 1}]}} $$

But, the second term is always zero since \(F_j \in \cF_{t_j}\) and \(\1_{(t_j, t_{j + 1}]}\) is supported outside $[0, t_j]$. Thus,

$$ \del(\Phi) = \sum_j F_j W(\1_{(t_j, t_{j + 1}]}) = \sum_j F_j \cdot \Rnd{B_{t_{j + 1}} - B_{t_j}}. $$

which coincides with the Ito integral for \(\Phi\)!

Further, since smooth random variables in \(L^2(\cF_t)\) are dense in \(L^2(\cF_t)\) (we will not prove this), elementary adapted processes of the form above are dense in the space of \(L^2\) adapted processes. Taking limits in the equation

$$ \Abs{\E\Inn{DX, \Phi}} \leq C\Norm{X} $$

along such \(\Phi\) for fixed \(X \in \domd\), we see that \(L^2\) adapted processes are in \(\bD_{1,2} = \dom_\del\). Moreover, for such integrands, \(\del\) agrees with the Ito integral.

The Clark-Ocone formula

We will finish our tour with one of the central results in Malliavin calculus, which provides a constructive solution to a problem from the theory of Ito integration. The setup is as follows. Say \(X \in L^2(\cP)\) (we are still in the Brownian case). Can we always create an \(L^2\) adapted process \(\Phi\) such that

$$ X = \int_0^1 \Phi_t \d B_t $$

(where the integral is in the Ito sense)?

Well, Ito integrals are martingales with zero expectation, so we certainly need \(\E X = 0\). It turns out that this is enough; this is the content of the martingale representation theorem. The proof from the Ito theory is, however, non-constructive. It proceeds via proving that any variable \(X\) which is orthogonal to all such adapted integrals must be 0 a.s. But, clearly, a more constructive answer is desirable.

Starting with the representation above, and using that \(\del\) and Ito integration coincide for \(\Phi\), we have \(X = \del(\Phi)\). For any other \(L^2\) adapted process \(\Psi\), we have

$$ \E[X\del(\Psi)] = \E\Box{X \int_0^1 \Psi_t \d B_t} = \E\Box{\int_0^1 \Phi_t \d B_t \cdot \int_0^1 \Psi_t \d B_t} = \int_0^1 \E[\Psi_t \Phi_t] \d t $$

applying the Ito isometry in the last step. But the adjoint formula (and Fubini) also yields

$$ \E[X\del(\Psi)] = \E\Inn{DX, \Psi} = \int_0^1 \E[(DX)_t \Psi_t] \d t. $$

Since these are equal for every \(L^2\) adapted \(\Psi\), we must have \(\E[(DX)_t \Psi_t] = \E[\Phi_t \Psi_t]\) for almost every \(t\). Since \(\Psi_t\) is \(\cF_t\)-measurable, \(\E[(DX)_t \Psi_t] = \E[\E[(DX)_t | \cF_t] \Psi_t]\), and since \(\Phi_t\) is adapted, this forces \(\Phi_t = \E[(DX)_t | \cF_t]\). It is customary to write \((DX)_t\) as \(D_t X\), so we recover the usual form of the celebrated Clark-Ocone formula,

$$ X = \int_0^1 \E[D_t X | \cF_t] \d B_t. $$

whenever \(\E X = 0\). This, however, requires us to restrict to \(X \in \domd\).

As an example, let us consider \(X = B_1^2 - 1\) (the \(-1\) is there to make \(\E X = 0\)). Then \(DX = t \mapsto 2B_1\) since \(B_1^2\) is smooth. Therefore $D_t X \equiv 2B_1$, and thus

$$ B_1^2 - 1 = \int_0^1 \E[2B_1 | \cF_t] \d B_t = \int_0^1 2B_t \d B_t. $$

which is also what you would get from usual Ito integration. See Nualart, pg. 26 for a more nontrivial example.
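As a final numerical sketch (mine, not from the post; the resolution is arbitrary), the identity above can be checked pathwise by comparing \(B_1^2 - 1\) with the left-endpoint (Ito) Riemann sum of \(2 B_t \d B_t\) on a simulated path.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
dB = rng.normal(scale=np.sqrt(1.0 / N), size=N)
B = np.concatenate(([0.0], np.cumsum(dB)))    # B[j] approximates B_{j/N}

lhs = B[-1]**2 - 1
rhs = np.sum(2 * B[:-1] * np.diff(B))         # left-endpoint (Ito) sum of 2 B_t dB_t
print(lhs, rhs)                               # should agree up to discretization error
```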