# Wozencraft ensemble

This is the approved revision of this page, as well as being the most recent.

Wozencraft ensemble is a set of linear codes in which most of codes satisfy the Gilbert-Varshamov bound. It is named after John Wozencraft, who proved its existence. The ensemble is described by , who attributes it to Wozencraft. used the Wozencraft ensemble as the inner codes in his construction of strongly explicit asymptotically good code.

## Existence theorem

Theorem: Let $\varepsilon > 0.$ For a large enough $k$, there exists an ensemble of inner codes $C_{in}^1,\cdots,C_{in}^N$ of rate $\tfrac{1}{2}$, where $N = q^k - 1$, such that for at least $(1 - \varepsilon)N$ values of $i, C_{in}^i$ has relative distance $\geqslant H_q^{ - 1} \left(\tfrac{1}{2} - \varepsilon \right )$.

Here relative distance is the ratio of minimum distance to block length. And $H_q$ is the q-ary entropy function defined as follows: $H_q(x) = x\log_q(q-1)-x\log_qx-(1-x)\log_q(1-x).$

In fact, to show the existence of this set of linear codes, we will specify this ensemble explicitly as follows: for $\alpha \in \mathbb{F}_{q^k } - \{ 0\}$, define the inner code $\begin{cases} C_{in}^\alpha :\mathbb{F}_q^k \to \mathbb{F}_q^{2k} \\ C_{in}^\alpha (x) = (x,\alpha x) \end{cases}$

Here we can notice that $x \in \mathbb{F}_q^k$ and $\alpha \in \mathbb{F}_{q^k}$. We can do the multiplication $\alpha x$ since $\mathbb{F}_q^k$ is isomorphic to $\mathbb{F}_{q^k}$.

This ensemble is due to Wozencraft and is called the Wozencraft ensemble.

For all $x, y \in \mathbb{F}_q^k$, we have the following facts:

1. $C_{in}^\alpha (x) + C_{in}^\alpha (y) = (x,\alpha x)+(y,\alpha y) = (x + y,\alpha (x + y)) = C_{in}^\alpha (x + y)$
2. For any $a \in \mathbb{F}_q, aC_{in}^\alpha (x) = a(x,\alpha x) = (ax,\alpha (ax)) = C_{in}^\alpha (ax)$

So $C_{in}^\alpha$ is a linear code for every $\alpha \in \mathbb{F}_{q^k } - \{ 0\}$.

Now we know that Wozencraft ensemble contains linear codes with rate $\tfrac{1}{2}$. In the following proof, we will show that there are at least $(1 - \varepsilon)N$ those linear codes having the relative distance $\geqslant H_q^{ - 1} \left (\tfrac{1}{2} - \varepsilon \right )$, i.e. they meet the Gilbert-Varshamov bound.

## Proof

To prove that there are at least $(1-\varepsilon)N$ number of linear codes in the Wozencraft ensemble having relative distance $\geqslant H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right)$, we will prove that there are at most $\varepsilon N$ number of linear codes having relative distance $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right)$ i.e., having distance $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k.$

Notice that in a linear code, the distance is equal to the minimum weight of all codewords of that code. This fact is the property of linear code. So if one non-zero codeword has weight $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k$, then that code has distance $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k.$

Let $P$ be the set of linear codes having distance $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k.$ Then there are $|P|$ linear codes having some codeword that has weight $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k.$

Lemma. Two linear codes $C_{in}^{\alpha_1}$ and $C_{in}^{\alpha_2}$ with $\alpha_1, \alpha_2 \in \mathbb{F}_{q^k}$ distinct and non-zero, do not share any non-zero codeword.
Proof. Suppose there exist distinct non-zero elements $\alpha_1, \alpha_2 \in \mathbb{F}_{q^k}$ such that the linear codes $C_{in}^{\alpha_1}$ and $C_{in}^{\alpha_2}$ contain the same non-zero codeword $y.$ Now since $y \in C_{in}^{\alpha_1}, y = (y_1,\alpha_1 y_1)$ for some $y_1 \in \mathbb{F}_q^k$ and similarly $y = (y_2,\alpha_2 y_2)$ for some $y_2 \in \mathbb{F}_q^k.$ Moreover since $y$ is non-zero we have $y_1, y_2 \ne 0.$ Therefore $(y_1,\alpha_1 y_1) = (y_2,\alpha_2 y_2)$, then $y_1 = y_2 \ne 0$ and $\alpha_1 y_1 = \alpha_2 y_2.$ This implies $\alpha_1 = \alpha_2$, which is a contradiction.

Any linear code having distance $ has some codeword of weight $< H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k.$ Now the Lemma implies that we have at least $|P|$ different $y$ such that $wt(y) < H_q^{-1}\left(\tfrac{1}{2}-\varepsilon \right) \cdot 2k$ (one such codeword $y$ for each linear code). Here $wt(y)$ denotes the weight of codeword $y$, which is the number of non-zero positions of $y$.

Denote $S = \left \{ y \ : \ wt(y) < H_q^{ - 1} \left (\tfrac{1}{2} - \varepsilon \right ) \cdot 2k \right \}$

Then: \begin{align} |P| &\leqslant |S| \\ &\leqslant \text{Vol}_q \left (H_q^{-1} \left (\tfrac{1}{2}-\varepsilon \right ) \cdot 2k,2k \right ) && \text{Vol}_q(r,n) \text{ is the volume of Hamming ball of radius } r \text{ in } [q]^n \\ &\leqslant q^{H_q \left (H_q^{-1}\left (\frac{1}{2}-\varepsilon \right ) \right ) \cdot 2k} && \text{Vol}_q(pn,n) \leqslant q^{H_q(p )n} \\ &= q^{ \left (\frac{1}{2}-\varepsilon \right ) \cdot 2k} \\ &= q^{k(1-2\varepsilon)} \\ &< \varepsilon(q^k-1) && \text{ for } k \text{ large enough } \\ &= \varepsilon N \end{align}

So $|P| < \varepsilon N$, therefore the set of linear codes having the relative distance $\geqslant H_q^{-1} \left (\tfrac{1}{2}-\varepsilon \right ) \cdot 2k$ has at least $N - \varepsilon N = (1-\varepsilon)N$ elements.