From 177b4acde7bc89fe2d0faba6902d483d99c504e4 Mon Sep 17 00:00:00 2001
From: Jeremy Teitelbaum
Date: Thu, 19 Apr 2018 13:44:39 -0400
Subject: [PATCH] Exchangeability and DeFinetti's theorem

---
 ...hangeability and DeFinetti's Theorem.ipynb | 135 +++++++++++++++++-
 1 file changed, 132 insertions(+), 3 deletions(-)
 rename BDA 5.9.1-2-4.ipynb => BDA 5.9.1-2-4-5 -Exchangeability and DeFinetti's Theorem.ipynb (84%)

diff --git a/BDA 5.9.1-2-4.ipynb b/BDA 5.9.1-2-4-5 -Exchangeability and DeFinetti's Theorem.ipynb
similarity index 84%
rename from BDA 5.9.1-2-4.ipynb
rename to BDA 5.9.1-2-4-5 -Exchangeability and DeFinetti's Theorem.ipynb
index 67737be..484588f 100644
--- a/BDA 5.9.1-2-4.ipynb
+++ b/BDA 5.9.1-2-4-5 -Exchangeability and DeFinetti's Theorem.ipynb
@@ -109,9 +109,138 @@
     "So for example $P(x,y)=(1/2)P(x,N(1,1))P(y,N(-1,1))+(1/2)P(x,N(-1,1))P(y,N(1,1))$ and\n",
     "$P(x,x)=P(x,N(1,1))P(x,N(-1,1))$ so it's exchangeable.\n",
     "\n",
-    "However, in a mixture distribution there's no way to make sense of the requirement that the parameters are clustered into two groups. If we had a mixture then \n",
-    "$$P(x,y)=\int P((x,y)|\theta)p(\theta) d\theta=\int P(x|\theta)P(y|\theta)p(\theta)d\theta$$"
+    "After a small amount of cheating (by looking at some solutions by Gelman), the suggestion is to look at the covariance of $y_1$ and $y_2$. Informally, they should have negative covariance: if $y_1$ is large, it suggests that it came from $N(1,1)$; but then $y_2$ comes from $N(-1,1)$, so it should be small.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from scipy.stats import norm\n",
+    "# the two component distributions\n",
+    "y1=norm(loc=1,scale=1)\n",
+    "y2=norm(loc=-1,scale=1)\n",
+    "y1_sample=y1.rvs(500)\n",
+    "y2_sample=y2.rvs(500)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A sample of our situation is $(y_1,y_2)$ or $(y_2,y_1)$ with equal probability, so the mean of each variable is zero. In either ordering the two coordinates are independent with means $1$ and $-1$ (in some order), so $\mathrm{cov}(y_1,y_2)=E(y_1y_2)-E(y_1)E(y_2)=-1$. The next problem (5.9.5) shows that mixtures of iid variables have non-negative covariances."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Covariance= -1.0023058663082174\n"
+     ]
+    }
+   ],
+   "source": [
+    "# both means are zero, so the sample covariance is just the average product\n",
+    "cov=sum(y1_sample*y2_sample)/500\n",
+    "print(\"Covariance=\",cov)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the general case, the joint probability distribution can be written\n",
+    "$$\n",
+    "P(y_1,\ldots,y_{2J})=(\binom{2J}{J})^{-1}\sum_{{S\subset [2J]}\atop{|S|=J}} P_{S}(y_1,\ldots,y_{2J})\n",
+    "$$\n",
+    "where $[2J]=\{1,\ldots,2J\}$ and $$P_{S}(y_1,\ldots,y_{2J})=\prod_{i\in S} P(y_i,N(1,1))\prod_{j\not\in S} P(y_j,N(-1,1)).$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To understand the covariance of, say, $y_1$ and $y_2$, we need to know how often they are chosen from the same distribution and how often they are chosen from different ones. That raises the combinatorial question of how many of the partitions of $[2J]$ into two halves of size $J$ keep $y_1$ and $y_2$ together, and how many separate them. To have them together, we first pick $y_1$ and $y_2$, and then choose the $J-2$ remaining elements of their half from the other $2J-2$ elements; so there are $\binom{2J-2}{J-2}$ partitions that keep $y_1$ and $y_2$ together. To split them, we pick the $J-1$ companions of $y_1$ (say) from the $2J-2$ elements other than $y_1$ and $y_2$; so there are $\binom{2J-2}{J-1}$ partitions that split them.\n",
+    "\n",
+    "When computing the covariance, the cases where $y_1$ and $y_2$ are together contribute $+1$, and the cases where they are split contribute $-1$. This gives the following:\n",
+    "$$\n",
+    "\mathrm{cov}(y_1,y_2)=\frac{2(\binom{2J-2}{J-2}-\binom{2J-2}{J-1})}{\binom{2J}{J}}\n",
+    "$$\n",
+    "The factor of two in the numerator comes from the fact that the number of partitions is $1/2$ of $\binom{2J}{J}$.\n",
+    "\n",
+    "Some trial computations give the explicit formula $\mathrm{cov}(y_1,y_2)=-\frac{1}{2J-1}$, which goes to zero as $J\to\infty$. (Using $\binom{2J-2}{J-1}=\frac{J}{J-1}\binom{2J-2}{J-2}$, a short calculation confirms this for all $J$.)\n",
+    "\n",
+    "The next problem (5.9.5) shows that in a mixture, the correlations are non-negative, so this shows we don't have a mixture of iid variables."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 -1.0\n",
+      "2 -3.0\n",
+      "3 -5.0\n",
+      "4 -7.0\n",
+      "5 -9.0\n",
+      "6 -11.0\n",
+      "7 -13.0\n",
+      "8 -15.0\n",
+      "9 -17.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "# an illustration of the last point from the discussion above:\n",
+    "# for each J, print 1/cov, which should equal -(2J-1)\n",
+    "from scipy.special import binom\n",
+    "for i in range(1,10):\n",
+    "    print(i,(binom(2*i,i)/2/(binom(2*i-2,i-2)-binom(2*i-2,i-1))))\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the last point, I think the way to think of it is that $y_1$ is a combination of $\binom{2J-1}{J-1}$ copies of $N(1,1)$ -- corresponding to the partitions in which $y_1$ is in the first half -- and $\binom{2J-1}{J}$ copies of $N(-1,1)$ -- corresponding to the partitions in which $y_1$ is in the second half. (Note that these numbers are actually equal, by the symmetry $\binom{2J-1}{J-1}=\binom{2J-1}{J}$.) In the limit the correlation between different $y_i$'s drops to zero and so they become independent, and there's no contradiction to de Finetti's theorem, which only applies to infinitely exchangeable sequences."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 5.9.5\n",
+    "\n",
+    "Suppose that the distribution of $\theta=(\theta_1,\ldots,\theta_{J})$ can be written as a mixture of independent and identically distributed components:\n",
+    "$$\n",
+    "p(\theta)=\int \prod_{j=1}^{J} p(\theta_{j}|\phi)p(\phi)d\phi.\n",
+    "$$\n",
+    "Prove that the covariances $\mathrm{cov}(\theta_{i},\theta_{j})$ are all non-negative.\n",
+    "\n",
+    "Here we apply the law of total covariance:\n",
+    "$$\n",
+    "\mathrm{cov}(\theta_i,\theta_j)=E_{\phi}(\mathrm{cov}(\theta_i,\theta_j|\phi))+\mathrm{cov}_{\phi}(E(\theta_i|\phi),E(\theta_j|\phi)).\n",
+    "$$\n",
+    "The first term is zero, since $\theta_i$ and $\theta_j$ are independent conditional on $\phi$. For the second term, $E(\theta_i|\phi)=E(\theta_j|\phi)=\mu(\phi)$ since the $\theta_j$ are identically distributed given $\phi$; thus this term is $\mathrm{var}(\mu(\phi))\ge 0$.\n"
+   ]
+  },
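+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick numerical sanity check of 5.9.5 (a sketch of my own, not from the book: I assume for concreteness that $\phi\sim N(0,1)$ and that the $\theta_j$ are iid $N(\phi,1)$ given $\phi$), we can simulate draws from the mixture and check that the sample covariance of $\theta_1$ and $\theta_2$ comes out non-negative; here it should be close to $\mathrm{var}(\mu(\phi))=\mathrm{var}(\phi)=1$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from scipy.stats import norm\n",
+    "# assumed example: phi ~ N(0,1); given phi, theta_1 and theta_2 are iid N(phi,1)\n",
+    "phi=norm(loc=0,scale=1).rvs(100000)\n",
+    "theta1=phi+norm(loc=0,scale=1).rvs(100000)\n",
+    "theta2=phi+norm(loc=0,scale=1).rvs(100000)\n",
+    "# sample covariance; should be close to var(phi)=1, and in any case non-negative\n",
+    "cov=((theta1-theta1.mean())*(theta2-theta2.mean())).mean()\n",
+    "print(\"Covariance=\",cov)"
+   ]
+  },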
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
@@ -130,7 +259,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.6.5"
   }
 },
 "nbformat": 4,