Exchangeability and de Finetti's theorem
"So for example $P(x,y)=(1/2)P(x,N(1,1))P(y,N(-1,1))+(1/2)P(x,N(-1,1))P(y,N(1,1))$ and\n",
"$P(x,x)=P(x,N(1,1))P(x,N(-1,1))$ so it's exchangeable.\n",
"\n",
"However, in a mixture distribution there's no way to make sense of the requirement that the parameters are clustered into two groups. If we had a mixture then \n",
"$$P(x,y)=\\int P((x,y)|\\theta)p(\\theta) d\\theta=\\int P(x|\\theta)P(y|\\theta)p(\\theta)d\\theta$$"
"After a small amount of cheating (by looking at some solutions by Gelman) the suggestion is to look at the covariance of $y_1$ and $y_2$. Informally, they should have negative covariance because if $y_1$ is large, it suggests that it came from $N(1,1)$; but then $y_2$ comes from $N(-1,1)$ so it should be small.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import norm\n",
"y1=norm(loc=1,scale=1)\n",
"y2=norm(loc=-1,scale=1)\n",
"y1_sample=y_1.rvs(500)\n",
"y2_sample=y_2.rvs(500)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sample of our situation is $(y_1,y_2)$ or $(y_2,y_1)$ with equal probability. So the mean of each variable is zero. The covariance is $-1=E(y_1y_2)=E(y_1)E(y_2)$. The next problem (5.9.5) shows that mixtures of iid variables have positive covariances."
]
},
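{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch (using numpy and the samples drawn above), we can simulate the exchangeable pair directly: swap each $(y_1,y_2)$ with probability $1/2$. Both coordinates should then have mean near $0$, and the average product should be near $-1$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# simulate the exchangeable pair: with probability 1/2, swap (y1,y2)\n",
"import numpy as np\n",
"rng=np.random.default_rng()\n",
"swap=rng.random(500)<0.5\n",
"first=np.where(swap,y2_sample,y1_sample)\n",
"second=np.where(swap,y1_sample,y2_sample)\n",
"# both means should be near 0, the average product near -1\n",
"print(\"means:\",first.mean(),second.mean())\n",
"print(\"mean product:\",(first*second).mean())"
]
},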
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Covariance= -1.0023058663082174\n"
]
}
],
"source": [
"cov=sum(y1_sample*y2_sample)/500\n",
"print(\"Covariance=\",cov)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the general case, the joint probablity distribution can be written\n",
"$$\n",
"P(y_1,\\ldots,y_{2J})=(\\binom{2J}{J})^{-1}\\sum_{{S\\subset [2J]}\\atop{|S|=J}} P_{S}(y_1,\\ldots,y_{2J})\n",
"$$\n",
"where $$P_{S}(y_1,\\ldots,y_{2J})=\\prod_{i\\in S} P(y_i,N(1,1))\\prod_{j\\not\\in S} P(y_j,N(-1,1)).$$"
]
},
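{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick simulation from this general model (a sketch with $J=2$): pick a uniformly random half $S$, draw each $y_i$ from the corresponding normal, and look at the empirical covariance of $y_1$ and $y_2$. It should come out negative (about $-1/3$, as the computation below will explain)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# draw from the general model with J=2: choose a random half S of\n",
"# {0,1,2,3}, then y_i ~ N(1,1) if i is in S and y_i ~ N(-1,1) otherwise\n",
"import numpy as np\n",
"from itertools import combinations\n",
"rng=np.random.default_rng()\n",
"J=2\n",
"subsets=list(combinations(range(2*J),J))\n",
"draws=[]\n",
"for _ in range(20000):\n",
"    S=subsets[rng.integers(len(subsets))]\n",
"    means=np.array([1 if i in S else -1 for i in range(2*J)])\n",
"    draws.append(rng.normal(means,1))\n",
"draws=np.array(draws)\n",
"# empirical covariance of y_1 and y_2; should be near -1/3\n",
"print(np.cov(draws[:,0],draws[:,1])[0,1])"
]
},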
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To understand the covariance of, say $y_1$ and $y_2$, we need to know how often they are chosen from the same distribution and how often they are chosen from different ones. That raises the combinatorial question of how many of the partitions of $[2J]$ have $y_1$ and $y_2$ together, and how many of them separate $y_1$ and $y_2$. To have them together, we first pick $y_1$ and $y_2$, and then choose $J-2$ additional elements from the remaining $2J-2$. So there are $\\binom{2J-2}{J-2}$ subsets of size $J$ that contain both $y_1$ and $y_2$. To split them, we pick $J-1$ elements from the $2J-2$ elements other than $y_1$ and $y_2$ and combine those with $y_1$ (for example) so there are $\\binom{2J-2}{J-1}$ sets that split them. \n",
"\n",
"When computing the covariance, the cases where $y_1$ and $y_2$ are together contribute $+1$, and the cases where they are split contribute $-1$. This gives the following:\n",
"$$\n",
"\\mathrm{cov}(y_1,y_2)=\\frac{2(\\binom{2J-2}{J-2}-\\binom{2J-2}{J-1})}{\\binom{2J}{J}}\n",
"$$\n",
"The two in the numerator comes from the fact that the number of partitions is $1/2$ of $\\binom{2J}{J}$. \n",
"\n",
"Some trial computations gives the explicit formula that the covariance is $-\\frac{1}{(2J-1)}$ and this goes to zero as $J\\to\\infty$.\n",
"\n",
"The next problem (5.9.5) shows that in a mixture, the correlations are non-negative, so this shows we don't have a mixture of iid variables."
]
},
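{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a brute-force check of this formula, we can enumerate every subset $S$ directly: given $S$, $E(y_1y_2)$ is the product of the conditional means $\\mu_i=\\pm 1$, and the overall means are zero, so the covariance is just the average of $\\mu_1\\mu_2$ over all $\\binom{2J}{J}$ subsets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# average of mu_1*mu_2 over all subsets S, where mu_i = +1 if i is in S\n",
"# and -1 otherwise; this should agree with -1/(2J-1)\n",
"from itertools import combinations\n",
"for J in range(1,6):\n",
"    products=[(1 if 0 in S else -1)*(1 if 1 in S else -1)\n",
"              for S in combinations(range(2*J),J)]\n",
"    print(J, sum(products)/len(products), -1/(2*J-1))"
]
},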
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 -1.0\n",
"2 -3.0\n",
"3 -5.0\n",
"4 -7.0\n",
"5 -9.0\n",
"6 -11.0\n",
"7 -13.0\n",
"8 -15.0\n",
"9 -17.0\n"
]
}
],
"source": [
"# an illustration of the last point from the discussion above\n",
"from scipy.special import binom\n",
"for i in range(1,10):\n",
" print(i,(binom(2*i,i)/2/(binom(2*i-2,i-2)-binom(2*i-2,i-1))))\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the last point, I think the way to think of it is that $y_1$ is a combination of $\\binom{2J-1}{J-1}$ copies of $N(1,1)$ -- corresponding to the partitions in which $y_1$ is in the first half -- and $\\binom{2J-1}{J}$ -- corresponding to the partitions in which $y_1$ is in the second half. (Note that since $2J-1$ is odd, these numbers are actually equal). In the limit the correlation between different $y_i$'s drops to zero and so they become independent, and there's no contradiction to deFinetti's theorem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem 5.9.5\n",
"\n",
"Suppose that the distribution of $\\theta=(\\theta_1,\\ldots,\\theta_{J})$ can be written as a mixture of independent and identically distributed components:\n",
"$$\n",
"p(\\theta)=\\int \\prod_{j=1}^{J} p(\\theta_{j}|\\phi)p(\\phi)d\\phi.\n",
"$$\n",
"Prove that the covariances $\\mathrm{cov}(\\theta_{i},\\theta_{j})$ are all non-negative.\n",
"\n",
"Here we apply the formula:\n",
"$$\n",
"\\mathrm{cov}(y_1,y_2)=E_{\\phi}(cov(y_1,y_2|\\phi))+\\mathrm{cov}_{\\phi}(E(y_1|\\phi),E(y_2|\\phi))\n",
"$$\n",
"The first term is zero (since $y_1$ and $y_2$ are independent, conditional on $\\phi$), and the second term is positive since $E(y_1|\\phi)=E(y_2|\\phi)=\\mu(\\phi)$ since the $y_1$ are identically distributed given $\\phi$;\n",
"thus this term is $\\mathrm{var}(\\mu(\\phi))\\ge 0$.\n"
]
},
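{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a numerical sanity check of this argument (a sketch, assuming for concreteness $\\phi\\sim N(0,1)$ and $\\theta_j|\\phi\\sim N(\\phi,1)$, which is not part of the problem statement): the covariance should come out near $\\mathrm{var}(\\mu(\\phi))=\\mathrm{var}(\\phi)=1$, and in particular non-negative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# mixture of iid components: draw phi, then theta_1 and theta_2\n",
"# independently from N(phi,1); cov(theta_1,theta_2) should be var(phi)=1\n",
"import numpy as np\n",
"rng=np.random.default_rng()\n",
"phi=rng.normal(0,1,size=100000)\n",
"theta1=rng.normal(phi,1)\n",
"theta2=rng.normal(phi,1)\n",
"print(\"cov:\",np.cov(theta1,theta2)[0,1])"
]
},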
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
"version": "3.6.5"
}
},
"nbformat": 4,