{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2.10.13\n",
"\n",
"Discrete data: Table 2.2 gives the number of fatal accidents and deaths on scheduled airline flights per year over a ten-year period. \n",
"We use these data as a numerical example for fitting discrete data models. \n",
"\n",
"1. Assume that the numbers of fatal accidents in each year are independent with a Poisson(theta) distribution. Set a prior distribution for theta and determine the posterior distribution based on the data from 1976 through 1985. Under this model, give a 95% predictive interval for the number of fatal accidents in 1986. You can use the normal approximation to the gamma and Poisson or compute using simulation.\n",
"2. Assume that the numbers of fatal accidents in each year follow independent Poisson distributions with a constant rate and an exposure in each year proportional to the number of passenger miles flown. Set a prior distribution for theta and determine the posterior distribution based on the data for 1976–1985. (Estimate the number of passenger miles flown in each year by dividing the appropriate columns of Table 2.2 and ignoring round-off errors.) Give a 95% predictive interval for the number of fatal iaccidents in 1986 under the assumption that 8 × 10 11 passenger miles are flown that year.\n",
"3. Repeat (1) above, replacing ‘fatal accidents’ with ‘passenger deaths.’\n",
"4. Repeat (2) above, replacing ‘fatal accidents’ with ‘passenger deaths.’\n",
"5. In which of the cases above does the Poisson model seem more or less reasonable? Why? Discuss based on general principles,without specific reference to the numbers in Table 2.2. Incidentally, in 1986, there were 22 fatal accidents, 546 passenger deaths, and a death rate of 0.06 per 100 million miles flown. We return to this example in Exercises 3.12, 6.2, 6.3, and 8.14.\n",
"\n",
"|Year |Fatal accidents |Passenger deaths |Death rate\n",
"|---|---|---|---| \n",
"|1976 | 24 | 734 | 0.19 \n",
"|1977 |25 |516 |0.12 \n",
"|1978 |31 |754 |0.15 \n",
"|1979 |31 |877 |0.16 \n",
"|1980 |22 |814 |0.14 \n",
"|1981 |21 |362 |0.06 \n",
"|1982 |26 |764 |0.13 \n",
"|1983 |20 |809 |0.13 \n",
"|1984 |16 |223 |0.03 \n",
"|1985 |22 |1066 |0.15 \n",
"\n",
"+ Table 2.2 Worldwide airline fatalities, 1976–1985.\n",
"+ Death rate is passenger deaths per 100 million passenger miles.\n",
"+ Source: Statistical Abstract of the United States.\n",
"\n",
"Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B.. Bayesian Data Analysis, Third Edition (Chapman & Hall/CRC Texts in Statistical Science) (Page 60). CRC Press. Kindle Edition. \n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Deaths</th>\n",
" <th>Fatal</th>\n",
" <th>Rate</th>\n",
" <th>year</th>\n",
" <th>Miles</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>734</td>\n",
" <td>24</td>\n",
" <td>0.19</td>\n",
" <td>1976</td>\n",
" <td>3863.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>516</td>\n",
" <td>25</td>\n",
" <td>0.12</td>\n",
" <td>1977</td>\n",
" <td>4300.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>754</td>\n",
" <td>31</td>\n",
" <td>0.15</td>\n",
" <td>1978</td>\n",
" <td>5027.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>877</td>\n",
" <td>31</td>\n",
" <td>0.16</td>\n",
" <td>1979</td>\n",
" <td>5481.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>814</td>\n",
" <td>22</td>\n",
" <td>0.14</td>\n",
" <td>1980</td>\n",
" <td>5814.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>362</td>\n",
" <td>21</td>\n",
" <td>0.06</td>\n",
" <td>1981</td>\n",
" <td>6033.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>764</td>\n",
" <td>26</td>\n",
" <td>0.13</td>\n",
" <td>1982</td>\n",
" <td>5877.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>809</td>\n",
" <td>20</td>\n",
" <td>0.13</td>\n",
" <td>1983</td>\n",
" <td>6223.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>223</td>\n",
" <td>16</td>\n",
" <td>0.03</td>\n",
" <td>1984</td>\n",
" <td>7433.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1066</td>\n",
" <td>22</td>\n",
" <td>0.15</td>\n",
" <td>1985</td>\n",
" <td>7107.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Deaths Fatal Rate year Miles\n",
"0 734 24 0.19 1976 3863.0\n",
"1 516 25 0.12 1977 4300.0\n",
"2 754 31 0.15 1978 5027.0\n",
"3 877 31 0.16 1979 5481.0\n",
"4 814 22 0.14 1980 5814.0\n",
"5 362 21 0.06 1981 6033.0\n",
"6 764 26 0.13 1982 5877.0\n",
"7 809 20 0.13 1983 6223.0\n",
"8 223 16 0.03 1984 7433.0\n",
"9 1066 22 0.15 1985 7107.0"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scipy.stats import poisson\n",
"from sklearn.linear_model import LinearRegression\n",
"import pystan\n",
"airline_df=pd.DataFrame(dict({'year':[x for x in range(1976,1986)],'Fatal':[24,25,31,31,22,21,26,20,16,22],'Deaths':[734,516,754,877,814,362,764,809,223,1066],'Rate':[.19,.12,.15,.16,.14,.06,.13,.13,.03,.15]}))\n",
"airline_df.set_index('year')\n",
"airline_df['Miles']=np.round(airline_df['Deaths']/airline_df['Rate'],0)\n",
"airline_df"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_62649727c9442f9bc7cbfce75826d859 NOW.\n"
]
}
],
"source": [
"stan_code='''\n",
"data {\n",
" int deaths[10];\n",
"}\n",
"parameters {\n",
" real<lower=0> theta ; \n",
"}\n",
"model {\n",
"\n",
" // no prior here, what should we use?\n",
" deaths~poisson(theta);\n",
"}\n",
"\n",
"'''\n",
"sm_simple=pystan.StanModel(model_code=stan_code)\n"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"deaths=sm_simple.sampling(data=dict({'deaths':airline_df['Deaths']}))"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Inference for Stan model: anon_model_6ace3ad0d872dac6795ff2d2317d4760.\n",
"4 chains, each with iter=2000; warmup=1000; thin=1; \n",
"post-warmup draws per chain=1000, total post-warmup draws=4000.\n",
"\n",
" mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat\n",
"theta 691.94 0.21 8.16 676.54 686.33 691.83 697.4 708.53 1553 1.0\n",
"lp__ 3.8e4 0.02 0.69 3.8e4 3.8e4 3.8e4 3.8e4 3.8e4 1551 1.0\n",
"\n",
"Samples were drawn using NUTS at Sun Apr 15 18:46:11 2018.\n",
"For each parameter, n_eff is a crude measure of effective sample size,\n",
"and Rhat is the potential scale reduction factor on split chains (at \n",
"convergence, Rhat=1).\n"
]
}
],
"source": [
"print(answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the stan output reported above, the 95% interval for the poisson rate is (676.5,708.5)."
]
},
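{
"cell_type": "markdown",
"metadata": {},
"source": [
"A predictive interval for 1986 can be obtained by simulation. The sketch below assumes the flat prior that the simple Stan model uses implicitly, in which case the posterior is Gamma(sum(y) + 1, n) in the shape/rate parameterization; drawing theta from it and then y from Poisson(theta) yields posterior predictive draws. The names `rs`, `theta_draws` and `y_1986` are illustrative and not part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Passenger deaths 1976-1985, copied from Table 2.2\n",
"y = np.array([734, 516, 754, 877, 814, 362, 764, 809, 223, 1066])\n",
"\n",
"rs = np.random.RandomState(0)  # fixed seed (arbitrary) for reproducibility\n",
"\n",
"# Assumed flat prior on theta > 0, so theta | y ~ Gamma(shape=sum(y)+1, rate=n);\n",
"# numpy parameterizes the gamma by scale, which is 1/rate\n",
"theta_draws = rs.gamma(shape=y.sum() + 1, scale=1.0 / len(y), size=10000)\n",
"\n",
"# One Poisson draw per posterior draw gives the posterior predictive for 1986\n",
"y_1986 = rs.poisson(theta_draws)\n",
"\n",
"# Central 95% predictive interval for 1986 passenger deaths\n",
"print(np.percentile(y_1986, [2.5, 97.5]))"
]
},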
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_5173738091a79a17032e9cb1d2d99cd3 NOW.\n"
]
}
],
"source": [
"stan_code='''\n",
"data {\n",
" int deaths[10];\n",
" vector[10] miles;\n",
"}\n",
"parameters {\n",
" real<lower=0> theta ; \n",
"}\n",
"model {\n",
"\n",
" // no prior here, what should we use?\n",
" deaths~poisson(miles*theta);\n",
"}\n",
"\n",
"'''\n",
"sm_weights=pystan.StanModel(model_code=stan_code)\n"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jet08013/anaconda3/lib/python3.6/site-packages/pystan/misc.py:399: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" elif np.issubdtype(np.asarray(v).dtype, float):\n"
]
}
],
"source": [
"deaths=sm_weights.sampling(data=dict({'deaths':airline_df['Deaths'],'miles':airline_df['Miles']}))"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Inference for Stan model: anon_model_5173738091a79a17032e9cb1d2d99cd3.\n",
"4 chains, each with iter=2000; warmup=1000; thin=1; \n",
"post-warmup draws per chain=1000, total post-warmup draws=4000.\n",
"\n",
" mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat\n",
"theta 0.12 4.4e-5 1.5e-3 0.12 0.12 0.12 0.12 0.12 1144 1.01\n",
"lp__ 3.8e4 0.02 0.71 3.8e4 3.8e4 3.8e4 3.8e4 3.8e4 1640 1.0\n",
"\n",
"Samples were drawn using NUTS at Mon Apr 16 06:52:19 2018.\n",
"For each parameter, n_eff is a crude measure of effective sample size,\n",
"and Rhat is the potential scale reduction factor on split chains (at \n",
"convergence, Rhat=1).\n"
]
}
],
"source": [
"\n",
"print(deaths)"
]
},
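{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simulation also gives a predictive interval under the exposure model. This is a sketch that assumes a flat prior on theta, so that the posterior is Gamma(sum(y) + 1, sum(x)) in the shape/rate parameterization, with x the yearly exposure. Because exposure is measured in units of 100 million passenger miles, the $8 \\times 10^{11}$ passenger miles assumed for 1986 correspond to an exposure of 8000. The names `x`, `rs`, `theta_draws` and `y_1986` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Passenger deaths and death rates 1976-1985, copied from Table 2.2\n",
"y = np.array([734, 516, 754, 877, 814, 362, 764, 809, 223, 1066])\n",
"rate = np.array([.19, .12, .15, .16, .14, .06, .13, .13, .03, .15])\n",
"\n",
"# Exposure in units of 100 million passenger miles (deaths divided by death rate)\n",
"x = np.round(y / rate)\n",
"\n",
"rs = np.random.RandomState(1)  # fixed seed (arbitrary) for reproducibility\n",
"\n",
"# Assumed flat prior on theta > 0, so theta | y ~ Gamma(shape=sum(y)+1, rate=sum(x))\n",
"theta_draws = rs.gamma(shape=y.sum() + 1, scale=1.0 / x.sum(), size=10000)\n",
"\n",
"# 8e11 passenger miles = 8000 units of 100 million passenger miles\n",
"y_1986 = rs.poisson(8000 * theta_draws)\n",
"\n",
"# Central 95% predictive interval for 1986 passenger deaths\n",
"print(np.percentile(y_1986, [2.5, 97.5]))"
]
},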
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig,ax=plt.subplots(1)\n",
"ax.scatter(airline_df['Miles'],airline_df['Deaths'] )\n",
"ax.plot(np.linspace(4000,8000,10),(.12)*np.linspace(4000,8000,10))\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([804.53466689])"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr=LinearRegression(fit_intercept=True)\n",
"lr.fit(airline_df['Miles'].values.reshape(-1,1),airline_df['Deaths'].values.reshape(-1,1))\n",
"deaths_pred=lr.predict(np.linspace(4000,8000,10).reshape(-1,1))\n",
"ax.plot(np.linspace(4000,8000,10),deaths_pred)\n",
"plt.show()\n",
"lr.get_params()\n",
"lr.coef_\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}