diff --git a/lecture_17/lecture_16.ipynb b/lecture_17/lecture_16.ipynb
deleted file mode 100644
index 3159c42..0000000
--- a/lecture_17/lecture_16.ipynb
+++ /dev/null
@@ -1,2710 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 65,
- "metadata": {
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "setdefaults"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 66,
- "metadata": {
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "%plot --format svg"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "data:image/s3,"s3://crabby-images/42821/42821b1dacf533e20c6b45995204ea98bbdf5dc1" alt="question 1"\n",
- "\n",
- "data:image/s3,"s3://crabby-images/cae42/cae4297304cbb793ba6eff5c56ebf4d8f1226bda" alt="question 2"\n",
- "\n",
- "### Project Ideas so far\n",
- "\n",
- "- Nothing yet...probably something heat transfer related\n",
- "\n",
- "- Modeling Propulsion or Propagation of Sound Waves\n",
- "\n",
- "- Low Thrust Orbital Transfer\n",
- "\n",
- "- Tracking motion of a satellite entering orbit until impact\n",
- "\n",
- "- What ever you think is best.\n",
- "\n",
- "- You had heat transfer project as an option; that sounded cool\n",
- "\n",
- "- Heat transfer through a pipe\n",
- "\n",
- "- I would prefer to do something with beam/plate mechanics or vibrations than a heat transfer or thermo problem\n",
- "\n",
- "- Modeling Couette flow with a pressure gradient using a discretized form of the Navier-Stokes equation and comparing it to the analytical solution\n",
- "\n",
- "- Software to instruct a robotic arm to orient itself based on input from a gyroscope"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Questions from you:\n",
- "\n",
- "How was your spring break\n",
- "\n",
- "Are grades for hw 3 and 4 going to be posted?\n",
- "\n",
- "For class occasionally switching to doc cam and working through problems by hand might help get ideas across.\n",
- "\n",
- "Are you assigning those who do not have groups to groups?\n",
- "\n",
- "How do you graph a standard distribution in Matlab?\n",
- "\n",
- "What is the longest code you have written?\n",
- "\n",
- "how to approach probability for hw 5?\n",
- "??\n",
- "\n",
- "Will you be assigning groups to people who do not currently have one? \n",
- "\n",
- "what are some basic guidelines we should use to brainstorm project ideas?\n",
- "\n",
- "Are you a fan of bananas?\n",
- "\n",
- "Going through code isn't the most helpful, because you can easily lose interest. But I am not sure what else you can do.\n",
- "\n",
- "Has lecture 15 been posted yet? Looking in the repository I can't seem to find it."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Linear Least Squares Regression\n",
- "\n",
- "When approximating a set of data as a polynomial, there will always be error introduced (except in 2 cases). \n",
- "\n",
- "For a straight line, the actual data points, $y_{i}$ as a function of the independent variable, $x_{i}$, is:\n",
- "\n",
- "$y_{i}=a_{0}+a_{1}x_{i}+e_{i}$\n",
- "\n",
- "where $a_{0}$ and $a_{1}$ are the intercept and slope of the line and $e_{i}$ is the error between the approximate function and the recorded data point. \n",
- "\n",
- "We make the following assumptions in this analysis:\n",
- "\n",
- "1. Each x has a fixed value; it is not random and is known without error.\n",
- "\n",
- "2. The y values are independent random variables and all have the same variance.\n",
- "\n",
- "3. The y values for a given x must be normally distributed.\n",
- "\n",
- "The total error is \n",
- "\n",
- "$\\sum_{i=1}^{n}e_{i}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})$\n",
- "\n",
- "we don't care about the sign though. One approach has been demonstrated to provide a unique solution is minimizing the sum of squares error or\n",
- "\n",
- "$S_{r}=\\sum_{i=1}^{n}e_{i}^{2}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n",
- "\n",
- "Where, $S_{r}$ is the sum of squares error (SSE). \n",
- "\n",
- "$\\frac{\\partial S_{r}}{\\partial a_{0}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})$\n",
- "\n",
- "$\\frac{\\partial S_{r}}{\\partial a_{1}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})x_{i}$\n",
- "\n",
- "The minimum $S_{r}$ occurrs when the partial derivatives are 0. \n",
- "\n",
- "$\\sum y_{i}= \\sum a_{0}+\\sum a_{1}x_{i}$\n",
- "\n",
- "$\\sum x_{i}y_{i}= \\sum a_{0}x_{i}+\\sum a_{1}x_{i}^{2}$\n",
- "\n",
- "$\\left[\\begin{array}{c}\n",
- "\\sum y_{i}\\\\\n",
- "\\sum x_{i}y_{i}\\end{array}\\right]=\n",
- "\\left[\\begin{array}{cc}\n",
- "n & \\sum x_{i}\\\\\n",
- "\\sum x_{i} & \\sum x_{i}^{2}\\end{array}\\right]\n",
- "\\left[\\begin{array}{c}\n",
- "a_{0}\\\\\n",
- "a_{1}\\end{array}\\right]$\n",
- "\n",
- "\n",
- "$b=Ax$\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Example \n",
- "\n",
- "Find drag coefficient with best-fit line to experimental data\n",
- "\n",
- "|i | v (m/s) | F (N) |\n",
- "|---|---|---|\n",
- "|1 | 10 | 25 |\n",
- "|2 | 20 | 70 |\n",
- "|3 | 30 | 380|\n",
- "|4 | 40 | 550|\n",
- "|5 | 50 | 610|\n",
- "|6 | 60 | 1220|\n",
- "|7 | 70 | 830 |\n",
- "|8 |80 | 1450|"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 68,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "a =\n",
- "\n",
- " -234.286\n",
- " 19.470\n",
- "\n"
- ]
- },
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "drag_data=[...\n",
- "1 , 10 , 25 \n",
- "2 , 20 , 70 \n",
- "3 , 30 , 380\n",
- "4 , 40 , 550\n",
- "5 , 50 , 610\n",
- "6 , 60 , 1220\n",
- "7 , 70 , 830 \n",
- "8 ,80 , 1450];\n",
- "x=drag_data(:,2);\n",
- "y=drag_data(:,3);\n",
- "\n",
- "b=[sum(y);sum(x.*y)];\n",
- "A=[length(x),sum(x);\n",
- " sum(x), sum(x.^2)];\n",
- " \n",
- "a=A\\b\n",
- "\n",
- "plot(x,y,'o',x,a(1)+a(2)*x)\n",
- "legend('data','best-fit','Location','NorthWest')\n",
- "xlabel('Force (N)')\n",
- "ylabel('velocity (m/s)')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 72,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "plot(x,y-a(1)-40*x)\n",
- "legend('data','best-fit','Location','NorthWest')\n",
- "ylabel('residuals (N)')\n",
- "xlabel('velocity (m/s)')\n",
- "title('Model does not capture measurements')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "How do we know its a \"good\" fit? \n",
- "\n",
- "Can compare the sum of squares error to the total sum of squares of the dependent variable (here F). \n",
- "\n",
- "$S_{r}=\\sum(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n",
- "\n",
- "$S_{t}=\\sum(y_{i}-\\bar{y})^{2}$\n",
- "\n",
- "Then, we can calculate the *coefficient of determination*, $r^{2}$ or *correlation coefficient*, r. \n",
- "\n",
- "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}$\n",
- "\n",
- "This represents the relative improvement of assuming that y is a function of x (if the x-values are not random and the y-values are random)\n",
- "\n",
- "For further information regarding statistical work on regression, look at \n",
- "[NIST Statistics Handbook](http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 73,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "St = 1.8083e+06\n",
- "St = 1.8083e+06\n"
- ]
- }
- ],
- "source": [
- "Sr=sum((y-a(1)-a(2)*x).^2);\n",
- "St=std(y)^2*(length(y)-1)\n",
- "St=sum((y-mean(y)).^2)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 74,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "r2 = 0.88049\n",
- "r = 0.93834\n"
- ]
- }
- ],
- "source": [
- "r2=(St-Sr)/St\n",
- "r=sqrt(r2)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Limiting cases \n",
- "\n",
- "#### $r^{2}=0$ $S_{r}=S{t}$\n",
- "\n",
- "#### $r^{2}=1$ $S_{r}=0$"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 76,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "gen_examples\n",
- "plot(x1(1:50:end),y1(1:50:end),'s',x2,y2,'o')\n",
- "legend('Case 1','Case 2','Location','NorthWest')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 77,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "a1 =\n",
- "\n",
- " 0.0482496\n",
- " 0.0017062\n",
- "\n",
- "a2 =\n",
- "\n",
- " 0\n",
- " 1\n",
- "\n",
- "Sr1 = 0.83505\n",
- "St1 = 0.83529\n",
- "coefficient of determination in Case 1 is 0.000291\n",
- "Sr2 = 0\n",
- "St2 = 1.0185\n",
- "coefficient of determination in Case 2 is 1.000000\n"
- ]
- }
- ],
- "source": [
- "b1=[sum(y1);sum(x1.*y1)];\n",
- "A1=[length(x1),sum(x1);\n",
- " sum(x1), sum(x1.^2)];\n",
- " \n",
- "a1=A1\\b1\n",
- "\n",
- "b2=[sum(y2);sum(x2.*y2)];\n",
- "A2=[length(x2),sum(x2);\n",
- " sum(x2), sum(x2.^2)];\n",
- " \n",
- "a2=A2\\b2\n",
- "\n",
- "Sr1=sum((y1-a1(1)-a1(2)*x1).^2)\n",
- "St1=sum((y1-mean(y1)).^2)\n",
- "fprintf('coefficient of determination in Case 1 is %f\\n',1-Sr1/St1)\n",
- "\n",
- "Sr2=sum((y2-a2(1)-a2(2)*x2).^2)\n",
- "St2=sum((y2-mean(y2)).^2)\n",
- "\n",
- "fprintf('coefficient of determination in Case 2 is %f\\n',1-Sr2/St2)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "collapsed": true
- },
- "source": [
- "# General Linear Regression"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In general, you may want to fit other polynomials besides degree-1 (straight-lines)\n",
- "\n",
- "$y=a_{0}+a_{1}x+a_{2}x^{2}+\\cdots+a_{m}x^{m}+e$\n",
- "\n",
- "Now, the solution for $a_{0},~a_{1},...a_{m}$ is the minimization of m+1-dependent linear equations. \n",
- "\n",
- "Consider the following data:\n",
- "\n",
- "| x | y |\n",
- "|---|---|\n",
- "| 0.00 | 21.50 |\n",
- "| 2.00 | 20.84 |\n",
- "| 4.00 | 23.19 |\n",
- "| 6.00 | 22.69 |\n",
- "| 8.00 | 30.27 |\n",
- "| 10.00 | 40.11 |\n",
- "| 12.00 | 43.31 |\n",
- "| 14.00 | 54.79 |\n",
- "| 16.00 | 70.88 |\n",
- "| 18.00 | 89.48 |\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 78,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "load xy_data.csv\n",
- "x=xy_data(1,:)';\n",
- "y=xy_data(2,:)';\n",
- "plot(x,y,'o')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 46,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "xy_data =\n",
- "\n",
- " Columns 1 through 7:\n",
- "\n",
- " 0.00000 2.00000 4.00000 6.00000 8.00000 10.00000 12.00000\n",
- " 21.50114 20.84153 23.19201 22.69498 30.26687 40.11075 43.30543\n",
- "\n",
- " Columns 8 through 11:\n",
- "\n",
- " 14.00000 16.00000 18.00000 20.00000\n",
- " 54.78730 70.88443 89.48368 97.28135\n",
- "\n"
- ]
- }
- ],
- "source": [
- "xy_data"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The model can be rewritten as \n",
- "\n",
- "$y=\\left[Z\\right]a+e$\n",
- "\n",
- "where $a=\\left[\\begin{array}{c}\n",
- " a_{0}\\\\\n",
- " a_{1}\\\\\n",
- " a_{2}\\end{array}\\right]$\n",
- " \n",
- "$[Z]=\\left[\\begin{array} \n",
- "1 & x_{1} & x_{1}^{2} \\\\\n",
- "1 & x_{2} & x_{2}^{2} \\\\\n",
- "1 & x_{3} & x_{3}^{2} \\\\\n",
- "1 & x_{4} & x_{4}^{2} \\\\\n",
- "1 & x_{5} & x_{5}^{2} \\\\\n",
- "1 & x_{6} & x_{6}^{2} \\\\\n",
- "1 & x_{7} & x_{7}^{2} \\\\\n",
- "1 & x_{8} & x_{8}^{2} \\\\\n",
- "1 & x_{9} & x_{9}^{2} \\\\\n",
- "1 & x_{10} & x_{10}^{2} \\end{array}\\right]$\n",
- "\n",
- "The sum of squares residuals for this model is\n",
- "\n",
- "$S_{r}=\\sum_{i=1}^{n}\\left(y_{i}-\\sum_{j=0}^{m}a_{j}z_{ji}\\right)$\n",
- "\n",
- "Minimizing this function results in\n",
- "\n",
- "$y=[Z]a$\n",
- "\n",
- "->**A standard Linear Algebra Problem**\n",
- "\n",
- "*the vector a is unknown, and Z is calculated based upon the assumed function*\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 79,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Z =\n",
- "\n",
- " 1 0 0\n",
- " 1 2 4\n",
- " 1 4 16\n",
- " 1 6 36\n",
- " 1 8 64\n",
- " 1 10 100\n",
- " 1 12 144\n",
- " 1 14 196\n",
- " 1 16 256\n",
- " 1 18 324\n",
- " 1 20 400\n",
- "\n",
- "a =\n",
- "\n",
- " 21.40341\n",
- " -0.81538\n",
- " 0.23935\n",
- "\n"
- ]
- }
- ],
- "source": [
- "Z=[x.^0,x,x.^2]\n",
- "\n",
- "a=Z\\y"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 80,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "x_fcn=linspace(min(x),max(x));\n",
- "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### General Coefficient of Determination\n",
- "\n",
- "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}=1-\\frac{S_{r}}{S_t}$"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 49,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "St = 27.923\n",
- "Sr = 2.6366\n"
- ]
- }
- ],
- "source": [
- "St=std(y)\n",
- "Sr=std(y-a(1)-a(2)*x-a(3)*x.^2)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 81,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "the coefficient of determination for this fit is 0.880485\n",
- "the correlation coefficient this fit is 0.938342\n"
- ]
- }
- ],
- "source": [
- "r2=1-Sr/St;\n",
- "r=sqrt(r2);\n",
- "\n",
- "fprintf('the coefficient of determination for this fit is %f',r2)\n",
- "fprintf('the correlation coefficient this fit is %f',r)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Compare this to a straight line fit"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 83,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "St = 27.923\n",
- "Sr = 9.2520\n",
- "the coefficient of determination for this fit is 0.668655\n",
- "the correlation coefficient this fit is 0.817713\n"
- ]
- },
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "Z=[ones(size(x)) x];\n",
- "a_line=Z\\y;\n",
- "x_fcn=linspace(min(x),max(x));\n",
- "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n",
- "x_fcn,a_line(1)+a_line(2)*x_fcn)\n",
- "St=std(y)\n",
- "Sr=std(y-a_line(1)-a_line(2)*x)\n",
- "r2=1-Sr/St;\n",
- "r=sqrt(r2);\n",
- "\n",
- "fprintf('the coefficient of determination for this fit is %f',r2)\n",
- "fprintf('the correlation coefficient this fit is %f',r)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 85,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "plot(x,y-a_line(1)-a_line(2)*x)\n",
- "title('residuals of straight line')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 86,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "plot(x,y-a(1)-a(2)*x-a(3)*x.^2)\n",
- "title('residuals of parabola')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Warning \n",
- "**Coefficient of determination reduction does not always mean a better fit**\n",
- "\n",
- "Try the function, \n",
- "\n",
- "$y(x)=a_0+a_{1}x+a_{2}x^{2}+a_{4}x^{4}+a_{5}x^{5}+a_{5}x^{5}+a_{6}x^{6}+a_{7}x^{7}+a_{8}x^{8}$"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 90,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "a_overfit =\n",
- "\n",
- " 2.1487e+01\n",
- " -1.4264e+01\n",
- " 1.5240e+01\n",
- " -6.0483e+00\n",
- " 1.1887e+00\n",
- " -1.2651e-01\n",
- " 7.4379e-03\n",
- " -2.2702e-04\n",
- " 2.8063e-06\n",
- "\n"
- ]
- }
- ],
- "source": [
- "Z=[ones(size(x)) x x.^2 x.^3 x.^4 x.^5 x.^6 x.^7 x.^8];\n",
- "a_overfit=Z\\y"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 91,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n",
- "x_fcn,a_overfit(1)+a_overfit(2)*x_fcn+a_overfit(3)*x_fcn.^2+...\n",
- "a_overfit(4)*x_fcn.^3+a_overfit(5)*x_fcn.^4+...\n",
- "a_overfit(6)*x_fcn.^5+a_overfit(7)*x_fcn.^6+...\n",
- "a_overfit(8)*x_fcn.^7+a_overfit(9)*x_fcn.^8)\n",
- "legend('data','parabola','n=8-fit','Location','NorthWest')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 92,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "St = 27.923\n",
- "Sr = 0.77320\n",
- "the coefficient of determination for this fit is 0.972309\n",
- "the correlation coefficient this fit is 0.986057\n"
- ]
- }
- ],
- "source": [
- "St=std(y)\n",
- "Sr=std(y-polyval(a_overfit(end:-1:1),x))\n",
- "r2=1-Sr/St;\n",
- "r=sqrt(r2);\n",
- "\n",
- "fprintf('the coefficient of determination for this fit is %f',r2)\n",
- "fprintf('the correlation coefficient this fit is %f',r)\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Linear Regression is only limited by the ability to separate the parameters from the function to achieve\n",
- "\n",
- "$y=[Z]a$\n",
- "\n",
- "$Z$ can be any function of the independent variable(s)\n",
- "\n",
- "**Example**:\n",
- "We assume two functions are added together, sin(t) and sin(3t). What are the amplitudes?\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 93,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "load sin_data.csv\n",
- "t=sin_data(1,:)';\n",
- "y=sin_data(2,:)';\n",
- "plot(t,y)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 94,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "a =\n",
- "\n",
- " 0.99727\n",
- " 0.50251\n",
- "\n"
- ]
- }
- ],
- "source": [
- "Z=[sin(t) sin(3*t)];\n",
- "a=Z\\y"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 95,
- "metadata": {
- "collapsed": false
- },
- "outputs": [
- {
- "data": {
- "image/svg+xml": [
- ""
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "plot(t,y,'.',t,a(1)*sin(t)+a(2)*sin(3*t))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "collapsed": true
- },
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Octave",
- "language": "octave",
- "name": "octave"
- },
- "language_info": {
- "file_extension": ".m",
- "help_links": [
- {
- "text": "MetaKernel Magics",
- "url": "https://github.com/calysto/metakernel/blob/master/metakernel/magics/README.md"
- }
- ],
- "mimetype": "text/x-octave",
- "name": "octave",
- "version": "0.19.14"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/lecture_17/lecture_17.ipynb b/lecture_17/lecture_17.ipynb
index a9fec04..3159c42 100644
--- a/lecture_17/lecture_17.ipynb
+++ b/lecture_17/lecture_17.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 65,
"metadata": {
"collapsed": true
},
@@ -13,7 +13,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 66,
"metadata": {
"collapsed": true
},
@@ -26,20 +26,2656 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### Questions from last class\n",
+ "data:image/s3,"s3://crabby-images/42821/42821b1dacf533e20c6b45995204ea98bbdf5dc1" alt="question 1"\n",
"\n",
- "### Questions for me\n",
+ "data:image/s3,"s3://crabby-images/cae42/cae4297304cbb793ba6eff5c56ebf4d8f1226bda" alt="question 2"\n",
+ "\n",
+ "### Project Ideas so far\n",
+ "\n",
+ "- Nothing yet...probably something heat transfer related\n",
+ "\n",
+ "- Modeling Propulsion or Propagation of Sound Waves\n",
+ "\n",
+ "- Low Thrust Orbital Transfer\n",
+ "\n",
+ "- Tracking motion of a satellite entering orbit until impact\n",
+ "\n",
+ "- What ever you think is best.\n",
+ "\n",
+ "- You had heat transfer project as an option; that sounded cool\n",
+ "\n",
+ "- Heat transfer through a pipe\n",
+ "\n",
+ "- I would prefer to do something with beam/plate mechanics or vibrations than a heat transfer or thermo problem\n",
+ "\n",
+ "- Modeling Couette flow with a pressure gradient using a discretized form of the Navier-Stokes equation and comparing it to the analytical solution\n",
+ "\n",
+ "- Software to instruct a robotic arm to orient itself based on input from a gyroscope"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Questions from you:\n",
+ "\n",
+ "How was your spring break\n",
+ "\n",
+ "Are grades for hw 3 and 4 going to be posted?\n",
+ "\n",
+ "For class occasionally switching to doc cam and working through problems by hand might help get ideas across.\n",
+ "\n",
+ "Are you assigning those who do not have groups to groups?\n",
+ "\n",
+ "How do you graph a standard distribution in Matlab?\n",
+ "\n",
+ "What is the longest code you have written?\n",
+ "\n",
+ "how to approach probability for hw 5?\n",
+ "??\n",
+ "\n",
+ "Will you be assigning groups to people who do not currently have one? \n",
+ "\n",
+ "what are some basic guidelines we should use to brainstorm project ideas?\n",
+ "\n",
+ "Are you a fan of bananas?\n",
+ "\n",
+ "Going through code isn't the most helpful, because you can easily lose interest. But I am not sure what else you can do.\n",
+ "\n",
+ "Has lecture 15 been posted yet? Looking in the repository I can't seem to find it."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Linear Least Squares Regression\n",
+ "\n",
+ "When approximating a set of data as a polynomial, there will always be error introduced (except in 2 cases). \n",
+ "\n",
+ "For a straight line, the actual data points, $y_{i}$ as a function of the independent variable, $x_{i}$, is:\n",
+ "\n",
+ "$y_{i}=a_{0}+a_{1}x_{i}+e_{i}$\n",
+ "\n",
+ "where $a_{0}$ and $a_{1}$ are the intercept and slope of the line and $e_{i}$ is the error between the approximate function and the recorded data point. \n",
+ "\n",
+ "We make the following assumptions in this analysis:\n",
+ "\n",
+ "1. Each x has a fixed value; it is not random and is known without error.\n",
+ "\n",
+ "2. The y values are independent random variables and all have the same variance.\n",
+ "\n",
+ "3. The y values for a given x must be normally distributed.\n",
+ "\n",
+ "The total error is \n",
+ "\n",
+ "$\\sum_{i=1}^{n}e_{i}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})$\n",
+ "\n",
+ "we don't care about the sign though. One approach has been demonstrated to provide a unique solution is minimizing the sum of squares error or\n",
+ "\n",
+ "$S_{r}=\\sum_{i=1}^{n}e_{i}^{2}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n",
+ "\n",
+ "Where, $S_{r}$ is the sum of squares error (SSE). \n",
+ "\n",
+ "$\\frac{\\partial S_{r}}{\\partial a_{0}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})$\n",
+ "\n",
+ "$\\frac{\\partial S_{r}}{\\partial a_{1}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})x_{i}$\n",
+ "\n",
+ "The minimum $S_{r}$ occurrs when the partial derivatives are 0. \n",
+ "\n",
+ "$\\sum y_{i}= \\sum a_{0}+\\sum a_{1}x_{i}$\n",
+ "\n",
+ "$\\sum x_{i}y_{i}= \\sum a_{0}x_{i}+\\sum a_{1}x_{i}^{2}$\n",
+ "\n",
+ "$\\left[\\begin{array}{c}\n",
+ "\\sum y_{i}\\\\\n",
+ "\\sum x_{i}y_{i}\\end{array}\\right]=\n",
+ "\\left[\\begin{array}{cc}\n",
+ "n & \\sum x_{i}\\\\\n",
+ "\\sum x_{i} & \\sum x_{i}^{2}\\end{array}\\right]\n",
+ "\\left[\\begin{array}{c}\n",
+ "a_{0}\\\\\n",
+ "a_{1}\\end{array}\\right]$\n",
+ "\n",
+ "\n",
+ "$b=Ax$\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example \n",
+ "\n",
+ "Find drag coefficient with best-fit line to experimental data\n",
+ "\n",
+ "|i | v (m/s) | F (N) |\n",
+ "|---|---|---|\n",
+ "|1 | 10 | 25 |\n",
+ "|2 | 20 | 70 |\n",
+ "|3 | 30 | 380|\n",
+ "|4 | 40 | 550|\n",
+ "|5 | 50 | 610|\n",
+ "|6 | 60 | 1220|\n",
+ "|7 | 70 | 830 |\n",
+ "|8 |80 | 1450|"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a =\n",
+ "\n",
+ " -234.286\n",
+ " 19.470\n",
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "drag_data=[...\n",
+ "1 , 10 , 25 \n",
+ "2 , 20 , 70 \n",
+ "3 , 30 , 380\n",
+ "4 , 40 , 550\n",
+ "5 , 50 , 610\n",
+ "6 , 60 , 1220\n",
+ "7 , 70 , 830 \n",
+ "8 ,80 , 1450];\n",
+ "x=drag_data(:,2);\n",
+ "y=drag_data(:,3);\n",
+ "\n",
+ "b=[sum(y);sum(x.*y)];\n",
+ "A=[length(x),sum(x);\n",
+ " sum(x), sum(x.^2)];\n",
+ " \n",
+ "a=A\\b\n",
+ "\n",
+ "plot(x,y,'o',x,a(1)+a(2)*x)\n",
+ "legend('data','best-fit','Location','NorthWest')\n",
+ "xlabel('Force (N)')\n",
+ "ylabel('velocity (m/s)')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 72,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(x,y-a(1)-40*x)\n",
+ "legend('data','best-fit','Location','NorthWest')\n",
+ "ylabel('residuals (N)')\n",
+ "xlabel('velocity (m/s)')\n",
+ "title('Model does not capture measurements')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "How do we know its a \"good\" fit? \n",
+ "\n",
+ "Can compare the sum of squares error to the total sum of squares of the dependent variable (here F). \n",
+ "\n",
+ "$S_{r}=\\sum(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n",
+ "\n",
+ "$S_{t}=\\sum(y_{i}-\\bar{y})^{2}$\n",
+ "\n",
+ "Then, we can calculate the *coefficient of determination*, $r^{2}$ or *correlation coefficient*, r. \n",
+ "\n",
+ "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}$\n",
+ "\n",
+ "This represents the relative improvement of assuming that y is a function of x (if the x-values are not random and the y-values are random)\n",
+ "\n",
+ "For further information regarding statistical work on regression, look at \n",
+ "[NIST Statistics Handbook](http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 73,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "St = 1.8083e+06\n",
+ "St = 1.8083e+06\n"
+ ]
+ }
+ ],
+ "source": [
+ "Sr=sum((y-a(1)-a(2)*x).^2);\n",
+ "St=std(y)^2*(length(y)-1)\n",
+ "St=sum((y-mean(y)).^2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 74,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "r2 = 0.88049\n",
+ "r = 0.93834\n"
+ ]
+ }
+ ],
+ "source": [
+ "r2=(St-Sr)/St\n",
+ "r=sqrt(r2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Limiting cases \n",
+ "\n",
+ "#### $r^{2}=0$ $S_{r}=S{t}$\n",
+ "\n",
+ "#### $r^{2}=1$ $S_{r}=0$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "gen_examples\n",
+ "plot(x1(1:50:end),y1(1:50:end),'s',x2,y2,'o')\n",
+ "legend('Case 1','Case 2','Location','NorthWest')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a1 =\n",
+ "\n",
+ " 0.0482496\n",
+ " 0.0017062\n",
+ "\n",
+ "a2 =\n",
+ "\n",
+ " 0\n",
+ " 1\n",
+ "\n",
+ "Sr1 = 0.83505\n",
+ "St1 = 0.83529\n",
+ "coefficient of determination in Case 1 is 0.000291\n",
+ "Sr2 = 0\n",
+ "St2 = 1.0185\n",
+ "coefficient of determination in Case 2 is 1.000000\n"
+ ]
+ }
+ ],
+ "source": [
+ "b1=[sum(y1);sum(x1.*y1)];\n",
+ "A1=[length(x1),sum(x1);\n",
+ " sum(x1), sum(x1.^2)];\n",
+ " \n",
+ "a1=A1\\b1\n",
+ "\n",
+ "b2=[sum(y2);sum(x2.*y2)];\n",
+ "A2=[length(x2),sum(x2);\n",
+ " sum(x2), sum(x2.^2)];\n",
+ " \n",
+ "a2=A2\\b2\n",
+ "\n",
+ "Sr1=sum((y1-a1(1)-a1(2)*x1).^2)\n",
+ "St1=sum((y1-mean(y1)).^2)\n",
+ "fprintf('coefficient of determination in Case 1 is %f\\n',1-Sr1/St1)\n",
+ "\n",
+ "Sr2=sum((y2-a2(1)-a2(2)*x2).^2)\n",
+ "St2=sum((y2-mean(y2)).^2)\n",
+ "\n",
+ "fprintf('coefficient of determination in Case 2 is %f\\n',1-Sr2/St2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "# General Linear Regression"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In general, you may want to fit other polynomials besides degree-1 (straight-lines)\n",
+ "\n",
+ "$y=a_{0}+a_{1}x+a_{2}x^{2}+\\cdots+a_{m}x^{m}+e$\n",
+ "\n",
+ "Now, the solution for $a_{0},~a_{1},...a_{m}$ is the minimization of m+1-dependent linear equations. \n",
+ "\n",
+ "Consider the following data:\n",
+ "\n",
+ "| x | y |\n",
+ "|---|---|\n",
+ "| 0.00 | 21.50 |\n",
+ "| 2.00 | 20.84 |\n",
+ "| 4.00 | 23.19 |\n",
+ "| 6.00 | 22.69 |\n",
+ "| 8.00 | 30.27 |\n",
+ "| 10.00 | 40.11 |\n",
+ "| 12.00 | 43.31 |\n",
+ "| 14.00 | 54.79 |\n",
+ "| 16.00 | 70.88 |\n",
+ "| 18.00 | 89.48 |\n",
"\n"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "load xy_data.csv\n",
+ "x=xy_data(1,:)';\n",
+ "y=xy_data(2,:)';\n",
+ "plot(x,y,'o')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "xy_data =\n",
+ "\n",
+ " Columns 1 through 7:\n",
+ "\n",
+ " 0.00000 2.00000 4.00000 6.00000 8.00000 10.00000 12.00000\n",
+ " 21.50114 20.84153 23.19201 22.69498 30.26687 40.11075 43.30543\n",
+ "\n",
+ " Columns 8 through 11:\n",
+ "\n",
+ " 14.00000 16.00000 18.00000 20.00000\n",
+ " 54.78730 70.88443 89.48368 97.28135\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "xy_data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The model can be rewritten as \n",
+ "\n",
+ "$y=\\left[Z\\right]a+e$\n",
+ "\n",
+ "where $a=\\left[\\begin{array}{c}\n",
+ " a_{0}\\\\\n",
+ " a_{1}\\\\\n",
+ " a_{2}\\end{array}\\right]$\n",
+ " \n",
+ "$[Z]=\\left[\\begin{array} \n",
+ "1 & x_{1} & x_{1}^{2} \\\\\n",
+ "1 & x_{2} & x_{2}^{2} \\\\\n",
+ "1 & x_{3} & x_{3}^{2} \\\\\n",
+ "1 & x_{4} & x_{4}^{2} \\\\\n",
+ "1 & x_{5} & x_{5}^{2} \\\\\n",
+ "1 & x_{6} & x_{6}^{2} \\\\\n",
+ "1 & x_{7} & x_{7}^{2} \\\\\n",
+ "1 & x_{8} & x_{8}^{2} \\\\\n",
+ "1 & x_{9} & x_{9}^{2} \\\\\n",
+ "1 & x_{10} & x_{10}^{2} \\end{array}\\right]$\n",
+ "\n",
+ "The sum of squares residuals for this model is\n",
+ "\n",
+ "$S_{r}=\\sum_{i=1}^{n}\\left(y_{i}-\\sum_{j=0}^{m}a_{j}z_{ji}\\right)$\n",
+ "\n",
+ "Minimizing this function results in\n",
+ "\n",
+ "$y=[Z]a$\n",
+ "\n",
+ "->**A standard Linear Algebra Problem**\n",
+ "\n",
+ "*the vector a is unknown, and Z is calculated based upon the assumed function*\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 79,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Z =\n",
+ "\n",
+ " 1 0 0\n",
+ " 1 2 4\n",
+ " 1 4 16\n",
+ " 1 6 36\n",
+ " 1 8 64\n",
+ " 1 10 100\n",
+ " 1 12 144\n",
+ " 1 14 196\n",
+ " 1 16 256\n",
+ " 1 18 324\n",
+ " 1 20 400\n",
+ "\n",
+ "a =\n",
+ "\n",
+ " 21.40341\n",
+ " -0.81538\n",
+ " 0.23935\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "Z=[x.^0,x,x.^2]\n",
+ "\n",
+ "a=Z\\y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 80,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "x_fcn=linspace(min(x),max(x));\n",
+ "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### General Coefficient of Determination\n",
+ "\n",
+ "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}=1-\\frac{S_{r}}{S_t}$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "St = 27.923\n",
+ "Sr = 2.6366\n"
+ ]
+ }
+ ],
+ "source": [
+ "St=std(y)\n",
+ "Sr=std(y-a(1)-a(2)*x-a(3)*x.^2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "the coefficient of determination for this fit is 0.880485\n",
+ "the correlation coefficient this fit is 0.938342\n"
+ ]
+ }
+ ],
+ "source": [
+ "r2=1-Sr/St;\n",
+ "r=sqrt(r2);\n",
+ "\n",
+ "fprintf('the coefficient of determination for this fit is %f',r2)\n",
+ "fprintf('the correlation coefficient this fit is %f',r)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Compare this to a straight line fit"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "St = 27.923\n",
+ "Sr = 9.2520\n",
+ "the coefficient of determination for this fit is 0.668655\n",
+ "the correlation coefficient this fit is 0.817713\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "Z=[ones(size(x)) x];\n",
+ "a_line=Z\\y;\n",
+ "x_fcn=linspace(min(x),max(x));\n",
+ "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n",
+ "x_fcn,a_line(1)+a_line(2)*x_fcn)\n",
+ "St=std(y)\n",
+ "Sr=std(y-a_line(1)-a_line(2)*x)\n",
+ "r2=1-Sr/St;\n",
+ "r=sqrt(r2);\n",
+ "\n",
+ "fprintf('the coefficient of determination for this fit is %f',r2)\n",
+ "fprintf('the correlation coefficient this fit is %f',r)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 85,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(x,y-a_line(1)-a_line(2)*x)\n",
+ "title('residuals of straight line')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 86,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(x,y-a(1)-a(2)*x-a(3)*x.^2)\n",
+ "title('residuals of parabola')"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "# General Regression (Linear and nonlinear)\n",
+ "## Warning \n",
+ "**Coefficient of determination reduction does not always mean a better fit**\n",
+ "\n",
+ "Try the function, \n",
+ "\n",
+ "$y(x)=a_0+a_{1}x+a_{2}x^{2}+a_{4}x^{4}+a_{5}x^{5}+a_{5}x^{5}+a_{6}x^{6}+a_{7}x^{7}+a_{8}x^{8}$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 90,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a_overfit =\n",
+ "\n",
+ " 2.1487e+01\n",
+ " -1.4264e+01\n",
+ " 1.5240e+01\n",
+ " -6.0483e+00\n",
+ " 1.1887e+00\n",
+ " -1.2651e-01\n",
+ " 7.4379e-03\n",
+ " -2.2702e-04\n",
+ " 2.8063e-06\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "Z=[ones(size(x)) x x.^2 x.^3 x.^4 x.^5 x.^6 x.^7 x.^8];\n",
+ "a_overfit=Z\\y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 91,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n",
+ "x_fcn,a_overfit(1)+a_overfit(2)*x_fcn+a_overfit(3)*x_fcn.^2+...\n",
+ "a_overfit(4)*x_fcn.^3+a_overfit(5)*x_fcn.^4+...\n",
+ "a_overfit(6)*x_fcn.^5+a_overfit(7)*x_fcn.^6+...\n",
+ "a_overfit(8)*x_fcn.^7+a_overfit(9)*x_fcn.^8)\n",
+ "legend('data','parabola','n=8-fit','Location','NorthWest')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 92,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "St = 27.923\n",
+ "Sr = 0.77320\n",
+ "the coefficient of determination for this fit is 0.972309\n",
+ "the correlation coefficient this fit is 0.986057\n"
+ ]
+ }
+ ],
+ "source": [
+ "St=std(y)\n",
+ "Sr=std(y-polyval(a_overfit(end:-1:1),x))\n",
+ "r2=1-Sr/St;\n",
+ "r=sqrt(r2);\n",
+ "\n",
+ "fprintf('the coefficient of determination for this fit is %f',r2)\n",
+ "fprintf('the correlation coefficient this fit is %f',r)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Linear Regression is only limited by the ability to separate the parameters from the function to achieve\n",
+ "\n",
+ "$y=[Z]a$\n",
+ "\n",
+ "$Z$ can be any function of the independent variable(s)\n",
+ "\n",
+ "**Example**:\n",
+ "We assume two functions are added together, sin(t) and sin(3t). What are the amplitudes?\n",
"\n"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": 93,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "load sin_data.csv\n",
+ "t=sin_data(1,:)';\n",
+ "y=sin_data(2,:)';\n",
+ "plot(t,y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 94,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a =\n",
+ "\n",
+ " 0.99727\n",
+ " 0.50251\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "Z=[sin(t) sin(3*t)];\n",
+ "a=Z\\y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 95,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(t,y,'.',t,a(1)*sin(t)+a(2)*sin(3*t))"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,