diff --git a/lecture_17/lecture_16.ipynb b/lecture_17/lecture_16.ipynb deleted file mode 100644 index 3159c42..0000000 --- a/lecture_17/lecture_16.ipynb +++ /dev/null @@ -1,2710 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 65, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "setdefaults" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "%plot --format svg" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![question 1](q1.png)\n", - "\n", - "![question 2](q2.png)\n", - "\n", - "### Project Ideas so far\n", - "\n", - "- Nothing yet...probably something heat transfer related\n", - "\n", - "- Modeling Propulsion or Propagation of Sound Waves\n", - "\n", - "- Low Thrust Orbital Transfer\n", - "\n", - "- Tracking motion of a satellite entering orbit until impact\n", - "\n", - "- What ever you think is best.\n", - "\n", - "- You had heat transfer project as an option; that sounded cool\n", - "\n", - "- Heat transfer through a pipe\n", - "\n", - "- I would prefer to do something with beam/plate mechanics or vibrations than a heat transfer or thermo problem\n", - "\n", - "- Modeling Couette flow with a pressure gradient using a discretized form of the Navier-Stokes equation and comparing it to the analytical solution\n", - "\n", - "- Software to instruct a robotic arm to orient itself based on input from a gyroscope" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Questions from you:\n", - "\n", - "How was your spring break\n", - "\n", - "Are grades for hw 3 and 4 going to be posted?\n", - "\n", - "For class occasionally switching to doc cam and working through problems by hand might help get ideas across.\n", - "\n", - "Are you assigning those who do not have groups to groups?\n", - "\n", - "How do you graph a standard distribution in Matlab?\n", - "\n", - "What is the longest code you have written?\n", - "\n", - "how to approach probability for hw 5?\n", - "??\n", - "\n", - "Will you be assigning groups to people who do not currently have one? \n", - "\n", - "what are some basic guidelines we should use to brainstorm project ideas?\n", - "\n", - "Are you a fan of bananas?\n", - "\n", - "Going through code isn't the most helpful, because you can easily lose interest. But I am not sure what else you can do.\n", - "\n", - "Has lecture 15 been posted yet? Looking in the repository I can't seem to find it." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Linear Least Squares Regression\n", - "\n", - "When approximating a set of data as a polynomial, there will always be error introduced (except in 2 cases). \n", - "\n", - "For a straight line, the actual data points, $y_{i}$ as a function of the independent variable, $x_{i}$, is:\n", - "\n", - "$y_{i}=a_{0}+a_{1}x_{i}+e_{i}$\n", - "\n", - "where $a_{0}$ and $a_{1}$ are the intercept and slope of the line and $e_{i}$ is the error between the approximate function and the recorded data point. \n", - "\n", - "We make the following assumptions in this analysis:\n", - "\n", - "1. Each x has a fixed value; it is not random and is known without error.\n", - "\n", - "2. The y values are independent random variables and all have the same variance.\n", - "\n", - "3. The y values for a given x must be normally distributed.\n", - "\n", - "The total error is \n", - "\n", - "$\\sum_{i=1}^{n}e_{i}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})$\n", - "\n", - "we don't care about the sign though. One approach has been demonstrated to provide a unique solution is minimizing the sum of squares error or\n", - "\n", - "$S_{r}=\\sum_{i=1}^{n}e_{i}^{2}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n", - "\n", - "Where, $S_{r}$ is the sum of squares error (SSE). \n", - "\n", - "$\\frac{\\partial S_{r}}{\\partial a_{0}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})$\n", - "\n", - "$\\frac{\\partial S_{r}}{\\partial a_{1}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})x_{i}$\n", - "\n", - "The minimum $S_{r}$ occurrs when the partial derivatives are 0. \n", - "\n", - "$\\sum y_{i}= \\sum a_{0}+\\sum a_{1}x_{i}$\n", - "\n", - "$\\sum x_{i}y_{i}= \\sum a_{0}x_{i}+\\sum a_{1}x_{i}^{2}$\n", - "\n", - "$\\left[\\begin{array}{c}\n", - "\\sum y_{i}\\\\\n", - "\\sum x_{i}y_{i}\\end{array}\\right]=\n", - "\\left[\\begin{array}{cc}\n", - "n & \\sum x_{i}\\\\\n", - "\\sum x_{i} & \\sum x_{i}^{2}\\end{array}\\right]\n", - "\\left[\\begin{array}{c}\n", - "a_{0}\\\\\n", - "a_{1}\\end{array}\\right]$\n", - "\n", - "\n", - "$b=Ax$\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Example \n", - "\n", - "Find drag coefficient with best-fit line to experimental data\n", - "\n", - "|i | v (m/s) | F (N) |\n", - "|---|---|---|\n", - "|1 | 10 | 25 |\n", - "|2 | 20 | 70 |\n", - "|3 | 30 | 380|\n", - "|4 | 40 | 550|\n", - "|5 | 50 | 610|\n", - "|6 | 60 | 1220|\n", - "|7 | 70 | 830 |\n", - "|8 |80 | 1450|" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "a =\n", - "\n", - " -234.286\n", - " 19.470\n", - "\n" - ] - }, - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-500\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t500\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1000\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1500\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t2000\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t30\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t50\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t70\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\t\n", - "\t\tvelocity (m/s)\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\tForce (N)\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\tdata\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\tdata\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tbest-fit\n", - "\n", - "\t\n", - "\t\tbest-fit\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "drag_data=[...\n", - "1 , 10 , 25 \n", - "2 , 20 , 70 \n", - "3 , 30 , 380\n", - "4 , 40 , 550\n", - "5 , 50 , 610\n", - "6 , 60 , 1220\n", - "7 , 70 , 830 \n", - "8 ,80 , 1450];\n", - "x=drag_data(:,2);\n", - "y=drag_data(:,3);\n", - "\n", - "b=[sum(y);sum(x.*y)];\n", - "A=[length(x),sum(x);\n", - " sum(x), sum(x.^2)];\n", - " \n", - "a=A\\b\n", - "\n", - "plot(x,y,'o',x,a(1)+a(2)*x)\n", - "legend('data','best-fit','Location','NorthWest')\n", - "xlabel('Force (N)')\n", - "ylabel('velocity (m/s)')" - ] - }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-2000\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1500\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1000\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-500\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t30\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t50\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t70\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\t\n", - "\t\tresiduals (N)\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\tvelocity (m/s)\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\tModel does not capture measurements\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\tdata\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\tdata\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot(x,y-a(1)-40*x)\n", - "legend('data','best-fit','Location','NorthWest')\n", - "ylabel('residuals (N)')\n", - "xlabel('velocity (m/s)')\n", - "title('Model does not capture measurements')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "How do we know its a \"good\" fit? \n", - "\n", - "Can compare the sum of squares error to the total sum of squares of the dependent variable (here F). \n", - "\n", - "$S_{r}=\\sum(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n", - "\n", - "$S_{t}=\\sum(y_{i}-\\bar{y})^{2}$\n", - "\n", - "Then, we can calculate the *coefficient of determination*, $r^{2}$ or *correlation coefficient*, r. \n", - "\n", - "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}$\n", - "\n", - "This represents the relative improvement of assuming that y is a function of x (if the x-values are not random and the y-values are random)\n", - "\n", - "For further information regarding statistical work on regression, look at \n", - "[NIST Statistics Handbook](http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm)" - ] - }, - { - "cell_type": "code", - "execution_count": 73, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "St = 1.8083e+06\n", - "St = 1.8083e+06\n" - ] - } - ], - "source": [ - "Sr=sum((y-a(1)-a(2)*x).^2);\n", - "St=std(y)^2*(length(y)-1)\n", - "St=sum((y-mean(y)).^2)" - ] - }, - { - "cell_type": "code", - "execution_count": 74, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r2 = 0.88049\n", - "r = 0.93834\n" - ] - } - ], - "source": [ - "r2=(St-Sr)/St\n", - "r=sqrt(r2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Limiting cases \n", - "\n", - "#### $r^{2}=0$ $S_{r}=S{t}$\n", - "\n", - "#### $r^{2}=1$ $S_{r}=0$" - ] - }, - { - "cell_type": "code", - "execution_count": 76, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.6\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.8\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.6\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.8\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\tCase 1\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\tCase 1\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tCase 2\n", - "\n", - "\t\n", - "\t\tCase 2\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "gen_examples\n", - "plot(x1(1:50:end),y1(1:50:end),'s',x2,y2,'o')\n", - "legend('Case 1','Case 2','Location','NorthWest')" - ] - }, - { - "cell_type": "code", - "execution_count": 77, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "a1 =\n", - "\n", - " 0.0482496\n", - " 0.0017062\n", - "\n", - "a2 =\n", - "\n", - " 0\n", - " 1\n", - "\n", - "Sr1 = 0.83505\n", - "St1 = 0.83529\n", - "coefficient of determination in Case 1 is 0.000291\n", - "Sr2 = 0\n", - "St2 = 1.0185\n", - "coefficient of determination in Case 2 is 1.000000\n" - ] - } - ], - "source": [ - "b1=[sum(y1);sum(x1.*y1)];\n", - "A1=[length(x1),sum(x1);\n", - " sum(x1), sum(x1.^2)];\n", - " \n", - "a1=A1\\b1\n", - "\n", - "b2=[sum(y2);sum(x2.*y2)];\n", - "A2=[length(x2),sum(x2);\n", - " sum(x2), sum(x2.^2)];\n", - " \n", - "a2=A2\\b2\n", - "\n", - "Sr1=sum((y1-a1(1)-a1(2)*x1).^2)\n", - "St1=sum((y1-mean(y1)).^2)\n", - "fprintf('coefficient of determination in Case 1 is %f\\n',1-Sr1/St1)\n", - "\n", - "Sr2=sum((y2-a2(1)-a2(2)*x2).^2)\n", - "St2=sum((y2-mean(y2)).^2)\n", - "\n", - "fprintf('coefficient of determination in Case 2 is %f\\n',1-Sr2/St2)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": true - }, - "source": [ - "# General Linear Regression" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In general, you may want to fit other polynomials besides degree-1 (straight-lines)\n", - "\n", - "$y=a_{0}+a_{1}x+a_{2}x^{2}+\\cdots+a_{m}x^{m}+e$\n", - "\n", - "Now, the solution for $a_{0},~a_{1},...a_{m}$ is the minimization of m+1-dependent linear equations. \n", - "\n", - "Consider the following data:\n", - "\n", - "| x | y |\n", - "|---|---|\n", - "| 0.00 | 21.50 |\n", - "| 2.00 | 20.84 |\n", - "| 4.00 | 23.19 |\n", - "| 6.00 | 22.69 |\n", - "| 8.00 | 30.27 |\n", - "| 10.00 | 40.11 |\n", - "| 12.00 | 43.31 |\n", - "| 14.00 | 54.79 |\n", - "| 16.00 | 70.88 |\n", - "| 18.00 | 89.48 |\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 78, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t30\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t50\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t70\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t90\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t100\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t \n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "load xy_data.csv\n", - "x=xy_data(1,:)';\n", - "y=xy_data(2,:)';\n", - "plot(x,y,'o')" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "xy_data =\n", - "\n", - " Columns 1 through 7:\n", - "\n", - " 0.00000 2.00000 4.00000 6.00000 8.00000 10.00000 12.00000\n", - " 21.50114 20.84153 23.19201 22.69498 30.26687 40.11075 43.30543\n", - "\n", - " Columns 8 through 11:\n", - "\n", - " 14.00000 16.00000 18.00000 20.00000\n", - " 54.78730 70.88443 89.48368 97.28135\n", - "\n" - ] - } - ], - "source": [ - "xy_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The model can be rewritten as \n", - "\n", - "$y=\\left[Z\\right]a+e$\n", - "\n", - "where $a=\\left[\\begin{array}{c}\n", - " a_{0}\\\\\n", - " a_{1}\\\\\n", - " a_{2}\\end{array}\\right]$\n", - " \n", - "$[Z]=\\left[\\begin{array} \n", - "1 & x_{1} & x_{1}^{2} \\\\\n", - "1 & x_{2} & x_{2}^{2} \\\\\n", - "1 & x_{3} & x_{3}^{2} \\\\\n", - "1 & x_{4} & x_{4}^{2} \\\\\n", - "1 & x_{5} & x_{5}^{2} \\\\\n", - "1 & x_{6} & x_{6}^{2} \\\\\n", - "1 & x_{7} & x_{7}^{2} \\\\\n", - "1 & x_{8} & x_{8}^{2} \\\\\n", - "1 & x_{9} & x_{9}^{2} \\\\\n", - "1 & x_{10} & x_{10}^{2} \\end{array}\\right]$\n", - "\n", - "The sum of squares residuals for this model is\n", - "\n", - "$S_{r}=\\sum_{i=1}^{n}\\left(y_{i}-\\sum_{j=0}^{m}a_{j}z_{ji}\\right)$\n", - "\n", - "Minimizing this function results in\n", - "\n", - "$y=[Z]a$\n", - "\n", - "->**A standard Linear Algebra Problem**\n", - "\n", - "*the vector a is unknown, and Z is calculated based upon the assumed function*\n" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Z =\n", - "\n", - " 1 0 0\n", - " 1 2 4\n", - " 1 4 16\n", - " 1 6 36\n", - " 1 8 64\n", - " 1 10 100\n", - " 1 12 144\n", - " 1 14 196\n", - " 1 16 256\n", - " 1 18 324\n", - " 1 20 400\n", - "\n", - "a =\n", - "\n", - " 21.40341\n", - " -0.81538\n", - " 0.23935\n", - "\n" - ] - } - ], - "source": [ - "Z=[x.^0,x,x.^2]\n", - "\n", - "a=Z\\y" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t100\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t120\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t \n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tgnuplot_plot_2a\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "x_fcn=linspace(min(x),max(x));\n", - "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### General Coefficient of Determination\n", - "\n", - "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}=1-\\frac{S_{r}}{S_t}$" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "St = 27.923\n", - "Sr = 2.6366\n" - ] - } - ], - "source": [ - "St=std(y)\n", - "Sr=std(y-a(1)-a(2)*x-a(3)*x.^2)" - ] - }, - { - "cell_type": "code", - "execution_count": 81, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "the coefficient of determination for this fit is 0.880485\n", - "the correlation coefficient this fit is 0.938342\n" - ] - } - ], - "source": [ - "r2=1-Sr/St;\n", - "r=sqrt(r2);\n", - "\n", - "fprintf('the coefficient of determination for this fit is %f',r2)\n", - "fprintf('the correlation coefficient this fit is %f',r)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compare this to a straight line fit" - ] - }, - { - "cell_type": "code", - "execution_count": 83, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "St = 27.923\n", - "Sr = 9.2520\n", - "the coefficient of determination for this fit is 0.668655\n", - "the correlation coefficient this fit is 0.817713\n" - ] - }, - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t100\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t120\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t \n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tgnuplot_plot_2a\n", - "\n", - "\t\n", - "\t\n", - "\tgnuplot_plot_3a\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "Z=[ones(size(x)) x];\n", - "a_line=Z\\y;\n", - "x_fcn=linspace(min(x),max(x));\n", - "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n", - "x_fcn,a_line(1)+a_line(2)*x_fcn)\n", - "St=std(y)\n", - "Sr=std(y-a_line(1)-a_line(2)*x)\n", - "r2=1-Sr/St;\n", - "r=sqrt(r2);\n", - "\n", - "fprintf('the coefficient of determination for this fit is %f',r2)\n", - "fprintf('the correlation coefficient this fit is %f',r)" - ] - }, - { - "cell_type": "code", - "execution_count": 85, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\t\n", - "\t\tresiduals of straight line\n", - "\t\n", - "\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot(x,y-a_line(1)-a_line(2)*x)\n", - "title('residuals of straight line')" - ] - }, - { - "cell_type": "code", - "execution_count": 86, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t6\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\t\n", - "\t\tresiduals of parabola\n", - "\t\n", - "\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot(x,y-a(1)-a(2)*x-a(3)*x.^2)\n", - "title('residuals of parabola')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Warning \n", - "**Coefficient of determination reduction does not always mean a better fit**\n", - "\n", - "Try the function, \n", - "\n", - "$y(x)=a_0+a_{1}x+a_{2}x^{2}+a_{4}x^{4}+a_{5}x^{5}+a_{5}x^{5}+a_{6}x^{6}+a_{7}x^{7}+a_{8}x^{8}$" - ] - }, - { - "cell_type": "code", - "execution_count": 90, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "a_overfit =\n", - "\n", - " 2.1487e+01\n", - " -1.4264e+01\n", - " 1.5240e+01\n", - " -6.0483e+00\n", - " 1.1887e+00\n", - " -1.2651e-01\n", - " 7.4379e-03\n", - " -2.2702e-04\n", - " 2.8063e-06\n", - "\n" - ] - } - ], - "source": [ - "Z=[ones(size(x)) x x.^2 x.^3 x.^4 x.^5 x.^6 x.^7 x.^8];\n", - "a_overfit=Z\\y" - ] - }, - { - "cell_type": "code", - "execution_count": 91, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t40\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t60\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t80\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t100\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t120\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t15\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t20\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\tdata\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\tdata\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tparabola\n", - "\n", - "\t\n", - "\t\tparabola\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\tn=8-fit\n", - "\n", - "\t\n", - "\t\tn=8-fit\n", - "\t\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n", - "x_fcn,a_overfit(1)+a_overfit(2)*x_fcn+a_overfit(3)*x_fcn.^2+...\n", - "a_overfit(4)*x_fcn.^3+a_overfit(5)*x_fcn.^4+...\n", - "a_overfit(6)*x_fcn.^5+a_overfit(7)*x_fcn.^6+...\n", - "a_overfit(8)*x_fcn.^7+a_overfit(9)*x_fcn.^8)\n", - "legend('data','parabola','n=8-fit','Location','NorthWest')" - ] - }, - { - "cell_type": "code", - "execution_count": 92, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "St = 27.923\n", - "Sr = 0.77320\n", - "the coefficient of determination for this fit is 0.972309\n", - "the correlation coefficient this fit is 0.986057\n" - ] - } - ], - "source": [ - "St=std(y)\n", - "Sr=std(y-polyval(a_overfit(end:-1:1),x))\n", - "r2=1-Sr/St;\n", - "r=sqrt(r2);\n", - "\n", - "fprintf('the coefficient of determination for this fit is %f',r2)\n", - "fprintf('the correlation coefficient this fit is %f',r)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Linear Regression is only limited by the ability to separate the parameters from the function to achieve\n", - "\n", - "$y=[Z]a$\n", - "\n", - "$Z$ can be any function of the independent variable(s)\n", - "\n", - "**Example**:\n", - "We assume two functions are added together, sin(t) and sin(3t). What are the amplitudes?\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 93, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-0.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t6\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t8\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t12\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t14\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "load sin_data.csv\n", - "t=sin_data(1,:)';\n", - "y=sin_data(2,:)';\n", - "plot(t,y)" - ] - }, - { - "cell_type": "code", - "execution_count": 94, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "a =\n", - "\n", - " 0.99727\n", - " 0.50251\n", - "\n" - ] - } - ], - "source": [ - "Z=[sin(t) sin(3*t)];\n", - "a=Z\\y" - ] - }, - { - "cell_type": "code", - "execution_count": 95, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "image/svg+xml": [ - "\n", - "\n", - "Gnuplot\n", - "Produced by GNUPLOT 5.0 patchlevel 3 \n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\t\n", - "\t \n", - "\t \n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\t\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-1\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t-0.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t1.5\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t0\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t2\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t4\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t6\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t8\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t10\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t12\n", - "\t\n", - "\n", - "\n", - "\t\t\n", - "\t\t14\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\t\n", - "\n", - "\n", - "\tgnuplot_plot_1a\n", - "\n", - "\n", - "\n", - "\t \n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "\t\n", - "\tgnuplot_plot_2a\n", - "\n", - "\t\n", - "\t\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot(t,y,'.',t,a(1)*sin(t)+a(2)*sin(3*t))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Octave", - "language": "octave", - "name": "octave" - }, - "language_info": { - "file_extension": ".m", - "help_links": [ - { - "text": "MetaKernel Magics", - "url": "https://github.com/calysto/metakernel/blob/master/metakernel/magics/README.md" - } - ], - "mimetype": "text/x-octave", - "name": "octave", - "version": "0.19.14" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/lecture_17/lecture_17.ipynb b/lecture_17/lecture_17.ipynb index a9fec04..3159c42 100644 --- a/lecture_17/lecture_17.ipynb +++ b/lecture_17/lecture_17.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": 65, "metadata": { "collapsed": true }, @@ -13,7 +13,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 66, "metadata": { "collapsed": true }, @@ -26,20 +26,2656 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Questions from last class\n", + "![question 1](q1.png)\n", "\n", - "### Questions for me\n", + "![question 2](q2.png)\n", + "\n", + "### Project Ideas so far\n", + "\n", + "- Nothing yet...probably something heat transfer related\n", + "\n", + "- Modeling Propulsion or Propagation of Sound Waves\n", + "\n", + "- Low Thrust Orbital Transfer\n", + "\n", + "- Tracking motion of a satellite entering orbit until impact\n", + "\n", + "- What ever you think is best.\n", + "\n", + "- You had heat transfer project as an option; that sounded cool\n", + "\n", + "- Heat transfer through a pipe\n", + "\n", + "- I would prefer to do something with beam/plate mechanics or vibrations than a heat transfer or thermo problem\n", + "\n", + "- Modeling Couette flow with a pressure gradient using a discretized form of the Navier-Stokes equation and comparing it to the analytical solution\n", + "\n", + "- Software to instruct a robotic arm to orient itself based on input from a gyroscope" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Questions from you:\n", + "\n", + "How was your spring break\n", + "\n", + "Are grades for hw 3 and 4 going to be posted?\n", + "\n", + "For class occasionally switching to doc cam and working through problems by hand might help get ideas across.\n", + "\n", + "Are you assigning those who do not have groups to groups?\n", + "\n", + "How do you graph a standard distribution in Matlab?\n", + "\n", + "What is the longest code you have written?\n", + "\n", + "how to approach probability for hw 5?\n", + "??\n", + "\n", + "Will you be assigning groups to people who do not currently have one? \n", + "\n", + "what are some basic guidelines we should use to brainstorm project ideas?\n", + "\n", + "Are you a fan of bananas?\n", + "\n", + "Going through code isn't the most helpful, because you can easily lose interest. But I am not sure what else you can do.\n", + "\n", + "Has lecture 15 been posted yet? Looking in the repository I can't seem to find it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Linear Least Squares Regression\n", + "\n", + "When approximating a set of data as a polynomial, there will always be error introduced (except in 2 cases). \n", + "\n", + "For a straight line, the actual data points, $y_{i}$ as a function of the independent variable, $x_{i}$, is:\n", + "\n", + "$y_{i}=a_{0}+a_{1}x_{i}+e_{i}$\n", + "\n", + "where $a_{0}$ and $a_{1}$ are the intercept and slope of the line and $e_{i}$ is the error between the approximate function and the recorded data point. \n", + "\n", + "We make the following assumptions in this analysis:\n", + "\n", + "1. Each x has a fixed value; it is not random and is known without error.\n", + "\n", + "2. The y values are independent random variables and all have the same variance.\n", + "\n", + "3. The y values for a given x must be normally distributed.\n", + "\n", + "The total error is \n", + "\n", + "$\\sum_{i=1}^{n}e_{i}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})$\n", + "\n", + "we don't care about the sign though. One approach has been demonstrated to provide a unique solution is minimizing the sum of squares error or\n", + "\n", + "$S_{r}=\\sum_{i=1}^{n}e_{i}^{2}=\\sum_{i=1}^{n}(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n", + "\n", + "Where, $S_{r}$ is the sum of squares error (SSE). \n", + "\n", + "$\\frac{\\partial S_{r}}{\\partial a_{0}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})$\n", + "\n", + "$\\frac{\\partial S_{r}}{\\partial a_{1}}=-2\\sum(y_{i}-a_{0}-a_{1}x_{i})x_{i}$\n", + "\n", + "The minimum $S_{r}$ occurrs when the partial derivatives are 0. \n", + "\n", + "$\\sum y_{i}= \\sum a_{0}+\\sum a_{1}x_{i}$\n", + "\n", + "$\\sum x_{i}y_{i}= \\sum a_{0}x_{i}+\\sum a_{1}x_{i}^{2}$\n", + "\n", + "$\\left[\\begin{array}{c}\n", + "\\sum y_{i}\\\\\n", + "\\sum x_{i}y_{i}\\end{array}\\right]=\n", + "\\left[\\begin{array}{cc}\n", + "n & \\sum x_{i}\\\\\n", + "\\sum x_{i} & \\sum x_{i}^{2}\\end{array}\\right]\n", + "\\left[\\begin{array}{c}\n", + "a_{0}\\\\\n", + "a_{1}\\end{array}\\right]$\n", + "\n", + "\n", + "$b=Ax$\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example \n", + "\n", + "Find drag coefficient with best-fit line to experimental data\n", + "\n", + "|i | v (m/s) | F (N) |\n", + "|---|---|---|\n", + "|1 | 10 | 25 |\n", + "|2 | 20 | 70 |\n", + "|3 | 30 | 380|\n", + "|4 | 40 | 550|\n", + "|5 | 50 | 610|\n", + "|6 | 60 | 1220|\n", + "|7 | 70 | 830 |\n", + "|8 |80 | 1450|" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a =\n", + "\n", + " -234.286\n", + " 19.470\n", + "\n" + ] + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-500\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t500\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1000\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1500\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t2000\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t30\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t50\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t70\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\t\n", + "\t\tvelocity (m/s)\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\tForce (N)\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\tdata\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\tdata\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tbest-fit\n", + "\n", + "\t\n", + "\t\tbest-fit\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "drag_data=[...\n", + "1 , 10 , 25 \n", + "2 , 20 , 70 \n", + "3 , 30 , 380\n", + "4 , 40 , 550\n", + "5 , 50 , 610\n", + "6 , 60 , 1220\n", + "7 , 70 , 830 \n", + "8 ,80 , 1450];\n", + "x=drag_data(:,2);\n", + "y=drag_data(:,3);\n", + "\n", + "b=[sum(y);sum(x.*y)];\n", + "A=[length(x),sum(x);\n", + " sum(x), sum(x.^2)];\n", + " \n", + "a=A\\b\n", + "\n", + "plot(x,y,'o',x,a(1)+a(2)*x)\n", + "legend('data','best-fit','Location','NorthWest')\n", + "xlabel('Force (N)')\n", + "ylabel('velocity (m/s)')" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-2000\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1500\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1000\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-500\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t30\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t50\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t70\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\t\n", + "\t\tresiduals (N)\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\tvelocity (m/s)\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\tModel does not capture measurements\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\tdata\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\tdata\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot(x,y-a(1)-40*x)\n", + "legend('data','best-fit','Location','NorthWest')\n", + "ylabel('residuals (N)')\n", + "xlabel('velocity (m/s)')\n", + "title('Model does not capture measurements')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How do we know its a \"good\" fit? \n", + "\n", + "Can compare the sum of squares error to the total sum of squares of the dependent variable (here F). \n", + "\n", + "$S_{r}=\\sum(y_{i}-a_{0}-a_{1}x_{i})^{2}$\n", + "\n", + "$S_{t}=\\sum(y_{i}-\\bar{y})^{2}$\n", + "\n", + "Then, we can calculate the *coefficient of determination*, $r^{2}$ or *correlation coefficient*, r. \n", + "\n", + "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}$\n", + "\n", + "This represents the relative improvement of assuming that y is a function of x (if the x-values are not random and the y-values are random)\n", + "\n", + "For further information regarding statistical work on regression, look at \n", + "[NIST Statistics Handbook](http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm)" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "St = 1.8083e+06\n", + "St = 1.8083e+06\n" + ] + } + ], + "source": [ + "Sr=sum((y-a(1)-a(2)*x).^2);\n", + "St=std(y)^2*(length(y)-1)\n", + "St=sum((y-mean(y)).^2)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "r2 = 0.88049\n", + "r = 0.93834\n" + ] + } + ], + "source": [ + "r2=(St-Sr)/St\n", + "r=sqrt(r2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Limiting cases \n", + "\n", + "#### $r^{2}=0$ $S_{r}=S{t}$\n", + "\n", + "#### $r^{2}=1$ $S_{r}=0$" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.6\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.8\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.6\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.8\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\tCase 1\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\tCase 1\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tCase 2\n", + "\n", + "\t\n", + "\t\tCase 2\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "gen_examples\n", + "plot(x1(1:50:end),y1(1:50:end),'s',x2,y2,'o')\n", + "legend('Case 1','Case 2','Location','NorthWest')" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a1 =\n", + "\n", + " 0.0482496\n", + " 0.0017062\n", + "\n", + "a2 =\n", + "\n", + " 0\n", + " 1\n", + "\n", + "Sr1 = 0.83505\n", + "St1 = 0.83529\n", + "coefficient of determination in Case 1 is 0.000291\n", + "Sr2 = 0\n", + "St2 = 1.0185\n", + "coefficient of determination in Case 2 is 1.000000\n" + ] + } + ], + "source": [ + "b1=[sum(y1);sum(x1.*y1)];\n", + "A1=[length(x1),sum(x1);\n", + " sum(x1), sum(x1.^2)];\n", + " \n", + "a1=A1\\b1\n", + "\n", + "b2=[sum(y2);sum(x2.*y2)];\n", + "A2=[length(x2),sum(x2);\n", + " sum(x2), sum(x2.^2)];\n", + " \n", + "a2=A2\\b2\n", + "\n", + "Sr1=sum((y1-a1(1)-a1(2)*x1).^2)\n", + "St1=sum((y1-mean(y1)).^2)\n", + "fprintf('coefficient of determination in Case 1 is %f\\n',1-Sr1/St1)\n", + "\n", + "Sr2=sum((y2-a2(1)-a2(2)*x2).^2)\n", + "St2=sum((y2-mean(y2)).^2)\n", + "\n", + "fprintf('coefficient of determination in Case 2 is %f\\n',1-Sr2/St2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# General Linear Regression" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In general, you may want to fit other polynomials besides degree-1 (straight-lines)\n", + "\n", + "$y=a_{0}+a_{1}x+a_{2}x^{2}+\\cdots+a_{m}x^{m}+e$\n", + "\n", + "Now, the solution for $a_{0},~a_{1},...a_{m}$ is the minimization of m+1-dependent linear equations. \n", + "\n", + "Consider the following data:\n", + "\n", + "| x | y |\n", + "|---|---|\n", + "| 0.00 | 21.50 |\n", + "| 2.00 | 20.84 |\n", + "| 4.00 | 23.19 |\n", + "| 6.00 | 22.69 |\n", + "| 8.00 | 30.27 |\n", + "| 10.00 | 40.11 |\n", + "| 12.00 | 43.31 |\n", + "| 14.00 | 54.79 |\n", + "| 16.00 | 70.88 |\n", + "| 18.00 | 89.48 |\n", "\n" ] }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t30\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t50\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t70\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t90\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t100\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t \n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "load xy_data.csv\n", + "x=xy_data(1,:)';\n", + "y=xy_data(2,:)';\n", + "plot(x,y,'o')" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "xy_data =\n", + "\n", + " Columns 1 through 7:\n", + "\n", + " 0.00000 2.00000 4.00000 6.00000 8.00000 10.00000 12.00000\n", + " 21.50114 20.84153 23.19201 22.69498 30.26687 40.11075 43.30543\n", + "\n", + " Columns 8 through 11:\n", + "\n", + " 14.00000 16.00000 18.00000 20.00000\n", + " 54.78730 70.88443 89.48368 97.28135\n", + "\n" + ] + } + ], + "source": [ + "xy_data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model can be rewritten as \n", + "\n", + "$y=\\left[Z\\right]a+e$\n", + "\n", + "where $a=\\left[\\begin{array}{c}\n", + " a_{0}\\\\\n", + " a_{1}\\\\\n", + " a_{2}\\end{array}\\right]$\n", + " \n", + "$[Z]=\\left[\\begin{array} \n", + "1 & x_{1} & x_{1}^{2} \\\\\n", + "1 & x_{2} & x_{2}^{2} \\\\\n", + "1 & x_{3} & x_{3}^{2} \\\\\n", + "1 & x_{4} & x_{4}^{2} \\\\\n", + "1 & x_{5} & x_{5}^{2} \\\\\n", + "1 & x_{6} & x_{6}^{2} \\\\\n", + "1 & x_{7} & x_{7}^{2} \\\\\n", + "1 & x_{8} & x_{8}^{2} \\\\\n", + "1 & x_{9} & x_{9}^{2} \\\\\n", + "1 & x_{10} & x_{10}^{2} \\end{array}\\right]$\n", + "\n", + "The sum of squares residuals for this model is\n", + "\n", + "$S_{r}=\\sum_{i=1}^{n}\\left(y_{i}-\\sum_{j=0}^{m}a_{j}z_{ji}\\right)$\n", + "\n", + "Minimizing this function results in\n", + "\n", + "$y=[Z]a$\n", + "\n", + "->**A standard Linear Algebra Problem**\n", + "\n", + "*the vector a is unknown, and Z is calculated based upon the assumed function*\n" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Z =\n", + "\n", + " 1 0 0\n", + " 1 2 4\n", + " 1 4 16\n", + " 1 6 36\n", + " 1 8 64\n", + " 1 10 100\n", + " 1 12 144\n", + " 1 14 196\n", + " 1 16 256\n", + " 1 18 324\n", + " 1 20 400\n", + "\n", + "a =\n", + "\n", + " 21.40341\n", + " -0.81538\n", + " 0.23935\n", + "\n" + ] + } + ], + "source": [ + "Z=[x.^0,x,x.^2]\n", + "\n", + "a=Z\\y" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t100\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t120\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t \n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tgnuplot_plot_2a\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "x_fcn=linspace(min(x),max(x));\n", + "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### General Coefficient of Determination\n", + "\n", + "$r^{2}=\\frac{S_{t}-S_{r}}{S_{t}}=1-\\frac{S_{r}}{S_t}$" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "St = 27.923\n", + "Sr = 2.6366\n" + ] + } + ], + "source": [ + "St=std(y)\n", + "Sr=std(y-a(1)-a(2)*x-a(3)*x.^2)" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "the coefficient of determination for this fit is 0.880485\n", + "the correlation coefficient this fit is 0.938342\n" + ] + } + ], + "source": [ + "r2=1-Sr/St;\n", + "r=sqrt(r2);\n", + "\n", + "fprintf('the coefficient of determination for this fit is %f',r2)\n", + "fprintf('the correlation coefficient this fit is %f',r)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compare this to a straight line fit" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "St = 27.923\n", + "Sr = 9.2520\n", + "the coefficient of determination for this fit is 0.668655\n", + "the correlation coefficient this fit is 0.817713\n" + ] + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t100\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t120\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t \n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tgnuplot_plot_2a\n", + "\n", + "\t\n", + "\t\n", + "\tgnuplot_plot_3a\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "Z=[ones(size(x)) x];\n", + "a_line=Z\\y;\n", + "x_fcn=linspace(min(x),max(x));\n", + "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n", + "x_fcn,a_line(1)+a_line(2)*x_fcn)\n", + "St=std(y)\n", + "Sr=std(y-a_line(1)-a_line(2)*x)\n", + "r2=1-Sr/St;\n", + "r=sqrt(r2);\n", + "\n", + "fprintf('the coefficient of determination for this fit is %f',r2)\n", + "fprintf('the correlation coefficient this fit is %f',r)" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\t\n", + "\t\tresiduals of straight line\n", + "\t\n", + "\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot(x,y-a_line(1)-a_line(2)*x)\n", + "title('residuals of straight line')" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t6\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\t\n", + "\t\tresiduals of parabola\n", + "\t\n", + "\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot(x,y-a(1)-a(2)*x-a(3)*x.^2)\n", + "title('residuals of parabola')" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# General Regression (Linear and nonlinear)\n", + "## Warning \n", + "**Coefficient of determination reduction does not always mean a better fit**\n", + "\n", + "Try the function, \n", + "\n", + "$y(x)=a_0+a_{1}x+a_{2}x^{2}+a_{4}x^{4}+a_{5}x^{5}+a_{5}x^{5}+a_{6}x^{6}+a_{7}x^{7}+a_{8}x^{8}$" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a_overfit =\n", + "\n", + " 2.1487e+01\n", + " -1.4264e+01\n", + " 1.5240e+01\n", + " -6.0483e+00\n", + " 1.1887e+00\n", + " -1.2651e-01\n", + " 7.4379e-03\n", + " -2.2702e-04\n", + " 2.8063e-06\n", + "\n" + ] + } + ], + "source": [ + "Z=[ones(size(x)) x x.^2 x.^3 x.^4 x.^5 x.^6 x.^7 x.^8];\n", + "a_overfit=Z\\y" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t40\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t60\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t80\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t100\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t120\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t15\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t20\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\tdata\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\tdata\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tparabola\n", + "\n", + "\t\n", + "\t\tparabola\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\tn=8-fit\n", + "\n", + "\t\n", + "\t\tn=8-fit\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot(x,y,'o',x_fcn,a(1)+a(2)*x_fcn+a(3)*x_fcn.^2,...\n", + "x_fcn,a_overfit(1)+a_overfit(2)*x_fcn+a_overfit(3)*x_fcn.^2+...\n", + "a_overfit(4)*x_fcn.^3+a_overfit(5)*x_fcn.^4+...\n", + "a_overfit(6)*x_fcn.^5+a_overfit(7)*x_fcn.^6+...\n", + "a_overfit(8)*x_fcn.^7+a_overfit(9)*x_fcn.^8)\n", + "legend('data','parabola','n=8-fit','Location','NorthWest')" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "St = 27.923\n", + "Sr = 0.77320\n", + "the coefficient of determination for this fit is 0.972309\n", + "the correlation coefficient this fit is 0.986057\n" + ] + } + ], + "source": [ + "St=std(y)\n", + "Sr=std(y-polyval(a_overfit(end:-1:1),x))\n", + "r2=1-Sr/St;\n", + "r=sqrt(r2);\n", + "\n", + "fprintf('the coefficient of determination for this fit is %f',r2)\n", + "fprintf('the correlation coefficient this fit is %f',r)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Linear Regression is only limited by the ability to separate the parameters from the function to achieve\n", + "\n", + "$y=[Z]a$\n", + "\n", + "$Z$ can be any function of the independent variable(s)\n", + "\n", + "**Example**:\n", + "We assume two functions are added together, sin(t) and sin(3t). What are the amplitudes?\n", "\n" ] }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-0.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t6\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t8\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t12\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t14\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "load sin_data.csv\n", + "t=sin_data(1,:)';\n", + "y=sin_data(2,:)';\n", + "plot(t,y)" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a =\n", + "\n", + " 0.99727\n", + " 0.50251\n", + "\n" + ] + } + ], + "source": [ + "Z=[sin(t) sin(3*t)];\n", + "a=Z\\y" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "Gnuplot\n", + "Produced by GNUPLOT 5.0 patchlevel 3 \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\t\n", + "\t \n", + "\t \n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-1\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t-0.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t1.5\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t0\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t2\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t4\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t6\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t8\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t10\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t12\n", + "\t\n", + "\n", + "\n", + "\t\t\n", + "\t\t14\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\tgnuplot_plot_1a\n", + "\n", + "\n", + "\n", + "\t \n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\t\n", + "\n", + "\t\n", + "\tgnuplot_plot_2a\n", + "\n", + "\t\n", + "\t\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot(t,y,'.',t,a(1)*sin(t)+a(2)*sin(3*t))" + ] + }, { "cell_type": "code", "execution_count": null,