\chapter{Validity and Reliability}
In the first section of this chapter we address the components of validity and reliability that are appropriate, according to Denzin and Lincoln\cite{norman2005sage}, for research in the social constructivist paradigm.
In the second section we highlight those portions of our methodology which were directed
towards those goals.
In the third section we describe these efforts as they took place in support of some published papers and a manuscript in preparation.
Other sources describing how decisions, such as categories, are influenced include Tversky and Kahneman\cite{tversky1981framing}.
\section{Introduction}
Several researchers have written advice about how to pursue validity and reliability.
Correspondingly, several requirements exist, at differing degrees of formality, for a qualitative research study to be valid.
Denzin and Lincoln are widely cited; we have adopted their advice.
We also follow the recommendations of Merriam\cite{merriam2009qualitative}.
%software programs that were used, to manage, organize data\\
This is intended to differ from quantitative research, in which the perspectives of the majority are thought to predominate, discounting viewpoints contributed especially by minorities.
Following the advice of Merriam\cite{Merriam2009}, in this section we report on
how (using the methods described in Chapter 3) we developed a degree of confidence in our findings, through
the multiplicity of practices and perspectives we used,
including
the software programs we used to manage and organize data, the parallel pipelined nature of collecting and analyzing data as the research proceeded, and our style of inductive and comparative analysis.
%analyze as we go\\
%inductive and comparative\\
%precisely how the analysis was done\\
%thorough explanation of any strategies, such as discourse analysis
\section{Validity and Reliability Goals in the Social Constructivist Paradigm}
Denzin and Lincoln \cite{norman2005sage} state that, for a constructivist paradigm, as our social constructivism is, the criteria are trustworthiness, credibility, transferability and confirmability.
Merriam\cite{merriam1995can} advises beyond these that a statement of researcher bias assists the reader's assessment of whether an interpretation is transferable to the reader's context of interest.
% Denzin and Lincoln \cite{norman2005sage} say, quoting Flick\cite{Flick 2002, p. 227} ``Triangulation is not a tool or a strategy of validation, but an alternative to validation.'' From Flick\cite{Flick 2002, p. 229} ``The combination of multiple methodological practices, empirical materials, perspectives, and observers in a single study is best understood, then, as a strategy that adds rigor, breadth, complexity, richness and depth to any inquiry.''
In this chapter we describe the components of our
strategy to make our work valid, reliable, trustworthy, credible, transferable and confirmable.
\subsection{Trustworthiness}
In their article ``But is it Rigorous: Trustworthiness and Authenticity in Naturalistic Evaluation'', Lincoln and Guba\cite[p. 18]{lincoln1986but} state \begin{quote}
The axiom concerned with the nature of ``truth'' statements demands that inquirers abandon the assumption that enduring, context-free truth statements -- generalizations -- can and should be sought. Rather, it asserts that all human behavior is time- and context-bound; this boundedness suggests that inquiry is incapable of producing nomothetic knowledge but instead only idiographic ``working hypotheses'' that relate to a given and specific context.
\end{quote}
They observe that it is the naturalistic inquirer's obligation to provide a ``thick description'' of the sending context. A description is ``thick'' enough when it enables a comparison between the context in which the data were found and a receiving context in which the interpretation might be applied.
The criteria making up trustworthiness are
credibility, transferability, dependability and confirmability\cite[p. 18]{lincoln1986but}.
\subsection{Credibility}
Some criteria for judging the credibility of interpretations
include\cite{schwandt2007judging} persistent observation and prolonged engagement, triangulation (as can be obtained by asking others for their views) and searching for evidence that contradicts the interpretation.
To obtain inclusiveness, we sampled the population so that minorities were represented at more than their proportional share (see \ref{demog}). We used theoretical sampling to apparent saturation (see \ref{satur}). We used memos (see \ref{memo}) and constant comparison (see \ref{constcom}), and we used diverse sources, such as interviews and homework (see \ref{divsrcs}).
We performed member checking (see \ref{mchk}). We discussed preliminary findings with colleagues (peer examination); they found our results believable. Our findings were compatible with those of peer-reviewed literature in math and computer science (see Chapter 7 on Related Work).
We describe our population and context so that readers may judge transferability to their contexts.
\subsection{Transferability}
Lincoln and Guba \cite[p. 125]{lincoln1985naturalistic} state that a ``thick description of the sending context'' supports the ability of a reader to decide to what extent any given study may be applicable to a receiving context.
This description is found in section \ref{pop}.
Our population, sources and analysis methods have been described in Chapter 3, to aid readers in judging the extent to which the work may be generalizable.
\subsection{Confirmability}
Krefting\cite[p. 217]{krefting1991rigor} lists means of approaching confirmability. These include keeping an audit trail, using triangulation and reflexivity.
For audit trail, our methods are described to an extent that permits a replication study.
Our triangulation is described in section \ref{triang}.
Krefting's\cite[p. 218]{krefting1991rigor} reflexivity seems to match Merriam's\cite{merriam2009qualitative} statement of researcher bias.
We provide this in section \ref{bias}.
Our interview protocol is available.
Homework assignments included proofs of concepts treated in popular
textbooks in Discrete Mathematics, such as those by Epp\cite{epp2011discrete} and Hammack\cite{hammack2013book}.
Some confirmation has already been obtained, by consistency with related work.
\section{Approaches}
%from chapter 3\\
In this study we applied triangulation in several ways. \label{triang}
We interviewed faculty teaching the courses involving proofs. We interviewed TAs assisting in the courses involving proofs. The students in these courses are from our same population. To get an idea of the background preparation of our students, we substitute-taught geometry and algebra II classes in a high school. The high school population was quite similar to our university population, but differed by consisting almost entirely of domestic students, studying in their first language, and by having a larger percentage of women students, and of declared transgender students. Though the community served by this high school is diverse over socio-economic status, this component of diversity is probably greater in our university population.
Consistency with the work of other researchers is a check on the validity of an analysis.
In this study we compared our results with those achieved by some other researchers in computer science education and also by some researchers in the mathematics education community.
Checking possible interpretations is a technique that may aid in increasing confidence in validity.
We prepared a list of questions, addressed by several faculty and several students, that began an examination of the role of specific representation styles (mathematical notation, figures and pseudocode) in proof-related problem statements (see appendix \ref{qlist}). Results were reported in the combined data (see section \ref{combined}).
We used member checking of the summary report to contribute to validity.
\subsection{Purposefully Seeking Diversity}
We sought diversity in participants (students, TAs, instructors), in sources (interviews, homework, tests), and over time. We sought diversity over time in two ways: by talking with new students, longer term students, employed former students, and students who left the major, and by spending several years in the investigation.
Participants included persons identifying as male and persons identifying as female.
Participants included persons who were domestic students, and also international students.
Participants included students who remained in the major and some who left.
Participants included students who had been enrolled in school since childhood, and those who had taken time off.
Participants included current students, and employed former students.
Participants included students who were enrolled as students of computing and not mathematics, and students who were also enrolled in mathematics.
Participants included undergraduate and graduate students, and also professors.
The teaching assistants and professors were used as diverse observers.
\subsection{Diversity in Participants}
Having seen students in minority groups struggle with proof, possibly for reasons related to cultural sensitivities, we care about the welfare of women, Latino/as and persons of color in computer science.
We included participants from each of these groups.
Noting that some of our Latino fellow students were declining offers for help, we inquired among friends and learned that consultation is a more culturally sensitive term.
In retrospect, whether or not a student was a mathematics enthusiast was the only noticeable factor that differentiated the answers of our participants.
The literature of mathematics education includes work on students' learning
about proof. Our work with computer science students has benefited from having some participants who are dual majors of math and computer science.
This offered us hints about
similarities and differences between students having more or less math background.
The significance of definitions and the necessity and utility of proof
%the role
%played by interest in forming procedures and functions, the difference between
%functional and procedural programming have differed in these three cohorts,
%in so far as we have been able to examine. We did not explore the interest in
%developing procedures/functions or procedural or functional programming in
%mathematics majors who were not also computer science majors.
appeared different in these groups.
\subsubsection{Theoretical Sampling}
Theoretical sampling\cite{glaser1970theoretical} is a method of proceeding in qualitative investigation.
By performing some data collection and beginning analysis as soon as data are available, one obtains the opportunity
to allow earlier data to guide later collection of data.
Institutional Review Board (IRB) procedures must be followed.
Therefore, either the protocol is sufficiently broad
to enable variation in the collection, or multiple
protocols must be approved, to cover the range of questions that the investigation develops.
\subsubsection{Saturation}
Saturation is a state of affairs that arises when data collection and analysis proceed together, in which the incorporation of new data no longer yields new ideas.
By collecting and analyzing data to saturation, we hope to avoid missing any conceptualizations.
With both the interviews and the documents, towards the end of collecting data the amount of material that sounded different dwindled. As analysis continued, the number of newly needed codes fell, and the creation of new categories ceased.
%\subsubsection{Purposefull Seeking Diversity}
\subsection{Diversity in Sources}
We combined interview data from students, TAs and faculty, and homework. We also included anonymous, aggregate data from classes and consultation sessions.
Interviews of students formed the core data.
We supplemented these with interviews of faculty and teaching assistants.
We compared these reports with evidence from homework, and written discussion
by students.
With gratitude to the participants, we
were able, as reported in section \ref{divsrcs}, to interview students from many differing cultural, genetic, and international heritages, and economic backgrounds.
\subsubsection{Interviews}
Part of attaining validity is bracketing away researcher bias.
If the research were more reflective of researcher bias
than of data, surprise would rarely occur in the researcher.
Vignettes of validity include surprise associated with an outcome.
The experience of surprise contradicts the concern that researcher bias has merely obscured what information the data might contain.
For example, an earlier interpretation was that students were experiencing difficulty as they tried to generalize from an argument bound to concrete elements to an argument with free variables, or to an argument bound to more abstract ideas. This was replaced with a more surprising insight: some students are not engaging with the nature of argumentation as a means of convincing.
It was a surprising idea that these students are attempting to verify the correctness of a conclusion, and with concrete entities, they have their own wellsprings of confidence about the correctness of the conclusion and disengage from the argumentation. However, this does not give them a bridge to understanding an argument with free variables or variables bound to more abstract concepts.
\subsubsection{Documents}
The use of documents is thought to increase the multiplicity of our sources, thereby being a source of triangulation according to Denzin\cite{denzin1973research}.
One reason why homework may differ from interviews is the students' objectives.
What students write on assessments, such as homework, is taken by us to be the students' best efforts to earn credit, and therefore a reflection of what they hope is true.
%This can be seen to be different in nature from what students describe in interviews.
%For example, s
Students in interviews have been assured that revealing what they do not know will not hurt them, and students have benefited from revealing what they do not know, because they have been given explanations in a tutoring situation. For tests and homework it is probably the case that revealing what they do not know is thought to be counterproductive.
\subsubsection{Other Sources}
Crosschecking among sources, as described in section \ref{xchk},
is thought to be, like constant comparison, a source of
validity.
We crosschecked our own observations with those from faculty members having these students in courses in which proof was relevant.
We crosschecked with faculty of different institutions.
In both cases we found agreement and inspiration, but no disagreement.
We checked the published literature, to see whether findings were conflicting or compatible. Here also we found only agreement.
\subsection{Alternative Explanations}
The exercise of seeking alternative explanations was very useful. It forced us to find a differing perspective. This other perspective inspired new ideas.
\subsubsection{Looking for Supplementary Explanations and Supporting Data}
According to Patton\cite{patton1990qualitative}, credibility is gained by considering alternative explanations.
As seen in Chapter 5, our interpretation was carried out with concentration on finding alternative explanations.
For the question ``what is proof'', we sought insight into why some students were not able to generalize from an argument about concrete entities to the form of the argument. The search for alternative explanations brought out a supplementary interpretation: with concrete entities, some students might stop attending to the argument as a source of confidence, substituting their own reasons for believing the conclusion.
\subsection{Consistency Checking}
The consolidation of collected diverse material into an internally consistent and externally believable whole is a product of qualitative research.
Merriam has recommended steps for achieving this.
\begin{itemize}
\item Merriam\cite[p.209]{merriam2009qualitative} states: ``Ensuring validity and reliability in qualitative research involves conducting the investigation in an ethical manner.''
We conducted our research under the supervision of the Institutional Review Board at the University of Connecticut, under several, similar protocols, H13-065, H14-112, and H15-022.
\item
Merriam\cite[p.210]{merriam2009qualitative} states: ``Regardless of the type of research, validity and reliability are concerns that can be approached through careful attention to a study's conceptualization and the way in which the data are collected, analyzed, and interpreted, and the way in which the findings are presented.''
\item
Adopting Denzin and Lincoln's\cite{norman2005sage} metaphor of quilting, we mentally check whether the quilt obtained by piecing together our insights from the analysis of interviews and documents, and from the viewpoints of students, assistants and instructors is connected and coherent and covers the space we are examining.
\item
Good practice\cite{norman2005sage} requires that we include the perspectives of all members of the domain to which we claim our findings apply.
Moreover, no members of this group should be marginalized.
This deliberate seeking of differing perspectives is called theoretical sampling.
The process of seeking these perspectives is judged concluded when
additional data seem to produce no additional codes.
This condition is called saturation.
(Addressed for this work, in Chapter 3.)
\item
A diversity of assessment methods should be used. (Addressed for this work, in Chapter 3.)
\item
Adherence to a methodology, that should be described in sufficient detail that readers could imagine carrying out the method, contributes to validity. (Addressed for this work, in Chapter 3.)
\item
Production of a report, that contains a summary description\cite{norman2005sage} of the results, contributes to validity. (Addressed for this work, as ``Navigating Through Space of Conceptualization of Proof'' in section \ref{thickNrich}.)
\item
Participant assessment of that summary, such that the authenticity of representation of the domain being described is confirmed by the people in that domain, contributes to validity. This process is called member-checking.
\item
Internal consistency, such as that checked by constant comparison, and external consistency, such as that checked by others with a view to related data (in this case, several instructors who experience the students' use of knowledge about proof), contribute to validity.
\item
There has been much discussion\cite{merriam2009qualitative,lincoln1985naturalistic,wolcott1994transforming} about which terms should be used to address the concept of the bases for confidence in the results of qualitative study.
We choose the terms validity and reliability, and attempt to illustrate what we mean by those terms.
\item
In this research, we hope to do what Denzin and Lincoln \cite{norman2005sage} describe as: ``secure an in-depth understanding of the phenomenon in question''.
\end{itemize}
\subsubsection{Peer Review}
Merriam\cite[p. 220]{merriam2009qualitative} states that peer review is a useful source of validation.
Some papers resulting from this study have been published in conference proceedings.
The validation methods applied and reported in those papers are summarized below.
\section{Specific Papers and Manuscripts}
For some published works and a manuscript in progress we can describe approaches to validity.
\subsection{Validity and Reliability in Proofs Using the Pumping Lemma for Regular Languages}
%should be FIE
In this paper, we proposed possible student conceptualizations that predicted
the errors we saw.
Pros and cons for validity for these proposed explanations were considered.
One conceptualization seemed to be that universality of a statement was not noticed, or not regarded as significant.
When a universal claim (for all $a \in A$) is treated as if it were an existential claim (there exists $a \in A$),
the application of unwarranted restrictions
(``Let $x$ be empty'', ``let $|xy|=p$'') is no longer problematic.
We predicted from this conjecture about insignificance of universality
that students would offer examples as proofs, would consider experimental evidence as helpful in the presence of a proof,
and would consider an algorithm or code demonstrating a result as adequate.
All of these predictions were consistent with our data and the data of others.
We checked whether there are multiple variations of error that might
be explained by lack of appreciation of this component of the material
of discrete systems, and found some possibilities.
Some support for the validity of the results comes from seeing several variations
of each proposed error type. We found in our data multiple versions of
unwarranted restrictions: choosing $x$ to be empty or choosing the length of $xy$
to be $p$, and others. We found in the literature warnings against attempts to prove
statements with universal quantifiers true by means of showing the existence
of examples \cite{devlin2012mathematical,Franklin}. These warnings suggest these errors have occurred before.
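
To make the role of the quantifiers concrete, the following sketch states the pumping lemma in a common textbook form. The formulation and the symbol names ($L$, $p$, $w$, $x$, $y$, $z$, $i$) are assumptions made here for illustration and are not quoted from the paper. For a regular language $L$,
\[
\exists p \ge 1 \;\; \forall w \in L,\ |w| \ge p \;\; \exists x, y, z \;\; \bigl( w = xyz \,\wedge\, |xy| \le p \,\wedge\, |y| \ge 1 \,\wedge\, \forall i \ge 0 :\ x y^{i} z \in L \bigr).
\]
A proof that $L$ is not regular argues from the negation,
\[
\forall p \ge 1 \;\; \exists w \in L,\ |w| \ge p \;\; \forall x, y, z \;\; \bigl( ( w = xyz \,\wedge\, |xy| \le p \,\wedge\, |y| \ge 1 ) \rightarrow \exists i \ge 0 :\ x y^{i} z \notin L \bigr).
\]
In the negated form the decomposition $x, y, z$ is universally quantified, so the prover must handle every admissible decomposition. Fixing $x$ to be empty, or fixing $|xy| = p$, examines only some decompositions, which would suffice only if that quantifier were existential.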
%Proof by contradiction for the purpose of showing such a
%statement true, i.e., that any particular tentative counterexample contained an
%inherent contradiction, is not itself universally accepted, due to not necessarily
%being constructive \cite[p. 2]{bridges2007did}. (Where is this intended to lead?)
We also found in our data, several versions
of misunderstanding inequalities. We found support in literature for errors of
misunderstanding how to work with inequalities, by students of this level\cite{Mattuck}.
Difficulty with inequalities is proposed as a useful grouping of several errors we found while examining documents.
We would like to consider difficulty with inequalities
together with students having difficulty expressing ideas in a representation different from that in which the ideas were taught.
Inequalities are often illustrated.
(Some sketches)
It may provide some assurance of validity when proposed conceptualizations work well together to explain student progress.
Student difficulties with the pigeonhole principle offer an example of this synergy-based validity.
Students learning the pigeonhole principle sometimes apply a restriction that pigeons must be divided as evenly as possible among the nests.
They understand that no later than when the number of pigeons exceeds the number of nests, sharing must occur.
They use the evenly divided idea to find the first time at which we are certain that sharing or reuse occurs.
They do not always recognize that reuse can occur sooner than that earliest time at which we are certain it must have occurred.
So, we find evidence of this key idea of inequality being problematic in multiple
contexts.
Representation and inequality are acting with synergy in this explanation.
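
A small worked case, given here as an illustrative sketch with hypothetical symbols ($n$ nests, $m$ pigeons, and $k$ the largest number of pigeons in any one nest), separates the point at which sharing becomes certain from the point at which it becomes possible. The pigeonhole principle guarantees
\[
m > n \;\Longrightarrow\; k \ge 2, \qquad \mbox{and more generally} \qquad k \ge \left\lceil m/n \right\rceil ,
\]
with the bound $\lceil m/n \rceil$ attained when the pigeons are divided as evenly as possible. The even division is therefore the worst case that yields the guarantee, not a constraint on how the pigeons are placed: sharing is already possible whenever $m \ge 2$, since two pigeons may occupy the same nest. For $n = 5$, sharing is certain from $m = 6$ onward but may occur as early as $m = 2$.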
Variation theory supports our observation that comparing and contrasting fine
distinctions in material being taught aids the process of learning. We used
the difference between assignment and equality testing, manifest in the Java
operators ``='' and ``==''. We compared a software procedure representation
with a mathematical formulation (the latter using only ``=''), for comprehensibility
by students. This helped us to see that barriers to student understanding
exist, for some students of computer science, at the level of formulation. It also
helped us see that the barrier between the internalization and interiorization
of Harel and Sowder\cite{harel1998students} might be less of a barrier in students of computer science
who are routinely conscious of the need to analyze procedures.
\subsection{Proof by Induction}
%is this Koli?
Here we see agreement among participants, and a spectrum of depth of understanding, which lend confidence to the interpretation.
We were encouraged by the overlap in description among interview participants.
The interviews were certainly not the same, but common elements,
specifically that there is a form to proofs by induction, appeared. Some students
referred to this form as steps, others as a procedure, or framework. Moreover,
degrees of understanding filled in a spectrum, from joyful deep understanding
to admissions of not understanding why the steps of proof by induction prove
anything, and conceptions in between. These included a supposition about why a
proof of an induction step would, in combination with an established base case,
constitute a proof by induction, which its originator characterized as ``weird''.
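
The form that participants described as steps, a procedure, or a framework can be written schematically as follows. This is a standard statement of ordinary induction over the natural numbers, included as an illustration; the predicate name $P$ and the choice of $0$ as base case are assumptions made here, not taken from the interviews.
\[
\bigl( P(0) \,\wedge\, \forall k \ge 0 :\ ( P(k) \rightarrow P(k+1) ) \bigr) \;\rightarrow\; \forall n \ge 0 :\ P(n).
\]
The base case establishes $P(0)$. The induction step establishes the implication $P(k) \rightarrow P(k+1)$ for an arbitrary $k$, without asserting $P(k)$ itself. The conclusion follows by chaining: $P(0)$ gives $P(1)$, which gives $P(2)$, and so on.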
%\section{Domain, Range, Mapping, Relation, Function, Equivalence Relation in Proofs}
%\section{Definitions, Language, Reasoning in Proofs}
%\subsection{Equivalence Class, Generic Particular, Abstraction in Proofs}
%Is this a paper, a manuscript?
%Using an analogy, I claim, is saying there is a set of relations among things $a_i$ that
%we agree upon, furthermore, I might wish to teach that there is a corresponding
%set of relations among things $b_i$. I might wish to say, use the relation we agree
%upon for municipalities provide addresses for homes that can be used for
%surface mail, and I might wish to teach that there is a corresponding provision
%of addresses for items a computer programmer might wish to use for storage
%and recall. We can note that addresses make use of a hierarchy of place names,
%countries, states, cities, streets, street numbers, apartment numbers. We can
%note that structured data types can correspondingly make use of instances and
%fields and indices that can be arranged in a tree.
%In the absence of abstraction, the surface mail address hierarchy might not pose
%much more difficulty, but the data structure might, because the fields therein
%are more subject to change than municipalities. In the absence of abstraction,
%the comparison between one hierarchical arrangement with another would be
%more difficult, because it is the structure of the abstraction itself, namely, the
%choices of features regarded as significant throughout the tree, that is to be
%recalled and used as a scaffold for the new information.
%``An alternative pathway towards abstraction involves recognizing an analogy
%between two structures in different domains, which then focuses one's attention
%on the abstract structure they share. This new abstraction then becomes a
%’concrete’ concept that one can study'' \cite [p 449]{}.
%\section{Validity for Transferrability to Other Contexts}
\section{Statement of Researcher Bias}\label{bias}
The researcher believes that students who are able to be successful in other courses in computer science will be able to learn the basics of proof, if they are willing to pay attention to all of the material presented in class, and carry out a reasonable number of well-chosen exercises.
The researcher finds differing mental experiences of the same information, presented in multiple representations, such as linear temporal logic and B\"uchi automata, or pseudocode and mathematical formulation, fascinating.
The researcher believes that the emotional states and quality of sleep of students can have a significant effect upon their ability to understand and to remember.