
Wednesday, November 10, 2010

Numerical Interpretation of Results

Two key properties, referred to as descriptive statistics, come into play when we describe a set of data – or the results of our research. [descriptive statistics: numerical statements about the properties of data, such as the mean or standard deviation] These are the central tendency (what we usually call the average) and the amount of dispersion – or variation. [central tendency: measures of the ‘average’ (most commonly the mean, median and mode), which tell us what constitutes a typical value] Imagine a choreographer selecting a group of dancers for a performance supporting a lead dancer who has already been cast. The choreographer wants the supporting cast to be pretty much the same height as the lead dancer and also pretty much the same height as each other. So the choreographer is interested in the average height (which would need to be about the same as the lead dancer’s height) and the dispersion, or variation, in height (which would need to be close to zero). There are a number of ways in which the choreographer – or the psychologist – can measure central tendency (average) and dispersion. [dispersion: measures of dispersion (most commonly range, standard deviation and variance) describe the distance of separate records or data points from each other]

Measures of central tendency
Measures of central tendency give us a typical value for our data. Clearly, ‘typical’ can mean different things. It could mean: the average value; the value associated with the most typical person; or the most common value. In fact, all three are used by researchers to describe central tendency, giving us the following measures:
- The mean is the average value (response), calculated by summing all the values and dividing the total by the number of values. [mean: the sum of all the scores divided by the total number of scores]
- The median is the value with an equal number of values above and below it. So, if all values are ranked from 1 to N, the median is the ((N + 1)/2)th value if N is odd. If N is even, the median is the mean of the two middle values. [median: the middle score of a ranked array – equal to the ((N + 1)/2)th value, where N is the number of scores in the data set]
- The mode is the value that occurs most frequently in a given data set. [mode: the most commonly occurring score in a set of data]
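To make these definitions concrete, here is a minimal Python sketch (using only the standard library’s statistics module) that computes all three measures for a small, made-up set of scores:

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 6, 7, 8, 9]   # hypothetical set of seven scores

print(mean(scores))    # sum of scores / number of scores = 43 / 7 ≈ 6.14
print(median(scores))  # N = 7 is odd, so the ((7 + 1)/2)th = 4th ranked value = 6
print(mode(scores))    # 5 occurs more often than any other value
```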

Measures of dispersion
We might also want to describe the typical distance of responses from one another – that is, how tightly they are clustered around the central point. This is typically established using one of two measures. The first, and probably most obvious, is the range of responses – the difference between the maximum and minimum values. But in fact the most commonly used measure of dispersion is the standard deviation (SD). [standard deviation: the square root of the sum of the squares of all the differences (deviations) between each score and the mean, divided by the number of scores (or the number of scores minus 1 for a population estimate)] This is equal to the square root of the sum of the squares of all the differences (deviations) between each score and the mean, divided by the number of scores (in fact, the number of scores minus one if we want a population estimate, as we usually do). If this sounds complex, do not be too concerned: scientific calculators allow you to compute standard deviations very easily. The square of the standard deviation is called the variance. [variance: the mean of the sum of squared differences between a set of scores and the mean of that set of scores; the square of the standard deviation]
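The same standard-library module also covers dispersion. A brief sketch, again with made-up scores, showing the range, the two versions of the standard deviation (dividing by N, or by N − 1 for a population estimate), and the variance:

```python
from statistics import pstdev, stdev, pvariance, variance

scores = [3, 5, 5, 6, 7, 8, 9]   # same hypothetical scores as above

# Range: difference between the maximum and minimum values
print(max(scores) - min(scores))   # 9 - 3 = 6

# Standard deviation of the scores themselves (divide by N)
print(pstdev(scores))

# Standard deviation as an estimate of the population SD (divide by N - 1)
print(stdev(scores))

# Variance is simply the square of the corresponding standard deviation
print(pvariance(scores), variance(scores))
```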

Generalization of Results
Although psychologists often spend a lot of time studying the behaviour of samples, most of the time they want to generalize their results to say something about a whole population – often called the underlying population. Knowing how ten particular people are going to vote in an election may be interesting in itself, but it is even more interesting if it tells us who is likely to win the next election. But how can we make inferences of this sort confidently? By using inferential statistics we can make statements about underlying populations based on detailed knowledge of the sample we study and the nature of random processes. [inferential statistics: numerical techniques used to estimate the probability that purely random sampling from an experimental population of interest can yield a sample such as the one obtained in the research study] The key point here is that, while random processes are (as the name tells us) random, in the long run they are highly predictable. Not convinced? Toss a coin. Clearly, there is no way that we can confidently predict whether it is going to come down heads or tails. But if we were to toss the coin fifty times, we could predict, reasonably accurately, that we would get around twenty-five heads. The more tosses we make, the more certain we can be that around 50 per cent of the tosses will come up heads (and it is this certainty that makes the business of running casinos very profitable).
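The long-run predictability of a random process is easy to demonstrate by simulation. A small sketch (the seed and the particular numbers of tosses are arbitrary choices) showing the proportion of heads settling towards 50 per cent as the number of tosses grows:

```python
import random

random.seed(1)   # fixed seed so the illustration is reproducible

# Simulate coin tossing: the proportion of heads drifts closer to 0.5
# as the number of tosses increases.
for n_tosses in (10, 50, 1000, 100000):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    print(n_tosses, heads / n_tosses)
```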


Of course, psychologists do not usually study coin tosses, but exactly the same principles apply to things they do study. For example, the mean IQ is 100 (with an SD of 15), so we know that if we study a large number of people, about 50 per cent will have an IQ greater than 100. So if we get data from 100 people (e.g. a class of psychology students) and find that all of them have IQs greater than 100, we can infer with some confidence that there is something psychologically ‘special’ about this sample. Our inference will take the form of a statement to the effect that the pattern we observe in our sample is ‘unlikely to have arisen as a result of randomly selecting (sampling) people from the population’. In this case, we know this is true, because we know that psychology students are not selected randomly from the population but are selected on the basis of their performance on tests related to IQ. But even if we did not know this, we would be led by the evidence to make an inference of this kind. Inferential statistics allow researchers to quantify the probability that the findings are caused by random influences rather than a ‘real’ effect or process. We do this by comparing the distribution obtained in an empirical investigation with the distribution suggested by statistical theory – in this case the normal distribution. We then make predictions about what the distributions would look like if certain assumptions (regarding the lack of any real effect on the data) were true. If the actual distribution looks very different from the one we expect, then we become more confident that those assumptions are wrong, and that there is in fact a real effect or process operating. For example, the distribution of the mean IQ score of groups of people drawn from the population tends to have a particular shape. This is what we mean by the normal distribution. [normal distribution: the symmetrical, bell-shaped spread of scores obtained when scores on a variable are randomly distributed around a mean] If a particular set of data does not look as though it fits the (expected) normal distribution, then we would start to wonder if the data really can be assumed to have been drawn at random from the population in question. So if you drew a sample of 100 people from a population and found that their mean IQ was 110, you could be fairly sure that they were not randomly drawn from a population with a mean of 100. Indeed, the normal distribution shows us that the likelihood of an event as extreme as this is less than one in a thousand.
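A rough check of that last claim, assuming the usual population figures (mean 100, SD 15) and the normal approximation for sample means: the standard error for samples of 100 is 1.5 IQ points, so a sample mean of 110 lies more than six standard errors above 100.

```python
import math

# Sampling distribution of the mean: for random samples of size n from a
# population with mean mu and standard deviation sigma, sample means are
# approximately normal with standard error sigma / sqrt(n).
mu, sigma, n = 100, 15, 100
sample_mean = 110

standard_error = sigma / math.sqrt(n)     # 15 / 10 = 1.5
z = (sample_mean - mu) / standard_error   # (110 - 100) / 1.5 ≈ 6.67

# Upper-tail probability of a sample mean at least this extreme under
# purely random sampling (standard normal survival function).
p = 0.5 * math.erfc(z / math.sqrt(2))
print(z, p)   # p is far smaller than one in a thousand
```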


Interpretation of Results
When we use inferential statistics, we might be in a position to make exact probability statements (as in the coin tossing example), but more usually we have to use a test statistic. Two things influence our judgement about whether a given observation is in any sense remarkable: 1. the information that something is ‘going on’; and 2. the amount of random error in our observations. In the IQ example, information comes from the fact that scores are above the mean, and random error relates to variation in the scores of individual people in the sample. For this reason, the statistics we normally use in psychology contain both an information term and an error term, and express one as a ratio of the other. So the test statistic will yield a high value (suggesting that something remarkable is going on) when there is relatively more information than error, and a low value (suggesting that nothing remarkable is going on) when there is more error than information.
Imagine we gave an IQ test to a class of 30 children and obtained a mean IQ of 120. How do we find out the statistical likelihood that the class mean differs reliably from the expected population mean? In other words, are we dealing here with a class of ‘smart kids’, whose performance has been enhanced above the expected level by some factor or combination of factors? Or is this difference from the population mean of 100 simply due to random variation, such as you might observe if you tossed a coin 30 times and it came up heads 20 times? A statistical principle known as the law of large numbers tells us that uncertainty is reduced by taking many measurements of the same thing (e.g. making 50 coin tosses rather than one). [law of large numbers: the idea that the average outcomes of random processes are more stable and predictable with large samples than with small samples] It means, for example, that although around 9 per cent of the population have IQs over 120, far fewer than 9 per cent of classes of 30 randomly selected students will have a mean IQ over 120. This statistical knowledge makes us more confident that if we do find such a class, this is highly unlikely to be a chance event. It tells us instead that these children are performing considerably higher than might be expected. We can summarize the process here as one of deciding where the sample mean lies in relation to the population mean. If there is a very low chance of sampling that mean from the population, we conclude that the sample is probably not drawn from that population but instead belongs to another population. Perhaps more intelligent students were assigned to this class by the school authorities, or perhaps they came from an area where education funding was especially good. In short, we cannot be sure what the explanation is, but we can be relatively sure that there is something to be explained – and this is the purpose of conducting statistical tests.
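A sketch of the calculation for the ‘smart kids’ class, treating it as a simple one-sample z-test and assuming the population values of 100 (mean) and 15 (SD); the class figures are hypothetical:

```python
import math

mu, sigma, n = 100, 15, 30
class_mean = 120

# Standard error of the mean for classes of 30 drawn at random
standard_error = sigma / math.sqrt(n)    # ≈ 2.74
z = (class_mean - mu) / standard_error   # ≈ 7.3

# Probability of a class mean this far above 100 under random sampling
p = 0.5 * math.erfc(z / math.sqrt(2))
print(z, p)

# Compare: a single randomly chosen person scores over 120 about 9% of the time
print(0.5 * math.erfc(((120 - 100) / sigma) / math.sqrt(2)))   # ≈ 0.09
```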

Think back to our ‘memory training study’, in which one group of participants in an experimental condition experiences a new training method and another group in a control condition does not, and then both groups take a memory test. Common sense tells us that we are likely to get two sets of memory scores – one for the experimental condition, one for the control – with different means. But how do we decide whether the difference is big enough to be meaningful? This is where inferential statistics come into play. Appropriate statistical procedures allow us to decide how likely it is that this difference could occur by chance alone. If that likelihood is sufficiently low (typically less than 1 in 20, or 5 per cent), we would reject the null hypothesis (expressed as H0) that there is no difference between the means and that the manipulation of the independent variable has had no effect. Instead we would conclude that the manipulation of the IV has had a significant impact on the dependent variable – that is, that training does indeed improve memory. This process is typically referred to as significance testing, and it is one of the main approaches to statistical inference. While statistical tests can never tell us for certain whether our results are due to chance, they can guide us in judging whether chance is a plausible explanation.
How does significance testing work in this case – that is, when comparing two means? In essence it comes down to the difference between the means relative to the variation around those means and the number of responses on which the means are based. The statistics that we calculate for comparing means are called t and F statistics. A large t or F statistic means there is a small probability that a difference as big as the one we have obtained could have occurred by randomly selecting two groups from the same population (i.e. it is not likely that the difference is due to chance). If that probability is sufficiently small, we conclude that there probably is a real difference between the means – in other words, that the difference is statistically significant.
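As an illustration, here is a sketch of an independent-samples t-test for the memory training study, using SciPy’s ttest_ind and entirely made-up memory scores:

```python
from scipy import stats   # assumes SciPy is available

# Hypothetical memory-test scores (items recalled) for the two conditions
experimental = [14, 17, 15, 18, 16, 19, 15, 17]   # received the training method
control      = [12, 14, 11, 15, 13, 12, 14, 13]   # no training

# The t statistic is, in essence, the difference between the means divided
# by an error term based on the variation within the groups and their sizes.
t, p = stats.ttest_ind(experimental, control)
print(t, p)

# Conventional significance test at the 5 per cent level
print("significant" if p < 0.05 else "not significant")
```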
