IB Questionbank


HL Paper 3

This question explores methods to determine the area bounded by an unknown curve.

The curve $y = f\left( x \right)$ is shown in the graph, for $0 \leqslant x \leqslant 4.4$ .

The curve $y = f\left( x \right)$ passes through the following points.

It is required to find the area bounded by the curve, the $x$ -axis, the $y$ -axis and the line $x = 4.4$ .

One possible model for the curve $y = f\left( x \right)$ is a cubic function.

A second possible model for the curve $y = f\left( x \right)$ is an exponential function, $y = p{{\text{e}}^{qx}}$ , where $p{\text{,}}\,\,q \in \mathbb{R}$ .

Use the trapezoidal rule to find an estimate for the area.

[3]

a.i.

With reference to the shape of the graph, explain whether your answer to part (a)(i) will be an overestimate or an underestimate of the area.

[2]

a.ii.

Use all the coordinates in the table to find the equation of the least squares cubic regression curve.

[3]

b.i.

Write down the coefficient of determination.

[1]

b.ii.

Write down an expression for the area enclosed by the cubic function, the $x$ -axis, the $y$ -axis and the line $x = 4.4$ .

[2]

c.i.

Find the value of this area.

[2]

c.ii.

Show that ${\text{ln}}\,y = qx + {\text{ln}}\,p$ .

[2]

d.i.
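
The linearisation in this part follows from taking natural logarithms of both sides of the exponential model. A brief sketch, assuming $p > 0$ (so that $y > 0$ and both logarithms exist):

$\ln y = \ln \left( p\,{\text{e}}^{qx} \right) = \ln p + \ln \left( {\text{e}}^{qx} \right) = qx + \ln p$

Under this form, plotting $\ln y$ against $x$ gives a straight line with gradient $q$ and intercept $\ln p$.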

Hence explain how a straight line graph could be drawn using the coordinates in the table.

[1]

d.ii.

By finding the equation of a suitable regression line, show that $p = 1.83$ and $q = 0.986$ .

[5]

d.iii.

Hence find the area enclosed by the exponential function, the $x$ -axis, the $y$ -axis and the line $x = 4.4$ .

[2]

d.iv.

In this question you will explore possible models for the spread of an infectious disease.

An infectious disease has begun spreading in a country. The National Disease Control Centre (NDCC) has compiled the following data after receiving alerts from hospitals.

A graph of $n$ against $d$ is shown below.

The NDCC want to find a model to predict the total number of people infected, so they can plan for medicine and hospital facilities. After looking at the data, they think an exponential function in the form $n = a{b^d}$ could be used as a model.

Use your answer to part (a) to predict

The NDCC want to verify the accuracy of these predictions. They decide to perform a ${\chi ^2}$ goodness of fit test.

The predictions given by the model for the first five days are shown in the table.

In fact, the first day when the total number of people infected is greater than 1000 is day 14, when a total of 1015 people are infected.

Based on this new data, the NDCC decide to try a logistic model in the form $n = \frac{L}{{1 + c{e^{ - kd}}}}$ .

Use the data from days 1–5, together with day 14, to find the value of

Use an exponential regression to find the value of $a$ and of $b$ , correct to 4 decimal places.

[3]

the number of new people infected on day 6.

[3]

b.i.

the day when the total number of people infected will be greater than 1000.

[2]

b.ii.

Use your answer to part (a) to show that the model predicts 16.7 people will be infected on the first day.

[1]

Explain why the number of degrees of freedom is 2.

[2]

d.i.

Perform a ${\chi ^2}$ goodness of fit test at the 5% significance level. You should clearly state your hypotheses, the p-value, and your conclusion.

[5]

d.ii.

Give two reasons why the prediction in part (b)(ii) might be lower than 14.

[2]

$L$ .

[2]

f.i.

$c$ .

[1]

f.ii.

$k$ .

[1]

f.iii.

Hence predict the total number of people infected by this disease after several months.

[2]

Use the logistic model to find the day when the rate of increase of people infected is greatest.

[3]

A smartphone’s battery life is defined as the number of hours a fully charged battery can be used before the smartphone stops working. A company claims that the battery life of a model of smartphone is, on average, 9.5 hours. To test this claim, an experiment is conducted on a random sample of 20 smartphones of this model. For each smartphone, the battery life, $b$ hours, is measured and the sample mean, ${\bar b}$ , calculated. It can be assumed the battery lives are normally distributed with standard deviation 0.4 hours.

It is then found that this model of smartphone has an average battery life of 9.8 hours.

State suitable hypotheses for a two-tailed test.

[1]

Find the critical region for testing ${\bar b}$ at the 5% significance level.

[4]

Find the probability of making a Type II error.

[3]
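
The critical region and the Type II error probability can be checked numerically. The sketch below assumes the test is a two-tailed $z$-test on the sample mean (the standard deviation is stated as known) and takes 9.8 hours as the true mean, as given in the stem; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 9.5      # mean battery life claimed by the company (hours)
SIGMA = 0.4     # known population standard deviation (hours)
N = 20          # sample size
ALPHA = 0.05    # two-tailed significance level
MU_TRUE = 9.8   # true mean stated later in the question (hours)

# Standard error of the sample mean and the two-tailed critical z-value.
std_error = SIGMA / sqrt(N)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

# H0 is rejected when the sample mean falls outside [lower, upper].
lower = MU_0 - z_crit * std_error
upper = MU_0 + z_crit * std_error

# Type II error: the sample mean lands in the acceptance region
# even though the true mean is MU_TRUE.
sampling_dist = NormalDist(MU_TRUE, std_error)
type_ii = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(f"Reject H0 if b_bar < {lower:.3f} or b_bar > {upper:.3f}")
print(f"P(Type II error) = {type_ii:.4f}")
```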

Another model of smartphone whose battery life may be assumed to be normally distributed with mean μ hours and standard deviation 1.2 hours is tested. A researcher measures the battery life of six of these smartphones and calculates a confidence interval of [10.2, 11.4] for μ.

Calculate the confidence level of this interval.

[4]
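
One way to recover the confidence level is to work backwards from the half-width of the interval. This is a minimal sketch, assuming a $z$-interval (the standard deviation 1.2 is stated as known); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 1.2                # known population standard deviation (hours)
N = 6                      # number of smartphones measured
LOWER, UPPER = 10.2, 11.4  # reported confidence interval for mu

# Half-width of the interval equals z * sigma / sqrt(n), so solve for z.
half_width = (UPPER - LOWER) / 2
z = half_width / (SIGMA / sqrt(N))

# The confidence level is the central probability P(-z < Z < z).
confidence_level = 2 * NormalDist().cdf(z) - 1

print(f"z = {z:.4f}, confidence level = {confidence_level:.1%}")
```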

This question will connect Markov chains and directed graphs.

Abi is playing a game that involves a fair coin with heads on one side and tails on the other, together with two tokens, one with a fish’s head on it and one with a fish’s tail on it. She starts off with no tokens and wishes to win them both. On each turn she tosses the coin, if she gets a head she can claim the fish’s head token, provided that she does not have it already and if she gets a tail she can claim the fish’s tail token, provided she does not have it already. There are 4 states to describe the tokens in her possession; A: no tokens, B: only a fish’s head token, C: only a fish’s tail token, D: both tokens. So for example if she is in state B and tosses a tail she moves to state D, whereas if she tosses a head she remains in state B.

After $n$ throws the probability vector, for the 4 states, is given by ${{\mathbf{v}}_n} = \left( {\begin{array}{*{20}{c}} {{a_n}} \\ {{b_n}} \\ {{c_n}} \\ {{d_n}} \end{array}} \right)$ where the numbers represent the probability of being in that particular state, e.g. ${b_n}$ is the probability of being in state B after $n$ throws. Initially ${{\mathbf{v}}_0} = \left( {\begin{array}{*{20}{c}} 1 \\ 0 \\ 0 \\ 0 \end{array}} \right)$ .

Draw a transition state diagram for this Markov chain problem.

[3]

a.i.

Explain why, for any transition state diagram, the sum of the weights (probabilities) on the directed edges leaving a vertex (state) must equal 1.

[1]

a.ii.

Write down the transition matrix M, for this Markov chain problem.

[3]

Find the steady state probability vector for this Markov chain problem.

[4]

c.i.

Explain which part of the transition state diagram confirms this.

[1]

c.ii.

Explain why having a steady state probability vector means that the matrix M must have an eigenvalue of $\lambda = 1$ .

[2]

Find ${{\mathbf{v}}_1}{\text{,}}\,\,{{\mathbf{v}}_2}{\text{,}}\,\,{{\mathbf{v}}_3}{\text{,}}\,\,{{\mathbf{v}}_4}\,$ .

[4]

Hence, deduce the form of ${{\mathbf{v}}_n}$ .

[2]

Explain how your answer to part (f) fits with your answer to part (c).

[2]

Find the minimum number of tosses of the coin that Abi will have to make to be at least 95% certain of having finished the game by reaching state D.

[4]
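
The chain described in the stem can be simulated exactly with rational arithmetic. The sketch below assumes the column convention ${{\mathbf{v}}_{n + 1}} = {\mathbf{M}}{{\mathbf{v}}_n}$ with states ordered A, B, C, D, and takes "finished" to mean holding both tokens (state D as defined in the stem); the helper `step` is illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Column j holds the probabilities of moving FROM state j (order A, B, C, D),
# so each column sums to 1 and v_{n+1} = M v_n.
M = [
    [0,    0,    0,    0],   # into A: never re-entered
    [HALF, HALF, 0,    0],   # into B: from A (head) or stay in B (head)
    [HALF, 0,    HALF, 0],   # into C: from A (tail) or stay in C (tail)
    [0,    HALF, HALF, 1],   # into D: from B (tail), from C (head), or stay
]

def step(matrix, vector):
    """Multiply the transition matrix by a probability column vector."""
    return [sum(Fraction(m) * v for m, v in zip(row, vector)) for row in matrix]

v = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]  # v_0: start in A
history = [v]
tosses = 0
# Keep tossing until the probability of being in state D reaches 95%.
while v[3] < Fraction(95, 100):
    v = step(M, v)
    history.append(v)
    tosses += 1

for n, vec in enumerate(history):
    print(n, [str(x) for x in vec])
print("Minimum number of tosses:", tosses)
```

The printed vectors suggest the pattern ${b_n} = {c_n} = {\left( {\tfrac{1}{2}} \right)^n}$ and ${d_n} = 1 - {\left( {\tfrac{1}{2}} \right)^{n - 1}}$ for $n \geqslant 1$, which tends to the absorbing state D.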

This question explores models for the height of water in a cylindrical container as water drains out.

The diagram shows a cylindrical water container of height $3.2$ metres and base radius $1$ metre. At the base of the container is a small circular valve, which enables water to drain out.

Eva closes the valve and fills the container with water.

At time $t = 0$ , Eva opens the valve. She records the height, $h$ metres, of water remaining in the container every $5$ minutes.

Eva first tries to model the height using a linear function, $h (t) = a t + b$ , where $a, b \in \mathbb{R}$ .

Eva uses the equation of the regression line of $h$ on $t$ , to predict the time it will take for all the water to drain out of the container.

Eva thinks she can improve her model by using a quadratic function, $h (t) = p t^{2} + q t + r$ , where $p, q, r \in \mathbb{R}$ .

Eva uses this equation to predict the time it will take for all the water to drain out of the container and obtains an answer of $k$ minutes.

Let $V$ be the volume, in cubic metres, of water in the container at time $t$ minutes.
Let $R$ be the radius, in metres, of the circular valve.

Eva does some research and discovers a formula for the rate of change of $V$ .

$\frac{d V}{d t} = - \pi R^{2} \sqrt{70\,560\,h}$

Eva measures the radius of the valve to be $0.023$ metres. Let $T$ be the time, in minutes, it takes for all the water to drain out of the container.

Eva wants to use the container as a timer. She adjusts the initial height of water in the container so that all the water will drain out of the container in $15$ minutes.

Eva has another water container that is identical to the first one. She places one water container above the other one, so that all the water from the highest container will drain into the lowest container. Eva completely fills the highest container, but only fills the lowest container to a height of $1$ metre, as shown in the diagram.

At time $t = 0$ Eva opens both valves. Let $H$ be the height of water, in metres, in the lowest container at time $t$ .

Find the equation of the regression line of $h$ on $t$ .

[2]

a.i.

Interpret the meaning of parameter $a$ in the context of the model.

[1]

a.ii.

Suggest why Eva’s use of the linear regression equation in this way could be unreliable.

[1]

a.iii.

Find the equation of the least squares quadratic regression curve.

[1]

b.i.

Find the value of $k$ .

[2]

b.ii.

Hence, write down a suitable domain for Eva’s function $h (t) = p t^{2} + q t + r$ .

[1]

b.iii.

Show that $\frac{d h}{d t} = - R^{2} \sqrt{70\,560\,h}$ .

[3]

By solving the differential equation $\frac{d h}{d t} = - R^{2} \sqrt{70\,560\,h}$ , show that the general solution is given by $h = 17\,640 {(c - R^{2} t)}^{2}$ , where $c \in \mathbb{R}$ .

[5]
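
The two "show that" results above can be linked by a short derivation. This is a sketch, assuming the container has a constant circular cross-section of radius $1$ metre (so $V = \pi h$) and writing the constant of integration as $\sqrt{70\,560}\,c$:

$V = \pi (1)^{2} h \Rightarrow \frac{d V}{d t} = \pi \frac{d h}{d t} \Rightarrow \pi \frac{d h}{d t} = - \pi R^{2} \sqrt{70\,560\,h} \Rightarrow \frac{d h}{d t} = - R^{2} \sqrt{70\,560\,h}$

$\int h^{- 1/2} \, d h = - R^{2} \sqrt{70\,560} \int d t \Rightarrow 2 \sqrt{h} = \sqrt{70\,560} \, (c - R^{2} t)$

$h = \frac{70\,560}{4} {(c - R^{2} t)}^{2} = 17\,640 {(c - R^{2} t)}^{2}$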

Use the general solution from part (d) and the initial condition $h (0) = 3.2$ to predict the value of $T$ .

[4]

Find this new height.

[3]
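
Given the general solution $h = 17\,640 {(c - R^{2} t)}^{2}$, both the drain time $T$ and the initial height for a 15 minute timer reduce to a few lines of arithmetic. A minimal sketch, assuming the positive root for $c$ (so that $c - R^{2} t \geq 0$ while water remains); the variable names are illustrative.

```python
from math import sqrt

R = 0.023        # measured valve radius (metres)
K = 17640        # constant from the general solution h = K (c - R^2 t)^2
H_FULL = 3.2     # initial height of water in the full container (metres)

# Initial condition h(0) = 3.2 gives K c^2 = 3.2; take the positive root.
c_full = sqrt(H_FULL / K)

# The container is empty when c - R^2 t = 0, so T = c / R^2.
T = c_full / R**2

# For a 15 minute timer, require c - R^2 * 15 = 0 and evaluate h(0) = K c^2.
TIMER_MINUTES = 15
c_timer = R**2 * TIMER_MINUTES
h_timer = K * c_timer**2

print(f"T = {T:.1f} minutes")
print(f"Initial height for a {TIMER_MINUTES} minute timer = {h_timer:.2f} m")
```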

Show that $\frac{d H}{d t} \approx 0.2514 - 0.009873 t - 0.1405 \sqrt{H}$ , where $0 \leq t \leq T$ .

[4]

g.i.

Use Euler’s method with a step length of $0.5$ minutes to estimate the maximum value of $H$ .

[3]

g.ii.
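
Euler's method for this differential equation can be sketched as a short loop. The code assumes the initial condition $H (0) = 1$ from the stem and stops at the first step where the estimate no longer increases, which is a reasonable stopping rule here because the inflow term decreases steadily with $t$; the names `dH_dt`, `max_H` and `t_at_max` are illustrative.

```python
from math import sqrt

def dH_dt(t, H):
    """Approximate rate of change of H from the question (metres per minute)."""
    return 0.2514 - 0.009873 * t - 0.1405 * sqrt(H)

STEP = 0.5          # step length in minutes
t, H = 0.0, 1.0     # lowest container starts at a height of 1 metre

max_H, t_at_max = H, t
while True:
    # Euler update: H_{n+1} = H_n + step * f(t_n, H_n)
    H_next = H + STEP * dH_dt(t, H)
    if H_next <= H:
        break       # the estimate has stopped increasing
    t, H = t + STEP, H_next
    max_H, t_at_max = H, t

print(f"Estimated maximum H = {max_H:.3f} m at t = {t_at_max:.1f} minutes")
```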

This question uses statistical tests to investigate whether advertising leads to increased profits for a grocery store.

Aimmika is the manager of a grocery store in Nong Khai. She is carrying out a statistical analysis on the number of bags of rice that are sold in the store each day. She collects the following sample data by recording how many bags of rice the store sells each day over a period of $90$ days.

She believes that her data follows a Poisson distribution.

Aimmika knows from her historic sales records that the store sells an average of $4.2$ bags of rice each day. The following table shows the expected frequency of bags of rice sold each day during the $90$ day period, assuming a Poisson distribution with mean $4.2$ .

Aimmika decides to carry out a $\chi^{2}$ goodness of fit test at the $5\%$ significance level to see whether the data follows a Poisson distribution with mean $4.2$ .

Aimmika claims that advertising in a local newspaper for $300$ Thai Baht (THB) per day will increase the number of bags of rice sold. However, Nichakarn, the owner of the store, claims that the advertising will not increase the store’s overall profit.

Nichakarn agrees to advertise in the newspaper for the next $60$ days. During that time, Aimmika records that the store sells $282$ bags of rice with a profit of $495$ THB on each bag sold.

Aimmika wants to carry out an appropriate hypothesis test to determine whether the number of bags of rice sold during the $60$ days increased when compared with the historic sales records.

Find the mean and variance for the sample data given in the table.

[2]

a.i.

Hence state why Aimmika believes her data follows a Poisson distribution.

[1]

a.ii.

State one assumption that Aimmika needs to make about the sales of bags of rice to support her belief that it follows a Poisson distribution.

[1]

Find the value of $a$ , of $b$ , and of $c$ . Give your answers to $3$ decimal places.

[5]

Write down the number of degrees of freedom for her test.

[1]

d.i.

Perform the $\chi^{2}$ goodness of fit test and state, with reason, a conclusion.

[7]

d.ii.

By finding a critical value, perform this test at a $5\%$ significance level.

[6]

e.i.

Hence state the probability of a Type I error for this test.

[1]

e.ii.

By considering the claims of both Aimmika and Nichakarn, explain whether the advertising was beneficial to the store.

[3]
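
The critical-value test and the profit comparison can both be sketched numerically. The code assumes the total sold over $60$ days is modelled as Poisson with mean $60 \times 4.2 = 252$ under the null hypothesis, uses a one-tailed (upper) test, and treats the historic mean as the baseline for expected sales without advertising; the function and variable names are illustrative.

```python
from math import exp

DAILY_MEAN = 4.2           # historic mean bags sold per day
DAYS = 60                  # length of the advertising period
LAM = DAILY_MEAN * DAYS    # mean of the 60-day total under H0
ALPHA = 0.05               # significance level
OBSERVED = 282             # bags sold during the advertising period

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Po(lam) by summing the pmf below k."""
    pmf = exp(-lam)        # P(X = 0)
    cdf_below_k = 0.0
    for i in range(k):
        cdf_below_k += pmf
        pmf *= lam / (i + 1)   # P(X = i + 1) from P(X = i)
    return 1.0 - cdf_below_k

# Critical value: the smallest k with P(X >= k) <= ALPHA.
critical_value = 0
while poisson_upper_tail(critical_value, LAM) > ALPHA:
    critical_value += 1

type_i = poisson_upper_tail(critical_value, LAM)  # actual P(Type I error)
reject_h0 = OBSERVED >= critical_value

# Profit comparison: extra bags over the historic expectation vs. advert cost.
extra_profit = (OBSERVED - LAM) * 495
advert_cost = 300 * DAYS
net_gain = extra_profit - advert_cost

print("Critical value:", critical_value)
print(f"P(Type I error) = {type_i:.4f}, reject H0: {reject_h0}")
print(f"Net gain from advertising = {net_gain:.0f} THB")
```

A negative net gain would indicate that, under these assumptions, the extra sales did not cover the cost of the advertising even if the increase in sales is statistically significant.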

This question explores methods to analyse the scores in an exam.

A random sample of 149 scores for a university exam are given in the table.

The university wants to know if the scores follow a normal distribution, with the mean and variance found in part (a).

The expected frequencies are given in the table.

The university assigns a pass grade to students whose scores are in the top 80%.

The university also wants to know if the exam is gender neutral. They obtain random samples of scores for male and female students. The mean, sample variance and sample size are shown in the table.

The university awards a distinction to students who achieve high scores in the exam. Typically, 15% of students achieve a distinction. A new exam is trialed with a random selection of students on the course. 5 out of 20 students achieve a distinction.

A different exam is trialed with 16 students. Let $p$ be the proportion of students achieving a distinction. It is desired to test the hypotheses

${H_0}\,{\text{:}}\,p = 0.15$ against ${H_1}\,{\text{:}}\,p > 0.15$

It is decided to reject the null hypothesis if the number of students achieving a distinction is greater than 3.

Find unbiased estimates for the population mean.

[1]

a.i.

Find unbiased estimates for the population variance.

[2]

a.ii.

Show that the expected frequency for $20 < x \leqslant 40$ is 31.5 correct to 1 decimal place.

[3]

Perform a suitable test, at the 5% significance level, to determine if the scores follow a normal distribution, with the mean and variance found in part (a). You should clearly state your hypotheses, the degrees of freedom, the p-value and your conclusion.

[8]

Use the normal distribution model to find the score required to pass.

[2]

Perform a suitable test, at the 5% significance level, to determine if there is a difference between the mean scores of males and females. You should clearly state your hypotheses, the p-value and your conclusion.

[6]

Perform a suitable test, at the 5% significance level, to determine if it is easier to achieve a distinction on the new exam. You should clearly state your hypotheses, the critical region and your conclusion.

[6]

Find the probability of making a Type I error.

[3]

g.i.

Given that $p = 0.2$ find the probability of making a Type II error.

[3]

g.ii.
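
With the decision rule "reject ${H_0}$ if more than 3 of the 16 students achieve a distinction", both error probabilities are binomial tail sums. A minimal sketch, assuming the number of distinctions is binomially distributed; `binom_cdf` is an illustrative helper.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

N = 16       # number of students in the trial
CUTOFF = 3   # H0 is rejected when the count is greater than this

# Type I error: rejecting H0 (count > 3) when p really is 0.15.
type_i = 1 - binom_cdf(CUTOFF, N, 0.15)

# Type II error: failing to reject H0 (count <= 3) when p is actually 0.2.
type_ii = binom_cdf(CUTOFF, N, 0.20)

print(f"P(Type I error)  = {type_i:.3f}")
print(f"P(Type II error) = {type_ii:.3f}")
```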

Juliet is a sociologist who wants to investigate if income affects happiness amongst doctors. This question asks you to review Juliet’s methods and conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and asked them to fill in an anonymous questionnaire. Participants were asked to state their annual income and to respond to a set of questions. The responses were used to determine a happiness score out of $100$ . Of the $415$ doctors on the list, $11$ replied.

Juliet’s results are summarized in the following table.

For the remaining ten responses in the table, Juliet calculates the mean happiness score to be $52.5$ .

Juliet decides to carry out a hypothesis test on the correlation coefficient to investigate whether increased annual income is associated with greater happiness.

Juliet wants to create a model to predict how changing annual income might affect happiness scores. To do this, she assumes that annual income in dollars, $X$ , is the independent variable and the happiness score, $Y$ , is the dependent variable.

She first considers a linear model of the form

$Y = a X + b$ .

Juliet then considers a quadratic model of the form

$Y = c X^{2} + d X + e$ .

After presenting the results of her investigation, a colleague questions whether Juliet’s sample is representative of all doctors in the city.

A report states that the mean annual income of doctors in the city is $\$80\,000$ . Juliet decides to carry out a test to determine whether her sample could realistically be taken from a population with a mean of $\$80\,000$ .

Describe one way in which Juliet could improve the reliability of her investigation.

[1]

a.i.

Describe one criticism that can be made about the validity of Juliet’s investigation.

[1]

a.ii.

Juliet classifies response $K$ as an outlier and removes it from the data. Suggest one possible justification for her decision to remove it.

[1]

Calculate the mean annual income for these remaining responses.

[2]

c.i.

Determine the value of $r$ , Pearson’s product-moment correlation coefficient, for these remaining responses.

[2]

c.ii.

State why the hypothesis test should be one-tailed.

[1]

d.i.

State the null and alternative hypotheses for this test.

[2]

d.ii.

The critical value for this test, at the $5\%$ significance level, is $0.549$ . Juliet assumes that the population is bivariate normal.

Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer.

[2]

d.iii.

Use Juliet’s data to find the value of $a$ and of $b$ .

[1]

e.i.

Interpret, referring to income and happiness, what the value of $a$ represents.

[1]

e.ii.

Find the value of $c$ , of $d$ and of $e$ .

[1]

e.iii.

Find the coefficient of determination for each of the two models she considers.

[2]

e.iv.

Hence compare the two models.

[1]

e.v.

Juliet decides to use the coefficient of determination to choose between these two models.

Comment on the validity of her decision.

[1]

e.vi.

State the name of the test which Juliet should use.

[1]

f.i.

State the null and alternative hypotheses for this test.

[1]

f.ii.

Perform the test, using a $5\%$ significance level, and state your conclusion in context.

[3]

f.iii.

A random variable $X$ has a distribution with mean $\mu$ and variance 4. A random sample of size 100 is to be taken from the distribution of $X$ .

Josie takes a different random sample of size 100 to test the null hypothesis that $\mu = 60$ against the alternative hypothesis that $\mu > 60$ at the 5% level.

State the central limit theorem as applied to a random sample of size $n$ , taken from a distribution with mean $\mu$ and variance ${\sigma ^2}$ .

[2]

Jack takes a random sample of size 100 and calculates that $\bar x = 60.2$ . Find an approximate 90% confidence interval for $\mu$ .

[2]

Find the critical region for Josie’s test, giving your answer correct to two decimal places.

[4]

c.i.

Write down the probability that Josie makes a Type I error.

[1]

c.ii.

Given that the probability that Josie makes a Type II error is 0.25, find the value of $\mu$ , giving your answer correct to three significant figures.

[5]

c.iii.
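
The three calculations for Jack and Josie all rest on the central limit theorem approximation $\bar X \sim {\text{N}}\left( {\mu ,\,\frac{4}{{100}}} \right)$. The sketch below uses that approximation throughout; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = sqrt(4)            # population standard deviation (variance is 4)
N = 100                    # sample size
std_error = SIGMA / sqrt(N)
Z = NormalDist()           # standard normal distribution

# Jack: approximate 90% confidence interval around x_bar = 60.2.
z90 = Z.inv_cdf(0.95)
ci = (60.2 - z90 * std_error, 60.2 + z90 * std_error)

# Josie: one-tailed test of mu = 60 against mu > 60 at the 5% level.
# H0 is rejected when the sample mean exceeds this critical value.
critical = 60 + Z.inv_cdf(0.95) * std_error

# Type II error of 0.25 means P(X_bar < critical | mu) = 0.25,
# so (critical - mu) / std_error equals the 25th percentile of Z.
mu_alt = critical - Z.inv_cdf(0.25) * std_error

print(f"90% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"Critical region: x_bar > {critical:.2f}")
print(f"mu giving P(Type II) = 0.25: {mu_alt:.3g}")
```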

A firm wishes to review its recruitment processes. This question considers the validity and reliability of the methods used.

Every year an accountancy firm recruits new employees for a trial period of one year from a large group of applicants.

At the start, all applicants are interviewed and given a rating. Those with a rating of either Excellent, Very good or Good are recruited for the trial period. At the end of this period, some of the new employees will stay with the firm.

It is decided to test how valid the interview rating is as a way of predicting which of the new employees will stay with the firm.

Data is collected and recorded in a contingency table.

The next year’s group of applicants are asked to complete a written assessment which is then analysed. From those recruited as new employees, a random sample of size $18$ is selected.

The sample is stratified by department. Of the $91$ new employees recruited that year, $55$ were placed in the national department and $36$ in the international department.

At the end of their first year, the level of performance of each of the $18$ employees in the sample is assessed by their department manager. They are awarded a score between $1$ (low performance) and $10$ (high performance).

The marks in the written assessment and the scores given by the managers are shown in both the table and the scatter diagram.

The firm decides to find a Spearman’s rank correlation coefficient, $r_{s}$ , for this data.

The same seven employees are given the written assessment a second time, at the end of the first year, to measure its reliability. Their marks are shown in the table below.

The written assessment is in five sections, numbered $1$ to $5$ . At the end of the year, the employees are also given a score for each of five professional attributes: $V, W, X, Y$ and $Z$ .

The firm decides to test the hypothesis that there is a correlation between the mark in a section and the score for an attribute.

They compare marks in each of the sections with scores for each of the attributes.

Use an appropriate test, at the $5\%$ significance level, to determine whether a new employee staying with the firm is independent of their interview rating. State the null and alternative hypotheses, the $p$ -value and the conclusion of the test.

[6]

Show that $11$ employees are selected for the sample from the national department.

[2]
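
The stratified allocation is a proportional share of the sample size. As a short check, assuming each department's share is rounded to the nearest whole employee:

$\frac{55}{91} \times 18 \approx 10.9 \Rightarrow 11$ (national), $\frac{36}{91} \times 18 \approx 7.1 \Rightarrow 7$ (international)

The two rounded values sum to the required sample size of $18$ .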

Without calculation, explain why it might not be appropriate to calculate a correlation coefficient for the whole sample of $18$ employees.

[2]

c.i.

Find $r_{s}$ for the seven employees working in the international department.

[4]

c.ii.

Hence comment on the validity of the written assessment as a measure of the level of performance of employees in this department. Justify your answer.

[2]

c.iii.

State the name of this type of test for reliability.

[1]

d.i.

For the data in this table, test the null hypothesis, $H_{0} : \rho = 0$ , against the alternative hypothesis, $H_{1} : \rho > 0$ , at the $5\%$ significance level. You may assume that all the requirements for carrying out the test have been met.

[4]

d.ii.

Hence comment on the reliability of the written assessment.

[1]

d.iii.

Write down the number of tests they carry out.

[1]

e.i.

The tests are performed at the $5\%$ significance level.

Assuming that:

there is no correlation between the marks in any of the sections and scores in any of the attributes,
the outcome of each hypothesis test is independent of the outcome of the other hypothesis tests,

find the probability that at least one of the tests will be significant.

[4]

e.ii.
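
Under the two stated assumptions, each test independently gives a (false) significant result with probability $0.05$ . A worked sketch, assuming every one of the five sections is compared with every one of the five attributes, giving $5 \times 5 = 25$ tests:

$P (\text{at least one significant}) = 1 - P (\text{none significant}) = 1 - 0.95^{25} \approx 0.723$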

The firm obtains a significant result when comparing section $2$ of the written assessment and attribute $X$ . Interpret this result.

[1]

e.iii.

The random variables $U,{\text{ }}V$ follow a bivariate normal distribution with product moment correlation coefficient $\rho$ .

A random sample of 12 observations on $U$ , $V$ is obtained to determine whether there is a correlation between $U$ and $V$ . The sample product moment correlation coefficient is denoted by $r$ . A test to determine whether or not $U$ , $V$ are independent is carried out at the 1% level of significance.

State suitable hypotheses to investigate whether or not $U$ , $V$ are independent.

[2]

Find the least value of $|r|$ for which the test concludes that $\rho \ne 0$ .

[6]
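
One route to the least value of $|r|$ uses the standard result that, under ${H_0}:\rho = 0$ , the statistic $t = r\sqrt {\frac{{n - 2}}{{1 - {r^2}}}}$ follows a $t$ -distribution with $n - 2$ degrees of freedom. The sketch below assumes a two-tailed test and takes the critical value ${t_{10}} \approx 3.169$ (upper 0.5% point) from tables or a calculator:

$|r|\sqrt {\frac{{10}}{{1 - {r^2}}}} \geqslant 3.169 \Rightarrow {r^2} \geqslant \frac{{{{3.169}^2}}}{{{{3.169}^2} + 10}} \Rightarrow |r| \geqslant \frac{{3.169}}{{\sqrt {{{3.169}^2} + 10} }} \approx 0.708$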

A farmer sells bags of potatoes which he states have a mean weight of 7 kg. An inspector, however, claims that the mean weight is less than 7 kg. In order to test this claim, the inspector takes a random sample of 12 of these bags and determines the weight, $x$ kg, of each bag. He finds that $\sum x = 83.64$ and $\sum {x^2} = 583.05$ . You may assume that the weights of the bags of potatoes can be modelled by the normal distribution ${\text{N}}(\mu ,{\text{ }}{\sigma ^2})$ .

State suitable hypotheses to test the inspector’s claim.

[1]

Find unbiased estimates of $\mu$ and ${\sigma ^2}$ .

[3]

Carry out an appropriate test and state the $p$ -value obtained.

[4]

c.i.

Using a 10% significance level and justifying your answer, state your conclusion in context.

[2]

c.ii.
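
The unbiased estimates and the test statistic follow directly from the summary totals. This is a minimal sketch, assuming a one-sample $t$ -test (the variance is estimated from the sample); the $p$ -value itself would still come from a $t$ -distribution with 11 degrees of freedom, for example on a calculator, and is not computed here.

```python
from math import sqrt

N = 12            # number of bags sampled
SUM_X = 83.64     # sum of the weights (kg)
SUM_X2 = 583.05   # sum of the squared weights (kg^2)
MU_0 = 7.0        # mean weight stated by the farmer (kg)

# Unbiased estimate of the mean.
mean = SUM_X / N

# Unbiased estimate of the variance: divide by n - 1, not n.
var_unbiased = (SUM_X2 - SUM_X**2 / N) / (N - 1)

# t-statistic for H0: mu = 7 against H1: mu < 7 (n - 1 degrees of freedom).
t_stat = (mean - MU_0) / sqrt(var_unbiased / N)

print(f"mean = {mean:.2f}, unbiased variance = {var_unbiased:.4f}")
print(f"t = {t_stat:.3f} with {N - 1} degrees of freedom")
```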

Two IB schools, A and B, follow the IB Diploma Programme but have different teaching methods. A research group tested whether the different teaching methods lead to a similar final result.

For the test, a group of eight students were randomly selected from each school. Both samples were given a standardized test at the start of the course and a prediction for total IB points was made based on that test; this was then compared to their points total at the end of the course.

Previous results indicate that both the predictions from the standardized tests and the final IB points can be modelled by a normal distribution.

It can be assumed that:

the standardized test is a valid method for predicting the final IB points
variations from the prediction can be explained through the circumstances of the student or school.

The data for school A is shown in the following table.

For each student, the change from the predicted points to the final points $\left( {f - p} \right)$ was calculated.

The data for school B is shown in the following table.

School A also gives each student a score for effort in each subject. This effort score is based on a scale of 1 to 5 where 5 is regarded as outstanding effort.

It is claimed that the effort put in by a student is an important factor in improving upon their predicted IB points.

A mathematics teacher in school A claims that the comparison between the two schools is not valid because the sample for school B contained mainly girls and that for school A, mainly boys. She believes that girls are likely to show a greater improvement from their predicted points to their final points.

She collects more data from other schools, asking them to class their results into four categories as shown in the following table.

Identify a test that might have been used to verify the null hypothesis that the predictions from the standardized test can be modelled by a normal distribution.

[1]

State why comparing only the final IB points of the students from the two schools would not be a valid test for the effectiveness of the two different teaching methods.

[1]

Find the mean change.

[1]

c.i.

Find the standard deviation of the changes.

[2]

c.ii.

Use a paired $t$ -test to determine whether there is significant evidence that the students in school A have improved their IB points since the start of the course.

[4]

Use an appropriate test to determine whether there is evidence, at the 5% significance level, that the students in school B have improved more than those in school A.

[5]

e.i.

State why it was important to test that both sets of points were normally distributed.

[1]

e.ii.

Perform a test on the data from school A to show it is reasonable to assume a linear relationship between effort scores and improvements in IB points. You may assume effort scores follow a normal distribution.

[3]

f.i.

Hence, find the expected improvement between predicted and final points for an increase of one unit in effort grades, giving your answer to one decimal place.

[1]

f.ii.

Use an appropriate test to determine whether showing an improvement is independent of gender.

[6]

If you were to repeat the test performed in part (e) intending to compare the quality of the teaching between the two schools, suggest two ways in which you might choose your sample to improve the validity of the test.

[2]

The weights, X kg, of the males of a species of bird may be assumed to be normally distributed with mean 4.8 kg and standard deviation 0.2 kg.

The weights, Y kg, of female birds of the same species may be assumed to be normally distributed with mean 2.7 kg and standard deviation 0.15 kg.

Find the probability that a randomly chosen male bird weighs between 4.75 kg and 4.85 kg.

[1]

Find the probability that the weight of a randomly chosen male bird is more than twice the weight of a randomly chosen female bird.

[6]

Two randomly chosen male birds and three randomly chosen female birds are placed on a weighing machine that has a weight limit of 18 kg. Find the probability that the total weight of these five birds is greater than the weight limit.

[4]
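
All three probabilities come from normal distributions of linear combinations. The sketch below assumes the weights of different birds are independent, so that means add and variances add (with coefficients squared); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MALE_MEAN, MALE_SD = 4.8, 0.2        # X ~ N(4.8, 0.2^2)
FEMALE_MEAN, FEMALE_SD = 2.7, 0.15   # Y ~ N(2.7, 0.15^2)

# P(4.75 < X < 4.85) for a single male bird.
male = NormalDist(MALE_MEAN, MALE_SD)
p_between = male.cdf(4.85) - male.cdf(4.75)

# P(X > 2Y): consider D = X - 2Y, with Var(D) = Var(X) + 2^2 Var(Y).
diff = NormalDist(
    MALE_MEAN - 2 * FEMALE_MEAN,
    sqrt(MALE_SD**2 + 2**2 * FEMALE_SD**2),
)
p_more_than_twice = 1 - diff.cdf(0)

# Total of two males and three females: a SUM of five independent weights,
# so Var = 2 Var(X) + 3 Var(Y) (not 2^2 and 3^2).
total = NormalDist(
    2 * MALE_MEAN + 3 * FEMALE_MEAN,
    sqrt(2 * MALE_SD**2 + 3 * FEMALE_SD**2),
)
p_over_limit = 1 - total.cdf(18)

print(f"P(4.75 < X < 4.85) = {p_between:.3f}")
print(f"P(X > 2Y)          = {p_more_than_twice:.4f}")
print(f"P(total > 18 kg)   = {p_over_limit:.3f}")
```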

Mr Sailor owns a fish farm and he claims that the weights of the fish in one of his lakes have a mean of 550 grams and standard deviation of 8 grams.

Assume that the weights of the fish are normally distributed and that Mr Sailor’s claim is true.

Kathy is suspicious of Mr Sailor’s claim about the mean and standard deviation of the weights of the fish. She collects a random sample of fish from this lake whose weights are shown in the following table.

Using these data, test at the 5% significance level the null hypothesis ${H_0}\,{\text{:}}\,\mu = 550$ against the alternative hypothesis ${H_1}\,{\text{:}}\,\mu < 550$ , where $\mu$ grams is the population mean weight.

Kathy decides to use the same fish sample to test at the 5% significance level whether or not there is a positive association between the weights and the lengths of the fish in the lake. The following table shows the lengths of the fish in the sample. The lengths of the fish can be assumed to be normally distributed.

Find the probability that a fish from this lake will have a weight of more than 560 grams.

[2]

a.i.

The maximum weight a hand net can hold is 6 kg. Find the probability that a catch of 11 fish can be carried in the hand net.

[4]

a.ii.
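
Both probabilities in part (a) can be checked with a normal model. A minimal sketch, assuming Mr Sailor's claim is true and that the weights of the 11 fish in a catch are independent; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 550, 8          # claimed mean and standard deviation (grams)

# P(a single fish weighs more than 560 grams).
fish = NormalDist(MEAN, SD)
p_heavy = 1 - fish.cdf(560)

# Total weight of 11 independent fish: mean 11 * 550, variance 11 * 8^2.
CATCH_SIZE = 11
NET_LIMIT = 6000           # 6 kg expressed in grams
catch = NormalDist(CATCH_SIZE * MEAN, SD * sqrt(CATCH_SIZE))
p_fits_in_net = catch.cdf(NET_LIMIT)

print(f"P(weight > 560 g)          = {p_heavy:.3f}")
print(f"P(11 fish weigh <= 6 kg)   = {p_fits_in_net:.4f}")
```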

State the distribution of your test statistic, including the parameter.

[2]

b.i.

Find the p-value for the test.

[2]

b.ii.

State the conclusion of the test, justifying your answer.

[2]

b.iii.

State suitable hypotheses for the test.

[1]

c.i.

Find the product-moment correlation coefficient $r$ .

[2]

c.ii.

State the p-value and interpret it in this context.

[3]

c.iii.

Use an appropriate regression line to estimate the weight of a fish with length 360 mm.

[3]

The times $t$ , in minutes, taken by a random sample of 75 workers of a company to travel to work can be summarized as follows

$\sum t = 2165$ , $\sum {t^2} = 76475$ .

Let $T$ be the random variable that represents the time taken to travel to work by a worker of this company.

Find unbiased estimates of the mean of $T$ .

[1]

a.i.

Find unbiased estimates of the variance of $T$ .

[2]

a.ii.
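
As a check on part (a), the unbiased estimates follow from the summary totals with $n = 75$ , using the divisor $n - 1$ for the variance:

$\bar t = \frac{{2165}}{{75}} \approx 28.9$

$s_{n - 1}^2 = \frac{1}{{74}}\left( {76475 - \frac{{{{2165}^2}}}{{75}}} \right) \approx 189$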

Assuming that $T$ is normally distributed, find

(i) the 90% confidence interval for the mean time taken to travel to work by the workers of this company,

(ii) the 95% confidence interval for the mean time taken to travel to work by the workers of this company.

[3]

Before seeing these results the managing director believed that the mean time was 26 minutes.

Explain whether your answers to part (b) support her belief.

[3]

Anne is a farmer who grows and sells pumpkins. Interested in the weights of pumpkins produced, she records the weights of eight pumpkins and obtains the following results in kilograms.

${\text{7.7}}\quad {\text{7.5}}\quad {\text{8.4}}\quad {\text{8.8}}\quad {\text{7.3}}\quad {\text{9.0}}\quad {\text{7.8}}\quad {\text{7.6}}$

Assume that these weights form a random sample from a $N(\mu ,{\text{ }}{\sigma ^2})$ distribution.

Anne claims that the mean pumpkin weight is 7.5 kilograms. In order to test this claim, she sets up the null hypothesis ${{\text{H}}_0}:\mu = 7.5$ .

Determine unbiased estimates for $\mu$ and ${\sigma ^2}$ .

[3]

Use a two-tailed test to determine the $p$ -value for the above results.

[3]

b.i.

Interpret your $p$ -value at the 5% level of significance, justifying your conclusion.

[2]

b.ii.
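
The estimates, the test statistic and the two-tailed $p$ -value can be reproduced from the eight recorded weights. The sketch assumes a one-sample $t$ -test with 7 degrees of freedom; because the Python standard library has no $t$ -distribution, the $p$ -value is approximated here by integrating the $t$ density with Simpson's rule (the helpers `t_pdf` and `simpson` are illustrative, not part of the question).

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

weights = [7.7, 7.5, 8.4, 8.8, 7.3, 9.0, 7.8, 7.6]  # kilograms
MU_0 = 7.5                                          # mean claimed under H0

n = len(weights)
x_bar = mean(weights)        # unbiased estimate of mu
s2 = variance(weights)       # unbiased estimate of sigma^2 (divisor n - 1)
t_stat = (x_bar - MU_0) / sqrt(s2 / n)
dof = n - 1

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, intervals=1000):
    """Composite Simpson's rule on [a, b]; intervals must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Two-tailed p-value: probability mass outside (-|t|, |t|).
central_area = simpson(lambda x: t_pdf(x, dof), -abs(t_stat), abs(t_stat))
p_value = 1 - central_area

print(f"x_bar = {x_bar:.4f}, s^2 = {s2:.4f}, t = {t_stat:.3f}")
print(f"two-tailed p-value = {p_value:.4f}")
```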

A shop sells carrots and broccoli. The weights of carrots can be modelled by a normal distribution with variance $25$ grams² and the weights of broccoli can be modelled by a normal distribution with variance $80$ grams². The shopkeeper claims that the mean weight of carrots is $130$ grams and the mean weight of broccoli is $400$ grams.

Dong Wook decides to investigate the shopkeeper’s claim that the mean weight of carrots is $130$ grams. He plans to take a random sample of $n$ carrots in order to calculate a $98\%$ confidence interval for the population mean weight.

Anjali thinks the mean weight, $\mu$ grams, of the broccoli is less than $400$ grams. She decides to perform a hypothesis test, using a random sample of size $8$ . Her hypotheses are

$H_{0} : \mu = 400; H_{1} : \mu < 400$ .

She decides to reject $H_{0}$ if the sample mean is less than $395$ grams.

Assuming that the shopkeeper’s claim is correct, find the probability that the total weight of six randomly chosen carrots is more than twice the weight of one randomly chosen broccoli.

[6]

Find the least value of $n$ required to ensure that the width of the confidence interval is less than $2$ grams.

[3]

Find the significance level for this test.

[3]

Given that the weights of the broccoli actually follow a normal distribution with mean $392$ grams and variance $80{\text{ grams}}^2$, find the probability of Anjali making a Type II error.

[3]
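The four calculations in the carrots and broccoli question all reduce to normal probabilities. The sketch below is a hedged illustration using `statistics.NormalDist`; it assumes that the six carrot weights are independent of each other and of the broccoli weight, and that the sample mean of 8 broccoli has variance $80/8$. Variable names are illustrative.

```python
from math import floor, sqrt
from statistics import NormalDist

# Six carrots versus twice one broccoli: D = C1 + ... + C6 - 2B.
mean_diff = 6 * 130 - 2 * 400        # -20
var_diff = 6 * 25 + 2**2 * 80        # 150 + 320 = 470
p_carrots_heavier = 1 - NormalDist(mean_diff, sqrt(var_diff)).cdf(0)

# 98% confidence interval for the carrot mean, sigma = 5 known.
# Width = 2 * z * sigma / sqrt(n) < 2  =>  n > (z * sigma)^2.
z98 = NormalDist().inv_cdf(0.99)
n_min = floor((z98 * 5) ** 2) + 1    # least integer strictly above the bound

# Anjali's test: reject H0 when the mean of 8 broccoli is below 395.
sample_mean_h0 = NormalDist(400, sqrt(80 / 8))
significance_level = sample_mean_h0.cdf(395)

# Type II error: H0 is not rejected although the true mean is 392.
sample_mean_true = NormalDist(392, sqrt(80 / 8))
p_type_two = 1 - sample_mean_true.cdf(395)

print(f"P(six carrots > 2 broccoli) = {p_carrots_heavier:.3f}")
print(f"least n = {n_min}")
print(f"significance level = {significance_level:.4f}")
print(f"P(Type II error) = {p_type_two:.3f}")
```

Note the variance of $2B$ is $2^2 \times 80$, whereas the variance of the sum of six independent carrots is $6 \times 25$; confusing the two is the usual source of error here.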

Two independent random variables $X$ and $Y$ follow Poisson distributions.

Given that ${\text{E}}\left( X \right) = 3$ and ${\text{E}}\left( Y \right) = 4$ , calculate

${\text{E}}\left( {2X + 7Y} \right)$ .

[2]

${\text{Var}}\left( {4X - 3Y} \right)$.

[3]

${\text{E}}\left( {{X^2} - {Y^2}} \right)$ .

[4]
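The three Poisson results follow from linearity of expectation, the rule ${\text{Var}}(aX + bY) = a^2{\text{Var}}(X) + b^2{\text{Var}}(Y)$ for independent variables, and the identity ${\text{E}}(X^2) = {\text{Var}}(X) + {\text{E}}(X)^2$. A short Python sketch of that arithmetic, with illustrative variable names:

```python
mean_x, mean_y = 3, 4

# For a Poisson distribution the variance equals the mean.
var_x, var_y = mean_x, mean_y

# E(2X + 7Y): expectation is linear.
e_linear = 2 * mean_x + 7 * mean_y

# Var(4X - 3Y): coefficients are squared, so the variances add
# even though Y is subtracted (X and Y are independent).
var_linear = 4**2 * var_x + (-3) ** 2 * var_y

# E(X^2 - Y^2), using E(X^2) = Var(X) + E(X)^2.
e_x2 = var_x + mean_x**2
e_y2 = var_y + mean_y**2
e_square_diff = e_x2 - e_y2

print(e_linear, var_linear, e_square_diff)
```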

This question is about modelling the spread of a computer virus to predict the number of computers in a city which will be infected by the virus.

A systems analyst defines the following variables in a model:

$t$ is the number of days since the first computer was infected by the virus.
$Q (t)$ is the total number of computers that have been infected up to and including day $t$ .

The following data were collected:

A model for the early stage of the spread of the computer virus suggests that

$Q'(t) = \beta N Q(t)$

where $N$ is the total number of computers in a city and $\beta$ is a measure of how easily the virus is spreading between computers. Both $N$ and $\beta$ are assumed to be constant.

The data above are taken from city X which is estimated to have $2.6$ million computers.
The analyst looks at data for another city, Y. These data indicate a value of $\beta = 9.64 \times 10^{-8}$.

An estimate for $Q' (t), t \geq 5$ , can be found by using the formula:

$Q' (t) \approx \frac{Q (t + 5) - Q (t - 5)}{10}$ .

The following table shows estimates of $Q' (t)$ for city X at different values of $t$ .

An improved model for $Q (t)$ , which is valid for large values of $t$ , is the logistic differential equation

$Q'(t) = k Q(t) \left( 1 - \frac{Q(t)}{L} \right)$

where $k$ and $L$ are constants.

Based on this differential equation, the graph of $\frac{Q' (t)}{Q (t)}$ against $Q (t)$ is predicted to be a straight line.

Find the equation of the regression line of $Q (t)$ on $t$ .

[2]

a.i.

Write down the value of $r$ , Pearson’s product-moment correlation coefficient.

[1]

a.ii.

Explain why it would not be appropriate to conduct a hypothesis test on the value of $r$ found in (a)(ii).

[1]

a.iii.

Find the general solution of the differential equation $Q'(t) = \beta N Q(t)$.

[4]

b.i.

Using the data in the table, write down the equation for an appropriate non-linear regression model.

[2]

b.ii.

Write down the value of $R^{2}$ for this model.

[1]

b.iii.

Hence comment on the suitability of the model from (b)(ii) in comparison with the linear model found in part (a).

[2]

b.iv.

By considering large values of $t$, write down one criticism of the model found in (b)(ii).

[1]

b.v.

Use your answer from part (b)(ii) to estimate the time taken for the number of infected computers to double.

[2]

Find in which city, X or Y, the computer virus is spreading more easily. Justify your answer using your results from part (b).

[3]

Determine the value of $a$ and of $b$ . Give your answers correct to one decimal place.

[2]

Use linear regression to estimate the value of $k$ and of $L$ .

[5]

f.i.

The solution to the differential equation is given by

$Q (t) = \frac{L}{1 + C e^{- k t}}$

where $C$ is a constant.

Using your answer to part (f)(i), estimate the percentage of computers in city X that are expected to have been infected by the virus over a long period of time.

[2]

f.ii.
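Dividing the logistic equation by $Q(t)$ gives $\frac{Q'(t)}{Q(t)} = k - \frac{k}{L}Q(t)$, so a regression line of $\frac{Q'(t)}{Q(t)}$ on $Q(t)$ has intercept $k$ and gradient $-\frac{k}{L}$. The sketch below illustrates that recovery step in Python. Because the data tables are not reproduced here, it uses synthetic values generated from assumed parameters (`K_TRUE`, `L_TRUE` and `C_TRUE` are hypothetical and are not the answers to this question), and it uses exact derivatives where the question uses central-difference estimates. The helper `doubling_time` reflects the exponential model $Q = A{e^{\beta N t}}$, whose doubling time is $\frac{\ln 2}{\beta N}$.

```python
from math import exp, log


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def logistic_parameters(q_values, q_prime_values):
    """Estimate k and L from the line Q'/Q = k - (k / L) * Q."""
    ratios = [qp / q for q, qp in zip(q_values, q_prime_values)]
    slope, intercept = fit_line(q_values, ratios)
    k = intercept
    return k, -k / slope


def doubling_time(growth_rate):
    """Doubling time for exponential growth Q = A * exp(growth_rate * t)."""
    return log(2) / growth_rate


# Synthetic check with hypothetical parameters (NOT the question's data).
K_TRUE, L_TRUE, C_TRUE = 0.2, 1000.0, 50.0
ts = list(range(5, 50, 5))
qs = [L_TRUE / (1 + C_TRUE * exp(-K_TRUE * t)) for t in ts]
q_primes = [K_TRUE * q * (1 - q / L_TRUE) for q in qs]

k_est, l_est = logistic_parameters(qs, q_primes)
print(f"k = {k_est:.3f}, L = {l_est:.1f}")
```

With the real table, the estimate of $L$ divided by the 2.6 million computers in city X gives the long-run proportion infected.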

In a large population of hens, the weight of a hen is normally distributed with mean $\mu$ kg and standard deviation $\sigma$ kg. A random sample of 100 hens is taken from the population.

The mean weight for the sample is denoted by $\bar X$ .

The sample values are summarized by $\sum x = 199.8$ and $\sum x^2 = 407.8$ where $x$ kg is the weight of a hen.

It is found that $\sigma = 0.27$. It is decided to test, at the 1% level of significance, the null hypothesis $\mu = 1.95$ against the alternative hypothesis $\mu > 1.95$.

State the distribution of $\bar X$ giving its mean and variance.

[1]

Find an unbiased estimate for $\mu$ .

[1]

Find an unbiased estimate for ${\sigma ^2}$ .

[2]

Find a 90% confidence interval for $\mu$.

[3]

Find the $p$ -value for the test.

[2]

e.i.

Write down the conclusion reached.

[1]

e.ii.
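The hen calculations can be checked with a few lines of Python. This sketch assumes that the confidence interval and the test both use the stated value $\sigma = 0.27$ (a $z$ interval and a one-tailed $z$ test), rather than the estimate of $\sigma^2$ from the sample; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 100
sum_x = 199.8
sum_x2 = 407.8
sigma = 0.27   # population standard deviation, as stated in the question
mu0 = 1.95     # mean under the null hypothesis

x_bar = sum_x / n                          # unbiased estimate of mu
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)     # unbiased estimate of sigma^2

# 90% confidence interval with known sigma.
std_error = sigma / sqrt(n)
z90 = NormalDist().inv_cdf(0.95)
ci = (x_bar - z90 * std_error, x_bar + z90 * std_error)

# One-tailed test of mu = 1.95 against mu > 1.95.
z_stat = (x_bar - mu0) / std_error
p_value = 1 - NormalDist().cdf(z_stat)
reject_h0 = p_value < 0.01

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.4f}")
print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p = {p_value:.4f}, reject H0 at 1%: {reject_h0}")
```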

John rings a church bell 120 times. The time interval, ${T_i}$, between two successive rings is a random variable with a mean of 2 seconds and a variance of $\frac{1}{9}{\text{ seconds}}^2$.

Each time interval, ${T_i}$ , is independent of the other time intervals. Let $X = \sum\limits_{i = 1}^{119} {{T_i}}$ be the total time between the first ring and the last ring.

The church vicar subsequently becomes suspicious that John has stopped coming to ring the bell and that he is letting his friend Ray do it. When Ray rings the bell the time interval, ${T_i}$, has a mean of 2 seconds and a variance of $\frac{1}{{25}}{\text{ seconds}}^2$.

The church vicar makes the following hypotheses:

${H_0}$ : Ray is ringing the bell; ${H_1}$ : John is ringing the bell.

He records four values of $X$ . He decides on the following decision rule:

If $236 \leqslant X \leqslant 240$ for all four values of $X$ he accepts ${H_0}$ , otherwise he accepts ${H_1}$ .

Find

(i) ${\text{E}}(X)$ ;

(ii) ${\text{Var}}(X)$ .

[3]

Explain why a normal distribution can be used to give an approximate model for $X$ .

[2]

Use this model to find the values of $A$ and $B$ such that ${\text{P}}(A < X < B) = 0.9$ , where $A$ and $B$ are symmetrical about the mean of $X$ .

[7]

Calculate the probability that he makes a Type II error.

[5]
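The bell-ringing parts rest on the normal approximation $X \sim N\left(119 \times 2, \frac{119}{9}\right)$ when John rings. A minimal Python sketch of the symmetric interval and the Type II error follows; it assumes the four recorded values of $X$ are independent, and treats a Type II error as accepting ${H_0}$ (Ray) when John is in fact ringing. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_intervals = 119                 # 120 rings give 119 intervals
mean_x = n_intervals * 2          # E(X) = 238
var_john = n_intervals / 9        # Var(X) when John rings

# Symmetric interval with P(A < X < B) = 0.9 under John's distribution.
z = NormalDist().inv_cdf(0.95)
half_width = z * sqrt(var_john)
A, B = mean_x - half_width, mean_x + half_width

# Type II error: all four values fall in [236, 240] although John rings.
john = NormalDist(mean_x, sqrt(var_john))
p_single = john.cdf(240) - john.cdf(236)
p_type_two = p_single**4

print(f"E(X) = {mean_x}, Var(X) = {var_john:.2f}")
print(f"A = {A:.1f}, B = {B:.1f}")
print(f"P(Type II error) = {p_type_two:.4f}")
```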

An estate manager is responsible for stocking a small lake with fish. He begins by introducing $1000$ fish into the lake and monitors their population growth to determine the likely carrying capacity of the lake.

After one year an accurate assessment of the number of fish in the lake is taken and it is found to be $1200$ .

Let $N$ be the number of fish $t$ years after the fish have been introduced to the lake.

Initially it is assumed that the rate of increase of $N$ will be constant.

When $t = 8$ the estate manager again decides to estimate the number of fish in the lake. To do this he first catches $300$ fish and marks them, so they can be recognized if caught again. These fish are then released back into the lake. A few days later he catches another $300$ fish, releasing each fish after it has been checked, and finds $45$ of them are marked.

Let $X$ be the number of marked fish caught in the second sample, where $X$ is considered to be distributed as ${\text{B}}(n, p)$. Assume the number of fish in the lake is $2000$.

The estate manager decides that he needs bounds for the total number of fish in the lake.

The estate manager feels confident that the proportion of marked fish in the lake will be within $1.5$ standard deviations of the proportion of marked fish in the sample and decides these will form the upper and lower bounds of his estimate.

The estate manager now believes the population of fish will follow the logistic model $N (t) = \frac{L}{1 + C e^{- k t}}$ where $L$ is the carrying capacity and $C, k > 0$ .

The estate manager would like to know if the population of fish in the lake will eventually reach $5000$ .

Use the constant-rate (linear) model to predict the number of fish in the lake when $t = 8$.

[2]

Assuming the proportion of marked fish in the second sample is equal to the proportion of marked fish in the lake, show that the estate manager will estimate there are now $2000$ fish in the lake.

[2]

Write down the value of $n$ and the value of $p$ .

[2]

c.i.

State an assumption that is being made for $X$ to be considered as following a binomial distribution.

[1]

c.ii.

Show that an estimate for ${\text{Var}}(X)$ is $38.25$.

[2]

d.i.

Hence show that the variance of the proportion of marked fish in the sample, ${\text{Var}}\left( \frac{X}{300} \right)$, is $0.000425$.

[2]

d.ii.

Taking the value for the variance given in (d)(ii) as a good approximation for the true variance, find the upper and lower bounds for the proportion of marked fish in the lake.

[2]

e.i.

Hence find upper and lower bounds for the number of fish in the lake when $t = 8$ .

[2]

e.ii.
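The capture and recapture reasoning in the parts above can be traced numerically. The following Python sketch is an illustration under the question's own assumptions (binomial $X$ with $n = 300$, $p = 0.15$, and bounds at 1.5 standard deviations); the linear prediction assumes a constant increase of 200 fish per year from the first two counts. Variable names are illustrative.

```python
from math import sqrt

marked = 300          # fish marked and released
second_sample = 300   # size of the second catch
recaptured = 45       # marked fish found in the second catch

# Proportion marked, and the implied population size.
p_hat = recaptured / second_sample       # 0.15
n_estimate = marked / p_hat              # about 2000

# Binomial variance of X and of the sample proportion X / 300.
var_x = second_sample * p_hat * (1 - p_hat)
var_prop = var_x / second_sample**2

# Bounds for the proportion at 1.5 standard deviations.
sd_prop = sqrt(var_prop)
p_low, p_high = p_hat - 1.5 * sd_prop, p_hat + 1.5 * sd_prop

# A larger marked proportion implies fewer fish, so the bounds swap.
n_low, n_high = marked / p_high, marked / p_low

# Constant-rate model: 1000 fish at t = 0 and 1200 at t = 1.
linear_prediction = 1000 + (1200 - 1000) * 8

print(f"Var(X) = {var_x:.2f}, Var(X/300) = {var_prop:.6f}")
print(f"proportion between {p_low:.3f} and {p_high:.3f}")
print(f"population between {n_low:.0f} and {n_high:.0f}")
print(f"linear model predicts {linear_prediction}")
```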

Given this result, comment on the validity of the linear model used in part (a).

[2]

Assuming a carrying capacity of $5000$, use the given values of $N (0)$ and $N (1)$ to calculate the parameters $C$ and $k$.

[5]

g.i.

Use these parameters to calculate the value of $N (8)$ predicted by this model.

[2]

g.ii.

Comment on the likelihood of the fish population reaching $5000$ .

[2]
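For the logistic model with $L = 5000$, substituting $N(0) = 1000$ gives $1 + C = 5$, and substituting $N(1) = 1200$ then fixes $k$. A short Python sketch of those two substitutions and the resulting prediction for $t = 8$, with illustrative names:

```python
from math import exp, log

L = 5000     # assumed carrying capacity
n0 = 1000    # N(0)
n1 = 1200    # N(1)

# N(0) = L / (1 + C)  =>  C = L / N(0) - 1.
C = L / n0 - 1

# N(1) = L / (1 + C * exp(-k))  =>  exp(-k) = (L / N(1) - 1) / C.
k = -log((L / n1 - 1) / C)


def population(t):
    """Logistic model N(t) = L / (1 + C * exp(-k * t))."""
    return L / (1 + C * exp(-k * t))


n8 = population(8)
print(f"C = {C:.0f}, k = {k:.4f}, N(8) = {n8:.0f}")
```

The predicted value can then be compared with the bounds found from the marked-fish sample when judging whether a carrying capacity of $5000$ is plausible.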

Peter, the Principal of a college, believes that there is an association between the score in a Mathematics test, $X$ , and the time taken to run 500 m, $Y$ seconds, of his students. The following paired data are collected.

It can be assumed that $(X, Y)$ follow a bivariate normal distribution with product moment correlation coefficient $\rho$.

State suitable hypotheses ${H_0}$ and ${H_1}$ to test Peter’s claim, using a two-tailed test.

[1]

a.i.

Carry out a suitable test at the 5% significance level. With reference to the $p$-value, state your conclusion in the context of Peter’s claim.

[4]

a.ii.

Peter uses the regression line of $y$ on $x$ as $y = 0.248x + 83.0$ and calculates that a student with a Mathematics test score of 73 will have a running time of 101 seconds. Comment on the validity of his calculation.

[2]

Employees answer the telephone in a customer relations department. The time taken for an employee to deal with a customer is a random variable which can be modelled by a normal distribution with mean 150 seconds and standard deviation 45 seconds.

Find the probability that the time taken for a randomly chosen customer to be dealt with by an employee is greater than 180 seconds.

[2]

Find the probability that the time taken by an employee to deal with a queue of three customers is less than nine minutes.

[4]

At the start of the day, one employee, Amanda, has a queue of four customers. A second employee, Brian, has a queue of three customers. You may assume they work independently.

Find the probability that Amanda’s queue will be dealt with before Brian’s queue.

[6]
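Each part of the telephone question is a normal probability for a single time, a sum of times, or a difference of two sums. The Python sketch below assumes all service times are independent, so variances add; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 150, 45   # seconds, for one customer

# One customer taking longer than 180 seconds.
single = NormalDist(MEAN, SD)
p_over_180 = 1 - single.cdf(180)

# Three customers in under nine minutes (540 seconds):
# the sum has mean 3 * 150 and variance 3 * 45^2.
three = NormalDist(3 * MEAN, SD * sqrt(3))
p_three_under_540 = three.cdf(540)

# Amanda (four customers) finishes before Brian (three customers):
# D = A - B has mean 600 - 450 and variance (4 + 3) * 45^2.
difference = NormalDist(4 * MEAN - 3 * MEAN, SD * sqrt(4 + 3))
p_amanda_first = difference.cdf(0)

print(f"P(T > 180) = {p_over_180:.3f}")
print(f"P(three customers < 540 s) = {p_three_under_540:.3f}")
print(f"P(Amanda finishes first) = {p_amanda_first:.3f}")
```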

This question compares possible designs for a new computer network between multiple school buildings, and whether they meet specific requirements.

A school’s administration team decides to install new fibre-optic internet cables underground. The school has eight buildings that need to be connected by these cables. A map of the school is shown below, with the internet access point of each building labelled $A$–$H$.

Jonas is planning where to install the underground cables. He begins by determining the distances, in metres, between the underground access points in each of the buildings.

He finds $AD = 89.2{\text{ m}}$, $DF = 104.9{\text{ m}}$ and $A \hat{D} F = 83^\circ$.

The cost for installing the cable directly between $A$ and $F$ is $\$21\,310$.

Jonas estimates that it will cost $\$110$ per metre to install the cables between all the other buildings.

Jonas creates the following graph, $S$ , using the cost of installing the cables between two buildings as the weight of each edge.

The computer network could be designed such that each building is directly connected to at least one other building and hence all buildings are indirectly connected.

The computer network fails if any part of it becomes unreachable from any other part. To help protect the network from failing, every building could be connected to at least two other buildings. In this way if one connection breaks, the building is still part of the computer network. Jonas can achieve this by finding a Hamiltonian cycle within the graph.

After more research, Jonas decides to install the cables as shown in the diagram below.

Each individual cable is installed such that each end of the cable is connected to a building’s access point. The connection between each end of a cable and an access point has a $1.4\%$ probability of failing after a power surge.

For the network to be successful, each building in the network must be able to communicate with every other building in the network. In other words, there must be a path that connects any two buildings in the network. Jonas would like the network to have less than a $2\%$ probability of failing to operate after a power surge.

Find $AF$ .

[3]

Find the cost per metre of installing this cable.

[2]
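The length $AF$ follows from the cosine rule in triangle $ADF$, $AF^2 = AD^2 + DF^2 - 2(AD)(DF)\cos A\hat{D}F$, and the cost per metre is the quoted total divided by that length. A minimal Python sketch of both steps, with illustrative variable names:

```python
from math import cos, radians, sqrt

ad = 89.2                 # metres
df = 104.9                # metres
angle_adf = radians(83)   # convert degrees to radians for math.cos

# Cosine rule for the side opposite the known angle.
af = sqrt(ad**2 + df**2 - 2 * ad * df * cos(angle_adf))

# Total quoted cost for the A-F cable divided by its length.
cost_per_metre = 21310 / af

print(f"AF = {af:.1f} m, cost per metre = ${cost_per_metre:.0f}")
```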

State why the cost for installing the cable between $A$ and $F$ would be higher than between the other buildings.

[1]

By using Kruskal’s algorithm, find the minimum spanning tree for $S$ , showing clearly the order in which edges are added.

[3]

d.i.

Hence find the minimum installation cost for the cables that would allow all the buildings to be part of the computer network.

[2]

d.ii.

State why a path that forms a Hamiltonian cycle does not always form an Eulerian circuit.

[1]

Starting at $D$ , use the nearest neighbour algorithm to find the upper bound for the installation cost of a computer network in the form of a Hamiltonian cycle.

Note: Although the graph is not complete, in this instance it is not necessary to form a table of least distances.

[5]

By deleting $D$ , use the deleted vertex algorithm to find the lower bound for the installation cost of the cycle.

[6]

Show that Jonas’s network satisfies the requirement of there being less than a $2\%$ probability of the network failing after a power surge.

[5]