1 Monday 5/20 - Intro statistics

(due on Tuesday 5/21)

You are given the data below.

24	29	25	27	27	25	27	28	28	26
30	28	26	27	30	27	29	26	30	25
25	25	23	27	26	29	25	25	33	28
25	27	29	30	35	26	30	27	29	28
23	28	26	29	26	26	26	27	29	26
28	26	25	27	30	25	27	28	29	25
31	27	31	27	31	27	33	31	26	27
31	25	26	25	29	28	23	26	28	26
27	26	25	26	28	26	27	26	28	27
28	24	34	27	26	25	27	25	26	26

Import the data into R:
1. Paste or type the following commands into the R command line console, but do not execute them yet.
```
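d <- read.table("clipboard")      # works on Windows; on macOS use read.table(pipe("pbpaste"))
x <- as.numeric(as.matrix(d))     # flatten the 10 x 10 table into one vector of 100 values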
```
2. Highlight and copy the data into your computer’s clipboard.
3. Execute the above R commands.
Plot a boxplot. Identify all parts of the boxplot.
Plot a histogram. Be sure to use reasonable bins for your histogram.
Describe the skewness or symmetry of the dataset. Are there any outliers?
What proportion of the data are less than or equal to 30?
Plot the empirical cdf with plot(ecdf(x)).
Find the \(5^{th}\), \(10^{th}\), \(20^{th}\), and \(95^{th}\) percentiles (quantiles). Find them three ways: first by sorting the data with sort(), then by reading them off the empirical cdf plotted above, and finally with quantile().
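
If the clipboard route gives you trouble, another option is to type the 100 values directly into R with scan(text = ...). The sketch below does that and then runs the calls the questions above ask for, so you can check your work. The bin edges passed to hist() are an assumption: one bin centered on each integer value, which is one reasonable choice for integer data and not the only one.

```r
# Read the 100 values from the table above directly (no clipboard needed)
x <- scan(text = "
24 29 25 27 27 25 27 28 28 26
30 28 26 27 30 27 29 26 30 25
25 25 23 27 26 29 25 25 33 28
25 27 29 30 35 26 30 27 29 28
23 28 26 29 26 26 26 27 29 26
28 26 25 27 30 25 27 28 29 25
31 27 31 27 31 27 33 31 26 27
31 25 26 25 29 28 23 26 28 26
27 26 25 26 28 26 27 26 28 27
28 24 34 27 26 25 27 25 26 26", quiet = TRUE)

boxplot(x, horizontal = TRUE)              # box = hinges (quartiles), bar = median, dots = flagged points
boxplot.stats(x)$out                       # points more than 1.5 box-lengths beyond the hinges
hist(x, breaks = seq(22.5, 35.5, by = 1))  # integer data: one bin centered on each value

mean(x <= 30)                              # proportion of values <= 30
plot(ecdf(x))                              # empirical cdf

sort(x)[c(5, 10, 20, 95)]                  # k-th smallest values, k = n * p with n = 100
quantile(x, c(0.05, 0.10, 0.20, 0.95))     # R's default (type 7) interpolated quantiles
```

The sort() and quantile() answers can differ slightly because quantile() interpolates between neighboring order statistics by default.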

Consider the histogram below:

What is the size of the dataset?
Estimate the quartiles, median, and the \(10^{th}\) and \(90^{th}\) percentiles. (Hint: you do not know the actual data, but you know how many data points fall in each bin. You can at least determine which bin each statistic falls in.)
Do you think the mean is to the left or right of the median? Why?
Find the relative frequency and density for the bin \((38,40]\). (Hint: search my chapter 1 notes for density and relative frequency.)
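
The histogram referred to above is not reproduced in this text, so the sketch below illustrates the two quantities on the 100-value dataset from the first exercise instead. For a bin of width \(w\) holding \(n_k\) of the \(n\) data points, relative frequency \(= n_k/n\) and density \(= n_k/(n\,w)\), which is relative frequency per unit of \(x\). The width-2 bins are an arbitrary choice, made so that the two quantities differ.

```r
x <- scan(text = "
24 29 25 27 27 25 27 28 28 26
30 28 26 27 30 27 29 26 30 25
25 25 23 27 26 29 25 25 33 28
25 27 29 30 35 26 30 27 29 28
23 28 26 29 26 26 26 27 29 26
28 26 25 27 30 25 27 28 29 25
31 27 31 27 31 27 33 31 26 27
31 25 26 25 29 28 23 26 28 26
27 26 25 26 28 26 27 26 28 27
28 24 34 27 26 25 27 25 26 26", quiet = TRUE)

# Bins of the form (a, b]; hist() closes the first bin at 22 by default
h <- hist(x, breaks = seq(22, 36, by = 2), plot = FALSE)

n <- sum(h$counts)
w <- diff(h$breaks)            # bin widths (all 2 here)
rel_freq <- h$counts / n       # fraction of the data in each bin
dens <- rel_freq / w           # relative frequency per unit of x

data.frame(lower = head(h$breaks, -1), upper = tail(h$breaks, -1),
           count = h$counts, rel_freq = rel_freq, density = dens)
all.equal(dens, h$density)     # hist() reports the same densities
```

Density, not relative frequency, is what makes bar areas sum to 1 when bin widths are not 1.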

The following boxplots were produced from 2 different datasets.
1. Estimate the minimum, maximum, quartiles, and median for each.
2. What is the relationship between the mean and median for each?
3. Describe any potential differences or similarities between these two datasets, given what you can infer from the boxplots.

You are given the dataset below.

9 4 7 1 2
1. Write the list of order statistics.
2. Calculate by hand (using a basic calculator only): mean, median, quartiles, standard deviation, and variance.
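
Once you have worked these out by hand, R can confirm them. There are several conventions for quartiles. quantile() below uses R's default (type 7), which may not match the hand method from class. var() and sd() divide by \(n-1\).

```r
x <- c(9, 4, 7, 1, 2)

sort(x)                      # order statistics x_(1) <= ... <= x_(5)
mean(x)
median(x)
quantile(x, c(0.25, 0.75))   # R's default (type 7) quartiles
var(x)                       # sample variance, divides by n - 1
sd(x)                        # sample standard deviation = sqrt(var(x))

# The same variance, written out the way you would compute it by hand
sum((x - mean(x))^2) / (length(x) - 1)
```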

2 Tuesday 5/21 - Counting and basic probability

A digital padlock has 10 buttons and requires a 3-button code to unlock it.
1. How many possible unlock codes are there if buttons can be repeated, and the order you press them in matters?
2. How many possible unlock codes are there if buttons cannot be repeated, but the order the buttons are pushed in matters?
3. How many possible unlock codes are there if buttons cannot be repeated, and the order the buttons are pushed in does not matter? (Hint: the code here is just the set of buttons that are pressed. If the same buttons are pressed in a different order, we still consider that the same unlock code.)
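
The three cases correspond to three standard counting formulas: \(n^k\), \(n!/(n-k)!\), and \(\binom{n}{k}\). The variable names below are just for illustration.

```r
n <- 10   # buttons
k <- 3    # buttons per code

n^k                    # 1. repeats allowed, order matters
prod(n:(n - k + 1))    # 2. no repeats, order matters: n! / (n - k)! = 10 * 9 * 8
choose(n, k)           # 3. no repeats, order does not matter
```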

Consider the random experiment where a fair coin is flipped 50 times. Consider the event that exactly 10 of the flips are heads. How many ways can this occur?
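
Each outcome is a sequence of 50 flips. A sequence with exactly 10 heads is determined by which 10 of the 50 positions are heads, so the count is \(\binom{50}{10}\). The last two lines go one step further and turn the count into a probability for a fair coin, as a cross-check against the binomial pmf.

```r
ways <- choose(50, 10)                       # choose which 10 flips are heads
format(ways, big.mark = ",", scientific = FALSE)

# All 2^50 sequences are equally likely for a fair coin
ways / 2^50
dbinom(10, size = 50, prob = 0.5)            # same probability from the binomial pmf
```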

You are given that a jar contains 5 blue marbles, 3 green marbles, and 8 red marbles.
1. If 3 marbles are drawn without replacement, how many outcomes are possible?
2. If 5 marbles are drawn without replacement, how many outcomes are possible?
3. If 3 marbles are drawn with replacement, how many outcomes are possible?
4. If 3 marbles are drawn without replacement, what is the probability that exactly 2 are blue and 1 is red?
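
For part 4, the counting argument treats the 16 marbles as distinguishable and counts unordered 3-marble subsets. The sketch below computes that ratio and then checks it with a quick simulation. The seed and number of repetitions are arbitrary choices.

```r
# Exact: choose 2 of the 5 blue and 1 of the 8 red, out of all 3-subsets of 16 marbles
p_exact <- choose(5, 2) * choose(8, 1) / choose(16, 3)
p_exact

# Monte Carlo check: draw 3 marbles without replacement many times
set.seed(1)
jar <- rep(c("blue", "green", "red"), times = c(5, 3, 8))
hits <- replicate(1e5, {
  draw <- sample(jar, 3)    # sample() draws without replacement by default
  sum(draw == "blue") == 2 && sum(draw == "red") == 1
})
mean(hits)                  # should be close to p_exact
```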

Consider the random experiment where a fair die is rolled once and events \(A=\){even}, \(B=\){less than 3}.
1. Write the entire sample space.
2. Find \(P(A)\), \(P(A^c)\), \(P(B)\), and \(P(B^c)\).
3. Find \(P(A\cap B)\) and \(P(A\cup B)\).
4. Find \(P(A\cap B^c)\) and \(P(A\cup B^c)\).
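
With equally likely outcomes, each probability is (number of outcomes in the event) / 6. R's set functions make the complements, intersections, and unions explicit. The helper P is a hypothetical name used only here.

```r
S <- 1:6                        # sample space for one roll
A <- S[S %% 2 == 0]             # even
B <- S[S < 3]                   # less than 3
Ac <- setdiff(S, A)             # complement of A
Bc <- setdiff(S, B)             # complement of B

P <- function(E) length(E) / length(S)   # equally likely outcomes

c(P_A = P(A), P_Ac = P(Ac), P_B = P(B), P_Bc = P(Bc))
c(P_AandB = P(intersect(A, B)), P_AorB = P(union(A, B)))
c(P_AandBc = P(intersect(A, Bc)), P_AorBc = P(union(A, Bc)))
```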

Consider the random experiment where a fair coin is flipped 3 times and events \(A=\){at least one head}, \(B=\){more tails than heads}.
1. Write the entire sample space.
2. Find \(P(A\cap B)\) and \(P(A\cup B)\).
3. Are \(A\) and \(B\) mutually exclusive?
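
A small enumeration lists all \(2^3=8\) equally likely outcomes. Each event can then be written as a logical condition on the number of heads.

```r
# One row per outcome of three flips
S <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"), flip3 = c("H", "T"),
                 stringsAsFactors = FALSE)
S$outcome <- paste0(S$flip1, S$flip2, S$flip3)
heads <- rowSums(S[, 1:3] == "H")

A <- heads >= 1               # at least one head
B <- (3 - heads) > heads      # more tails than heads

S$outcome                     # the sample space
mean(A & B)                   # P(A and B)
mean(A | B)                   # P(A or B)
any(A & B)                    # TRUE if A and B share at least one outcome
```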

You are given \(P(A)=0.4\), \(P(B^c)=0.2\), \(P(A\cup B)=0.6\). Find \(P(A\cap B)\).
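
Inclusion-exclusion, \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\), gives \(P(A\cap B)\) directly once \(P(B)=1-P(B^c)\) is known. The helper below (p_and is a hypothetical name, not a built-in) also checks the result against the bounds \(\max(0,P(A)+P(B)-1)\leq P(A\cap B)\leq \min(P(A),P(B))\), which any valid set of probabilities must satisfy. Running it on the values above is a useful sanity check.

```r
# Hypothetical helper: P(A and B) by inclusion-exclusion, plus a consistency check
p_and <- function(pA, pB, pAorB, tol = 1e-12) {
  pAB <- pA + pB - pAorB
  lower <- max(0, pA + pB - 1)   # smallest possible P(A and B)
  upper <- min(pA, pB)           # largest possible P(A and B)
  list(p_and = pAB, consistent = pAB >= lower - tol && pAB <= upper + tol)
}

pB <- 1 - 0.2                    # from P(B^c) = 0.2
p_and(pA = 0.4, pB = pB, pAorB = 0.6)
```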

3 Tuesday 5/28 - Conditional probability, independence, and Bayes

You are given that \(P(A)=0.5\), \(P(B)=0.4\), and \(P(A\cap B)=0.2\).
1. Calculate \(P(A\mid B)\) and \(P(B\mid A)\).
2. Are \(A\) and \(B\) independent?
3. Are \(A\) and \(B\) mutually exclusive?
4. Calculate \(P(A\mid B^c)\).
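
Each answer follows from the definition \(P(A\mid B)=P(A\cap B)/P(B)\). For part 4 it also uses \(P(A\cap B^c)=P(A)-P(A\cap B)\). Independence is checked by comparing \(P(A\cap B)\) with \(P(A)P(B)\), using all.equal() to avoid floating-point false negatives.

```r
pA <- 0.5; pB <- 0.4; pAB <- 0.2

pA_given_B <- pAB / pB
pB_given_A <- pAB / pA
independent <- isTRUE(all.equal(pAB, pA * pB))   # P(A and B) == P(A) P(B)?
mutually_exclusive <- pAB == 0                   # no overlap at all?
pA_given_Bc <- (pA - pAB) / (1 - pB)             # P(A and B^c) / P(B^c)

c(pA_given_B = pA_given_B, pB_given_A = pB_given_A, pA_given_Bc = pA_given_Bc)
c(independent = independent, mutually_exclusive = mutually_exclusive)
```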

Consider the random experiment where a fair die is rolled twice. Let \(A=\){both dice show the same number}, and \(B=\){the first die is even}.
1. Are \(A\) and \(B\) independent?
2. Calculate \(P(A\mid B)\).
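
Enumerating all 36 equally likely ordered pairs lets you compare \(P(A\cap B)\) with \(P(A)P(B)\) directly.

```r
# All 36 equally likely (first, second) outcomes
S <- expand.grid(first = 1:6, second = 1:6)
A <- S$first == S$second        # both dice show the same number
B <- S$first %% 2 == 0          # first die is even

P_A <- mean(A); P_B <- mean(B); P_AB <- mean(A & B)
c(P_A = P_A, P_B = P_B, P_AB = P_AB, P_A_times_P_B = P_A * P_B)
P_AB / P_B                      # P(A | B)
```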

An industrial machine produces items that are defective with probability 0.02. However, quality control (QC) cannot identify defective items perfectly. Given that an item is defective, there is a 90% chance that quality control will correctly label it as defective. There is a 1% chance that quality control will incorrectly label an item as defective when it actually isn’t. Consider a particular production period in which 10,000 items are produced in total.

Let events \(D=\){item is actually defective}, \(L_d=\){QC labels an item as defective}, and \(Q=\){QC correctly labels an item}.
1. Describe the following events in words: \(D^c\), \(L_d^c\), \(Q^c\), \(L_d\cap Q\), \(L_d^c\cap Q\), \(D\cap Q^c\), \(D^c\cap Q\), and \(L_d\cap Q^c\).
2. How many items do you expect to be defective?
3. Out of the number you expect to be defective, how many do you expect quality control to correctly label?
4. How many total items do you expect to not be defective?
5. Out of the number you expect to not be defective, how many do you expect quality control to incorrectly label as defective?
6. How many total items do you expect quality control to label as defective? (Including those they correctly label and incorrectly label.)
7. How many total items do you expect quality control to label incorrectly? (Including those labeled as defective and those labeled as not defective.)
8. If we have a particular item that is labeled defective by quality control, what is the probability that it is actually defective?
9. Is whether an item is defective independent of whether quality control labels it defective?
10. Is whether an item is defective independent of whether quality control labels it correctly?
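
The expected-count approach in questions 2 to 8 amounts to Bayes' rule written as a ratio of counts: \(P(D\mid L_d)\) = (defective and labeled defective) / (all labeled defective). The sketch below tallies the expected counts. The variable names are illustrative only.

```r
n <- 10000
p_def <- 0.02            # P(D)
p_L_given_D <- 0.90      # P(L_d | D): defective item correctly flagged
p_L_given_notD <- 0.01   # P(L_d | D^c): good item wrongly flagged

defective     <- n * p_def                        # expected defective items
true_pos      <- defective * p_L_given_D          # defective and labeled defective
not_defective <- n - defective
false_pos     <- not_defective * p_L_given_notD   # good but labeled defective
labeled_def   <- true_pos + false_pos
false_neg     <- defective - true_pos             # defective but labeled good
mislabeled    <- false_pos + false_neg

p_D_given_L <- true_pos / labeled_def             # Bayes' rule via expected counts

c(defective = defective, true_pos = true_pos, not_defective = not_defective,
  false_pos = false_pos, labeled_def = labeled_def, mislabeled = mislabeled)
p_D_given_L

# Questions 9-10: compare each unconditional probability with its conditional version
c(P_L = labeled_def / n, P_L_given_D = p_L_given_D)
c(P_Q = 1 - mislabeled / n, P_Q_given_D = p_L_given_D)
```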

4 Wednesday 5/29 - Probability distributions

Consider the probability mass function below for random variable \(X\).

\(x\)	5	10	20	50	100
\(p(x)\)	0.8	0.1	0.07	0.02	0.01

Write the formula for the cdf and plot it by hand.
Find \(\mathrm E(X)\) and \(\mathrm {Var}(X)\), both by hand and in R.
Sample values of \(X\) using sample().
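
One way to check your hand calculations: \(\mathrm E(X)=\sum x\,p(x)\) and \(\mathrm{Var}(X)=\mathrm E(X^2)-\mathrm E(X)^2\). The cdf is a right-continuous step function, which stepfun() can draw. The sample size of 20 is arbitrary. The cdf is stored as Fx rather than F, because F is R's shorthand for FALSE.

```r
x <- c(5, 10, 20, 50, 100)
p <- c(0.8, 0.1, 0.07, 0.02, 0.01)
stopifnot(isTRUE(all.equal(sum(p), 1)))   # a valid pmf sums to 1

EX   <- sum(x * p)               # E(X)
VarX <- sum(x^2 * p) - EX^2      # Var(X) = E(X^2) - E(X)^2

# cdf: jumps by p(x) at each support point; stepfun() needs one more y value than x
Fx <- stepfun(x, c(0, cumsum(p)))
plot(Fx, main = "cdf of X", xlab = "x", ylab = "F(x)")

# Draw 20 values of X according to the pmf
sample(x, size = 20, replace = TRUE, prob = p)
```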

You are given the probability mass function below for discrete random variable \(X\) and that \(\mathrm E(X)=2.4\).

\(x\)	1	2	3	\(b\)
\(f(x)\)	0.3	0.2	\(q\)	0.1
1. Find \(b\) and \(q\).
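
There are two unknowns and two conditions: the probabilities must sum to 1, and \(\sum x f(x)=2.4\). The first condition gives \(q\). Substituting it into the second gives \(b\).

```r
q <- 1 - (0.3 + 0.2 + 0.1)                        # probabilities sum to 1
b <- (2.4 - sum(c(1, 2, 3) * c(0.3, 0.2, q))) / 0.1  # solve 1.9 + 0.1 b = 2.4
c(q = q, b = b)

sum(c(1, 2, 3, b) * c(0.3, 0.2, q, 0.1))          # check: E(X) = 2.4
```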

Consider the cumulative distribution function plotted below for discrete random variable \(X\).
1. Calculate \(P(X=3)\) and \(P(X=6)\).
2. Calculate \(P(5<X<9)\).
3. Calculate the probability that \(X\) is at least four but less than eight.

You are given the probability density function below for continuous random variable \(X\). \[f_X(x)=kx \quad \text{ for } \quad 0\leq x \leq 5\]
1. Find \(k\).
2. Find the cdf.
3. Find \(\mathrm E(X)\) and \(\mathrm{Var}(X)\).
4. Find \(P(1<X<3)\).
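
To check the calculus numerically, use integrate(). The normalizing constant comes from requiring \(\int_0^5 kx\,dx = 1\), and the cdf on \([0,5]\) is \(F_X(x)=kx^2/2\). The function names below are illustrative.

```r
# Normalization: the integral of x over [0, 5] is 25/2, so k = 1 / (25/2)
k <- 1 / integrate(function(x) x, 0, 5)$value
f <- function(x) k * x                          # pdf on [0, 5]
Fx <- function(x) pmin(pmax(x, 0), 5)^2 / 25    # cdf: x^2/25 on [0, 5], 0 below, 1 above

EX   <- integrate(function(x) x * f(x), 0, 5)$value
EX2  <- integrate(function(x) x^2 * f(x), 0, 5)$value
VarX <- EX2 - EX^2

p13 <- Fx(3) - Fx(1)                            # P(1 < X < 3)
c(k = k, EX = EX, VarX = VarX, p13 = p13)
```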

You are given the cumulative distribution function below for continuous random variable \(X\). \[F_X(x)=1-e^{-3x} \quad \text{ for } \quad 0\leq x \]
1. Find the pdf.
2. Find \(\mathrm E(X)\) and \(\mathrm{Var}(X)\).
3. Find \(P(X\leq 0.5)\).
4. Find \(P(X>0.5)\).
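
Differentiating \(F_X\) gives the pdf \(3e^{-3x}\) for \(x\geq 0\), which is the exponential distribution with rate 3. The sketch below gets the moments by numerical integration and cross-checks the probabilities against R's pexp().

```r
rate <- 3
f <- function(x) rate * exp(-rate * x)     # pdf: derivative of 1 - exp(-3x)

EX   <- integrate(function(x) x * f(x), 0, Inf)$value
EX2  <- integrate(function(x) x^2 * f(x), 0, Inf)$value
VarX <- EX2 - EX^2

p_le <- 1 - exp(-rate * 0.5)               # F(0.5) = P(X <= 0.5)
p_gt <- 1 - p_le                           # P(X > 0.5)

c(p_le, pexp(0.5, rate = rate))            # should agree
c(p_gt, pexp(0.5, rate = rate, lower.tail = FALSE))
c(EX = EX, VarX = VarX)
```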

5 Monday 6/3 - Special discrete distributions

Consider the experiment of drawing, with replacement, a marble out of a bag with 3 green and 7 blue marbles. Repeat this experiment 5 times. Let \(X\) be the number of times a green marble is drawn.
1. Write the formula for the probability mass function for \(X\). (Hint: binomial)
2. Calculate \(P(X\leq 4)\).
3. Calculate \(P(X= 4)\).
4. Calculate the probability that at least one of the drawn marbles is green.
5. Find \(\mathrm{E}(X)\) and \(\mathrm{Var}(X)\).
6. Now let \(Y\) be the number of trials until a green marble is drawn (including the final draw). Write the pmf for \(Y\). (Hint: geometric RV)
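
With \(n=5\) independent draws and \(p=0.3\), \(X\) is binomial. R's d/p functions give the probabilities directly. One caution for part 6: dgeom() counts the failures before the first success, so \(P(Y=y)\) is dgeom(y - 1, p) in R.

```r
n <- 5; p <- 3 / 10                 # 5 draws with replacement, P(green) = 0.3

# pmf: P(X = k) = choose(5, k) p^k (1 - p)^(5 - k), k = 0, ..., 5
pmf <- dbinom(0:5, size = n, prob = p)
all.equal(pmf, choose(n, 0:5) * p^(0:5) * (1 - p)^(n - 0:5))

pbinom(4, size = n, prob = p)       # P(X <= 4)
dbinom(4, size = n, prob = p)       # P(X = 4)
1 - dbinom(0, size = n, prob = p)   # P(at least one green)
c(EX = n * p, VarX = n * p * (1 - p))

# Y = draws up to and including the first green: P(Y = y) = (1 - p)^(y - 1) p
y <- 1:6
cbind(y, formula = (1 - p)^(y - 1) * p, dgeom = dgeom(y - 1, prob = p))
```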

Consider a factory that produces rope. Assume that there are, on average, 2 imperfections for every 700 meters of rope produced. Use a Poisson RV to model the number of imperfections on a length of rope. Consider a 100 meter section of rope, and let \(X\) be the number of imperfections on it.
1. Write the formula for the probability mass function for \(X\).
2. Calculate the probability that there is at least one imperfection.
3. What is the average length of rope between imperfections?
4. Find \(\mathrm{E}(X)\) and \(\mathrm{Var}(X)\).
5. Calculate the probability that the number of imperfections is between 1 and 5 (inclusive).
6. If a customer orders a 200 meter spool of rope, what is the probability it has no imperfections? (Hint: scale the Poisson parameter)
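
The key step is scaling the rate. At 2 imperfections per 700 m, a 100 m section has mean \(\lambda = 2\cdot 100/700 = 2/7\), and a 200 m spool has twice that. The sketch below checks each part with dpois() and ppois().

```r
rate_per_m <- 2 / 700              # imperfections per meter
lambda <- rate_per_m * 100         # mean count on a 100 m section

# pmf: P(X = k) = exp(-lambda) lambda^k / k!
all.equal(dpois(0:3, lambda), exp(-lambda) * lambda^(0:3) / factorial(0:3))

1 - dpois(0, lambda)               # P(X >= 1)
1 / rate_per_m                     # average meters between imperfections
c(EX = lambda, VarX = lambda)      # Poisson: mean = variance = lambda
ppois(5, lambda) - ppois(0, lambda)   # P(1 <= X <= 5)
dpois(0, lambda = rate_per_m * 200)   # 200 m spool: P(no imperfections)
```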

6 Tuesday 6/4 - Special continuous distributions

Consider a voltmeter that measures the voltage of batteries. Assume that the measurement is normally distributed, with mean equal to the actual voltage of the battery and standard deviation equal to 1% of the battery's voltage. Let \(X\) be the RV giving the measured voltage for a 9 volt battery.
1. Write the formula for the probability density function for \(X\).
2. Calculate \(P(X\leq 8.7)\).
3. Calculate \(P(8.9 < X\leq 9.1)\).
4. Calculate \(P(X= 9)\).
5. Give a 95% prediction interval for the voltage measurement.
6. Calculate the probability that the voltage measurement is within \(\pm 3\%\) of the true value.
7. Find \(\mathrm{E}(X)\) and \(\mathrm{Var}(X)\).
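
Here \(X\sim N(\mu=9,\ \sigma=0.09)\). In R the distribution is parameterized by the standard deviation, not the variance. The first line checks dnorm() against the textbook density formula at an arbitrary point, 9.05.

```r
mu <- 9; sigma <- 0.01 * mu      # sd is 1% of 9 V = 0.09 V

# pdf: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
all.equal(dnorm(9.05, mu, sigma),
          exp(-(9.05 - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi)))

pnorm(8.7, mu, sigma)                            # P(X <= 8.7)
pnorm(9.1, mu, sigma) - pnorm(8.9, mu, sigma)    # P(8.9 < X <= 9.1)
# P(X = 9) is 0 for any continuous random variable
qnorm(c(0.025, 0.975), mu, sigma)                # central 95% prediction interval
pnorm(1.03 * mu, mu, sigma) - pnorm(0.97 * mu, mu, sigma)   # within +/- 3% of 9 V
c(EX = mu, VarX = sigma^2)
```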

Consider the rope factory example again, with on average 2 imperfections for every 700 meters of rope produced. Use an exponential RV to model the length of rope between imperfections. Let \(X\) be the length of rope between successive imperfections.
1. Write the formula for the probability density function for \(X\).
2. Calculate \(P(X<200)\) and \(P(X\geq 200)\).
3. Calculate \(P(X=200)\).
4. Calculate \(P(100\leq X \leq 500)\) using an integral, then verify it with R.
5. Find \(\mathrm{E}(X)\) and \(\mathrm{Var}(X)\).
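
The exponential rate is the same per-meter rate as in the Poisson model, \(2/700\), so the mean gap is 350 m. The sketch below does part 4 both ways: a numerical integral of the pdf, then the cdf difference from pexp().

```r
rate <- 2 / 700                  # imperfections per meter

# pdf: f(x) = rate * exp(-rate * x) for x >= 0
f <- function(x) rate * exp(-rate * x)

pexp(200, rate)                                # P(X < 200)
pexp(200, rate, lower.tail = FALSE)            # P(X >= 200)
# P(X = 200) is 0 for a continuous random variable
integrate(f, 100, 500)$value                   # P(100 <= X <= 500) by integration
pexp(500, rate) - pexp(100, rate)              # same probability from the cdf
c(EX = 1 / rate, VarX = 1 / rate^2)
```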

7 Wednesday 6/5 - Sampling distributions, law of large numbers, and central limit theorem

You are given that random variable \(X\) has finite mean and finite variance. If we take an independent sample of \(X\) values and calculate the sample mean \(\overline x_n\), what will happen to the sample mean as the sample size gets larger and larger?

Random variable \(X\) has probability mass function given below.

\(x\)	1	2	3
\(f(x)\)	0.3	0.6	0.1
1. If two independent \(X\) values are sampled with replacement, what is the probability that the sample mean is less than 2?
2. Suppose we sample repeatedly from this distribution with replacement, and let \(x_i\) be the \(i^{th}\) sampled value. What happens to the average of all the \(x_i\)’s as the sample size continues to grow?
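
Part 1 can be done exactly by listing all 9 ordered pairs of draws and their joint probabilities. Part 2, and the general question about \(\overline x_n\) above, is the law of large numbers. A running average of simulated draws settles near \(\mathrm E(X)=\sum x f(x)\). The seed and the 10,000 draws are arbitrary choices.

```r
x <- c(1, 2, 3)
p <- c(0.3, 0.6, 0.1)

# 1. Exact: enumerate all 3 x 3 ordered pairs of independent draws
joint <- outer(p, p)                 # P(X1 = x[i], X2 = x[j])
means <- outer(x, x, "+") / 2        # sample mean of each pair
sum(joint[means < 2])                # P(sample mean < 2)

# 2. Law of large numbers: running average of many draws
set.seed(1)
draws <- sample(x, size = 10000, replace = TRUE, prob = p)
running_mean <- cumsum(draws) / seq_along(draws)
plot(running_mean, type = "l", xlab = "n", ylab = "average of first n draws")
abline(h = sum(x * p), lty = 2)      # dashed line at E(X)
```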

Consider again the example where a voltmeter measures the voltage of batteries. Assume that the measurement is normally distributed, with mean equal to the actual voltage of the battery and standard deviation equal to 1% of the battery's voltage. Let \(X\) be the RV giving the measured voltage for a 9 volt battery.
1. If 10 independent measurements are made, what is the probability the mean of them is less than 8.95 volts?
2. What is the probability that the mean of these 10 measurements is within \(\pm 0.2\%\) of the true voltage?
3. If instead, we take 100 measurements, what is the probability that the mean of these measurements is within \(\pm 0.2\%\) of the true voltage?
4. Calculate a 95% prediction interval for the sample mean of 100 measurements.
5. What would you conclude about the effect of taking repeated measurements and averaging them?
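
Because each measurement is normal, the mean of \(n\) independent measurements is exactly normal, with mean 9 and standard deviation \(0.09/\sqrt n\). The helper se() is an illustrative name. Here \(\pm 0.2\%\) of 9 V is \(\pm 0.018\) V.

```r
mu <- 9; sigma <- 0.09              # single-measurement mean and sd (1% of 9 V)
se <- function(n) sigma / sqrt(n)   # sd of the mean of n independent measurements

pnorm(8.95, mu, se(10))                                      # 1. P(mean of 10 < 8.95)
tol <- 0.002 * mu                                            # +/- 0.2% of 9 V
pnorm(mu + tol, mu, se(10)) - pnorm(mu - tol, mu, se(10))    # 2. n = 10
pnorm(mu + tol, mu, se(100)) - pnorm(mu - tol, mu, se(100))  # 3. n = 100
qnorm(c(0.025, 0.975), mu, se(100))                          # 4. 95% interval, n = 100
```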

The number of errors in a signal transmission process has mean 2 errors per megabyte (MB) of data and standard deviation 1 error per MB. If a collection of 100 signals, each 1 MB of data, is transmitted, then the mean number of errors per signal can be modeled using the central limit theorem.
1. Find the probability that the mean number of errors is less than 1.9.
2. Find a 95% prediction interval for the mean number of errors in this sample.
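
By the central limit theorem, the mean of 100 signals is approximately normal, with mean 2 and standard deviation \(1/\sqrt{100}=0.1\). This is an approximation, because the error counts themselves need not be normal.

```r
mu <- 2; sigma <- 1; n <- 100
se <- sigma / sqrt(n)              # CLT: mean of 100 signals is approx. Normal(2, 0.1^2)

pnorm(1.9, mu, se)                 # 1. P(mean < 1.9)
qnorm(c(0.025, 0.975), mu, se)     # 2. approximate 95% prediction interval
```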

Daily Checkpoints