# Survival Analysis project 1

The data below were used by researchers to follow patients with primary biliary cirrhosis (PBC). Patients are followed for a number of years, and several factors are being investigated as possible risk factors for death from the disease.

The follow-up data are in an Excel file, and the documentation for the data is below.

1. Divide the follow-up period into full-year intervals, and assign any follow-up beyond year 12 to the year-12 interval. Using a life table, describe the baseline data (i.e., the number of patients at the beginning of each year, the number who died during the year, the number lost to follow-up, etc.) from which the probability of surviving the disease can be calculated for each year of follow-up. What is the probability of surviving each interval, given survival to its beginning, and what is the cumulative probability of survival from the start of follow-up to the end of each interval? Draw the survival curve with its confidence interval.

2. Repeat the analysis using the Kaplan-Meier method to estimate the cumulative survival curve. Is the graph similar to the one you got in question 1? If not, offer a reason for the differences. What is the median survival time? What is the cumulative survival estimate at 3, 5, and 9 years? Provide a 95% confidence interval for these estimates.

3. What is the hazard estimate for death in each one-year interval? Draw the hazard curve over the 12 years.

4. What is the estimated cumulative probability of death by the end of 3, 5, and 9 years? What is the cumulative hazard estimate at these times?

5. Under the exponential model assumption, what is the annual risk of death? Under this model, what is the cumulative survival estimate at 3, 5, and 9 years? Provide a 95% confidence interval for these estimates.

6. Under the exponential model assumption, plot the estimated cumulative probability of survival to the end of each year of follow-up. Use these annual probabilities to assess whether the exponential model fits the survival data.

7. Divide the age variable into two groups: a younger group (up to 49.99) and an older group (50+). Draw the cumulative survival curve for each of the two age groups.

8. Statistically test the null hypothesis that the survival curves of these two age groups are the same.

9. Is there a relationship between the sex and age variables? If so, does it affect the test of the null hypothesis that the survival curves of the two age groups are the same?

10. Does the result change if you give higher weight to observations early in the follow-up period? Does the new test give a larger chi-square value than the one obtained from the log-rank test? Explain.

11. There are actually four age groups:

25 – 39.99 = 1

40 – 49.99 = 2

50 – 59.99 = 3

60+ = 4

Does this change the findings and conclusions you reached in questions 8-9?
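The two-group comparisons in questions 8 and 10 rest on the log-rank statistic and its weighted variants. The following is a minimal pure-Python sketch of that calculation, intended only to show the arithmetic; a real analysis would normally use a statistics package. The function name `weighted_logrank`, its arguments, and the toy data are illustrative assumptions and are not taken from the PBC file. The `"wilcoxon"` option uses Gehan-Breslow weights (the number at risk), which is one common way of giving early event times more weight.

```python
from math import erfc, sqrt


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group (weighted) log-rank test. Hypothetical helper for illustration.

    times  : follow-up time per patient
    events : 1 = death observed, 0 = censored
    groups : 0 or 1 per patient (e.g. 0 = under 50, 1 = 50 and over)
    weight : "logrank"  -> every event time weighted equally
             "wilcoxon" -> Gehan-Breslow weight (number at risk),
                           which emphasises early event times
    Returns (chi2, p_value) on 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    num = 0.0  # weighted sum of (observed - expected) deaths in group 1
    var = 0.0  # variance of that sum under the null hypothesis
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(
            1
            for u, e, g in zip(times, events, groups)
            if u == t and e == 1 and g == 1
        )
        w = float(n) if weight == "wilcoxon" else 1.0
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = num * num / var
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Toy data (made up): group 1 dies early, group 0 dies late, no censoring.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_events = [1] * 10
toy_groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

chi2_lr, p_lr = weighted_logrank(toy_times, toy_events, toy_groups)
chi2_w, p_w = weighted_logrank(toy_times, toy_events, toy_groups, "wilcoxon")
print(f"log-rank: chi2={chi2_lr:.2f}, p={p_lr:.4f}")
print(f"wilcoxon: chi2={chi2_w:.2f}, p={p_w:.4f}")
```

Comparing the two chi-square values on the real data shows whether the difference between the groups is concentrated early or late in follow-up. For the four age groups in question 11 the same idea extends to a test with 3 degrees of freedom, which this sketch does not cover.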

12. An argument has been made that bilirubin and albumin values (as continuous variables) are associated with the risk of death in these patients. Using a Cox model, examine the relationship between bilirubin and albumin levels and the risk of death. Report the results in terms of direction, effect size, and significance. Are the relationships between these two variables and the risk of death independent of each other? Explain. Are these relationships confounded by the sex and age variables (age on the dichotomous scale; see question 7)?

13. Refit the Cox model to examine the relationship between bilirubin level and the risk of death. This time, set the low bilirubin level (0.1-0.999) as the reference group and compare it with the other two groups (medium, 1.01-2.999, and high, 3.01+). Report the findings in terms of direction, effect size, and significance. Are there any differences between the results of this model and the results you obtained with the previous model (question 12)? The albumin variable remains on a continuous scale. Explain the results.

14. Using a Cox model, examine the relationship between bilirubin and albumin values (as continuous variables) and the risk of death when you also include the sex and age variables (age on the dichotomous scale; see question 7) and two interaction terms, one between sex and bilirubin and the other between sex and albumin. Do one or both of the interaction terms contribute to the fit of the model? Once the appropriate model is selected, are its results different from those obtained in question 12? Explain the result.

15. Draw the survival curves from a Cox model for the two age groups (young and old; see question 7), adjusted for bilirubin, albumin, and sex (see the model in question 12). Is the result different from the one you got in questions 7-8 when comparing the survival curves of these two age groups?

16. Check whether the bilirubin and albumin variables meet the proportional hazards assumption of the Cox model or behave as time-dependent variables. Test this using a model in which the two variables are on a continuous scale and which also includes the age variable on the dichotomous scale (question 7) and the sex variable.

17. Write down the equation of the Cox model for the hazard rate according to the results obtained in question 16.

18. State whether the following assumptions apply when using this model:

The hazards are proportional: True / False / Unknown

The observations are independent: True / False / Unknown

The hazard is constant over time: True / False / Unknown
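The Cox questions (12 to 17) all rest on maximising the partial likelihood of a model of the form h(t | x) = h0(t) · exp(βx). As a rough illustration of what a Cox fit computes, the sketch below estimates a single coefficient by Newton-Raphson, using Breslow's handling of tied event times. The function name `cox_fit_one_covariate` and the toy data are hypothetical and are not taken from the PBC file; the models in the questions have several covariates and would normally be fitted with a statistics package.

```python
from math import exp, sqrt


def cox_fit_one_covariate(times, events, x, n_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by maximising the Cox partial
    likelihood (Breslow ties) with Newton-Raphson. Illustrative helper.

    Returns (beta, standard_error). The hazard ratio per unit of x
    is exp(beta).
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # minus the second derivative (observed information)
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # censored patients contribute only to risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0  # risk-weighted mean of x in the risk set
            score += x_i - mean
            info += s2 / s0 - mean * mean
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # info was evaluated just before the final (negligible) step
    return beta, 1.0 / sqrt(info)


# Toy data (made up): larger covariate values tend to go with earlier deaths.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_events = [1, 1, 1, 1, 0, 1, 1, 0]
toy_x = [3.0, 1.0, 2.5, 0.5, 1.5, 2.0, 0.8, 0.6]

beta_hat, se_hat = cox_fit_one_covariate(toy_times, toy_events, toy_x)
print(f"beta={beta_hat:.3f}, se={se_hat:.3f}, hazard ratio={exp(beta_hat):.3f}")
```

The sign of the coefficient gives the direction of the association, exp(β) gives the effect size as a hazard ratio, and β divided by its standard error gives a Wald statistic for significance. Note that the baseline hazard h0(t) never appears in the calculation, which is why the Cox model does not require specifying its shape.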