Due: Monday 11/22
Box folder with the files for this Perform
Introduction
The Final Perform will have you show all that you have learned in the class so far. This Perform consists of a skeleton notebook and a raw data set. You must process, clean, and analyze the raw data to learn something interesting. We encourage you to work in pairs so you can explore the data set more thoroughly, but it is not required.
The grading scale and points allocation differ from those of prior notebooks. Moreover, the last 3 (out of 100) points for this Perform are allocated to a conclusion section and the overall cohesion of the notebook. These points focus on how well the sections connect and build toward a specific conclusion. Keep in mind that the syllabus states you only need 95% of the possible points to earn full credit. Therefore, if you do not want to demonstrate that level of mastery, you do not need to spend the extra time on this.
Working together
- You may work with up to one other person.
- We recommend that you do, but we understand if you would prefer to work by yourself.
- If you want to find a partner, try posting on the class forum.
- You may share your data loading and cleaning code.
- This is code that converts the data files into DataFrames and converts the columns into a useful format.
- Just as developers in the real world help each other figure out how to get raw data into a needed format, you may do so for this Perform.
- Feel free to ask and answer such questions on the class forum.
- If you are not sure whether a question falls under this category, ask it as a private question first.
- You may discuss the kind of analysis you are doing.
- You may NOT share your analysis code with anyone except your partner (if you have one).
Assessment Goals
The goals of this Perform are for you to demonstrate the following skills:
- Load and process raw data that is not necessarily in an easy-to-use format for your intended analysis.
- Visualize data such that a meaningful interpretation can be made.
- Wisely choose, explain the choice of, conduct, and interpret the results of a hypothesis test.
- Create a prediction model from an existing data set.
- Stretch goal: Use all of the above elements to create a cohesive explanation of one or more findings.
Grading Scale and Points Allocation
Each section will be graded on a four-step rubric scale as follows.
- E (Exemplary) – Work that meets all requirements and displays full mastery of all learning goals and material.
- S (Satisfactory) – Work that meets all requirements and displays at least partial mastery of all learning goals as well as full mastery of core learning goals.
- N (Not yet) – Work that does not meet some requirements and/or displays developing or incomplete mastery of at least some learning goals and material.
- U (Unassessable) – Work that is missing, does not demonstrate meaningful effort, or does not provide enough evidence to determine a level of mastery.
There are 100 points possible. The number of points earned depends on the notebook section. The rubric will be converted to points as follows:
- E = full credit
- S = E_full_credit - 1
- N = E_full_credit / 2
- U = E_full_credit / 5
- Blank = 0
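As a quick check of the conversion above, here is a minimal sketch that applies the listed rules to a single section. The function name `rubric_points` is hypothetical, and the sketch assumes fractional points are kept as-is, since the rules above do not say whether they are rounded.

```python
# Hypothetical helper: converts a rubric mark into points for one section,
# following the conversion rules listed above. Rounding is not applied,
# which is an assumption.
def rubric_points(mark, full_credit):
    conversions = {
        "E": full_credit,       # full credit
        "S": full_credit - 1,   # one point below full credit
        "N": full_credit / 2,   # half of full credit
        "U": full_credit / 5,   # one fifth of full credit
        "Blank": 0,
    }
    return conversions[mark]

# Points for a 19-point section at each rubric mark.
for mark in ["E", "S", "N", "U", "Blank"]:
    print(mark, rubric_points(mark, 19))
```

For a 19-point section this gives 19, 18, 9.5, 3.8, and 0 points respectively.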
Notebook Sections and Grading Expectations
Overall Grading Considerations
The entire notebook is expected to take into account the following:
- The code takes advantage of Pandas and NumPy libraries
- For loops are allowed
- Do not use a for loop to iterate over a DataFrame's rows, unless the DataFrame is guaranteed to have fewer than 100 rows
- The analysis accounts for the fact that the number of ratings differs from professor to professor in the data set
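The considerations above can be illustrated with a short sketch. The column names `professor` and `rating` and the data are made up for illustration, and the minimum-count filter is only one possible way to handle unequal numbers of ratings; your analysis may call for a different approach.

```python
import pandas as pd

# Hypothetical ratings table: one row per rating, not one row per professor.
ratings = pd.DataFrame({
    "professor": ["A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 5.0, 4.0],
})

# Vectorized aggregation with groupby: no Python loop over the rows.
summary = (
    ratings.groupby("professor")["rating"]
    .agg(num_ratings="count", mean_rating="mean")
    .reset_index()
)

# One way to account for unequal rating counts: keep only professors with
# at least MIN_RATINGS ratings (threshold chosen purely for illustration).
MIN_RATINGS = 2
reliable = summary[summary["num_ratings"] >= MIN_RATINGS]
print(reliable)
```

Keeping `num_ratings` next to `mean_rating` makes it easy to see when an average rests on very few ratings.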
Section: Data Loading and Cleaning (21 points)
This section should have all of your data loading and cleaning code where you load and create your DataFrame(s). It does not need to contain all of the data processing code if creating a new column or table in a later section makes more sense for explanation and cohesion.
- Loads data from all of the data files
- Shows at least the first 10 rows of all DataFrames created that are used later in the notebook
- Plus overall grading considerations
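A minimal sketch of this section's shape follows. The file contents, column names, and DataFrame names are all hypothetical stand-ins, since the actual data files are not described here; in your notebook you would pass the real file paths to `pd.read_csv` (or whichever reader fits the file format).

```python
import io
import pandas as pd

# Stand-ins for two raw data files. These in-memory strings are an
# assumption made so the example is self-contained.
professors_file = io.StringIO("prof_id,name\n1,Ada\n2,Grace\n")
ratings_file = io.StringIO(
    "prof_id,rating,date\n"
    "1,4.5,2021-09-01\n"
    "1,unknown,2021-09-08\n"
    "2,3.0,2021-10-15\n"
)

# Load every data file into its own DataFrame.
professors = pd.read_csv(professors_file)
ratings = pd.read_csv(ratings_file)

# Convert columns into useful types; unparseable ratings become NaN.
ratings["rating"] = pd.to_numeric(ratings["rating"], errors="coerce")
ratings["date"] = pd.to_datetime(ratings["date"])

# Show the first 10 rows of each DataFrame that is used later.
print(professors.head(10))
print(ratings.head(10))
```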
Section: Visualization (19 points, Module 5B)
This section should contain at least one visualization showing something informative about the data. The skills you learned for this section primarily came from Module 5B.
- Each visualization has:
- Labeled X- and Y-axes with appropriate values
- A legend, if one is needed to interpret the visualization
- Color that adds to, rather than detracts from, the visualization
- A title or caption describing what the visualization is showing
- Draws at least 1 visualization from at least 1 column of data
- Provides a short 1-4 sentence summary of key takeaways from the visualizations.
- Plus overall grading considerations
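As an illustration of the checklist above, here is a minimal sketch of a single-column histogram with labeled axes and a title. It uses Matplotlib with synthetic data, which are assumptions; Module 5B may have used a different plotting library, and your plot should use the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic ratings on a 1-to-5 scale (made up for illustration).
rng = np.random.default_rng(0)
ratings = rng.normal(loc=3.5, scale=0.8, size=200).clip(1, 5)

fig, ax = plt.subplots()
# A single series needs no legend; one muted color keeps the focus on shape.
ax.hist(ratings, bins=16, range=(1, 5), color="steelblue", edgecolor="white")
ax.set_xlabel("Rating (1 to 5)")
ax.set_ylabel("Number of ratings")
ax.set_title("Distribution of ratings (synthetic data)")
```

A short written summary of what the plot shows would follow the figure in the notebook.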
Section: Hypothesis Test (19 points, Module 3B)
This section should contain at least one hypothesis test about the data. The skills you learned for this section primarily came from Module 3B.
- H0 and H1 hypotheses are clearly labeled and stated
- The kind of test used is clearly stated
- Has a clear interpretation of the test’s result
- Plus overall grading considerations
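To show how the pieces above fit together, here is a minimal sketch of a two-sided permutation test for a difference in means, written with NumPy only. The two groups and their data are synthetic, and a permutation test is just one possible choice; Module 3B may have covered other tests that suit your question better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings for two hypothetical groups of professors.
group_a = rng.normal(3.8, 0.7, size=60)
group_b = rng.normal(3.5, 0.7, size=80)

# H0: the two groups have the same mean rating.
# H1: the two groups have different mean ratings (two-sided).
# Test: permutation test on the difference in group means.
observed = group_a.mean() - group_b.mean()

pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_permutations = 5000
diffs = np.empty(n_permutations)
for i in range(n_permutations):  # loops over resamples, not DataFrame rows
    shuffled = rng.permutation(pooled)
    diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# p-value: share of shuffled differences at least as extreme as the observed one.
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed difference = {observed:.3f}, p-value = {p_value:.4f}")
```

The interpretation would then compare the p-value against a significance level chosen in advance and state, in plain language, what that implies for H0.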
Section: Prediction (19 points, Module 6)
This section should contain the creation and testing of at least one model. The skills you learned for this section primarily came from Module 6.
- The data and target for the model are clearly labeled
- Has a clear rationale for the data used in the model
- Properly splits and uses a train and test set
- Has a clear interpretation for the results of the model
- Plus overall grading considerations
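The requirements above can be sketched with a small linear model fit by least squares. The feature, target, and data are synthetic assumptions, and the manual split and `np.linalg.lstsq` fit stand in for whatever modeling tools Module 6 introduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: X holds one hypothetical feature, y is the target.
n = 200
X = rng.uniform(1, 5, size=(n, 1))                     # e.g. a difficulty score
y = 5.0 - 0.5 * X[:, 0] + rng.normal(0, 0.3, size=n)   # e.g. an overall rating

# Shuffle the row indices, then hold out 25% of the rows as a test set.
indices = rng.permutation(n)
n_test = n // 4
test_idx, train_idx = indices[:n_test], indices[n_test:]

def add_intercept(features):
    """Prepend a column of ones so the model learns an intercept."""
    return np.column_stack([np.ones(len(features)), features])

# Fit ordinary least squares on the training rows only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Evaluate on the held-out test rows, which the model has never seen.
predictions = add_intercept(X[test_idx]) @ coef
rmse = np.sqrt(np.mean((predictions - y[test_idx]) ** 2))
print(f"intercept = {coef[0]:.2f}, slope = {coef[1]:.2f}, test RMSE = {rmse:.2f}")
```

Reporting the error on the test set, rather than on the training set, is what supports a claim about how well the model generalizes.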
Section: Additional Analysis (19 points)
This section should contain one more analysis of your choosing. It can follow the form of any of the other analysis sections: another visualization, hypothesis test, or prediction analysis.
- Clearly states what the additional analysis is
- Provides a clear rationale for the analysis
- Has a clear interpretation for the results of the analysis
- Fulfills all of the requirements of the kind of analysis that it is
- Plus overall grading considerations
Section: Conclusion (and Cohesion, 3 points)
You only need this section if you are interested in earning these last points.
If you need to rearrange the sections to improve the cohesion of your notebook, you may do so.
These points can only be earned if at least two of the analysis sections earn an E and all of the other sections earn an S. These points focus on the overall cohesion of your sections and on whether the conclusion effectively summarizes the results across all of the sections.
- All five sections have a clear progression and build on each other
- Each section references another as appropriate in building a cohesive explanation of the main results of the notebook
- The conclusion effectively summarizes the notebook (it should not just be a list of the results of each section)
- The conclusion provides a summary of the key takeaways from the analyses
- Plus overall grading considerations