Import Preliminaries and datasets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.pylab as plb
import warnings
warnings.filterwarnings('ignore')
test_scores = pd.read_csv("Data/Test scores.csv", parse_dates=['Test taken date'])
test_master = pd.read_csv("Data/Test master.csv")
test_participant = pd.read_csv("Data/Audience summary.csv")
We have three datasets, namely -
Test Scores Dataset
This contains scores of each particpant in the test they appeared.
test_scores.head()
Participant identifier | Test Name | Test taken date | Track | Designation | Score | |
---|---|---|---|---|---|---|
0 | 37MCTM | If conditional | 2018-11-23 | Engineering | Lead | 18 |
1 | 37MCTM | Determiners and Quantifiers | 2018-11-23 | Engineering | Lead | 28 |
2 | 37MCTM | Modals | 2018-11-23 | Engineering | Lead | 22 |
3 | 37MCTM | Tenses | 2018-11-13 | Engineering | Lead | 12 |
4 | 37MCTM | Pronouns | 2018-11-13 | Engineering | Lead | 15 |
Test Master
This is about the other details associated with each test.
test_master
Test name | No. of questions | Complexity | Marks per question | |
---|---|---|---|---|
0 | Articles-New | 15 | Easy | 1 |
1 | Tenses | 15 | Easy | 1 |
2 | Pronouns | 15 | Easy | 1 |
3 | Articles | 15 | Easy | 1 |
4 | Conjuctions | 15 | Easy | 1 |
5 | Adjective & Adverb | 15 | Easy | 1 |
6 | Active and passive voice | 15 | Medium | 2 |
7 | Puctuations | 15 | Medium | 2 |
8 | If conditional | 15 | Medium | 2 |
9 | Determiners and Quantifiers | 15 | Medium | 2 |
10 | Modals | 15 | Medium | 2 |
11 | Prepositions | 15 | Medium | 2 |
12 | Comprehension | 10 | Difficult | 3 |
13 | Confusing words | 15 | Difficult | 3 |
14 | Synonyms & Antonyms | 15 | Difficult | 3 |
15 | Vocabulary | 15 | Difficult | 3 |
16 | Capitalization | 15 | Difficult | 3 |
Test Participants
This is abouth the other details associated with the pariticipants.
test_participant
Designation | Engineering | Quality Assurance | Support | |
---|---|---|---|---|
0 | Associate | 1400 | 250.0 | 220 |
1 | Lead | 1800 | 400.0 | 100 |
2 | Manager | 300 | 60.0 | 70 |
3 | Consultant | 200 | NaN | 10 |
4 | Associate Director and above | 600 | 5.0 | 32 |
We will create more metrics in the dataset provided so that it would be easy to analyse and compare across multiple factors, like -
- Weekday
- Week No.
- Month of the test taken date
- Maximum Score can be obtained
- Percentage of marks obtained by the participants
test_scores['weekday_name'] = test_scores['Test taken date'].dt.weekday_name
test_scores['month'] = test_scores['Test taken date'].dt.month_name()
test_scores['week'] = test_scores['Test taken date'].dt.week-42 # to get number from 1
test_master['maximum_score'] = test_master['No. of questions'] * test_master['Marks per question']
test_scores = pd.merge(test_scores,test_master,left_on="Test Name", right_on="Test name", how = "left")
cols = ['Participant identifier', 'Test Name', 'Track','Designation', 'Score',
'weekday_name', 'month', 'week','Complexity', 'maximum_score']
test_scores = test_scores[cols]
test_scores['Percent'] = round((test_scores['Score']/test_scores['maximum_score'])*100,2)
test_scores.head()
Participant identifier | Test Name | Track | Designation | Score | weekday_name | month | week | Complexity | maximum_score | Percent | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 37MCTM | If conditional | Engineering | Lead | 18 | Friday | November | 5 | Medium | 30 | 60.00 |
1 | 37MCTM | Determiners and Quantifiers | Engineering | Lead | 28 | Friday | November | 5 | Medium | 30 | 93.33 |
2 | 37MCTM | Modals | Engineering | Lead | 22 | Friday | November | 5 | Medium | 30 | 73.33 |
3 | 37MCTM | Tenses | Engineering | Lead | 12 | Tuesday | November | 4 | Easy | 15 | 80.00 |
4 | 37MCTM | Pronouns | Engineering | Lead | 15 | Tuesday | November | 4 | Easy | 15 | 100.00 |
Now we are ready to visualise this data for better analysis.
The first post in the series is - Part 1 - Plotting Using Seaborn - Violin, Box and Line Plot