Data Science is one of the hottest jobs today. According to The Economic Times, the job postings for the Data Science profiles have grown over 400 times over the past year. So, if you want to start your career as a Data Scientist, here are some top Data Science interview questions and answers which will help you crack your interview.
How is Data Science different from traditional application programming?
Data Science takes a fundamentally different approach in building systems that provide value than traditional application development.
In traditional programming paradigms, we used to analyze the input, figure out the expected output, and write code, which contains rules and statements needed to transform the provided input into the expected output. As we can imagine, these rules were not easy to write, especially, for data that even computers had a hard time understanding, e.g., s, videos, etc.
Data Science shifts this process a little bit. In it, we need access to large volumes of data that contain the necessary inputs and their mappings to the expected outputs. Then, we use Data Science algorithms, which use mathematical analysis to generate rules to map the given inputs to outputs.
This process of rule generation is called training. After training, we use some data that was set aside before the training phase to test and check the system’s accuracy. The generated rules are a kind of a black box, and we cannot understand how the inputs are being transformed into outputs.
However, If the accuracy is good enough, then we can use the system (also called a model).
As described above, in traditional programming, we had to write the rules to map the input to the output, but in Data Science, the rules are automatically generated or learned from the given data. This helped solve some really difficult challenges that were being faced by several companies.
6 Build a confusion matrix for the model where the threshold value for the probability of predicted values is 0.6, and also find the accuracy of the model.
Accuracy is calculated as:
Accuracy = (True positives + true negatives)/(True positives+ true negatives + false positives + false negatives)
To build a confusion matrix in R, we will use the table function:
Here, we are setting the probability threshold as 0.6. So, wherever the probability of pred_heart is greater than 0.6, it will be classified as 0, and wherever it is less than 0.6 it will be classified as 1.
Then, we calculate the accuracy by the formula for calculating Accuracy.
Q What do you understand by the term Normal Distribution?
This is one of the most important and widely used distributions in statistics. Commonly known as the Bell Curve or Gaussian curve, normal distributions, measure how much values can differ in their means and in their standard deviations. Refer to the below .
Fig 5: Normal Distribution – Data Analyst Interview Questions
As you can see in the above , data is usually distributed around a central value without any bias to the left or right side. Also, the random variables are distributed in the form of a symmetrical bell-shaped curve.
Q2. What is A/B Testing?
A/B testing is the statistical hypothesis testing for a randomized experiment with two variables A and B. Also known as the split testing, it is an analytical method that estimates population parameters based on sample statistics. This test compares two web pages by showing two variants A and B, to a similar number of visitors, and the variant which gives better conversion rate wins.
The goal of A/B Testing is to identify if there are any changes to the web page. For example, if you have a banner ad on which you have spent an ample amount of money. Then, you can find out the return of investment i.e. the click rate through the banner ad.