1 How do you treat outliers in a dataset?
An outlier is a data point that lies far from other similar points. Outliers may be due to variability in the measurement or may indicate experimental errors.
The graph depicted below shows there are three outliers in the dataset.
To deal with outliers, you can use the following four methods:
4 Using the Sample Superstore dataset, display the top 5 and bottom 5 customers based on their profit.
How do you decide whether your linear regression model fits the data?
Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
2 Is Naïve Bayes bad? If yes, in what respects?
One disadvantage of Naïve Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate for that pair is zero. Because the classifier multiplies all the probabilities together, that single zero makes the whole product zero.
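The zero-frequency problem can be reproduced in a few lines. The sketch below is only an illustration: the toy data, the function name, and the `alpha` parameter are assumptions, and the additive (Laplace) smoothing it applies is a commonly used remedy that the answer above does not itself mention.

```python
from collections import Counter

# Hypothetical toy data: (attribute value, class label) pairs.
samples = [
    ("sunny", "play"), ("sunny", "play"), ("rainy", "play"),
    ("rainy", "stay"), ("rainy", "stay"),
]

def conditional_probability(value, label, alpha=0.0):
    """Estimate P(value | label) from counts.

    alpha = 0 gives the plain frequency estimate;
    alpha > 0 applies additive (Laplace) smoothing.
    """
    distinct_values = {v for v, _ in samples}
    pair_counts = Counter(samples)
    label_count = sum(1 for _, l in samples if l == label)
    return (pair_counts[(value, label)] + alpha) / (
        label_count + alpha * len(distinct_values)
    )

# "sunny" never occurs together with "stay", so the plain estimate is zero
# and would wipe out any product of probabilities it takes part in.
unsmoothed = conditional_probability("sunny", "stay")

# With alpha = 1 the estimate stays small but is no longer zero.
smoothed = conditional_probability("sunny", "stay", alpha=1.0)

print(unsmoothed, smoothed)
```

With smoothing, an unseen combination lowers the score for a class without eliminating that class outright.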
1 What are the different types of Hypothesis testing?
Hypothesis testing is the procedure used by statisticians and scientists to accept or reject statistical hypotheses. There are mainly two types of hypothesis testing:
Example (null hypothesis): There is no association between a patient's BMI and diabetes.
Example (alternative hypothesis): There could be an association between a patient's BMI and diabetes.
2 How does the AND() function work in Excel?
AND() is a logical function that checks multiple conditions and returns TRUE or FALSE based on whether the conditions are met.
Syntax: AND(logical1, [logical2], [logical3], …)
In the example below, we check whether the marks are greater than 45. The result is TRUE if the mark is greater than 45; otherwise it is FALSE.
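For readers more comfortable with code than with spreadsheets, the same logic can be sketched in Python. This is an analogy and not Excel itself: `excel_and` is a hypothetical helper, and the marks are made-up values.

```python
def excel_and(*conditions):
    """Mimic Excel's AND(): True only when every condition is True."""
    return all(conditions)

marks = [52, 45, 38]  # hypothetical marks

# One condition per mark, like =AND(B2>45) filled down a column.
results = [excel_and(mark > 45) for mark in marks]

# Several conditions at once, like =AND(B2>45, B2<=100).
in_range = excel_and(marks[0] > 45, marks[0] <= 100)

print(results, in_range)
```

A mark of exactly 45 yields False because the comparison is strictly greater than.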
2 Can you provide a dynamic range in "Data Source" for a Pivot table?
Yes, you can provide a dynamic range in the "Data Source" of a Pivot table. To do that, create a named range using the OFFSET function, then base the pivot table on that named range.
What is the curse of dimensionality?
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
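One of these phenomena is that distances between random points become nearly indistinguishable as the number of dimensions grows. The sketch below illustrates this under simple assumptions (uniform random points in a unit cube, Euclidean distance); the function name and parameters are hypothetical.

```python
import math
import random

def distance_contrast(dimensions, n_points=200, seed=0):
    """Return (max - min) / min over distances from one query point
    to n_points uniformly random points in the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dimensions)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)

low = distance_contrast(2)      # nearest and farthest points differ a lot
high = distance_contrast(1000)  # nearest and farthest are almost equally far

print(f"contrast in 2 dimensions:    {low:.2f}")
print(f"contrast in 1000 dimensions: {high:.2f}")
```

When the contrast shrinks toward zero, "nearest neighbor" carries little information, which is one reason distance-based methods struggle on high-dimensional data.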
2 Explain how VLOOKUP works in Excel?
VLOOKUP is used when you need to find things in a table or a range by row.
VLOOKUP accepts the following four parameters:
lookup_value – The value to look for in the first column of a table
table – The table from where you can extract value
col_index – The column from which to extract value
range_lookup – [optional] TRUE = approximate match (default). FALSE = exact match
Let's understand VLOOKUP with an example.
If you wanted to find the department to which Stuart belongs, you could use the VLOOKUP function as shown below:
Here, cell A11 holds the lookup value, A2:E7 is the table array, 3 is the column index number of the column containing the departments, and 0 (FALSE) is the range lookup, which requests an exact match.
If you hit Enter, it will return "Marketing", indicating that Stuart is from the marketing department.
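The exact-match behaviour described here can be sketched in Python. This is a simplified model, not Excel's implementation: `vlookup` is a hypothetical helper, the rows are made-up data (the original worksheet is not reproduced), and approximate matching is left out.

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: scan the first column of each row and
    return the cell at col_index (1-based, as in Excel)."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would display #N/A when nothing matches

# Hypothetical rows: (name, employee id, department).
employees = [
    ("Alice", 101, "Finance"),
    ("Stuart", 102, "Marketing"),
    ("Priya", 103, "Sales"),
]

department = vlookup("Stuart", employees, 3)
print(department)
```

As in the worksheet formula, the column index counts from the first column of the table, so 3 selects the department column in this made-up layout.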