# Accenture Data Engineering Analyst Interview Questions

### How do you treat outliers in a dataset?

An outlier is a data point that lies far from other similar points. Outliers may be due to variability in the measurement or may indicate experimental errors.

The graph depicted below shows there are three outliers in the dataset.

To deal with outliers, you can use the following methods:

• Drop the outlier records
• Assign a new value
• Try a new transformation
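The approaches listed above can be sketched in Python. This is a minimal illustration that assumes two things the text does not prescribe: outliers are flagged with Tukey's 1.5 × IQR rule, and "assign a new value" means replacing an outlier with the median of the remaining points. The data values are made up.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences using Tukey's IQR rule (assumed detection method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Drop the outlier records."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


def replace_outliers(values, k=1.5):
    """Assign a new value (here, the median of the non-outliers)."""
    low, high = iqr_fences(values, k)
    fill = statistics.median(drop_outliers(values, k))
    return [v if low <= v <= high else fill for v in values]


def log_transform(values):
    """Try a new transformation: log1p compresses large values (needs values > -1)."""
    return [math.log1p(v) for v in values]


# Hypothetical measurements with one very high and one very low reading.
data = [10, 12, 11, 13, 12, 11, 95, 10, 12, -40]
print(drop_outliers(data))     # the two extreme readings are removed
print(replace_outliers(data))  # the two extreme readings become the inlier median
print(log_transform([1, 10, 100, 1000]))
```

Which option is appropriate depends on whether the outlier is an error (dropping is reasonable) or a genuine extreme value (a transformation keeps it while reducing its influence).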

### Using the Sample Superstore dataset, display the top 5 and bottom 5 customers based on their profit.

• Drag the Customer Name field onto Rows and the Profit field onto Columns.
• Right-click the Customer Name field and create a set.
• Give the set a name and, on the Top tab, choose the top 5 customers by SUM(Profit).
• Similarly, create a set for the bottom 5 customers by SUM(Profit).
• Select both sets, right-click, and create a combined set. Give it a name and choose All Members in Both Sets.
• Drag the combined top-and-bottom customers set onto Filters, and the Profit field onto Colour, to get the desired result.
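Tableau does this through its UI, but the underlying logic (sum profit per customer, rank the totals, keep the top and bottom n, and combine both sets) can be sketched in Python as a cross-check. The order lines below are hypothetical stand-ins, not actual Sample Superstore records, and `n=2` is used only to keep the example small.

```python
from collections import defaultdict


def top_bottom_customers(rows, n=5):
    """Return (top, bottom, combined) customer sets ranked by total profit.

    rows: iterable of (customer_name, profit) pairs, one per order line.
    """
    totals = defaultdict(float)
    for customer, profit in rows:
        totals[customer] += profit  # equivalent of SUM(Profit) per customer

    ranked = sorted(totals, key=totals.get, reverse=True)  # highest profit first
    top = set(ranked[:n])
    bottom = set(ranked[max(len(ranked) - n, 0):])  # lowest n totals
    combined = top | bottom  # "All Members in Both Sets"
    return top, bottom, combined


# Hypothetical order lines, not real Superstore data.
orders = [
    ("Ann", 100.0), ("Bob", -50.0), ("Ann", 20.0), ("Cara", 300.0),
    ("Dan", -10.0), ("Eve", 5.0), ("Bob", -25.0),
]
top, bottom, combined = top_bottom_customers(orders, n=2)
print(sorted(top), sorted(bottom), sorted(combined))
```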

### How do you decide whether your linear regression model fits the data?

• Make sure the model's assumptions (linearity, independence of errors, constant variance, and normality of residuals) are satisfactorily met.
• Examine potential influential points and the change in the R² and adjusted R² statistics.
• Check for necessary interaction terms, then apply the model to another dataset and check its performance.
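The R² and adjusted R² checks, plus applying the model to another dataset, can be sketched in plain Python. This is a minimal illustration assuming ordinary least squares with a single predictor (p = 1) and hypothetical numbers. Adjusted R² is 1 - (1 - R²)(n - 1)/(n - p - 1), which penalizes predictors that do not improve the fit.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical training data and a separate hold-out set.
x_train, y_train = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
x_test, y_test = [6, 7, 8], [6, 6, 7]

b0, b1 = fit_simple_ols(x_train, y_train)
r2_train = r_squared(x_train, y_train, b0, b1)
print("train R^2:", r2_train)
print("adjusted R^2:", adjusted_r_squared(r2_train, n=len(y_train), p=1))
# Re-use the training coefficients on new data to check out-of-sample fit.
print("hold-out R^2:", r_squared(x_test, y_test, b0, b1))
```

A large drop from training R² to hold-out R² suggests the model does not generalize well.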

### What is data science?

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

### Is Naïve Bayes bad? If so, in what respects?

One disadvantage of Naïve Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate for that pair will be zero. Because Naïve Bayes multiplies all the conditional probabilities together, that single zero makes the whole product, and therefore the score for that class, zero.
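The zero-frequency problem can be reproduced with a tiny categorical Naïve Bayes in Python. This is a sketch on made-up weather data. The `alpha` parameter applies Laplace (add-one) smoothing, a standard remedy not mentioned in the answer above, which adds a pseudo-count to every value so no conditional probability is exactly zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    seen_values = defaultdict(set)       # feature index -> distinct values observed
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def class_score(model, row, label, alpha=0.0):
    """Unnormalized P(label) * product of P(value | label); alpha > 0 adds Laplace smoothing."""
    class_counts, value_counts, seen_values = model
    score = class_counts[label] / sum(class_counts.values())
    for i, value in enumerate(row):
        k = len(seen_values[i])
        count = value_counts[(i, label)][value]
        score *= (count + alpha) / (class_counts[label] + alpha * k)
    return score


# Hypothetical training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)

# "sunny" never occurs with "yes" and "cool" never occurs with "no".
query = ("sunny", "cool")
for label in ("yes", "no"):
    print(label, class_score(model, query, label), class_score(model, query, label, alpha=1.0))
```

Without smoothing, both classes score zero and the classifier cannot choose between them; with `alpha=1.0` both scores become positive and comparable.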

### What are the different types of hypothesis testing?

Hypothesis testing is the procedure statisticians and scientists use to decide whether to reject a statistical hypothesis. It involves two main types of hypotheses:

• Null hypothesis: It states that there is no relation between the predictor and outcome variables in the population. It is denoted by H0.
• Example: There is no association between a patient's BMI and diabetes.

• Alternative hypothesis: It states that there is some relation between the predictor and outcome variables in the population. It is denoted by H1.
• Example: There is an association between a patient's BMI and diabetes.
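To make H0 and H1 concrete, here is a sketch of a two-sided permutation test in Python. Under H0 (no association), the group labels are exchangeable, so shuffling them shows how often a difference at least as large as the observed one arises by chance. The BMI values are hypothetical, and a permutation test is just one of several tests (t-test, chi-square, and so on) that could be used here.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided p-value for H0: both groups come from the same distribution."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel observations at random, as H0 allows
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical BMI readings for patients with and without diabetes.
bmi_diabetic = [31.2, 29.8, 33.5, 30.1, 34.0, 28.9]
bmi_non_diabetic = [24.5, 26.1, 23.8, 27.0, 25.2, 22.9]

p_value = permutation_test(bmi_diabetic, bmi_non_diabetic)
print("p-value:", p_value)
print("reject H0 at 5%" if p_value < 0.05 else "fail to reject H0 at 5%")
```

A small p-value is evidence against H0 and in favour of H1; a large one means the data are consistent with H0, not that H0 is proven.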

### How does the AND() function work in Excel?

AND() is a logical function that checks multiple conditions and returns TRUE or FALSE based on whether the conditions are met.

Syntax: AND(logical1, [logical2], [logical3], ...)

In the example below, we check whether the marks are greater than 45. The result is TRUE if the mark is greater than 45; otherwise, it is FALSE.
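In a sheet, this check would be a formula such as `=AND(B2>45)`, or `=AND(B2>45, C2>=75)` with a second condition; the cell references and the attendance condition are hypothetical. The same semantics can be sketched in Python: the result is TRUE only if every condition is TRUE. The `excel_and` helper is an illustrative assumption, not part of Excel.

```python
def excel_and(*conditions):
    """Mimic Excel's AND(): TRUE only if every logical argument is TRUE."""
    if not conditions:
        # Excel rejects AND() with no arguments; raise here instead of guessing.
        raise TypeError("AND requires at least one logical argument")
    return all(bool(c) for c in conditions)


# Hypothetical rows: (student, marks, attendance %).
students = [("Asha", 62, 90), ("Ben", 40, 95), ("Chen", 78, 60)]
for name, marks, attendance in students:
    print(name, excel_and(marks > 45), excel_and(marks > 45, attendance >= 75))
```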

### Can you provide a dynamic range in "Data Source" for a pivot table?

Yes, you can provide a dynamic range in the "Data Source" of a pivot table. To do that, first create a named range using the OFFSET function, and then base the pivot table on that named range.
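A commonly used named-range formula for this is `=OFFSET(Sheet1!$A$1, 0, 0, COUNTA(Sheet1!$A:$A), COUNTA(Sheet1!$1:$1))`, where the sheet name and the A1 anchor are assumptions about the layout. Its height is the number of non-blank cells in column A and its width is the number of non-blank cells in row 1, so the range grows as rows are added. The Python sketch below mimics how that address resolves; it assumes contiguous data with no blank cells in column A or row 1, since blanks would make COUNTA undercount.

```python
def column_letter(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def is_filled(cell):
    """COUNTA-style test: anything other than an empty cell counts."""
    return cell is not None and cell != ""


def offset_counta_range(grid):
    """Address of OFFSET($A$1, 0, 0, COUNTA($A:$A), COUNTA($1:$1)) for a 2-D list."""
    height = sum(1 for row in grid if row and is_filled(row[0]))
    width = sum(1 for cell in grid[0] if is_filled(cell)) if grid else 0
    if height == 0 or width == 0:
        raise ValueError("COUNTA found no data, so there is no range to reference")
    return f"A1:{column_letter(width)}{height}"


# Hypothetical sheet contents: header row plus data rows.
sheet = [
    ["Name", "Dept", "Salary"],
    ["Ann", "HR", 52000],
    ["Bob", "IT", 61000],
]
print(offset_counta_range(sheet))
sheet.append(["Cara", "Sales", 58000])  # a new row extends the named range
print(offset_counta_range(sheet))
```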

### What is the curse of dimensionality?

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
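One well-known symptom is distance concentration: as the number of dimensions grows, a point's nearest and farthest neighbours end up at almost the same distance, which weakens distance-based methods such as k-nearest neighbours. The Python sketch below illustrates this with uniformly random points in a unit hypercube; the point count and the dimensions tried are arbitrary choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    # The contrast shrinks toward 0 as dim grows: all points look about equally far away.
    print(dim, round(relative_contrast(dim), 3))
```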

### Explain how VLOOKUP works in Excel.

VLOOKUP is used when you need to look up a value in a table or range by row.

VLOOKUP accepts the following four parameters:

• lookup_value – The value to look for in the first column of the table
• table – The table from which to extract the value
• col_index – The number of the column (counting from 1 at the table's first column) from which to extract the value
• range_lookup – [optional] TRUE = approximate match (default), FALSE = exact match

Let's understand VLOOKUP with an example.

If you wanted to find the department Stuart belongs to, you could use the VLOOKUP function as follows:

`=VLOOKUP(A11,A2:E7,3,0)`

Here, cell A11 holds the lookup value, A2:E7 is the table array, 3 is the index of the column containing department information, and 0 (equivalent to FALSE) requests an exact match.

When you press Enter, it returns "Marketing", indicating that Stuart is in the Marketing department.
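The lookup logic can be sketched in Python to show what the four parameters do. This approximates Excel's behaviour rather than reproducing it: the `vlookup` helper is hypothetical, its exact-match comparison is case-sensitive (Excel's is not), and the table rows are invented rather than taken from the example above.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Approximate Excel's VLOOKUP on a list of rows (hypothetical helper)."""
    if not 1 <= col_index <= len(table[0]):
        raise ValueError("#REF!: col_index is outside the table")
    if not range_lookup:
        # Exact match: first row whose first column equals lookup_value.
        for row in table:
            if row[0] == lookup_value:
                return row[col_index - 1]
        raise LookupError("#N/A")
    # Approximate match: assumes the first column is sorted ascending and
    # returns the row with the largest key <= lookup_value.
    keys = [row[0] for row in table]
    pos = bisect_right(keys, lookup_value)
    if pos == 0:
        raise LookupError("#N/A")
    return table[pos - 1][col_index - 1]


# Hypothetical employee table: name, employee ID, department.
employees = [["Priya", 101, "Finance"], ["Stuart", 102, "Marketing"], ["Omar", 103, "Sales"]]
print(vlookup("Stuart", employees, 3, range_lookup=False))

# Approximate match on a sorted score-band table.
bands = [[0, "F"], [50, "C"], [70, "B"], [85, "A"]]
print(vlookup(72, bands, 2))
```

The approximate-match branch explains why range_lookup = TRUE requires a sorted first column: the search relies on that order to find the last key that does not exceed the lookup value.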