Accenture Data Engineer Interview Questions

During analysis, how do you treat missing values?

• Deleting rows with missing values.
• Imputing missing values for continuous and categorical variables.
• Other imputation methods.
• Using algorithms that support missing values, or predicting the missing values with a model.
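
A minimal sketch of the first two strategies in plain Python, assuming missing values are stored as None. The records and column names (age, city) are hypothetical. Median imputation for the continuous column and mode imputation for the categorical one are common choices, not the only ones; in practice pandas (dropna, fillna) is the usual tool.

```python
from statistics import median, mode

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 40, "city": None},
    {"age": 31, "city": "Pune"},
]


def drop_missing(records):
    """Strategy 1: keep only the rows that have no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute(records, numeric_cols, categorical_cols):
    """Strategy 2: fill numeric gaps with the median, categorical gaps with the mode."""
    filled = [dict(r) for r in records]  # copy so the input is not modified
    fills = {}
    for col in numeric_cols:
        fills[col] = median(r[col] for r in records if r[col] is not None)
    for col in categorical_cols:
        fills[col] = mode(r[col] for r in records if r[col] is not None)
    for r in filled:
        for col, value in fills.items():
            if r[col] is None:
                r[col] = value
    return filled


print(drop_missing(rows))               # rows 0 and 3 survive
print(impute(rows, ["age"], ["city"]))  # missing age -> 31, missing city -> "Pune"
```

Deleting rows is simplest but throws away the other values in those rows; imputation keeps every row at the cost of introducing estimated values.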

How will you explain logistic regression to an economist, a physical scientist and a biologist?

Logistic regression is a statistical method that predicts the probability of a binary outcome (such as pass/fail or buy/not buy) from prior observations in a data set. The model relates one or more independent variables to the dependent variable by passing a linear combination of them through the logistic (sigmoid) function, which maps any value to a probability between 0 and 1.
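
To make this concrete for any audience, the sketch below fits p(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)) to a small hypothetical data set (hours studied versus pass/fail) using plain batch gradient descent on the log-loss. This is an illustration under those assumptions; a library such as statsmodels or scikit-learn would normally do the fitting. For an economist, exp(b1) is the odds ratio: the factor by which the odds of passing change for each extra hour.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
print(f"P(pass | 1 h) = {sigmoid(b0 + b1 * 1.0):.2f}")
print(f"P(pass | 5 h) = {sigmoid(b0 + b1 * 5.0):.2f}")
print(f"odds ratio per extra hour = {math.exp(b1):.2f}")
```

A positive b1 means more hours raise the predicted probability of passing; the same reading applies to a dose in a biology experiment or a stimulus in a physics one.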

Write a function that takes in two sorted lists and outputs a sorted list that is their union.

• Read the number of elements in the first list and store it in a variable.
• Read the elements of the first list one by one.
• Similarly, read the elements of the second list.
• Concatenate both lists with the ‘+’ operator, remove duplicates (a union keeps each value once) and sort the result.
• Display the elements of the sorted list.
• Exit.
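
Concatenating and sorting costs O((m + n) log(m + n)). Because both inputs are already sorted, a two-pointer merge (a different technique from the steps listed, shown here as an alternative) builds the union in O(m + n). This sketch assumes the lists hold mutually comparable values and treats "union" as set union, so each value appears once.

```python
def sorted_union(a, b):
    """Merge two sorted lists into one sorted list in which each value appears once."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            candidate = a[i]
            i += 1
        elif b[j] < a[i]:
            candidate = b[j]
            j += 1
        else:  # equal values: take one copy and advance both pointers
            candidate = a[i]
            i += 1
            j += 1
        if not result or result[-1] != candidate:  # skip duplicates
            result.append(candidate)
    # At most one list still has elements left, and they are already sorted.
    for x in a[i:] + b[j:]:
        if not result or result[-1] != x:
            result.append(x)
    return result


print(sorted_union([1, 3, 5, 7], [2, 3, 6, 8]))  # [1, 2, 3, 5, 6, 7, 8]
# The concatenate-and-sort approach, with duplicates removed via set():
print(sorted(set([1, 3, 5, 7] + [2, 3, 6, 8])))  # same result
```

The one-liner with sorted(set(...)) is shorter and fine for small inputs; the merge only pays off when the lists are large.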

How can you assess a good logistic model?

• Likelihood Ratio Test and Pseudo R^2.
• Hosmer-Lemeshow Test, Wald Test.
• Variable Importance, Classification Rate.
• ROC Curve, K-Fold Cross-Validation.
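
Two of these measures are easy to compute by hand. This sketch uses hypothetical labels and predicted probabilities, computes the classification rate at an assumed 0.5 threshold, and computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count half). The pairwise loop is O(P·N), which is fine for illustration; sklearn.metrics.roc_auc_score is the usual library call.

```python
def classification_rate(y_true, probs, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, probs):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [p for p, y in zip(probs, y_true) if y == 1]
    neg = [p for p, y in zip(probs, y_true) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1]          # hypothetical true labels
p = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
print(classification_rate(y, p))  # 0.75
print(roc_auc(y, p))              # 0.75
```

The classification rate depends on the chosen threshold, while the AUC summarises ranking quality across all thresholds.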

The list of selected students came out on 15th April 2021; along with me, 4 other students were selected. A special thanks to GeeksForGeeks for helping me learn about companies' interview experiences.

I applied for Accenture's on-campus drive on 24/3/2021 through my campus link. I had my mock test on 1st April; please make sure you attend this test.

Communication Round: a non-elimination round that only checks communication skills, although the marks may be considered at the end of the selection process.

How will you determine the number of clusters in a clustering algorithm?

The optimal number of clusters can be determined as follows (the elbow method): run the clustering algorithm (e.g., k-means clustering) for different values of k. For each k, calculate the total within-cluster sum of squares (WSS). Plot WSS against the number of clusters k; the location of the bend (the "elbow") in the curve is generally taken as an indicator of the appropriate number of clusters.
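
The procedure can be sketched in plain Python. To stay self-contained, this assumes one-dimensional, hypothetical data and a simple deterministic initialization (one seed point from each of k equal slices of the sorted data) instead of the random or k-means++ seeding a library such as scikit-learn would use.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centroids, within-cluster sum of squares)."""
    pts = sorted(points)
    n = len(pts)
    # Assumed deterministic init: one seed point from each of k equal slices of the sorted data.
    centroids = [pts[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Move each centroid to its cluster mean (keep it in place if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    wss = sum((x - centroids[i]) ** 2 for i, c in enumerate(clusters) for x in c)
    return centroids, wss


# Hypothetical data with three visible groups around 1, 5 and 10.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 10.0, 10.2, 9.8]
wss = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 3))
# WSS falls sharply up to k = 3 and barely changes after it: the elbow is at k = 3.
```

Plotting these values (for example with Matplotlib) gives the curve described above; reading the elbow is a judgment call, so it is often cross-checked with other criteria.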

Is Naïve Bayes bad? If so, in what respects?

One disadvantage of Naïve Bayes is the zero-frequency problem: if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate is zero. Because the probabilities are multiplied, that single zero makes the whole product, and hence the score for that class, zero.
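
The problem, and the common remedy of Laplace (add-one) smoothing, fit in a few lines. The counts are hypothetical: 10 training examples of a class, none of which has the attribute value in question, with the attribute taking 2 possible values. The other two likelihoods (0.5 and 0.8) are made-up placeholders.

```python
import math


def likelihood(count_value_and_class, count_class, n_values, alpha=0.0):
    """Estimate P(attribute = value | class) from counts; alpha > 0 applies additive smoothing."""
    return (count_value_and_class + alpha) / (count_class + alpha * n_values)


# Hypothetical per-attribute likelihoods for one class; the middle value was never seen with it.
raw = [0.5, likelihood(0, 10, 2), 0.8]
smoothed = [0.5, likelihood(0, 10, 2, alpha=1.0), 0.8]

print(math.prod(raw))       # 0.0: one unseen value wipes out the whole product
print(math.prod(smoothed))  # small but non-zero
```

Smoothing moves a little probability mass onto unseen combinations, so the other attributes can still influence the decision.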

What is the Law of Large Numbers?

The law of large numbers is a theorem from probability and statistics stating that the average of the results obtained by repeating an experiment many times gets closer to the true (expected) value as the number of repetitions grows. All sample observations are assumed to be drawn from the same idealized population of observations.
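
A quick simulation illustrates the idea: a fair six-sided die has expected value 3.5, and the mean of many simulated rolls tends toward it as the number of rolls grows. The seed and sample sizes are arbitrary choices for reproducibility.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible


def mean_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n


for n in (10, 100, 10_000, 100_000):
    m = mean_of_rolls(n)
    print(f"n = {n:>6}: mean = {m:.4f}, error = {abs(m - 3.5):.4f}")
```

The error is not guaranteed to shrink at every step, but it becomes small with high probability as n grows.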

Compare SAS, R and Python programming.

SAS, a commercial analytics suite, has traditionally been the data analytics tool of choice for many large organizations. R is very good at heavy statistical computation, so it is widely used by statisticians and researchers. Startups often prefer Python over the other two because of its lightweight nature, large community and strong deep learning libraries.

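What are the important libraries of Python that are used in Data Science?

Commonly used libraries include NumPy (numerical arrays), pandas (tabular data), SciPy (scientific computing), Matplotlib (plotting) and scikit-learn (machine learning).

What is the mean absolute error?

The magnitude of the difference between an individual measurement and the true value of the quantity is called the absolute error of that measurement. The arithmetic mean of all the absolute errors is the mean absolute error of the measured quantity.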
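
In symbols, with n measurements a_1, ..., a_n of a quantity whose true value is a, the absolute errors are |a_i - a| and the mean absolute error is (1/n) Σ |a_i - a|. A short sketch with hypothetical readings:

```python
def mean_absolute_error(measurements, true_value):
    """Arithmetic mean of the absolute errors |a_i - true_value|."""
    abs_errors = [abs(a - true_value) for a in measurements]
    return sum(abs_errors) / len(abs_errors)


readings = [2.01, 1.98, 2.03, 1.99]  # hypothetical measurements of a 2.00 cm length
print(round(mean_absolute_error(readings, 2.00), 4))  # 0.0175
```

Taking absolute values stops positive and negative errors from cancelling, which a plain mean of the signed errors would allow.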

Machine-learning algorithms use statistics to find patterns in massive amounts of data. Data here encompasses many things: numbers, words, images, clicks and more. If it can be stored digitally, it can be fed into a machine-learning algorithm.