Top Data Science Interview Questions

What is a Database?

A database is a container in which data is collected systematically, so that managing and manipulating
that data is easy.

For example, an online telephone directory uses a database to store its data, such as names, addresses,
phone numbers and other contact details.

An online library that holds millions of books uses a database to maintain its data.

(2) Between which two components does a DBMS act as an interface?

  • Database applications and the database
  • Data and Database
  • The User and Database Applications
  • Database Applications and SQL.

Answer: Database applications and the database

(3) Which one is not a component of a DBMS?

(1) User Data
(2) Meta Data
(3) Reports
(4) Indexes

Answer: Reports

(4) Amazon, an online commerce site, is an example of a(n):

(1) Multiuser Database Application
(2) Single-User Database Application
(3) E-Commerce Application

Answer: E-Commerce Application

(5) What is the Full form of SQL?

Answer: Structured Query Language

(6) How can a Company Keep Track of its Business?

Answer: By using a database

(7) What is RDBMS?

In an RDBMS, the relationships between data files (tables) are relational. Tables are connected to one
another through common data values, using the concept of keys.

Properties of a relational database:

Values are atomic.
Each row is unique.
Each column is distinct.
Each column has a unique name.

(8) What is a Key in RDBMS?

Keys play an important role in relational database management systems. A key is used to uniquely
identify rows in a table and also helps to establish relationships among tables.

(9) What are the Types of Keys in RDBMS?

(1) Primary key
(2) Super Key
(3) Candidate Key
(4) Alternate Key
(5) Composite Key
(6) Foreign Key

(10) What is the Primary key?

A primary key is a column, or set of columns, that uniquely identifies each tuple (row) in a table.

A primary key cannot contain NULL values.

Its values must be unique.

A primary key is not always a single attribute (column); it can also be a set of more than
one attribute.

Example table columns: Student_id, Name, Age
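
The uniqueness rule can be tried out in a few lines. This is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for a full RDBMS, and the student rows are made-up illustration data.

```python
import sqlite3

# In-memory SQLite database (assumption: SQLite stands in for any RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', 20)")

# A second row with the same student_id violates the primary key.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ravi', 21)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```

The second insert is refused, so the table still holds a single row.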

(11) What is the Super Key?

A super key is a set of one or more columns (attributes) that uniquely identifies each row in a table.

Every super key is a superset of a candidate key.

Example: In the student table, both of the following are super keys:

  1. Student_id
  2. Student_id and Name

(13) What is the Candidate key?

A candidate key is a minimal set of one or more columns (attributes) that uniquely identifies a row.
One candidate key is chosen as the primary key, and the remaining candidate keys are known as
alternate or secondary keys.

(14) What is a Composite key?

A composite key consists of more than one attribute that together uniquely identify the rows
(records or tuples) in a table.

  • None of the columns can serve as a primary key on its own.
  • So the combination of the columns is treated as a composite key.
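
A minimal sketch of a composite key, again assuming Python's built-in sqlite3 module. The enrollment table and its values are hypothetical and chosen only to show that the pair of columns, not either column alone, must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither student_id nor course_id is unique on its own; the pair is.
conn.execute(
    "CREATE TABLE enrollment ("
    " student_id INTEGER NOT NULL,"
    " course_id TEXT NOT NULL,"
    " PRIMARY KEY (student_id, course_id))"
)
conn.execute("INSERT INTO enrollment VALUES (1, 'DB101')")
conn.execute("INSERT INTO enrollment VALUES (1, 'ML201')")  # same student, new course
conn.execute("INSERT INTO enrollment VALUES (2, 'DB101')")  # same course, new student

# Repeating an existing (student_id, course_id) pair is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 'DB101')")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(pair_rejected, enrollment_count)
```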

(15) What is a foreign key?

A foreign key is a column, or set of columns, in one table that refers to the primary key of another
table. It acts as a cross-reference between the tables.
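
The cross-reference can be sketched as follows, assuming Python's built-in sqlite3 module. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other systems such as MySQL behave differently, and the department and student tables here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department (dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Mathematics')")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 10)")  # department 10 exists

# Department 99 does not exist, so this row would be an orphan.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(orphan_rejected, student_count)
```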

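(16) Difference Between Super Keys and Candidate Keys?

A super key is any set of columns that uniquely identifies a row. A candidate key is a minimal super
key: removing any column from it breaks uniqueness.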

(17) What is Normalization?

Normalization is the process of organizing data to avoid duplication and redundancy.

  • Helps to minimize duplicate data
  • Helps to minimize or avoid data modification issues
  • Helps to simplify queries

(18) What are the types of Normalization?

First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form (BCNF)

(19) What is 1NF?

Each column should hold a single (atomic) value.
It prevents multiple columns from being used to fetch the same row.
Each table should contain a primary key that uniquely identifies every row.
The primary key is usually a single column, but if needed, more than one column
can be combined to create a single primary key.

(20) What is 2NF?

For 2NF, the relation must already be in 1NF.
First Normal Form alone is not able to reduce data redundancy.
2NF requires that no non-key column depends on only part of the primary key (no partial dependency).
It follows the concept of full functional dependency.
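
A partial dependency is easiest to see on a tiny example. The sketch below uses plain Python with made-up order data: the key is (order_id, product_id), but product_name depends on product_id alone, so it is moved into its own table.

```python
# Hypothetical 1NF rows with composite key (order_id, product_id).
# product_name depends only on product_id: a partial dependency.
rows = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Book", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen", "quantity": 5},
]

# 2NF decomposition: product facts are stored once, keyed by product_id.
products = {r["product_id"]: r["product_name"] for r in rows}

# The remaining table keeps only what depends on the whole key.
order_items = [(r["order_id"], r["product_id"], r["quantity"]) for r in rows]

print(products)
print(order_items)
```

After the split, the name "Pen" is stored once instead of once per order line.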

(21) What is 3NF?

A table is in 3NF if it is already in 2NF and does not contain any transitive dependency.
3NF helps to reduce data duplication.
It also helps in achieving data integrity.
If a table in 2NF has no transitive dependency for non-prime attributes, then the relation
is in Third Normal Form.

(22) What is BCNF?

BCNF is an advanced version of 3NF.
It is stricter than 3NF.
To apply the rules of BCNF, the data must already be in 3NF.

(23) What is an ER Diagram?

An Entity-Relationship model (ER model) describes the structure of a particular database with the
help of a diagram, which is known as an Entity-Relationship Diagram (ER Diagram).

(24) Why is an ER Diagram important?

The ER model helps to draw the database design.
It is an easy-to-read graphical tool for modeling data.
It is mostly used in database design.
It is a graphical representation of the logical structure of a database.
It helps you to identify the entities that exist in the system and the relationships between
those entities.

(25) What are the components of an ER Diagram?

An ER diagram is built from entities, attributes and relationships.

What is MySQL?

MySQL is a relational database management system (RDBMS) based on SQL (Structured Query Language).

(26) How to install MySQL?

Download the installer from the official MySQL site.

(27) How many types of commands are there in MySQL?

Five: DDL, DML, DQL, DCL and TCL.


(28) What are DDL Commands?

DDL – Data Definition Language

Helps to define the database.

Deals with the description (schema) of the database.

(29) Give an Example of DDL commands.
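
Typical DDL commands are CREATE, ALTER and DROP. The sketch below runs them through Python's built-in sqlite3 module, which is an assumption made to keep the example self-contained; MySQL syntax is similar for these statements, and the book table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table.
conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT)")

# ALTER: change the definition of an existing table.
conn.execute("ALTER TABLE book ADD COLUMN author TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(book)")]

# DROP: remove the table definition and its data.
conn.execute("DROP TABLE book")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns, remaining_tables)
```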


(30) Give an Example of DML commands

DML – Data Manipulation Language
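
Typical DML commands are INSERT, UPDATE and DELETE. A minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical book table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# INSERT: add rows.
conn.execute("INSERT INTO book VALUES (1, 'SQL Basics', 3)")
conn.execute("INSERT INTO book VALUES (2, 'Statistics 101', 1)")

# UPDATE: change existing rows.
conn.execute("UPDATE book SET copies = copies + 2 WHERE book_id = 1")

# DELETE: remove rows.
conn.execute("DELETE FROM book WHERE book_id = 2")

rows = conn.execute("SELECT book_id, title, copies FROM book").fetchall()
print(rows)
```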



(31) What are DQL commands?

DQL – Data Query Language
DQL is used to make queries on the data within schema objects.
The main purpose of a DQL command is to return a result set based on the query passed to it.
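
SELECT is the DQL command. The sketch below, which assumes Python's built-in sqlite3 module and made-up student rows, filters and sorts a table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 23), (3, "Meera", 21)],
)

# SELECT with a filter (WHERE) and a sort order (ORDER BY).
older_students = conn.execute(
    "SELECT name, age FROM student WHERE age > 20 ORDER BY age"
).fetchall()
print(older_students)
```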

(32) What are DCL commands?

DCL – Data Control Language
Deals with the rights and permissions of the database.
Controls who may access the data.
GRANT – gives users access privileges to the database.

(33) What are TCL commands?

TCL – Transaction Control Language
COMMIT – commits a transaction
ROLLBACK – rolls back a transaction when an error occurs
SAVEPOINT – sets a savepoint within a transaction
SET TRANSACTION – specifies the characteristics of the transaction
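
COMMIT, ROLLBACK and SAVEPOINT can be tried with Python's built-in sqlite3 module. This is a sketch under that assumption: SQLite does not support SET TRANSACTION, so it is left out, and the account table is hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.execute("SAVEPOINT before_bob")
conn.execute("INSERT INTO account VALUES ('bob', 50)")
conn.execute("ROLLBACK TO SAVEPOINT before_bob")  # undoes only bob's insert
conn.execute("COMMIT")                            # makes alice's insert permanent
after_commit = conn.execute("SELECT name, balance FROM account").fetchall()

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")                          # undoes the whole transaction
after_rollback = conn.execute("SELECT name, balance FROM account").fetchall()

print(after_commit, after_rollback)
```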

(34) What is an Aggregate Function?

An aggregate function takes the values of multiple rows, grouped together by certain criteria, as
input and returns a single value of more significant meaning.

(35) What is the MIN function?

The MIN function returns the smallest value in the selected column.

(36) What is the MAX function?

The MAX function returns the largest value in the selected column.

(37) What is the COUNT function?

The COUNT function returns the number of rows that match the condition.

(38) What is the Average function?

The AVG function returns the average value of a numeric column.
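
The four aggregate functions above can be sketched together, assuming Python's built-in sqlite3 module and a hypothetical result table of exam marks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (student TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [("Asha", 40), ("Ravi", 70), ("Meera", 90), ("Tom", 60)],
)

# MIN, MAX and AVG each collapse the marks column to one value.
lowest, highest, average = conn.execute(
    "SELECT MIN(marks), MAX(marks), AVG(marks) FROM result"
).fetchone()

# COUNT returns how many rows match the condition.
passed = conn.execute("SELECT COUNT(*) FROM result WHERE marks >= 60").fetchone()[0]

print(lowest, highest, average, passed)
```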

(39) What is an Alias in SQL?

An alias is used in SQL to give a temporary name to a table or to a column of a table.
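
A short sketch of both kinds of alias, assuming Python's built-in sqlite3 module. The table alias `s` and the column aliases `student_name` and `years` are illustrative names that exist only for the duration of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 20)")

# "AS s" is a table alias; "AS student_name" and "AS years" are column aliases.
cursor = conn.execute("SELECT s.name AS student_name, s.age AS years FROM student AS s")

# The result set is labelled with the aliases, not the original column names.
column_names = [description[0] for description in cursor.description]
first_row = cursor.fetchone()
print(column_names, first_row)
```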

(40) What is a Join?

A join is used to combine rows from different tables based on a condition.

Types of Joins in SQL

Inner join
Left join
Right Join
Full Join
Self Join

(41) What is Inner Join?

The inner join selects the matching records from both tables.

(42) What is Left Join?

The left join fetches all records from the left table and the matching records from
the right table.

(43) What is Right Join?

The right join fetches all records from the right table and the matching records from
the left table.

(44) What is Full Join?

A full outer join fetches all the records from both tables, whether there is a
match or not.

(45) What is Self Join?
A self join joins a table with itself.
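
The join types can be compared on one small data set. This sketch assumes Python's built-in sqlite3 module. Because older SQLite versions lack RIGHT JOIN, the right join is emulated here by swapping the tables of a LEFT JOIN; all table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(10, "Math"), (20, "Physics"), (30, "History")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", None)])

# Inner join: only students that have a matching department.
inner_rows = conn.execute(
    "SELECT s.name, d.dept_name FROM student AS s"
    " INNER JOIN department AS d ON s.dept_id = d.dept_id ORDER BY s.student_id"
).fetchall()

# Left join: every student, with NULL (None) where no department matches.
left_rows = conn.execute(
    "SELECT s.name, d.dept_name FROM student AS s"
    " LEFT JOIN department AS d ON s.dept_id = d.dept_id ORDER BY s.student_id"
).fetchall()

# Right join (emulated): every department, with None where no student matches.
right_rows = conn.execute(
    "SELECT s.name, d.dept_name FROM department AS d"
    " LEFT JOIN student AS s ON s.dept_id = d.dept_id ORDER BY d.dept_id"
).fetchall()

# Self join: the employee table is joined with itself to find each manager.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [(1, "Lee", None), (2, "Kim", 1)])
self_rows = conn.execute(
    "SELECT e.name, m.name FROM employee AS e"
    " INNER JOIN employee AS m ON e.manager_id = m.emp_id"
).fetchall()

print(inner_rows, left_rows, right_rows, self_rows, sep="\n")
```

A full outer join would return the union of the left and right results: all students and all departments, matched where possible.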

(46) Which joins are needed when you want to include rows that do not have matching values?

(1) Equi-Join
(2) Left Join
(3) Right Join
(4) Full Outer Join

Ans: The outer joins (Left Join, Right Join and Full Outer Join). An equi-join returns only the
rows that match.

(47) What are Subqueries in SQL?

A subquery is a query nested inside another query: one is the inner query and the other is the
outer query.

First, the inner query gets executed.

The results of the inner query are collected.

The output of the inner query is then passed to the outer query.

(48) How Many Types of Subqueries are there?

Nested Subquery – A nested subquery first executes the inner SELECT query, then executes the
outer query with the returned values.

Correlated Subquery – A correlated subquery refers to columns of the outer query, so it is evaluated
for every row the outer query reads, comparing each row against related data.
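
The two kinds of subquery can be sketched side by side, assuming Python's built-in sqlite3 module and made-up student rows. The nested subquery is independent of the outer row, while the correlated one refers to `s.dept` from the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, dept TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Asha", 20, "Math"), ("Ravi", 23, "Math"),
     ("Meera", 21, "Physics"), ("Tom", 19, "Physics")],
)

# Nested subquery: the inner query (overall average age) does not depend on the outer row.
older_than_average = [row[0] for row in conn.execute(
    "SELECT name FROM student"
    " WHERE age > (SELECT AVG(age) FROM student)"
    " ORDER BY name"
)]

# Correlated subquery: the inner query uses s.dept, so it depends on each outer row.
younger_than_dept_average = [row[0] for row in conn.execute(
    "SELECT s.name FROM student AS s"
    " WHERE s.age < (SELECT AVG(t.age) FROM student AS t WHERE t.dept = s.dept)"
    " ORDER BY s.name"
)]

print(older_than_average, younger_than_dept_average)
```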

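(49) Difference Between Subquery and Join.

A subquery nests one query inside another and passes its result to the outer query, whereas a join
combines columns from rows of two or more tables into a single result set.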

(50) Which of the following statements is TRUE for subqueries?

A. A subquery can retrieve zero or more rows.
B. A subquery can appear on either side of a comparison operator.
C. There is no limit on the number of subqueries in the WHERE clause of a statement.
D. Both A and B.

Answer: Both A and B

(51) Where can You not use Subqueries?

A. Field Names in the SELECT Statement
B. The WHERE clause only in the SELECT statement
C. The WHERE clause in SELECT as well as all DML statements.
D. The FROM clause in the SELECT statement.

Broadband Database Management System

How to handle missing value imputation for quantitative values?

(1) Replace them with the mean
(2) Replace them with the mode
(3) Replace them with the median
(4) None of them

Answer: (1) Replace them with the mean
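
Mean imputation can be sketched with Python's standard statistics module. The ages list is made-up data, with None marking a missing value; median imputation is shown alongside for comparison.

```python
from statistics import mean, median

# Hypothetical quantitative column; None marks a missing value.
ages = [20.0, None, 24.0, 22.0, None, 30.0]

# The statistics are computed from the observed values only.
observed = [value for value in ages if value is not None]
mean_value = mean(observed)
median_value = median(observed)

# Each missing entry is replaced by the chosen statistic.
mean_imputed = [mean_value if value is None else value for value in ages]
median_imputed = [median_value if value is None else value for value in ages]

print(mean_imputed)
print(median_imputed)
```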

(2) How will you compare two quantitative variables?

Answer: We can compare them by using correlation, which measures how a change in one variable is
associated with a change in the other variable.

The relationship can be shown graphically with scatter plots or dot plots.

(3) What is the DER model, and which methods depend on it?

Answer: DER stands for Decision, Estimation, Rank.

Decision: to decide between outcomes such as yes/no or risky/not risky.

Estimation: how much, and when, the variables are estimated.

Rank: ranking the outcomes accordingly, to prioritize what is more risky.

Statistical techniques, machine learning and advanced analytics techniques depend on the DER model.

(4) What is Forward Regression?

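It is an automated model-selection method that starts with no variables (or a single variable)
and, at each step, adds the variable that improves the model most. Starting with all variables and
dropping them step by step is the reverse procedure, backward elimination.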

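(5) What is VIF and what range does it give?
VIF stands for Variance Inflation Factor. VIF assesses whether the factors are correlated with each
other (multicollinearity), which could affect the p-values and make the model less reliable. VIF
starts at 1 (no correlation with the other factors) and has no upper bound; values above 5 or 10
are commonly read as a sign of problematic multicollinearity.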
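
In general, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. With only two predictors, R² equals their squared Pearson correlation, which keeps the sketch below short. The data and helper names are illustrative, not from any library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors with a correlation of 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 3))
```

With more than two predictors, each R² would have to come from a multiple regression, which is usually left to a statistics library.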

(6) What is the Fisher exact test?
Fisher’s exact test is a statistical significance test used in the analysis of contingency tables.
Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.
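
For a 2x2 table the test can be computed directly from the hypergeometric distribution. The sketch below uses one common two-sided convention, summing the probabilities of all tables that are no more likely than the observed one; the function names are illustrative.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # k runs over every value the top-left cell can take given the margins.
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(k, row1 - k, col1 - k, row2 - (col1 - k))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value

# Small sample: 3 of 4 correct in one group, 1 of 4 in the other.
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_small, 4))
```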

(7) What is Correlation and what are its measures?

Correlation measures whether, and how strongly, two variables are related. A scatter plot is the
graphical way of representing the relationship.

Measures of Correlation:

  • Strength of relationship
  • Pearson’s Correlation
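
Pearson's correlation can be computed by hand as the covariance divided by the product of the standard deviations. A minimal sketch in plain Python, with made-up data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises with hours: perfect positive relationship
score_down = [10, 8, 6, 4, 2]  # falls with hours: perfect negative relationship

r_up = pearson(hours, score_up)
r_down = pearson(hours, score_down)
print(r_up, r_down)
```

The sign gives the direction of the relationship and the absolute value, between 0 and 1, gives its strength.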

(8) What do you mean by the Mean Squared Error?

Mean Squared Error (MSE) measures the average of the squares of the errors, that is, the average
squared difference between the estimated values and the actual values.
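
The definition translates directly into code. A minimal sketch in plain Python with made-up actual and predicted values:

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

actual = [3, 5, 7]
predicted = [2, 5, 9]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error(actual, predicted)
print(mse)
```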

(9) Is R-squared a goodness-of-fit measure?

Answer: R-squared is a goodness-of-fit measure for linear regression models. This statistic
indicates the percentage of the variance in the dependent variable that the independent
variables explain collectively.

However, small R-squared values are not always a problem, and high R-squared values are not necessarily good.

(10) What is the Difference Between RMSE and R-Squared in statistics?

Answer: The RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model
to the data: how close the observed data points are to the model’s predicted values.

R-squared, by contrast, is a relative measure of fit, whereas RMSE is an absolute measure of fit. As a rough rule
of thumb, an adjusted R-squared above 0.75 is often considered very good, and in some cases an adjusted R-squared
of 0.4 or more is acceptable as well.
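
The contrast shows up when both are computed on the same predictions. In this plain-Python sketch with made-up values, RMSE is in the units of the target variable, while R-squared is a unitless share of the variance explained.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: absolute fit, in the units of the target."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3, 5, 7]
predicted = [2, 5, 9]

model_rmse = rmse(actual, predicted)     # sqrt(5 / 3)
model_r2 = r_squared(actual, predicted)  # 1 - 5 / 8
print(round(model_rmse, 4), model_r2)
```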

(11) What are the Best applications of Linear Regression?

A baker can estimate the right temperature for the oven to increase the shelf life of the product.

In BPOs/KPOs, we can analyze the relationship between the wait time of callers and number of

Suppose you want to estimate the demand of your customers, or to predict the future sales of
particular items.

Insurance companies rely heavily on regression to understand the number of claims
at any given time (a popular use of polynomial regression).

(12) What are the assumptions of Linear Regression?

The Regression Model is Linear in parameters

The mean of residual is zero.

Homoscedasticity of residuals (equal variance).

No autocorrelation of residuals.

The X variables and residuals are uncorrelated.

The number of observations must be greater than the number of X variables.

The variability of X values is positive.

(13) What are the consequences of violating the Linear Regression Assumptions?

Answer: Whenever we violate any of the linear regression assumptions, the regression coefficients
produced by OLS will either be biased or have inflated variance estimates.

(14) What are the limitations of Linear Regression Modeling in Data Analysis?

Linear regression modeling is one of the ways of specifying cause-effect and predictive
relationships in data analysis. Its limitations include:

Variation in data (both dependent and independent variables): it requires enough variation
in the data.

Assumption of deterministic independent variables that are independent of the random error. This
assumption is also made in many other regression techniques.

