Key features
This course will prepare you to:
- Explain the architecture of the Python Certification for Datascience component
- Configure and use new functionalities in Python Certification for Datascience
- Use the standard Python Certification for Datascience Sub Modules
- Explain the Python Certification for Datascience Controlling Configuration and Customization options
INTRODUCTION TO PYTHON
- What is analytics & Data Science?
- Common Terms in Analytics
- Analytics vs. Data warehousing, OLAP, MIS Reporting
- Relevance in industry and need of the hour
- Types of problems and business objectives in various industries
- How leading companies are harnessing the power of analytics
- Critical success drivers
- Overview of analytics tools & their popularity
- Analytics Methodology & problem-solving framework
- List of steps in Analytics projects
- Identify the most appropriate solution design for the given problem statement
- Project plan for Analytics project & key milestones based on effort estimates
- Build Resource plan for analytics project
- Why Python for data science?
PYTHON: ESSENTIALS (CORE)
- Overview of Python- Starting with Python
- Introduction to installation of Python
- Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
- Understand Jupyter notebook & Customize Settings
- Concept of Packages/Libraries - Important packages (NumPy, SciPy, scikit-learn, pandas, Matplotlib, etc.)
- Installing & loading Packages & Namespaces
- Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
- List and Dictionary Comprehensions
- Variable & Value Labels – Date & Time Values
- Basic Operations - Mathematical - string - date
- Reading and writing data
- Simple plotting
- Control flow & conditional statements
- Debugging & Code profiling
- How to create classes and modules, and how to call them
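As a small illustration of the "List and Dictionary Comprehensions" topic above, the following sketch builds a list and a dictionary from the same data. The `scores` data and the pass mark of 50 are invented for this example and are not part of the course material.

```python
# Hypothetical sample data, invented for this illustration.
scores = {"alice": 82, "bob": 47, "carol": 91}

# List comprehension: names of everyone who scored at least 50.
passed = [name for name, score in scores.items() if score >= 50]

# Dictionary comprehension: map each name to a pass/fail label.
labels = {
    name: ("pass" if score >= 50 else "fail")
    for name, score in scores.items()
}

print(passed)
print(labels)
```

Both forms replace an explicit `for` loop with a single expression, which is usually easier to read for short transformations like these.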
SCIENTIFIC DISTRIBUTIONS USED IN PYTHON FOR DATA SCIENCE
- NumPy, SciPy, pandas, scikit-learn, statsmodels, NLTK, etc.
ACCESSING/IMPORTING AND EXPORTING DATA USING PYTHON MODULES
- Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
- Database Input (Connecting to database)
- Viewing Data objects - subsetting, methods
- Exporting Data to various formats
- Important Python modules: pandas, Beautiful Soup
DATA MANIPULATION – CLEANSING – MUNGING USING PYTHON MODULES
- Cleansing Data with Python
- Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)
- Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
- Python Built-in Functions (Text, numeric, date, utility functions)
- Python User Defined Functions
- Stripping out extraneous information
- Normalizing data
- Formatting data
- Important Python modules for data manipulation (pandas, NumPy, re, math, string, datetime, etc.)
DATA ANALYSIS – VISUALIZATION USING PYTHON
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables and summarization
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
- Creating Graphs (Bar, pie, line chart, histogram, boxplot, scatter, density, etc.)
- Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, pandas, scipy.stats, etc.)
INTRODUCTION TO STATISTICS
- Basic Statistics - Measures of Central Tendencies and Variance
- Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
- Inferential Statistics - Sampling - Concept of Hypothesis Testing
- Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square
- Important modules for statistical methods: NumPy, SciPy, pandas
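The measures of central tendency and variance listed above can be sketched with Python's built-in `statistics` module alone, before moving on to NumPy or SciPy. The `sample` values are made up for illustration.

```python
import statistics

# Hypothetical sample, invented for this illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of spread: population vs. sample versions.
pop_variance = statistics.pvariance(sample)   # divides by n
pop_std = statistics.pstdev(sample)           # square root of pvariance
sample_variance = statistics.variance(sample) # divides by n - 1

print(mean, median, mode)
print(pop_variance, pop_std, sample_variance)
```

The population and sample variances differ only in the divisor (n versus n - 1), which matters most for small samples like this one.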
INTRODUCTION TO PREDICTIVE MODELING
- Concept of a model in analytics and how it is used
- Common terminology used in analytics & modeling process
- Popular modeling algorithms
- Types of Business problems - Mapping of Techniques
- Different Phases of Predictive Modeling
DATA EXPLORATION FOR MODELING
- Need for structured exploratory data analysis
- EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
- Identify missing data
- Identify outliers
- Visualize the data trends and patterns
DATA PREPARATION
- Need for Data preparation
- Consolidation/Aggregation - Outlier treatment - Flat Liners - Missing values - Dummy creation - Variable Reduction
- Variable Reduction Techniques - Factor Analysis & PCA
SEGMENTATION: SOLVING SEGMENTATION PROBLEMS
- Introduction to Segmentation
- Types of Segmentation (Subjective vs. Objective, Heuristic vs. Statistical)
- Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
- Behavioral Segmentation Techniques (K-Means Cluster Analysis)
- Cluster evaluation and profiling - Identify cluster characteristics
- Interpretation of results - Implementation on new data
LINEAR REGRESSION: SOLVING REGRESSION PROBLEMS
- Introduction - Applications
- Assumptions of Linear Regression
- Building Linear Regression Model
- Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
- Assess the overall effectiveness of the model
- Validation of Models (Re-running vs. Scoring)
- Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
- Interpretation of Results - Business Validation - Implementation on new data
LOGISTIC REGRESSION: SOLVING CLASSIFICATION PROBLEMS
- Introduction - Applications
- Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
- Building Logistic Regression Model (Binary Logistic Model)
- Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
- Validation of Logistic Regression Models (Re-running vs. Scoring)
- Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
- Interpretation of Results - Business Validation - Implementation on new data
TIME SERIES FORECASTING: SOLVING FORECASTING PROBLEMS
- Introduction - Applications
- Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
- Classification of Techniques (Pattern-based vs. Pattern-less)
- Basic Techniques - Averages, Smoothing, etc.
- Advanced Techniques - AR Models, ARIMA, etc
- Understanding Forecasting Accuracy - MAPE, MAD, MSE, etc
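The forecasting accuracy measures named above (MAD, MSE, MAPE) can be sketched in plain Python as follows. The function names and the `actual`/`forecast` values are illustrative choices, and the MAPE helper assumes no actual value is zero.

```python
def mad(actual, forecast):
    """Mean absolute deviation: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error: average of (actual - forecast) squared."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero.
    """
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)


# Hypothetical series, invented for this illustration.
actual = [100, 200, 400]
forecast = [110, 180, 400]

print(mad(actual, forecast))
print(mse(actual, forecast))
print(mape(actual, forecast))
```

MSE penalizes large errors more heavily than MAD because errors are squared, while MAPE expresses error relative to the size of each actual value.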
MACHINE LEARNING – PREDICTIVE MODELING – BASICS
- Introduction to Machine Learning & Predictive Modeling
- Types of Business problems - Mapping of Techniques - Regression vs. classification vs. segmentation vs. Forecasting
- Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning
- Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
- Overfitting (Bias-Variance Trade-off) & Performance Metrics
- Feature engineering & dimension reduction
- Concept of optimization & cost function
- Overview of gradient descent algorithm
- Overview of Cross-validation (Bootstrapping, K-Fold validation, etc.)
- Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
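Several of the classification metrics listed above follow directly from the four cells of a confusion matrix. The sketch below computes them in plain Python. The helper name `confusion_counts` and the label vectors are invented for illustration; in practice a library such as scikit-learn would normally be used.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical labels, invented for this illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)       # share of predicted positives that are correct
recall = tp / (tp + fn)          # also called sensitivity
specificity = tn / (tn + fp)     # share of actual negatives correctly identified
accuracy = (tp + tn) / len(y_true)

print(tp, fp, tn, fn)
print(precision, recall, specificity, accuracy)
```

Recall and sensitivity are two names for the same quantity, which is why both appear in the metrics list.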
UNSUPERVISED LEARNING: SEGMENTATION
- What is segmentation & Role of ML in Segmentation?
- Concept of Distance and related math background
- K-Means Clustering
- Expectation Maximization
- Hierarchical Clustering
- Spectral Clustering and Density-Based Clustering (DBSCAN)
- Principal Component Analysis (PCA)
SUPERVISED LEARNING: DECISION TREES
- Decision Trees - Introduction - Applications
- Types of Decision Tree Algorithms
- Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees
- Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
- Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
- Decision Trees - Validation
- Overfitting - Best Practices to avoid
SUPERVISED LEARNING: ENSEMBLE LEARNING
- Concept of Ensembling
- Manual Ensembling Vs. Automated Ensembling
- Methods of Ensembling (Stacking, Mixture of Experts)
- Bagging (Logic, Practical Applications)
- Random forest (Logic, Practical Applications)
- Boosting (Logic, Practical Applications)
- AdaBoost
- Gradient Boosting Machines (GBM)
- XGBoost
SUPERVISED LEARNING: ARTIFICIAL NEURAL NETWORKS (ANN)
- Motivation for Neural Networks and Its Applications
- Perceptron and Single Layer Neural Network, and Hand Calculations
- Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques
- Neural Networks for Regression
- Neural Networks for Classification
- Interpretation of Outputs and Fine-tuning the models with hyperparameters
- Validating ANN models
SUPERVISED LEARNING: SUPPORT VECTOR MACHINES
- Motivation for Support Vector Machine & Applications
- Support Vector Regression
- Support vector classifier (Linear & Non-Linear)
- Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
- Interpretation of Outputs and Fine-tuning the models with hyperparameters
- Validating SVM models
SUPERVISED LEARNING: KNN
- What is KNN? Applications
- KNN for missing value treatment
- KNN for solving regression problems
- KNN for solving classification problems
- Validating KNN model
- Model fine-tuning with hyperparameters
SUPERVISED LEARNING: NAÏVE BAYES
- Concept of Conditional Probability
- Bayes Theorem and Its Applications
- Naïve Bayes for classification
- Applications of Naïve Bayes in Classifications
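Bayes' theorem, listed above, can be illustrated with a small worked calculation for a binary hypothesis. The function name `posterior` and all probabilities below are hypothetical values chosen for this example (a toy spam scenario), not figures from the course.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Compute P(H | E) with Bayes' theorem for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam; the word "offer" appears
# in 60% of spam messages and in 5% of non-spam messages.
p_spam_given_offer = posterior(prior=0.2, likelihood=0.6, false_positive_rate=0.05)

print(p_spam_given_offer)
```

A Naïve Bayes classifier applies the same update across many words at once, under the simplifying assumption that the words are conditionally independent given the class.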
TEXT MINING & ANALYTICS
- Taming big text; Unstructured vs. Semi-structured Data; Fundamentals of information retrieval; Properties of words; Creating Term-Document (TxD) matrices; Similarity measures; Low-level processes (Sentence Splitting, Tokenization, Part-of-Speech Tagging, Stemming, Chunking)
- Finding patterns in text: text mining, text as a graph
- Natural Language processing (NLP)
- Text Analytics – Sentiment Analysis using Python
- Text Analytics – Word cloud analysis using Python
- Text Analytics - Segmentation using K-Means/Hierarchical Clustering
- Text Analytics - Classification (Spam/Not spam)
- Applications of Social Media Analytics
- Metrics (Measures, Actions) in social media analytics
- Examples & Actionable Insights using Social Media Analytics
- Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
- Fine-tuning the models using hyperparameters, grid search, pipelines, etc.
Practice Test and Interview Questions