Summary of the STAM elective course - ABM and ML concepts
1. Introduction to ABM:
---------------------------------------------------------------------------------------------------------------------------------
2. Designing an ABM:
---------------------------------------------------------------------------------------------------------------------------------
3. Model Validation ABM:
--------------------------------------------------------------------------------------------------------------------------------
4. ODD Protocol
These changes help entities improve their fitness, efficiency, and effectiveness in their respective environments.
If an agent’s adaptive traits or learning procedures are based on estimating the future consequences of decisions, how do agents predict the future conditions (environmental or internal) they will experience?
Agents predict future conditions using a combination of heuristic-based, model-based, and data-driven approaches. They may rely on simple heuristics, build internal models, analyze historical data, interact with other agents, use real-time sensing, or employ advanced machine learning techniques. These methods enable agents to estimate future environmental or internal conditions and adjust their behaviors to improve their chances of meeting their objectives.
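As a rough illustration of the data-driven option, the sketch below has an agent predict the next value of an environmental variable as the average of its last few observations, then act on that estimate. The class name `ForagingAgent`, the memory length, and the stay/move rule are all hypothetical and only serve the example; a real model could use any of the other approaches listed above.

```python
from collections import deque


class ForagingAgent:
    """Hypothetical agent that predicts food availability from recent history."""

    def __init__(self, memory=3):
        # Only the most recent `memory` observations are remembered.
        self.history = deque(maxlen=memory)

    def observe(self, food_level):
        self.history.append(food_level)

    def predict_next(self):
        """Moving-average prediction; None if nothing has been observed yet."""
        if not self.history:
            return None
        return sum(self.history) / len(self.history)

    def decide(self, threshold):
        """Stay if the predicted food level meets the threshold, otherwise move."""
        predicted = self.predict_next()
        if predicted is not None and predicted >= threshold:
            return "stay"
        return "move"


agent = ForagingAgent(memory=3)
for level in (4, 6, 8, 10):  # made-up observations
    agent.observe(level)

print(agent.predict_next(), agent.decide(threshold=5))
```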
Interactions in ABM:
Agent-Agent Interactions: various forms of communication, cooperation, competition, and adaptation. These involve sharing information, exchanging resources, competing for limited resources, coordinating actions, adapting behaviours based on others' actions, and influencing each other's decision-making.
Agent-Environment Interactions
Environment-Agent Interactions
Environment-Environment Interactions
How do Agents interact with other Agents?
Agents are at the same location
Agents are in the same cell, within a threshold distance, can see each other, or belong to the same neighbourhood or room
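The "same location" and "threshold distance" rules can be sketched as a simple neighbour query. This is a minimal example, assuming agents live on a 2D plane and distance is Euclidean; the function name `neighbours_within` and the positions are made up for illustration.

```python
import math


def neighbours_within(agents, focal_id, radius):
    """Return ids of the agents within `radius` of the focal agent (Euclidean distance)."""
    fx, fy = agents[focal_id]
    return [
        other_id
        for other_id, (x, y) in agents.items()
        if other_id != focal_id and math.hypot(x - fx, y - fy) <= radius
    ]


# Hypothetical agent positions, keyed by agent id.
agents = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0), "d": (0.0, 0.0)}

print(neighbours_within(agents, "a", radius=0.0))  # agents at the same location
print(neighbours_within(agents, "a", radius=2.0))  # agents within a threshold distance
```

A radius of 0 reduces the query to the "same location" rule, so one function covers both cases.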
---------------------------------------------------------------------------------------------------------------------------------
5. Integrating ABM and ML
Definition:
Characteristics:
Definition:
Characteristics:
Definition:
Characteristics:
Definition:
Characteristics:
Emergent behavior and pattern emergence are fundamental concepts in modeling complex systems:
=====================================================================
If the dataset has too many features?
When a dataset has too many features (i.e., more features than observations), we face the curse of dimensionality. In supervised learning, having more features than observations raises the risk of overfitting, so performance on unseen data drops. The observations also become harder to cluster: in high dimensions, the distances between observations become nearly equal, so meaningful clusters are hard to define.
The curse of dimensionality refers to the difficulties that arise when analyzing or modeling data with many dimensions. A central problem is data sparsity: data points become increasingly spread out, making it hard to find patterns or relationships.
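The claim that points look roughly equidistant in high dimensions can be checked with a small experiment. The sketch below draws uniform random points in a unit hypercube (an assumption made only for this illustration) and measures the relative contrast (max - min) / min of the distances from one point to all the others; the helper name `distance_contrast` is hypothetical.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Relative contrast (max - min) / min of distances from one point to the rest."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the number of dimensions grows.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(distance_contrast(200, n_dims), 3))
```

A contrast close to 0 means the nearest and farthest points are almost the same distance away, which is why distance-based clustering struggles.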
=====================================================================
PCA vs K-means clustering
PCA and k-means clustering are both unsupervised machine learning techniques used for data analysis, but they have different goals and methods. PCA is used to reduce the dimensionality of the data, while k-means clustering groups data points together based on similarity. The technique you select depends on the specific dataset and goals of your analysis.
PCA creates new variables, the principal components, that are linear combinations of the original variables. It takes a dataset with multiple variables as input and projects it onto a lower-dimensional subspace, producing a reduced dataset with fewer variables. It is often used in exploratory data analysis, and as a preprocessing step for dimensionality reduction before building predictive models.
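To make "linear combinations of the original variables" concrete, here is a minimal sketch of PCA restricted to two variables, where the eigenvalues of the covariance matrix have a closed form. It is an illustration only: the function name `pca_2d_first_component` and the data are made up, and it assumes the points are not all identical. Real analyses would normally use a library implementation.

```python
import math


def pca_2d_first_component(points):
    """Project 2D points onto their first principal component.

    Returns (unit direction, explained variance ratio, 1D scores).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest, smallest = half_trace + spread, half_trace - spread

    # Eigenvector belonging to the largest eigenvalue.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Each score is a linear combination of the two centred variables.
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    return direction, largest / (largest + smallest), scores


# Made-up data lying close to the line y = 2x.
data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.9)]
direction, explained, scores = pca_2d_first_component(data)
print(direction, round(explained, 4))
```

Here two variables are reduced to one score per observation, and the explained variance ratio indicates how little information is lost.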
K-means is a clustering algorithm that assigns data points to clusters based on their distance from the cluster centers. It takes a dataset with one or more variables as input, and it produces a set of clusters with similar data points. It is often used to cluster data for a variety of use cases, such as image segmentation, customer segmentation, and anomaly detection.
=====================================================================
The outcome variable is also called the response or dependent variable, and the risk factors and confounders are called the predictors, or explanatory or independent variables. In regression analysis, the dependent variable is denoted "Y" and the independent variables are denoted by "X".
=====================================================================
Machine Learning Concepts:
Components in artificial neural networks
ANNs comprise several components like the ones below.
===================================================================
Box Plots
Box plots (also known as box-and-whisker plots) provide three key benefits compared to other visualizations of data:
Box plots show the size of the center quartiles and the values of Q1, Q2, and Q3.
Box plots show the interquartile range (commonly called the IQR), a measure of the spread of the data. The IQR is the value of Q3 - Q1.
The IQR tells us the range of the middle 50% of the data. In other words, it tells us the width of the “box” on the box plot.
Box plots show outliers in the dataset. Outliers are data points that differ significantly from most of the other points in the dataset. In other words, they “lie outside” most of the data. They are plotted as single dots on a box plot. You can calculate outliers mathematically using these rules:
Outliers can be typos, lies, or real data! Outliers can have a strong effect on certain statistics (like the average) so it’s important that as a data scientist, you recognize outliers and decide if you want to include them in your analysis. Outliers should only be excluded from analysis for a good reason!
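The outlier rules are not listed in these notes. A common convention, assumed here, flags any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below applies it with the standard library; the function name `iqr_outliers` and the data are made up, and note that different quartile methods can give slightly different bounds.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumed convention)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


# Made-up sample with one suspiciously large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
outliers, bounds = iqr_outliers(sample)
print(outliers, bounds)
```

The function only flags candidates; whether to exclude them is still a judgment call, as noted above.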
===================================================================
Solving a K-means problem:
Given 2 random initial centres, run 2 iterations:
Compute the distance from every point to each of the 2 centres, fill in the table, and label each point with the cluster whose centre is nearest
Update the cluster centres: for all points within one cluster, compute the average x and y
These averages become the 2 new centres; recompute the distances and labels, then compute the new centres again.
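The procedure above can be sketched in a few lines. This is a minimal version under stated assumptions: Euclidean distance, a fixed number of iterations rather than a convergence check, and made-up points and starting centres. The function name `kmeans` is chosen for the example only.

```python
import math


def kmeans(points, centres, iterations=2):
    """Run a fixed number of k-means iterations from the given starting centres."""
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of the points assigned to it.
        new_centres = []
        for c, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centres.append(old)  # an empty cluster keeps its old centre
        centres = new_centres
    return centres, labels


# Made-up points and deliberately poor starting centres.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centres, labels = kmeans(points, centres=[(1, 1), (2, 1)], iterations=2)
print(labels)
print(centres)
```

With these starting centres the first iteration mislabels the point (2, 1), and the second iteration corrects it, which shows why more than one pass is needed.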
-------------------------------------------------------------------------------------------------------------------
A model with low variance and high bias will underfit the target, while a model with high variance and low bias will overfit it. A high-variance model may fit the training set closely, but in doing so it can overfit noisy or otherwise unrepresentative training data.
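A small experiment can make the contrast visible. The sketch below is an illustration under assumptions, not a formal bias-variance decomposition: the true relationship is taken to be y = 2x, the training noise is a fixed alternating +3/-3 pattern, a constant model stands in for "high bias", a 1-nearest-neighbour model for "high variance", and a least-squares line for a balanced model. All names and data are made up.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given inputs and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_f(x):
    return 2 * x  # assumed true relationship


# Noisy training data and noise-free test data at shifted inputs.
train_x = list(range(10))
noise = [3 if i % 2 == 0 else -3 for i in range(10)]
train_y = [true_f(x) + e for x, e in zip(train_x, noise)]
test_x = [x + 0.25 for x in range(10)]
test_y = [true_f(x) for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """High bias: ignores x and always predicts the training mean."""
    return mean_y


def nearest_neighbour_model(x):
    """High variance: memorises the training set, noise included."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


# Least-squares line fitted to the training data.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def linear_model(x):
    return intercept + slope * x


for name, model in [
    ("constant", constant_model),
    ("1-NN", nearest_neighbour_model),
    ("linear", linear_model),
]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The constant model has a large error on both sets (underfitting), the 1-nearest-neighbour model has zero training error but a clearly larger test error (overfitting), and the line has the lowest test error of the three.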
====================================================================
Definition:
Key Points:
Definition:
Key Points: