Notes Exam

Introduction

S1: ODD Guidance and Checklists

Geospatial Analysis

K-means

Models and Modelling

An Introduction to Machine Learning

An agent in this model: is identifiable (discrete), is situated in an environment with which it
interacts, is goal-directed, is autonomous and self-directed (can function independently), can
perceive its environment, and has behaviour (motion)

In Agent-Based Modeling (ABM), the environment can be transformed from stochastic (random or probabilistic) to deterministic (predictable and without randomness) by adjusting how agent interactions and environmental factors are governed. Transforming an ABM from stochastic to deterministic involves eliminating or controlling random elements in agent behavior and the environment. By eliminating randomness and using a static environment with fixed attributes, predefined initialization, fixed rules, predefined conditions, and structured interactions, the model's behavior becomes predictable and reproducible, making it easier to analyze and understand the dynamics of the system.

  • Stochastic: Agents decide actions based on probabilities.
  • Deterministic: Agents decide actions based on specific conditions and thresholds.
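As a minimal sketch of the contrast above (the agent attribute, threshold, and probability values are invented for illustration), a stochastic agent draws from a probability while a deterministic agent applies a fixed condition:

```python
import random

def stochastic_step(energy, move_prob=0.7, rng=random.Random(42)):
    # Stochastic: the agent moves with probability move_prob.
    return "move" if rng.random() < move_prob else "stay"

def deterministic_step(energy, threshold=5):
    # Deterministic: the same input always yields the same action.
    return "move" if energy > threshold else "stay"

# The deterministic rule is reproducible by construction; the stochastic
# rule only becomes reproducible when the random seed is fixed.
print(deterministic_step(7))  # always "move"
print(stochastic_step(7))     # "move" or "stay", depending on the draw
```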


Explanation

  • Statistical analysis like mean, median, and standard deviation provides a comprehensive understanding of the economic and demographic characteristics of potential business locations. These insights are essential for making informed decisions, minimizing risks, and optimizing resource allocation, ultimately contributing to the success of the business.
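A small sketch of why mean, median, and standard deviation are all worth reporting (the income figures below are hypothetical):

```python
import statistics

# Hypothetical median household incomes (in thousands) for candidate
# business locations; the last value is a deliberate outlier.
incomes = [42, 55, 48, 61, 53, 47, 120]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
stdev = statistics.stdev(incomes)

# The outlier pulls the mean above the median, so reporting both
# (plus the spread) gives a fuller picture of a location.
print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}")
```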
  • Eigenvalues: Indicate the amount of variance carried by each principal component, define magnitude.
  • Eigenvectors: Define the direction of the principal components in the feature space.
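The eigenvalue/eigenvector roles can be seen directly by decomposing a covariance matrix with NumPy; in this sketch the synthetic data is nearly one-dimensional, so one eigenvalue carries almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the second feature is roughly twice the first.
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order

# Eigenvalues = variance (magnitude) along each principal component;
# eigenvectors (columns) = the directions of those components.
explained = eigenvalues / eigenvalues.sum()
print(explained)  # the last component dominates for this data
```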
  • A perceptron is a type of artificial neural network used for binary classification tasks.
  • Linearly Separable Data: The basic perceptron can only classify linearly separable data. If the data is not linearly separable, the perceptron will fail to find a suitable decision boundary.
  • Single Layer: The perceptron is a single-layer neural network, which limits its ability to capture complex patterns in the data. This limitation can be overcome by using multi-layer perceptrons (MLPs) or more advanced neural network architectures.
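A minimal perceptron training loop, sketched on a toy linearly separable dataset (the learning rate, epoch count, and data are illustrative only); on non-separable data this loop would never converge, as noted above:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # y must be in {-1, +1}; convergence requires linearly separable data.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified point
                w += lr * yi * xi       # nudge the boundary toward it
                b += lr * yi
    return w, b

# Toy separable data: class depends only on the first feature.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
print(preds)  # matches y once training converges
```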
  • Types of ANNs

  • Feedforward Neural Networks (FNN):

    • The simplest type where connections do not form cycles.
    • Information moves in one direction from input to output.
  • Convolutional Neural Networks (CNN):

    • Specialized for processing grid-like data such as images.
    • Use convolutional layers to detect local patterns.
  • Recurrent Neural Networks (RNN):

    • Designed for sequence data like time series or natural language.
    • Have connections that form cycles, allowing them to maintain memory of previous inputs.
  • Multilayer Perceptrons (MLP):

    • A type of FNN with multiple hidden layers.
    • Each neuron in one layer is connected to every neuron in the next layer.
  • Pruning:

    • The process of removing unnecessary nodes from the tree to prevent overfitting and improve generalization.
    • Can be done pre-pruning (during tree construction) or post-pruning (after tree construction).
  • Voting (Classification) / Averaging (Regression) - Random Forest

  • For classification, each tree votes for a class, and the majority class is chosen as the final prediction.
  • For regression, the average of all tree predictions is taken as the final prediction.
  • In the context of the CART (Classification and Regression Trees) algorithm, pruning is a technique used to reduce the size of a decision tree by removing sections of the tree that may provide little to no additional power in predicting target variables. The goal of pruning is to enhance the model's ability to generalize to new data and prevent overfitting.
  • Abstract simulations are highly theoretical and often focus on exploring fundamental principles and mechanisms of complex systems. These simulations are not tied to real-world data but rather aim to understand the underlying dynamics of systems. Example: Schelling's segregation model, which explores how individual preferences can lead to the emergence of segregated neighborhoods even when those preferences are not strong
  • Experimental simulations are used to test hypotheses and explore the outcomes of different scenarios in a controlled virtual environment. They often involve systematic manipulation of variables to observe the effects.  Example: An ABM studying the impact of different vaccination strategies on the spread of a disease within a population.
  • Historical simulations aim to recreate and analyze past events or processes. These simulations use historical data to validate the model and to understand the dynamics that led to particular outcomes. Example: An ABM reconstructing the migration patterns of ancient human populations based on archaeological and genetic data.

  • Empirical simulations are grounded in real-world data and observations. These simulations strive for high fidelity and predictive accuracy, making them valuable for policy-making and practical applications. Example: An ABM used to simulate traffic flow in a city, utilizing actual traffic data to improve urban planning and reduce congestion.
  • In decision trees, bias represents the simplification error introduced by the model. High bias leads to underfitting, while low bias (typically accompanied by high variance) can lead to overfitting. Balancing bias and variance is crucial to building a robust model, often achieved through techniques like pruning and ensemble methods. Understanding and managing bias helps improve model performance and generalization to new, unseen data.
  • In classification, the goal is to split the data into subsets that are as homogeneous as possible with respect to the target variable (class labels). To achieve this, we use impurity measures like Gini impurity or entropy. 

    Calculation for splits:

    • For each feature, consider each possible split point.
    • Calculate the Gini impurity for each child node resulting from the split.
    • Calculate the weighted average of the Gini impurities of the child nodes.
    • Choose the split that results in the lowest weighted average Gini impurity.
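The Gini steps above can be sketched directly (the label lists are toy examples):

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(left, right):
    # Weighted average of the child-node impurities, as in the steps above.
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A pure split has weighted impurity 0; a fully mixed one is higher.
print(weighted_gini(["a", "a"], ["b", "b"]))  # 0.0
print(weighted_gini(["a", "b"], ["a", "b"]))  # 0.5
```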
  • Calculation for splits:

    • For each feature, consider each possible split point.
    • Calculate the entropy for each child node resulting from the split.
    • Calculate the weighted average of the entropies of the child nodes.
    • Choose the split that results in the lowest weighted average entropy.
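The entropy-based variant follows the same pattern; a toy sketch:

```python
import math

def entropy(labels):
    # Shannon entropy over the class proportions, in bits.
    n = len(labels)
    props = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in props if p > 0)

def weighted_entropy(left, right):
    # Weighted average of the child-node entropies.
    n = len(left) + len(right)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

print(entropy(["a", "b"]))                       # 1.0 (maximally mixed)
print(weighted_entropy(["a", "a"], ["b", "b"]))  # 0.0 (pure split)
```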
  • In regression, the goal is to split the data into subsets such that the target variable values are as close as possible to each other within each subset. This is often done by minimizing the mean squared error (MSE) or mean absolute error (MAE).
  • Calculation for splits:

    • For each feature, consider each possible split point.
    • Calculate the MSE for each child node resulting from the split.
    • Calculate the weighted average of the MSEs of the child nodes.
    • Choose the split that results in the lowest weighted average MSE.
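The regression split criterion can be sketched the same way, using MSE around each subset's mean (the values below are toy data):

```python
def mse(values):
    # Mean squared error around the subset mean (the leaf's prediction).
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def weighted_mse(left, right):
    # Weighted average of the child-node MSEs.
    n = len(left) + len(right)
    return len(left) / n * mse(left) + len(right) / n * mse(right)

# Splitting [1, 2, 9, 10] between 2 and 9 separates the two clusters
# and yields a far lower weighted MSE than the unsplit node.
print(mse([1, 2, 9, 10]))             # 16.25
print(weighted_mse([1, 2], [9, 10]))  # 0.25
```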

Examples

How to