Neural networks are function approximators inspired (loosely) by the brain.
They take an input vector \(x\), apply a sequence of linear transformations and nonlinear activations, and output a prediction \(\hat{y}\).
At a high level:
\[ x \rightarrow \text{(Linear)} \rightarrow \text{(Nonlinearity)} \rightarrow \dots \rightarrow \text{Output} \]
By stacking many such layers, networks can model very complex, highly nonlinear relationships in data.
The basic building block is a neuron (or perceptron):
\[ z = w^\top x + b,\quad a = \sigma(z) \]
where:

- \(w\) is the weight vector,
- \(b\) is the bias,
- \(\sigma\) is a nonlinear activation function.

Common activations:

- ReLU: \(\sigma(z) = \max(0, z)\)
- Sigmoid: \(\sigma(z) = 1 / (1 + e^{-z})\)
- Tanh: \(\sigma(z) = \tanh(z)\)
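A single neuron is small enough to write out directly. This minimal NumPy sketch computes \(z = w^\top x + b\) and passes it through a sigmoid (the input, weights, and bias values are made up for illustration):

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Logistic sigmoid: squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# A single neuron: z = w^T x + b, then a = sigma(z)
x = np.array([1.0, -2.0, 0.5])   # input vector
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.2                          # bias

z = w @ x + b                    # pre-activation
a = sigmoid(z)                   # activation
print(z, a)
```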
A simple fully connected network has:

- an input layer (the raw feature vector),
- one or more hidden layers,
- an output layer that produces the prediction.

Example with one hidden layer:
\[ h = \sigma(W_1 x + b_1),\quad \hat{y} = f(W_2 h + b_2) \]
where \(f\) is usually:

- the identity for regression,
- a sigmoid for binary classification,
- a softmax for multi-class classification.
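The one-hidden-layer forward pass above is a few lines of NumPy. In this sketch the sizes (4 inputs, 3 hidden units, 1 output) and the ReLU hidden activation are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes: x in R^4, hidden layer of 3 units, scalar output
W1 = rng.normal(size=(3, 4)); b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3)); b2 = np.zeros(1)

def sigma(z):
    # ReLU hidden activation
    return np.maximum(0.0, z)

x = rng.normal(size=4)
h = sigma(W1 @ x + b1)      # hidden representation, shape (3,)
y_hat = W2 @ h + b2         # identity output f (regression), shape (1,)
print(h.shape, y_hat.shape)
```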
Training a neural network means finding weights \(\{W, b\}\) that minimize a loss function on the training data.
Typical losses:

- Mean squared error (MSE) for regression,
- Cross-entropy for classification.
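Both losses are short enough to write out by hand. A small NumPy sketch (the labels and predictions below are made-up example values):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error, for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-12):
    # Cross-entropy for binary classification; p = predicted P(y = 1).
    # Clipping avoids log(0) for extreme predictions.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([0.0, 1.0, 1.0])
p = np.array([0.1, 0.8, 0.7])
print(mse(y, p), binary_cross_entropy(y, p))
```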
The key algorithm is backpropagation:

1. Forward pass: compute the predictions and the loss.
2. Backward pass: apply the chain rule, layer by layer, to compute the gradient of the loss with respect to every weight and bias.
3. Update: adjust the parameters with gradient descent (or a variant such as Adam).

This process repeats for many epochs until the loss stops improving (or early stopping kicks in).
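As a concrete illustration, here is a minimal one-hidden-layer regression network trained with hand-derived backpropagation gradients and plain full-batch gradient descent. The data, network size, learning rate, and epoch count are all made up for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3x - 1 plus a little noise
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] - 1.0 + 0.1 * rng.normal(size=200)

# One ReLU hidden layer of 8 units, scalar output, MSE loss
W1 = rng.normal(size=(1, 8)) * 0.5; b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.05

for epoch in range(2000):
    # Forward pass
    Z1 = X @ W1 + b1               # (200, 8) pre-activations
    H = np.maximum(0.0, Z1)        # ReLU
    y_hat = (H @ W2 + b2)[:, 0]    # (200,) predictions
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass (chain rule, layer by layer)
    d_yhat = 2.0 * (y_hat - y)[:, None] / len(y)  # dL/dy_hat
    dW2 = H.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    dH = d_yhat @ W2.T
    dZ1 = dH * (Z1 > 0)            # ReLU derivative is 0/1
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```

The backward pass is just the chain rule applied in reverse layer order, which is exactly what frameworks like TensorFlow automate.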
Neural networks can easily overfit, especially with many parameters. Common regularization techniques:

- L2 weight decay (penalizing large weights),
- Dropout (randomly zeroing activations during training),
- Early stopping (halting when validation loss stops improving),
- Data augmentation, where the input domain allows it.
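To sketch the first of these: L2 weight decay adds \(\lambda \lVert W \rVert^2\) to the loss, contributing \(2\lambda W\) to the gradient, which shrinks every weight slightly at each step. The toy loop below isolates that effect by pretending the data gradient is zero (the matrix, \(\lambda\), and learning rate are illustrative):

```python
import numpy as np

lam = 1e-3   # regularization strength (hyperparameter)
lr = 0.1     # learning rate

W = np.array([[2.0, -1.5], [0.5, 3.0]])
data_grad = np.zeros_like(W)  # pretend the data gradient is zero here

for _ in range(100):
    grad = data_grad + 2.0 * lam * W  # gradient of lam * ||W||^2
    W -= lr * grad                    # each step scales W by (1 - 2*lr*lam)

print(np.abs(W).max())  # weights have decayed slightly toward zero
```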
Neural networks are especially powerful when:

- the dataset is large,
- the input is high-dimensional or unstructured (images, audio, text),
- the relationship between features and target is highly nonlinear.
For small tabular datasets, simpler models (tree ensembles, linear models) are often competitive or better.
```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data: regression example
X = np.random.randn(1000, 10)
y = X[:, 0] * 2.0 + X[:, 1] * -3.0 + 0.5 * np.random.randn(1000)

model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(10,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),  # regression output
])

model.compile(
    optimizer="adam",
    loss="mse",
    metrics=["mae"],
)

model.summary()

model.fit(
    X, y,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1,
)
```
This builds a small feedforward neural network for a toy regression task, trains it, and reports loss/MAE on a validation split.