Post on 22-Mar-2020
Ricardo de Castro
Technische Universität München
Lehrstuhl für Elektrische Antriebssysteme
Neural Networks: Introduction & Matlab Examples
Agenda
• Introduction & Motivation
• Single-Layer Neural Network
• Fundamentals: neuron, activation function and layer
• Matlab example: constructing & evaluating NN
• Learning algorithms
• Batch solution: least-squares
• Online solution: LMS
• Matlab example: online system identification with NN
• Multi-Layer Neural Network
• Network architecture
• Learning algorithm: backpropagation
• Matlab example: nonlinear fitting with noise
• Overfitting & regularization
• Case Study
• Matlab example: MPC solution via Neural Networks
References
[1] Hagan et al., Neural Network Design, 2nd edition, 2014. Online version: https://hagan.okstate.edu/nnd.html
[2] Abu-Mostafa et al., Learning from Data, a Short Course, 2012.
[3] MathWorks, Neural Network Toolbox User's Guide (2017) (aka Deep Learning Toolbox), Chapters 2, 3, 10 and 11.
Some Problems…
Computer vision
Medical diagnosis
• Inputs: symptoms (fever, headache, blood pressure, …), biochemical analysis, age, height, …
• Output: diagnosis
Finance (e.g. prediction)
• Inputs: stockPrice[k], stockPrice[k-1], …, stockPrice[k-N]
• Output: stockPrice[k+1]
Control (e.g., prediction / system identification)
• Inputs: u[k], u[k-1], …, u[k-N], y[k], y[k-1], …, y[k-M]
• Output: y[k+1]
• u = control input, y = output, k = time index
How to build a system that can learn these tasks?
Motivation Example: Credit Approval
Adapted from “Learning from Data, a Short Course”
[Figure: application information → credit approved / credit rejected]
Formalism:
• Input: $p = (\text{age}, \dots) \in \mathbb{R}^R$
• Output: $y \in \{0, 1\}$
• Ideal function: $y = h(p)$
• Data: $(p_1, y_1), (p_2, y_2), \dots, (p_N, y_N)$
• Candidate model: $\hat{y} = g(p)$
Learning Framework: Credit Approval
Adapted from “Learning from Data, a Short Course”
• Input $p$: customer application
• Output $y$: 0 = credit rejected, 1 = credit approved
• Ideal function $h$: ideal credit approval function
• Data: historical records
• Candidate model $g$: formula to be learned from data, e.g. a neural network
Remarks:
• The ideal (credit approval) function is unknown,
• … but banks have massive amounts of data (customer info, default history, etc.)
Learning Framework (II)
Adapted from “Learning from Data, a Short Course”
• Unknown function: $y = h(p)$, with input $p$ and output $y$
• Data set: $D = \{(p_1, y_1), \dots, (p_N, y_N)\}$, where $(p_i, y_i)$ = data point
• Candidate model (e.g. neural net): $\hat{y} = g(p, x)$, with model output $\hat{y}$ and parameters $x$, such that $g(p) \approx y$
• Learning algorithm: adapt $x$ s.t. $y \approx \hat{y}$
Main goal: learn $g$ (i.e. $g \approx h$) using data $D$
Classification vs Regression
Classification problems
• Output: discrete categories, e.g. $y \in \{0, 1, \dots, 9\}$
Regression problems
• Output: continuous variable, e.g. $y \in [0, \infty)$ (a stock price predicted from stockPrice(k), stockPrice(k-1), …)
Neuron (Single Input)
$a = f(wp + b)$
Input
• $p \in \mathbb{R}$
Output
• $a \in \mathbb{R}$
Parameters
• weight: $w \in \mathbb{R}$
• bias: $b \in \mathbb{R}$
Net input
• $n = wp + b$, $n \in \mathbb{R}$
Activation/transfer function
• $f: \mathbb{R} \to \mathbb{R}$
Neuron: Activation Functions
• Linear (Matlab: purelin): $f(n) = n$
• Hard limit (Matlab: hardlim): $f(n) = 0$ if $n < 0$, $f(n) = 1$ if $n \ge 0$
• Log-sigmoid (Matlab: logsig): $f(n) = \dfrac{1}{1 + e^{-n}}$
• Hyperbolic tangent sigmoid (Matlab: tansig): $f(n) = \dfrac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$
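The four activation functions can also be written without the toolbox. The following is a minimal base-Matlab sketch; the handle names (purelin_f, hardlim_f, logsig_f, tansig_f) and the sample net inputs are hypothetical and chosen only for illustration.

```matlab
% Base-Matlab equivalents of the four activation functions (no toolbox needed).
% Handle names are illustrative; they are not toolbox functions.
purelin_f = @(n) n;                                          % linear
hardlim_f = @(n) double(n >= 0);                             % hard limit: 0 if n < 0, 1 if n >= 0
logsig_f  = @(n) 1 ./ (1 + exp(-n));                         % log-sigmoid
tansig_f  = @(n) (exp(n) - exp(-n)) ./ (exp(n) + exp(-n));   % hyperbolic tangent sigmoid

n = [-2 -0.5 0 0.5 2];                                       % sample net inputs (arbitrary)
A = [purelin_f(n); hardlim_f(n); logsig_f(n); tansig_f(n)];  % one row per activation function
disp(A);
```

Each handle works element-wise, so it accepts scalars, vectors or matrices of net inputs.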
Neuron: Multiple Inputs
Number of inputs: $R$
Element-wise representation
• Inputs: $p_1, p_2, \dots, p_R$
• Weights: $w_{1,1}, w_{1,2}, \dots, w_{1,R}$
• Bias: $b$
• Net input: $n = w_{1,1} p_1 + w_{1,2} p_2 + \dots + w_{1,R} p_R + b$
• Output: $a = f(n)$
Vector representation
• Inputs: $p = [p_1 \;\; p_2 \;\; \dots \;\; p_R]^T$
• Weight vector: $W = [w_{1,1} \;\; w_{1,2} \;\; \dots \;\; w_{1,R}]$
• Bias: $b$
• Net input: $n = Wp + b$
• Output: $a = f(Wp + b)$
Single-Layer Neural Network (NN)
Element-wise representation
Number of inputs = $R$, number of outputs = $S$
$a_1 = f(w_{1,1} p_1 + w_{1,2} p_2 + \dots + w_{1,R} p_R + b_1)$
$a_2 = f(w_{2,1} p_1 + w_{2,2} p_2 + \dots + w_{2,R} p_R + b_2)$
…
$a_S = f(w_{S,1} p_1 + w_{S,2} p_2 + \dots + w_{S,R} p_R + b_S)$
Notation: $w_{i,j}$ = weight connecting neuron $i$ with input $j$
Remark: each of the $R$ inputs is connected to each neuron through a weight $w_{i,j}$
Single-Layer Neural Network (NN)
Vector representation
$R$ = number of inputs, $S$ = number of outputs
$a = f(Wp + b)$, with $W \in \mathbb{R}^{S \times R}$ and $b, a \in \mathbb{R}^{S}$
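As an illustration of the vector representation, the sketch below evaluates a single layer for several inputs at once in base Matlab. The sizes (R = 2, S = 3), the weights, the bias and the choice of a log-sigmoid activation are all hypothetical values picked for the example.

```matlab
% Single-layer NN in vector form: a = f(W*p + b), evaluated for N inputs at once.
% All numbers below are made up for illustration.
W = [ 1.0 -2.0;
      0.5  0.5;
     -1.0  3.0];                       % S-by-R weight matrix (S = 3, R = 2)
b = [0.5; -1.0; 0.0];                  % S-by-1 bias vector
f = @(n) 1 ./ (1 + exp(-n));           % log-sigmoid, applied element-wise

P = [1 0 -1;
     2 1  0];                          % R-by-N matrix, one input vector per column
N = size(P, 2);
A = f(W * P + repmat(b, 1, N));        % S-by-N matrix, one output vector per column
disp(A);
```

Storing one data point per column matches the [R-by-N] and [S-by-N] conventions used by the toolbox functions in this lecture.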
Create and Evaluate Single-Layer NN (I)
Matlab Code
P = [1; 2]; % data set - input (R=2)
Y = 1;      % data set - output (S=1)
net = linearlayer;
% Creates a single layer of linear neurons
% OUTPUT: net = neural network object
net = configure(net, P, Y);
% Configures network dimensions (inputs, outputs, weight,
% bias, etc) to match the size of data set (P,Y)
% INPUT
% net = neural network object
% P = [R-by-N] matrix with data set (input)
% Y = [S-by-N] matrix with data set (output)
% OUTPUT
% net = configured neural network object
Create and Evaluate Single-Layer NN (II)
Matlab Code
% net = neural network object
% define activation function
net.layers{1}.transferFcn = 'hardlim';
% define W, b
net.IW{1,1} = W; % set weights W
net.b{1} = b;    % set bias b
% evaluate NN for input(s) P
A = sim(net, P)
% Simulates neural network
% INPUT
% net = neural network object
% P = [R-by-N] input data
% OUTPUT
% A = [S-by-N] neural network output
Matlab Example
Consider a two-input network with one neuron and the following parameters:
$W = [3 \;\; 2]$, $b = 1.2$, $p = [-1 \;\; 1]^T$
Write a Matlab script that computes the output of the network for the following activation functions:
a) Hard limit
b) Linear
c) Log-Sigmoid
Answers:
a) 1.000
b) 0.200
c) 0.550 (for comparison, the hyperbolic tangent sigmoid gives 0.197)
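The exercise can be checked by hand in base Matlab. This sketch assumes the parameters W = [3 2], b = 1.2 and p = [-1; 1], so the net input is n = 0.2; it evaluates the activations directly rather than through a network object, and adds the hyperbolic tangent sigmoid for comparison.

```matlab
% One neuron, two inputs: a = f(W*p + b) for several activation functions.
% Parameter values are assumed to be W = [3 2], b = 1.2, p = [-1; 1].
W = [3 2];
b = 1.2;
p = [-1; 1];

n = W * p + b;                  % net input: -3 + 2 + 1.2 = 0.2
a_hardlim = double(n >= 0);     % hard limit   -> 1
a_purelin = n;                  % linear       -> 0.2
a_logsig  = 1 / (1 + exp(-n));  % log-sigmoid  -> approx. 0.550
a_tansig  = tanh(n);            % tanh sigmoid -> approx. 0.197

fprintf('hardlim: %.3f\npurelin: %.3f\nlogsig:  %.3f\ntansig:  %.3f\n', ...
        a_hardlim, a_purelin, a_logsig, a_tansig);
```

With the toolbox, the same values should be obtainable by setting net.layers{1}.transferFcn to 'hardlim', 'purelin', 'logsig' or 'tansig' and calling sim(net, p).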
Learning Algorithm: Problem Setup
Data set: $D = \{(p_1, y_1), \dots, (p_N, y_N)\}$
• vector input: $p_i \in \mathbb{R}^R$
• scalar output: $y_i \in \mathbb{R}$
Single-layer linear NN: $\hat{y} = Wp + b$
• parameters: $x = [W \;\; b]^T$
Error function:
$E(x) = \dfrac{1}{2N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x)\right)^2$, with $\hat{y}_i(x) = W p_i + b$
Problem:
$\min_x E(x)$
$x^* = [W^* \;\; b^*]^T$ = optimal NN parameters
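Learning Algorithm: Batch Solution
Solution 1: least-squares solution
$\min_x E(x) = \min_x \dfrac{1}{2N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x)\right)^2$
A. Vector representation
$\hat{y}_i(x) = [W \;\; b] \begin{bmatrix} p_i \\ 1 \end{bmatrix} = x^T \begin{bmatrix} p_i \\ 1 \end{bmatrix}$
$\min_x E(x) = \min_x \dfrac{1}{2N} (Y - Ax)^T (Y - Ax)$, where
$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$, $A = \begin{bmatrix} p_1^T & 1 \\ p_2^T & 1 \\ \vdots & \vdots \\ p_N^T & 1 \end{bmatrix}$
B. First-order optimality condition
$\nabla E(x) = 0 \;\Rightarrow\; x^* = (A^T A)^{-1} A^T Y$
Problem: solution might be costly to compute for large data sets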
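The batch solution can be sketched in a few lines of base Matlab. The data set below is synthetic and noise-free (the names W_true and b_true and all numbers are hypothetical), so the least-squares solution should recover the generating parameters.

```matlab
% Batch least-squares fit of a single-layer linear NN: yhat = W*p + b.
% Synthetic, noise-free data; W_true and b_true are illustrative values.
P = [0 1 0 1 2 -1;
     0 0 1 1 1  2];                  % R-by-N inputs (R = 2, N = 6)
W_true = [2 -3];
b_true = 0.5;
Y = (W_true * P + b_true)';          % N-by-1 vector of outputs y_i

N = size(P, 2);
A = [P' ones(N, 1)];                 % N-by-(R+1) matrix with rows [p_i' 1]
x_star = (A' * A) \ (A' * Y);        % solves the normal equations (A'A) x = A'Y

W = x_star(1:end-1)';                % estimated weights
b = x_star(end);                     % estimated bias
fprintf('W = [%g %g], b = %g\n', W(1), W(2), b);
```

In practice x_star = A \ Y gives the same least-squares solution and is usually better conditioned than forming A'*A explicitly.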
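Learning Algorithm: On-line Solution
Solution 2: aka least-mean squares (LMS) algorithm
$\min_x E(x) = \min_x \dfrac{1}{2N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x)\right)^2$
A. Split error function
$\min_x E(x) = \min_x \dfrac{1}{N} \sum_{i=1}^{N} E_i$, with $E_i = \dfrac{1}{2} \left(y_i - \hat{y}_i(x)\right)^2$ and $\hat{y}_i(x) = [W \;\; b] \begin{bmatrix} p_i \\ 1 \end{bmatrix} = x^T \begin{bmatrix} p_i \\ 1 \end{bmatrix}$
B. Sequential gradient descent
$x^{(k+1)} = x^{(k)} - \alpha \nabla E_i$
• $\alpha$ = learning rate
• $k$ = iteration number
• $x^{(k)}$ = NN parameters
C. Compute gradient
$\nabla E_i(x) = -\left(y_i - x^T \begin{bmatrix} p_i \\ 1 \end{bmatrix}\right) \begin{bmatrix} p_i \\ 1 \end{bmatrix}$
$x^{(k+1)} = x^{(k)} + \alpha \left(y_i - [p_i^T \;\; 1]\, x^{(k)}\right) \begin{bmatrix} p_i \\ 1 \end{bmatrix}$
Remarks:
• data points are processed one at a time (useful for real-time applications)
• learning rate needs to be chosen with care to ensure algorithm convergence [Hagan, Chap. 10]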
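A manual implementation of the LMS update might look as follows. This is a sketch in base Matlab on a synthetic, noise-free data set; the data, the learning rate alpha = 0.05 and the number of passes over the data are hypothetical choices, not values from the lecture.

```matlab
% LMS (Widrow-Hoff) learning for a single-layer linear NN, one data point at a time.
% Data, learning rate and number of passes are illustrative assumptions.
P = [0 1 0 1 2 -1;
     0 0 1 1 1  2];                  % R-by-N inputs
W_true = [2 -3];
b_true = 0.5;
Y = W_true * P + b_true;             % 1-by-N outputs (noise-free)

alpha = 0.05;                        % learning rate
x = zeros(3, 1);                     % parameters x = [W'; b], initialised at zero
N = size(P, 2);
for pass = 1:500                     % repeated passes over the data set
    for i = 1:N
        z = [P(:, i); 1];            % augmented input [p_i; 1]
        e = Y(i) - z' * x;           % prediction error y_i - yhat_i
        x = x + alpha * e * z;       % LMS update: x <- x + alpha * e * [p_i; 1]
    end
end
W = x(1:2)';
b = x(3);
fprintf('W = [%.4f %.4f], b = %.4f\n', W(1), W(2), b);
```

Because the data here are noise-free and exactly linear, the iterates approach the generating parameters; with noisy data the parameters keep fluctuating around the least-squares solution, more strongly for larger alpha.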
On-line Learning Algorithm: Matlab API
Matlab Code
% net = neural network object
net.IW{1,1} = …; % set initial weights
net.b{1} = …;    % set initial bias
% define learning algorithm (Widrow-Hoff weight/bias learning = LMS)
net.inputWeights{1,1}.learnFcn = 'learnwh';
net.biases{1}.learnFcn = 'learnwh';
% define learning rate (alpha)
net.inputWeights{1,1}.learnParam.lr = alpha;
net.biases{1}.learnParam.lr = alpha;
net.trainFcn = 'trains'; % set sequential/online training
[net] = adapt(net,p,y);  % apply one step of the LMS algorithm
% INPUT
% net = neural network object
% p = [R-by-1] data point - input
% y = [S-by-1] data point - output
% OUTPUT
% net = updated neural network object (with new weights and bias)
Remark: simplified API for function adapt()
Online System Identification via NN (I)
[Block diagram: the control input drives both the plant and the neural network; the difference between the plant output and the NN output feeds the learning algorithm, which adapts $W, b$]
Online System Identification via NN (II)
Consider the following discrete-time model of a plant
$y[k] = u[k] + 1.8\,u[k-1] + 0.9\,u[k-2]$
Goal: design and train a single-layer NN capable of approximating the output of this plant
Proposed structure for the NN
• Single layer
• Inputs: $p = [u[k] \;\; u[k-1] \;\; u[k-2]]^T$
• Activation function: linear
Write a Matlab script that:
a) generates a data set for the problem
- assume: $u[k] = \sin(2\pi f_0 k T)$, $f_0 = 1$ Hz, $T = 0.02$ s, $k = 1, 2, 3, \dots, 150$
b) plots the data set
c) creates the NN and defines initial weights and bias: $W = [0 \;\; -10 \;\; 0]$, $b = -2$
d) adapts the weights and bias of the NN in order to approximate the plant's output $y[k]$
- option 1: via adapt(.); option 2: manual implementation of the LMS algorithm
- learning rate $\alpha = 0.2$
e) plots i) plant's output and NN's output; ii) difference between plant's output and NN's output
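Option 2 (manual LMS) could be sketched as below in base Matlab. The sketch assumes the plant y[k] = u[k] + 1.8 u[k-1] + 0.9 u[k-2], a sinusoidal input with f0 = 1 Hz and T = 0.02 s, initial parameters W = [0 -10 0], b = -2 and learning rate 0.2. It further assumes that the sine formula also defines u[k] for k <= 0, which the task does not specify.

```matlab
% Online system identification with a single-layer linear NN and a manual LMS update.
% Assumption: u[k] for k <= 0 is taken from the same sine formula.
f0 = 1;  T = 0.02;  K = 150;  alpha = 0.2;
u = @(k) sin(2*pi*f0*k*T);                    % control input

x = [0; -10; 0; -2];                          % initial parameters [W'; b]
y   = zeros(1, K);                            % plant output
yNN = zeros(1, K);                            % NN prediction (before each update)
for k = 1:K
    z = [u(k); u(k-1); u(k-2); 1];            % NN input p, augmented with 1 for the bias
    y(k)   = u(k) + 1.8*u(k-1) + 0.9*u(k-2);  % plant
    yNN(k) = x' * z;                          % NN output with current parameters
    x = x + alpha * (y(k) - yNN(k)) * z;      % one LMS step
end
err = y - yNN;                                % difference between plant and NN output

subplot(2,1,1); plot(1:K, y, 1:K, yNN); legend('y (true)', 'y (NN prediction)');
subplot(2,1,2); plot(1:K, err); xlabel('index'); ylabel('error = y - y(NN)');
```

The three inputs come from a single sinusoid, so they are linearly dependent and the fitted weights are not unique: a small prediction error does not imply that W matches the plant coefficients.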
Online System Identification via NN (III)
Results:
a) Training data
[Figure: u[k], u[k-1], u[k-2] and y[k] versus sample index]
b) Fitting results
[Figure: y (true) and y (NN prediction) versus sample index; error = y - y(NN) versus sample index]
Final weights/bias: $W = [6.23 \;\; -5.06 \;\; 3.58]$, $b = 0.0883$
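Multi-Layer Neural Network (I)
Example: 2-layer NN, element-wise representation
$R$ = number of inputs
Layer 1
• Number of neurons: $S^1$
• Outputs: $a^1_1, a^1_2, \dots, a^1_{S^1}$
• Weights: $w^1_{i,j}$, $i = 1, \dots, S^1$; $j = 1, \dots, R$
• Bias: $b^1_i$, $i = 1, \dots, S^1$
• Activation function: $f^1$
Layer 2
• Number of neurons: $S^2$
• Outputs: $a^2_1, a^2_2, \dots, a^2_{S^2}$
• Weights: $w^2_{i,j}$, $i = 1, \dots, S^2$; $j = 1, \dots, S^1$
• Bias: $b^2_i$, $i = 1, \dots, S^2$
• Activation function: $f^2$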
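Multi-Layer Neural Network (II)
Example: 2-layer NN, vector representation
Layer 1
• Number of neurons: $S^1$
• Outputs: $a^1$
• Weights: $W^1$
• Bias: $b^1$
• Activation function: $f^1$
• $a^1 = f^1(W^1 p + b^1)$
Layer 2
• Number of neurons: $S^2$
• Outputs: $a^2$
• Weights: $W^2$
• Bias: $b^2$
• Activation function: $f^2$
• $a^2 = f^2(W^2 a^1 + b^2)$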
Multi-Layer Neural Network (III)
Hidden layer: $a^1 = f^1(W^1 p + b^1)$
Output layer: $a^2 = f^2(W^2 a^1 + b^2)$
Combined: $a^2 = f^2\left(W^2 f^1(W^1 p + b^1) + b^2\right)$
Remarks:
• Layer 1: hidden/input layer
• Layer 2: output layer
• $R$, $S^2$ defined by data set (input and output dimensions)
• $S^1$, $f^1$, $f^2$ defined by the designer
• $W^1$, $b^1$, $W^2$, $b^2$: parameters defined by the learning algorithm
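The two-layer forward pass can be sketched in base Matlab as follows. The sizes (R = 2, S1 = 3, S2 = 1), all weights and biases, and the choice of a log-sigmoid hidden layer with a linear output layer are hypothetical values used only to make the example concrete.

```matlab
% Forward pass of a 2-layer NN: a2 = f2(W2 * f1(W1*p + b1) + b2).
% All sizes and numbers are illustrative.
W1 = [ 0.1 -0.2;
       0.4  0.3;
      -0.5  0.2];                 % S1-by-R  (S1 = 3, R = 2)
b1 = [0; 0.1; -0.1];              % S1-by-1
W2 = [1 -1 0.5];                  % S2-by-S1 (S2 = 1)
b2 = 0.2;                         % S2-by-1

f1 = @(n) 1 ./ (1 + exp(-n));     % hidden layer: log-sigmoid
f2 = @(n) n;                      % output layer: linear

p  = [1; -1];                     % input vector (R-by-1)
a1 = f1(W1 * p + b1);             % hidden layer output (S1-by-1)
a2 = f2(W2 * a1 + b2);            % network output (S2-by-1)
fprintf('a2 = %.4f\n', a2);
```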
Learning Algorithm: Problem Setup
Data set: $D = \{(p_1, y_1), \dots, (p_N, y_N)\}$
• vector input: $p_i \in \mathbb{R}^R$
• scalar output: $y_i \in \mathbb{R}$
Two-layer NN:
$\hat{y} = g(p, x) = f^2\left(W^2 f^1(W^1 p + b^1) + b^2\right)$
• parameters: $x = (W^1, b^1, W^2, b^2)$
Error function:
$E(x) = \dfrac{1}{2N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x)\right)^2$, with $\hat{y}_i(x) = g(p_i, x)$
Problem:
$\min_x E(x)$
$x^*$ = optimal NN parameters
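Learning Algorithm: Backpropagation
Solution:
$\min_x E(x) = \min_x \dfrac{1}{2N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x)\right)^2$
A. Split error function
$\min_x E(x) = \min_x \dfrac{1}{N} \sum_{i=1}^{N} E_i$, with $E_i = \dfrac{1}{2} \left(y_i - \hat{y}_i(x)\right)^2$ and $\hat{y}_i(x) = g(p_i, x)$
B. Sequential gradient descent
$x^{(k+1)} = x^{(k)} - \alpha \nabla E_i$
• $\alpha$ = learning rate
• $k$ = iteration number
• $x^{(k)}$ = NN parameters
C. Compute gradient
$\nabla E_i(x) = -(y_i - \hat{y}_i) \dfrac{\partial g(p_i, x)}{\partial x}$
Backpropagation algorithm: compute gradients of $g(p_i, x)$ using the chain rule (see [Hagan, Chap. 11] for details)
$g(p_i, x) = f^2\left(W^2 f^1(W^1 p_i + b^1) + b^2\right)$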
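For a two-layer network with a log-sigmoid hidden layer and a linear output, the chain rule can be written out explicitly. The base-Matlab sketch below performs one gradient step for a single data point; the network sizes, the numbers, the learning rate and the variable names (s1, s2 for the layer sensitivities, gW1, gb1, gW2, gb2 for the gradients) are hypothetical.

```matlab
% One backpropagation step for E_i = 0.5*(y - yhat)^2 with
% yhat = W2 * logsig(W1*p + b1) + b2 (linear output layer).
% All numbers are illustrative.
W1 = [0.1 -0.2; 0.4 0.3; -0.5 0.2];   b1 = [0; 0.1; -0.1];
W2 = [1 -1 0.5];                      b2 = 0.2;
p = [1; -1];   y = 0.7;   alpha = 0.1;

% Forward pass
n1 = W1 * p + b1;
a1 = 1 ./ (1 + exp(-n1));             % hidden layer output
a2 = W2 * a1 + b2;                    % network output yhat
e  = y - a2;                          % prediction error

% Backward pass (chain rule)
s2 = -e;                              % dE_i/dn2, since the output layer is linear
s1 = (a1 .* (1 - a1)) .* (W2' * s2);  % dE_i/dn1, using logsig'(n) = a.*(1-a)
gW2 = s2 * a1';   gb2 = s2;           % gradients for layer 2
gW1 = s1 * p';    gb1 = s1;           % gradients for layer 1

% Sequential gradient-descent update
W1n = W1 - alpha * gW1;   b1n = b1 - alpha * gb1;
W2n = W2 - alpha * gW2;   b2n = b2 - alpha * gb2;
```

A quick way to validate such a hand-written gradient is to compare one entry against a central finite difference of E_i.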
Matlab API: Create a 2-layer NN
Matlab Code
P = [1; 2]; % data set - input (R=2)
Y = 1;      % data set - output (S2=1)
hiddensize = [10]; % number of neurons in the hidden layer (S1=10)
net = feedforwardnet(hiddensize);
% Creates a multi-layer neural network
% INPUT
% hiddensize = row vector with one or more
% hidden layer sizes
% OUTPUT: net = neural network object
net = configure(net, P, Y);
% Configures network dimensions (inputs, outputs, weight,
% bias, etc) to match the size of data set (P,Y)
Matlab API: Setup activation functions, W, b
Matlab Code
% define activation function, weight and bias of Layer 1 (f1, W1, b1)
net.layers{1}.transferFcn = 'logsig';
net.IW{1,1} = …; % set W1
net.b{1} = …;    % set b1
% define activation function, weight and bias of Layer 2 (f2, W2, b2)
net.layers{2}.transferFcn = 'purelin';
net.LW{2,1} = …; % set W2
net.b{2} = …;    % set b2
% Remark: net.b{i} = bias of layer i; net.LW{i,j} = weights connecting layer i with layer j
Matlab API: Training Algorithm
[Diagram: the data set feeds the learning/training algorithm, which outputs $W^1, b^1, W^2, b^2$]
Matlab Code
net.trainFcn = 'traingd';  % Gradient descent backpropagation
%net.trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation (default)
net.trainParam.lr = 0.01;       % learning rate (alpha)
net.trainParam.epochs = 1000;   % maximum number of epochs to train
net.trainParam.goal = 0;        % performance goal (stop criterion): min acceptable value for E(x)
net.trainParam.min_grad = 1e-5; % minimum performance gradient (stop criterion): min acceptable value for the gradient of E(x)
Matlab API: Training Algorithm
[Diagram: the data set feeds the learning/training algorithm, which outputs $W^1, b^1, W^2, b^2$]
Matlab Code
net.trainParam.showWindow = 0;      % deactivate interactive GUI
net.trainParam.showCommandLine = 1; % generate command-line output
net.divideFcn = '';                 % use entire data set for training
[net] = train(net, P, Y);
% train trains a network net according to net.trainFcn and net.trainParam.
% INPUT
% net = neural network object
% P = [R-by-N] data set - input
% Y = [S-by-N] data set - output
% OUTPUT
% net = (trained) neural network object
Matlab Example: Nonlinear Fitting
Consider the following system
• input: $p \in [-1, 1]$
• output: $y \in \mathbb{R}$
• function: $h(p) = 1 + \sin(\pi p)$
• random noise: $\epsilon \sim \mathcal{N}(0, \sigma^2)$, Gaussian distribution with zero mean and variance $\sigma^2 = 0.2$
[Diagram: input $p$ passes through $h$; noise $\epsilon$ is added to give output $y$]
Write a Matlab script that:
a) generates an artificial data set for this problem, i.e. $D = \{(p_i, y_i)\}$, where $p_i \in \{-1, -0.9, -0.8, \dots, 1\}$ is the input for the system
b) creates a neural network (NN) with 2 layers:
• hidden layer with 10 neurons and activation function = logistic sigmoid
• output layer with 1 neuron and activation function = linear
c) trains the weights of the NN to fit the data set $D$
d) plots
• training results, i.e. the data set $D$ and the output of the trained NN
• testing results, i.e. the data set $D$, the (true) function and the trained NN evaluated in the domain $p_i \in \{-1, -0.99, -0.98, -0.97, \dots, 1\}$
Matlab Example: Nonlinear Fitting (II)
[Figures: training and testing results]
• Perfect fitting of training data
• We might be fitting noise, instead of the true signal ($h$)
Overfitting
Problem: the previous example showed poor generalization, i.e.
• NN fits the training data set very well,
• … but produces large errors when faced with new inputs
Solution*: Regularization
• Idea: penalize large values of NN parameters (weights/bias)
Problem:
$\min_x \; (1 - \lambda) E(x) + \lambda\, \Omega(x)$, with $x = (W^1, b^1, W^2, b^2, \dots)$ and regularization term $\lambda\, \Omega(x)$
• $\lambda \in [0, 1]$ → controls the importance of regularization vs fitting
• larger $\lambda$ → smoother fitting of neural network
Matlab Code
net.performParam.regularization = 0.1; % sets value of lambda
*Additional solutions: early stopping, reduce number of neurons (see [Hagan, Chap. 13] for details)
Matlab Example: Nonlinear Fitting (III)
(Continuation of previous problem)
e) adapt the previous Matlab script in order to train the neural network with regularization
• use $\lambda = 10^{\dots}$
f) plot training and testing results for both cases
Matlab Example: Nonlinear Fitting (IV)
[Figures: fitting results for $\lambda = 0$ and for $\lambda = 10^{\dots}$]
Matlab API: Auxiliary Commands
Matlab Code
% net = neural network object
gensim(net); % generation of Simulink block for neural network simulation
view(net);   % generation of a graphical view
MPC-NN: Motivation
Consider the following discrete-time model of a mechanical system
[Equation: discrete-time plant model]
Previous lecture: the MPC Toolbox was employed to design a controller that brings the state of this system to the origin while fulfilling input and state constraints.
Issue: the computational time of the MPC is too high due to the numerical optimization routines
Task: design a NN that learns the MPC's control law, enabling us to quickly compute the optimal control action
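MPC-NN: Design Approach
1) Design MPC controller
[Diagram: the MPC computes the control input for the plant from the plant states]
2) Design NN
[Diagram: the MPC and the neural network receive the same input; the error between their outputs drives the training algorithm, which adapts the NN parameters]
3) Replace MPC with NN
[Diagram: the neural network computes the control input for the plant from the plant states]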
MPC-NN: Matlab Example
Write a Matlab script that:
a) generates data set* where [equation]
b) plots training data
c) trains a neural network to learn the MPC's control law using the following settings
• 2 layers
• hidden layer: 20 neurons, 'logsig' activation function
• output layer: 1 neuron, linear activation function
d) plots neural network's fitting of training data
e) exports trained network to Simulink and simulates the closed-loop response of plant with initial state [equation]
*Matlab implementation of MPC controller: i) previous lecture; or ii) https://tinyurl.com/tsfvdyp
MPC-NN: Matlab Example (II)
[Figures: b) training data and d) NN fitting, each a 3-D plot of u over (x1, x2)]
MPC-NN: Matlab Example (III)
[Figure: e) plant response with NN control: x1, x2 and u versus time]
Summary
Introduction
• Data sets, learning algorithms, candidate models
Single-Layer Neural Network
• Fundamentals: neuron, activation function, layer
• Matlab example: constructing & evaluating NN
• Learning algorithms
• Batch solution: least-squares
• Online solution: LMS
Multi-Layer Neural Network
• Network architecture
• Learning algorithm: backpropagation
• Overfitting & regularization
Key Matlab Functions
linearlayer()
configure()
net.layers{i}.transferFcn=...
sim()
adapt()
feedforwardnet()
train()
net.performParam.regularization=…
Homework
Write a Matlab script that:
a) generates an artificial data set for this problem, i.e. $D = \{(p_i, y_i)\}$, where $p_i$ is an input sampled from the grid $\{-1, -0.9, \dots, 1\} \times \{-1, -0.9, \dots, 1\}$
b) plots the data set (hint: use plot3)
c) creates a neural network (NN) with 2 layers:
• hidden layer with 50 neurons and activation function = hyperbolic tangent sigmoid
• output layer with 1 neuron and activation function = linear
d) trains the weights and bias of the NN to fit the data set $D$
• use all data for training; number of epochs = 2000
e) plots training results, i.e. the data set $D$ and the output of the trained NN
f) adds a regularization term to the training ($\lambda = 0.5$) and plots results
The system to consider is:
• input: $p = [p_1 \;\; p_2]^T \in [-1, 1] \times [-1, 1]$
• output: $y \in \mathbb{R}$
• function: ℎ � = 10��� + 3�� + 7����
• random noise: $\epsilon \sim \mathcal{N}(0, \sigma^2)$, $\sigma^2 = 2$
Homework: Expected Results
[Figures: 3-D plots of y over (p1, p2) for $\lambda = 0.0$ and for $\lambda = 0.5$]