I am working through a Python book, but using Julia instead in order to learn the language, and I have come upon another area where I am not quite clear. Things worked on the simpler exercises, but when I start tossing more complex matrices at it, it falls apart.
include("activation_function_exercise/spiral_data.jl")
include("activation_function_exercise/dense_layer.jl")
include("activation_function_exercise/activation_relu.jl")
include("activation_function_exercise/activation_softmax.jl")
coords, color = spiral_data(100, 3)
dense1 = LayerDense(2,3)
dense2 = LayerDense(3,3)
forward(dense1, coords)
println("Forward 1 layer")
activated_output = relu_activation(dense1.output)
forward(dense2, activated_output)
println("Forward 2 layer")
activated_output2 = softmax_activation(dense2.output)
println("\n", activated_output2)
I get a matrix of the right shape back:
julia> activated_output2
300×3 Matrix{Float64}:
 0.00333346  0.00333337  0.00333335
 0.00333345  0.00333337  0.00333335
 0.00333345  0.00333336  0.00333335
 0.00333344  0.00333336  0.00333335
 0.00333343  0.00333336  0.00333334
 ⋮
 0.00333311  0.00333321  0.00333322
but the book has
>>>
[[0.33333 0.3333 0.3333]
...
Seems I am two orders of magnitude lower than the book, even when using FluxML's softmax function.
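One thing I notice: my 0.00333… is almost exactly 1/300, and the matrix has 300 rows, while the book's 0.333… is 1/3 for 3 classes. A quick sketch (mine, not from the book) of how the normalization axis produces exactly those two values for near-uniform scores:

x  = zeros(300, 3)                     # stand-in for dense2.output, which is close to zero
ex = exp.(x)                           # all ones

per_row    = ex ./ sum(ex; dims = 2)   # every entry 1/3   ≈ 0.333…   (normalize over the 3 classes)
per_column = ex ./ sum(ex; dims = 1)   # every entry 1/300 ≈ 0.00333… (normalize over the 300 samples)

So a softmax that normalizes down the wrong axis would give exactly my numbers.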
EDIT:
I thought maybe my ReLU activation code was causing the discrepancy, so I tried switching to the FluxML NNlib version, but I get the same activated_output2, with 0.0033333 instead of 0.333333. I will keep checking other parts, like my forward function.
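Roughly what I ran for the NNlib comparison (paraphrased from memory; relu and softmax here are NNlib's, not my own):

using NNlib   # FluxML's activation functions

forward(dense1, coords)
activated_output = relu.(dense1.output)      # elementwise ReLU via broadcasting
forward(dense2, activated_output)
activated_output2 = softmax(dense2.output)   # note: NNlib's softmax normalizes along dims = 1 by default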
EDIT2:
Adding my LayerDense implementation for completeness:
# see https://github.com/FluxML/Flux.jl/blob/b78a27b01c9629099adb059a98657b995760b617/src/layers/basic.jl#L71-L111
mutable struct LayerDense
    weights::Matrix{Float64}
    biases::Matrix{Float64}
    num_inputs::Integer
    num_neurons::Integer
    output::Matrix{Float64}   # left unassigned until forward is called

    LayerDense(num_inputs::Integer, num_neurons::Integer) =
        new(0.01 * randn(num_inputs, num_neurons),   # small random weights, num_inputs × num_neurons
            zeros(1, num_neurons),                   # one bias row, broadcast across samples
            num_inputs,
            num_neurons)
end

function forward(layer::LayerDense, inputs::Matrix{Float64})
    layer.output = inputs * layer.weights .+ layer.biases
end
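A quick shape check of the layer on its own (throwaway values, just to show the data flow):

layer = LayerDense(2, 3)
forward(layer, randn(5, 2))   # 5 samples with 2 features each
size(layer.output)            # (5, 3): one row of 3 neuron outputs per sample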
EDIT3:
Since using the library activations did not change anything, I started inspecting my spiral_data implementation. It seems within reason:
Python
import numpy as np
import nnfs
from nnfs.datasets import spiral_data
nnfs.init()
X, y = spiral_data(samples=100, classes=3)
print(X[:4])  # just check the first few rows
>>>
[[0. 0. ]
[0.00299556 0.00964661]
[0.01288097 0.01556285]
[0.02997479 0.0044481 ]]
JuliaLang
include("activation_function_exercise/spiral_data.jl")
coords, color = spiral_data(100, 3)
julia> coords
300×2 Matrix{Float64}:
  0.0         0.0
 -0.00133462  0.0100125
  0.00346739  0.0199022
 -0.00126302  0.0302767
  0.00184948  0.0403617
  0.0113095   0.0492225
  0.0397276   0.0457691
  0.0144484   0.0692151
  0.0181726   0.0787382
  0.0320308   0.0850793
  ⋮
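In case it helps, this is roughly what my spiral_data does (a sketch of a direct port of the nnfs version; my actual file may differ slightly):

# `samples` points per class on interleaved spirals; rows are samples,
# matching the 300×2 matrix above.
function spiral_data(samples::Integer, classes::Integer)
    X = zeros(samples * classes, 2)
    y = zeros(Int, samples * classes)
    for class_number in 0:(classes - 1)
        ix = (samples * class_number + 1):(samples * (class_number + 1))
        r = range(0.0, 1.0; length = samples)                    # radius grows outward
        t = range(class_number * 4.0, (class_number + 1) * 4.0;  # angle band per class
                  length = samples) .+ 0.2 .* randn(samples)     # plus some noise
        X[ix, 1] = r .* sin.(t .* 2.5)
        X[ix, 2] = r .* cos.(t .* 2.5)
        y[ix] .= class_number                                    # class label per sample
    end
    return X, y
end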