Getting Started with Flux for Julia (With CUDA)

Written by Madelyn Abbe

Machine learning in Julia has come a long way since the classic Mocha.jl days, and one of the most groundbreaking contributions to that ecosystem is Flux.jl. Flux is the definitive deep-learning library for Julia, built around gradient descent, and can be compared to something like TensorFlow for Python. Like a lot of Julia packages, such as Lathe.jl and DataFrames.jl, Flux is compact: it is written in only around one thousand lines of code and relies on nothing but Julia itself. Compare this to solutions like TensorFlow and PyTorch, both of which mix several languages, with C++ at the core underneath the Python front end.

One great thing about Julia is just how seamlessly parallel computing platforms and multi-threading are tied into the language. Nowhere is this more visible (get it?) than with CUDA, the venerable platform for NVIDIA graphics processors. The tight integration between Julia and your hardware carries over very well into Flux, making Flux and CUDA a true match made in heaven. Set that zero flag to one in the machine code and buckle up, because this is positively going to be exciting!

Getting Data

For my sample data today, I chose a data-set from MLDatasets.jl, a package that lives under the JuliaML organization. You can add it with Pkg, either through the Pkg API or, alternatively, from the Pkg REPL:
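Both installation routes come down to the same operation. A minimal sketch, assuming the registered package name matches the repository name (MLDatasets):

```julia
# Install MLDatasets through the Pkg API.
using Pkg
Pkg.add("MLDatasets")

# Pkg REPL equivalent: press ] at the julia> prompt, then type
#   add MLDatasets
```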

I don’t usually select data-sets from commonly used packages, but I made this exception to ensure that this code would be reproducible without any manual downloads (at least not through your web browser). The data-set I’m going to be using is Fashion-MNIST, which we can download like so:
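The exact call depends on the MLDatasets version. The sketch below assumes a recent release (0.7 or later), where each split is an object and indexing it with `[:]` returns the features and the targets; the variable names are my own choice.

```julia
using MLDatasets

# The first call downloads the files and may ask for confirmation in the terminal.
train_x, train_y = FashionMNIST(split = :train)[:]
test_x,  test_y  = FashionMNIST(split = :test)[:]

# 2019-era releases used FashionMNIST.traindata() and FashionMNIST.testdata() instead.

size(train_x)   # (28, 28, 60000): 28 × 28 pixels, 60000 training images
```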

Optionally you could also add a validation set, or split your own data-set with Lathe:
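Lathe's own splitting call is not reproduced here. As a stand-in, this is a plain-Julia sketch of a hold-out split; `holdout_indices` and the 90/10 ratio are hypothetical choices, not part of Lathe or of the original article.

```julia
using Random

# Hypothetical helper (not Lathe's API): split sample indices into a
# training part and a validation part.
function holdout_indices(n_samples::Integer; at::Real = 0.9, rng = Random.default_rng())
    shuffled = randperm(rng, n_samples)      # random order of 1..n_samples
    n_train = floor(Int, at * n_samples)     # size of the training part
    return shuffled[1:n_train], shuffled[n_train+1:end]
end

train_idx, valid_idx = holdout_indices(60_000; at = 0.9)

# For image arrays shaped (28, 28, N), select samples along the last axis:
#   train_x[:, :, train_idx], train_y[train_idx]
```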

Since my data-set works with images, it might behoove me to convert the data into an image from its respective file format, which we can do like this:


First, we are going to need to import Flux itself:
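A sketch of the import, together with the names used later in the article. Two assumptions: `Statistics.mean` stands in for the `mean` from Lathe.stats, and `params` is left out because newer Flux releases deprecate it (on a 2019-era Flux it would sit in the same `using Flux:` list).

```julia
using Flux
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: partition
using Random
using Statistics: mean   # stand-in for Lathe.stats' mean (assumption)
```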

I also directly imported several functions from Flux, including onehotbatch, onecold, crossentropy, throttle, and params, as well as the mean function from Lathe.stats, partition from Julia’s Iterators, and the Random module. These are the pieces of the puzzle we will use to build our Flux model. The next step is to construct a model chain. This is where Flux really shines: layers are composed with Chain, which feeds the output of each layer into the next. Flux combines several distinctive Julia syntax features to create a very elegant machine-learning environment, and Chain is a great example of this.

Next, we’re going to need N, the number of samples in our train data:
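One way to read N off the data is to take the size of the last array dimension. The array below is a random stand-in with an assumed 28 × 28 × samples layout, so the snippet runs without the data-set.

```julia
# Stand-in for the training images; swap in the real train_x once the data is loaded.
train_x = rand(Float32, 28, 28, 100)

# N is the number of training samples, i.e. the size of the last dimension.
N = size(train_x)[end]
```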

Now we can use N to build a randomly shuffled permutation of our train indices from the range 1:N:
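A minimal sketch of the shuffle, with a placeholder value for N; the name `ixs` is my own.

```julia
using Random

N = 100               # placeholder; use the sample count of your training set
ixs = shuffle(1:N)    # random permutation of the indices 1..N, returned as a Vector

# The permutation can then reorder images and labels consistently:
#   train_x[:, :, ixs], train_y[ixs]
```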

Something important to note here is that our data is going to need to be stored in sub-arrays, or dictionaries. Given that this will work with dictionaries, it will most likely work with DataFrames as well. After putting our data into a format that Flux’s batching can take, we can batch our data like so:
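Batching can be sketched with `partition` alone. Everything below uses random stand-in data; the batch size is a hypothetical value, and the one-hot targets are built by broadcasting rather than with `Flux.onehotbatch` so the snippet runs without Flux.

```julia
using Base.Iterators: partition

# Stand-ins for the real data (assumed shapes: 28 × 28 × N images, labels 0..9).
N = 10
train_x = rand(Float32, 28, 28, N)
train_y = rand(0:9, N)

batch_size = 4   # hypothetical value; the article does not state one

batches = map(partition(1:N, batch_size)) do idx
    # Conv layers expect width × height × channels × batch, so add a channel axis.
    x = reshape(train_x[:, :, idx], 28, 28, 1, :)
    # 10 × batch one-hot matrix; Flux.onehotbatch(train_y[idx], 0:9) gives the same layout.
    y = (0:9) .== permutedims(train_y[idx])
    (x, y)
end
```

Each element of `batches` is an `(images, targets)` tuple, which is the shape of data a Flux training loop iterates over.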

Now we simply construct our model with an anticipated return:
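The layer list can be reconstructed from the printed summary of the chain, with two assumptions. First, the summary does not show the convolution padding; `pad = (2, 2)` is assumed because it is the setting that makes the flattened size come out to 2304. Second, the sketch targets a recent Flux release, so `Dense` uses the `in => out` form rather than the older positional one.

```julia
using Flux

model = Chain(
    Conv((5, 5), 1 => 64, elu; pad = (2, 2)),      # pad is an assumption
    BatchNorm(64),
    MaxPool((3, 3); pad = (2, 2), stride = (2, 2)),
    Dropout(0.25),
    Conv((5, 5), 64 => 128, elu; pad = (2, 2)),    # pad is an assumption
    BatchNorm(128),
    MaxPool((2, 2); stride = (2, 2)),
    Dropout(0.25),
    Conv((5, 5), 128 => 256, elu; pad = (2, 2)),   # pad is an assumption
    BatchNorm(256),
    MaxPool((2, 2); stride = (2, 2)),
    Dropout(0.25),
    x -> reshape(x, :, size(x, 4)),                # flatten to features × batch
    Dense(2304 => 256, elu),
    Dropout(0.5),
    Dense(256 => 10),
    softmax,
) |> gpu   # moves the model to the GPU when CUDA support is loaded; otherwise it stays on the CPU
```

On recent Flux versions the GPU path also needs `using CUDA` (and cuDNN) before `gpu` has any effect.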

This is the output:

Chain(Conv((5, 5), 1=>64, elu), BatchNorm(64), MaxPool((3, 3), pad = (2, 2), stride = (2, 2)), Dropout(0.25), Conv((5, 5), 64=>128, elu), BatchNorm(128), MaxPool((2, 2), pad = (0, 0, 0, 0), stride = (2, 2)), Dropout(0.25), Conv((5, 5), 128=>256, elu), BatchNorm(256), MaxPool((2, 2), pad = (0, 0, 0, 0), stride = (2, 2)), Dropout(0.25), #9, Dense(2304, 256, elu), Dropout(0.5), Dense(256, 10), softmax)
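The 2304 inputs of the first Dense layer can be checked by hand. The sketch below walks a 28 × 28 image through the layers using the usual output-size formula; the convolution padding of 2 is an assumption, since the summary above does not print it, and `out_size` is a hypothetical helper.

```julia
# Output length along one spatial axis for a convolution or pooling window.
out_size(n; k, pad = 0, stride = 1) = fld(n + 2pad - k, stride) + 1

n = 28                                          # Fashion-MNIST images are 28 × 28
n = out_size(n; k = 5, pad = 2)                 # Conv 5×5, pad 2 (assumed)
n = out_size(n; k = 3, pad = 2, stride = 2)     # MaxPool 3×3, pad 2, stride 2
n = out_size(n; k = 5, pad = 2)                 # Conv 5×5, pad 2 (assumed)
n = out_size(n; k = 2, stride = 2)              # MaxPool 2×2, stride 2
n = out_size(n; k = 5, pad = 2)                 # Conv 5×5, pad 2 (assumed)
n = out_size(n; k = 2, stride = 2)              # MaxPool 2×2, stride 2

flat = n * n * 256                              # 256 channels after the last Conv
```

The spatial size goes 28, 28, 15, 15, 7, 7, 3, so the flattened vector has 3 × 3 × 256 = 2304 entries, matching `Dense(2304, 256, elu)`.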

I didn’t bother hyper-parameter tuning for this particular model, so it is very likely that accuracy could be increased with just a bit of optimization.

Next, we need a metric function that will allow our model to detect when it is doing well or badly. To do this we will need three big parts:

Attempt, Validation, Reconstruction

The attempt is what I like to call the initial guess before the network has learned anything. Validation is an essential step of the process where the model needs to detect whether it has gotten more accurate, or less accurate. Last, but not least, reconstruction is the recursive process where the guess is recovered and learned from. Here is my function:
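The validation part is commonly written as an accuracy function that compares the predicted class with the true class for each sample. The version below is a plain-Julia sketch, not necessarily the author's function: `predicted_class` is a hypothetical helper that does the column-wise decoding `Flux.onecold` performs on a matrix.

```julia
using Statistics: mean

# Column-wise argmax: row index of the largest entry in each column.
predicted_class(m::AbstractMatrix) = vec(map(i -> i[1], argmax(m; dims = 1)))

# Fraction of samples (columns) where the prediction matches the one-hot target.
accuracy(scores, targets) = mean(predicted_class(scores) .== predicted_class(targets))

# Toy example: 3 classes, 4 samples.
scores  = [0.7 0.2 0.1 0.5;
           0.2 0.5 0.3 0.3;
           0.1 0.3 0.6 0.2]
targets = [1 0 0 0;
           0 1 0 1;
           0 0 1 0]

accuracy(scores, targets)   # 0.75: the last sample is predicted as class 1 but is class 2
```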

And then we can plug all of these pieces into syntactical expressions:

And then train our model!
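The pieces in question are typically a loss, an optimiser, and a throttled callback, followed by the training call. The sketch below assumes a recent Flux release (0.14 or later) with the explicit `Flux.setup` style, which differs from the `params`-based call a 2019 version would use. The tiny model and random data are stand-ins so the snippet runs quickly.

```julia
using Flux

# Stand-in model and data; swap in the real chain and batches.
model = Chain(Dense(4 => 8, relu), Dense(8 => 3), softmax)
x = rand(Float32, 4, 16)
y = Flux.onehotbatch(rand(0:2, 16), 0:2)
data = [(x, y)]

# Cross-entropy between the model output and the one-hot targets.
loss(m, x, y) = Flux.crossentropy(m(x), y)

# Optimiser state for the Adam rule.
opt_state = Flux.setup(Adam(), model)

# Print the loss at most once every 10 seconds.
report = Flux.throttle(() -> println("loss = ", loss(model, x, y)), 10)

for epoch in 1:5
    Flux.train!(loss, model, data, opt_state)   # one pass over the data
    report()
end
```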

And now we can check out our accuracy using our same metric function:

97 percent accuracy!

[Figure: model training accuracy graph]


Flux’s syntax, expressions, and speed make it a very valuable tool for Data Scientists working in Julia. Flux blows away the competition in many tests because of its compact size, simplicity, speed, and effectiveness. Another massive benefit of Flux is just how modular models can be, as I illustrated by building the layers of my network in a chain and then passing more built-ins onto them. Overall, I’m excited about the development coming to Flux, as well as to Julia as a whole, for machine learning and statistics. If you’re interested in Flux, another cool thing you might be interested in is Knet, and I’m going to be writing “A Swift Introduction” to that pretty soon! Something you could go and check out for yourself now is Metalhead.jl, a collection of computer-vision models written in Flux that can be fit with new data and reused for classification use-cases. Julia lovers, do share this article!

I really hope that this article motivates you toward Julia and Machine Learning!


Copyright © 2019 by Techvik.