Machine Learning (in Elixir) - Intro

Find out what Machine Learning is about and what it looks like in the Elixir world

This post is part of my journey in learning Machine Learning (ML). I'm a student of the decent ML course "Machine Learning with JavaScript". You may wonder: why not Python? Well, to make things more interesting - it won't even be JS 😉

I decided to go with Elixir 💜 and its TensorFlow-like library - Nx! The foundations are the same across all languages and libraries. There's a lot of math. That's a bit scary. But the journey and results are so exciting! So don't worry, and let's start the ML journey!

The goal of Machine Learning

This might sound trivial, but have you ever thought about what ML is really about? The ultimate goal of an ML model is to guess the result for a given input. There are plenty of real-life examples from many different fields, for instance:

  • What's the predicted gender (result), for the given height and weight (input)?

  • Tell me what number (result) is on the image (input)

  • Is this email (input) spam (result)?

  • AI, tell me everything you know (result) about Elixir programming language (input)

I'd like to highlight the word guess here. ML problems are often complex, and you will almost never get 100% certainty that the answer is correct. Roughly speaking, accuracy above 80% is good enough - but it strongly depends on the problem being solved.

In the ML world, we call the input a set of features. The result is called a label. An individual set of features is called an example. So, for the first example of predicting gender based on height and weight, it would look something like this:

features = [
  #[height (cm), weight (kg)]
  [180, 75], # 1st example
  [159, 56], # 2nd example
  ...
]

labels = [
  #[gender 0 - man, 1 - woman]
  [0], # for the 1st example
  [1], # for the 2nd example
  ...
]

As you can see, order matters in both directions. Individual features should be placed in specific columns (vertical order). And remember that labels are linked to a particular set of features (horizontal order).

Notice that in the example above we represent gender as a number; that's because ML speaks in numbers. Transforming non-numerical values into numbers is called encoding.
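As a tiny illustration (the category names and the 0/1 assignment below are made up, matching the gender example above), encoding can be as simple as a lookup:

```elixir
# A minimal, hypothetical encoding: map categorical values to numbers.
encode = fn
  "man" -> 0
  "woman" -> 1
end

encode.("woman") # => 1
```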

When Machine Learns...

To predict results correctly you need a pre-trained ML model. A model is an algorithm - a super math formula. For most problems it's too tough for humans to determine, so we enlist machines - our computers 😉

How humans solve math problems

When we solve a math problem, it's about applying a math formula and calculating its factors; then we can calculate the result. Let's dig into one of the simplest and most useful functions - the linear function.

y = ax + b

y is the result, x is the input. a and b are factors, which we call weights in ML. If you have at least two examples (x-y pairs), you can figure out the weights (a and b), and with them you can calculate y for any given x.
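To make that concrete, here's a quick sketch in plain Elixir (no ML library; the module and function names are mine, just for illustration) of recovering a and b from two known points:

```elixir
# Given two (x, y) pairs lying on y = ax + b, recover the weights.
defmodule LinearWeights do
  def solve({x1, y1}, {x2, y2}) do
    a = (y2 - y1) / (x2 - x1) # slope from the two points
    b = y1 - a * x1           # intercept from one point and the slope
    {a, b}
  end
end

# y = 2x + 3 passes through (1, 5) and (4, 11):
LinearWeights.solve({1, 5}, {4, 11}) # => {2.0, 3.0}
```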

That's simple, isn't it? We don't need ML and all the hard stuff at all, right? 😉 Well, in the math field or an ideal world - yes! 👍 But the real world is a bit more complicated...

How machines do it (in real life)

Math formulas alone won't solve many real problems, because those problems are too complex. But with meaningful data (features with labels), we can apply some math operations and let the machine figure out the weights, which can then be used for predictions.

Let's suppose that we'd like to predict a person's weight (label) based on height (input). To simplify the task, let's consider only women. We can assume that this relationship is somehow linear - larger height = larger weight.

Would a plain linear function work for this? Nope. But linear regression will. I'll describe it in more detail in the next post. For now, let's assume it's a kind of "average linear function". This will work for real data. 👍
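For the curious: simple linear regression boils down to an ordinary least squares fit. A rough plain-Elixir sketch (module and function names are mine), just to show there's no magic involved:

```elixir
defmodule LinReg do
  # Ordinary least squares for a single feature: returns {a, b} of y = ax + b.
  def fit(xs, ys) do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # covariance-like numerator and variance-like denominator
    num =
      Enum.zip(xs, ys)
      |> Enum.map(fn {x, y} -> (x - mean_x) * (y - mean_y) end)
      |> Enum.sum()

    den =
      xs
      |> Enum.map(fn x -> (x - mean_x) * (x - mean_x) end)
      |> Enum.sum()

    a = num / den
    b = mean_y - a * mean_x
    {a, b}
  end
end

# Perfectly linear data recovers the exact line y = 2x + 0:
LinReg.fit([1, 2, 3], [2, 4, 6]) # => {2.0, 0.0}
```

This is essentially what the spreadsheet does for us in the analysis below.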

We'll use this dataset for our quick analysis. I generated the graph below in Apple Numbers, where you can see how it looks in practice and what's the calculated equation.

Numbers calculated that for the given data y = ax + b, a = 0.0578 and b = 95.853 (see the top left corner of the graph). It calculated the weights (a and b)! We may say that the machine "learned" and figured it out by itself.

The dots are spread out all over the y (weight) axis, and only a few of them are close to the line representing the linear function we'd use for predictions. It looks like the accuracy is pretty bad. Why?

Is it because the height-weight relationship isn't linear? Long story short: height alone is insufficient to accurately determine weight. It makes sense: in our dataset, women who are 172 cm tall weigh anywhere between 62 and 116 kg. We can't help it. 🤷‍♂️

Data quality and quantity matters

In machine learning, good quality data (without invalid, fake, or inappropriate values) and the number of samples (the more, the better) are essential for increasing the accuracy of predictions.

The dataset we used also contains an index column, with values ranging from 1 to 5, indicating whether the weight is relatively good (3), too low (1), or too high (5). Let's re-evaluate the height-weight relationship using linear regression in the spreadsheet, but only for rows with an index of 3.

Now a = 0.6232 and b = -40.047. And as you can see, the dots are much closer to the prediction line. Much better! 👌

Training and testing

In the weight-height example, we used Apple Numbers to calculate the weights (remember? a and b). In day-to-day work, we'd write code that loads the data and does the calculations. We call this step training. It's like: "Okay, here's the data, calculate the weights so the accuracy is good enough".

We tested accuracy manually by checking the graphs. It's always cool to plot a graph and do a sanity check 😉. But in practice, we write a math formula that measures the error for some features and labels. Then we use the trained model to predict the known labels for the given test features.

To make this work, we split the features-labels data we have into training and test sets; usually it's 90% for training and 10% for testing.
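Such a split can be sketched in a few lines of plain Elixir (shuffling first so the split isn't biased by the data's original order; the module name is mine):

```elixir
defmodule DataSplit do
  # Split a list of examples into {train, test}, default 90/10.
  def train_test(examples, train_ratio \\ 0.9) do
    examples
    |> Enum.shuffle() # avoid order bias
    |> Enum.split(round(length(examples) * train_ratio))
  end
end

{train, test} = DataSplit.train_test(Enum.to_list(1..100))
length(train) # => 90
length(test)  # => 10
```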

The quick analysis conclusions

  • Quality and quantity of the data used for ML are essential to get good accuracy

  • Providing meaningful features significantly increases the accuracy. The analysis worked poorly for just height. After involving the index, it gave much better results. Imagine how providing BMI, body fat percentage, or waist size could affect the accuracy

  • Even with super polished data, it's still predicting - you'll almost never get 100% accuracy

  • Bonus: A spreadsheet app may also do some simple ML-ish stuff for you 😉

Other Machine Learning Glossary

Tensor - (un)necessary wrapper?

The foundational building brick is a tensor. It's a data structure that looks like a number or, more often, an array. Tensors are created from plain values by a dedicated ML library, like Nx for Elixir.

Check tensor types based on dimensions in the table below. The most common tensors in ML are 2D tensors - matrices.

Dimension | Type                 | Example
0         | Scalar               | 123
1         | Vector               | [1, 2, 3]
2         | Matrix               | [[1], [2], [3]]
n         | n-dimensional tensor | [[[1], [2]], [[3], [4]]]

And let's check out what the tensors look like in Nx.

> scalar = Nx.tensor(1.0)
#Nx.Tensor<
  f32
  1.0
>

> vector = Nx.tensor([1, 2, 3])
#Nx.Tensor<
  s64[3]
  [1, 2, 3]
>

> matrix = Nx.tensor([[1], [2], [3]])
#Nx.Tensor<
  s64[3][1]
  [
    [1],
    [2],
    [3]
  ]
>

The Nx.tensor/1 function returns an Nx.Tensor struct, which looks a bit... dull 🤷‍♂️ It doesn't seem like a big deal compared to plain numbers or lists. But IT IS a big deal! Why? For these two reasons:

  1. You can easily perform any of the many ML operations from the Nx library on tensors

  2. Nx is optimized for "crunching numbers", and it's much faster than using plain numbers or lists. Also, you can use your GPU for calculations, which speeds things up even more 🚀

Let's make it clear: You can do Machine Learning using plain numbers and lists. But it's much more painful and less performant than learning and using Nx.

We use tensors for all data we use in our ML project: features, labels, and weights.

A closer look at Nx tensor

Okay, now you know that the tensor is essential in ML. Let's take a closer look at how it works under the hood. We'll use the matrix tensor as an example.

> matrix = Nx.tensor([[1], [2], [3]])
> Map.from_struct(matrix)
%{
  data: %Nx.BinaryBackend{
    state: <<1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
      0, 0>>
  },
  type: {:s, 64},
  names: [nil, nil],
  shape: {3, 1}
}

Now you can see that the data is stored as a binary - that's where the performance comes from. Each tensor has a type for the stored numbers. In the example above it's {:s, 64}, or s64, which is a signed 64-bit integer. You can specify the type explicitly when creating a tensor.
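You can reproduce that state binary with plain Elixir bit syntax - each number is laid out as a little-endian signed 64-bit integer:

```elixir
# Three s64 values, little-endian - the same bytes as the tensor's state above.
bin = <<1::signed-little-64, 2::signed-little-64, 3::signed-little-64>>

bin ==
  <<1, 0, 0, 0, 0, 0, 0, 0,
    2, 0, 0, 0, 0, 0, 0, 0,
    3, 0, 0, 0, 0, 0, 0, 0>> # => true
```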

Another attribute, names, stores names for the axes. Names work like aliases and are optional.

The shape is very important. It tells you the size of each axis. {3, 1} means the tensor has 3 rows and 1 column. Matrices and shapes are used a lot, so keep this mantra in mind: "row-column, row-column, row-column, ..."

The shape is essential for many math operations performed on tensors, like matrix multiplication. Believe me, tensors are going to be transformed a lot - concatenated, reshaped, multiplied, split, etc. But more on this in the next post 😉

Conclusion - Humanly on Machine Learning

I know there was a lot of "talking" and just a few lines of code, but I think it's useful groundwork before diving deeper. I hope this post encourages you to take a closer look at ML - and maybe even with Nx and Elixir..? 😉

But before the end, let's wrap up some concepts:

  • The main purpose of Machine Learning (from an end-user perspective) is to predict some information based on the given data

  • You can try to predict (better or worse) almost everything based on any data, as long as you are able to create an ML model for it (from a data-engineer perspective)

  • The better, more meaningful, and larger the data you provide, the better results you'll get

  • You will (almost) never achieve 100% accuracy in your predictions; roughly speaking, accuracy above 80% is considered good enough, but it depends on the particular case

  • Although it's possible to solve ML problems using plain data structures provided by a programming language, it's definitely worth learning a dedicated library like PyTorch, TensorFlow (Python), TensorFlow.js (JavaScript), or the mentioned Nx (Elixir)

  • Nx and Elixir work great with numbers and ML 💜
