In my previous article I gave an introduction to LIDAR for archaeology. This article aims to take you one step further. It shows an example of processing real world LIDAR point clouds for archaeology, in order to create a digital surface model (DSM).
In today’s example we will be looking at LIDAR data for Tikal, a Mayan city located in Guatemala. The city flourished between 200 and 850 A.D., and became home to Temple IV, the tallest pre-Columbian structure in the Americas.
We will be going from site, to point cloud, to digital surface model (DSM) in a few steps. Well, today we will skip the first step of actually generating a point cloud, since that would require specialist equipment and a trip outside, but you get the point (pun intended). …
If you read this blog often, then you have probably already seen one article or another about LIDAR being used in archaeology. This article aims to give you a proper introduction to the technology and its applications in archaeology.
LIDAR stands for Light Detection and Ranging. It is a method for measuring distance using light. Take the image below as an example:
A sensor is attached to an airplane. Laser pulses are shot downwards, bouncing off various types of surfaces. The distance to an object is calculated from how long the light takes to bounce back.
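As a minimal sketch (not from the original article), the time-of-flight calculation boils down to: the pulse travels down and back, so the one-way distance is the speed of light times the round-trip time, divided by two.

```python
# Distance from LIDAR time of flight: the pulse travels to the
# surface and back, so we halve the round trip.
SPEED_OF_LIGHT = 299_792_458  # metres per second

def distance_from_time_of_flight(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a given round-trip time."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A pulse returning after ~6.67 microseconds hit something ~1000 m below.
print(round(distance_from_time_of_flight(6.67e-6)))  # → 1000
```

In practice a LIDAR unit fires many thousands of such pulses per second, each yielding one point.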
The output of this exercise is called a point cloud. You can think of it like a 3D model made out of dots, where each dot represents a point that the light hit. Here is a visualisation of a point cloud representing a mountain. The colours represent the height of the point in relation to the…
I have spoken a lot in this blog about the process of bringing machine learning code to production. However, once the models are in production you are not done, you are just getting started. The model will have to face its worst enemy: The Real World!
This post focuses on what kinds of monitoring you can put in place in order to understand how your model is performing in the real world. This covers both continuous training and the usage of the trained model. It looks into:
In the past few years there has been a large increase in tools trying to solve the challenge of bringing machine learning models to production. One thing that these tools seem to have in common is the incorporation of notebooks into production pipelines. This article aims to explain why this drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way.
Let’s start by defining what these are, for those readers who haven’t been exposed to notebooks, or call them by a different name.
Notebooks are web interfaces that allow a user to create documents containing code, visualisations and text. …
I received a Grove Starter Kit at an internal work conference a few months ago. Of course, I did something entirely useless with it, so here is the tutorial on how to make your own useless Marvin.
This is Marvin, and he does the following:
The Grove Starter Kit comes with quite a few…
When it comes to data products, a lot of the time there is a misconception that these cannot be put through automated testing. Although some parts of the pipeline cannot go through traditional testing methodologies due to their experimental and stochastic nature, most of the pipeline can. In addition, the more unpredictable algorithms can be put through specialised validation processes.
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
Your standard simplified testing pyramid looks like this:
This pyramid is a representation of the types of tests that you would write for an application. We start with a lot of Unit Tests, which test a single piece of functionality in isolation from the others. Then we write Integration Tests, which check whether bringing our isolated components together works as expected. Lastly, we write UI or acceptance tests, which check that the application works as expected from the user’s perspective. …
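To make the bottom of the pyramid concrete, here is a minimal sketch of a unit test in a data pipeline. The `clean_age` helper is hypothetical, invented purely for illustration; each test checks one behaviour in isolation.

```python
from typing import Optional

def clean_age(raw: str) -> Optional[int]:
    """Parse a raw age field from input data, returning None for invalid values."""
    try:
        age = int(raw)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None

# Unit tests: each exercises exactly one behaviour of the function.
def test_valid_age():
    assert clean_age("42") == 42

def test_non_numeric_age_is_rejected():
    assert clean_age("forty") is None

def test_out_of_range_age_is_rejected():
    assert clean_age("999") is None
```

Deterministic transformation steps like this one make up most of a data pipeline, which is why the bulk of it can be covered by ordinary unit tests.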
The most common approach to deploying machine learning models is to expose an API endpoint. This API endpoint would generally be called via a POST method containing the input data for the model as the body, and responding with the output of the model. However, an API endpoint is not always the most appropriate solution to your use case.
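The request/response shape described above can be sketched framework-agnostically: the handler parses the JSON body of the POST, runs the model, and returns the output as JSON. The `predict` function here is a hypothetical stand-in for a real model.

```python
import json

def predict(features):
    """Hypothetical model: the real one would be loaded from disk."""
    return sum(features)  # stand-in for model.predict(...)

def handle_post(body: bytes) -> bytes:
    """Handle a POST request: parse the JSON body containing the input
    data, run the model, and return its output as a JSON response body."""
    payload = json.loads(body)
    prediction = predict(payload["features"])
    return json.dumps({"prediction": prediction}).encode()

response = handle_post(b'{"features": [1.0, 2.5, 0.5]}')
print(response)  # b'{"prediction": 4.0}'
```

Any web framework would wrap this same logic; the point is simply that input arrives as the request body and output leaves as the response.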
There are, for example, use cases that may require a machine learning model to be deployed on a mobile device, such as:
Traditionally, neural networks are trained by adjusting weights based on a measure of error being passed back through the network. This error is calculated by comparing the result of the input fed through the network against the expected value. The person creating the neural network would spend some time fiddling with the network’s parameters until it learns from the given data by adjusting its weights based on this error.
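The traditional weight-adjustment loop can be sketched in its simplest possible form: a single linear neuron trained with gradient descent on squared error. The data and learning rate are invented for illustration.

```python
# Minimal sketch: train a single linear neuron y = w * x by repeatedly
# comparing its output to the expected value and adjusting the weight
# in proportion to the error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target relation: y = 2x

w = 0.0
learning_rate = 0.05
for _ in range(200):
    for x, target in data:
        output = w * x
        error = output - target          # compare output to expected value
        w -= learning_rate * error * x   # adjust the weight using that error

print(round(w, 3))  # converges towards 2.0
```

In a real network this update is propagated through many layers (backpropagation), and the parameters being fiddled with include the learning rate, layer sizes, and activation functions.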
This article is a high level introduction to how evolutionary algorithms can be used to ease this process.
Before I begin, let’s take a look at some existing techniques for parameter optimization. …
A missing smile at the passing of a palm tree
like a gust of winter wind — cold, yet unseen.
The setting of the clutching branches free
abruptly calm and unexpectedly serene.
A missing smile at the passing of an ice cream
travelling through the void of a mariachi’s guitar
merging into an eternal musical stream
on a journey to becoming the dust of a star
A missing smile at the passing of a beach
silenced by the laughter of a small wave.
Your absence there, yet out of reach
leaving a small but timeless breach
Originally published at http://exploringmycreativity.wordpress.com on July 1, 2018.
This post is about setting up the infrastructure to run your Spark jobs on a cluster hosted on Amazon.
Before we start, here is some terminology that you will need to know:
At the end of this post you should have an EMR 5.9.0 cluster that is set up in the Frankfurt region with the following tools:
By default, EMR Spark clusters come with Apache YARN installed as the resource manager. …