What kind of label could we have?

  • stop sign
  • person
  • wall
  • buildings, etc

Could M be just a matrix with one row per point, so N rows, where N is the number of points?

No, because the matrix encoding discards … see the slides.

How to include all the dimensions in the analysis? Apparently with deep learning.

So we are looking for a function that outputs the class of each point. Typically, the output also depends on the point's neighborhood. How do I select the neighborhood? See the slides or research. I think I can do it with kNN (k-nearest neighbors).
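A minimal sketch of the kNN idea, assuming the cloud is a NumPy array of shape (N, 3). The function name `knn_neighborhoods` and the brute-force distance matrix are my own illustrative choices, not something from the slides.

```python
import numpy as np

def knn_neighborhoods(points, k):
    """Return, for each point, the indices of its k nearest neighbors (self excluded)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # A point should not be returned as its own neighbor.
    np.fill_diagonal(dist2, np.inf)
    # Sort each row by distance and keep the k closest indices.
    return np.argsort(dist2, axis=1)[:, :k]

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [5.0, 5.0, 5.0]])
neighbors = knn_neighborhoods(points, k=2)  # shape (4, 2)
```

The full distance matrix costs O(N²) memory, so for a real cloud a spatial index such as a KD-tree would probably be the better choice.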

Turning features into classes

Through affine transformations (a linear map plus an offset) …

But linear models fail on data that is not linearly separable. So we can just use a non-linearity. I know from PointNet that the function is symmetric, so its output does not depend on the order of the points. It’s important to keep this invariance property.
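A small sketch to convince myself of the invariance: a shared affine map plus ReLU per point, followed by a max over the points (the symmetric function PointNet uses). The weights `W`, `b` and the sizes (8 points, 16 features) are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def point_features(neighborhood, W, b):
    """Shared affine map followed by ReLU, applied to every point independently."""
    return np.maximum(neighborhood @ W + b, 0.0)

def aggregate(neighborhood, W, b):
    """Max over the point axis: a symmetric function, so point order does not matter."""
    return point_features(neighborhood, W, b).max(axis=0)

neighborhood = rng.normal(size=(8, 3))  # 8 neighbor points with xyz coordinates
W = rng.normal(size=(3, 16))            # hypothetical weights: 3 inputs -> 16 features
b = rng.normal(size=16)                 # hypothetical offsets

# Shuffle the points and check that the aggregated feature is unchanged.
shuffled = neighborhood[rng.permutation(8)]
same = np.allclose(aggregate(neighborhood, W, b), aggregate(shuffled, W, b))
```

Replacing the max with a sum or a mean would also be symmetric; anything that depends on the row order (e.g. flattening the points into one vector) would not be.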

You can stack ReLU functions to build a really complex approximating function. Insert the slide with the triangle. Apparently, it’s a norm?
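Until I insert the slide, here is a sketch of how three ReLUs can be combined into a triangle. I am assuming the slide shows the usual "hat" function on [0, 1]; the coefficients below are my own choice and may not match the slide.

```python
def relu(x):
    return max(x, 0.0)

def triangle(x):
    """Hat function on [0, 1] with peak 1 at x = 0.5, built from three ReLUs."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

# Zero outside [0, 1], rising to 1 at the midpoint.
samples = [triangle(x) for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0)]
```

Each ReLU adds one kink, so summing more shifted and scaled ReLUs gives a piecewise-linear function with as many segments as needed.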

How to sample the neighborhood of a point?

Computational efficiency is still a problem. The neighborhood selection can be done with the ball query idea from PointNet++. What’s the fastest way to downsample to K points? Apparently it’s a combination of random sampling and an attention mechanism. See “RandLA-Net: Efficient semantic segmentation of large-scale point clouds”.
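A rough sketch of the two ingredients mentioned above, random downsampling to K points and a ball query around one point. The function names, the radius of 0.5 and the synthetic cloud are all assumptions for illustration; the versions in the papers are more elaborate (for example, they cap the number of points returned per ball).

```python
import numpy as np

def random_sample(points, k, rng):
    """Keep k points chosen uniformly at random, without replacement."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]

def ball_query(points, center, radius):
    """Indices of all points within `radius` of `center`."""
    dist = np.linalg.norm(points - center, axis=1)
    return np.nonzero(dist <= radius)[0]

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))   # synthetic cloud, not real data
downsampled = random_sample(cloud, k=100, rng=rng)
neighborhood = ball_query(downsampled, center=downsampled[0], radius=0.5)
```

Random sampling only has to draw K indices, which is why it is so cheap compared with schemes that look at distances between points.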

There’s a drawback to random sampling. What if 90% of your point cloud is ground? That’s what the attentive pooling is meant to solve.
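My current understanding of attentive pooling, as a sketch: each neighbor feature gets a learned score, the scores are turned into weights with a softmax over the neighbors, and the features are summed with those weights. The single matrix `W` stands in for whatever learned scoring network the paper uses, so treat this as an assumption to check against RandLA-Net.

```python
import numpy as np

def attentive_pooling(features, W):
    """Softmax-weighted sum of neighbor features.

    features: (K, D) features of the K neighbors of one point.
    W:        (D, D) score weights (hypothetical, would be learned).
    """
    scores = features @ W                                # one score per neighbor and channel
    scores = scores - scores.max(axis=0, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # softmax over the K neighbors
    return (weights * features).sum(axis=0)              # pooled feature, shape (D,)

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 4))
pooled = attentive_pooling(features, rng.normal(size=(4, 4)))
# With all-zero score weights every neighbor counts equally (plain average).
uniform = attentive_pooling(features, np.zeros((4, 4)))
```

Unlike max or mean pooling, the weights are learned, so the network can decide which neighbors matter instead of treating them all the same.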

Receptive field: all the input data that was used to compute a feature aggregation. With each layer, the receptive field grows. Research this concept more :)
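A toy example of the growth, under a made-up setup: points sit on a line and each layer aggregates a point with its immediate left and right neighbors. The function name and the setup are hypothetical; real networks use kNN or ball neighborhoods in 3D, but the counting argument is the same.

```python
def receptive_field_sizes(num_points, num_layers, target):
    """Count how many input points influence `target` after each layer."""
    field = {target}
    sizes = []
    for _ in range(num_layers):
        grown = set()
        for i in field:
            # Each point in the field was itself computed from its neighbors.
            for j in (i - 1, i, i + 1):
                if 0 <= j < num_points:
                    grown.add(j)
        field = grown
        sizes.append(len(field))
    return sizes

sizes = receptive_field_sizes(num_points=20, num_layers=4, target=10)
```

Here the field gains two points per layer (3, 5, 7, 9). Downsampling between layers would make it grow faster, because the neighbors of a point are then farther apart in the original cloud.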

READ THAT PAPER! Results are quite insane

Current models are only based on transformers.

See the possible exam questions.