
hey, I’m Ashok.

Email / CV / Google Scholar / GitHub / LinkedIn


  • strikethrough text in latex with \st{} from the soul package (\usepackage{soul}):
    \st{Strike through this text}


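  • ssh local port forwarding: ssh -L 8080:localhost:8080 user@remotehost tunnels connections to local port 8080 through ssh to port 8080 on remotehost, so a service running there is reachable at localhost:8080 on your machine.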


  • sparse rewards make training slow and unstable (most steps give no learning signal), whereas dense rewards usually lead to faster convergence.
  • sparse rewards are more realistic but harder to learn.
  • reward shaping through imitation learning can help when learning from sparse rewards.
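the sparse vs dense difference is easy to see on a hypothetical 1-d task where the agent walks along a line toward a goal at position 10. the sketch compares the two reward functions and also shows potential-based shaping, F = γΦ(s') - Φ(s), as one simple, well-known way to densify a sparse reward. this is a different shaping technique from the imitation-learning approach in the note above; the goal, the potential Φ and γ are made up for illustration.

```python
GOAL = 10      # hypothetical goal position on a 1-d line
GAMMA = 0.99   # hypothetical discount factor used by the shaping term

def sparse_reward(next_pos):
    # only signals success: 0 almost everywhere, so early exploration gets no feedback
    return 1.0 if next_pos == GOAL else 0.0

def dense_reward(next_pos):
    # feedback at every step: closer to the goal is better
    return -abs(GOAL - next_pos)

def potential(pos):
    # hypothetical potential function: negative distance to the goal
    return -abs(GOAL - pos)

def shaped_reward(pos, next_pos):
    # potential-based shaping: keep the sparse reward, add gamma * phi(s') - phi(s)
    return sparse_reward(next_pos) + GAMMA * potential(next_pos) - potential(pos)

for pos, next_pos in [(0, 1), (5, 4), (9, 10)]:
    print(pos, next_pos, sparse_reward(next_pos), dense_reward(next_pos),
          round(shaped_reward(pos, next_pos), 3))
```

moving toward the goal now gets a positive shaped reward and moving away a negative one, even though the sparse reward is 0 for both.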


  • jumanji connector-v2 environment does not guarantee solvability.
  • get the version of ubuntu using lsb_release -a.
  • installing nle (the nethack learning environment) using pip can be a chore since the error messages are not helpful. steps to install on ubuntu 22.04:
    sudo apt-get install -y build-essential autoconf libtool \
        pkg-config python3-dev python3-pip python3-numpy git \
        flex bison libbz2-dev
    wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | sudo apt-key add -
    sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ jammy main'
    sudo apt-get update && sudo apt-get --allow-unauthenticated install -y cmake
    conda create -n py38 python=3.8
    conda activate py38
    pip install nle


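  • xla (accelerated linear algebra) is a compiler for machine learning computations: it fuses and optimizes operations for the target hardware (cpu, gpu, tpu), and the gains are usually largest on gpu and tpu.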
  • jax uses xla as a backend and has syntax similar to numpy.
  • mava is a marl library that uses jax as a backend.


  • in sb3, batch_size can be changed for DQN and PPO.
  • what’s the right batch size to use?


  • sbx now supports custom activation functions. resolved with PR#41. Now, it works with TD3, PPO, SAC, DDPG and DQN.

  • a policy $\pi(a \mid s)$ specifies the probability of taking action $a$ in state $s$.

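  • in on-policy learning, the behaviour policy (which collects the data) is the same as the estimation (target) policy, whereas in off-policy learning the two differ. PPO is on-policy and DQN is off-policy.

  • generally, on-policy methods are used for fast environments and off-policy methods for slow ones: on-policy methods discard their samples after each update and need lots of fresh rollouts, while off-policy methods reuse past experience from a replay buffer, which pays off when every environment step is expensive.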

  • tmux:

    • prefix key: Ctrl-b by default.
    • New window: <prefix>c
    • Next window: <prefix>n
    • Previous window: <prefix>p
  • tui file manager: nnn.
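to make the on-policy / off-policy difference concrete, a tabular sketch with hypothetical states, actions and hyperparameters: q-learning (off-policy) bootstraps from the greedy next action regardless of what the behaviour policy does, while sarsa, swapped in here as the textbook on-policy counterpart, bootstraps from the next action that was actually taken.

```python
from collections import defaultdict

ALPHA = 0.5   # hypothetical learning rate
GAMMA = 0.9   # hypothetical discount factor

def q_learning_update(Q, s, a, r, s_next, actions):
    # off-policy: the target uses the greedy next action, no matter which
    # action the behaviour policy will actually take in s_next
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # on-policy: the target uses the next action the behaviour policy really took
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# same transition s0 --right--> s1 with reward 0; in s1 the behaviour policy explored "right"
actions = ["left", "right"]
Q_off, Q_on = defaultdict(float), defaultdict(float)
for Q in (Q_off, Q_on):
    Q[("s1", "left")] = 1.0   # "left" is the better action in s1
q_learning_update(Q_off, "s0", "right", 0.0, "s1", actions)
sarsa_update(Q_on, "s0", "right", 0.0, "s1", "right")
print(Q_off[("s0", "right")], Q_on[("s0", "right")])  # 0.45 0.0
```

q-learning credits s0 with the value of the best action in s1, while sarsa only credits what the exploring policy actually did.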


  • add a charging station to a road in sumo using:

      <chargingStation chargeDelay="2" chargeInTransit="0" power="200000" efficiency="0.95" startPos="10" endPos="25" id="cS_2to19_0a" lane="2to19_0"/>
  • dynamic time warping (DTW) (Berndt & Clifford 1994) is used to check the similarity between two time series.

  • change target vehicle color to red: traci.vehicle.setColor(vehID, (255, 0, 0))
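for the dtw note above, a minimal sketch of the classic dynamic-programming formulation, assuming 1-d series and |x - y| as the local cost (other costs such as squared distance are also common). it fills a table where each cell holds the cheapest alignment of the two prefixes.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment between a[:i] and b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] also matches the previous b
                                 D[i][j - 1],      # b[j-1] also matches the previous a
                                 D[i - 1][j - 1])  # advance both series
    return D[n][m]

slow = [0, 1, 1, 2, 3]   # same shape, stretched in time
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))                      # 0.0
print(sum(abs(x - y) for x, y in zip(slow, fast)))   # 2 (point-wise comparison is not 0)
```

dtw treats the stretched series as identical, which a point-wise distance cannot do.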


  • to get the color of the plotted line in matplotlib:

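    import matplotlib.pyplot as plt
    import numpy as np
    x = np.arange(10)
    colors = [line.get_color() for line in plt.plot(x, x, x, x*2, x, x*3)]  # plot returns one Line2D per series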
  • the default figure size (rcParams['figure.figsize']) is (6.4, 4.8) inches.


  • multi-armed bandit is a simpler version of reinforcement learning, regret equation:

    $$R_T = \sum_{t=1}^{T} \left( r^* - r_t \right)$$

    • $R_T$ is the cumulative regret,
    • $T$ is the number of time steps,
    • $r^*$ is the optimal reward at each time step,
    • $r_t$ is the reward at time step $t$.
  • can multi-armed bandit perform better than reinforcement learning in some cases?
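to get a feel for cumulative regret, a minimal epsilon-greedy sketch on a hypothetical gaussian bandit (arm k pays N(true_means[k], 1)). epsilon-greedy is just one simple strategy, chosen for illustration. the regret term uses expected rewards (best mean minus the chosen arm's mean), the usual "pseudo-regret" variant, so it never goes negative even though individual rewards are noisy.

```python
import random

def epsilon_greedy_regret(true_means, steps=1000, epsilon=0.1, seed=0):
    """Runs epsilon-greedy on a hypothetical Gaussian bandit and returns the pseudo-regret R_T."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k              # running mean reward per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        regret += best - true_means[arm]
    return regret

print(epsilon_greedy_regret([0.1, 0.5, 0.9]))
```

with a fixed epsilon the agent keeps exploring forever, so regret keeps growing linearly; decaying epsilon or ucb-style methods are the usual fixes.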


  • spatio-temporal dataset: pems08.
  • ctrl-tab in vscode to switch between open files.
  • policy gradient is on-policy whereas q-learning is off-policy.
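  • equation for policy gradient (REINFORCE form):

    $$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right], \qquad \theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)$$

    which uses stochastic gradient ascent.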
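to see the policy gradient update in action, a minimal REINFORCE sketch on a hypothetical one-step task: a 2-armed bandit with deterministic rewards, so the return G is just the immediate reward. the policy is a softmax over action preferences θ, for which ∂ log π(a) / ∂θ_k = 1[k = a] - π(k), and each step does θ ← θ + α G ∇ log π(a). all names and numbers are illustrative.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a one-step task: each episode picks one arm and gets its reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(true_rewards)   # softmax policy parameters (action preferences)
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]   # sample a ~ pi_theta
        G = true_rewards[a]                                    # return of this episode
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]  # d log pi(a) / d theta_k
            theta[k] += lr * G * grad_log_pi                   # stochastic gradient ascent
    return softmax(theta)

print(reinforce_bandit([0.0, 1.0]))   # probability mass moves to the rewarding arm
```

the update only uses actions sampled from the current policy, which is why policy gradient is on-policy.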


  • pure param embeddings are randomly initialized and learned during training. they are not tied to any input token.
  • cross attention with a pure param embedding is getting common.
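a minimal numpy sketch of cross attention where the queries are pure parameter embeddings (similar in spirit to the learned latent array in Perceiver-style models): the queries are randomly initialized and not computed from any input token, while keys and values come from the input. sizes and weight initializations are hypothetical, and the training loop that would update these parameters is omitted.

```python
import numpy as np

def cross_attention(queries, inputs, w_k, w_v):
    """Single-head scaled dot-product cross attention.

    queries: (n_q, d) learned embeddings, inputs: (n_in, d) token features.
    """
    k = inputs @ w_k                                      # (n_in, d)
    v = inputs @ w_v                                      # (n_in, d)
    scores = queries @ k.T / np.sqrt(queries.shape[-1])   # (n_q, n_in)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
d, n_queries = 16, 4                          # hypothetical sizes
# "pure param" embeddings: random init, not derived from any input token;
# in a real model they would be trainable parameters updated by the optimizer
learned_queries = rng.normal(scale=0.02, size=(n_queries, d))
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)

for n_tokens in (10, 37):
    tokens = rng.normal(size=(n_tokens, d))
    out, attn = cross_attention(learned_queries, tokens, w_k, w_v)
    print(n_tokens, out.shape)                # output size is fixed by the queries: (4, 16)
```

one reason this pattern is popular: the output size depends only on the number of learned queries, not on the input length.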


  • pwnagotchi runs on rpi zero w and uses a wifi adapter to capture handshakes.
  • it uses rl to learn the best way to capture handshakes.
  • what’s the bill-of-materials for a pwnagotchi?


  • neovim distributions exist such as lazyvim and spacevim.
  • they come with pre-installed plugins and configurations.
  • lazygit is a terminal-based git client with a cool UI.
  • lazydocker is a terminal-based docker client with a cool UI.


  • rpi pico w has two cores.
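  • code for printing even numbers on core0 and odd numbers on core1 (micropython):
from time import sleep
import _thread

def core0_thread():
    counter = 0
    while True:
        print("core0:", counter)  # even numbers
        counter += 2
        sleep(1)

def core1_thread():
    counter = 1
    while True:
        print("core1:", counter)  # odd numbers
        counter += 2
        sleep(1)

# start_new_thread runs the function on the second core (core1)
second_thread = _thread.start_new_thread(core1_thread, ())
core0_thread()  # the main program keeps running on core0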


  • if you import both torch and tensorflow in the same script and get this error:
F ./tensorflow/core/kernels/random_op_gpu.h:246] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), key, counter, gen, data, size, dist) status: Internal: invalid configuration argument

then import tensorflow before torch.


  • by default, tensorboard keeps only 1000 sampled points per scalar series when it loads logs (to save memory), so csv files exported from the ui are missing data.
  • to keep more points, pass --samples_per_plugin=scalars=10000 to the tensorboard command (0 keeps every point).
  • rgb smd led module: KY-009. working (forward) voltage: 2.8 V for red, 3.2 V for green, 3.2 V for blue. forward current: 20 mA.
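the missing rows in tensorboard csv exports come from downsampling: tensorboard keeps a fixed-size random sample (a reservoir) of each scalar series rather than every step. a minimal sketch of plain reservoir sampling (algorithm R) shows the effect; tensorboard's own reservoir differs in details, so treat this as an illustration, not its implementation.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive on both ends
            if j < k:                       # keep item i with probability k / (i + 1)
                reservoir[j] = item
    return reservoir

steps = range(10_000)                       # e.g. 10k logged scalar steps
kept = sorted(reservoir_sample(steps, k=1000))
print(len(kept), kept[:5])                  # only 1000 of the 10000 steps survive
```

the kept points are spread over the whole run, so plots still look right, but an export only contains those 1000 rows.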


  • setting up a new rpi pico w with micropython requires downloading the micropython firmware.
  • thonny is the preferred editor. It is available in the standard ubuntu repos.
  • if you get the error Unable to connect to /dev/ttyACM0: [Errno 13] could not open port /dev/ttyACM0: [Errno 13] Permission denied: '/dev/ttyACM0', add yourself to the dialout group with sudo usermod -a -G dialout <username> and then log out or reboot.
  • thonny in the ubuntu repos is kinda outdated and doesn't have native support for the pico w. download and run the latest installer using:
wget -O thonny-latest.sh https://thonny.org/installer-for-linux
chmod +x thonny-latest.sh
./thonny-latest.sh
  • in thonny, go to Run -> Interpreter -> Micropython (Raspberry Pi Pico) -> install or update micropython.

  • hold the BOOTSEL button while plugging in the micro-usb cable to put the mcu into usb mass-storage (bootloader) mode; it shows up as a drive named RPI-RP2.

  • pico w code for blinking the on-board led is different from the pico because the led is connected to a gpio on the wireless chip (cyw43439) instead of an rp2040 gpio.

  • code for blinking on-board led on pico w:

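from machine import Pin, Timer
led = Pin("LED", Pin.OUT)
timer = Timer()  # periodic timer that toggles the led
timer.init(freq=2, mode=Timer.PERIODIC, callback=lambda t: led.toggle())  # 2 toggles/s = 1 Hz blink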
  • save the file as main.py on the pico w filesystem to make it run on boot.

  • images:

    1. rpi pico gpio pins
    2. rpi pico in its packaging
    3. rpi pico alongside arduino for size comparison
    4. rpi pico led blink


  • overleaf docker container: github link.
  • texstudio also works well, sudo apt install texstudio.


  • trajectory stitching involves piecing together parts of the different trajectories.
  • it lets offline rl recombine the good parts of logged trajectories, which helps it close the gap with online rl.
  • trajectories from sub-optimal policies can be stitched into a trajectory that is better than any of them.
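a toy illustration of stitching, assuming a hypothetical dataset of two sub-optimal trajectories that pass through a shared state "B". doing bellman backups only over transitions seen in the data (a tiny stand-in for what offline rl value learning does) finds the path A -> B -> E, which neither logged trajectory contains.

```python
from functools import lru_cache

# hypothetical offline dataset: two sub-optimal trajectories that pass through state "B"
# each transition is (state, next_state, reward)
trajectory_1 = [("A", "B", 1.0), ("B", "C", 0.0)]  # good start, bad ending: return 1
trajectory_2 = [("D", "B", 0.0), ("B", "E", 5.0)]  # bad start, good ending: return 5
dataset = trajectory_1 + trajectory_2

successors = {}
for s, s_next, r in dataset:
    successors.setdefault(s, []).append((s_next, r))

@lru_cache(maxsize=None)
def best_return(state):
    # Bellman backup restricted to transitions seen in the data (assumes no cycles)
    options = successors.get(state, [])
    if not options:
        return 0.0
    return max(r + best_return(s_next) for s_next, r in options)

def best_path(state):
    path = [state]
    while successors.get(state):
        state = max(successors[state], key=lambda t: t[1] + best_return(t[0]))[0]
        path.append(state)
    return path

print(best_path("A"), best_return("A"))  # ['A', 'B', 'E'] 6.0
```

the stitched return (6) beats both logged trajectories (1 and 5).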


  • in latex, \include{} always starts a new page (it issues \clearpage); use \input{} to insert a file inline instead.
  • embedded firmware is the program that runs on the microcontroller, e.g. the arduino sketch.


  • rpi pico supports micropython and it's only 6$. ¯\_(ツ)_/¯
  • it's also dual core (two arm cortex-m0+ cores), so it can run two tasks in parallel.
  • simpla package in SUMO does not work with libsumo.


  • STM32F103RB has 128KB flash and 72MHz clock speed. It was about 14$.
  • micropython requires a minimum of 256KB flash.
  • micro:bit v2 has 512KB flash, 128KB RAM, and 64MHz clock speed. It has nRF52 chip.
  • micro:bit can be programmed using micropython.
  • python: get the index while looping using enumerate:
for index, element in enumerate(['a', 'b', 'c']):
    print(index, element)


  • db9 is a serial port connector. db15 is a vga connector. T_T


  • pdf on how to use rplidar on windows to scan the environment.


  • nvim config is stored at ~/.config/nvim/init.vim.
  • minimal vim/nvim config:
syntax on
set tabstop=4
set shiftwidth=4
set expandtab
set autoindent
set number
set ruler


  • MAE loss is less sensitive to outliers.

  • MSE loss penalises large errors.

  • MAE is not differentiable at zero, whereas huber loss is differentiable everywhere: it is quadratic (like MSE) for small errors and linear (like MAE) for large ones.

  • images:

    1. mae vs mse vs huber
    2. huber at different values of δ can behave like MSE (large δ) or MAE (small δ).
  • in vim, switch between splits: Ctrl-W + [hjkl].

  • reload the current file from disk in vim using :e.

  • ai inference hardware is getting better. tenstorrent sells e150 for 75k inr (shipping included).

  • quantization reduces the size of the model and makes it less memory hungry.
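for the mae / mse / huber notes above, a minimal pure-python sketch using the usual piecewise huber definition with threshold δ (called delta below). the error values are made up: two small errors and one outlier.

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

def huber(errors, delta=1.0):
    def h(e):
        a = abs(e)
        if a <= delta:
            return 0.5 * e * e               # quadratic (MSE-like), smooth around 0
        return delta * (a - 0.5 * delta)     # linear (MAE-like), so outliers count less
    return sum(h(e) for e in errors) / len(errors)

errors = [0.5, -0.5, 10.0]   # two small errors and one outlier
print(mae(errors), mse(errors), huber(errors))
print(huber(errors, delta=100.0), 0.5 * mse(errors))   # large delta: half the MSE
```

the outlier dominates MSE (33.5) but not huber (3.25); with a very small delta, huber divided by delta approaches MAE.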


  • rpi pins max output is 3.3v.
  • how to monitor the rpi temperature?
  • is gpio cleanup necessary?


  • gpio pin layout is actually this way (image: rpi v4 gpio).
  • 5v to 3.3v converter: HW-122 (AMS1117-3.3).

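  • note: the AMS1117 is a linear voltage regulator for powering 3.3 V parts, not a logic level shifter, so it can't translate serial signals. for rpi to arduino serial communication, use a logic level converter or a voltage divider on the arduino TX -> rpi RX line.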
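for rpi to arduino serial, the arduino's 5 V TX has to be brought down to the rpi's 3.3 V RX level. one cheap, common way is a resistor divider; a quick check of some hypothetical resistor pairs, assuming an unloaded divider (the rpi input current is ignored). this only covers the arduino TX -> rpi RX direction.

```python
def divider_output(v_in, r_top, r_bottom):
    """Unloaded voltage divider: v_out = v_in * r_bottom / (r_top + r_bottom)."""
    return v_in * r_bottom / (r_top + r_bottom)

# hypothetical resistor picks for arduino TX (5 V) -> rpi RX (3.3 V logic)
for r_top, r_bottom in [(1_000, 2_000), (1_800, 3_300), (10_000, 20_000)]:
    print(r_top, r_bottom, round(divider_output(5.0, r_top, r_bottom), 2))
```

a 1:2 ratio gives about 3.33 V; the 1.8k / 3.3k pair lands a bit lower (about 3.24 V), leaving some margin below 3.3 V.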


  • ring attention is useful for increasing the context size.
  • miniforge works better on raspberry pi.
  • pinout.xyz for pin layout.


  • UART is a serial communication protocol.
  • Enabling serial on RPi 4:
    • sudo raspi-config
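    • Interfacing Options > Serial > No > Yes (the first prompt asks about a login shell over serial, the second enables the serial port hardware; answer Yes to both if you want a login console over serial)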
    • Reboot
  • GPIO connections:
    • TX of RPi to RX of USB to TTL
    • RX of RPi to TX of USB to TTL
    • GND of RPi to GND of USB to TTL
  • minicom can be used to access the serial console of RPi. (sudo apt install minicom)
  • minicom -b 115200 -o -D /dev/ttyUSB0 to start minicom with baud rate 115200 and device /dev/ttyUSB0
  • disable hardware flow control in minicom using Ctrl+A > O > Serial port setup > F > No


  • the notes belong to different categories. can I use an LLM to classify them without any labels? each bullet point is a note and its category is the label.
  • the categories could be:
    1. Embedded
    2. ML
    3. GPU/Infra
    4. Programming
    5. Latex
    6. Unlabelled


  • to reduce matplotlib xticks:
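import matplotlib.pyplot as plt

num_xticks = 5  # number of x-ticks to show
step = max(1, len(time_steps) // num_xticks)  # avoid a zero step when there are fewer time steps than ticks
plt.xticks(time_steps[::step], rotation=45, fontsize=15)  # time_steps: the plotted x values; show every step-th one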
  • usb-c power delivery (pd) can deliver variable voltage and current using software negotiation.

  • power delivery trigger board can be used to negotiate power delivery and get a fixed voltage and current.

  • \usepackage{graphicx} and \usepackage{subcaption} for subfigures in latex.


  • how to flash a blank stm32f030f4p6 chip?
  • blinking led is the hello world of embedded systems
  • today’s commit deletes the old format files.

  • cuda 11.8 needs nvidia driver >= 520 (the cuda 11.8 runfile bundles 520.61.05).
  • cuda 11.5 needs nvidia driver >= 495.
  • to switch display driver from nvidia to intel, use nvidia-prime:
sudo apt install nvidia-prime
sudo prime-select intel
  • install cuda 11.8:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

and update the paths using:

export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  • if you get this error when building cuda libraries with ninja:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’

then install gcc-10 and g++-10:

sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10

versions on this setup:

Ubuntu 22.04.1 LTS
Cuda compilation tools, release 11.8, V11.8.89
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
g++ (Ubuntu 9.5.0-1ubuntu1~22.04) 11.3.0
  • ~/.bash_aliases (sourced by ~/.bashrc on ubuntu) is a handy file for aliases and environment exports such as export PATH and export LD_LIBRARY_PATH.

  • To install pytorch with cuda support:

conda install pytorch=*=*cuda* cudatoolkit -c pytorch


  • there are few desktop ARM processors (apple silicon macs are the main exception).
  • a usb to ttl converter pl2303hx can be used to access the serial console of a raspberry pi.
  • ssh gives virtual console whereas serial console gives physical console.
  • serial console doesn’t require wifi or hdmi.
  • arm is also risc.


  • embedded languages: c, c++, rust
  • rust can run bare metal on raspberry pi using no_std and no_main crate-level attributes
  • bare metal can be used to run code without an operating system


  • lora radios (e.g. the sx127x chips) are half duplex: they can send or receive, but not both at the same time.
  • analog pins on arduino can be used as digital pins too.
  • arduino D0 and D1, although set aside for RX and TX, can also be used as digital pins (but not while Serial is in use, since they are wired to the usb-serial chip).


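  • the nvidia driver (which also provides the cuda driver api) is separate from the cuda toolkit (nvcc, cuda libraries).
  • the cuda version shown by nvidia-smi is the highest cuda version the driver supports, not the installed toolkit version.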
  • nvcc --version gives the installed cuda version.


  • the neo6m gps module receives signals from gps satellites and outputs the position as NMEA sentences over uart.
  • it has a cold start time of 27 s and a hot start time of 1 s. on my desk, it took 2-5 minutes to get a fix.
  • once it has a fix, the satellite data is kept in battery-backed memory, so the next boot can hot start.
  • the backup battery is a coin cell.
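a minimal sketch of reading the position out of a GGA (fix data) sentence, the kind the neo6m streams over uart. the example line is the widely used textbook GGA example, not output from my module; other sentence types (RMC, GSV, ...) are ignored here.

```python
from functools import reduce

def nmea_checksum_ok(sentence):
    body, _, given = sentence.strip().lstrip("$").partition("*")
    calc = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)  # XOR of all chars between '$' and '*'
    return given != "" and calc == int(given, 16)

def to_degrees(value, hemisphere):
    # NMEA packs latitude as ddmm.mmmm and longitude as dddmm.mmmm
    dot = value.index(".")
    degrees = int(value[:dot - 2])
    minutes = float(value[dot - 2:])
    result = degrees + minutes / 60.0
    return -result if hemisphere in ("S", "W") else result

def parse_gga(sentence):
    """Parse a GGA sentence; returns None while the receiver has no fix yet."""
    if not nmea_checksum_ok(sentence):
        raise ValueError("bad checksum")
    fields = sentence.strip().split("*")[0].split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    if fields[6] in ("", "0"):          # fix quality 0 = no fix
        return None
    return {
        "utc": fields[1],
        "lat": to_degrees(fields[2], fields[3]),
        "lon": to_degrees(fields[4], fields[5]),
        "fix_quality": int(fields[6]),
        "satellites": int(fields[7]),
    }

line = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(parse_gga(line))
```

while the module is still searching (the 2-5 minutes above), the fix quality field is 0 and the position fields are empty, which is why parse_gga returns None in that case.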


  • einsum is cool. It uses the Einstein summation convention to express tensor operations such as matmul, transpose and batched products.
  • torch.einsum('ij,jk->ik', a, b) is equivalent to torch.matmul(a, b)
  • its drawbacks: it can be slower than calling the dedicated op directly (e.g. on gpu), and the expression doesn't allow brackets.
>>> a = torch.rand(3, 5)
>>> a
tensor([[0.7912, 0.6213, 0.6479, 0.2060, 0.9857],
        [0.9950, 0.7826, 0.6850, 0.6712, 0.0524],
        [0.4367, 0.8872, 0.9622, 0.0159, 0.4960]])
>>> b = torch.rand(5, 3)
>>> b
tensor([[0.4560, 0.9680, 0.1179],
        [0.9072, 0.8982, 0.2926],
        [0.5526, 0.2779, 0.5810],
        [0.4366, 0.8061, 0.0065],
        [0.4744, 0.6915, 0.5326]])
>>> torch.einsum('ij,jk -> ik', a,b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
>>> torch.matmul(a, b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
  • stm32f030f4p6 as per the naming convention means:
    • stm32 is the family of microcontrollers
    • f is the product type = general purpose
    • 0 is the series = F0, ARM Cortex-M0 core (not a core count)
    • 30 is the line number
    • f is the pin count = 20
    • 4 is the flash size = 16KB
    • p is the package type = TSSOP
    • 6 is the temperature range = -40 to 85 °C


  • The stm32f030f4p6 chip is SMD and comes in a TSSOP-20 package.
  • I also bought SMD to THT adapters which are called breakout boards and soldered the chip to it.
  • STM32 nucleo boards come with a built-in st-link programmer and debugger.


  1. stm32f030f4p6 soldered onto a breakout board
  2. stm32f030f4p6 with rpi v4 for scale


  • v100s has 5120 cuda cores and 640 tensor cores
  • quadro rtx 5000 has 3072 cuda cores and 384 tensor cores
  • for deep learning, tensor cores matter more than cuda cores because they run the mixed-precision matrix multiplies that dominate training and inference
  • installing miniconda:
# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
  • installing nvidia gpu drivers:
# install nvidia drivers
sudo apt update && sudo apt upgrade
sudo apt autoremove --purge 'nvidia*'  # quoted so the shell doesn't expand the glob
ubuntu-drivers devices
sudo apt install nvidia-driver-525
sudo reboot
# install pytorch with cuda support
pip install torch torchvision torchaudio
  • ICs come in different packages: DIP, SOP, QFP, TQFP


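  • softmax saturates when one logit is much larger than the others: e^(0.5018 - 498.1) underflows to 0 in float32, so the output is [1, 0, 0]. torch's softmax already subtracts the max logit internally, so this is the correct (saturated) result rather than an overflow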
>>> import torch
>>> m = torch.nn.Softmax(dim=1)
>>> a = torch.tensor([[ 0.4981e3,  0.5018, -0.7310]])
>>> m(a)
tensor([[1., 0., 0.]])
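  • L2-normalizing the logits first (torch.nn.functional.normalize) avoids the saturation, but it changes the distribution: softmax(a / ||a||) is softmax with temperature ||a|| (about 498 here), not a more precise version of softmax(a)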
>>> torch.nn.functional.normalize(a)
tensor([[ 1.0000,  0.0010, -0.0015]])
>>> m(torch.nn.functional.normalize(a))
tensor([[0.5762, 0.2122, 0.2117]])
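the numerical problem softmax implementations actually guard against is overflow in exp for large logits. since softmax is unchanged when the same constant is subtracted from every logit, subtracting the max first keeps every exponent <= 0. a minimal pure-python sketch (logit values are made up):

```python
import math

def naive_softmax(logits):
    exps = [math.exp(x) for x in logits]       # math.exp overflows for x > ~709.78
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # largest term is exp(0) = 1, so no overflow
    total = sum(exps)
    return [e / total for e in exps]

logits = [1000.0, 999.0, 0.0]
try:
    naive_softmax(logits)
except OverflowError as err:
    print("naive:", err)                       # math range error
print("stable:", stable_softmax(logits))
```

tiny probabilities (like exp(-1000) here) still underflow to exactly 0, which is the same saturation seen in the torch example, and is harmless.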


  • color sensors (TCS34725, TCS3200) can detect intensity of R,G,B individually
  • because risc-v is an open, royalty-free isa, chips can be cheaper than arm (no license fees), and it runs linux too
  • microcontroller (arduino, stm32) vs single board computer (raspberry pi, beaglebone)
  • many models (especially linear models and neural nets) tend to perform better when the input data is roughly gaussian and standardized


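  • the warmup_step hyperparameter starts training with a small learning rate and ramps it up (usually linearly) to the base learning rate over the first few steps, after which the normal schedule takes over
  • transformer = encoder + decoder, both built from stacked attention and feed-forward layers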
  • K is the context window size in the attention mechanism which is the number of tokens that each token attends to.
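  • attention in transformers has quadratic time complexity in the sequence length (every token attends to every other token)
  • flash attention is still quadratic in time, but it needs only linear memory because it computes attention block by block and never materializes the full n × n score matrix; the speedup comes from fewer reads/writes to gpu memory
  • the Attention Free Transformer (AFT) drops dot-product attention altogether; its AFT-simple variant has linear time complexity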
  • wandb can also be self-hosted inside a docker container
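the trick that lets flash attention avoid storing the full n × n attention matrix can be sketched in numpy: process keys/values in blocks and keep a running max and running softmax denominator per query row ("online softmax"). this is an illustration of the idea under simplifying assumptions (single head, no masking, numpy on cpu), not the actual fused gpu kernel; the compute is still quadratic in n.

```python
import numpy as np

def full_attention(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])             # full (n, n) score matrix
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def blocked_attention(q, k, v, block=16):
    """Same result, but only an (n, block) tile of scores exists at any time."""
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    row_max = np.full((n, 1), -np.inf)   # running max of scores per query row
    row_sum = np.zeros((n, 1))           # running softmax denominator per row
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)                            # (n, block) tile
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        rescale = np.exp(row_max - new_max)                  # fix up earlier partial sums
        p = np.exp(s - new_max)
        row_sum = row_sum * rescale + p.sum(axis=-1, keepdims=True)
        out = out * rescale + p @ vb
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(64, 8)) for _ in range(3))
print(np.allclose(full_attention(q, k, v), blocked_attention(q, k, v)))  # True
```

memory for scores drops from n × n to n × block, which is where the "linear memory" comes from.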


  • cpu architectures: x86, x86_64, arm, arm64, risc-v
  • famous arm microcontroller family: stm32 (e.g. on st's nucleo dev boards)
  • risc-v is open source and is gaining popularity
  • LuckFox Pico Plus is a small dev board with ethernet that can run linux; its Rockchip RV1103 pairs an arm cortex-a7 core with a risc-v mcu core
  • softmax not summing to 1 T_T
  • how to make LoRa full duplex?


  • rl implementations: stable-baselines3
  • cleanrl has single file implementations of rl algorithms
  • tianshou is a pytorch based rl library
  • Through Hole Technology (THT) vs Surface Mount Technology (SMT)
