3D point cloud registration for CT-based surgical navigation using iterative closest point algorithm with sub-millimeter accuracy
This project extends a surgical navigation system by implementing the full Iterative Closest Point (ICP) algorithm for aligning 3D medical scans. While previous implementations used identity transformations, this complete ICP solution iteratively estimates the optimal registration transformation \(F_{reg}\) that aligns pointer tip positions with bone surface meshes from pre-operative CT data.
The system enables precise registration between physical space and medical imaging data, essential for accurate surgical navigation. The ICP algorithm alternates between finding closest point correspondences and computing optimal rigid transformations until convergence, significantly improving registration accuracy compared to single-pass approaches.
The core of the ICP algorithm minimizes the registration error between point sets:
ICP Error Minimization:
\[E(F_{reg}) = \sum_{k=1}^{N}\left\| F_{reg}\cdot d_{k} - c_{k}\right\|^{2}\]
where:
\(d_{k} = F_{B,k}^{-1}\cdot F_{A,k}\cdot A_{tip}\)
\(c_{k} =\) closest point on mesh to \(F_{reg}\cdot d_{k}\)
\(F_{reg} =\) registration transformation from B coordinates to CT coordinates
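As a concrete illustration of the \(d_k\) definition above, the following sketch composes the tracked frames with homogeneous 4x4 matrices to express the pointer tip in bone-body coordinates. The helper names (`make_frame`, `tip_in_bone_frame`) are hypothetical, not from the original codebase:

```python
import numpy as np

def make_frame(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    F = np.eye(4)
    F[:3, :3] = R
    F[:3, 3] = t
    return F

def tip_in_bone_frame(F_A, F_B, A_tip):
    """d_k = F_{B,k}^{-1} * F_{A,k} * A_tip: pointer tip in bone-body coordinates."""
    p = np.append(A_tip, 1.0)            # homogeneous coordinates
    d = np.linalg.inv(F_B) @ F_A @ p
    return d[:3]
```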
The Kabsch algorithm computes the optimal rotation matrix R and translation vector t that minimizes the root-mean-square deviation between two paired sets of points.
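A minimal numpy sketch of the Kabsch solver: SVD of the cross-covariance of the centered point sets, with a determinant check to reject reflections. This is the textbook algorithm, not the project's exact implementation:

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rigid transform (R, t) minimizing sum_i ||R p_i + t - q_i||^2.

    P, Q: (N, 3) arrays of paired points (rows correspond).
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```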
The ICP registration process implements iterative refinement through three main steps: (1) transforming pointer points using the current registration estimate, (2) finding closest mesh correspondences using spatial acceleration structures, and (3) computing optimal transformations via the Kabsch algorithm. Convergence is monitored through error metrics and transformation changes, with early stopping when improvements fall below surgical precision thresholds.
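The three steps above can be sketched as a loop. This simplified version substitutes a KD-tree over sampled surface points for exact closest-point-on-triangle queries against the mesh (an assumption for brevity), and stops when the mean residual stops improving:

```python
import numpy as np
from scipy.spatial import cKDTree

def kabsch(P, Q):
    """Optimal rigid (R, t) minimizing sum ||R p_i + t - q_i||^2."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def icp(d_pts, surface_pts, max_iters=50, tol=1e-6):
    """Estimate F_reg aligning pointer points d_k to sampled surface points."""
    tree = cKDTree(surface_pts)                     # spatial acceleration structure
    R, t = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = d_pts @ R.T + t                     # (1) apply current estimate
        _, idx = tree.query(moved)                  # (2) closest correspondences
        R, t = kabsch(d_pts, surface_pts[idx])      # (3) re-solve the rigid fit
        err = np.linalg.norm(d_pts @ R.T + t - surface_pts[idx], axis=1).mean()
        if prev_err - err < tol:                    # early stop: tiny improvement
            break
        prev_err = err
    return R, t, err
```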
Sub-millimeter accuracy suitable for surgical applications, with consistently low registration error across all test cases

Full ICP implementation with convergence monitoring and adaptive stopping criteria for optimal results
Reusable components from existing systems extended for enhanced functionality and maintainability
Continuous registration updates during surgical procedures with minimal computational overhead
Building and training a character-level GPT model from scratch with self-attention mechanisms and causal masking
Implemented a decoder-only Transformer architecture (TinyGPT) for character-level language modeling, trained on the Tiny Shakespeare dataset. The model features multi-head self-attention with causal masking, feed-forward networks with GELU activations, and positional embeddings.
The implementation includes the complete training pipeline with AdamW optimization and gradient checkpointing for memory efficiency, achieving competitive results for a model of its size.
The decoder-only Transformer architecture processes character sequences through stacked self-attention blocks with causal masking. Each attention head computes weighted combinations of input tokens, while feed-forward networks apply nonlinear transformations. Positional embeddings provide sequence order information, and the final softmax layer generates probability distributions over the vocabulary for next-character prediction.
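The causal masking described above can be shown in a single-head sketch: future positions are set to negative infinity before the softmax, so each token attends only to itself and earlier tokens. This is a plain numpy illustration of the mechanism, not the project's multi-head implementation:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, d_model) sequence of token embeddings.
    """
    T = x.shape[0]
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                           # block attention to future tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # numerically stable softmax
    return w @ V, w
```

Note that the first row of the weight matrix is forced to attend entirely to position 0, and every row above the diagonal is exactly zero.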
Built Transformer architecture from first principles without relying on high-level libraries
Trained on raw character sequences with custom tokenizer implementation
Leveraged Hugging Face transformers for efficient GPT-2 fine-tuning
Gradient checkpointing and mixed precision training for handling large models
SLAM-based navigation with particle filtering and sensor fusion for robust robotic localization
Implemented a complete probabilistic robot navigation system featuring beam range finder models and odometry motion models for particle filtering-based SLAM. The system computes the probability P(z|s,m) of laser measurements given robot state and map, incorporating four error models for robust sensing in dynamic environments.
The beam model handles multiple measurement scenarios including correct readings, unexpected objects, sensor failures, and random noise, providing robust localization even in challenging conditions.
The beam model combines four probability distributions to handle different measurement scenarios:
Total Probability:
\[p = w_{\text{hit}} \cdot p_{\text{hit}} + w_{\text{short}} \cdot p_{\text{short}} + w_{\text{max}} \cdot p_{\text{max}} + w_{\text{rand}} \cdot p_{\text{rand}}\]
Component Distributions:
\[p_{\text{hit}} = \eta \cdot N(r; r_s, \sigma_{\text{hit}}^2)\]
\[p_{\text{short}} = \eta \cdot \lambda_{\text{short}} \cdot \exp(-\lambda_{\text{short}} \cdot r)\]
\[p_{\text{max}} = I(r = z_{\text{max}})\]
\[p_{\text{rand}} = \text{Uniform}(0, z_{\text{max}})\]
The beam range finder model combines four probabilistic components to handle different measurement scenarios: a Gaussian for correct hits, an exponential for unexpected near obstacles, a uniform distribution for random noise, and a point mass (the indicator above) at the maximum range. Each beam's likelihood is computed by ray-casting through the occupancy grid to obtain the expected range and then taking the weighted combination of these models, with log-probabilities summed across beams for numerical stability in particle filtering.
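The mixture above can be sketched directly. For brevity this version omits the \(\eta\) truncation normalizers on the Gaussian and exponential terms, and the mixture weights and noise parameters are illustrative placeholders rather than tuned values:

```python
import numpy as np

def beam_prob(r, r_expected, z_max,
              w=(0.7, 0.1, 0.1, 0.1),
              sigma_hit=0.2, lam_short=0.5):
    """Likelihood of one range reading r given the ray-cast distance r_expected.

    Mixture of: Gaussian around the expected range, exponential for
    unexpected near obstacles, point mass at z_max, and uniform noise.
    """
    w_hit, w_short, w_max, w_rand = w

    # Correct hit: Gaussian centered on the ray-cast range (eta omitted here).
    p_hit = (np.exp(-0.5 * ((r - r_expected) / sigma_hit) ** 2)
             / (sigma_hit * np.sqrt(2 * np.pi)))

    # Unexpected obstacle: exponential, only valid short of the expected range.
    p_short = lam_short * np.exp(-lam_short * r) if r <= r_expected else 0.0

    # Sensor failure: point mass at the maximum range reading.
    p_max = 1.0 if r >= z_max else 0.0

    # Random noise: uniform over the measurement range.
    p_rand = 1.0 / z_max if 0 <= r < z_max else 0.0

    return w_hit * p_hit + w_short * p_short + w_max * p_max + w_rand * p_rand
```

In a particle filter, `np.log(beam_prob(...))` would be summed over all beams of a scan for each particle.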
Combines multiple error models for robust range finder measurements in dynamic environments
Uses particle filtering with odometry and sensor updates for accurate pose estimation
Integrates with occupancy grid maps for accurate localization and navigation
Optimized for real-time operation with efficient data structures and algorithms
Real-time semantic segmentation of surgical tools in endoscopic videos with augmented reality integration
Developed a real-time semantic segmentation system for surgical guidance using U-Net architecture to identify and delineate surgical instruments in endoscopic video streams. The system provides pixel-wise classification to enable augmented reality overlays, surgical guidance visualization, and instrument tracking during minimally invasive procedures.
The implementation achieves real-time performance with high accuracy, making it suitable for integration into surgical navigation systems with minimal latency.
The system implements a U-Net-based architecture with an encoder-decoder structure.
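The encoder-decoder data flow can be traced with a deliberately simplified sketch: pooling and nearest-neighbor upsampling stand in for the learned convolutions and transposed convolutions of a real U-Net, but the halving/doubling of resolution and the channel-wise skip connections are shown faithfully:

```python
import numpy as np

def downsample(x):
    """2x2 average pooling (stands in for conv + pool in a real U-Net)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (stands in for transposed convolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_forward(x, depth=3):
    """Trace U-Net data flow: contract, then expand with skip connections."""
    skips = []
    for _ in range(depth):              # encoder: halve spatial resolution
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):        # decoder: restore resolution
        x = upsample(x)
        x = np.concatenate([x, skip], axis=-1)  # channel-wise skip connection
    return x
```

In the actual segmentation model, each level applies learned convolution blocks and the final layer produces a per-pixel class distribution via softmax.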
Optimized for >30 FPS inference on endoscopic video streams using TensorRT optimization
Simultaneous identification of multiple surgical instrument types with precise boundary detection
Output compatible with surgical AR overlay systems for enhanced visualization and guidance
Maintains accuracy across varying lighting conditions and surgical scenarios