Hey! I'm an experienced Data Engineer with years of expertise in building fault-tolerant systems for high-volume data. My skill set spans ETL pipelines, streaming data, and the fascinating world of distributed computing.
☁️ At home in Azure and AWS, I orchestrate data symphonies with PySpark and Airflow. Fluent in Python, I move data seamlessly through Kafka streams.
📊 I'm not just crunching numbers; I tell stories with SQL, shaping raw data into real-time insights. As a web scraping specialist, I extract gems from the digital landscape.
📈 Bringing data to life with interactive dashboards in Plotly Dash, I turn the web into a captivating stage for insightful performances! 🎭✨
A clickstream ETL (Extract, Transform, Load) pipeline that consumes real-time data from a Kafka and Zookeeper cluster. Python handles initial processing and enrichment before the raw clickstream data is stored in Apache Cassandra. That raw data is then picked up by PySpark on Databricks for the heavier transformations, and the processed results are stored in Elasticsearch for efficient querying and analysis.
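The enrichment step in such a pipeline might look like the sketch below. The event shape (`user_id`, `url`, `ts`) and the derived fields are assumptions for illustration, not the project's actual schema; the Kafka consumer loop and Cassandra write are indicated only in comments, since they require a running cluster.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse

def enrich_event(raw: bytes) -> dict:
    """Parse one raw clickstream message and add derived fields.

    Assumed (hypothetical) event shape:
    {"user_id": ..., "url": "...", "ts": <epoch seconds>}
    """
    event = json.loads(raw)
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    # A date string is a common Cassandra partition-key candidate
    # for time-bucketed clickstream tables.
    event["event_date"] = ts.date().isoformat()
    event["domain"] = urlparse(event["url"]).netloc
    return event

# In the real pipeline, a consumer loop would wrap this function, e.g.:
#   for msg in kafka_consumer:             # kafka-python / confluent-kafka
#       session.execute(insert_stmt, enrich_event(msg.value))  # Cassandra driver
```

Keeping the enrichment pure (bytes in, dict out) makes it trivial to unit-test independently of Kafka or Cassandra.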
A distributed system that detects and analyzes the technologies used across 200M+ domains. It uses Playwright to fetch HTML through proxies, then applies predefined patterns and regex matching to identify web technologies. Built with Python and RabbitMQ for distributed processing across multiple servers.
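The pattern-matching core of a detector like this can be sketched as follows. The fingerprints here are toy examples I made up for illustration; a real rule set (Wappalyzer-style) covers headers, cookies, meta tags, and script URLs, not just body text.

```python
import re

# Hypothetical fingerprints: technology name -> compiled pattern
# matched against the fetched HTML.
TECH_PATTERNS = {
    "WordPress": re.compile(r"wp-content|wp-includes", re.I),
    "React": re.compile(r"data-reactroot|__NEXT_DATA__", re.I),
    "jQuery": re.compile(r"jquery[.\-]", re.I),
}

def detect_technologies(html: str) -> set:
    """Return the names of all technologies whose pattern matches the HTML."""
    return {name for name, pattern in TECH_PATTERNS.items()
            if pattern.search(html)}
```

In the distributed setup described above, each worker would pull a domain from a RabbitMQ queue, fetch its HTML with Playwright through a proxy, and run a function like this over the result before publishing the findings.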