About
I enjoy exploring data and extracting insights that inform business decisions and drive operational improvements. With a strong background in data analysis and visualization, I am well-equipped to identify trends, patterns, and opportunities within large and complex datasets.
Let's Collaborate
- Phone: +1 (585) 831 8026
- City: New York, USA
- Education: Master's in Data Science
- Email: yuthikashekhar@gmail.com
With experience in the life sciences and telecommunications industries, I am proficient in programming languages such as Python and R, and skilled in using SQL and data visualization tools such as Tableau, Power BI, and Excel to uncover insights and drive business growth. My foundation in statistics and machine learning also lets me develop predictive models that optimize outcomes and improve efficiency.
I have been exploring other data domains to generate insights. I would be more than happy to connect and learn how I can help you make the most of your data.
Skills
Tools
- Python
- R
- Scikit-Learn
- SciPy
- TensorFlow
- PyTorch
- Keras
- Git
- OpenCV
Statistical Methods
- Hypothesis Testing
- Feature Selection
- Data Transformation
- Bayesian Statistics
- Causal Inference
- A/B Testing
Machine Learning Models
- Regression Models
- Classification Models
- Clustering Analysis
- Time Series Forecasting
- Recommendation Models
- Sentiment Analysis
- Decision Trees
- Random Forest
- Support Vector Machines
Database
- MySQL
- SQL
- Microsoft SQL Server
Visualizations
- Power BI
- Tableau
- Seaborn
- Plotly
- Microsoft Excel
- Matplotlib
Cloud Services
- Kubernetes
- Microsoft Azure
- Snowflake
Work Experience
Vertex Pharmaceuticals May 2022 - August 2022
Data Analyst Intern
- Collaborated with Senior Enterprise Architects, Solution Architects, and Product Managers to understand workflows and establish standards and rationalization plans, improving visibility into strategic outcomes and driving product impact by around 70%.
- Collected data from multiple sources, then aggregated, filtered, and grouped it using SQL and translated it into tangible inputs.
- Created monitoring dashboards in Power BI that outline the future technology roadmap and present actionable insights to the business, using the Enterprise Architect data pipeline from Enterprise Studio.
- Designed the base data model, established relationships, and set up automatic scheduling, making information easier to use and reducing manual workload by 80%.
- Crafted self-service data quality reports for Enterprise Architects and Portfolio Managers to track data discrepancies and ensure the consistency and integrity of source data, with a 75% performance boost against the business goal.
- Used SQL together with Power BI to highlight trends and design future-state blueprints for business stakeholders, improving business efficiency.
Reliance Jio September 2017 - July 2021
Data Scientist
- Analyzed data on ~1.8M subscribers using machine learning techniques such as logistic regression and decision trees for churn prediction, enabling a 10% lift in monthly retention.
- Identified trends and subscriber behavior that indicate churn risk using Tableau, and presented them in a clear, visually appealing dashboard.
- Supported A/B testing and experimental design to evaluate retention strategies for nearly 84% of subscribers, and presented insights to inform business decisions.
- Performed segmentation and developed targeted strategies to identify and prioritize at-risk subscribers, and presented findings and recommendations to stakeholders and leadership.
- Wrote and optimized complex SQL queries to extract and analyze access data for 65K employees, merging multiple sources such as employee access records, access logs, and permissions data to investigate potential security breaches or unauthorized access.
- Designed an ETL framework using Azure for employee access data, driving the end-to-end process with unit testing and enhancing the performance of the existing pipeline by 70%.
Projects
This section presents projects I completed as part of my Master's coursework, as well as projects I pursued out of personal interest using publicly available datasets.
Contextual Image Based Recommendation Model For Personalized Fashion
A recommendation framework built using a deep CNN model that creates rich feature vectors for apparel and uses customer transactions to study shopping patterns and create customer feature vectors. Euclidean distances are calculated to find the closest product matches for each customer. The framework recommends products that customers buy in the next three months with a recall score of 67%.
Tools: Python, PyTorch, OpenCV, Scikit-Learn
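The distance-based matching step described above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the two-dimensional vectors, and the product IDs are hypothetical placeholders, and it assumes customer and product vectors live in the same feature space.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_products(customer_vec, product_vecs, k=2):
    """Return the k product IDs whose vectors lie nearest to the customer vector."""
    ranked = sorted(product_vecs, key=lambda pid: euclidean(customer_vec, product_vecs[pid]))
    return ranked[:k]

# Hypothetical toy vectors; real ones would come from the CNN feature extractor.
products = {
    "shirt": [1.0, 0.0],
    "jeans": [0.0, 1.0],
    "jacket": [0.9, 0.2],
}
customer = [1.0, 0.1]

print(closest_products(customer, products, k=2))  # ['shirt', 'jacket']
```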
Identifying Causative Agents in Pneumonia Diagnoses Using Deep CNN
While most existing research focuses on predicting the presence of the disease, this model detects the type of pneumonia by examining chest X-ray images of infants. In addition, the work studies the visual patterns in chest radiographs and how each type of pneumonia deviates from normalcy. The proposed model identifies the causative agent with a sensitivity of 87% and a specificity of 93.25%.
Tools: Python, Keras, TensorFlow, Scikit-Learn
Estimating Childhood Obesity Rates in New York's Counties Using Socio Economic Factors
Obesity continues to rise and is a cause of concern in many communities. We devised a novel method to estimate the obesity rate for every county and population group in New York using social, family, and economic contexts. Linear and tree-based models were compared, and the XGBoost model gave the best (lowest) MAPE score of 18.99%.
Tools: Python, Tableau, Scikit-Learn
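For reference, MAPE (mean absolute percentage error) averages the absolute error of each prediction relative to the true value. The sketch below shows the calculation only; the obesity rates in it are made-up illustrative numbers, not project data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical county obesity rates (percent) and model estimates.
actual_rates = [20.0, 25.0]
predicted_rates = [18.0, 30.0]

# Relative errors are 10% and 20%, so the result is approximately 15.
print(round(mape(actual_rates, predicted_rates), 2))
```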
Social Media Analysis of US Presidential Impact on Afghanistan
There had been no research on how people are reacting to the US presidential impact on Afghanistan. This model was designed using the VADER sentiment classifier and LDA. Through a social media study of tweets, the proposed approach proved valuable for understanding people's concerns and joys. The analysis employed NLP and sentiment analysis tools.
Tools: Python, Scikit-Learn, NLTK, Gensim, Seaborn, Plotly
Hand Gesture Recognition using Deep Learning and Neural Network
Hand gesture recognition has a wide range of uses, including enhancing control, accessibility, communication, and learning. I experimented with several convolutional neural network architectures, including my own custom model, using Conv2D and Conv3D layers to identify hand gestures. The final Conv3D model achieved an accuracy of 94.57% and a validation accuracy of 91.00%.
Tools: Python, Scikit-Learn, TensorFlow
Predicting Potential Churn Customers
Customers in the telecom industry have access to a wide range of service providers and can actively switch between operators. Because acquiring new customers is 5–10 times more expensive than retaining existing ones, customer retention has become more important than customer acquisition. To lower customer turnover, telecom companies must identify the customers who are most likely to leave. I built an XGBoost model that achieved an 11% false positive rate (FPR) and 80.4% recall.
Tools: Python, Scikit-Learn, Seaborn
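Recall and FPR both come from the confusion matrix: recall is the share of actual churners the model catches, and FPR is the share of non-churners it wrongly flags. The following is a minimal sketch with hypothetical labels (1 = churned, 0 = stayed), not the project's evaluation code.

```python
def recall_and_fpr(y_true, y_pred):
    """Compute (recall, false positive rate) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # loyal customers wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # loyal customers correctly passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr

# Hypothetical labels for eight customers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(recall_and_fpr(y_true, y_pred))  # (0.75, 0.25)
```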
Education
University of Rochester
During my time at the University, I was awarded 3rd rank for proposing a framework for identifying whole food allergens using machine learning, which received attention from the FDA. I also secured 2nd position for developing an interactive analytics dashboard in Tableau to improve restaurant performance, at an event organised by RMDS Labs.
GPA: 3.7
Relevant Coursework:
- Time Series Analysis
- Data Mining
- Computer Vision
- Computational Methods in Cognitive Science
- Applied Statistical Methods
- Statistical Machine Learning
- Deep Learning
International Institute of Information Technology
Worked on projects across several data science domains, including healthcare, finance, and telecommunications.
GPA: 3.4
Relevant Coursework:
- Statistics and Exploratory Data Analysis
- Machine Learning
- Big Data and SQL
- Deep Learning
SRM University
During my coursework at the University, I was a department scholar and secured 6th Rank in the Information Technology Department.
GPA: 3.9
Relevant Coursework:
- Data Structures and Algorithms
- Database Management Systems
- Python Programming
- Data Science and Big Data Analytics
- Programming in Java
- Probability and Statistics
Patents
An Access Control System and Method Thereof (In Process)
Designed and developed a cloud-native, intelligent, edge-based access control system (ACS). This edge computing-based ACS is being deployed across various office locations in India. Functional and non-functional aspects of the product are being systematically verified and validated through hundreds of deployments. The system is fully compliant with the principles of cloud-native computing (microservices, containers, and Kubernetes as the container lifecycle management platform). Artificial intelligence (AI) capabilities are being added to the ACS controllers, which are highly miniaturized and multifaceted edge devices. The system is also event-driven: a variety of events (location-centric, people-inspired, unplanned, etc.) can be captured and acted upon promptly.
Edge AI concepts are being incorporated into the edge controller. In terms of integration and orchestration requirements, this ACS is highly adaptive. In addition, edge device data is captured and analyzed in a variety of ways to produce timely, actionable insights.
The rapid maturation and stability of edge cloud and analytics technologies have made it possible to envision and build a new digital application that can deliver much-demanded digital transformation. In particular, this ACS aims to ensure the security of people and property through a smart, swift, and simplified user authentication and authorization mechanism.
Patent Details:
- Patent Application Number: 202121027765
- Filing Institute: Jio Platforms Limited
- Filing Year: 2021
Intelligent IoT Based Automated Irrigation System
Agriculture has a major impact on the economy of the country. A lot of research has been carried out on automating irrigation systems using wireless sensors and mobile computing, and on applying machine learning to agricultural systems.
Machine-to-machine (M2M) communication is an emerging technology that allows devices and objects to communicate with each other and send data to a server or the cloud through the core network. Accordingly, we developed an intelligent IoT-based automated irrigation system in which sensor data on soil moisture and temperature is captured and a KNN (K-Nearest Neighbor) classification algorithm analyzes that data to predict when the soil should be irrigated.
This is a fully automated system in which devices communicate among themselves and apply intelligence to irrigation. It was developed using low-cost embedded devices such as the Arduino Uno and Raspberry Pi 3.
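The KNN decision step can be sketched as follows. This is an illustrative example only: the sensor readings, labels, and function name are hypothetical, and a real deployment would likely scale the features so that moisture and temperature contribute comparably to the distance.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify a (soil_moisture, temperature) reading by majority vote of its k nearest neighbours."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled readings: (soil moisture %, temperature in Celsius) -> action.
training_data = [
    ((15, 35), "irrigate"),
    ((20, 32), "irrigate"),
    ((18, 30), "irrigate"),
    ((60, 22), "skip"),
    ((55, 25), "skip"),
    ((65, 20), "skip"),
]

print(knn_predict(training_data, (22, 31)))  # irrigate
print(knn_predict(training_data, (58, 23)))  # skip
```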
Patent Details:
- Patent Application No: 201841008102
- Patent Number: 380185
- Filing Institute: SRM University
- Filing Year: 2018
- Report
Contact
Feel free to reach out to me using the details below.
Call:
+1 585 831 8026
Location:
New York, USA