Data Science

53 articles in this category (Page 1 of 3)

AI News, Machine Learning, Data Science

Advanced SHAP Workflows for Machine Learning Explainability: A Comprehensive Coding Guide

Implement SHAP workflows that compare explainers and detect data drift, and see TreeExplainer's speed advantage when interpreting complex machine learning models.

AI News, Artificial Intelligence, Data Science

Benchmarking 12 AI Models for Business Chart Generation: Llama vs. Qwen vs. Gemma

Llama 3.1 8B leads in accuracy with 28/32 successful chart generations, while Qwen 2.5 7B dominates multilingual performance in a 12-model benchmark.

AI News, Data Science, Tutorials

Portfolio Optimization with skfolio: A Scikit-Learn Compatible Approach to Modern Investment Strategies

Optimize investment portfolios using skfolio, a scikit-learn compatible library for building, testing, and tuning strategies. This technical guide demonstrates how to implement mean-variance, risk-parity, and hierarchical clustering methods while utilizing robust covariance estimators and Black-Litterman views to achieve higher Sharpe ratios through systematic hyperparameter tuning.

AI News, Data Science, Tutorials

Building Advanced Technical Analysis and Backtesting Workflows with pandas-ta-classic

Learn to implement a complete trading workflow using pandas-ta-classic, including RSI-based signals and Sharpe ratio performance metrics.

AI News, Data Science, Software Engineering

Building a Single-Cell RNA-seq Analysis Pipeline with Scanpy: From PBMC Clustering to Trajectory Discovery

Learn to build a complete single-cell RNA-seq pipeline using Scanpy for PBMC analysis, covering quality control, doublet detection with Scrublet, and lineage trajectory discovery on benchmark datasets.

AI News, Data Science, Technology

Why Gradient Descent Zigzags and How Momentum Fixes It

Momentum speeds up gradient descent on anisotropic surfaces, cutting convergence from 185 to 159 steps by damping oscillations and accelerating movement along the flat axis.

AI News, Data Science, Software Engineering

Predicting Startup Funding through GitHub Engineering Velocity

Tracking 4,200 GitHub organizations over six months revealed that commit velocity changes predict fundraising rounds with 70% accuracy within six weeks.

AI News, Data Science, Machine Learning

Correcting Survey Bias with Meta's balance Library: A Technical Guide

Learn to eliminate sampling bias using Meta’s balance library, featuring IPW and CBPS methods to restore survey accuracy.

AI News, Artificial Intelligence, Data Science

Overcoming the LoRA Scaling Collapse in High-Rank Knowledge Tuning

Standard LoRA fails on factual data as rank-8 updates capture only 28% of signal; RS-LoRA's sqrt(r) scaling restores stability for high-rank knowledge.
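
The scaling difference named in this summary can be sketched in a few lines. This assumes the usual formulation, in which standard LoRA multiplies the low-rank update by alpha / r and rank-stabilized LoRA (RS-LoRA) uses alpha / sqrt(r). The function names and the alpha value are illustrative and do not come from the article.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r (shrinks quickly as rank grows)."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized scaling factor: alpha / sqrt(r) (shrinks more slowly)."""
    return alpha / math.sqrt(rank)


if __name__ == "__main__":
    alpha = 16.0  # illustrative value, not taken from the article
    for rank in (8, 64, 256):
        print(rank, lora_scale(alpha, rank), rslora_scale(alpha, rank))
```

At higher ranks the alpha / r factor shrinks the update much more than alpha / sqrt(r) does, which is the effect the rank-stabilized variant is meant to counter.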

AI News, Data Science, Big Data

Rendering Massive Datasets with Datashader: A High-Performance Python Tutorial

Learn how to render 20 million points in under 1,000 ms using Datashader's aggregation pipeline to bypass traditional plotting tool limitations for big data visualization.

AI News, Database Engineering, Data Science

Advanced SQL Techniques: Mastering Window Functions and Common Table Expressions

Learn how to perform complex row-level calculations and improve query readability using SQL window functions and CTEs for data analytics.

AI News, Machine Learning, Data Science

TabPFN vs. CatBoost: Achieving Superior Tabular Accuracy with In-Context Learning

TabPFN achieves 98.8% accuracy on tabular datasets using in-context learning, outperforming CatBoost and Random Forest with near-zero training time.

AI News, Mathematics, Data Science

Exploring OEIS A359012: The Permutation Substring Sequence

John Samuel introduces OEIS A359012, identifying 712 terms below 10^6 where concatenated digits (x,y) appear within the permutation xPy.

AI News, Data Science, SQL

Mastering SQL: A Deep Dive into Joins and Window Functions

A technical guide to six SQL join types and essential window functions such as DENSE_RANK and ROW_NUMBER for advanced data analytics and relational database management.

AI News, Artificial Intelligence, Data Science

Beyond Accuracy: Quantifying Production Fragility in Regression Models

Redundant features in regression models increase coefficient instability by 2.6x and create silent failure points through feature drift.

AI News, Data Science, Artificial Intelligence

Build an End-to-End Single Cell RNA Sequencing Pipeline with Scanpy

Learn to build a complete scRNA-seq pipeline using Scanpy to process the PBMC 3k dataset, featuring quality control, Leiden clustering, and rule-based cell type annotation.

AI News, Data Science, Machine Learning

Advanced Progress Monitoring in Python: A Guide to tqdm for Async, Parallel, and Data Workflows

Learn to implement advanced tqdm progress tracking for Python workflows including asynchronous tasks, parallel processing, and streaming I/O operations.

AI News, Data Science, Technology

Production-Grade Graph Analytics with NetworKit 11.2.1: A Tutorial for Large-Scale Networks

Learn to implement a production-grade graph analytics pipeline using NetworKit 11.2.1, processing up to 250,000 nodes with optimized community detection, core decomposition, and local similarity sparsification.

AI News, Machine Learning, Data Science

Reverse Engineering Amazon's Dynamic Pricing: Achieving 83% Prediction Accuracy

Avluz.com achieved 83% accuracy predicting Amazon price drops by processing 600,000 daily price points using MongoDB Time-Series and Random Forest ensembles.

AI News, Data Science, API Development

Automating Governance Sentiment Analysis with the Pulsebit API and Python

Leverage the Pulsebit API to track governance sentiment shifts with high-confidence metrics like momentum and clustering via a single endpoint.

AI News, Data Science, Python

Jupyter Notebooks Revolutionize Data Science Workflow

Jupyter Notebooks transform Python into a narrative

AI News, Java, Data Science

DataFrames in Java: A Powerful Tool for Data-Oriented Programming

Vladimir Zakharov explains how DataFrames serve as a vital tool for data-oriented programming in the Java ecosystem, outperforming Python in memory management while maintaining code readability.

AI News, Data Science, Algorithms

Counting a Billion Unique Items with Almost No Memory

A new algorithm, CVM, can estimate the number of unique elements in a stream with 98% accuracy using only a few kilobytes of memory.
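
A minimal sketch of the idea, assuming CVM here refers to the Chakraborty-Vinodchandran-Meel distinct-elements estimator. The function name, parameters, and the repeated-halving loop are illustrative choices, not the article's code. The sketch keeps a bounded random sample of the items seen so far and scales the sample size by the current sampling probability.

```python
import random


def cvm_estimate(stream, buffer_size, seed=None):
    """Estimate the number of distinct items in `stream` with a bounded buffer.

    Hypothetical helper for illustration: each item is kept with probability p,
    and whenever the buffer fills, about half of it is evicted and p is halved.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    p = 1.0
    buf = set()
    for item in stream:
        # Forget any earlier decision about this item, then re-sample it.
        buf.discard(item)
        if rng.random() < p:
            buf.add(item)
        # Buffer full: keep each element with probability 1/2 and halve p.
        # (Repeating until there is room is a simplification of the original,
        # which reports failure if a single halving does not free space.)
        while len(buf) >= buffer_size:
            buf = {x for x in buf if rng.random() < 0.5}
            p /= 2.0
    return len(buf) / p


if __name__ == "__main__":
    data = [i % 5000 for i in range(100_000)]  # 5,000 distinct values
    print(cvm_estimate(data, buffer_size=512, seed=42))
```

When the buffer is larger than the number of distinct items, p never drops below 1 and the estimate is exact. Smaller buffers trade accuracy for memory.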

AI News, Data Science, Machine Learning

Inside OpenAI’s in-house data agent

OpenAI built an in-house AI data agent leveraging GPT-5, Codex, and memory to reduce data analysis time from days to minutes for its 3.5k+ internal users.
