Governance and Pipeline Sprawl: The Reality of Enterprise AI Strategies

These articles are AI-generated summaries. Please check the original sources for full details.

The messy truth of your AI strategies

Hema Raghavan, co-founder of Kumo.ai, addresses the operational risks of shadow AI and pipeline sprawl within the enterprise. At LinkedIn, tracing a single broken upstream pipeline across hundreds of dependencies often required opening a dedicated war room.

Why This Matters

In practice, enterprise AI means pipeline sprawl: dozens of models depend on hundreds of interconnected ETL processes. When an upstream tracking event breaks, the failure propagates through that lineage, and data science teams struggle to trace which models are affected and why. This complexity motivates a shift toward foundation models that query relational databases on the fly, which reduces the maintenance burden and technical debt of manual feature engineering.
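To make the lineage problem concrete, here is a minimal sketch of finding everything downstream of a broken source with a breadth-first walk over a dependency graph. The pipeline and model names in `EDGES` are hypothetical and chosen only for illustration; real lineage would come from an orchestrator or data catalog, not a hand-written list.

```python
from collections import defaultdict, deque


def downstream_of(edges, broken):
    """Return every node reachable from `broken`, in breadth-first order."""
    # Build an adjacency list: upstream node -> list of direct consumers.
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)

    seen = {broken}
    order = []
    queue = deque([broken])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: (upstream, downstream) pairs.
EDGES = [
    ("tracking_event", "sessions_etl"),
    ("tracking_event", "clicks_etl"),
    ("sessions_etl", "engagement_features"),
    ("clicks_etl", "engagement_features"),
    ("engagement_features", "feed_ranking_model"),
    ("profiles_etl", "search_model"),
]

affected = downstream_of(EDGES, "tracking_event")
print(affected)
```

With six edges the answer is easy to read by eye. With hundreds of pipelines, the same walk is only as good as the recorded edges, which is where the debugging cost described above comes from.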

Key Insights

  • Fact: LinkedIn’s AI infrastructure ran dozens of models on hundreds of pipelines, which made upstream failures hard to trace through complex lineages.
  • Concept: ‘In-context learning’ for relational data allows querying databases on-the-fly, eliminating the need for static feature engineering pipelines.
  • Tool: Kumo.ai uses Snowflake Snowpark Container Services to deploy models inside the customer’s data perimeter, preventing data egress.
  • Fact: CISOs are increasingly concerned with ‘Shadow AI,’ where sensitive CRM or PII data is sent to unapproved LLM providers via prompts.
  • Concept: ‘Governance by architecture’ employs API gateways to monitor and intercept company-sensitive data before it leaves the internal network.
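The "governance by architecture" idea above can be sketched as a small filter that a gateway might run on each outbound prompt. This is an illustrative assumption, not a description of any specific gateway product: the `PATTERNS` table and the `redact` function are hypothetical, and the two regular expressions catch only simple email and US SSN-style strings. A production gateway would need far broader detection.

```python
import re

# Hypothetical detection rules; real deployments need much wider coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt):
    """Replace matches with a label and report which rules fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        if pattern.search(prompt):
            findings.append(label)
            prompt = pattern.sub(f"[{label}]", prompt)
    return prompt, findings


clean, findings = redact(
    "Summarize the account for jane.doe@example.com, SSN 123-45-6789."
)
print(clean)
print(findings)
```

Returning the list of fired rules alongside the redacted text lets the gateway log or block the request as well as rewrite it, which is the monitoring half of the concept.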

Practical Applications

  • Use Case: FinTech and healthcare organizations controlling sensitive data access by deploying AI within a VPC to maintain strict telemetry and security.
  • Pitfall: ‘Vibe coding’ with multiple specialized databases without a unified warehouse layer leads to out-of-sync embedding vectors and maintenance failures.
  • Use Case: Engineering teams using agent-stored Markdown files in repositories to ensure AI coding assistants adhere to specific design patterns.
  • Pitfall: Hiring based on whiteboard algorithms instead of evaluating an engineer’s ability to reason about agent-generated design choices and test cases.
