Softdocs Blog

Redefining The Axis: Lessons Learned Measuring AI Adoption in the Softdocs Engineering Department

AI redefines many tactics within engineering and the speed at which they can be accomplished, but fundamental strategies still apply.

One of the biggest challenges for software engineering leadership over the past two years, in the quest to build more, faster, and cheaper, has been capturing how much artificial intelligence can actually influence that goal. The industry has made many claims: entire legacy codebases replaced in a matter of weeks, full products launched by a single engineer assisted by an AI agent.

To a data-driven, results-oriented leader of a modestly sized engineering organization, AI's promise, and the technology enthusiasts making it, felt like unfair posturing from big tech that had infiltrated the minds of business investors with unfounded claims of amazing returns. Every CEO was rolling out a sparkle icon on their applications while announcing cuts to traditional engineers and offshore development centers in favor of data scientists and expanded use of LLMs.

The efforts felt like a pageant, the claims of productivity were outlandish, and there was no real data to back them up.

The solution seemed simple: obtain real data. AI begins to deliver on its promise, regardless of the scale of the engineering organization, the moment it is measured. Once measurement begins, all the axes on the traditional Agile charts shift, as do the mindsets of those reading them, but the strategy does not. The process was a bit more complex and nuanced, but the results are both tangible and repeatable.

Part 1: The Measurement Hypothesis

Measurement for our company, Softdocs, starts with a single unit: lines of code.

This is not a perfect system. The pros and cons have been debated for decades. Some common challenges are:

  • Changes may be smaller than a full line.
  • They may be autocompleted by the IDE.
  • They can ignore the impact of refactoring.
  • They can cause asterisks in the week’s metrics. 

We accept that the measurement will not be perfect, but believe it is dramatically better than abstracting the measure to something less quantitative or not measuring at all. And, over time, trends will emerge.

The benefit of lines of code as the base unit is that most code assistants expose this usage information through their APIs, making it relatively easy to harvest.

There are several Key Performance Indicators you can track once you have a handle on lines of code. For transparency, Softdocs uses Visual Studio with GitHub Copilot on a (mostly) Angular and C# codebase.

Extracting these metrics requires Copilot’s API, Azure DevOps’ API, a small API for consumption, and a database for storage. We aggregate the metrics weekly.
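For context on the plumbing, here is a minimal C# sketch of the harvesting step. The Copilot endpoint path, JSON field names, and organization name are assumptions for illustration; the real schema depends on your Copilot plan and API version, and the full pipeline layers Azure DevOps repository counts and database storage on top of it.

```csharp
// Minimal sketch: pull a period of Copilot usage for an organization and
// aggregate suggested vs. accepted line counts. The endpoint path, JSON
// field names, and the "softdocs" org are illustrative assumptions.
using System.Net.Http.Headers;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("https://api.github.com/") };
http.DefaultRequestHeaders.UserAgent.ParseAdd("ai-metrics-harvester");
http.DefaultRequestHeaders.Accept.ParseAdd("application/vnd.github+json");
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

// Assumed endpoint shape; returns one JSON object per day of usage data.
var json = await http.GetStringAsync("orgs/softdocs/copilot/usage");
using var doc = JsonDocument.Parse(json);

long suggested = 0, accepted = 0;
foreach (var day in doc.RootElement.EnumerateArray())
{
    // Field names assumed for illustration; adjust to the actual payload schema.
    suggested += day.GetProperty("total_lines_suggested").GetInt64();
    accepted  += day.GetProperty("total_lines_accepted").GetInt64();
}

Console.WriteLine($"Period total - suggested: {suggested}, accepted: {accepted}");
// In a pipeline like ours, these totals land in a database table keyed by week,
// alongside repository line counts pulled from the Azure DevOps API.
```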

Key Metrics Tracked by Our Team

Lines of AI-generated code suggested

Lines of code suggested by AI captured between Sunday and the following Saturday across all engineers.

Lines of AI-generated code accepted by engineers

Lines of AI-generated code accepted by engineers, captured between Sunday and the following Saturday across all engineers. Copilot aggregates acceptances across the several ways Visual Studio lets engineers consume suggestions (copying or previewing code from the chat panel, inline autocompletion, and the “Ask Copilot” shortcut menu).

Conversion Rate

Conversion Rate = Lines of AI-generated code accepted by engineers ÷ Lines of AI-generated code suggested

Discarded Rate

Discarded Rate = (Lines of AI-generated code suggested – Lines of AI-generated code accepted by engineers) ÷ Lines of AI-generated code suggested

Adoption Rate (% of all new code written that was generated by AI)

Adoption Rate = Lines of AI-generated code accepted by engineers ÷ Total lines added to all active code repositories
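To make the rate definitions concrete, here is a minimal C# sketch with fabricated weekly counts (chosen to echo the May figures discussed later); it is illustrative, not our reporting code.

```csharp
// Illustrative only: compute the rate metrics from one week of raw counts.
// Example week (fabricated): 10,000 lines suggested, 1,300 accepted,
// 3,000 total lines added across all active repositories.
long linesSuggested = 10_000;
long linesAccepted = 1_300;
long linesAddedToRepos = 3_000;

double conversionRate = (double)linesAccepted / linesSuggested;                    // accepted ÷ suggested
double discardedRate  = (double)(linesSuggested - linesAccepted) / linesSuggested; // (suggested – accepted) ÷ suggested
double adoptionRate   = (double)linesAccepted / linesAddedToRepos;                 // accepted ÷ total lines added

Console.WriteLine($"Conversion {conversionRate:P1}, Discarded {discardedRate:P1}, Adoption {adoptionRate:P1}");
// => Conversion 13.0%, Discarded 87.0%, Adoption 43.3%
```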

Time spent per line of code per week

Time spent per line of code per week = Total minutes in the work week ÷ Lines of code added to the repository

Time saved writing lines of code per month

Time saved writing lines of code per month = | Baseline time spent per line of code – Time spent per line of code each week | × Lines of code suggestions accepted by engineers

% of codebase generated by AI this year

% of codebase generated by AI this year = Sum of lines of code suggestions accepted by engineers each month ÷ Total lines of code across all active code repositories

Workdays saved using AI this year

This assumes 80% of an engineer’s day is focused on coding, with 20% overhead.

Workdays saved using AI this year = Sum of time saved writing lines of code per month ÷ (Minutes per workday × Focus time per day)
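The time-based metrics chain together, so a short sketch helps. The one below uses fabricated weekly numbers, an assumed six-minute baseline, and the 80% focus-time figure stated above; it also applies the monthly savings formula week by week, which is one reading of the definition rather than the only one.

```csharp
// Illustrative only: time-based metrics with fabricated inputs.
double minutesPerWorkWeek = 5 * 8 * 60;  // 2,400 minutes per engineer-week
double minutesPerWorkday  = 8 * 60;      // 480 minutes
double focusShare = 0.80;                // 80% of the day spent coding (stated assumption)

// Assumed baseline, measured before the AI rollout: 6 minutes per line of code.
double baselineMinutesPerLine = 6.0;

// One month of weekly measurements (fabricated): lines added and AI lines accepted.
var weeks = new[]
{
    (LinesAdded: 520, LinesAccepted: 180),
    (LinesAdded: 610, LinesAccepted: 220),
    (LinesAdded: 570, LinesAccepted: 250),
    (LinesAdded: 640, LinesAccepted: 300),
};

double timeSavedThisMonth = 0;
foreach (var w in weeks)
{
    // Time spent per line of code per week = minutes in the work week ÷ lines added.
    double minutesPerLine = minutesPerWorkWeek / w.LinesAdded;

    // Time saved = |baseline – current per-line time| × lines of accepted AI code.
    timeSavedThisMonth += Math.Abs(baselineMinutesPerLine - minutesPerLine) * w.LinesAccepted;
}

// Workdays saved = time saved ÷ (minutes per workday × focus share).
double workdaysSaved = timeSavedThisMonth / (minutesPerWorkday * focusShare);
Console.WriteLine($"Time saved this month: {timeSavedThisMonth:F0} minutes (~{workdaysSaved:F1} workdays)");
```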

There are other, more familiar Agile metrics that should correlate directly with these, such as increased velocity, as well as the real measure of success, the only one customers really care about: more high-quality product released per sprint.

6 Months of Measurement Progress

With the unit of measurement and associated metrics determined, initial goals were established for the year in partnership with our leadership and our board:

  • 33% increase in velocity across the entire department
  • 10% of our active codebase generated by AI
  • 30 workdays saved using AI this year

What has emerged from the accounting so far is that two of these goals were too easy and one is proving challenging. We are still gleaning information from the data being collected, but there is no denying that positive change is occurring with the introduction and deliberate measurement of AI. Figure 1 below summarizes the data collected by month.

Figure 1: Six months of quantitative data for AI code generation in Softdocs Engineering

Velocity did increase, or more accurately, stabilized and then climbed. Over the course of five months, the number of points teams completed per month increased. Baseline team velocity was 107 points per team per month, only one point below last year's average, and it has climbed steadily since January, with May our highest total to date with our current teams at 182 points per team per month.

However, velocity is a fickle metric and can be lumpy depending on project phase and complexity. In 2024, the monthly team average ranged from a low of 68 points to a high of 156. That matters when attempting to interpret the impact of AI in 2025. It would be easy to attribute the steady velocity improvement, a 70% increase, entirely to AI and its direct relationship with workdays saved.

Engineers are spending less time per line of code, allowing them to pull in more issues from the backlog. The results are promising, but we need to ensure we isolate as many variables as possible – team size, complexity, measurement methods – before attributing all of the velocity improvements to AI. AI is more likely a significant contributing factor to the department’s success rather than the only one.

At this rate, we won’t reach the goal of 10% of the codebase generated by AI. The amount of AI-generated code in our codebases is growing at a much slower rate. Some contributing factors here are the imperfection of the unit of measurement, the speed of adoption, the sheer size of the existing enterprise codebase, and the nature of the work within those large, legacy codebases.

In April, a large refactoring effort to consolidate frontend components and remove boilerplate code reduced the overall repository size by almost 200,000 lines, which skewed the unit of measurement for the month. Suggestions remained high in April, but acceptances were half the previous month's, more closely aligned with February. Why? While one team was doing cleanup in preparation for a large project, another was working in difficult areas of legacy code, where they found the suggestions less useful or sometimes incorrect. These project-based swings were out of our control and became part of the backstory to the fluctuations.

In addition to the lofty top-line goals, intriguing indicators appeared at the department level among suggestions, acceptances, and adoption. Setting aside the April dip attributable to refactoring, the count of lines of code suggested has more than tripled since the beginning of the year. And while the count of lines accepted has nearly doubled since February, engineers are accepting a smaller share of what is suggested: the conversion rate fell from 21% in February to only 13% in May.

Originally we thought conversion rate would signal engineer buy-in, so this reduction prompted the question of whether AI was a novelty whose shine had worn off. In actuality, adoption of AI was rapidly increasing. A better indicator is the percentage of added lines of code that were written by AI. While the conversion rate decreased, the adoption rate nearly doubled, from 22% in February to 43% in May. More new lines of repository code are being written by AI each month, indicating engineers are adopting and trusting the AI output.

Lessons Learned

While these measurements are interesting to track and analyze in and of themselves, the more interesting piece is attempting to understand the impact of AI as it rapidly shifts decades of established engineering doctrine.

Every day, better models, more tokens, faster agents, and more creative engineers using them are changing the tactical landscape. But after several months of experiments in methodologies, scaling strategies, and business as usual for a mid-sized SaaS software company, here is what we learned and how it is evolving our strategies and driving our quest for continuous improvement in this still nascent field.

 

Lesson One: Velocity and time gains when using AI rapidly expanded, but breaking through the plateau requires PROJECTS that utilize AI.

The biggest initial hurdle to AI adoption, especially for senior engineers, is muscle memory. As the AI measurement program was ramping up, growth was quick, but the most common pushback was that engineers had not integrated AI into their workflow.

  • Chat windows were clunky and often forgotten.
  • Autocomplete and Preview functionalities were disorienting.
  • Established patterns of going to a search engine and scanning a favorite community for answers were ingrained.
  • Only the younger team members were buying the argument that AI was going to change their flow, much less their careers.

The answer to this situation ended up finding us, in the form of a major cloud-hosting provider. Viewing our business as a strategic growth candidate, this provider extended an Accelerator program and an on-site Build Week to go from zero to a working product prototype using Generative AI on their AI service in only four days. The momentum from this single project raised the bar at Softdocs for product delivery. When a prototype can prove viability in days, not months, giddiness ensues. Tech debt that has become a millstone suddenly feels surmountable. Work feels more like play.

It only took a few days for all the other product teams to request their own Build Week. The cloud provider fed this momentum by hosting the Product Managers and Tech Leads alongside executive leadership in an AI Innovation session, in which each team's leadership reviewed their existing roadmaps, discussed constraints, and volleyed ideas. The result was a next-level roadmap that integrated AI into the products the teams work on every day. The “fun projects” were no longer relegated to a single Innovation team. Wealth was spread, creativity spawned, and suddenly senior engineers were leading the charge in evaluating the latest tech and the solutions to build faster. Changing habits required letting teams be creative with AI as part of what they were building, not just how they built it, and that also unlocked their flow.

 

Lesson Two: If you want to increase AI usage, don’t change your model. Train your engineers.

The technical debate between different AI models often overshadows the more significant challenge of up-skilling the team to drive organizational change. Suggestions more than doubled when Claude was made available to engineers in March, but comparative analysis between GPT and Claude Sonnet models revealed minimal differences in the conversion rate.

The issue here was quality, not quantity, presenting another hurdle to AI adoption: the art of prompt engineering. Much of the first quarter was spent in practical discussions of how to extract “good” responses from the model. When our engineers put the same prompts into different models to compare their output, the differences were subtle, and none of the results were outright failures. Like asking different developers to solve a problem independently, each model provided a slightly different approach to solving the problem. Either model could boost engineering speed, as long as the engineer knew what to ask for.

 

Lesson Three: Inverting the conversion and discarded rate is about trust.

A critical milestone in AI adoption occurs when the relationship between conversion and discard rates inverts, signaling the achievement of what practitioners call "vibe coding" – a state where engineers trust AI-generated first drafts enough to use them as starting points for refinement rather than obstacles to overcome. With the introduction of agentic coding and model context protocols, a single engineer can become an entire department, producing code in hours that would have taken a team weeks but barely raised their heart rate.

This "sweet spot" represents a fundamental shift from engineer-first development with disruptive AI suggestions to AI-first generation with human review and refinement, accelerating development velocity while maintaining code quality standards.

 

Lesson Four: Pairing traditional patterns and AI means ‘refactor’ is no longer a dirty word.

Contrary to widespread skepticism about AI's effectiveness with legacy enterprise systems, artificial intelligence can successfully refactor and enhance older codebases when combined with proven architectural patterns.

Building sidecar services to support the primary application capitalizes on AI's strength – new code and service generation – and gives businesses the opportunity to refactor potentially troublesome areas of applications in place. Training AI on legacy code isn't always necessary or optimal – engineering principles and patterns change, especially as cloud deployments change. Organizations can leverage generative AI to build replacement services that are more performant, and often less complicated, and that gradually supplant older functionality.
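To make the sidecar idea concrete, here is a minimal sketch, assuming an ASP.NET Core minimal API acting as the front door; the "/api/reports" route, the sidecar URL, and the feature being carved out are hypothetical, not our actual services.

```csharp
// Minimal sketch of the sidecar/strangler routing idea. Assumptions:
// ASP.NET Core minimal APIs, a hypothetical "reports" feature being
// carved out of the legacy app, and an assumed internal sidecar URL.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient("sidecar", client =>
    client.BaseAddress = new Uri("https://reports-sidecar.internal")); // assumed URL
var app = builder.Build();

// Requests for the refactored feature are forwarded to the new,
// AI-built sidecar service.
app.MapGet("/api/reports/{id}", async (string id, IHttpClientFactory factory) =>
{
    var client = factory.CreateClient("sidecar");
    var payload = await client.GetStringAsync($"/reports/{id}");
    return Results.Content(payload, "application/json");
});

// Legacy controllers and routes registered elsewhere keep serving everything
// else, and are retired route by route as the sidecar proves itself.
app.Run();
```

The appeal of the pattern is that the legacy code never has to be fed into the model wholesale: the AI builds the new, well-bounded service, and routing decides when it takes over.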

When combined with cloud provider partnerships, organizations can also experiment with tuned services at lower resource commitment, bringing value to market faster to see what resonates. This turns the perception of refactoring from a money pit into an attractive business value-add, helping organizations achieve both modernization and feature enhancement simultaneously.

 

Lesson Five: AI can change the model for scaling engineering organizations, so as leaders, we need to push for human change

Artificial intelligence fundamentally alters traditional models for scaling engineering organizations, particularly in how leaders approach team composition and skill development. The ethical dilemma (and hard truth) is that AI coding assistants can effectively replace coders, which includes offshore development body shops.

AI results provided comparable quality faster and at lower cost than an offshore experiment while maintaining greater control over intellectual property and communication overhead. However, the technology doesn't eliminate the need for experienced engineers or subject matter experts. Instead, it amplifies their impact by handling routine implementation tasks while preserving the critical thinking and architectural decision-making that senior developers provide.

The vigorous adoption of AI changes the vetting process and the focus for scaling strategy. A good scaling partner should be:

  • Extensively using AI tools themselves
  • Measuring their own adoption levels
  • Able to demonstrate the results.

This clears the path to focus on the onboarding process and learning pathway for team members becoming SMEs in your codebase.

 

Lesson Six: The triangle still applies – you can only pick two (fast, cheap, good), but AI redefines the axes

The classic project management triangle – fast, cheap, good, pick two – remains relevant in AI-enhanced development, but artificial intelligence fundamentally changes the scale for each axis and introduces new trade-offs between them.

AI enables speed in initial development phases, compressing timelines from months to days for many types of projects. AI-assisted development, particularly when combined with a strong cloud provider, enables rapid prototyping and iteration cycles that were previously impossible. However, this acceleration often comes with increased operational costs due to SaaS pricing models for AI services, and potential quality concerns that require additional review and refinement cycles, effects similar to those of scaling through offshoring or rapid hiring.

The "cheap" axis becomes more complex with AI, as while individual developer productivity increases, the total cost of ownership includes AI service fees, additional tooling, and potentially higher-skilled developers needed to effectively guide AI output.

Agentic coding token usage may be inexpensive compared to adding an engineer headcount right now, but this phase of growth may prove to be a loss leader while the world recalibrates, and the value of these tools will eventually reflect demand. But the delivery of a proof of concept in days that provides answers to whether an idea is a viable market solution still beats the old paradigm that required full teams and many sprints to reach the same point.

Quality considerations shift from pure code correctness to encompass AI output reliability, consistency, and alignment with business requirements, regardless of the scale of the engineering organization. AI reiterates questions that have long divided business leadership from engineering leadership. How “good” does software need to be? If the product is selling and customers love it, does the underlying code have to be spotless? The focus should shift in the short term toward performance of prototypes and products generated by Vibe Coding under real load. If the code can’t keep up, more engineering experience is required, which increases cost and slows down the timelines.

Final Thoughts

These problems aren’t new, which means the strategies to solve them still apply. Teams that successfully navigate the redefined triangle in the new age of AI should establish clear quality gates and review processes specifically designed for AI-generated code, ensuring that speed gains don't compromise long-term maintainability or system reliability.

Organizations that excel will push for innovation and creativity as a blend of human and AI, embracing the changes rather than downplaying them, and continuing to adapt and scale while weighing the triangle tradeoffs.

One great way to ensure healthy, continual growth: measure it.
