Agentic AI: Pioneering Autonomy and Transforming Business Landscapes

Let's make tech work for us

The new autonomous systems labelled AI agents represent the latest evolution of AI technology and mark a new era in business. In contrast to traditional AI models, which simply follow the commands given to them and emit outputs in a fixed format, AI agents operate with a degree of freedom. According to Google, these agents are capable of functioning on their own, without needing constant human supervision. The World Economic Forum describes them as systems with sensors to perceive their environment and effectors to act on it. As they evolve from rigid, rule-based frameworks into sophisticated models adept at intricate decision-making, AI agents are expected to transform industries. With unprecedented autonomy comes equally unprecedented responsibility: the benefits agentic AI brings are accompanied by unique challenges that call for careful consideration, planning, governance, and foresight.

The Mechanics of AI Agents: A Deeper Dive

Traditional AI tools, such as Generative AI (GenAI) or predictive analytics platforms, rely on predefined instructions or prompts to deliver results. In contrast, AI agents exhibit dynamic adaptability, responding to real-time data and executing multifaceted tasks with minimal oversight. Their functionality hinges on a trio of essential components:

  • Foundational AI Model: At the heart of an AI agent lies a powerful large language model (LLM), such as GPT-4, Llama, or Gemini, which provides the computational intelligence needed for understanding and generating responses.
  • Orchestration Layer: This layer serves as the agent’s “brain,” managing reasoning, planning, and task execution. It employs advanced frameworks like ReAct (Reasoning and Acting) or Chain-of-Thought prompting, enabling the agent to decompose complex problems into logical steps, evaluate outcomes, and adjust strategies dynamically—mimicking human problem-solving processes.
  • External Interaction Tools: These tools empower agents to engage with the outside world, bridging the gap between digital intelligence and practical application. They include:
    • Extensions: Enable direct interaction with APIs and services, allowing agents to retrieve live data (e.g., weather updates or stock prices) or perform actions like sending emails.
    • Functions: Offer a structured mechanism for agents to propose actions executed on the client side, giving developers fine-tuned control over outputs.
    • Data Stores: Provide access to current, external information beyond the agent’s initial training dataset, enhancing decision-making accuracy.

This architecture transforms AI agents into versatile systems capable of navigating real-world complexities with remarkable autonomy.
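To make the orchestration layer concrete, below is a minimal, illustrative sketch of a ReAct-style reasoning loop in Python. The call_llm function, the TOOLS registry, and the ACTION/FINAL reply format are hypothetical placeholders rather than any vendor's API; a production framework would add memory, error handling, and guardrails around the same loop.

```python
# Minimal ReAct-style agent loop (illustrative sketch, not a vendor API).

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its raw text reply."""
    raise NotImplementedError("Wire this to your model provider of choice.")

def get_weather(city: str) -> str:
    """Example 'extension': a tool the agent can invoke for live data."""
    return f"Sunny, 21 C in {city}"  # stub response for illustration

TOOLS = {"get_weather": get_weather}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model to reason, then either act (call a tool) or finish.
        reply = call_llm(
            history
            + "Respond with either 'ACTION: <tool> <arg>' or 'FINAL: <answer>'."
        )
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        if reply.startswith("ACTION:"):
            _, tool_name, arg = reply.split(maxsplit=2)
            observation = TOOLS[tool_name](arg)  # execute the chosen tool
            history += f"{reply}\nOBSERVATION: {observation}\n"
    return "Stopped: step limit reached."
```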

Multi-Agent Systems: The Newest Frontier

The Multi-Agent System (MAS) market is set for tremendous growth, with McKinsey citing a staggering predicted growth rate of nearly 28% by 2030. Bloomberg recently predicted that AI breakthroughs will soon give rise to multi-agent systems: collaborative networks of AI agents working together towards ambitious objectives. These systems promise scalability that surpasses what single agents can achieve.

Picture a smart city with multiple AI agents working alongside each other: one controlling traffic signals, another managing traffic-directing units, and a third rerouting emergency responders, all in real time.

Governance is key here to prevent systemic failure: conflicting commands between agents can cause paralysis or outright dysfunction. Multi-agent systems should be granted a degree of standardized freedom, but their benefits can only be realized if shared protocols are defined and adhered to.

Opportunities and Challenges

The potential of AI agents is game-changing, but their independence raises serious concerns. The World Economic Forum highlights several challenges that companies must deal with:

  • Risks Associated with Autonomy: Ensuring safety and reliability becomes all the more difficult as agents become more independent. For example, an unmonitored agent could execute a resource allocation that triggers cascading operational failures.
  • Lack of Accountability: Trust is already fragile when an agent's "black box" reasoning is opaque, and it becomes even more fragile in high-risk healthcare or finance settings. Ensuring transparency and accountability is non-negotiable.
  • Risks Surrounding Privacy and Security: An agent can only function effectively with access to a multitude of sensitive systems and datasets, which raises the question: how do we grant sufficient permissions without compromising security? Strong policies and enforced standards are needed to protect sensitive data and privacy while preventing breaches.

Guarding against these risks calls for proactive measures: consistent monitoring, adherence to ethical AI principles, and human-in-the-loop oversight of critical AI decisions so that organizations retain control. Organizations also need auditing tools that monitor agents and correct their course when they deviate from organizational goals.

The Human-AI Partnership

Even though AI agents operate independently, their purpose is not to replace human reasoning but to augment it. The EU AI Act is a reminder of the necessity of human intervention in sensitive processes such as security and legal compliance. The best arrangement is one where humans and machines work together: agents handle the monotonous, repetitive work of processing large amounts of data, which frees humans to focus on strategy, creativity, and ethics.

In a logistics company, for instance, an AI agent might autonomously optimize delivery routes using live traffic information, while a manager applies judgment, weighing customer preferences or unforeseen factors, before approving the AI's plan. Efficiency improves while human control and supervision are maintained.
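As a rough illustration of that division of labor, the sketch below wraps a hypothetical route-optimization step in a human approval gate; the optimize_routes function and its inputs are invented for the example, and the point is only that the agent proposes while a person approves.

```python
from dataclasses import dataclass

@dataclass
class RoutePlan:
    stops: list[str]
    estimated_minutes: int

def optimize_routes(traffic_data: dict) -> RoutePlan:
    """Hypothetical agent step: compute a route plan from live traffic data."""
    return RoutePlan(stops=["Depot", "A", "B", "C"], estimated_minutes=95)

def human_review(plan: RoutePlan) -> bool:
    """Human-in-the-loop gate: a manager approves or rejects the proposal."""
    print(f"Proposed route {plan.stops}, ETA {plan.estimated_minutes} min")
    return input("Approve this plan? [y/N] ").strip().lower() == "y"

def dispatch(plan: RoutePlan) -> None:
    print(f"Dispatching drivers along {plan.stops}")

if __name__ == "__main__":
    plan = optimize_routes(traffic_data={})
    if human_review(plan):          # the agent never acts without sign-off
        dispatch(plan)
    else:
        print("Plan rejected; the agent re-plans with the manager's feedback.")
```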

Guidelines for Implementing Agentic AI Strategically

Both Google and the World Economic Forum converge on a central idea: the responsible use of AI agents can result in outstanding value creation and unparalleled innovation. To reap that value while keeping risks manageable, businesses should adopt the following practices:

  • Develop Skills: Train the workforce in building, implementing, and administering AI agents to ensure the technology is applied effectively.
  • AI Ethics: Develop governance frameworks that adhere to international benchmarks, such as the EU AI Act, requiring agents to operate fairly and accountably.
  • Set Boundaries: Delegated agent discretion must come with safeguards and dedicated controls that prevent overreach or decisions outside the agent's remit.
  • Validate Continuously: Audit agents actively, stress-test them, and refine value objectives so that agent behavior stays aligned with organizational needs.


Final thoughts

By integrating reasoning and planning, agentic AI gains the ability to act on its own, and AI agents mark a pivotal leap in the evolution of artificial intelligence. Their potential to change fields such as personalized healthcare and smart cities is phenomenal, but deploying them carelessly would be a grave mistake. For AI agents to be dependable companions, trust and security must anchor their development.

Organizations that strike the right balance, enabling agents to innovate while maintaining human supervision, will lead the charge in this technological revolution. Agentic AI is not just another business tool; it is a paradigm shift that re-imagines autonomy. That future will belong to those who embrace its potential with clarity and caution.

Grok Names Elon Musk as the Main Disinformer

Elon Musk is the main disseminator of disinformation on X, according to Grok, the AI assistant from the entrepreneur's startup xAI that is integrated into his social network.

The billionaire has a huge audience and often spreads false information on various topics, the chatbot claims. Other disinformers named by the neural network include Donald Trump, Robert F. Kennedy Jr., Alex Jones, and RT (Russian television).

Trump shares false claims about the election, Kennedy Jr. – about vaccines, and Alex Jones is known for spreading conspiracy theories. Russian television lies about political issues, Grok added.

Grok’s Top Disseminators of Disinformation. Data: X.

The chatbot cited Rolling Stone, The Guardian, NPR, and NewsGuard as sources of information.

“The selection process involved analyzing multiple sources, including academic research, fact-checking organizations, and media reports, to identify those with significant influence and a history of spreading false or misleading information,” the AI noted.

The criteria for compiling the rankings included the volume of false information spread, the number of followers, and mentions in credible reports.

When asked for clarification, Grok noted that the findings may be biased because the sources provided are mostly related to the funding or opinions of Democrats and liberals.

Recall that in January, artificial intelligence was used to spread fake news about the fires in Southern California.

A similar situation arose after Hurricane Helene.

Google Unveils Memory Feature for Gemini AI Chatbot

Google has launched a notable update to its Gemini AI chatbot, equipping it with the ability to remember details from previous conversations, a development experts are calling a major advancement.

In a blog post released on Thursday, Google detailed how this new capability allows Gemini to store information from earlier chats, provide summaries of past discussions, and craft responses tailored to what it has learned over time.

This upgrade eliminates the need for users to restate information they’ve already provided or sift through old messages to retrieve details. By drawing on prior interactions, Gemini can now deliver answers that are more relevant, cohesive, and enriched with additional context pulled from its memory. This results in smoother, more personalized exchanges that feel less fragmented and more like a continuous dialogue.

Rollout Plans and Broader Access
The memory feature is first being introduced to English-speaking users subscribed to Google One AI Premium, a $20 monthly plan offering enhanced AI tools. Google plans to extend this functionality to more languages in the near future and will soon bring it to business users via Google Workspace Business and Enterprise plans.

Tackling Privacy and User Control
While the ability to recall conversations offers convenience, it may raise eyebrows among those concerned about data privacy. To address this, Google has built in several options for users to oversee their chat data. Through the “My Activity” section in Gemini, individuals can view their stored conversations, remove specific entries, or decide how long data is kept. For those who prefer not to use the feature at all, it can be fully turned off, giving users complete authority over what the AI retains.

Google has also made it clear that it won’t use these stored chats to refine its AI models, putting to rest worries about data being repurposed.

The Race to Enhance AI Memory

Google isn’t alone in its efforts to boost chatbot memory. OpenAI’s Sam Altman has highlighted that better recall is a top demand from ChatGPT users. Over the last year, both companies have rolled out features letting their AIs remember things like a user’s favorite travel options, food preferences, or even their preferred tone of address. Until now, though, these memory tools have been fairly limited and didn’t automatically preserve entire conversation histories.

Gemini’s new recall ability marks a leap toward more fluid and insightful AI exchanges. By keeping track of past talks, it lets users pick up where they left off without losing the thread, proving especially handy for long-term tasks or recurring questions.

As this feature spreads to more users, Google underscores its commitment to transparency and control, ensuring people can easily manage, erase, or opt out of data retention altogether.

Sam Altman talks about the features of GPT-4.5 and GPT-5

OpenAI CEO Sam Altman shared the startup’s plans to release GPT-4.5 and GPT-5 models. The company aims to simplify its product offerings by making them more intuitive for users.

Altman acknowledged that the current product line has become too complex, and OpenAI is looking to change that.

“We hate model selection as much as you do and want to get back to magical unified intelligence,” he wrote.

GPT-4.5, codenamed Orion, will be the startup’s last AI model without a “chain of reasoning” mechanism. The next step is to move toward more integrated solutions.

The company plans to combine the o and GPT series models, creating systems capable of:

  • using all available tools;
  • independently determining when deep thinking is needed and when an instant solution is enough;
  • adapting to a wide range of tasks.

GPT-5 integrates various technologies, including o3. Other features will include Canvas mode, search, Deep Research, and much more.

Free-tier users will get unlimited access to GPT-5 at the standard intelligence setting. Plus and Pro account holders will be able to use advanced features at a higher level of intelligence.

Regarding the release dates of GPT-4.5 and GPT-5, Altman wrote in replies to the tweet about “weeks” and “months”, respectively.

According to Elon Musk, ChatGPT competitor Grok 3 is in the final stages of development and will be released in one to two weeks, Reuters reports.

“Grok 3 has very powerful reasoning capabilities, so in the tests we’ve done so far, Grok 3 outperforms all the models that we know of, so that’s a good sign,” the entrepreneur said during a speech at the World Governments Summit in Dubai.

Recall that Altman turned down a $97.4 billion bid from Musk and a group of investors to buy the non-profit that controls OpenAI. The startup’s CEO called the offer an attempt to “slow down” a competitor.

Managing Large-Scale AI Systems: Data Pipelines and API Security

Artificial Intelligence is revolutionizing sectors across the globe, and as an organization scales its AI work, the infrastructure anchoring these systems must evolve with it. At the heart of that infrastructure lie data pipelines and APIs, which are crucial to the efficiency and performance of AI systems.

However, as companies start to use AI across their operations, data pipelines and API security present a major challenge. Weak management of these components can lead to data leakage, operational inefficiency, or catastrophic failure.

In this article, we’ll explore the key considerations and strategies for managing data pipelines and API security, focusing on real-world challenges faced by organizations deploying large-scale AI systems.

Data Pipelines: Intrinsic Building Block of AI Systems

Fundamentally, a data pipeline defines the flow of information that comes from various sources through a series of steps, eventually feeding AI models, which rely on this input for the purposes of training and making inferences. Large AI systems, specifically those designed to solve complex problems related to natural language processing or real-time recommendation engines, rely heavily on good-quality and timely data. Due to this fact, efficient management of data pipelines is crucial to ensure the efficacy and accuracy of AI models.

Scalability and Performance Optimization: One of the major problems related to data pipelines is scalability. In a small-scale implementation of AI, a simple data ingestion process might work. However, when the system grows and more data sources are added, performance bottlenecks can crop up. Large-scale AI applications often require processing large amounts of data in real-time or near real-time.

Achieving this requires infrastructure that can accommodate increasing demand without sacrificing the efficiency of vital operations. Distributed systems like Apache Kafka, combined with cloud-based services such as Amazon S3, provide scalable solutions for moving and storing data efficiently.
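As a minimal sketch of that pattern, assuming the kafka-python and boto3 client libraries and placeholder topic and bucket names, events can be streamed to Kafka for low-latency consumers while raw batches are archived to S3 for training and replay:

```python
import json
import boto3
from kafka import KafkaProducer  # kafka-python client

# Stream individual events to a Kafka topic for low-latency consumers.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_event(event: dict) -> None:
    producer.send("ai-feature-events", value=event)  # topic name is illustrative

# Periodically archive raw batches to S3 for training and replay.
s3 = boto3.client("s3")

def archive_batch(batch: list[dict], key: str) -> None:
    s3.put_object(
        Bucket="my-raw-data-lake",  # bucket name is illustrative
        Key=key,
        Body=json.dumps(batch).encode("utf-8"),
    )

publish_event({"user_id": 42, "action": "click", "ts": "2025-01-01T00:00:00Z"})
archive_batch([{"user_id": 42, "action": "click"}], key="events/2025/01/01/batch-0001.json")
producer.flush()
```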

Data Quality and Validation: Regardless of the design excellence of the artificial intelligence model, subpar data quality will result in erroneous predictions. Consequently, the management of data quality is an indispensable component of data pipeline administration. This process encompasses the elimination of duplicates, addressing absent values, and standardizing datasets to maintain consistency across various sources.

Tools such as Apache Beam and AWS Glue provide platforms for real-time data cleansing and transformation, ensuring that only accurate and relevant data flows to the AI model.
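The following is a minimal Apache Beam sketch of such a cleansing step; the file paths and field names are illustrative, and the same pipeline could run unchanged on a managed runner such as Dataflow:

```python
import json
import apache_beam as beam

def normalize(record: dict) -> dict:
    """Standardize fields so downstream models see a consistent schema."""
    record["amount"] = float(record.get("amount", 0.0))
    record["country"] = (record.get("country") or "UNKNOWN").upper()
    return record

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read raw events"  >> beam.io.ReadFromText("raw_events.jsonl")  # path is illustrative
        | "Drop exact dupes" >> beam.Distinct()                           # deduplicate identical raw lines
        | "Parse JSON"       >> beam.Map(json.loads)
        | "Drop nulls"       >> beam.Filter(lambda r: r.get("user_id") is not None)
        | "Normalize"        >> beam.Map(normalize)
        | "Serialize"        >> beam.Map(json.dumps)
        | "Write clean set"  >> beam.io.WriteToText("clean_events", file_name_suffix=".jsonl")
    )
```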

Automation, Monitoring, and Fault Management: Automation becomes a key requirement in large AI environments where data flows in continuously from many sources. Automated data pipelines reduce the need for manual intervention, while real-time monitoring lets an organization catch errors before they affect business operations. Platforms such as Datadog and Grafana provide real-time views of pipeline status, surfacing latency spikes or data corruption, and can trigger automated error-handling processes.

API Security: Gateway to Artificial Intelligence Systems

APIs are the bridges that connect applications, services, and systems with an AI model, making them core to modern AI systems. They are also among the weakest links in large-scale systems. The rise of AI has multiplied the number of API endpoints, and each endpoint is a potential entry point for a breach if it is not well guarded.

Authentication and Authorization: Basic but crucial security measures for APIs include robust authentication and authorization. Without proper authentication, APIs can become a gateway to sensitive information and functions inside the AI system. OAuth 2.0 and API keys are among the strategies that offer flexible methods of securing API access. Applying these techniques is not enough on its own, however; API access logs need to be audited regularly to ensure that the right users have the proper access level.
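As a minimal sketch of API-key enforcement, here using FastAPI's security helpers with an illustrative header name, key store, and endpoint; a production setup would typically add OAuth 2.0 token validation, key rotation, and audited access logs:

```python
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

# Illustrative key store; in production, keys live in a secrets manager or database.
VALID_KEYS = {"team-a-key": "team-a", "team-b-key": "team-b"}

def authenticate(api_key: str = Depends(api_key_header)) -> str:
    if api_key not in VALID_KEYS:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return VALID_KEYS[api_key]  # caller identity, useful for audit logging

@app.post("/v1/predict")
def predict(payload: dict, caller: str = Depends(authenticate)):
    # Every call is attributable to a caller, enabling access-log audits.
    return {"caller": caller, "score": 0.87}
```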

Rate Limiting and Throttling: Large-scale AI systems are vulnerable to malicious actors attempting Distributed Denial-of-Service (DDoS) attacks, in which attackers flood API endpoints with requests until the system crashes. Rate limiting and throttling mechanisms prevent this by allowing only a limited number of requests from a user within a given period.

This ensures that no single user or group of users can overwhelm the system, keeping it intact and available.
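The throttling idea can be sketched with a simple token-bucket limiter like the one below; in practice this is usually enforced at an API gateway rather than in application code, and the rates shown are illustrative:

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per caller; requests beyond the budget are rejected (e.g., HTTP 429).
buckets: dict[str, TokenBucket] = {}

def check_rate_limit(caller_id: str) -> bool:
    bucket = buckets.setdefault(caller_id, TokenBucket(rate=5.0, capacity=10))
    return bucket.allow()
```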

Encryption and Data Protection: Protecting data involves more than securing AI models and databases; it also covers data as it flows through the system via APIs. Encrypting data at rest, and in transit using TLS, ensures that even if an attacker intercepts the data it remains unreadable. Together with other data protection measures, encryption shields sensitive information such as personal data and financial records from unauthorized access.

Anomaly Detection and Monitoring: In large AI ecosystems, it is impossible to manually monitor each and every API interaction for potential security breaches. It is here that AI can be a strong ally. State-of-the-art security solutions, such as Google’s Cloud Armor or machine-learning-powered anomaly detection algorithms, can monitor API traffic in real time to spot unusual activities or behavior that may indicate an attack.

Leveraging AI to secure the API infrastructure in this way helps defend the system against emerging threats.
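As an illustrative sketch of that kind of ML-based traffic screening (a generic approach, not the internals of any particular product), an isolation forest can be trained on simple per-request features and used to flag outliers:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Per-caller features: requests/min, mean payload size (KB),
# error rate over the last window, distinct endpoints touched.
normal_traffic = np.array([
    [12, 4.1, 0.01, 3],
    [ 9, 3.8, 0.00, 2],
    [15, 5.0, 0.02, 4],
    [11, 4.5, 0.01, 3],
])

detector = IsolationForest(contamination=0.05, random_state=42).fit(normal_traffic)

suspicious = np.array([[480, 90.0, 0.35, 27]])  # burst of large, failing requests
print(detector.predict(suspicious))             # -1 means "anomalous", 1 means "normal"
```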

Balancing Security and Performance

One of the biggest challenges that organizations face with the management of data pipelines and API security is having to balance these issues against considerations around performance. For instance, encrypting all data moving across a pipeline can dramatically increase security; in turn, this can degrade performance due to increased latency, which then diminishes the overall effectiveness of the system. Similarly, very stringent rate limiting can help protect the system from DDoS attacks but at the same time can prevent legitimate users from accessing it during high demand periods.

In short, the key is finding a balance that works for both security and performance. This requires tight collaboration between security experts, data engineers, and developers. A DevSecOps methodology ensures that security is woven into every stage of the development and deployment lifecycle without sacrificing performance, and continuous testing with incremental improvements is essential for tuning security against scalability.

Conclusion

As AI systems grow in scale and complexity, managing data pipelines and securing APIs become critical. Failing to address them can lead to data breaches, system inefficiencies, and loss of reputation.

By using scalable data pipeline frameworks, protecting APIs with strong authentication, encryption, and monitoring, and maintaining a sensible balance between security and performance, an organization can exploit the full potential of artificial intelligence while minimizing risks to its systems. With the right strategies and tools, data pipeline and API security oversight integrate seamlessly into an organization's AI infrastructure, ensuring reliability, efficiency, and security as systems scale.

Authored by Heng Chi, Software Engineer

CIA and MI6 chiefs reveal the role of AI in intelligence work

The heads of the CIA and MI6 revealed how artificial intelligence is changing the work of intelligence agencies, helping to combat disinformation and analyze data. AI technologies are playing a major role in modern conflicts and helping intelligence agencies adapt to new challenges.

CIA Director Bill Burns and MI6 Chief Richard Moore, during a recent joint appearance, described how artificial intelligence is transforming the work of their intelligence agencies.

According to them, the main task of the agencies today is “adapting to modern challenges”. And it is AI that is central to this adaptation.

New challenges in modern conflicts

The heads of intelligence noted the key role of technology in the conflict in Ukraine.

For the first time, combat operations combined modern advances in AI, open data, drones and satellite reconnaissance with classical methods of warfare.

This experience confirmed the need not only to adapt to new conditions, but also to experiment with technology.

Combating disinformation and global challenges

Both intelligence agencies actively use AI to “analyze data, identify key information, and combat disinformation.”

Experts named China as one of the notable threats. In this regard, the CIA and MI6 have reorganized their services to work more effectively in this area.

AI — a tool for defense and attack

Artificial intelligence helps intelligence agencies not only analyze data, but also protect their operations by creating “red teams” to check for vulnerabilities.

The use of cloud technologies and partnerships with the private sector make it possible to unleash the full potential of AI.

The Application of AI to Real-Time Fraud Detection in Digital Payments

The growth of the internet, coupled with advanced digital communication systems, has greatly transformed the global economy, especially in the area of commerce. Fraud attempts, on the other hand, have become more diverse and sophisticated over time, costing businesses and financial institutions millions of dollars each year. Fraud detection, in turn, has evolved from unsophisticated manual processes, through automated rule-based methods, to intelligent systems. Today, artificial intelligence (AI) assists in both controlling and combating fraud, advancing the financial technology (fintech) sector. In this article, we explain the mechanics of AI in digital payments fraud detection, focusing on the technical aspects, a real-world case, and practical takeaways for mid-level AI engineers, product managers, and other fintech professionals.

The Increased Importance of Identifying Fraud In Real-Time

The volume and complexity of digital payments, which include credit card transactions, P2P app payments, A2A payments, and others, continue to rise. Juniper Research estimates that the cost of online payment fraud will climb beyond $362 billion globally between 2023 and 2028. Automated and social-engineering attacks exploit weaknesses such as stolen credentials and synthetic identities, often striking within moments. Outdated fraud detection methods that depend on static rules ("flag transactions over $10,000") are ineffective against these fast-paced threats: review queues get overloaded, false positives anger customers, and undetected fraud sails through.

This is where AI comes in. With machine learning, deep learning, and real-time data processing, AI can evaluate large amounts of data, recognize patterns, adapt to changes, and detect anomalies in a matter of milliseconds. For fintech professionals, this shift is both an opportunity and a challenge: build systems that are accurate, fast, and scalable, all while reducing customer friction.

How AI-Fueled Real-Time Fraud Detection Works

AI-enhanced fraud detection is supported by three tiers: data, algorithms, and real-time execution. Let’s simplify this concept for a mid-level AI engineering or product management team. 

The Underlying Data: Any front-line fraud detection system must couple each real-time payment transaction with rich, high-quality data: transaction histories, user behavior profiles, device fingerprints, IP geolocation, and external sources such as dark-web chatter. For instance, a transaction attempted from a new device in a foreign country can be flagged as suspicious when combined with the user's baseline spending patterns. AI systems pull this data through streaming services such as Apache Kafka or cloud-native solutions like AWS Kinesis, which promise low latency. Data engineers must invest in clean, well-structured datasets, because the system performs poorly when the data it receives is of poor granularity; this is a lesson I have learned repeatedly over the past twenty years.

Algorithms: Machine learning models are the backbone of AI fraud detection. Supervised models work with labeled datasets (e.g., "fraud" vs. "legitimate") and are proficient at recognizing established fraud patterns; Random Forests and Gradient Boosting Machines (GBMs) are among the most popular choices because of their accuracy and interpretability. Fraud evolves faster than data can be labeled, however, which is where unsupervised learning comes in: clustering algorithms such as DBSCAN, or autoencoders, need no prior examples and can surface unusual transactions for review. For example, a sudden spike in small, rapid transfers can be flagged as possible money laundering even in the absence of historical fraud signatures. Deep learning models such as recurrent neural networks (RNNs) further improve detection by examining time-series data (e.g., transaction timestamps) for hidden patterns and relationships.
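To ground these ideas, here is a compact sketch that pairs a supervised gradient-boosting classifier with an unsupervised DBSCAN pass on toy transaction features; real systems train on millions of transactions and far richer features, so treat the numbers as placeholders:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

# Toy features: amount (USD), seconds since previous transaction, new-device flag.
X = np.array([
    [25.0, 86400, 0], [980.0, 40, 1], [12.5, 7200, 0],
    [1500.0, 15, 1],  [60.0, 3600, 0], [5.0, 600, 0],
])
y = np.array([0, 1, 0, 1, 0, 0])   # labeled history: 1 = fraud, 0 = legitimate

# Supervised model: learns known fraud patterns from labeled transactions.
clf = GradientBoostingClassifier().fit(X, y)
new_payment = np.array([[1200.0, 20, 1]])
print("fraud probability:", clf.predict_proba(new_payment)[0, 1])

# Unsupervised pass: DBSCAN marks transactions that fit no cluster as noise
# (label -1), surfacing oddities even without historical fraud labels.
X_scaled = StandardScaler().fit_transform(X)
print("cluster labels:", DBSCAN(eps=1.0, min_samples=2).fit_predict(X_scaled))
```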

Execution in Real Time: Time is of the essence in digital payments. Payment systems must decide whether to approve, decline, or escalate a transaction in under 100 milliseconds. This is achievable with distributed computing frameworks such as Apache Spark for batch processing and Apache Flink for real-time stream analysis, while GPU-accelerated hardware (e.g., via NVIDIA CUDA) scales inference to thousands of transactions per second. Product managers should remember that latency trade-offs grow as model complexity increases; a simple logistic regression may suit low-risk scenarios, while high-precision cases warrant more complex neural networks.

Real-World Case Study: PayPal’s AI-Driven Fraud Detection

To illustrate AI’s impact, consider PayPal, a fintech giant processing over 22 billion transactions annually. In the early 2010s, PayPal faced escalating payment fraud, including account takeovers and stolen card usage. Traditional rule-based systems flagged too many false positives, alienating users, while missing sophisticated attacks. By 2015, PayPal had fully embraced AI, integrating real-time ML models to combat fraud – a strategy we’ve seen replicated across the industry.

PayPal’s approach combines supervised and unsupervised learning. Supervised models analyze historical transaction data—device IDs, IP addresses, email patterns, and purchase amounts—to assign fraud probability scores. Unsupervised models detect anomalies, such as multiple login attempts from disparate locations or unusual order sizes (e.g., shipping dozens of items to one address with different cards). Real-time data feeds from user interactions and external sources (e.g., compromised credential lists) enhance these models’ accuracy.

Numbers: According to PayPal’s public reports and industry analyses, their AI system reduced fraud losses by 30% within two years of deployment, dropping fraud rates to below 0.32% of transaction volume—a benchmark in fintech. False positives fell by 25%, improving customer satisfaction, while chargeback rates declined by 15%. These gains stemmed from processing 80% of transactions in under 50 milliseconds, enabled by a hybrid cloud infrastructure and optimized ML pipelines. For AI engineers, PayPal’s use of ensemble models (combining decision trees and neural networks) offers a practical lesson in balancing precision and recall in high-stakes environments.

Technical Challenges and Solutions

Implementing AI for real-time fraud detection isn’t without hurdles. Here’s how to address them:

  • Data Privacy and Compliance: Regulations like GDPR and CCPA mandate strict data handling. Techniques like federated learning, which trains models locally on user devices, minimize exposure, while synthetic data generation (via GANs) augments training sets without compromising privacy.
  • Model Drift: Fraud patterns shift, degrading model performance. Continuous retraining with online learning algorithms (e.g., stochastic gradient descent) keeps models current. Monitoring metrics like precision, recall, and F1-score ensures drift is caught early; a minimal monitoring sketch follows this list.
  • Scalability: As transaction volumes grow, so must your system. Distributed architectures (e.g., Kubernetes clusters) and serverless computing (e.g., AWS Lambda) provide elastic scaling. Optimize inference with model pruning or quantization to reduce latency on commodity hardware.
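A minimal sketch of the drift-monitoring idea referenced in the list above: compute precision and recall over a rolling window of recently labeled transactions and raise an alert when they fall below agreed floors. The window size and thresholds are illustrative.

```python
from collections import deque
from sklearn.metrics import precision_score, recall_score

WINDOW = 5000           # most recent labeled transactions to evaluate (illustrative)
PRECISION_FLOOR = 0.90  # alert thresholds agreed with the fraud-ops team
RECALL_FLOOR = 0.80

recent_truth = deque(maxlen=WINDOW)
recent_preds = deque(maxlen=WINDOW)

def record_outcome(predicted_fraud: int, actual_fraud: int) -> None:
    """Call this once a transaction's true label is known (e.g., after a chargeback)."""
    recent_preds.append(predicted_fraud)
    recent_truth.append(actual_fraud)

def drift_alert() -> bool:
    if len(recent_truth) < WINDOW:
        return False  # not enough labeled data yet
    precision = precision_score(list(recent_truth), list(recent_preds), zero_division=0)
    recall = recall_score(list(recent_truth), list(recent_preds), zero_division=0)
    return precision < PRECISION_FLOOR or recall < RECALL_FLOOR
```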

The Future of AI in Fraud Detection

Whatever the future holds, AI's role will only become more pronounced. Generative AI such as large language models (LLMs) could simulate novel fraud schemes for training and testing, while blockchain technology could help keep ledger transaction records tamper-proof. Identity verification through biometric face detection and voice recognition will limit synthetic identity fraud.

As noted earlier, the speed, accuracy, and adaptability of AI in real-time fraud detection make it possible to pinpoint and eliminate issues in digital payments that rule-based systems cannot address. PayPal's success is evidence of this capability, but the journey is not easy and demands discipline and a well-planned approach. For AI engineers, product managers, and fintech professionals, moving into this space is more than a career move; it is an opportunity to build a safer financial system for all.

From Bugs to Brilliance: How to Leverage AI to Left-Shift Quality in Software Development

Contributed by Gunjan Agarwal, Software Engineering Manager at Meta
Key Points
  • Research suggests AI can significantly enhance left-shifting quality in software development by detecting bugs early, reducing costs, and improving code quality.
  • AI tools like CodeRabbit and Diffblue Cover have proven effective in automating code reviews and unit testing, significantly improving speed and accuracy in software development.
  • The evidence leans toward early bug detection saving costs, with studies showing that fixing bugs in production can cost 30-60 times more than fixing them in early stages.
  • An unexpected detail is that AI-driven CI/CD tools, like Harness, can reduce deployment failures by up to 70%, enhancing release efficiency.

Introduction to Left-Shifting Quality

Left-shifting quality in software development involves integrating quality assurance (QA) activities, such as testing, code review, and vulnerability detection, earlier in the software development lifecycle (SDLC). Traditionally, these tasks were deferred to the testing or deployment phases, often leading to higher costs and delays due to late bug detection. By moving QA tasks to the design, coding, and initial testing phases, teams can identify and resolve issues proactively, preventing them from escalating into costly problems. For example, catching a bug during the design phase might cost a fraction of what it would cost to fix in production, as evidenced by a study by the National Institute of Standards and Technology (NIST), which found that resolving defects in production can cost 30 to 60 times more, especially for security defects.

The integration of artificial intelligence (AI) into this process has made left-shifting quality far more practical, offering automated, intelligent solutions that enhance efficiency and accuracy. AI tools can analyze code, predict failures, and automate testing, enabling teams to deliver high-quality software faster and more cost-effectively. This article explores the concept, its benefits, and specific AI-powered techniques, supported by case studies and quantitative data, to provide a comprehensive picture of how AI is transforming software development.

What is Left-Shifting Quality in Software Development?

Left-shifting quality refers to the practice of integrating quality assurance (QA) processes earlier in the software development life cycle (SDLC), encompassing stages like design, coding, and initial testing, rather than postponing them until the later testing or deployment phases. This approach aligns with agile and DevOps methodologies, which emphasize continuous integration and delivery (CI/CD). By conducting tests early, teams can identify and address bugs and issues before they become entrenched in the codebase, thereby minimizing the need for extensive rework in subsequent stages.​

The financial implications of detecting defects at various stages of development are significant. For example, IBM's Systems Sciences Institute reported that fixing a bug discovered during implementation costs approximately six times more than addressing it during the design phase. Moreover, errors found after product release can be four to five times more expensive to fix than those identified during design, and errors caught only in the maintenance phase can cost up to 100 times more.

This substantial increase in cost underscores the critical importance of early detection. Artificial intelligence (AI) facilitates this proactive approach through automation and predictive analytics, enabling teams to identify potential issues swiftly and accurately, thereby enhancing overall software quality and reducing development costs.​

Benefits of Left-Shifting with AI

The benefits of left-shifting quality are significant, particularly when enhanced by AI, and are supported by quantitative data:

  • Early Bug Detection: Research consistently shows that addressing bugs early in the development process is significantly less costly than fixing them post-production. For instance, a 2022 report by the Consortium for Information & Software Quality (CISQ) found that software quality issues cost the U.S. economy an estimated $2.41 trillion, highlighting the immense financial impact of unresolved software defects. AI tools, by automating detection, can significantly reduce these costs.
  • Faster Development Cycles: Identifying issues early allows developers to make quick corrections, speeding up release cycles. For example, AI-driven CI/CD tools like Harness have been shown to reduce deployment time by 50%, enabling faster iterations (Harness case study).
  • Improved Code Quality: Regular quality checks at each stage, facilitated by AI, reinforce best practices and promote a culture of quality. Tools like CodeRabbit reduce code review time, improving developer productivity and code standards.
  • Cost Savings: The financial implications of software bugs are profound. For instance, in July 2024, a faulty software update from cybersecurity firm CrowdStrike led to a global outage, causing Delta Air Lines to cancel 7,000 flights over five days, affecting 1.3 million customers and resulting in losses exceeding $500 million. AI-driven early detection and remediation can help prevent such costly incidents.
  • Qualitative Improvements (Developer Well-being): AI tools like GitHub Copilot have shown potential to support developer well-being by improving productivity and reducing repetitive tasks, benefits that some studies link to increased job satisfaction. However, evidence on this front remains mixed. Other research points to potential downsides, such as increased cognitive load when debugging AI-generated code, concerns over long-term skill degradation, and even heightened frustration among developers. These conflicting findings highlight the need for more comprehensive, long-term studies on AI's true impact on developer experience.

Incorporating AI into software development processes offers significant advantages, but it’s crucial to balance these with an awareness of the potential challenges to fully realize its benefits.

AI-Powered Left-Shifting Techniques

AI offers a suite of techniques that enhance left-shifting quality, each addressing specific aspects of the SDLC. Below, we detail six key methods, supported by examples and data, explaining their internal workings, the challenges they face, and their impact on reducing cognitive load for developers.

1. Intelligent Code Review and Quality Analysis

Intelligent code review tools use AI to analyze code for quality, readability, and adherence to best practices, detecting issues like bugs, security vulnerabilities, and inefficiencies. Tools like CodeRabbit employ large language models (LLMs), such as GPT-4, to understand and analyze code changes in pull requests (PRs). Internally, CodeRabbit’s AI architecture is designed for context-aware analysis, integrating with static analysis tools like Semgrep for security checks and ESLint for style enforcement. The tool learns from team practices over time, adapting its recommendations to align with specific coding standards and preferences.

Challenges: A significant challenge is the potential for AI to misinterpret non-trivial business logic due to its lack of domain-specific knowledge. For instance, while CodeRabbit can detect syntax errors or common vulnerabilities, it may struggle with complex business rules or edge cases that require human understanding. Additionally, integrating such tools into existing workflows may require initial setup and adjustment, though CodeRabbit claims instant setup with no complex configuration.

Impact: By automating code reviews, tools like CodeRabbit reduce manual review time by up to 50%, allowing developers to focus on higher-level tasks. This not only saves time but also reduces cognitive load, as developers no longer need to manually scan through large PRs. A GitLab survey highlighted that manual code reviews are a top cause of developer burnout due to delays and inconsistent feedback. AI tools mitigate this by providing consistent, actionable feedback, improving productivity and reducing mental strain.

Case Study: At KeyValue Software Systems, implementing CodeRabbit reduced code review time by 90% for their Golang and Python projects, allowing developers to focus on feature development rather than repetitive review tasks.

2. Automated Unit Test Generation

Unit testing ensures that individual code components function correctly, but writing these tests manually can be time-consuming. AI tools automate this process by generating comprehensive test suites. Diffblue Cover, for example, uses reinforcement learning to create unit tests for Java code. Internally, Diffblue’s reinforcement learning agents interact with the code, learning to write tests that maximize coverage and reflect every behavior of methods. These agents are trained to understand method functionality and generate tests autonomously, even for complex scenarios.

Challenges: Handling large, complex codebases with numerous dependencies remains a challenge. Additionally, ensuring that generated tests are meaningful and not just covering trivial cases requires sophisticated algorithms. For instance, Diffblue Cover must balance test coverage with test relevance to avoid generating unnecessary or redundant tests.

Impact: Automated test generation saves developers significant time – Diffblue Cover claims to generate tests 250x faster than manual methods, increasing code coverage by 20%. This allows developers to focus on writing new code or fixing bugs rather than repetitive testing tasks. By reducing the need for manual test writing, these tools lower cognitive load, as developers can rely on AI to handle the tedious aspects of testing. A Diffblue case study showed a 90% reduction in test writing time, enabling teams to focus on higher-value tasks.

Case Study: A financial services firm using Diffblue Cover reported a 30% increase in test coverage and a 50% reduction in regression bugs within six months, significantly reducing the mental burden on developers during code changes.

3. Behavioral Testing and Automated UI Testing

Behavioral testing ensures software behaves as expected, while UI testing verifies functionality and appearance across devices and browsers. AI automates these processes, enhancing scalability and efficiency. Applitools, for instance, uses Visual AI to detect visual regressions by comparing screenshots of the UI with predefined baselines. Internally, Applitools captures screenshots and uses AI to analyze visual differences, identifying issues like layout shifts or color inconsistencies. It can handle dynamic content and supports cross-browser and cross-device testing.

Challenges: One challenge is handling dynamic UI elements that change based on user interactions or data. Ensuring that the AI correctly identifies meaningful visual differences while ignoring irrelevant ones, such as anti-aliasing or minor layout shifts, is crucial. Additionally, maintaining accurate baselines as the UI evolves can be resource-intensive.

Impact: Automated UI testing reduces manual testing effort by up to 50%, allowing QA teams to test more scenarios in less time. This leads to faster release cycles and reduces cognitive load on developers, as they can rely on automated tests to catch visual regressions.

Case Study: An e-commerce platform using Applitools reported a noticeable reduction in UI-related bugs post-release, as developers could confidently make UI changes without fear of introducing visual regressions.

4. Continuous Integration and Continuous Deployment (CI/CD) Automation

CI/CD pipelines automate the build, test, and deployment processes. AI enhances these pipelines by predicting failures and optimizing workflows. Harness, for example, uses AI to predict deployment failures based on historical data. Internally, Harness collects logs, metrics, and outcomes from previous deployments to train machine learning models that analyze patterns and predict potential issues. These models can identify risky deployments before they reach production.
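As a generic illustration of that predictive idea (explicitly not Harness's proprietary implementation), a classifier can be trained on features of past deployments and used to score the risk of a candidate release; the features and data below are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features of past deployments: changed files, lines touched, failed pre-prod tests,
# deploys in the last 24h. Labels: 1 = deployment failed or was rolled back.
X_history = np.array([
    [ 3,  120, 0, 2],
    [45, 5200, 3, 6],
    [ 8,  300, 0, 1],
    [60, 8000, 5, 9],
    [ 5,  150, 1, 3],
    [30, 2500, 2, 7],
])
y_history = np.array([0, 1, 0, 1, 0, 1])

risk_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_history, y_history)

candidate_release = np.array([[50, 6100, 4, 8]])
risk = risk_model.predict_proba(candidate_release)[0, 1]
print(f"Predicted failure risk: {risk:.0%}")  # gate the pipeline if risk is too high
```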

Challenges: Ensuring access to high-quality labeled data is essential, as deployments can be complex with multiple failure modes. Additionally, models must be updated regularly to account for changes in the codebase and environments. False positives or missed critical issues can undermine trust in the system.

Impact: By predicting deployment failures, Harness reduces deployment failures by up to 70%, saving time and resources. This reduces cognitive load on DevOps teams, as they no longer need to constantly monitor deployments and react to failures. Automated CI/CD pipelines also enable faster feedback loops, allowing developers to iterate more rapidly.

Case Study: A tech startup using Harness reported a 50% reduction in deployment-related incidents and a 30% increase in deployment frequency, as AI-driven predictions prevented problematic releases.

5. Intelligent Bug Tracking and Prioritization

Bug tracking is critical, but manual prioritization can be inefficient. AI automates detection and prioritization, enhancing resolution speed. Bugasura, for instance, uses AI to classify and prioritize bugs based on severity and impact. Internally, Bugasura likely employs machine learning models trained on historical bug data to classify new bugs and assign priorities. It may also use natural language processing to extract relevant information from bug reports.
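Since the vendor's internals are not public, the following is a generic sketch of the approach rather than Bugasura's implementation: a bug-report priority classifier built from TF-IDF features and logistic regression, trained on a toy labeled history.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny labeled history of bug reports (illustrative); real systems train on thousands.
reports = [
    "payment API returns 500 for all users",
    "checkout crashes when cart is empty",
    "typo in footer copyright text",
    "button color slightly off on settings page",
]
priorities = ["critical", "critical", "low", "low"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reports, priorities)

new_report = ["login endpoint times out intermittently for enterprise tenants"]
print(model.predict(new_report))  # predicted priority bucket for triage
```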

Challenges: Accurately classifying bugs, especially in complex systems with multiple causes or symptoms, is a significant challenge. Avoiding false positives and ensuring critical issues are not overlooked is crucial. Additionally, integrating with existing project management tools can introduce compatibility issues.

Impact: Intelligent bug tracking reduces the time spent on manual triage by up to 40%, allowing developers to focus on fixing the most critical issues first. This leads to faster resolution times and improved software quality. By automating prioritization, these tools reduce cognitive load, as developers no longer need to manually sort through bug reports.

Case Study: A SaaS company using Bugasura reduced their bug resolution time by 30% and improved customer satisfaction scores by 15%, as critical bugs were addressed more quickly.

6. Dependency Management and Vulnerability Detection

Managing dependencies and detecting vulnerabilities early is crucial for security. AI tools scan for risks and outdated dependencies without deploying agents. Wiz, for example, uses AI to analyze cloud environments for vulnerabilities. Internally, Wiz collects data from various cloud services (e.g., AWS, Azure, GCP) and uses machine learning models to identify misconfigurations, outdated software, and other security weaknesses. It analyzes relationships between components to uncover potential attack paths.

Challenges: Keeping up with the rapidly evolving cloud environments and constant updates to cloud services is a major challenge. Minimizing false positives while ensuring all critical vulnerabilities are detected is also important. Additionally, ensuring compliance with security standards across diverse environments can be complex.

Impact: Automated vulnerability detection reduces manual scanning efforts, allowing security teams to focus on remediation. By providing prioritized lists of vulnerabilities, these tools help manage workload effectively, reducing cognitive load. Wiz claims to reduce vulnerability identification time by 30%, enhancing overall security posture.

Case Study: A fintech firm using Wiz identified and patched 50% more critical vulnerabilities in their cloud environment compared to traditional methods, reducing their risk exposure significantly.

Conclusion

Left-shifting quality, enhanced by AI, is a critical strategy for modern software development, reducing costs, improving quality, and accelerating delivery. AI-powered tools automate and optimize QA processes, from code review to vulnerability detection, enabling teams to catch issues early and deliver brilliance. As AI continues to evolve, with trends like generative AI for test generation and predictive analytics, the future promises even greater efficiency. Organizations adopting these techniques can transform their development processes, achieving both speed and excellence.

What is LLMOps, MLOps for large language models, and their purpose

Why the transfer learning of large language models needs to be managed, and what that management involves: an introduction to LLMOps, the MLOps extension for LLMs.

How did LLMOps come to be? 

Large language models (LLMs), embodied in generative neural networks such as ChatGPT and its analogues, have become the defining technology of the past year and are already actively used by both individuals and large companies. However, training LLMs and putting them into industrial use must be managed in the same way as any other ML system. MLOps has become the established practice for this, aimed at eliminating organizational and technological gaps between everyone involved in developing, deploying, and operating machine learning systems.

As GPT-style networks grow in popularity and are embedded in more applications, the principles and technologies of MLOps need to be adapted to the transfer learning used in generative models. Language models are becoming too large and complex to maintain and manage manually, which raises costs and reduces productivity. LLMOps, a branch of MLOps that oversees the LLM lifecycle from training to maintenance with dedicated tools and methodologies, addresses this.

LLMOps focuses on the operational capabilities and infrastructure required to fine-tune existing base models and deploy these improved models as part of a product. Base language models are huge (GPT-3, for example, has 175 billion parameters) and require vast amounts of data and compute to train; training GPT-3 on a single NVIDIA Tesla V100 GPU would take over 350 years. An infrastructure that can run GPU machines in parallel and process huge datasets is therefore essential. LLM inference is also far more resource-intensive than traditional machine learning, because a deployed application is often not a single model but a chain of models.

LLMOps provides developers with the necessary tools and best practices for managing the LLM development lifecycle. While the ideas behind LLMOps are largely the same as MLOps, large base language models require new methods, guidelines, and tools. For example, Apache Spark in Databricks works great for traditional machine learning, but it is not suitable for fine-tuning LLMs.

LLMOps focuses specifically on fine-tuning base models, since modern LLMs are rarely trained entirely from scratch. Modern LLMs are typically consumed as a service, where a provider such as OpenAI, Google AI, etc. offers an API of the LLM hosted on their infrastructure as a service. However, there is also a custom LLM stack, a broad category of tools for fine-tuning and deploying custom solutions built on top of open-source GPT models. The fine-tuning process starts with an already trained base model, which then needs to be trained on a more specific and smaller dataset to create a custom model. Once this custom model is deployed, queries are sent and the corresponding completion information is returned. Monitoring and retraining a model is essential to ensure its consistent performance, especially for LLM-driven AI systems.

Prompt engineering tools allow in-context learning to be performed faster and more cheaply than fine-tuning, without requiring sensitive training data. In this approach, vector databases retrieve contextually relevant information for specific queries, and carefully constructed prompts optimize and improve model output through patterns and chaining.
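A rough sketch of that retrieval-plus-prompting pattern follows; the embed and call_llm functions are placeholders for whatever embedding model and LLM the stack uses, and the in-memory index stands in for a real vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding-model call."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

# Toy in-memory "vector database": (embedding, document) pairs.
index: list[tuple[np.ndarray, str]] = []

def add_document(doc: str) -> None:
    index.append((embed(doc), doc))

def retrieve(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    # Rank stored documents by similarity to the query embedding.
    scored = sorted(index, key=lambda item: -float(np.dot(item[0], q)))
    return [doc for _, doc in scored[:k]]

def answer_with_context(question: str) -> str:
    context = "\n".join(retrieve(question))
    # Contextual (in-context) learning: ground the prompt in retrieved documents
    # instead of fine-tuning the base model.
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```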

Similarities and differences with MLOps

In summary, LLMOps facilitates the practical application of LLM by incorporating operational management, LLM chaining, monitoring, and observation techniques that are not typically found in conventional MLOps. In particular, prompts are the primary means by which humans interact with LLMs. However, formulating a precise query is not a one-time process, but is typically performed iteratively, over several attempts, to achieve a satisfactory result. LLMOps tools offer features to track and version prompts and their results. This facilitates the evaluation of the overall performance of the model, including operational work with multiple LLMs.

LLM chaining links multiple LLM invocations in sequence to provide a single application function: the output of one LLM invocation serves as the input to another to produce the final result. This design pattern offers an innovative way to develop AI applications by breaking complex tasks into smaller steps, and it works around the inherent limit on the number of tokens an LLM can process at once. LLMOps simplifies chaining management and combines it with other document retrieval methods, such as vector database access.
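A minimal sketch of what chaining means in practice is shown below; call_llm is a placeholder for whichever hosted or self-hosted model is used, and frameworks such as LangChain wrap this same pattern with prompt templates, retries, and tracing:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a hosted or self-hosted LLM completion call."""
    raise NotImplementedError("Wire this to your model provider.")

def summarize(document: str) -> str:
    # Step 1: condense a long document so it fits the next prompt's context window.
    return call_llm(f"Summarize the following document in 5 bullet points:\n{document}")

def answer_question(question: str, summary: str) -> str:
    # Step 2: the first call's output becomes the second call's input.
    return call_llm(f"Using only this summary:\n{summary}\n\nAnswer: {question}")

def qa_chain(document: str, question: str) -> str:
    return answer_question(question, summarize(document))
```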

LLMOps’s LLM monitoring system collects real-time data points after a model is deployed to detect degradation in its performance. Continuous, real-time monitoring allows you to quickly identify, troubleshoot, and resolve performance issues before they affect end users. Specifically, prompts, tokens and their length, processing time, inference latency, and user metadata are monitored. This allows you to notice overfitting or changing the underlying model before performance actually degrades.

Monitoring models for drift and bias is also critical. Drift is a common problem in traditional machine learning models, as we have written about before, but monitoring LLM solutions with LLMOps is even more important because of their reliance on underlying base models. Bias can arise from the original datasets on which the base model was trained, from custom datasets used for fine-tuning, or even from the human evaluators judging prompt completions. A thorough evaluation and monitoring system is needed to remove bias effectively.

LLMs are difficult to evaluate with traditional machine learning metrics because there is often no single "right" answer. LLMOps therefore leans on human feedback, incorporating it into testing and monitoring, and collecting it for use in future fine-tuning.

Finally, LLMOps and MLOps differ in how applications are designed and developed. LLMOps workflows are built to move fast, typically starting from existing proprietary or open-source models and ending with custom fine-tuned models, whereas traditional MLOps projects iterate toward fully trained models built on curated data.

Despite these differences, LLMOps is still a subset of MLOps. That’s why the authors of The Big Book of MLOps from Databricks have included the term in the second edition of this collection, which provides guiding principles, design considerations, and reference architectures for MLOps.