RISC-V Chip Adoption in China Driven by a Strategic Policy Set to Launch in 2025

In a landmark move poised to reshape its technological landscape, China is gearing up to launch its inaugural national policy championing the adoption of RISC-V chips. This strategic initiative, slated for release as early as March 2025, marks a significant step in the country’s quest to pivot away from Western-dominated semiconductor technologies and bolster its homegrown innovation amid escalating global tensions.

Insiders familiar with the development reveal that the policy has been meticulously crafted through a collaborative effort involving eight key government entities. Among them are heavyweights like the Cyberspace Administration of China, the Ministry of Industry and Information Technology, the Ministry of Science and Technology, and the China National Intellectual Property Administration. Together, these bodies aim to cement RISC-V’s role as a cornerstone of China’s burgeoning tech ecosystem, fostering an environment ripe for domestic chip development and deployment.

The mere whisper of this policy has already sent ripples through the financial markets, igniting a wave of optimism among investors. On the day of the leak, Chinese semiconductor stocks staged an impressive rally. The CSI All-Share Semiconductor Products and Equipment Index, which had been languishing earlier, reversed course to surge by as much as 2.5%. Standout performers included VeriSilicon, which hit its daily trading cap with a 10% spike, alongside ASR Microelectronics, Shanghai Anlogic Infotech, and 3Peak, whose shares soared between 8.6% and an eye-catching 15.4% in afternoon trading.

At the heart of this policy push lies RISC-V, an open-source chip architecture that’s steadily carving out a global niche as a versatile, cost-effective rival to proprietary giants like Intel’s x86 and Arm Holdings’ microprocessor designs. Unlike its high-powered counterparts, RISC-V is often deployed in less demanding applications—think smartphones, IoT devices, and even AI servers—making it a pragmatic choice for a wide swath of industries. In China, its allure is twofold: slashed development costs and, critically, its freedom from reliance on U.S.-based firms, a factor that’s taken on heightened urgency amid trade restrictions and geopolitical friction.

Until now, RISC-V’s rise in China has been organic, driven by market forces rather than official mandates. This forthcoming policy changes the game, thrusting the architecture into the spotlight as a linchpin of Beijing’s broader campaign to achieve technological self-sufficiency. The timing is no coincidence—U.S.-China relations remain strained, with American policymakers sounding alarms over China’s growing leverage in the RISC-V space. Some U.S. lawmakers have even pushed to curb American companies’ contributions to the open-source platform, fearing it could turbocharge China’s semiconductor ambitions.

China’s RISC-V ecosystem is already buzzing with activity, spearheaded by homegrown innovators like Alibaba’s XuanTie division and rising star Nuclei System Technology, both of which have rolled out commercially viable RISC-V processors. The architecture’s flexibility is proving especially attractive in the AI sector, where models like DeepSeek thrive on efficient, lower-end chips. For smaller firms chasing affordable AI solutions, RISC-V offers a tantalizing blend of performance and price—a trend that could gain serious momentum under the new policy.

Sun Haitao, a manager at China Mobile System Integration, underscored the pragmatic appeal of RISC-V in a recent statement. “Even if these chips deliver just 30% of the performance of top-tier processors from NVIDIA or Huawei,” he noted, “their cost-effectiveness becomes undeniable when you scale them across multiple units.” This scalability could prove transformative for industries looking to maximize output without breaking the bank.

As China prepares to roll out this groundbreaking policy, the global tech community is watching closely. For Beijing, it’s a calculated gambit to secure its place at the forefront of the semiconductor race—one that could redefine the balance of power in a world increasingly divided by technology.

Opinion: AI Will Never Gain Consciousness

Artificial intelligence will never become a conscious being because it lacks the aspirations inherent in humans and other biological species. This statement was made by Sandeep Nailwal, co-founder of Polygon and the AI company Sentient, in a conversation with Cointelegraph.

The expert does not believe in a doomsday scenario in which artificial intelligence gains consciousness and seizes power over humanity.

Nailwal was also critical of the theory that consciousness arises accidentally as a result of complex chemical interactions or processes. While such processes can lead to the emergence of complex cells, they do not account for the emergence of consciousness, the entrepreneur noted.

The co-founder of Polygon also expressed concerns about the risks of surveillance of people and the restriction of freedoms by centralized institutions with the help of artificial intelligence. Therefore, AI should be transparent and democratic, he believes.

“[…] Ultimately, global AI, which can create a world without borders, must be controlled by every person,” Nailwal emphasized.

He added that everyone should have a personal artificial intelligence that is loyal and protects against the neural networks of influential corporations.

Recall that in January, Simon Kim, CEO of the crypto venture fund Hashed, expressed confidence that the future of artificial intelligence depends on a radical shift: opening the “black box” of centralized models and creating a decentralized, transparent ecosystem on the blockchain.

Agentic AI: Pioneering Autonomy and Transforming Business Landscapes

Let’s make tech work for us.

The new autonomous systems labelled AI agents represent the latest evolution of AI technology and mark a new era in business. AI agents, in contrast to traditional AI models that simply follow the commands given to them and emit outputs in a specific format, work with a certain level of freedom. According to Google, these agents are capable of functioning on their own, without needing constant human supervision. The World Economic Forum describes them as systems that have sensors to perceive their environment and effectors to act on it. AI agents are expected to transform industries as they evolve from rigid, rule-based frameworks to sophisticated models adept at intricate decision-making. With unprecedented autonomy comes equally unprecedented responsibility. The additional benefits agentic AI technology brings are accompanied by unique challenges that invite careful consideration, planning, governance, and foresight.

The Mechanics of AI Agents: A Deeper Dive

Traditional AI tools, such as Generative AI (GenAI) or predictive analytics platforms, rely on predefined instructions or prompts to deliver results. In contrast, AI agents exhibit dynamic adaptability, responding to real-time data and executing multifaceted tasks with minimal oversight. Their functionality hinges on a trio of essential components:

  • Foundational AI Model: At the heart of an AI agent lies a powerful large language model (LLM), such as GPT-4, LLama, or Gemini, which provides the computational intelligence needed for understanding and generating responses.
  • Orchestration Layer: This layer serves as the agent’s “brain,” managing reasoning, planning, and task execution. It employs advanced frameworks like ReAct (Reasoning and Acting) or Chain-of-Thought prompting, enabling the agent to decompose complex problems into logical steps, evaluate outcomes, and adjust strategies dynamically—mimicking human problem-solving processes.
  • External Interaction Tools: These tools empower agents to engage with the outside world, bridging the gap between digital intelligence and practical application. They include:
    • Extensions: Enable direct interaction with APIs and services, allowing agents to retrieve live data (e.g., weather updates or stock prices) or perform actions like sending emails.
    • Functions: Offer a structured mechanism for agents to propose actions executed on the client side, giving developers fine-tuned control over outputs.
    • Data Stores: Provide access to current, external information beyond the agent’s initial training dataset, enhancing decision-making accuracy.

This architecture transforms AI agents into versatile systems capable of navigating real-world complexities with remarkable autonomy.
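
To make these pieces concrete, here is a minimal sketch of how a foundational model, an orchestration loop, and a tool registry might fit together. The `call_llm` stub and the `get_weather` tool are hypothetical placeholders, not any specific vendor’s API.

```python
# Minimal sketch of a ReAct-style agent loop. call_llm and the tools are
# hypothetical stand-ins, not any specific vendor API.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for the foundational model; a real agent would call an LLM API here."""
    if "weather" in prompt and "->" not in prompt:
        return '{"tool": "get_weather", "args": {"city": "Lisbon"}}'
    return '{"answer": "It is 22C and sunny in Lisbon."}'

# External interaction tools: extensions the agent may invoke for live data or actions.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Orchestration layer: reason about the task, pick a tool, observe, repeat."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        decision = json.loads(call_llm(
            history + 'Reply as JSON: {"tool": ..., "args": {...}} or {"answer": ...}'
        ))
        if "answer" in decision:
            return decision["answer"]
        observation = TOOLS[decision["tool"]](**decision["args"])
        history += f"Called {decision['tool']} -> {observation}\n"
    return "Stopped: step limit reached."

print(run_agent("What is the weather in Lisbon?"))
```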

Multi-Agent Systems: The Newest Frontier

The Multi-Agent System (MAS) market is in for tremendous growth: McKinsey projects a staggering growth rate of nearly 28% by 2030. Bloomberg recently predicted that AI breakthroughs will soon give rise to multi-agent systems, collaborative networks of AI agents working together towards ambitious objectives. These systems promise scalability beyond what single agents can deliver.

To picture this, imagine a smart city in which multiple AI agents work alongside each other: one controls the traffic signals, another manages traffic-directing units, and a third helps reroute emergency responders. All of this happens in real time.

Governance is key here to prevent systemic failure: conflicting commands can cause paralysis or outright dysfunction. Multi-agent systems should be granted a degree of standardized freedom, but their benefits are only assured when the protocols they must follow are clearly prescribed and adhered to.
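
To make the governance point concrete, here is a toy sketch, under assumed agent names and rules, of a coordinator that arbitrates conflicting commands so that only one instruction per intersection ever reaches the infrastructure.

```python
# Toy sketch: a coordinator arbitrates between agents so conflicting commands
# never reach the infrastructure (agent names and rules are hypothetical).
from dataclasses import dataclass

@dataclass
class Command:
    agent: str
    intersection: str
    action: str        # e.g. "hold_green", "clear_for_ambulance"
    priority: int      # higher wins

class Coordinator:
    def resolve(self, commands: list[Command]) -> dict[str, Command]:
        chosen: dict[str, Command] = {}
        for cmd in commands:
            current = chosen.get(cmd.intersection)
            # Governance rule: only the highest-priority command per intersection executes.
            if current is None or cmd.priority > current.priority:
                chosen[cmd.intersection] = cmd
        return chosen

commands = [
    Command("signal_agent", "5th_and_main", "hold_green", priority=1),
    Command("emergency_agent", "5th_and_main", "clear_for_ambulance", priority=10),
]
print(Coordinator().resolve(commands))  # the emergency rerouting command wins
```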

Opportunities and Challenges

The potential of AI agents is game-changing, but their independence raises serious concerns. The World Economic Forum highlights several challenges that companies must address:

  • Risks Associated with Autonomy: Ensuring safety and reliability becomes harder as agents grow more independent. An unmonitored agent could, for example, execute a resource allocation that triggers operational failures with cascading effects.
  • Lack of Accountability: Trust is already fragile because of opaque, “black box” reasoning, and it becomes even more critical in high-risk healthcare or finance settings. Ensuring transparency and accountability is non-negotiable.
  • Risks Surrounding Privacy and Security: Agents handle large amounts of sensitive information, which puts trust in jeopardy. An agent that can only function effectively with access to many sensitive systems and datasets raises the question: how do we grant sufficient permissions without compromising security? Strong policies are needed to enforce standards that protect sensitive data and privacy while preventing breaches.

Some of these risks can be guarded against with proactive measures such as continuous monitoring, adherence to ethical AI principles, and human-in-the-loop oversight of vital AI decisions. Organizations also need auditing tools that monitor agent behavior and correct its course when it deviates, so that control is retained and organizational goals are maintained.

The Human-AI Partnership

Even though AI agents can function independently, their purpose is not to replace human reasoning but to augment it. The EU AI Act reminds us of the necessity of human intervention in sensitive processes such as security or legal compliance. The best situation is one where humans and machines work together: agents perform the monotonous, repetitive work of processing large amounts of data, which frees humans to be more strategic, creative, and ethical.

In a logistics company, for instance, an AI agent might autonomously optimize delivery routes using traffic information, while a manager applies judgment and approves the AI’s plan in light of customer preferences or other unforeseen factors. This preserves human control and supervision while efficiency is enhanced.
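
A minimal sketch of that hand-off, with illustrative function names (nothing here is a real routing library): the agent proposes a plan, and a human approves or overrides it before anything is executed.

```python
# Sketch: an agent proposes an optimized route, a human approves or overrides it.
# propose_route() stands in for the agent's optimizer; it is not a real library call.
def propose_route(stops: list[str]) -> list[str]:
    """Hypothetical agent step: reorder stops using live traffic data."""
    return sorted(stops)  # stand-in for a real optimization

def human_review(plan: list[str]) -> list[str]:
    print("Proposed route:", " -> ".join(plan))
    decision = input("Approve? [y/n] ")
    if decision.lower() == "y":
        return plan
    # The manager applies judgment the agent lacks (customer preferences, exceptions).
    return input("Enter corrected route, comma-separated: ").split(",")

final_route = human_review(propose_route(["Depot", "Customer B", "Customer A"]))
```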

Guidelines for Implementing Agentic AI Strategically

Both Google and the World Economic Forum converge on a central idea: the responsible use of AI agents can result in outstanding value creation and unparalleled innovation. To reap that value at manageable risk, businesses need to adopt the following practices:

  • Develop Skills: Train the workforce to build, implement, and administer AI agents so the technology is applied effectively.
  • AI Ethics: Develop business governance frameworks that adhere to international benchmarks, such as the EU AI Act, requiring agents to operate fairly and accountably.
  • Ethical Boundaries: Delegated agent discretion must come with safeguards; establish explicit controls to prevent overreach or unauthorized lateral decision-making.
  • Validation Checks: Keep agent behavior aligned with organizational needs through active auditing, stress testing, and ongoing refinement of value objectives.


Final thoughts

By integrating reasoning and planning, agentic AI gains the ability to act on its own, and AI agents mark a pivotal leap in the evolution of artificial intelligence. Their potential to change fields like personalized healthcare and smart cities is phenomenal, but deploying AI carelessly is a grave mistake. For AI agents to be dependable companions, trust and security must anchor their development.

Organizations that find the right balance, enabling agents to innovate while maintaining human supervision, will be the ones leading the charge in this technological revolution. Agentic AI is not just another business tool; it is a paradigm shift that re-imagines autonomy. That future will belong to those who embrace its potential with clarity and caution.

Grok Names Elon Musk as the Main Disinformer

Elon Musk is the main disseminator of disinformation on X, according to Grok, the AI assistant from the entrepreneur’s startup xAI that is integrated into his social network.

The billionaire has a huge audience and often spreads false information on various topics, the chatbot claims. Other disinformers named by the neural network include Donald Trump, Robert F. Kennedy Jr., Alex Jones, and RT (Russian television).

Trump shares false claims about the election, Kennedy Jr. – about vaccines, and Alex Jones is known for spreading conspiracy theories. Russian television lies about political issues, Grok added.

Grok’s Top Disseminators of Disinformation. Data: X.

The chatbot cited Rolling Stone, The Guardian, NPR, and NewsGuard as sources of information.

“The selection process involved analyzing multiple sources, including academic research, fact-checking organizations, and media reports, to identify those with significant influence and a history of spreading false or misleading information,” the AI noted.

The criteria for compiling the rankings included the volume of false information spread, the number of followers, and mentions in credible reports.

When asked for clarification, Grok noted that the findings may be biased because the sources provided are mostly related to the funding or opinions of Democrats and liberals.

Recall that in January, artificial intelligence was used to spread fake news about the fires in Southern California.

A similar situation arose after Hurricane Helene.

Google Unveils Memory Feature for Gemini AI Chatbot

Google has launched a notable update to its Gemini AI chatbot, equipping it with the ability to remember details from previous conversations, a development experts are calling a major advancement.

In a blog post released on Thursday, Google detailed how this new capability allows Gemini to store information from earlier chats, provide summaries of past discussions, and craft responses tailored to what it has learned over time.

This upgrade eliminates the need for users to restate information they’ve already provided or sift through old messages to retrieve details. By drawing on prior interactions, Gemini can now deliver answers that are more relevant, cohesive, and enriched with additional context pulled from its memory. This results in smoother, more personalized exchanges that feel less fragmented and more like a continuous dialogue.

Rollout Plans and Broader Access
The memory feature is first being introduced to English-speaking users subscribed to Google One AI Premium, a $20 monthly plan offering enhanced AI tools. Google plans to extend this functionality to more languages in the near future and will soon bring it to business users via Google Workspace Business and Enterprise plans.

Tackling Privacy and User Control
While the ability to recall conversations offers convenience, it may raise eyebrows among those concerned about data privacy. To address this, Google has built in several options for users to oversee their chat data. Through the “My Activity” section in Gemini, individuals can view their stored conversations, remove specific entries, or decide how long data is kept. For those who prefer not to use the feature at all, it can be fully turned off, giving users complete authority over what the AI retains.

Google has also made it clear that it won’t use these stored chats to refine its AI models, putting to rest worries about data being repurposed.

The Race to Enhance AI Memory

Google isn’t alone in its efforts to boost chatbot memory. OpenAI’s Sam Altman has highlighted that better recall is a top demand from ChatGPT users. Over the last year, both companies have rolled out features letting their AIs remember things like a user’s favorite travel options, food preferences, or even their preferred tone of address. Until now, though, these memory tools have been fairly limited and didn’t automatically preserve entire conversation histories.

Gemini’s new recall ability marks a leap toward more fluid and insightful AI exchanges. By keeping track of past talks, it lets users pick up where they left off without losing the thread, proving especially handy for long-term tasks or recurring questions.

As this feature spreads to more users, Google underscores its commitment to transparency and control, ensuring people can easily manage, erase, or opt out of data retention altogether.

Sam Altman talks about the features of GPT-4.5 and GPT-5

OpenAI CEO Sam Altman shared the startup’s plans to release GPT-4.5 and GPT-5 models. The company aims to simplify its product offerings by making them more intuitive for users.

Altman acknowledged that the current product line has become too complex, and OpenAI is looking to change that.

“We hate model selection as much as you do and want to get back to magical unified intelligence,” he wrote.

GPT-4.5, codenamed Orion, will be the startup’s last AI model without a “chain of reasoning” mechanism. The next step is to move toward more integrated solutions.

The company plans to combine the o and GPT series models, creating systems capable of:

  • using all available tools;
  • independently determining when deep thinking is needed and when an instant solution is enough;
  • adapting to a wide range of tasks.

GPT-5 will integrate various technologies, including o3. Other innovations will include Canvas mode, search, Deep Research, and much more.

Free-tier subscribers will get unlimited access to GPT-5’s tools on standard settings. Plus and Pro account holders will be able to use advanced features with a higher level of intelligence.

Regarding the release dates of GPT-4.5 and GPT-5, Altman wrote in the comments to the tweet about “weeks” and “months”, respectively.

According to Elon Musk, ChatGPT’s competitor Grok 3 is in the final stages of development and will be released in one to two weeks, Reuters reports.

“Grok 3 has very powerful reasoning capabilities, so in the tests we’ve done so far, Grok 3 outperforms all the models that we know of, so that’s a good sign,” the entrepreneur said during a speech at the World Governments Summit in Dubai.

Recall that Altman turned down a $97.4 billion bid from Musk and a group of investors to buy the non-profit that controls OpenAI. The startup’s CEO suggested the offer was an attempt to “slow down” a competing project.

The Application of AI towards Real Time Fraud Detection on Digital Payments

The growth of the internet, coupled with advanced digital communication systems, has greatly transformed the global economy, especially in the area of commerce. Fraud attempts, meanwhile, have become more diverse and sophisticated over time, costing businesses and financial institutions millions of dollars each year. Fraud detection has evolved in response, from unsophisticated manual processes, through automated rule-based methods, to today’s intelligent systems. Artificial intelligence (AI) now assists in both controlling and combating fraud, helping to advance the financial technology (fintech) sector. In this article, we explain the mechanics of AI in digital payments fraud detection, focusing on the technical aspects, a real case, and relevant observations for mid-level AI engineers, product managers, and other fintech professionals.

The Increased Importance of Identifying Fraud In Real-Time

The volume and complexity of digital payments, including credit card transactions, P2P app payments, A2A payments, and others, continue to rise. Juniper Research estimates that between 2023 and 2028 the cost of online payment fraud will climb beyond $362 billion globally. Automated and social-engineering attacks exploit weaknesses such as stolen credentials and synthetic identities, often striking within moments. Outdated fraud detection methods that depend on static rules (‘flag transactions over $10,000’) are ineffective against these fast-paced threats. Review systems are overloaded, frustrated customers worsen the problem, and undetected fraud continues to sail through.

AI changes this. With machine learning, deep learning, and real-time data processing, AI can evaluate large amounts of data, recognize patterns, adapt to changes, and detect anomalies, all in a matter of milliseconds. For fintech professionals, this shift is both an opportunity and a challenge: build systems that are accurate, fast, and scalable, all while reducing customer friction.

How AI-Fueled Real-Time Fraud Detection Works

AI-enhanced fraud detection is supported by three tiers: data, algorithms, and real-time execution. Let’s simplify this concept for a mid-level AI engineering or product management team. 

The Underlying Information: Any front-line fraud detection system must couple each payment transaction, generated in real time, with rich, high-quality data. This means diverse data: transaction histories, user behavior profiles, device fingerprints, IP geolocation, and external sources such as chatter from the dark web. For instance, a transaction attempted from a new device in a foreign country can be flagged as suspicious when combined with the user’s baseline spending patterns. AI systems pull this data through streaming services such as Apache Kafka or cloud-native solutions like AWS Kinesis, which promise low latency. Data engineers must invest in clean, well-structured datasets, because the system performs poorly when the data it receives lacks granularity; that is a lesson I have learned repeatedly over the past twenty years.
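
As a rough illustration of this ingestion layer, here is a sketch using the kafka-python client; the topic name, message schema, and the toy “new device plus unusual amount” feature are assumptions, not a production design.

```python
# Sketch: consuming payment events from a Kafka topic and joining them with
# a simple per-user behaviour profile (topic name and features are assumptions).
import json
from collections import defaultdict
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "transactions",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

user_profiles = defaultdict(lambda: {"count": 0, "total": 0.0, "devices": set()})

for message in consumer:
    txn = message.value                   # e.g. {"user": ..., "amount": ..., "device": ...}
    profile = user_profiles[txn["user"]]
    # Feature: is this a new device combined with an unusually large amount for this user?
    avg = profile["total"] / profile["count"] if profile["count"] else 0.0
    new_device = txn["device"] not in profile["devices"]
    suspicious = new_device and profile["count"] > 5 and txn["amount"] > 3 * avg
    # Update the running profile before the next event arrives.
    profile["count"] += 1
    profile["total"] += txn["amount"]
    profile["devices"].add(txn["device"])
    if suspicious:
        print("flag for review:", txn)
```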

Algorithms: Machine learning models are the backbone of AI fraud detection. Supervised models work with labeled datasets (e.g., “fraud” vs. “legitimate”) and are proficient at recognizing established fraud patterns. Because of their accuracy and interpretability, Random Forests and Gradient Boosting Machines (GBMs) are among the most popular choices. Unfortunately, fraud evolves much faster than data can be labeled, and this is where unsupervised learning comes in. Clustering algorithms such as DBSCAN, or autoencoders, need no prior examples and can pull unusual transactions for review. For example, even in the absence of historical fraud signatures, a sudden spike in small, rapid transfers can be flagged because it might indicate money laundering. Detection is further improved by deep learning models, such as recurrent neural networks (RNNs), that scan time-series data (e.g., transaction timestamps) for hidden patterns and relationships.
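
A small sketch of the unsupervised side of this, using scikit-learn’s Isolation Forest on synthetic features; a real system would tune the contamination rate and alert thresholds against its review capacity.

```python
# Sketch: unsupervised anomaly scoring with an Isolation Forest
# (synthetic features; thresholds would be tuned against review capacity).
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [amount, seconds_since_last_txn, txns_in_last_hour]
rng = np.random.default_rng(0)
normal = np.column_stack([
    rng.normal(60, 20, 1000),      # typical purchase amounts
    rng.normal(3600, 600, 1000),   # roughly hourly activity
    rng.poisson(2, 1000),          # a few transactions per hour
])
burst = np.array([[5.0, 10.0, 40.0]] * 5)   # many small, rapid transfers
X = np.vstack([normal, burst])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
scores = model.decision_function(X)          # lower = more anomalous
flagged = X[scores < 0]
print(f"{len(flagged)} transactions flagged for manual review")
```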

Execution In Real-Time: Time is of the essence with digital payments. Payment systems must decide whether to approve, decline, or escalate a transaction in less than 100 milliseconds. This is achievable only with distributed computing frameworks such as Apache Spark for batch processing and Apache Flink for real-time stream analysis. Inference is scaled on GPU-accelerated hardware (e.g., NVIDIA CUDA), allowing systems to handle thousands of transactions every second. Product managers should remember that latency trade-offs worsen as model complexity increases; a simple logistic regression may be suitable for low-risk scenarios, while high-precision cases require complex neural networks.
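
One way to picture that trade-off is a risk-tiered router that spends extra inference time only on borderline cases; the scoring functions and thresholds below are illustrative stand-ins, not a production policy.

```python
# Sketch: risk-tiered routing to keep scoring inside a latency budget
# (model stand-ins and thresholds are illustrative, not a production policy).
import time

def cheap_score(txn) -> float:        # e.g. a logistic regression, sub-millisecond
    return 0.01 * txn["amount"] / 100

def deep_score(txn) -> float:         # e.g. a neural network, slower but more precise
    time.sleep(0.02)                  # simulate heavier inference
    return 0.6 if txn["new_device"] else 0.05

def decide(txn, budget_ms: float = 100.0) -> str:
    start = time.perf_counter()
    risk = cheap_score(txn)
    # Only spend the extra latency on borderline cases, and only if budget remains.
    if 0.2 < risk < 0.8 and (time.perf_counter() - start) * 1000 < budget_ms - 30:
        risk = deep_score(txn)
    return "decline" if risk > 0.8 else "review" if risk > 0.5 else "approve"

print(decide({"amount": 2500.0, "new_device": True}))  # escalated to the deep model
```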

Real-World Case Study: PayPal’s AI-Driven Fraud Detection

To illustrate AI’s impact, consider PayPal, a fintech giant processing over 22 billion transactions annually. In the early 2010s, PayPal faced escalating payment fraud, including account takeovers and stolen card usage. Traditional rule-based systems flagged too many false positives, alienating users, while missing sophisticated attacks. By 2015, PayPal had fully embraced AI, integrating real-time ML models to combat fraud – a strategy we’ve seen replicated across the industry.

PayPal’s approach combines supervised and unsupervised learning. Supervised models analyze historical transaction data—device IDs, IP addresses, email patterns, and purchase amounts—to assign fraud probability scores. Unsupervised models detect anomalies, such as multiple login attempts from disparate locations or unusual order sizes (e.g., shipping dozens of items to one address with different cards). Real-time data feeds from user interactions and external sources (e.g., compromised credential lists) enhance these models’ accuracy.

Numbers: According to PayPal’s public reports and industry analyses, their AI system reduced fraud losses by 30% within two years of deployment, dropping fraud rates to below 0.32% of transaction volume—a benchmark in fintech. False positives fell by 25%, improving customer satisfaction, while chargeback rates declined by 15%. These gains stemmed from processing 80% of transactions in under 50 milliseconds, enabled by a hybrid cloud infrastructure and optimized ML pipelines. For AI engineers, PayPal’s use of ensemble models (combining decision trees and neural networks) offers a practical lesson in balancing precision and recall in high-stakes environments.

Technical Challenges and Solutions

Implementing AI for real-time fraud detection isn’t without hurdles. Here’s how to address them:

  • Data Privacy and Compliance: Regulations like GDPR and CCPA mandate strict data handling. Techniques like federated learning (training models locally on user devices) minimize exposure, while synthetic data generation (via GANs) augments training sets without compromising privacy.
  • Model Drift: Fraud patterns shift, degrading model performance. Continuous retraining with online learning algorithms (e.g., stochastic gradient descent) keeps models current, and monitoring metrics like precision, recall, and F1-score ensures drift is caught early; see the sketch after this list.
  • Scalability: As transaction volumes grow, so must your system. Distributed architectures (e.g., Kubernetes clusters) and serverless computing (e.g., AWS Lambda) provide elastic scaling. Optimize inference with model pruning or quantization to reduce latency on commodity hardware.
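
The drift sketch mentioned above might look something like this: an incremental model updated with `partial_fit` and a per-batch F1 check on incoming data. The feature layout, labels, and alert threshold are assumptions for illustration.

```python
# Sketch: incremental retraining and simple drift monitoring
# (feature layout, labels, and the alert threshold are assumptions).
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])  # 0 = legitimate, 1 = fraud

def process_batch(X: np.ndarray, y: np.ndarray, threshold: float = 0.6) -> None:
    # Score the incoming batch with the current model before learning from it.
    if hasattr(model, "coef_"):
        f1 = f1_score(y, model.predict(X), zero_division=0)
        if f1 < threshold:
            print(f"possible drift: batch F1 dropped to {f1:.2f}")
    # Online update keeps the model current as fraud patterns shift.
    model.partial_fit(X, y, classes=classes)

rng = np.random.default_rng(1)
for _ in range(3):
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 1).astype(int)
    process_batch(X, y)
```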

The Future of AI in Fraud Detection

Whatever the future holds, it’s clear that AI’s role will only become more pronounced. Generative AI, such as large language models (LLMs), could develop new methods of simulating fraud, while blockchain technology could help guarantee that the ledger’s transaction records are safe from modification. Identity verification through biometric face detection and voice recognition will limit synthetic identity fraud.

As noted previously, the speed, accuracy, and adaptability of AI in real-time fraud detection let it pinpoint and eliminate issues in digital payments that rule-based systems cannot. PayPal’s success is evidence of this capability, but the journey is not easy and requires discipline along with a well-planned approach. For AI engineers, product managers, and fintech professionals, moving into this space is no longer purely a career move; it is an opportunity to build a safer financial system for all.

What is LLMOps, MLOps for large language models, and their purpose

Why the transfer learning of large language models needs to be managed, and what that management includes: an introduction to the MLOps extension for LLMs known as LLMOps.

How did LLMOps come to be? 

Large language models, embodied in generative neural networks such as ChatGPT and its analogues, have become the defining technology of the past year and are already actively used in practice by both individuals and large companies. However, the process of training LLMs (Large Language Models) and putting them into industrial use must be managed like any other ML system. A good practice for this is the MLOps concept, aimed at eliminating organizational and technological gaps between all participants in the development, deployment, and operation of machine learning systems.

As the popularity of GPT networks and their implementation in various application solutions grows, there is a need to adapt the principles and technologies of MLOps to the transfer learning used in generative models. This is because language models are becoming increasingly large and too complex to maintain and manage manually, which increases costs and reduces productivity. LLMOps, a branch of MLOps that oversees the LLM lifecycle from training to maintenance using dedicated tools and methodologies, helps avoid this.

LLMOps focuses on the operational capabilities and infrastructure required to fine-tune existing base models and deploy these improved models as part of a product. Because base language models are huge (GPT-3, for example, has 175 billion parameters), they require enormous amounts of data to train, as well as considerable compute time. For example, it would take over 350 years to train GPT-3 on a single NVIDIA Tesla V100 GPU. An infrastructure that can run GPU machines in parallel and process huge datasets is therefore essential. LLM inference is also much more resource-intensive than traditional machine learning, as it often involves not a single model but a chain of models.

LLMOps provides developers with the necessary tools and best practices for managing the LLM development lifecycle. While the ideas behind LLMOps are largely the same as MLOps, large base language models require new methods, guidelines, and tools. For example, Apache Spark in Databricks works great for traditional machine learning, but it is not suitable for fine-tuning LLMs.

LLMOps focuses specifically on fine-tuning base models, since modern LLMs are rarely trained entirely from scratch. Modern LLMs are typically consumed as a service, where a provider such as OpenAI or Google AI offers an API to an LLM hosted on its own infrastructure. However, there is also a custom LLM stack: a broad category of tools for fine-tuning and deploying custom solutions built on top of open-source GPT-style models. The fine-tuning process starts with an already trained base model, which is then trained on a smaller, more specific dataset to create a custom model. Once this custom model is deployed, queries are sent to it and the corresponding completions are returned. Monitoring and retraining a model are essential to ensure consistent performance, especially for LLM-driven AI systems.
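
As an illustration of that fine-tuning step, here is a hedged sketch using the Hugging Face Trainer; the base model (`gpt2`), the tiny dataset, and the hyperparameters are placeholders rather than a recommended recipe.

```python
# Sketch: fine-tuning an open base model on a small domain dataset
# (model name, dataset, and hyperparameters are placeholders).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"                                 # stand-in for a larger open base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

texts = [
    "Q: reset my router\nA: hold the reset button for 10 seconds.",
    "Q: update billing address\nA: open Settings > Billing and edit the address.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="custom-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                                # the fine-tuned weights become the custom model
trainer.save_model("custom-model")
```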

Prompt engineering tools allow in-context learning to be performed faster and cheaper than fine-tuning, without requiring sensitive training data. In this approach, vector databases extract contextually relevant information for specific queries, and prompt templates and chaining can optimize and improve model output.
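
A minimal sketch of that retrieval step, assuming a small in-memory corpus and the sentence-transformers library in place of a full vector database: relevant snippets are ranked by cosine similarity and folded into the prompt.

```python
# Sketch: retrieving contextually relevant snippets for a query and folding them
# into the prompt (embedding model and documents are illustrative choices).
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Refunds are issued within 5 business days of approval.",
    "Premium subscribers can export reports as CSV or PDF.",
    "Two-factor authentication can be enabled under Security settings.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def build_prompt(question: str, k: int = 2) -> str:
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]     # cosine similarity via dot product
    context = "\n".join(docs[i] for i in top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long do refunds take?"))
```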

Similarities and differences with MLOps

In summary, LLMOps facilitates the practical application of LLM by incorporating operational management, LLM chaining, monitoring, and observation techniques that are not typically found in conventional MLOps. In particular, prompts are the primary means by which humans interact with LLMs. However, formulating a precise query is not a one-time process, but is typically performed iteratively, over several attempts, to achieve a satisfactory result. LLMOps tools offer features to track and version prompts and their results. This facilitates the evaluation of the overall performance of the model, including operational work with multiple LLMs.

LLM chaining links multiple LLM invocations in sequence to provide a single application function. In this workflow, the output of one LLM invocation serves as the input to another to produce the final result. This design pattern offers an innovative way to develop AI applications by breaking complex tasks down into smaller steps. Chaining also works around the inherent limit on the maximum number of tokens an LLM can process at once. LLMOps simplifies chaining management and combines it with other document retrieval methods, such as vector database access.
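
A bare-bones sketch of chaining, where `call_llm` is a placeholder for whichever hosted or self-hosted model the stack uses: the first call condenses the input and the second call consumes that output.

```python
# Sketch: chaining two LLM calls so the first output feeds the second.
# call_llm is a placeholder for whichever hosted or self-hosted model is used.
def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"  # stub; replace with a real API call

def summarize_then_draft(ticket_text: str) -> str:
    # Step 1: condense a long input so it fits within the per-call token limit.
    summary = call_llm(f"Summarize this support ticket in 3 bullet points:\n{ticket_text}")
    # Step 2: the first call's output becomes the second call's input.
    return call_llm(f"Write a polite reply that addresses these points:\n{summary}")

print(summarize_then_draft("Customer reports being double-charged for the March invoice."))
```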

An LLMOps monitoring system collects real-time data points after a model is deployed in order to detect degradation in its performance. Continuous, real-time monitoring makes it possible to identify, troubleshoot, and resolve performance issues before they affect end users. Specifically, prompts, tokens and their lengths, processing time, inference latency, and user metadata are monitored. This makes it possible to notice overfitting, or a change in the underlying model, before performance visibly degrades.
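
The kind of per-request record such a monitor collects might look like the following sketch; the field names are illustrative, and a real system would ship them to a metrics or logging store rather than print them.

```python
# Sketch: per-request data points an LLMOps monitor might record.
# Field names are illustrative; real systems export them to a metrics store.
import time
import uuid

def monitored_call(prompt: str, call_llm, user_id: str) -> str:
    start = time.perf_counter()
    completion = call_llm(prompt)
    record = {
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,
        "prompt_chars": len(prompt),                      # cheap proxy for prompt tokens
        "completion_chars": len(completion),
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "timestamp": time.time(),
    }
    print(record)                                         # stand-in for a metrics exporter
    return completion

monitored_call("Summarize our Q3 results", lambda p: "stub completion", user_id="u-42")
```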

Monitoring models for drift and bias is also critical. While drift is a common problem in traditional machine learning models, monitoring LLM solutions with LLMOps is even more important because of their reliance on underlying base models. Bias can arise from the original datasets on which the base model was trained, from custom datasets used for fine-tuning, or even from the human evaluators who judge prompt completions. A thorough evaluation and monitoring system is needed to remove bias effectively.

LLMs are difficult to evaluate using traditional machine learning metrics because there is often no single “right” answer. LLMOps therefore relies on human feedback more heavily than traditional MLOps, incorporating it into testing, monitoring, and the collection of data for future fine-tuning.

Finally, there are differences in how LLMOps and MLOps approach application design and development. LLMOps projects are designed to move fast and are typically iterative, starting from existing proprietary or open-source models and ending with custom models that have been fine-tuned, or fully trained, on curated data.

Despite these differences, LLMOps is still a subset of MLOps. That’s why the authors of The Big Book of MLOps from Databricks have included the term in the second edition of this collection, which provides guiding principles, design considerations, and reference architectures for MLOps.