<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Stefano Amorelli]]></title><description><![CDATA[Stefano Amorelli]]></description><link>https://blog.amorelli.tech</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 11:35:40 GMT</lastBuildDate><atom:link href="https://blog.amorelli.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Open-source financial research with LLMs, MCP, and the US SEC EDGAR]]></title><description><![CDATA[❗
EDGAR® and SEC® are trademarks of the U.S. Securities and Exchange Commission. This blog post and the related open-source project are not affiliated with, endorsed by, or connected in any way to the U.S. Securities and Exchange Commission.


Overvi...]]></description><link>https://blog.amorelli.tech/open-source-financial-research-with-llms-mcp-and-the-us-sec-edgar</link><guid isPermaLink="true">https://blog.amorelli.tech/open-source-financial-research-with-llms-mcp-and-the-us-sec-edgar</guid><category><![CDATA[finance]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Stefano Amorelli]]></dc:creator><pubDate>Mon, 21 Jul 2025 11:50:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753111534709/8679d1eb-de30-4b5c-9a9e-45b248431187.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">❗</div>
<div data-node-type="callout-text">EDGAR® and SEC® are trademarks of the U.S. Securities and Exchange Commission. This blog post and the related open-source project are not affiliated with, endorsed by, or connected in any way to the U.S. Securities and Exchange Commission.</div>
</div>

<h2 id="heading-overview">Overview</h2>
<p>In 1934, the US Congress created the Securities and Exchange Commission (SEC) to oversee financial markets and protect investors.</p>
<p>The agency was built on a simple principle: investors deserve accurate, truthful, and complete information about the companies they want to invest in.</p>
<blockquote>
<p><img src="https://i.ebayimg.com/images/g/8zoAAOSwf8thB9KN/s-l400.jpg" alt="Franklin Delano Roosevelt" class="image--center mx-auto" /></p>
<p>[…] “those who seek to draw upon other people's money must be wholly candid regarding the facts on which the investor's judgment is asked.” […]</p>
<p>- Franklin Delano Roosevelt, 32nd President of the United States</p>
</blockquote>
<p>For nearly a century, the SEC has served as the financial world's transparency watchdog, requiring public companies to disclose everything from quarterly earnings to executive compensation, from business risks to major corporate events.</p>
<p>Through its <a target="_blank" href="https://www.sec.gov/edgar/search/">Electronic Data Gathering, Analysis, and Retrieval (<code>EDGAR</code>)</a> system (launched in the 1990s) the SEC has made these filings freely available to anyone with an internet connection.</p>
<p>In theory, this created a level playing field where individual investors could access the same data as professional investors. In practice, however, there is a gap.</p>
<p>Institutional investors have analysts, sophisticated software, and millions in technology infrastructure to parse, analyze, and extract insights from SEC filings. <strong>Retail investors don’t</strong>. They need to manually navigate through dense, complex documents that often span hundreds of pages.</p>
<p><strong>For example, a single Apple annual report (</strong><code>10-K</code> <strong>filing) contains over 100 pages of financial data, business descriptions, and risk factors.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752946110794/8465b189-38fa-442a-b08f-67e46b574615.png" alt="Page 1 of 121 of Apple’s 2024 annual report (10-K)" class="image--center mx-auto" /></p>
<p><em>Page 1 of 121 of Apple’s 2024 annual report (</em><code>10-K</code><em>)</em></p>
<hr />
<p><strong>The problem goes beyond just having access to data.</strong> Professional investors systematically extract specific metrics, compare trends across quarters and years, analyze segment performance, and identify patterns that would be nearly impossible to spot through manual review.</p>
<p><strong>But today, things have changed: individual investors can now keep up.</strong></p>
<p>Large language models, the Model Context Protocol, and programmatic access to SEC EDGAR data are finally making sophisticated financial research accessible to individual investors. You can now use AI to extract key financial metrics, perform complex analyses across multiple companies and time periods, and uncover insights that previously required specialized expertise and expensive tools.</p>
<p>How exactly does this work in practice? Let's dive into the technical foundation that makes this possible: the <a target="_blank" href="https://github.com/stefanoamorelli/sec-edgar-mcp">SEC EDGAR MCP Server</a> (version <code>1-alpha</code> released on 21 July 2025), open-source software built in public and maintained by the community that transforms how investors interact with financial research.</p>
<h2 id="heading-how-ai-changes-everything">How AI changes everything</h2>
<p>Before we dive into the technical details, let's understand what makes this approach fundamentally different from traditional financial research tools and workflows.</p>
<p>The Model Context Protocol (MCP) is an open standard that allows AI assistants to securely connect to external data sources and tools. Think of it as a universal connector that lets your AI assistant access and speak directly to databases, APIs, and services, including the SEC's EDGAR system.</p>
<p>Traditional financial research often involves jumping between multiple platforms: searching for companies on one site, downloading filings from another, then manually copying data into spreadsheets for analysis.</p>
<p><strong>With an MCP server, your AI assistant can do all of this seamlessly in a single conversation.</strong></p>
<h3 id="heading-from-raw-data-to-intelligence">From raw data to intelligence</h3>
<p>Let's look at a practical example. Suppose you want to analyze Microsoft's financial data. Traditionally, this would involve:</p>
<ol>
<li><p>Finding MSFT's recent 10-Q (quarterly report) or 10-K (annual report) filings on EDGAR</p>
</li>
<li><p>Downloading and opening multiple PDF documents</p>
</li>
<li><p>Manually searching for segment revenue data</p>
</li>
<li><p>Copying numbers into a spreadsheet</p>
</li>
<li><p>Calculating growth rates and trends</p>
</li>
</ol>
<p>With the SEC EDGAR MCP, this entire process becomes a simple conversation: <em>"Show me Microsoft's balance sheet, income statement, and cash flow data."</em> The AI assistant handles all the technical complexity behind the scenes and presents you with clean, formatted results:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/HEWTR15-xZc">https://youtu.be/HEWTR15-xZc</a></div>
<p> </p>
<p>Another example would be to analyze Apple’s latest revenue:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/T-6H3zF8fWU">https://youtu.be/T-6H3zF8fWU</a></div>
<p> </p>
<p>Or create a dashboard with charts, on the fly, based on the latest NVIDIA financial data:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/UH2JPfhuoGU">https://youtu.be/UH2JPfhuoGU</a></div>
<p> </p>
<p>We could also search for the latest insider transactions in Amazon:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/k4sVFcHpmfg">https://youtu.be/k4sVFcHpmfg</a></div>
<p> </p>
<p>Or investigate company-specific data entries from the Apple filings, and plot them:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/V2rSkgYZXGg">https://youtu.be/V2rSkgYZXGg</a></div>
<p> </p>
<p>To use this workflow, you'll need an LLM application that supports the MCP protocol, such as <a target="_blank" href="https://claude.ai/download">Claude Desktop</a> (used in the demos). The server runs locally via Docker.</p>
<p>You can follow the instructions on how to install and use it <a target="_blank" href="https://sec-edgar-mcp.amorelli.tech/setup/quickstart">here</a>.</p>
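<p>For reference, connecting Claude Desktop to a locally running MCP server is a matter of a small JSON entry in its configuration file. The sketch below uses illustrative values: the Docker image name and environment variable are assumptions, so check the quickstart linked above for the exact ones:</p>
<pre><code class="lang-json">{
  "mcpServers": {
    "sec-edgar-mcp": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "stefanoamorelli/sec-edgar-mcp:latest"],
      "env": { "SEC_EDGAR_USER_AGENT": "Your Name (your.email@example.com)" }
    }
  }
}
</code></pre>
<p>After restarting Claude Desktop, the server's tools appear automatically in the conversation.</p>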
<hr />
<p>But wait, the SEC does provide APIs for accessing EDGAR data. So why not just use those directly?</p>
<h3 id="heading-why-not-egar-api">Why not the EDGAR API?</h3>
<p>The answer lies in complexity and usability. The SEC's REST APIs are powerful but require technical expertise to use effectively. You need to understand company identifiers (CIKs), filing taxonomies, XBRL structures, and how to navigate complex JSON responses. Also, you’d need to know how to code.</p>
<p>For a simple question like <em>"What was Apple's revenue last quarter?"</em> you'd need to write software to find Apple's <code>CIK</code> (Central Index Key), locate the right filing, parse <code>XBRL</code> data, and extract the specific financial concept. All of this before you even get to analysis.</p>
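<p>To make that concrete, here is a minimal sketch of the direct-API route. The <code>companyfacts</code> endpoint is the SEC's real XBRL API; the revenue tag shown is one of several Apple has used over the years, which is exactly the kind of detail you would need to know in advance:</p>
<pre><code class="lang-python">import json
import urllib.request

def companyfacts_url(cik):
    # CIKs must be zero-padded to 10 digits in the URL
    return "https://data.sec.gov/api/xbrl/companyfacts/CIK%010d.json" % cik

def latest_quarterly_revenue(facts, tag="RevenueFromContractWithCustomerExcludingAssessedTax"):
    # Walk the nested JSON down to the USD facts for one XBRL concept,
    # keep only 10-Q figures, and take the most recent reporting period
    usd_facts = facts["facts"]["us-gaap"][tag]["units"]["USD"]
    quarters = [f for f in usd_facts if f["form"] == "10-Q"]
    return max(quarters, key=lambda f: f["end"])

# Fetching (the SEC requires a descriptive User-Agent with contact details):
# request = urllib.request.Request(companyfacts_url(320193),  # 320193 = Apple
#                                  headers={"User-Agent": "Name email@example.com"})
# facts = json.load(urllib.request.urlopen(request))
# print(latest_quarterly_revenue(facts))
</code></pre>
<p>And this only covers one concept for one company; an MCP server wraps dozens of such lookups behind tools the LLM can call on demand.</p>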
<p>This complexity naturally leads to another question: why not just ask ChatGPT or other AI assistants directly about financial data, without an MCP server?</p>
<h3 id="heading-why-not-general-llms">Why not general LLMs?</h3>
<p>The challenge here is accuracy and currency. General-purpose AI models are trained on data with cutoff dates, meaning they lack recent financial information. While they can navigate the web and try to find the financial data, they might miss important details. When you're making investment decisions, you need current, verified, and complete data from the source.</p>
<p>That's exactly the problem the SEC EDGAR MCP server was designed to solve.</p>
<p>As you can see from this <a target="_blank" href="https://claude.ai/share/77170acd-fa0a-4763-9403-328734a3fc8f">conversation</a>, the LLM connected to the MCP server consumes information straight from the original filing, the best source available:</p>
<blockquote>
<p>All data is sourced directly from NVIDIA's SEC EDGAR filing (Form 10-Q, filed May 28, 2025, Accession Number: 0001045810-25-000116) with exact precision preserved from the original XBRL data.</p>
</blockquote>
<h2 id="heading-inner-workings-of-the-mcp-server">Inner workings of the MCP server</h2>
<p>This open-source package is freely available for anyone to use, modify, and improve. The package provides LLMs with over 20 specialized tools that handle everything from finding company filings to extracting complex financial metrics:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753095400777/27ad2569-52ca-4c56-90c2-13ed658df562.png" alt class="image--center mx-auto" /></p>
<p>You can read about each tool in more detail in the <a target="_blank" href="https://sec-edgar-mcp.amorelli.tech/tools/overview#company-tools">documentation</a>.</p>
<p>Here's what makes it powerful:</p>
<p><strong>Smart data extraction and parsing</strong>: Instead of manually parsing through hundreds of pages of financial documents, the package can automatically extract specific metrics like revenue by geographic segment, quarterly comparisons, or executive compensation data.</p>
<p><strong>Multiple data sources</strong>: The package taps into several SEC data streams, from the main EDGAR database to real-time RSS feeds of new filings, so that you have access to both historical data and the latest company updates.</p>
<p><strong>XBRL analysis</strong>: Modern SEC filings use XBRL (eXtensible Business Reporting Language), a structured format that makes financial data machine-readable. The package understands XBRL natively, allowing it to extract precise financial concepts rather than consuming the whole document.</p>
<p><strong>Company-specific insights</strong>: Different companies report data differently. Apple might break down revenue by "Americas, Europe, and Greater China" while Microsoft uses different regional categories. The package dynamically discovers and adapts to each company's specific reporting structure.</p>
<h3 id="heading-open-source-why-it-matters">Open source: why it matters</h3>
<p><strong>The decision to make this package open source isn't just about free access: it's about transparency and community-driven innovation.</strong></p>
<p>Financial tools shouldn't be black boxes. When you're making investment decisions, you need to trust not just the data, but the methods used to extract and analyze it.</p>
<p>Open source means you can inspect exactly how the package works, contribute improvements, and adapt it for your specific needs. It also means the tool can evolve with the community, incorporating new features.</p>
<h1 id="heading-looking-forward">Looking forward</h1>
<p>The SEC has been collecting corporate disclosures for nearly 90 years. The data is all there, freely available to anyone. But until now, extracting meaningful insights from that data required time, deep technical expertise, and expensive analytical tools.</p>
<p>With MCP and LLMs, individual investors can ask questions in plain English and get precise answers backed by official SEC filings.</p>
<p>It’s not revolutionary technology; it's simply good engineering applied to a real problem. SEC EDGAR already provides APIs, companies already file in structured formats, and AI assistants already exist.</p>
<p><strong>The MCP just connects these pieces together in a way that's actually useful for investors.</strong></p>
<p>Roosevelt wanted markets where individual investors could make informed decisions. The SEC provided the transparency. <strong>Now open-source tools are providing the accessibility</strong>. What took teams of analysts before can now be done in a conversation by anyone.</p>
<p><strong>Maybe the information advantage that Wall Street has held for decades is disappearing.</strong></p>
<hr />
<h1 id="heading-acknowledgements">Acknowledgements</h1>
<p>This work wouldn't be possible without the foundation laid by many others.</p>
<p><strong>The US SEC</strong> deserves recognition for the incredible work of maintaining one of the world's most comprehensive and accessible corporate disclosure systems. The EDGAR database and REST APIs provide the reliable data foundation that makes tools like this possible.</p>
<p><strong>Anthropic</strong> created the Model Context Protocol standard and continues to advance the field of AI safety and capability. Their commitment to open standards enables the kind of interoperability that benefits everyone.</p>
<h2 id="heading-links">Links</h2>
<h3 id="heading-sec-edgar-mcp-package">SEC EDGAR MCP package</h3>
<ul>
<li><p><strong>GitHub Repository</strong>: <a target="_blank" href="https://github.com/stefanoamorelli/sec-edgar-mcp">https://github.com/stefanoamorelli/sec-edgar-mcp</a></p>
</li>
<li><p><strong>Documentation</strong>: <a target="_blank" href="https://sec-edgar-mcp.amorelli.tech">https://sec-edgar-mcp.amorelli.tech</a></p>
</li>
</ul>
<h3 id="heading-sec-resources">SEC resources</h3>
<ul>
<li><p><strong>EDGAR Database</strong>: <a target="_blank" href="https://www.sec.gov/edgar">https://www.sec.gov/edgar</a></p>
</li>
<li><p><strong>SEC REST APIs</strong>: <a target="_blank" href="https://www.sec.gov/edgar/sec-api-documentation">https://www.sec.gov/edgar/sec-api-documentation</a></p>
</li>
<li><p><strong>EDGAR Company Search</strong>: <a target="_blank" href="https://www.sec.gov/edgar/searchedgar/companysearch">https://www.sec.gov/edgar/searchedgar/companysearch</a></p>
</li>
<li><p><strong>SEC Investor.gov</strong>: <a target="_blank" href="https://www.investor.gov/">https://www.investor.gov/</a></p>
</li>
</ul>
<h3 id="heading-model-context-protocol-mcp">Model context protocol (MCP)</h3>
<ul>
<li><p><strong>MCP Specification</strong>: <a target="_blank" href="https://modelcontextprotocol.io/">https://modelcontextprotocol.io/</a></p>
</li>
<li><p><strong>Anthropic MCP Documentation</strong>: <a target="_blank" href="https://docs.anthropic.com/en/docs/build-with-claude/computer-use">https://docs.anthropic.com/en/docs/build-with-claude/computer-use</a></p>
</li>
<li><p><strong>MCP Servers Repository</strong>: <a target="_blank" href="https://github.com/modelcontextprotocol/servers">https://github.com/modelcontextprotocol/servers</a></p>
</li>
</ul>
<h3 id="heading-open-source-packages">Open-source packages</h3>
<ul>
<li><p><code>edgartools</code> (by <a target="_blank" href="https://www.linkedin.com/in/dwight-gunning/">Dwight Gunning</a>): <a target="_blank" href="https://github.com/dgunning/edgartools">https://github.com/dgunning/edgartools</a></p>
</li>
<li><p><code>datamule</code> (by <a target="_blank" href="https://www.linkedin.com/in/johngfriedman/">John Friedman</a>): <a target="_blank" href="https://github.com/john-friedman/datamule-python">https://github.com/john-friedman/datamule-python</a></p>
</li>
</ul>
<h3 id="heading-financial-data-and-analysis">Financial data and analysis</h3>
<ul>
<li><p><strong>XBRL International</strong>: <a target="_blank" href="https://www.xbrl.org/">https://www.xbrl.org/</a></p>
</li>
<li><p><strong>SEC XBRL Information</strong>: <a target="_blank" href="https://www.sec.gov/structureddata/osd-inline-xbrl.html">https://www.sec.gov/structureddata/osd-inline-xbrl.html</a></p>
</li>
<li><p><strong>OpenFIGI (Financial Instrument Global Identifier)</strong>: <a target="_blank" href="https://www.openfigi.com/">https://www.openfigi.com/</a></p>
</li>
<li><p><strong>Financial Data Transparency Act</strong>: <a target="_blank" href="https://www.congress.gov/bill/117th-congress/house-bill/2989">https://www.congress.gov/bill/117th-congress/house-bill/2989</a></p>
</li>
</ul>
<h3 id="heading-historical-context">Historical context</h3>
<ul>
<li><p><strong>Securities Act of 1933</strong>: <a target="_blank" href="https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secact1933">https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secact1933</a></p>
</li>
<li><p><strong>Securities Exchange Act of 1934</strong>: <a target="_blank" href="https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secexact1934">https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secexact1934</a></p>
</li>
<li><p><strong>Franklin D. Roosevelt Presidential Library</strong>: <a target="_blank" href="https://www.fdrlibrary.org/">https://www.fdrlibrary.org/</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Efficient large context management in AWS Strands Agents]]></title><description><![CDATA[Imagine: your AI agent has been working on a complex analysis for hours. It's pulled data from multiple sources, run various calculations, identified patterns, and made several key discoveries.
Then it hits a wall. It found something interesting but ...]]></description><link>https://blog.amorelli.tech/efficient-large-context-management-in-aws-strands-agents</link><guid isPermaLink="true">https://blog.amorelli.tech/efficient-large-context-management-in-aws-strands-agents</guid><category><![CDATA[AI]]></category><category><![CDATA[AWS]]></category><category><![CDATA[genai]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Stefano Amorelli]]></dc:creator><pubDate>Wed, 25 Jun 2025 21:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751227412720/83c3d37f-2afd-44bc-8431-f1d77e8e4a9b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine: your AI agent has been working on a complex analysis for hours. It's pulled data from multiple sources, run various calculations, identified patterns, and made several key discoveries.</p>
<p>Then it hits a wall. It found something interesting but it can't remember the earlier context.</p>
<p>Which datasets were involved? What were the baseline findings?</p>
<p><strong>The context window filled up. Your agent essentially forgot hours of work!</strong></p>
<p>Extended AI agent sessions can hit this wall easily, and analytical workflows suffer the most because continuity matters.</p>
<p><code>SummarizingConversationManager</code> (released in <code>Strands Agents v0.1.8</code>) addresses this challenge by implementing context compression. Rather than simply truncating old messages when the context window fills up, it creates a summary that preserves the essential information from the previous messages:</p>
<p><img src="https://assets.community.aws/a/2z3ked3LaSl78pM7I51KL7QSLgC/Scre.webp?imgSize=570x1000" alt /></p>
<p>This approach maintains a coherent conversational flow while reducing context size. In a way, it operates like a real analyst: <strong>it remembers the most important bits and discards the details to make space for new data</strong>.</p>
<p>Let's see how you can use it!</p>
<h2 id="heading-configuration-options-for-summarizing-context-management"><strong>Configuration options for summarizing context management</strong></h2>
<p><strong>A key principle of</strong> <code>Strands Agents</code> <strong>is that it lets you build powerful workflows with minimal code, and the</strong> <code>SummarizingConversationManager</code> <strong>follows the same philosophy.</strong> It comes configured with sensible defaults and works immediately for most use cases:</p>
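<p>A minimal setup looks like this (a sketch; the import path follows the <code>v0.1.8</code> SDK, so verify it against the release notes if it has since moved):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# No parameters needed: the manager ships with sensible defaults
agent = Agent(conversation_manager=SummarizingConversationManager())
</code></pre>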
<p>With this configuration, when the agent hits the context limit, it automatically summarizes previous conversations to extend the available context space.</p>
<p>But let's explore how you can have more control over the summarization process.</p>
<h3 id="heading-delegating-summarization-to-specialized-agents"><strong>Delegating summarization to specialized agents</strong></h3>
<p>One approach is to use a separate agent specifically for creating summaries. This allows you to optimize the summarization independently from your main agent's configuration. For instance, you might want to use a model that's particularly good at distilling complex technical information:</p>
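<p>Sketched below (the prompt is illustrative, and the <code>summarization_agent</code> parameter name follows the SDK documentation; treat it as an assumption to verify):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# A lightweight agent dedicated to producing summaries
summarizer = Agent(
    system_prompt="You distill long technical conversations into concise, "
                  "faithful summaries that preserve key findings and decisions.",
)

agent = Agent(
    conversation_manager=SummarizingConversationManager(
        summarization_agent=summarizer,
    ),
)
</code></pre>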
<p>This separation also means you can give the summarizer agent specific instructions or prompts that differ from your main agent's role. The summarizer might focus on extracting methodological details and quantitative results, while your main agent maintains its broader analytical perspective.</p>
<h3 id="heading-controlling-summary-granularity"><strong>Controlling summary granularity</strong></h3>
<p>With <code>summary_ratio</code> we can determine how much compression occurs during summarization. This value represents the target ratio between the summary length and the original content length.</p>
<p>Higher <code>summary_ratio</code> values produce more aggressive compression, resulting in brief, high-level summaries that capture only the most essential points. Lower values create more detailed summaries that preserve additional context and nuance:</p>
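<p>For instance (a sketch; <code>0.3</code> is an arbitrary illustrative value):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

agent = Agent(
    conversation_manager=SummarizingConversationManager(
        summary_ratio=0.3,  # controls how aggressively older context is compressed
    ),
)
</code></pre>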
<p><strong>The optimal ratio depends on your use case.</strong> Exploratory data analysis might benefit from detailed summaries that preserve methodological nuances, while routine reporting workflows might work well with more compressed summaries.</p>
<h3 id="heading-preserving-recent-context"><strong>Preserving recent context</strong></h3>
<p>You wouldn't want your agent to immediately summarize the conversation you just had. Recent exchanges contain the freshest context and often drive the current direction of your analysis. The <code>preserve_recent_messages</code> parameter controls how many of the most recent messages remain untouched by summarization.</p>
<p>When you set this parameter, those recent messages stay in their original form, maintaining the natural conversational flow and ensuring that immediate context remains accessible:</p>
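<p>A sketch (preserving the 10 newest messages is an illustrative choice, not a recommendation):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

agent = Agent(
    conversation_manager=SummarizingConversationManager(
        preserve_recent_messages=10,  # the 10 newest messages are never summarized
    ),
)
</code></pre>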
<p>This parameter requires some consideration of the agent's typical patterns. If it tends to have many short exchanges, you might want to preserve more messages. For workflows with longer, more substantial exchanges, fewer preserved messages should be sufficient.</p>
<h3 id="heading-custom-summarization-instructions"><strong>Custom summarization instructions</strong></h3>
<p>You can also provide specific instructions for how summaries should be created through the <code>summary_prompt</code> parameter. This allows you to tailor the summarization process to your particular domain or workflow:</p>
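<p>A sketch (the <code>summary_prompt</code> keyword follows this post's wording; verify the exact parameter name against the SDK documentation):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

DOMAIN_PROMPT = (
    "Summarize the conversation for a data-analysis workflow. "
    "Always preserve: datasets used, methodological choices, "
    "quantitative results, and open questions."
)

agent = Agent(
    conversation_manager=SummarizingConversationManager(
        summary_prompt=DOMAIN_PROMPT,
    ),
)
</code></pre>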
<h3 id="heading-complete-configuration-example"><strong>Complete configuration example</strong></h3>
<p>Here's how these parameters work together in a realistic scenario. This configuration might be appropriate, for example, for a data science workflow where you need to maintain methodological details while managing long analytical sessions:</p>
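<p>A sketch of such a configuration (all values and prompts are illustrative, and parameter names should be checked against the SDK docs):</p>
<pre><code class="lang-python">from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# Dedicated summarizer tuned for analytical sessions
summarizer = Agent(
    system_prompt="Produce accurate, compact summaries of analytical sessions. "
                  "Keep dataset names, methods, and quantitative findings.",
)

conversation_manager = SummarizingConversationManager(
    summarization_agent=summarizer,
    summary_ratio=0.4,           # moderately aggressive compression
    preserve_recent_messages=8,  # keep the freshest exchanges verbatim
)

main_agent = Agent(
    system_prompt="You are a data-science assistant for long analytical sessions.",
    conversation_manager=conversation_manager,
)
</code></pre>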
<p>This setup creates a system where the agent can maintain awareness of analytical workflows over extended periods. The summarizer agent focuses specifically on creating accurate summaries, while the main agent can continue its work without losing important context from earlier in the conversation.</p>
<h2 id="heading-a-note-on-large-context-foundation-models">A note on large-context foundation models</h2>
<p>If modern models support huge context windows (Claude 3.5 Sonnet with 200K tokens, Gemini 1.5 Pro with 1M, GPT-4 Turbo with 128K), why bother with summarization? Sure, you could throw millions of tokens at the problem and let your agent carry an endless conversation history.</p>
<p>But most of that context can easily become irrelevant noise, and agents slow down if they need to process massive amounts of data on every request. <strong>Costs also spike because you're paying for a lot of tokens that might not add much value</strong>. And paradoxically, performance often degrades: in an ocean of context, agents struggle to identify what actually matters.</p>
<p>Think about how you work on complex projects. You don't reread every email, every draft, every brainstorming session. You keep the key insights, the important decisions. You compress intelligently. That's exactly what <code>SummarizingConversationManager</code> does.</p>
<p>Maybe agents don't need to remember everything, just the right things!</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a target="_blank" href="https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/">Strands Agents</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/strands-agents/sdk-python/releases/tag/v0.1.8">Strands Agents v0.1.8 Release</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/strands-agents/sdk-python/pull/112">SummarizingConversationManager Implementation</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/strands-agents/docs/pull/63">SummarizingConversationManager Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2308.15022">Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How AI agents pay onchain.]]></title><description><![CDATA[It's January 1997.
The IETF (Internet Engineering Task Force) has just released RFC 2068, officially defining HTTP/1.1. The specification was authored by web pioneers Roy Fielding, Jim Gettys, Jeffrey Mogul, Henrik Frystyk, and Tim Berners-Lee, the a...]]></description><link>https://blog.amorelli.tech/how-ai-agents-pay-onchain</link><guid isPermaLink="true">https://blog.amorelli.tech/how-ai-agents-pay-onchain</guid><category><![CDATA[genai]]></category><category><![CDATA[AI]]></category><category><![CDATA[fintech]]></category><category><![CDATA[Cryptocurrency]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Stefano Amorelli]]></dc:creator><pubDate>Tue, 17 Jun 2025 15:47:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750162012232/ef02a9f6-846c-4543-9356-4f3a9fb861d4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It's January 1997.</p>
<p>The <a target="_blank" href="https://www.ietf.org/"><strong>IETF</strong></a> (Internet Engineering Task Force) has just released <a target="_blank" href="https://www.rfc-editor.org/rfc/rfc2068.html"><strong>RFC 2068</strong></a>, officially defining <strong>HTTP/1.1</strong>. The specification was authored by web pioneers <a target="_blank" href="https://en.wikipedia.org/wiki/Roy_Fielding"><strong>Roy Fielding</strong></a>, <a target="_blank" href="https://en.wikipedia.org/wiki/Jim_Gettys"><strong>Jim Gettys</strong></a>, <a target="_blank" href="https://en.wikipedia.org/wiki/Jeffrey_Mogul"><strong>Jeffrey Mogul</strong></a>, <a target="_blank" href="https://en.wikipedia.org/wiki/Henrik_Frystyk_Nielsen"><strong>Henrik Frystyk</strong></a>, and <a target="_blank" href="https://en.wikipedia.org/wiki/Tim_Berners-Lee"><strong>Tim Berners-Lee</strong></a>, the architects who shaped how the internet communicates.</p>
<p>The specification introduces <strong>persistent connections</strong>: previously, every single HTTP request required a fresh TCP connection. Persistent connections resolve this, allowing multiple HTTP requests to flow through a single, long-lived TCP connection. No more establishing separate connections for every image, CSS file, or JavaScript snippet on a web page.</p>
<p>There's also <strong>chunked transfer encoding</strong>, a new way for web servers to stream content without knowing the full size beforehand. No longer does a server need to calculate the total size of dynamically generated content upfront, it's now free to deliver data incrementally, as it's produced.</p>
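<p>On the wire, a chunked response looks like this: each chunk is preceded by its size in hexadecimal, and a zero-length chunk terminates the stream (CRLF line endings omitted here for readability):</p>
<pre><code class="lang-plaintext">HTTP/1.1 200 OK
Transfer-Encoding: chunked

7
Hello, 
6
world!
0
</code></pre>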
<p>But <strong>RFC 2068 quietly introduces something intriguing</strong>, a new status code:</p>
<pre><code class="lang-plaintext">
HTTP 402 Payment Required

   This code is reserved for future use.
</code></pre>
<p>This shows how the founding fathers of the world wide web predicted that money would eventually become a big part of the internet, <strong>even if they had no clear path on how it would actually play out</strong>.</p>
<p>Today, 2025, nearly three decades and multiple HTTP versions later (<code>HTTP/2</code> in 2015, <code>HTTP/3</code> in 2022), <code>Status Code 402</code> <strong>still sits there with the exact same note: 'reserved for future use.'</strong> Despite the fintech revolution, the rise of online payments, and an entire economy built on internet transactions, nobody had figured out what to do with it.</p>
<p><strong>Until now.</strong></p>
<p>Last month (May 2025), <a target="_blank" href="https://www.coinbase.com/developer-platform/discover/launches/x402?utm_source=blog.amorelli.tech">Coinbase</a> released <code>x402</code>, an open source protocol that gives <code>HTTP 402</code> its first real job: enabling native <strong>onchain</strong> payments within HTTP requests.</p>
<p>AI agents now need to make <strong>M2M</strong> (machine-to-machine) payments across the web with reduced <strong>HITL</strong> (human-in-the-loop) interventions. Traditional payment flows don’t work well in this case. They require multiple human interactions, redirects, and manual steps that simply don't work when an AI agent needs to make a transaction autonomously.</p>
<p><code>x402</code> fills this gap. It proposes an automated on-chain payment flow implemented natively within the HTTP protocol, <strong>making payments as seamless as any other web request</strong>.</p>
<p>But what does this look like in practice?</p>
<h2 id="heading-architecture-and-components-of-x402">Architecture and components of <code>x402</code></h2>
<p><code>x402</code> is built around four core components:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750015703781/ceb52fad-5fa5-440c-bb92-00424b2fe323.png" alt class="image--center mx-auto" /></p>
<p>A <strong>client</strong> acts as the payment initiator, discovering what's required for access and constructing the appropriate payment payload. Put simply, this is whatever is making the HTTP request to a pay-walled resource. It could be a browser making a request for premium content, an AI agent purchasing API access, or a mobile app unlocking features. The client handles the cryptographic signing using the user's private key and automatically retries requests when payment is required.</p>
<p>The <strong>resource server</strong> enforces payment policies for its endpoints while remaining focused on its core business logic. This is the web server or API that hosts the content or service being purchased. It maintains simple pricing tables that map endpoints to costs, but delegates the payment verification logic to the facilitator.</p>
<p>Blockchain logic is implemented in the <strong>facilitator</strong> component: verifying cryptographic signatures, preventing replay attacks through nonce tracking, and managing the actual on-chain settlement. It allows both clients and servers to work with on-chain payments without understanding the blockchain implementation details.</p>
<p>On <strong>blockchain</strong> resides the final settlement layer, ensuring payments are immutable and transparent. It enables programmable money through smart contracts and stable-coins, <strong>but its complexity is completely hidden from the application layer by the facilitator</strong>.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Component</td><td>Primary Responsibility</td><td>Key Features</td><td>What it does</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Client</strong></td><td>Payment initiation</td><td><code>EIP-712</code> signing, automatic retries, payment discovery</td><td>Makes requests, handles wallets, retries with payment</td></tr>
<tr>
<td><strong>Resource Server</strong></td><td>Payment enforcement</td><td>Pricing tables, <code>HTTP 402</code> responses, middleware integration</td><td>Sets prices, checks payments, serves content</td></tr>
<tr>
<td><strong>Facilitator</strong></td><td>Payment verification</td><td>Signature verification, nonce tracking, gas abstraction</td><td>Verifies signatures, talks to blockchain</td></tr>
<tr>
<td><strong>Blockchain</strong></td><td>Payment settlement</td><td><code>USDC</code> transfers, smart contracts, immutable records</td><td>Settles payments on chain</td></tr>
</tbody>
</table>
</div><h3 id="heading-principles">Principles</h3>
<p>This architecture demonstrates several fundamental software engineering principles. The most important is <strong>separation of concerns</strong>. Each component has a single, well-defined responsibility. <strong>Resource servers focus purely on business logic, facilitators handle payment complexity, and clients manage user interaction</strong>.</p>
<p>The system achieves <strong>loose coupling</strong> by having components interact only through standardized HTTP and REST interfaces. <strong>A resource server doesn't need to understand how blockchain transactions work, and a client doesn't need to know the server's internal implementation</strong>. This isolation means you can swap out components (for example, use a different blockchain, change facilitator providers, or modify server logic) without affecting the rest of the system.</p>
<p>The facilitator embodies the <strong>single responsibility principle</strong> by isolating all blockchain complexity into one specialized service. This prevents payment logic from leaking into business applications and keeps concerns properly separated.</p>
<p>Last but not least this architecture follows <strong>dependency inversion</strong>. High-level components depend on abstractions rather than concrete implementations. Servers and clients depend on HTTP interfaces, not specific blockchain APIs. This allows the same application code to work across different blockchains and payment schemes without modification.</p>
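<p>To make the inversion concrete, here is a toy sketch. The <code>Facilitator</code> interface and <code>InMemoryFacilitator</code> class below are made-up stand-ins, not actual x402 types: the point is only that the endpoint depends on an abstraction, so a facilitator backed by a different blockchain could be swapped in without touching the endpoint code.</p>

```java
// Hypothetical abstraction: the endpoint depends on this interface only.
interface Facilitator {
    boolean verify(String paymentHeader);
}

// One concrete implementation; an HTTP-backed or different-chain
// facilitator could replace it without changing PaywalledEndpoint.
class InMemoryFacilitator implements Facilitator {
    public boolean verify(String paymentHeader) {
        return paymentHeader != null && paymentHeader.startsWith("sig:");
    }
}

public class PaywalledEndpoint {
    private final Facilitator facilitator;

    public PaywalledEndpoint(Facilitator facilitator) {
        this.facilitator = facilitator;
    }

    // Returns a status line depending on whether the payment verifies
    public String handle(String paymentHeader) {
        return facilitator.verify(paymentHeader) ? "200 OK" : "402 Payment Required";
    }

    public static void main(String[] args) {
        PaywalledEndpoint endpoint = new PaywalledEndpoint(new InMemoryFacilitator());
        System.out.println(endpoint.handle(null));       // prints "402 Payment Required"
        System.out.println(endpoint.handle("sig:abc"));  // prints "200 OK"
    }
}
```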
<h3 id="heading-payment-flow">Payment flow</h3>
<p>When an AI agent or user hits an <code>x402</code>-enabled API, here's the four-step flow that happens:</p>
<ol>
<li><p><strong>Initial request</strong>: The client makes a standard HTTP request to access some resource</p>
</li>
<li><p><strong>Payment required response</strong>: If no payment is attached, the server responds with <code>HTTP 402</code> and includes payment details</p>
</li>
<li><p><strong>Payment authorization</strong>: The client creates a cryptographically signed payment and retries the request</p>
</li>
<li><p><strong>Verification and access</strong>: The server validates the payment, broadcasts it to the blockchain, and grants access</p>
</li>
</ol>
<p>What makes this powerful is that it all happens at the HTTP protocol level. No redirects to third-party payment processors, no <code>OAuth</code> flows, no account creation. <strong>Just standard HTTP with extra headers:</strong></p>
<ul>
<li><p><code>X-PAYMENT</code> flows from client to server and contains the <strong>signed payment payload</strong>. This includes the payment details (amount, recipient, token) plus a cryptographic signature proving the client authorized the payment.</p>
</li>
<li><p><code>X-PAYMENT-RESPONSE</code> flows from server to client after successful payment and contains <strong>transaction receipt information</strong>, providing transparency about what happened on-chain.</p>
</li>
</ul>
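<p>As a rough sketch of what travels in these headers: the payment payload is a JSON object which, in the reference implementations, is base64-encoded to make it header-safe. The field names in the example below are illustrative, not the exact wire format.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PaymentHeaderCodec {
    // Pack a JSON payment payload into a header-safe base64 string
    public static String encode(String paymentJson) {
        return Base64.getEncoder()
            .encodeToString(paymentJson.getBytes(StandardCharsets.UTF_8));
    }

    // The server reverses the encoding before verifying the payload
    public static String decode(String headerValue) {
        return new String(Base64.getDecoder().decode(headerValue), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"amount\":\"1000\",\"nonce\":\"abc-123\"}";  // illustrative fields
        String header = encode(json);  // value carried in the X-PAYMENT header
        System.out.println(decode(header).equals(json));  // prints "true"
    }
}
```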
<pre><code class="lang-mermaid">sequenceDiagram
    participant C as Client&lt;br/&gt;(Apps, AI Agents, Browsers)
    participant S as Resource Server&lt;br/&gt;(APIs, Services)
    participant F as Facilitator&lt;br/&gt;(Payment Processor)
    participant B as Blockchain&lt;br/&gt;(Base, Ethereum, etc.)

    C-&gt;&gt;S: 1. Request resource
    S-&gt;&gt;C: 2. 402 Payment Required
    C-&gt;&gt;S: 3. Request + X-PAYMENT header
    S-&gt;&gt;F: 4. Verify payment
    F-&gt;&gt;B: 5. Check on-chain
    B-&gt;&gt;F: 6. Validation result
    F-&gt;&gt;S: 7. Verification response
    S-&gt;&gt;F: 8. Settle payment
    F-&gt;&gt;B: 9. Broadcast transaction
    S-&gt;&gt;C: 10. Resource + receipt
</code></pre>
<h2 id="heading-server-side-implementation">Server-side implementation</h2>
<h3 id="heading-payment-middleware-architecture">Payment middleware architecture</h3>
<p>The core server-side implementation revolves around a payment filter (AKA middleware) that intercepts HTTP requests and enforces payment requirements. When integrated into your web server, this middleware checks incoming requests against a price table that maps endpoints to their costs.</p>
<p>The middleware follows a simple decision tree: if a request hits a protected endpoint without payment, it responds with <code>HTTP 402</code> and detailed payment instructions. If payment is included in the <code>X-PAYMENT</code> header, it verifies the payment with a facilitator service before allowing the request to proceed.</p>
<p>Here's the essential structure from the Java implementation:</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentFilter</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">Filter</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> String payTo;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Map&lt;String, BigInteger&gt; priceTable; <span class="hljs-comment">// path → amount</span>
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> FacilitatorClient facilitator;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">doFilter</span><span class="hljs-params">(ServletRequest request, ServletResponse response, 
                        FilterChain chain)</span> <span class="hljs-keyword">throws</span> IOException, ServletException </span>{
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse resp = (HttpServletResponse) response;
        String path = req.getRequestURI();
        String paymentHeader = req.getHeader(<span class="hljs-string">"X-PAYMENT"</span>);

        <span class="hljs-keyword">if</span> (!priceTable.containsKey(path)) {
            chain.doFilter(request, response);  <span class="hljs-comment">// Free endpoint</span>
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">if</span> (paymentHeader == <span class="hljs-keyword">null</span>) {
            send402Response(resp, path);  <span class="hljs-comment">// Request payment</span>
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-comment">// Build the same requirements advertised in the 402 response (helper not shown)</span>
        PaymentRequirements requirements = buildRequirements(path);

        <span class="hljs-comment">// Verify payment, process request, then settle</span>
        VerificationResponse verification = facilitator.verify(paymentHeader, requirements);
        <span class="hljs-keyword">if</span> (verification.valid) {
            chain.doFilter(request, response);
            facilitator.settle(paymentHeader, requirements);
        } <span class="hljs-keyword">else</span> {
            send402Response(resp, path);  <span class="hljs-comment">// Reject invalid payment</span>
        }
    }
}
</code></pre>
<p><strong>The beauty of this approach is that it requires minimal changes to existing applications.</strong> You simply add the payment filter to your middleware stack and define which endpoints require payment.</p>
<h3 id="heading-paymentrequirements-response"><code>PaymentRequirements</code> Response</h3>
<p>When a client hits a protected endpoint without payment, the server constructs a detailed payment requirements object. This includes the payment amount, accepted tokens (like <code>USDC</code>), the receiving wallet address, blockchain network, and an expiration time to prevent replay attacks.</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">send402Response</span><span class="hljs-params">(HttpServletResponse response, String path)</span> <span class="hljs-keyword">throws</span> IOException </span>{
    response.setStatus(HttpStatus.PAYMENT_REQUIRED);
    response.setContentType(<span class="hljs-string">"application/json"</span>);

    PaymentRequirements requirements = PaymentRequirements.builder()
        .paymentRequirement(List.of(
            PaymentRequirement.builder()
                .kind(<span class="hljs-keyword">new</span> Kind(<span class="hljs-string">"exact"</span>, <span class="hljs-string">"base-sepolia"</span>))             <span class="hljs-comment">// Payment scheme + blockchain network</span>
                .receiver(payTo)                                     <span class="hljs-comment">// Wallet address to receive payment</span>
                .amount(priceTable.get(path))                        <span class="hljs-comment">// Cost for this specific endpoint</span>
                .asset(<span class="hljs-string">"&lt;USDC_TOKEN_CONTRACT&gt;"</span>)                      <span class="hljs-comment">// USDC token contract</span>
                .expiry(Instant.now().plus(Duration.ofMinutes(<span class="hljs-number">5</span>)))   <span class="hljs-comment">// Payment window</span>
                .nonce(UUID.randomUUID().toString())                 <span class="hljs-comment">// One-time use identifier</span>
                .build()
        ))
        .build();

    response.getWriter().write(Json.MAPPER.writeValueAsString(requirements));
}
</code></pre>
<p>Each field in the <code>PaymentRequirements</code> is described as follows:</p>
<ul>
<li><p><code>kind</code>: Defines the payment scheme (<code>exact</code> for fixed amounts) and target blockchain network (<code>base-sepolia</code> for Base testnet). This tells the client exactly how to structure and execute the payment.</p>
</li>
<li><p><code>receiver</code>: The wallet address where payment should be sent. This is your business wallet that will receive the funds.</p>
</li>
<li><p><code>amount</code>: The cost for accessing this specific endpoint, retrieved from your price table. For USDC, this is expressed in integer base units (USDC has 6 decimals, so 1 USDC = 1,000,000 units).</p>
</li>
<li><p><code>asset</code>: The smart contract address of the token to be used for payment. The example shows USDC on Base Sepolia testnet.</p>
</li>
<li><p><code>expiry</code>: A timestamp after which this payment requirement becomes invalid. This prevents old payment requests from being reused and adds security against replay attacks.</p>
</li>
<li><p><code>nonce</code>: A unique identifier (UUID) that ensures each payment requirement can only be fulfilled once, even if the same client makes multiple requests to the same endpoint.</p>
</li>
</ul>
<h2 id="heading-client-side-implementation">Client-side implementation</h2>
<h3 id="heading-automatic-payment-handling">Automatic payment handling</h3>
<p>Client libraries wrap standard HTTP clients to automatically handle 402 responses. When a client receives a payment requirement, (1) it constructs a payment payload, (2) signs it with the user's private key, and (3) retries the original request with the payment attached.</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">public</span> HttpResponse&lt;String&gt; <span class="hljs-title">makeRequest</span><span class="hljs-params">(String url, String method)</span> <span class="hljs-keyword">throws</span> Exception </span>{
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .method(method, HttpRequest.BodyPublishers.noBody())
        .build();

    HttpResponse&lt;String&gt; response = httpClient.send(request, 
        HttpResponse.BodyHandlers.ofString());

    <span class="hljs-comment">// Handle 402 Payment Required</span>
    <span class="hljs-keyword">if</span> (response.statusCode() == <span class="hljs-number">402</span>) {
        PaymentRequirements requirements = Json.MAPPER.readValue(
            response.body(), PaymentRequirements.class);

        <span class="hljs-comment">// Create payment payload matching the first advertised requirement</span>
        PaymentRequirement required = requirements.getPaymentRequirement().get(<span class="hljs-number">0</span>);
        PaymentPayload payment = PaymentPayload.builder()
            .receiver(required.getReceiver())
            .amount(required.getAmount())
            .asset(required.getAsset())
            .nonce(required.getNonce())
            .expiry(required.getExpiry())
            .build();

        <span class="hljs-comment">// Sign using EIP-712 structured data signing</span>
        String signature = signer.sign(payment.toSigningMap());

        <span class="hljs-comment">// Retry with payment header</span>
        String paymentHeader = encodePaymentHeader(payment, signature);
        HttpRequest paidRequest = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .method(method, HttpRequest.BodyPublishers.noBody())
            .header(<span class="hljs-string">"X-PAYMENT"</span>, paymentHeader)
            .build();

        <span class="hljs-keyword">return</span> httpClient.send(paidRequest, HttpResponse.BodyHandlers.ofString());
    }

    <span class="hljs-keyword">return</span> response;
}
</code></pre>
<p>The signing process uses the <a target="_blank" href="https://eips.ethereum.org/EIPS/eip-712"><code>EIP-712</code></a> standard, which creates a structured, human-readable representation of the payment data before hashing and signing it. This ensures the payment is cryptographically secure and tied to the specific request.</p>
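<p>For intuition, here is a deliberately simplified sketch of structured signing in plain JDK Java. It substitutes SHA-256 with P-256 ECDSA for EIP-712's keccak-256 over secp256k1, so treat it as an illustration of the shape of the scheme (canonical encoding, then sign, then verify), not a spec-compliant implementation.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Map;
import java.util.TreeMap;

public class StructuredSigningSketch {
    // Deterministically encode the payment fields (sorted keys) so that
    // signer and verifier hash exactly the same bytes.
    public static byte[] encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("PaymentPayload(");
        new TreeMap<>(fields).forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        return sb.append(')').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> payment = Map.of(
            "receiver", "0xReceiver", "amount", "1000",
            "asset", "USDC", "nonce", "abc-123");

        KeyPair keys = KeyPairGenerator.getInstance("EC").generateKeyPair();

        // Sign the canonical encoding (EIP-712 signs keccak(domainSeparator || structHash) instead)
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keys.getPrivate());
        signer.update(encode(payment));
        byte[] signature = signer.sign();

        // The verifier recomputes the same encoding and checks the signature
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(encode(payment));
        System.out.println("valid: " + verifier.verify(signature));  // prints "valid: true"
    }
}
```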
<h2 id="heading-payment-verification-flow">Payment verification flow</h2>
<h3 id="heading-facilitator-integration">Facilitator integration</h3>
<p>The facilitator service is where the blockchain complexity lives, abstracting it away from both clients and servers. When a server receives a payment, it forwards the payment payload to the facilitator for verification.</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">HttpFacilitatorClient</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">FacilitatorClient</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> HttpClient http;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> String baseUrl;

    <span class="hljs-meta">@Override</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> VerificationResponse <span class="hljs-title">verify</span><span class="hljs-params">(String paymentHeader, PaymentRequirements requirements)</span> 
        <span class="hljs-keyword">throws</span> Exception </span>{

        <span class="hljs-comment">// Construct verification request with payment and requirements</span>
        VerifyRequest body = VerifyRequest.builder()
            .paymentHeader(paymentHeader)     <span class="hljs-comment">// The X-PAYMENT header from client</span>
            .requirements(requirements)       <span class="hljs-comment">// What the server expects</span>
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + <span class="hljs-string">"/verify"</span>))
            .POST(HttpRequest.BodyPublishers.ofString(Json.MAPPER.writeValueAsString(body)))
            .header(<span class="hljs-string">"Content-Type"</span>, <span class="hljs-string">"application/json"</span>)
            .build();

        String json = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        <span class="hljs-keyword">return</span> Json.MAPPER.readValue(json, VerificationResponse.class);
    }

    <span class="hljs-meta">@Override</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> SettlementResponse <span class="hljs-title">settle</span><span class="hljs-params">(String paymentHeader, PaymentRequirements requirements)</span> 
        <span class="hljs-keyword">throws</span> Exception </span>{

        <span class="hljs-comment">// Settlement happens after successful verification</span>
        SettleRequest body = SettleRequest.builder()
            .paymentHeader(paymentHeader)
            .requirements(requirements)
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + <span class="hljs-string">"/settle"</span>))
            .POST(HttpRequest.BodyPublishers.ofString(Json.MAPPER.writeValueAsString(body)))
            .header(<span class="hljs-string">"Content-Type"</span>, <span class="hljs-string">"application/json"</span>)
            .build();

        String json = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        <span class="hljs-keyword">return</span> Json.MAPPER.readValue(json, SettlementResponse.class);
    }
}
</code></pre>
<p>The facilitator checks several things:</p>
<ul>
<li><p>Is the signature valid?</p>
</li>
<li><p>Does the payment amount match the requirements?</p>
</li>
<li><p>Has this payment been used before?</p>
</li>
<li><p>Is the payment still within its expiration window?</p>
</li>
</ul>
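<p>Those checks can be sketched in a few lines. This is a hypothetical, self-contained stand-in: a real facilitator recovers the signer from the <code>EIP-712</code> signature rather than taking a pre-computed boolean, and persists nonces durably rather than in memory.</p>

```java
import java.math.BigInteger;
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class VerificationSketch {
    // Nonces already accepted; a real facilitator persists these durably
    private final Set<String> usedNonces = ConcurrentHashMap.newKeySet();

    // Minimal stand-in for a decoded X-PAYMENT payload (field names illustrative)
    public record Payment(boolean signatureValid, BigInteger amount,
                          String nonce, Instant expiry) {}

    public boolean verify(Payment p, BigInteger requiredAmount) {
        if (!p.signatureValid()) return false;                       // 1. signature must check out
        if (p.amount().compareTo(requiredAmount) < 0) return false;  // 2. amount covers the price
        if (p.expiry().isBefore(Instant.now())) return false;        // 3. still within the payment window
        return usedNonces.add(p.nonce());                            // 4. nonce must be fresh (replay guard)
    }
}
```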
<p>If verification passes, the facilitator also handles settlement by broadcasting the transaction to the blockchain of choice. The <code>FacilitatorClient</code> interface defines the contract that any facilitator client must implement:</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">FacilitatorClient</span> </span>{
    <span class="hljs-function">VerificationResponse <span class="hljs-title">verify</span><span class="hljs-params">(String paymentHeader, PaymentRequirements requirements)</span> <span class="hljs-keyword">throws</span> Exception</span>;
    <span class="hljs-function">SettlementResponse <span class="hljs-title">settle</span><span class="hljs-params">(String paymentHeader, PaymentRequirements requirements)</span> <span class="hljs-keyword">throws</span> Exception</span>;
}
</code></pre>
<p>Your application needs to provide a concrete implementation of this interface.</p>
<h2 id="heading-integration-example">Integration example</h2>
<p>Now that we've seen the individual components (payment filters, facilitator integration, and client handling) let's look at how these pieces come together in a real application. Here's a minimal <code>Spring Boot</code> example that demonstrates the complete flow.</p>
<p>First, create a <code>@PaymentRequired</code> annotation:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Target(ElementType.METHOD)</span>
<span class="hljs-meta">@Retention(RetentionPolicy.RUNTIME)</span>
<span class="hljs-keyword">public</span> <span class="hljs-meta">@interface</span> PaymentRequired {
    <span class="hljs-function">String <span class="hljs-title">price</span><span class="hljs-params">()</span></span>;
    <span class="hljs-function">String <span class="hljs-title">currency</span><span class="hljs-params">()</span> <span class="hljs-keyword">default</span> "USDC"</span>;
    <span class="hljs-function">String <span class="hljs-title">network</span><span class="hljs-params">()</span> <span class="hljs-keyword">default</span> "base-sepolia"</span>;
}
</code></pre>
<p>Then modify the <code>PaymentFilter</code> to scan for these annotations at startup:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentFilter</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">Filter</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Map&lt;String, BigInteger&gt; priceTable;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> String payTo;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> FacilitatorClient facilitator;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">PaymentFilter</span><span class="hljs-params">(ApplicationContext context, String payTo, FacilitatorClient facilitator)</span> </span>{
        <span class="hljs-keyword">this</span>.payTo = payTo;
        <span class="hljs-keyword">this</span>.facilitator = facilitator;
        <span class="hljs-keyword">this</span>.priceTable = buildPriceTableFromAnnotations(context);
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> Map&lt;String, BigInteger&gt; <span class="hljs-title">buildPriceTableFromAnnotations</span><span class="hljs-params">(ApplicationContext context)</span> </span>{
        Map&lt;String, BigInteger&gt; prices = <span class="hljs-keyword">new</span> HashMap&lt;&gt;();

        <span class="hljs-comment">// Scan all @RestController beans for @PaymentRequired annotations</span>
        Map&lt;String, Object&gt; controllers = context.getBeansWithAnnotation(RestController.class);

        <span class="hljs-keyword">for</span> (Object controller : controllers.values()) {
            Method[] methods = controller.getClass().getMethods();
            <span class="hljs-keyword">for</span> (Method method : methods) {
                PaymentRequired payment = method.getAnnotation(PaymentRequired.class);
                <span class="hljs-keyword">if</span> (payment != <span class="hljs-keyword">null</span>) {
                    String path = extractPathFromMapping(method);
                    <span class="hljs-comment">// Convert the decimal USDC price to integer base units (USDC has 6 decimals)</span>
                    BigInteger amount = <span class="hljs-keyword">new</span> BigDecimal(payment.price()).movePointRight(<span class="hljs-number">6</span>).toBigIntegerExact();
                    prices.put(path, amount);
                }
            }
        }
        <span class="hljs-keyword">return</span> prices;
    }
}
</code></pre>
<p>Now you can annotate your controller methods directly:</p>
<pre><code class="lang-java"><span class="hljs-meta">@RestController</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">WeatherController</span> </span>{

    <span class="hljs-meta">@GetMapping("/weather")</span>
    <span class="hljs-meta">@PaymentRequired(price = "0.001", currency = "USDC", network = "base-sepolia")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> WeatherData <span class="hljs-title">getWeather</span><span class="hljs-params">(<span class="hljs-meta">@RequestParam</span> String city)</span> </span>{
        <span class="hljs-comment">// Your existing business logic</span>
        <span class="hljs-keyword">return</span> weatherService.getWeatherForCity(city);
    }

    <span class="hljs-meta">@GetMapping("/premium-forecast")</span>
    <span class="hljs-meta">@PaymentRequired(price = "0.01", currency = "USDC", network = "base-sepolia")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> ExtendedForecast <span class="hljs-title">getPremiumForecast</span><span class="hljs-params">(<span class="hljs-meta">@RequestParam</span> String city)</span> </span>{
        <span class="hljs-keyword">return</span> weatherService.getExtendedForecast(city);
    }
}

<span class="hljs-meta">@Configuration</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentConfig</span> </span>{

    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> PaymentFilter <span class="hljs-title">paymentFilter</span><span class="hljs-params">(ApplicationContext context)</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> PaymentFilter(
            context,
            <span class="hljs-string">"&lt;WALLET_ADDRESS&gt;"</span>, <span class="hljs-comment">// Your wallet address</span>
            <span class="hljs-keyword">new</span> HttpFacilitatorClient(<span class="hljs-string">"&lt;FACILITATOR_URL&gt;"</span>)
        );
    }

    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> FilterRegistrationBean&lt;PaymentFilter&gt; <span class="hljs-title">paymentFilterRegistration</span><span class="hljs-params">(PaymentFilter filter)</span> </span>{
        FilterRegistrationBean&lt;PaymentFilter&gt; registration = <span class="hljs-keyword">new</span> FilterRegistrationBean&lt;&gt;();
        registration.setFilter(filter);
        registration.addUrlPatterns(<span class="hljs-string">"/*"</span>);
        registration.setOrder(<span class="hljs-number">1</span>);
        <span class="hljs-keyword">return</span> registration;
    }
}
</code></pre>
<p>The <code>@PaymentRequired</code> annotation handles pricing configuration declaratively, while the <code>PaymentFilter</code> automatically discovers these annotations at startup and builds the price table. Your existing business logic in the controller methods remains completely unchanged. The configuration wires everything together by registering the payment filter and connecting it to a facilitator service. Once deployed, requests to <code>/weather</code> cost 0.001 USDC and <code>/premium-forecast</code> costs 0.01 USDC, with all payment handling happening transparently at the HTTP layer.</p>
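<p>As a sanity check on those prices: USDC uses 6 decimal places on-chain, so a decimal price string maps to integer base units. A tiny illustrative helper (not part of the x402 libraries):</p>

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class UsdcUnits {
    // USDC has 6 decimals, so 1 USDC == 1_000_000 base units
    public static BigInteger toBaseUnits(String usdcPrice) {
        return new BigDecimal(usdcPrice).movePointRight(6).toBigIntegerExact();
    }

    public static void main(String[] args) {
        System.out.println(toBaseUnits("0.001"));  // prints 1000 (the /weather price)
        System.out.println(toBaseUnits("0.01"));   // prints 10000 (the /premium-forecast price)
    }
}
```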
<h2 id="heading-security-and-production-considerations">Security and production considerations</h2>
<p><code>x402</code> is simple and elegant because it hides complexity. That is both a pro and a con. It makes integration easy, but it also obscures an important aspect: <strong>putting AI agents in charge of money creates new attack vectors</strong>.</p>
<p>While <code>EIP-712</code> signatures and nonce management handle replay attacks, what happens when an agent gets compromised? Traditional fraud detection relies on human behavioral patterns, but <strong>AI agents don't follow human spending habits</strong>. A compromised agent could drain funds faster than any human fraudster.</p>
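<p>One possible mitigation, sketched here as a hypothetical pattern rather than anything defined by the protocol, is a client-side spending cap that refuses to sign further payments once a budget is exhausted:</p>

```java
import java.math.BigInteger;

// Illustrative guard an agent operator might wrap around the payment signer.
public class SpendingCap {
    private final BigInteger limit;              // budget in token base units
    private BigInteger spent = BigInteger.ZERO;  // running total of authorized spend

    public SpendingCap(BigInteger limit) {
        this.limit = limit;
    }

    // Returns true only if this payment keeps total spend within the budget
    public synchronized boolean authorize(BigInteger amount) {
        BigInteger next = spent.add(amount);
        if (next.compareTo(limit) > 0) {
            return false;  // refuse to sign: budget would be exceeded
        }
        spent = next;
        return true;
    }
}
```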
<p>The facilitator component becomes another high-value target since it's verifying signatures and managing nonces. Unlike traditional payment processors that can reverse transactions, <strong>blockchain settlements are final</strong>.</p>
<p>Since <code>x402</code> settles on-chain, it inherits the operational risks of blockchain transactions. Gas fees fluctuate wildly, sometimes making micropayments economically unviable. Network congestion can delay transactions. What's an AI agent supposed to do when it needs real-time data but the payment is stuck in a mempool?</p>
<p>Another important aspect is regulation. Compliance varies across jurisdictions with different rules about automated payments, cryptocurrency usage, and data retention. An AI agent making a large volume of micro-transactions across borders might trigger AML alerts or violate local regulations without anyone realizing it.</p>
<h2 id="heading-whats-next">What's next</h2>
<p>What's interesting about <code>x402</code> is the timing. AI agents need autonomous payment capabilities, stablecoins provide a programmable money layer, and blockchain infrastructure has matured enough to handle scalable applications. These pieces haven't aligned before. The internet now needs a new form of payment, and traditional flows weren't built for autonomous agents or micropayments.</p>
<p><code>x402</code> is compelling thanks to its pragmatic approach. Instead of reinventing payments from scratch, it extends existing HTTP infrastructure. Instead of requiring new integrations, it works with standard patterns that we already understand.</p>
<p>The security and operational challenges are real, but they're engineering problems with possible solutions.</p>
<p>After nearly three decades, <code>HTTP 402</code> might finally do what it was designed for: make paying for things on the internet as simple as requesting them.</p>
<p><strong>The foundation is set. Now it's time to build.</strong></p>
<hr />
<p><em>Thanks</em> <a target="_blank" href="https://www.linkedin.com/in/erikreppel/"><em>Erik Reppel</em></a><em>,</em> <a target="_blank" href="https://www.linkedin.com/in/ronald-caspers-5403614a/"><em>Ronnie Caspers</em></a><em>,</em> <a target="_blank" href="https://www.linkedin.com/in/kevinleffew/"><em>Kevin Leffew</em></a><em>,</em> <a target="_blank" href="https://www.linkedin.com/in/dannyorgan/"><em>Danny Organ</em></a><em>, and all the team at</em> <a target="_blank" href="https://www.coinbase.com/"><em>Coinbase</em></a> <em>for open sourcing this protocol.</em></p>
<p><em>Thanks</em> <a target="_blank" href="https://www.linkedin.com/in/erikreppel/"><em>Erik Reppel</em></a> <em>and</em> <a target="_blank" href="https://www.linkedin.com/in/yuga-cohler/"><em>Yuga Cohler</em></a> <em>for reviewing</em> <a target="_blank" href="https://github.com/coinbase/x402/pull/178"><em>my contributions</em></a> <em>to</em> <code>x402</code>.</p>
<h2 id="heading-resources-and-further-reading">Resources and Further Reading</h2>
<ul>
<li><p><a target="_blank" href="https://tools.ietf.org/html/rfc2068?utm_source=blog.amorelli.tech#section-10.4.3">HTTP 402 Payment Required - RFC 2068</a> - Original 1997 HTTP specification</p>
</li>
<li><p><a target="_blank" href="https://x402.org/spec?utm_source=blog.amorelli.tech">x402 Protocol Specification</a> - Official protocol documentation</p>
</li>
<li><p><a target="_blank" href="https://github.com/coinbase/x402?utm_source=blog.amorelli.tech">x402 GitHub Repository</a> - Coinbase's open source implementation</p>
</li>
<li><p><a target="_blank" href="https://github.com/coinbase/x402/pull/178?utm_source=blog.amorelli.tech">x402 Java implementation</a> - The PR that introduced the Java implementation of the protocol</p>
</li>
<li><p><a target="_blank" href="https://eips.ethereum.org/EIPS/eip-712?utm_source=blog.amorelli.tech">EIP-712: Ethereum Typed Structured Data Hashing and Signing</a> - Signing standard used in x402</p>
</li>
<li><p><a target="_blank" href="https://docs.base.org/?utm_source=blog.amorelli.tech">Base Network Documentation</a> - Layer 2 blockchain platform used in examples</p>
</li>
<li><p><a target="_blank" href="https://www.centre.io/usdc?utm_source=blog.amorelli.tech">USDC Documentation</a> - USD Coin stablecoin contract details</p>
</li>
<li><p><a target="_blank" href="https://docs.spring.io/spring-boot/docs/current/reference/html/web.html#web.servlet.embedded-container.servlets-filters-listeners?utm_source=blog.amorelli.tech">Spring Boot Filter Documentation</a> - Java middleware implementation</p>
</li>
<li><p><a target="_blank" href="https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html?utm_source=blog.amorelli.tech">Java HTTP Client API</a> - Client-side HTTP handling</p>
</li>
<li><p><a target="_blank" href="https://www.itu.int/en/ITU-T/studygroups/2017-2020/20/Pages/default.aspx?utm_source=blog.amorelli.tech">Machine-to-Machine (M2M) Communication Standards</a> - Autonomous system communication</p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2308.11432?utm_source=blog.amorelli.tech">Autonomous AI Agent Architectures</a> - Research on AI agent design patterns</p>
</li>
<li><p><a target="_blank" href="https://research.google/pubs/an-introduction-to-googles-approach-for-secure-ai-agents/">Google’s Approach to Secure AI Agents</a> - Google’s proposal for a secure, human-guided AI agent framework</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[AI-Driven Refactoring in Large Scale Migrations: Strategies and Techniques.]]></title><description><![CDATA[A mountain of legacy code
Seven years ago, when Qonto was a new fintech with a handful of engineers and a single mission-critical web app, we chose Ember.js as our framework. In April 2016, the very first commit of what would become app.qonto.com lan...]]></description><link>https://blog.amorelli.tech/ai-driven-refactoring-in-large-scale-migrations-strategies-and-techniques</link><guid isPermaLink="true">https://blog.amorelli.tech/ai-driven-refactoring-in-large-scale-migrations-strategies-and-techniques</guid><category><![CDATA[AI]]></category><category><![CDATA[genai]]></category><category><![CDATA[llm]]></category><category><![CDATA[migration]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Stefano Amorelli]]></dc:creator><pubDate>Tue, 03 Jun 2025 21:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751230642386/4b38f427-cd65-4589-a51c-0247df816e8b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-a-mountain-of-legacy-code">A mountain of legacy code</h2>
<p>Seven years ago, when Qonto was a new fintech with a handful of engineers and a single mission-critical web app, we chose <strong>Ember.js</strong> as our framework. In April 2016, the very first commit of what would become <code>app.qonto.com</code> landed in the repo, and Ember carried us through our explosive growth: from “just launched” to hundreds of thousands of customers, 30 deployments a day, and a <strong>rock-solid 93% test coverage</strong>.</p>
<p>Ember carried us through years of rapid product development and countless feature launches, but the innovation in software engineering never stops. Our product ambitions expanded, and so did the frameworks and technical stacks around us. To stay at the forefront of front-end development, in late 2023, we set a new heading, as detailed in our article <em>“</em><a target="_blank" href="https://medium.com/qonto-way/setting-sail-from-ember-why-we-are-charting-a-course-toward-react-at-qonto-8c475931cff2"><em>Setting sail from Ember: why we are charting a course toward React at Qonto</em></a><em>”</em>: “<strong>Let’s chart a course toward React.</strong>”</p>
<p>The decision felt energising, but it came with a price: <strong>migrating ~1 million lines of code from Ember to React.</strong></p>
<p>We estimated that a software engineer could migrate roughly <strong>50 lines of code a day</strong> without sacrificing quality. At that pace, with enough engineers dedicated to this task, rewriting the whole app would stretch beyond <strong>two years of full-time effort</strong> (while we kept shipping new features on top of the old stack). And let’s be honest, migrating and refactoring code from one framework to the other is not exactly anyone’s idea of fun.</p>
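<p>To make the order of magnitude concrete, here is the back-of-the-envelope arithmetic. The lines-of-code figures come from this post; the team size and working days per year are illustrative assumptions, not Qonto planning figures:</p>

```python
# Back-of-the-envelope effort estimate using the figures from the post:
# ~1,000,000 lines of code, ~50 LoC migrated per engineer per day.
total_loc = 1_000_000
loc_per_engineer_per_day = 50
engineer_days = total_loc // loc_per_engineer_per_day  # 20,000 engineer-days

# Team size and calendar are illustrative assumptions, not actual figures.
engineers = 35
working_days_per_year = 220
calendar_years = engineer_days / (engineers * working_days_per_year)

print(f"{engineer_days:,} engineer-days ≈ {calendar_years:.1f} calendar years")
```

<p>Even under generous assumptions, the estimate lands well past the two-year mark, which is what pushed us toward automation.</p>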
<p>We needed a better route. One where AI and <code>codemods</code> automate most of the job, and engineers focus on the tricky edges.</p>
<blockquote>
<p><em>“In the age of AI, how can we speed up migrating a mountain of legacy code?”</em></p>
</blockquote>
<p>This post is about that journey: how we are approaching this project, what worked, what didn’t, and how AI transformed what seemed like an overwhelming migration into a clear and attainable task.</p>
<h3 id="heading-how-can-we-leverage-ai-to-do-this-for-us">“How can we leverage AI to do this for us?”</h3>
<p>In late January of this year, 2025, we launched an internal <a target="_blank" href="https://medium.com/qonto-way/aiming-high-how-kaizen-helped-us-turn-around-as-a-tech-team-1984ff008c0f">kaizen</a> (a focused, continuous improvement initiative). By that time, Claude 3.5 Sonnet by Anthropic had already demonstrated significant maturity as a model, especially given its strong performance on <code>SWE-Bench</code>, a benchmark evaluating foundation models’ capabilities in realistic coding tasks. Recognizing its potential, we asked ourselves: <em>how could we leverage this advanced LLM to automate substantial portions of our migration?</em></p>
<p>We set ambitious goals for ourselves: if an individual engineer managed to migrate ~50 LoC (lines of code) per day manually, maybe AI assistance could double that to 100 LoC/day; in this kaizen initiative, we aimed at doubling it again to <strong>200 LoC/day.</strong></p>
<p>This figure felt bold at the time.</p>
<p>Little did we know we would blow past this target by orders of magnitude!</p>
<h2 id="heading-first-experiment-a-quick-win-with-rag">First experiment: a quick win with RAG</h2>
<p>Every journey starts small. Our first experiment was to build a quick prototype, a web-based AI assistant that could help us convert Ember code to React on a case-by-case basis. We started with a Retrieval-Augmented Generation (<strong>RAG</strong>) approach using an AI agent fed with our internal data sources, including our monorepo on GitHub and our knowledge base.</p>
<p>The result of this first iteration was a web-based chatbot that we could feed Ember code to and ask for the equivalent React code. We augmented it with a refined system prompt, coding guidelines, and snippet examples for context.</p>
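<p>The retrieval idea behind that prototype can be sketched in a few lines. This is a simplified illustration, not our actual implementation: the token-overlap scoring and the example snippets are toy stand-ins for a real embedding-based search over our monorepo and knowledge base:</p>

```python
# Naive retrieval sketch: rank already-migrated examples by token overlap
# with the component being converted, then build a few-shot prompt.
def overlap(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(ember_code: str, examples: list, k: int = 2) -> list:
    ranked = sorted(examples, key=lambda e: overlap(ember_code, e["ember"]), reverse=True)
    return ranked[:k]

def build_prompt(ember_code: str, examples: list) -> str:
    shots = "\n\n".join(
        f"Ember:\n{e['ember']}\n\nReact:\n{e['react']}"
        for e in retrieve(ember_code, examples)
    )
    return ("Convert the following Ember component to React, "
            "following our coding guidelines.\n\n"
            f"Reference migrations:\n{shots}\n\n"
            f"Ember input:\n{ember_code}")
```

<p>The point is the shape of the prompt: instructions, retrieved reference migrations, then the input, which is what gave the chatbot enough context to produce acceptable React output.</p>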
<p>To our delight, this chatbot showed promising results! With proper prompting and some back and forth, it could output acceptable React components from Ember inputs. This was a big morale boost: the concept was sound.</p>
<p>However, this was just the beginning.</p>
<p>The bot worked well in a sandbox, but it wasn’t yet integrated into our development workflow. It still required engineers to context switch between the web interface of the chatbot and their editor, and manually prompt each time they wanted to convert code from Ember to React.</p>
<p>We wanted to go further: to have AI directly working on our actual codebase, ideally making changes that were automatically committed with little supervision.</p>
<p>The question now became: <em>how do we turn this into a more automated, practical engineering tool?</em></p>
<h2 id="heading-hackathon-to-production-building-an-ai-powered-cli">Hackathon to production: building an AI-powered CLI</h2>
<p>A few weeks in, at the beginning of February 2025, armed with confidence from the prototype, we organised a one-day internal hackathon to level up the idea.</p>
<p>Our small team (engineers from various squads who were passionate and curious about AI) got together with a mission: to <strong>enable an LLM to do the migration for us</strong>.</p>
<p>In a single day, we brainstormed, coded, and demoed an MVP CLI tool that automatically refactored Ember components to React. It was a lot of trial and error, analysing already-migrated code to spot patterns, crafting prompts based on Ember-React diffs and our internal guidelines, and evaluating different tools and LLMs.</p>
<p>By the end of the hack day, we had a rough but working <strong>CLI agent</strong>. This script could take an Ember component file and generate a React version, end-to-end.</p>
<p>At the core of the script, we picked <a target="_blank" href="https://github.com/Aider-AI/aider"><code>aider</code></a>, an open-source command-line tool designed to automatically apply code changes generated by LLMs based on a prompt. It offers built-in support for multiple models and is easy to plug in using <code>AWS Bedrock</code> as a provider. We chose it because, being a CLI, we could quickly script it, customise prompts, iterate, and experiment across different models and inputs, all within our existing development workflow.</p>
<p>Here’s an overview of how it works:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1313/1*A7pEcOFFyOGSyEXelCzY5Q.png" alt /></p>
<p>General overview of our CLI agent using <code>aider</code> and Anthropic’s Claude via AWS Bedrock</p>
<p>In simple terms, the CLI automates the whole refactoring process:</p>
<ol>
<li><p><strong>Select code &amp; branch creation:</strong> We choose a target component (via a fuzzy finder) and the tool spins up a new git branch for that migration. This keeps changes isolated and reviewable.</p>
</li>
<li><p><strong>AI Pass 1, Ember to React:</strong> <code>aider</code> calls our LLM (<code>claude-3-5-sonnet</code>, Anthropic’s latest model at that time, via <code>AWS Bedrock</code>) with a carefully crafted prompt. The prompt includes the Ember component code as input and asks for the equivalent React code, while following our coding guidelines and style conventions. <code>aider</code> streams the AI’s suggested changes and applies them to the code-base as an initial diff.</p>
</li>
<li><p><code>codemod</code> <strong>adjustments:</strong> Next, we run a custom <code>codemod</code> (<code>react-bridge-migrator</code>) on the output. This <code>codemod</code> handles mechanical tasks and fine-grained fixes. Think of things like adjusting imports, hooking the component into our React app framework, and other repetitive changes that are easier to script deterministically.</p>
</li>
<li><p><strong>AI Pass 2, review and polish:</strong> With the <code>codemod</code> adjustments in place, we then invoke <code>aider</code> + <code>claude</code> <strong>again</strong> to review the combined diff (original Ember vs. new React) and refine the result. In this second pass, <strong>the engineer can pair program interactively in natural language with the LLM</strong> to fix any inconsistencies, apply final touches, or handle pieces the <code>codemod</code> didn’t cover.</p>
</li>
<li><p><strong>Automatic commit:</strong> If all goes well, the CLI then automatically creates a <code>git commit</code> with the changes (Ember component replaced by the new React component). The developer’s role shifts from writing boilerplate to reviewing the automatically migrated output, a much more efficient use of time!</p>
</li>
</ol>
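<p>The five steps above can be sketched as a small orchestration function. This is a simplified illustration, not our production CLI: the branch naming, the prompts, and the <code>run_aider</code>/<code>run_codemod</code> hooks are hypothetical stand-ins for shelling out to <code>aider</code> and to our codemod:</p>

```python
import subprocess

def sh(*cmd: str) -> None:
    """Run a shell command, raising on failure."""
    subprocess.run(cmd, check=True)

def migrate(component_path: str, run_aider, run_codemod, run=sh) -> None:
    """Orchestrate one Ember-to-React migration (the five steps above).

    `run_aider(prompt, path)` and `run_codemod(path)` are injected
    stand-ins for the LLM and codemod calls; `run` defaults to
    executing real git commands.
    """
    branch = "migrate/" + component_path.replace("/", "-")
    run("git", "checkout", "-b", branch)                  # 1. isolated branch
    run_aider("Convert this Ember component to React, "
              "following our coding guidelines.",
              component_path)                             # 2. AI pass 1
    run_codemod(component_path)                           # 3. mechanical fixes
    run_aider("Review the Ember-to-React diff and "
              "polish the result.", component_path)       # 4. AI pass 2
    run("git", "add", component_path)
    run("git", "commit", "-m",
        "Migrate " + component_path + " to React")        # 5. commit
```

<p>Injecting the AI and codemod steps as callables keeps the pipeline logic testable and made it easy for us to swap models and tools as they improved.</p>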
<p>Throughout this process, the LLM is doing the heavy lifting of code generation.</p>
<p>We experimented with different models and at that time found that <code>claude</code> hit a sweet spot of quality and context length for our needs, and <code>aider</code> made it easy to feed multiple files into the prompt (e.g., "already migrated" examples and style guides) and to iteratively refine the output.</p>
<p>In effect, we built an <strong>AI pair programmer,</strong> embedded in our CLI, that knew our guidelines, our code-base, and could directly migrate Ember code to React with little supervision.</p>
<h2 id="heading-results-high-velocity">Results: high velocity</h2>
<p><img src="https://miro.medium.com/v2/resize:fit:1313/1*QlYEQGyQ_GXvUkNtsi9DVg.png" alt /></p>
<p><em>Velocity of migration (lines of code converted per week) after introducing the AI-driven CLI: our actual migration velocity (blue line) surged far above the original projected pace (red line).</em></p>
<p>With the initial version of the agent complete, we began a two-week evaluation period starting February 10th, 2025. During this time, we closely monitored the lines of code migrated with the agent: <strong>the results exceeded all expectations</strong>. Our migration throughput skyrocketed. We went from ~50 LoC/day per engineer to <strong>hundreds of lines</strong> per day, sometimes even breaking the <strong>1,000 LoC/day per engineer</strong> mark. The weekly velocity chart above tells the story: once the AI agent came online, the blue line (actual migrated code per week) shoots up like a rocket, leaving the modest linear projection (red line) in the dust.</p>
<p>In concrete terms, at peak performance, the tool delivered about <em>20 times</em> the output of manual coding.</p>
<blockquote>
<p>That’s roughly a <strong>2,000% gain in productivity over our original estimation!</strong></p>
</blockquote>
<p>Even we had trouble believing these numbers at first! What used to take days now takes hours or minutes. <strong>In the span of two weeks, one dedicated engineer,</strong> with marginal assistance from a teammate, <strong>successfully migrated 8,632 lines of code using the agent.</strong></p>
<p>Equally important: <strong>quality and consistency</strong> remained high. Since the AI was adopting our own code patterns (as we fed it examples of already-migrated components and guidelines), the React code it produced was in line with our expectations: while not every AI suggestion was perfect the first time, a human-in-the-loop review filled the gap.</p>
<p>Notably, this performance level was already attained using <code>claude-3-5-sonnet</code>, and as new models were released, we observed a significant improvement when upgrading to <code>claude-3-7-sonnet</code>, released on February 24th, 2025, and most recently to the latest <code>claude-4-sonnet</code>, released on May 22nd, 2025.</p>
<h2 id="heading-kaizen-in-action-small-team-big-impact">Kaizen in action: small team, big impact</h2>
<p>It’s worth highlighting how crucial the <strong><em>kaizen</em> approach</strong> was to this effort. We treated this project as a series of small, iterative improvements rather than a big top-down initiative. A tiny task force (just a small team of motivated engineers) took the idea and ran with it in a hackathon-style blitz. This gave us speed and creative freedom. We weren’t afraid to try something experimental.</p>
<p>After the initial hackathon success, we continued in <strong>iterative cycles</strong>: brainstorm, prototype, test, repeat. Each iteration taught us something that fed into the next. This lean approach is classic <em>kaizen</em>: <strong>making continuous small improvements that compound into significant gains</strong>. In our case, this mindset helped us to quickly navigate uncertainties in prompt engineering and tool integration. Instead of getting stuck in analysis paralysis, we built a proof of concept, evaluated impact, and then doubled down on what worked; as a result, we moved fast.</p>
<h2 id="heading-what-can-you-learn-from-our-experiment">What can you learn from our experiment?</h2>
<p>Using AI to refactor such a large codebase in production taught us valuable lessons:</p>
<ul>
<li><p><strong>The importance of an extensive test suite:</strong> At Qonto, we are proud of a comprehensive test suite that, between integration, unit, and acceptance tests, covers <strong>93%</strong> of our code. In a very large refactoring, having a solid testing foundation allows you to move fast and with much more confidence, knowing that regressions would be caught by the tests.</p>
</li>
<li><p><strong>Prompt engineering is the key:</strong> the quality of the AI’s output is highly dependent on the prompt. Being explicit in the instructions results in much higher quality output. For example, our prompt was not a simple “convert this Ember component to React” but a detailed structure containing our coding guidelines (function component style, hooks usage, etc.) and hand-picked examples of Ember-to-React transformations, with dos and don’ts and best practices.</p>
</li>
<li><p><strong>Human-in-the-loop:</strong> As much as the AI helps us move faster, it does not replace the need for experienced human judgment. Having a software engineer in the loop to validate and run the application ensures quality. The AI got us 90% of the way there, and we, engineers, handled the rest.</p>
</li>
<li><p><strong>AI evolves fast:</strong> During our experimentation period of only a few months, we witnessed significant developments, including major LLM releases such as <code>claude-3-7-sonnet</code> and <code>claude-4-sonnet</code>, along with the introduction of powerful new tools like <code>Claude Code</code>. Now more than ever, it’s essential to maintain short feedback loops, continuously update our knowledge, and remain open to emerging technologies.</p>
</li>
<li><p><strong>Tooling:</strong> <code>aider</code> was instrumental: as an open-source CLI, it gave us the flexibility to script, customise, and experiment throughout the initiative.</p>
</li>
</ul>
<h2 id="heading-challenges-and-next-steps">Challenges and next steps</h2>
<blockquote>
<p><em>“So, what’s next for the AI migration agent?”</em></p>
</blockquote>
<p>Short answer: <strong>we’re evolving it</strong>.</p>
<p>While the CLI agent delivered massive wins during our kaizen, this is not the final form. We quickly learned that, as powerful as it was, an agent in the shape of a CLI script didn’t quite fit the daily habits of most front-end engineers. We had to ask ourselves: <strong>what’s the most natural environment for developers to use this kind of tooling?</strong></p>
<p>That led us to a new direction, <strong>a VS Code extension that is integrated within developers’ IDE</strong>.</p>
<h2 id="heading-what-didnt-work">What didn’t work</h2>
<ul>
<li><p>The <strong>CLI agent was a</strong> <code>bash</code> <strong>script</strong>, and while that kept things simple for a quick prototype, it started to show limits. There was little room for scalability, debugging prompts was clunky, and it wasn’t tightly integrated with the editor.</p>
</li>
<li><p><strong>Front-end engineers preferred staying inside their IDE</strong>, not context-switching between a terminal and their editor. An integrated extension makes the experience more fluid and ergonomic.</p>
</li>
<li><p>The <strong>prompt architecture needed rethinking</strong>. Initially, we put everything (guidelines, examples, diffs) into one huge markdown prompt. This made it hard to maintain and even harder to debug.</p>
</li>
<li><p>We’ve since learned that <strong>splitting the prompt into modular files</strong> makes it easier to reason about and easier for teammates to refine and contribute. It also helps the LLM process the input better: the separation creates a clearer semantic structure and reduces noise.</p>
</li>
</ul>
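<p>The modular layout can be illustrated with a small assembly helper. The file names below are hypothetical, not our actual repository layout; the point is that each concern lives in its own file so teammates can refine one part without touching the rest:</p>

```python
from pathlib import Path

# Hypothetical modular prompt layout:
#   prompts/role.md        - task framing for the assistant
#   prompts/guidelines.md  - coding conventions (function components, hooks, ...)
#   prompts/examples.md    - hand-picked Ember-to-React diffs
def assemble_prompt(part_paths: list, ember_code: str) -> str:
    """Concatenate modular prompt files, then append the component to convert."""
    sections = [Path(p).read_text() for p in part_paths]
    sections.append("## Ember input\n" + ember_code)
    return "\n\n".join(sections)
```

<p>Beyond maintainability, the explicit section boundaries give the LLM a clearer semantic structure than one huge markdown blob.</p>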
<p>So what’s the next step in this journey? We’re keeping up to date with the latest trends in agentic engineering, and we’re actively developing new tools to help our engineers. For example, we’re building a dev extension that brings the power of our AI agent closer to where engineers work. We’re keeping the heart of what made the CLI great (automation, prompts, and AI pair programming) but wrapping it in a smoother, more accessible interface.</p>
<h2 id="heading-conclusion-adopting-ai-with-a-continuous-improvement-mindset">Conclusion: adopting AI with a continuous improvement mindset</h2>
<p>What started as an experiment ended up fundamentally accelerating a critical migration for us. We leveraged AI and, with a true kaizen mindset, we turned a huge project into an opportunity to innovate and learn.</p>
<p><strong>AI is extremely powerful; the question is how best to leverage it.</strong> In our case, we applied it to augment our engineers, automating extremely time-intensive tasks at a scale and speed that still impresses us.</p>
<p>This journey also reminded us of the value of <em>exploring new approaches</em> and pushing the boundaries of our engineering workflows.</p>
<p>It’s easy to stick to tried-and-true methods, but a small, passionate team with a bold idea can deliver outsized results with new and innovative approaches!</p>
]]></content:encoded></item></channel></rss>