Data sovereignty instead of compromises: contexxt.ai builds Europe’s answer to ChatGPT

How did the idea for contexxt.ai and Sesame come about? What need or problem particularly motivated you to found the startup?

It was around the time when OpenAI was slowly becoming public. It was clear to us that a disruptive technology was emerging, but companies could only use it by accepting significant compromises in sovereignty and security. Apart from that, the target group was generally the individual, not the company. Meaning: corporate knowledge could not and still cannot be integrated into the models in a safe and usable way.
So: let us take open source models, let us use open source components and build a purely European SaaS solution for small and medium sized businesses that cannot or do not want to invest heavily in building competencies in GenAI, RAG or cybersecurity (more than 90 percent of all companies).

Which technical core components or models set contexxt.ai apart from existing AI or security solutions?

We work exclusively with open source models that we control ourselves, and we remain agnostic in that respect. We use the latest open source frameworks (including Meteor and Milvus), and on that basis we have developed complex pipelines that enable highly efficient and secure use of corporate knowledge in GenAI technologies. Especially when handling large documents that tend to exceed existing context windows, we have a significant advantage due to our preprocessing, semantic separation and metadata generation. Data security has always been a requirement at every development stage, so it is built into the solution in multiple layers.

Which mechanisms or principles do you use to ensure the security, robustness and trustworthiness of your systems, especially against attacks like indirect prompt injections or other manipulations?

First of all, we adhere to established industry standards and use a European cloud provider that has a great deal of experience in dealing with security critical infrastructures (OVH). Another key element is that we built the central RAG architecture in a way that essentially corresponds to a nested multi tenant separation. You can prompt whatever you want, there is absolutely no access to any data for which you are not explicitly authorized. This sometimes irritates customers at first, because it is restrictive, but that is our core. Security first.

To what extent does your solution follow regulatory frameworks like the EU AI Act, data protection laws or other security guidelines? How do you integrate compliance into your architecture?

Our platform is built fairly horizontally and also includes a strong API in the middle, which allows customers to do whatever they want. So if someone uses our technology to do things that contradict the AI Act in substance, we cannot prevent that systemically. It is also the responsibility of the company using it, and we discuss and advise on use cases when needed. In practice this has not happened yet, as we are still a quite young startup and still know all our customers personally.
Data protection laws and especially related industry standards are mandatory for us, even for our own protection. If our solution were to leak data, we could shut down the company. Compliance is an important topic for us. We have a strict four eyes principle for critical components and we have also developed an AI supported CI/CD pipeline that analyzes every piece of code for security relevant vulnerabilities and prevents deployment until everything is clean. Sometimes that feels like using a cannon to shoot a sparrow, but it prevents development artifacts from ending up in the productive system.

What role do collaborations with research institutes (such as CISPA and other partners) play in your development?

We aim to operate as close as possible to the latest state of research. Regular exchange or audits are extremely valuable to us. Of course, the institutes also have extensive networks of investors and companies with whom we would otherwise hardly come into contact without an introduction. Beyond that, close and supportive relationships have formed among the startups that have emerged from these institutes, which I greatly appreciate. We all face similar challenges, even if we work on completely different topics.

Which industries or use cases are particularly suitable for your solution? Can you name one or two concrete examples?

Industries are not really important to us. Generative AI helps everywhere. Take all the use cases you cover with ChatGPT and add the ones for which you need knowledge or information from your own company.
One nice use case might be storing constantly changing leasing conditions in our solution so that salespeople can give targeted information at any time while they are in a sales conversation. Another, perhaps amusing use case: a car mechanic is standing under a car, the oil pan is filling up, he asks our solution via voice input how much oil the car has and knows whether the pan will overflow or not. Another use case, which primarily uses our API, is automated writing of requirements based on stored regulations.
We are currently working on integrating a deep research framework. The first planned use case is automatic strengths and weaknesses analysis of business plans or pitch decks along with an integrated competitor analysis. Another case is monitoring various data sources to assess impacts on a company’s core business. As you can see, the use cases are limitless.

What have been or are the biggest technical, organizational or market related hurdles you have faced?

Our infrastructure is extremely complex. Many different components interact and must be orchestrated. The high security requirements often lead to additional loops that cost a lot of time. The biggest hurdle, however, is fundraising. German investors seem extremely risk averse at the moment. In our field, competition is perceived as strong, particularly due to the major hyperscalers.
Beyond that, investors often measure the quality of our innovation solely by how much we have already sold, meaning how high MRR or ARR are, but because of the restrictions of the pre sales research funding from the BMFTR (StartUpSecure), we are not even allowed to be relevant there yet. A real chicken and egg problem.

How does contexxt.ai address ethical issues in the AI context, such as bias, transparency or explainability?

On our platform we use the base models, so we encapsulate every user interaction in a system prompt that we have optimized through various evaluations. We continuously develop these evaluation methods, especially since we will replace the underlying models over time. Transparency and explainability are essentially at the heart of our solutions, which is why we always provide references showing where the model obtained its information. Bias and hallucinations will of course always remain an issue, which we actively try to minimize through various methods.

How important is European sovereignty in the areas of cybersecurity and artificial intelligence?

This is essentially the basis of our existence. The requirement for every feature is that it must be independent of non European suppliers and that, if external services are used (for example web search), they must remain interchangeable. A slightly nerdy idea: if the zombie apocalypse comes, I want our solution to remain useful for as long as possible. That thought helps a lot in making certain decisions.

How do you see your role in the European AI ecosystem and what developments (technological, regulatory or societal) do you expect in the next three to five years that contexxt.ai wants to help shape or influence?

I believe that as a startup we are already very clear about European sovereignty. I often see many compromises among competitors. We are also reaching strongly toward France because we have good access there and also find a very open market.
In the coming years, European sovereignty will gain tremendous relevance. Today it is already essentially required in German and French public tenders. Geopolitical developments are already moving strongly in this direction for various reasons, and I consider the maturation of the European Union and its associated independence to be essential.
Technically, I see the next major topics, in which we intend to be very active, in autonomous AI agents and the resulting agent to agent communication, as well as connecting all kinds of services to generative AI via MCP (Model Context Protocol). By the way, who said back in 2018 that AI is always a question of context? Take a guess.

More information about contexxt.ai: https://contexxt.ai