Interview with Xiao Yumin, CTO of TorchV AI: Harnessing unstructured data for business advantage

  • Xiao Yumin, CTO of TorchV AI, is an expert in technical development, with a focus on RAG, vector search, and unstructured data parsing.
  • Xiao discusses the company’s focus on providing B2B solutions, leveraging unstructured data, and the unique challenges and opportunities in the evolving landscape of AI-driven technologies.

Recently, we had the opportunity to sit down with Xiao Yumin, the CTO of TorchV AI. TorchV AI is a leading innovator in the Platform-as-a-Service (PaaS) writing assistance space. Since its founding in 2023, it has been making waves with a cutting-edge platform that supports marketing content creation and official document drafting.

Introducing Xiao Yumin

Xiao Yumin is the CTO of TorchV AI. He has worked in technical development with Java and Python and has extensive expertise in technical architecture, microservices, and open-source frameworks, with a particular focus on RAG (retrieval-augmented generation), vector search, and unstructured data parsing. He currently oversees product and research activities at TorchV AI, concentrating on large models, RAG, and vector search. Xiao is also the author of Knife4j, an Open Source China GVP project.


Q: I understand that your company's products primarily target B2B customers. Compared with B2C products such as Baidu's 'Wenxin Yiyan' and Alibaba's 'Tongyi Qianwen', which are generative AI products focusing on document retrieval, what motivated your decision to concentrate on B2B clients?

“Within corporate environments, unstructured data holds significant value. It’s like fuelling a vehicle; data can energise a company, continuously unleashing its value.”

Xiao Yumin, CTO of TorchV AI

Initially, our objective was to develop a Software-as-a-Service (SaaS) solution, and we currently offer two versions. One is an online SaaS service, which has been in operation since the advent of RAG and large models. As far back as 2019, we were developing intelligent customer service products, albeit on a somewhat outdated technology stack. When large models emerged, we fundamentally rebuilt that stack. Previously, our knowledge base required substantial manual effort to maintain. For example, if a user asked about the weather in Shanghai, we had to maintain a specific response, either by calling a weather API or by drawing on other text-based knowledge, which placed heavy demands on our knowledge base staff.

Seizing the opportunity presented by large models and building on our prior experience, we decided to launch our business with the knowledge base as its cornerstone. As you mentioned, major corporations such as Baidu and Alibaba are also active in this domain. However, smaller companies have their own distinct advantages. Firstly, many small and medium-sized enterprises (SMEs) have not fully embraced digital transformation. With the advent of artificial intelligence, the knowledge base we have developed lets us build on their earlier digitalisation efforts, which makes AI a highly relevant product for them. Secondly, in the practical work scenarios we see, roughly 80% of the time is spent dealing with unstructured data.

Moreover, we firmly believe that, within corporate environments, unstructured data holds significant value. How do we unlock its full potential? It is akin to fuelling a vehicle; data can energise a company, continuously unleashing its value. In our conversations with many clients, we hear similar concerns. Much of this data, documents in particular, is typically stored on individual employees' computers, and clients want a centralised platform for it, something like a data hub. However, earlier discussions of data hubs and big data focused largely on large companies building big data centres. Because the emphasis was primarily on structured data, those efforts never fully exploited the value of a data hub.

Large corporations have consumer-oriented products, such as WeChat and DingTalk, which are deeply integrated into office environments and have accumulated substantial data, enabling data analysis and exploration. However, these products are not tailored to the needs of small and medium-sized enterprises. SMEs hold a variety of documents, including financial records, employee information, contracts, and other pertinent material. The challenge, then, is how to make effective use of this data in the age of AI. Our current focus is on helping companies use these tools collaboratively across their teams to streamline the entire workflow.

Q: When designing your company's products, do you build bespoke versions to address the specific issues of different clients?

To answer your question: no, we do not engage in extensive customisation. Instead, we build on a foundational knowledge base, which acts as the cornerstone of our data ecosystem. Once this cornerstone is in place, we develop a variety of applications on top of it, gathered in an application centre designed to meet the needs of businesses. For instance, today we might build a contract review application to make a company's legal department more efficient; we already have a contracts application in place. Tomorrow it might be a writing application, focused on scenarios such as drafting annual reports. We adapt each application to the client's specific circumstances so that it genuinely helps them create useful AI scenarios within their enterprise, and we strive to make each application solid, one at a time. In a business context, deploying AI is markedly different from generating a fun picture, video, or piece of music. The demands on AI in a professional setting are considerably higher.


Q: Could you discuss any technical challenges you have faced during the development of your products and solutions?

There are indeed several challenging issues. As technologists often say, some problems seem to have no bottom, and in the industry today the most difficult and problematic of these is handling documents, particularly PDFs. At present, no provider, not even the most advanced systems such as GPT-4, can guarantee complete and accurate extraction of information from PDF documents. What we observe is an ongoing, iterative process, with more and more people in the AI field focused on this issue. The technology is certainly improving, with many open-source projects and AI models advancing in this area.

Q: Are there any further insights you would like to share with us regarding your perspective on unstructured data?

“The ability to parse, analyse, and understand unstructured data is not just a technological challenge but also a strategic imperative for businesses looking to gain a competitive edge.”

Xiao Yumin, CTO of TorchV AI

Unstructured data represents a vast and untapped resource, one that holds immense potential for organisations. Given the complexity and volume of unstructured data, harnessing its value requires innovative approaches and technologies. The advent of large language models and advancements in AI have enabled us to unlock insights from this data in ways that were previously unimaginable. From my perspective, the future lies in our capacity to turn this data into actionable intelligence, and we are actively working towards that goal.

A personal insight

Xiao Yumin emerges as a visionary and pragmatic leader in the realm of artificial intelligence and software development. His deep technical expertise in areas such as RAG, vector search, and unstructured data parsing, coupled with his hands-on experience in developing open-source projects like Knife4j, positions him as a credible authority in his field. Xiao’s commitment to leveraging AI technologies to solve real-world problems, particularly in the B2B sector, reflects his understanding of the market and the challenges faced by small and medium-sized enterprises.

Xiao’s approach to product development is methodical and focused on creating scalable solutions that can be adapted to meet the diverse needs of businesses. His emphasis on building a robust foundational knowledge base as a cornerstone for various applications showcases his strategic thinking and long-term vision. By prioritising the development of applications that can be tailored to specific client requirements, Xiao demonstrates a keen awareness of the importance of flexibility and adaptability in a rapidly evolving technological landscape.

Furthermore, Xiao’s insights into the challenges of handling unstructured data, particularly the complexities associated with extracting meaningful information from documents like PDFs, reveal his commitment to continuous improvement and innovation. His recognition of the strategic value of unstructured data and the potential it holds for businesses underscores his forward-thinking mindset and his dedication to unlocking new avenues for growth and competitive advantage.

Overall, Xiao Yumin is a thoughtful and driven individual who combines technical prowess with a clear understanding of the business landscape. His leadership at TorchV AI is marked by a focus on developing practical AI solutions that can genuinely transform how businesses operate and thrive in the digital age.


Vicky Wu

Vicky is an intern reporter at Blue Tech Wave specialising in AI and blockchain. She graduated from Dalian University of Foreign Languages. Send tips to v.wu@btw.media.
