- Datastrato, led by Du Junping, is based in the U.S. and specialises in data infrastructure for AI.
- The company focuses on improving data management to support advanced AI technologies.
- Datastrato is building a data centre designed to handle both structured and unstructured data for AI applications.
Du Junping, founder & CEO of Datastrato, director of LF AI & DATA, and ASF member, has been deeply involved in the AI and data open-source fields for over a decade. He has served as general manager of Open Source Business at a Fortune 500 company, as head of Data Business, and as chief architect, and he is an expert in big data technology and open source. He has been chair of the TOC (Technical Oversight Committee) at the OpenAtom Open Source Foundation, a member of the Apache Software Foundation, and a committer and PMC member for projects such as Apache Hadoop and Apache Submarine. He has also served as a mentor for projects like Apache YuniKorn and TubeMQ. His past positions include chairman of Tencent’s Open Source Alliance and director of Big Data platform R&D at Hortonworks, where he led the Hadoop YARN team.
The role of open-source in AI and data technologies
“How to manage the unstructured data for better usage for larger models is definitely a top challenge today in the AI domain.”
Du Junping, founder & CEO of Datastrato
In a recent interview, Du Junping, founder and CEO of Datastrato, highlighted the pivotal role of open-source technologies in advancing AI and data applications. He emphasised, “I definitely trust the open-source community to the scaling law for engineering resources and technology values.” This trust is rooted in the belief that open-source frameworks can significantly accelerate innovation and collaboration across the tech industry.
Du Junping also discussed why open-source technologies are crucial for managing unstructured data, saying, “How to manage the unstructured data for better usage for larger models is definitely a top challenge today in the AI domain.” This perspective underscores the need for robust open-source tools to handle the growing complexity of data in AI applications.
Furthermore, Du Junping pointed out the transformative impact of generative AI, noting, “We see the more magic between the data and AI, they combine more tightly.” This synergy between data and AI is driving advancements in model capabilities, making open-source contributions even more valuable.
Trends shaping the future of AI and data technologies
“Open-source is the only way for developers to gather together and innovate.”
Du Junping, founder & CEO of Datastrato
Du Junping outlined several key trends shaping the future of AI and data technologies. He observed, “Recently several years, we see generative AI create a lot of miracles.” This observation reflects the rapid progress in AI, particularly in generative models, which are pushing the boundaries of what is possible with data.
He further elaborated on the challenges faced by data technology, stating, “We expect to see a big change to adapt to these kinds of challenges.” As AI technologies advance, the ability to manage and utilise data effectively becomes increasingly critical. The emergence of generative AI models is amplifying the need for more sophisticated data handling techniques.
Additionally, Du Junping discussed the need for open-source innovation to keep pace with AI advancements. “Open-source is the only way for developers to gather together and innovate,” he said. This approach fosters a collaborative environment where diverse ideas and expertise contribute to the development of cutting-edge technologies.
Involvement in LF AI & Data Foundation
“Our goal is to make it easier for people to get involved in open-source projects, regardless of their level of experience.”
Du Junping, founder & CEO of Datastrato
Du Junping’s involvement with the LF AI & Data Foundation reflects his commitment to advancing open-source initiatives. He noted, “I’ve been [with] LF AI & DATA for a very long time,” highlighting his long-term engagement with the foundation. His role as board chair has involved promoting projects and fostering collaboration within the open-source community.
He described his efforts to enhance the foundation’s impact, stating, “I joined many discussions on how to incubate the project from sandbox to graduate.” This process helps open-source projects mature and become more accessible to external contributors, driving innovation in the AI and data sectors. Du Junping also mentioned his experience promoting open-source projects, noting, “We have some project donated LF AI & DATA and we promoted.” This experience underscores his dedication to expanding the reach and influence of open-source technologies.
Challenges in open-source business models
“The future of data technology lies in how effectively we can manage unstructured data.”
Du Junping, founder & CEO of Datastrato
Reflecting on his experience as general manager of an open-source business at a Fortune 500 company, Du Junping shared insights into the challenges of valuing open-source initiatives. He stated, “The first challenge will be how to value on the open source,” emphasising the need to balance commercial interests with the unique value propositions of open-source projects.
He explained the importance of building a sustainable business model, saying, “How to build an open-source commercial business model is important.” This involves aligning the company’s business strategy with the broader open-source ecosystem, so that the company and its technology can work effectively with the global community.
Du Junping also highlighted the role of open-source in fostering innovation, remarking, “Open-source is very critical in driving adoption.” This sentiment reflects the growing recognition of open-source contributions as essential to advancing AI and data technologies.
Advice for aspiring entrepreneurs and developers
Du Junping offered advice to aspiring entrepreneurs and developers, stressing the importance of embracing open-source collaboration. He said, “We are moving towards open innovation,” suggesting that the future of AI and data technologies will be shaped by collective efforts and shared knowledge.
He also noted the need for continuous learning and adaptation, stating, “We should go this way, the open innovation.” This advice underscores the necessity for entrepreneurs and developers to stay engaged with the open-source community and leverage its collective intelligence.
Additionally, Du Junping highlighted the importance of building standards and reducing barriers, saying, “We expect more open-source innovation.” This approach will facilitate the development of standardised solutions and enhance the overall efficiency of AI technologies.
Long-term goals and vision for Datastrato
“We try to make the data across internal the organisation and also can be shared or exchanged safely.”
Du Junping, founder & CEO of Datastrato
Du Junping shared his vision for the future of Datastrato, focusing on the evolving concept of big data. He stated, “We try to make the data across internal the organisation and also can be shared or exchanged safely.” This vision involves creating a more integrated and accessible data environment, essential for advancing AGI technologies. He also discussed the importance of data diversity and multi-modal data, noting, “Large language models need a lot of diverse and multi-modal data.” This emphasis on data variety highlights the need for comprehensive data solutions to support the development of sophisticated AI models.
Du Junping concluded with an optimistic outlook on future developments, stating, “We want to build something like this in the next 5 to 10 years.” His long-term goals reflect a commitment to advancing data and AI technologies through innovative and collaborative approaches.