To reverse the brain drain of local AI talent while avoiding the B2C market dominated by Big Tech, the Canadian innovation ecosystem has focused on building AI startups that serve the B2B enterprise market: large businesses that generate and possess the vast volumes of data needed to train AI solutions. Unlike the B2C space, the enterprise domain has not been cornered by foreign tech scale-ups and incumbents, including Big Tech. Because it presents real growth prospects, Canadian AI talent is in turn encouraged to stay in-country and build local enterprise AI software-as-a-service (SaaS) startups with the potential to scale, that is, to capture global market share. However, as the enterprise realm becomes more lucrative with the advent of AI, foreign tech incumbents and scale-ups are expanding into enterprise AI, making it harder for Canadian AI SaaS startups to scale. To stay ahead of the competition, foreign scale-ups and tech incumbents are reinforcing their presence in the Canadian AI ecosystem by setting up corporate R&D labs, contributing to university research, investing in Canadian startups, or outright buying them through M&As. The result is a localized brain drain in which Canadian AI talent is captured by foreign interests through the promise of higher salaries and access to the large financial and material resources needed to undertake ambitious projects. This leaves fewer local AI experts to build Canadian AI SaaS startups that could become scale-ups and major players in the global enterprise market.
While increased foreign competition in the enterprise market does make it harder for Canada's AI SaaS startups to scale, and to retain local AI talent in the process, the lack of data readiness among many potential enterprise customers compounds Canadian tech founders' scaling challenge, further amplifying the drain of local grey matter toward foreign tech incumbents and scale-ups.
Data preparation is the first step in the machine learning (ML) pipeline, followed by training the ML model and finally deploying it within an enterprise customer's organization. It is also the most vital phase, often determining the success or failure of an AI project. Since data is the new oil in the age of AI, a SaaS company must verify that the data a customer possesses is adequate for training an ML model designed to automate a given task within the client's organization. In the context of supervised learning, the currently dominant ML approach, which requires troves of labeled data to train a given model, data preparation broadly consists of three steps. The first is to collect a relevant, sufficiently large, and varied training dataset with enough data points under each category or concept, so that a model can learn patterns covering all known situations it will confront once deployed. The second is to clean that newly assembled dataset by removing anomalies and errors and by formatting the data according to a set of standards to make it consistent. The third is to label all data points within the assembled and cleaned dataset before using it to train an ML model.
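For readers who want a concrete picture, the three steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration using toy records; the field names, cleaning rules, and keyword-based labeler are invented for the example and do not describe any particular company's pipeline:

```python
# Minimal sketch of the three data-preparation steps: collect, clean, label.
# All field names and rules are illustrative assumptions, not a real pipeline.

def collect(sources):
    # Step 1: pool raw records from several sources into one dataset.
    dataset = []
    for source in sources:
        dataset.extend(source)
    return dataset

def clean(dataset):
    # Step 2: enforce a consistent format and drop anomalous records
    # (here: empty entries and duplicates after normalization).
    cleaned, seen = [], set()
    for record in dataset:
        text = record.get("text", "").strip().lower()
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append({"text": text})
    return cleaned

def label(dataset, labeler):
    # Step 3: attach a label to every cleaned record.
    return [{**record, "label": labeler(record["text"])} for record in dataset]

# Toy usage: two "sources" and a trivial keyword-based labeler.
sources = [[{"text": "Invoice overdue "}, {"text": ""}],
           [{"text": "invoice overdue"}, {"text": "Payment received"}]]
prepared = label(clean(collect(sources)),
                 lambda t: "billing" if "invoice" in t else "other")
print(prepared)
```

In practice each step is, of course, far more involved, which is precisely why data preparation consumes so much of an AI project's time and budget.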
Data preparation is a time-consuming and capital-intensive task with no guarantee of success. While all three steps of data preparation under the supervised learning paradigm are essential, the first two are critical. For instance, a data science team cannot go beyond the first step if the enterprise customer does not have a sufficiently large and varied dataset for training an ML model. It is common for one category or concept to display enough data points while another lacks the volume needed for a model to identify a specific pattern. Even if a data science team manages to assemble a large and varied training dataset, it can get bogged down at the second step if that dataset lacks a consistent format and is rife with errors and anomalies. Such a situation results from the absence of a pre-existing, rigorous corporate data collection mechanism ensuring that all new data points are managed and recorded consistently, according to standards applied across the enterprise customer's organization. Once those first two steps are completed, a startup can move on to the comparatively easier labeling phase. Difficulties with the first two steps can cause costly delays, put the project on hold indefinitely, or lead to its outright cancellation. Any of these scenarios can be devastating for a startup with a high burn rate, making data preparation a make-or-break moment for AI SaaS upstarts. One must bear in mind that issues with steps 1 and 2 are quite common among potential enterprise customers, reducing Canadian AI SaaS startups' business opportunities and leaving fewer local ventures able to scale up and compete with larger foreign tech players.
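The step-one failure mode described above, where one category has enough examples and another does not, can be screened for with a simple per-category count before any modeling begins. The sketch below is a hypothetical check; the label names and the minimum-examples threshold are invented for illustration:

```python
from collections import Counter

def category_gaps(labels, min_per_class):
    # Count examples per category and report those falling below a
    # minimum threshold -- i.e., concepts without enough data points
    # for a model to learn their pattern.
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < min_per_class}

# Toy dataset: plenty of "approved" examples, far too few "rejected" ones.
labels = ["approved"] * 120 + ["rejected"] * 4
print(category_gaps(labels, min_per_class=50))  # flags the "rejected" class
```

A non-empty result signals that more data must be collected (or the project rescoped) before the team can proceed to cleaning and labeling.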
The lack of enterprise data readiness therefore amplifies both the international and the local brain drain of Canada's AI talent pool. With few business opportunities available, Canadian tech founders have little choice but to shut down their ventures or sell them to competitors, which in many cases are foreign tech incumbents or scale-ups looking to boost their fundamental or applied AI R&D efforts in the B2C or B2B enterprise markets through M&As that capture talent and IP. Moreover, seeing this added difficulty faced by local AI upstarts, newly graduated Canadian AI experts are further convinced that it is better to work for more established foreign tech companies offering lucrative and stable careers.
One must keep in mind that foreign tech scale-ups and incumbents have an edge over startups when it comes to completing all three steps of data preparation. They tend to build ML products that complement or supersede proprietary non-ML software already used by their enterprise customers, which generates troves of data points that can later train an ML application, making step one easier. Moreover, the data generated by those non-ML applications is managed and cleaned from the start through a data collection mechanism that the incumbent or scale-up has implemented within the enterprise customer's organization, facilitating the second step of data cleaning. A data science team can then move directly to the third step: labeling the large, varied, and cleaned dataset.
Consequently, Canadian AI SaaS startups face barriers in both the B2C and B2B enterprise markets, seemingly leaving tech incumbents and scale-ups as the winners in both spaces. Nonetheless, the B2B enterprise market still represents the Canadian innovation ecosystem's best avenue to becoming a global AI hub, provided local upstarts overcome the general lack of enterprise data readiness. To that end, emerging ML technologies can work around the problem, increasing the number of business opportunities for Canadian AI SaaS startups to seize and scale up on, in turn retaining valuable local AI talent convinced of the upstarts' growth prospects. Those new ML techniques will be explored in a subsequent analysis.
Photo: an image for blogs and news sites dealing with artificial intelligence, AI, machine learning, smart computers etc (2018) by Mike Mackenzie via Flickr. Public domain.
Disclaimer: Any views or opinions expressed in articles are solely those of the authors and do not necessarily represent the views of the NATO Association of Canada.