The world has now seen over a decade of modern Artificial Intelligence (AI) being used in many ways, from detection and prediction to matching and recommendation, and it remains one of the most talked-about topics in the technology landscape.
Although they were perceived as futuristic only a few years ago, AI and its associated tools have entered our lives, whether or not we are fully aware of it. Examples of everyday AI use include self-driving cars, high-end robotics, AI-driven traffic management, smart grid maintenance, digital assistants (such as Alexa), facial recognition and more.
AI as we know it today has gradually entered our lives and our homes over the last ten years, powered by hardware with unprecedented computational capabilities.
The history of AI can be roughly split into two significant periods: from the 1950s to 2010, the robotics period, and from 2010 to the present day, which can be described as the big-data period.
In its early days, most AI technology was used to develop robots, including humanoid robots, and game-playing machines (such as IBM's Deep Blue). Whereas AI was once largely confined to scientific settings, from 2010 onwards it has become a tool used in our day-to-day lives.
One of the major events marking the end of the AI winter was the launch of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition built on ImageNet, a dataset of images annotated by Fei-Fei Li and colleagues.
The introduction of this large dataset, and of subsequent datasets in various AI fields, namely Computer Vision (image data), Natural Language Processing (NLP, text data) and Automatic Speech Recognition (audio data), enabled the rapid development of AI-based solutions. Among them are body-tracking games (e.g. Kinect for the Xbox 360), natural language question-answering computers (e.g. IBM Watson), virtual assistants (e.g. Siri, Cortana and Alexa), image recognition systems (e.g. DistBelief by Jeff Dean and Andrew Ng) and DeepMind's AlphaGo and AlphaStar (for StarCraft II), which defeated various human champions.
The rapid expansion of AI in our lives owes much to the Internet of Things (IoT), which allows vast amounts of data to be acquired almost "for free" from nearly any conceivable source, be it a smart device, a medical scan, a surveillance camera or voice interactions with devices.
The latest trends in AI can be divided into Computer Vision, NLP, and Speech fields.
Computer vision is the field of AI in which models are trained so that computer programs can replicate the way humans perceive the world. Typical computer vision tasks involve teaching models to detect and classify objects in videos or images as humans would. In essence, the ultimate goal of computer vision is to augment and automate human visual insight.
Using computer vision AI in medical diagnosis has been one of the longest-running trends in computer vision applications, driven by the exponential growth in diagnostic medical imaging and the limited number of medical experts. In recent years, expectations rose even higher, with companies such as IBM and Google claiming that computers were “out-performing” experts. Unfortunately, these AI solutions did not live up to such bold statements, and the results obtained in real-life scenarios did not corroborate those obtained in the lab.
This taught the community some very important lessons that are still driving active research and underpin some of the latest trends in computer vision applications. One such lesson is a change in paradigm. Before it became clear that lab results could not be directly translated to real-life scenarios, research was mainly “model-centric”, focused on improving model performance, developing more advanced architectures, increasing network depth, and so on. Now, by contrast, we are seeing a more “data-centric” approach. In other words, some of the latest trends in computer vision show that as much effort, if not more, should go into optimising the quality of the data as into training the algorithms.
Of course, such a change in paradigm does not come without a cost. While improving a model's performance is technically more complex, acquiring and labelling good-quality data is time-consuming, expertise-dependent and expensive. To overcome these limitations, active research is focusing on developing new and better self-supervised learning approaches that can cope with the large amounts of data acquired every day and still perform well under real-life conditions with little supervision.
Relevant to medical imaging, but also to other computer vision applications, is the need for increased interpretability and a better understanding of the inner workings of deep neural networks. Only this will increase users' trust in, and acceptance of, novel computer vision AI applications in critical areas where lives are at stake, such as e-health and autonomous driving [2, 3].
On a related topic, researchers at the MIT-IBM Watson AI Lab have been attempting to combine expert knowledge with AI by exploring the concept of Neuro-Symbolic AI. This approach combines modern deep learning techniques with traditional symbolic AI methods and aims to achieve human-level reasoning about entities and their relationships.
Computer vision has also seen an increased drive towards retail and safety-compliance applications deployed at edge locations and on edge devices, which can have very limited processing power and require real-time predictions.
Natural language processing (NLP) refers to the branch of computer science, and more specifically of artificial intelligence, concerned with giving computers the ability to understand text and spoken words in the same way human beings can.
Some of the biggest trends in NLP in 2022 and in the coming years are expected to be:
Relying less on massive amounts of data (for example, the state-of-the-art GPT-3 has been reported to be so expensive that fine-tuning it is not feasible) and continuing to leverage transfer learning to democratise access to AI.
Understanding and mitigating sources of bias, and taking into account the diversity of humanity.
Understanding and detecting fake news (fuelled by ever more powerful NLP models) by exploiting sentiment analysis features.
Exploring and building multilingual NLP models for inclusivity and global reach.
Increasing the automation of tasks through NLP technologies with improved, demonstrated performance. For instance, robust intelligent document processing can free workers from monotonous and laborious work while improving their productivity.
While NLP and speech are commonly associated with each other, textless NLP is a new trend in the sub-field of voice AI. One of the best examples of this approach is the Generative Spoken Language Model (GSLM), released in 2021 by Meta AI. This type of technology aims to generate speech continuations without any dependency on text. Working directly on audio enables less structured and, at the same time, much richer language expression, capturing far more paralinguistic information than semantics alone. This would bring us a step closer to emotionally aware AI, since it could open the door to embedding tone, emotion, health, vitality and speaker voice characteristics into classification or generation models. Moreover, less exciting but very valuable information, namely age, gender and geography, can also be extracted from speech in this way.
One of the biggest future trends in this field is looking at data holistically, for instance by combining the power of NLP with the adoption of voice technologies. The maturation of speech and NLP together will give businesses more confidence in their data analysis capabilities and has the potential to reveal a treasure trove of insights that companies cannot ignore. This holistic approach can be extended further by combining speech and NLP with visual data, for applications such as recognising whether a meme is hateful, which requires considering both the image and the text of the meme, or evaluating how interested e-commerce users are in both visual and textual content.
Lastly, the ever-growing volume of data will make manual annotation impossible, putting pressure on the industry to develop self-supervised approaches.
Deeper Insights is the leading Data Science, AI and Machine Learning company helping organisations across industries unlock the transformative power of AI.