Newsletter: October 2023
The latest Deeper Insights blogs
Unveiling the AI black box: The challenge of understanding decisions
While AI's capabilities are expanding, there's a growing demand for Explainable AI (XAI) to make these systems transparent and understandable. XAI aims to reveal the "how" and "why" behind AI decisions, fostering trust and accountability. [Read more]
The Wearable Revolution: From Step Counting to Lifesaving Insights
Explore the transformative journey of wearable technology, from basic step counters to powerful health monitors, underpinned by Artificial Intelligence. Discover how this synergy is revolutionising personal health, opening avenues in healthcare, and promising a proactive, personalised approach to wellness. [Read more]
The Unspoken Challenges of Large Language Models
From computational demands to a scarcity of specialised talent, building an LLM comes with an array of unspoken challenges. Explore what it takes in terms of technical and human resources to deploy these transformative technologies successfully. [Read more]
Featured GenAI news
NVIDIA, Lenovo Enhance Partnership for Generative AI - Nvidia.com
Lenovo and NVIDIA are broadening their collaboration to provide integrated systems employing generative AI from edge to cloud. They intend to assist enterprises in rapidly launching customised AI applications for business advancement. Supported by Lenovo AI Professional Services Practice, their solutions will adopt a hybrid cloud strategy and leverage NVIDIA's newest hardware for generative AI. [Read more]
Will generative AI transform business? - Financialtimes.com
Generative AI's rapid advancement post-ChatGPT's release is poised to transform various industries economically and operationally, despite facing challenges like errors and misuse, necessitating strategic adoption and human oversight. [Read more]
Accelerating Innovation Through Generative AI - Forbes.com
Explore how generative AI is propelling innovation. Dive into its transformative role in research, prototype creation, and testing. Discover its significant impact, likened to the steam engine and internet, while also understanding the call for mindful leadership in harnessing its potential. [Read more]
The uncertain role of generative AI in NHS care - DigitalHealth.net
Generative AI's integration in the NHS is a double-edged sword, presenting both promises and pitfalls. While it can significantly reduce administrative work for clinicians, the risk of fabricated information raises concerns about its clinical applications. Efforts are underway to establish stringent regulation and validation processes to safely deploy AI, with certain solutions, like generative AI for dictation, showing potential to streamline backend operations. [Read more]
GenAI news snapshots - Industry report
- OpenAI is hosting its first developer conference, OpenAI DevDay, on November 6, 2023, in San Francisco. Developers will gain insights into new tools and interact with OpenAI's technical team. The API, featuring advanced models like GPT-4 and DALL·E, is popular among over 2 million developers. Registration details coming soon at OpenAI DevDay. [Read more]
- LinkedIn is rolling out AI-powered updates, such as improved search capabilities in Recruiter, an AI-driven learning coach, and a new marketing tool called Accelerate. These changes signal a shift toward using OpenAI and Microsoft technology, aligning with the growing prevalence of AI in mainstream applications. [Read more]
- OpenAI is considering producing its own AI chips to address the chip shortage issue. They are exploring potential acquisitions, with NVIDIA mentioned as a potential partner. However, it's uncertain whether they will proceed with building custom chips. [Read more]
- OpenAI has reintroduced internet browsing capability to ChatGPT, enabling access to real-time information beyond its previous September 2021 cutoff. This feature, "Browse with Bing," is now available to paid ChatGPT users, with plans for wider access. Privacy and content quality concerns remain, as the update aims to compete with Google in providing up-to-date information. [Read more]
- ChatGPT Elevates Multimodal Interaction with Hearing, Speaking, and Visual Capabilities. OpenAI enhances ChatGPT with multimodal abilities: audio input, text-to-speech, and image recognition, paving the way for more intuitive user interactions. These updates position ChatGPT as a more versatile tool in the evolving AI landscape. [Read more]
- Meta is unveiling "Gen AI Personas," a range of generative AI chatbots geared towards engaging younger users. These personas, inspired by various characters, include a "sassy robot" and "Alvin the Alien." Meta intends to create multiple chatbots, some for celebrities and productivity purposes, aiming to tap into Gen Z's tech comfort and extend ad-serving opportunities. [Read more]
- OpenAI is enhancing ChatGPT by enabling voice commands and image uploads. Users can speak questions, with the AI converting speech to text, and use images for queries. While these features offer convenience, OpenAI is cautious about potential misuse, limiting its application. [Read more]
- Apple is heavily investing in AI for Siri, focusing on chatbots and advanced language models. They aim to integrate these models into Siri, though challenges remain due to their size. Apple is also considering smaller models for privacy. Internal documents revealed Apple's interest in developing its own AI technology. [Read more]
- Project Gutenberg, Microsoft, and MIT collaborated to automatically generate thousands of open-license audiobooks from Project Gutenberg's e-books, making literature more accessible and customisable. Users can personalise audiobooks, but some recordings may have errors or content issues. [Read more]
- Spotify now offers AI-powered podcast translations that keep the original speakers' voices, initially from English into Spanish, with French and German to follow. While this expands accessibility, concerns arise regarding potential misuse for misinformation and implications for voice actors. [Read more]
- Microsoft has integrated OpenAI's DALL-E 3 model into Bing Chat and Bing Image Creator, offering improved image generation capabilities with enhanced prompt understanding and safety features. Server overload issues have been reported during initial use. [Read more]
- Google's Bard AI chatbot goes beyond web searches, now scanning Gmail, Docs, and Drive to retrieve information efficiently. This integration, called extensions, streamlines tasks like summarising emails and highlighting key document points, with more use cases on the horizon, albeit currently in English only. [Read more]
- Adobe's Firefly generative AI models are now officially available in Creative Cloud, Adobe Express, and Adobe Experience Cloud after 176 days in beta. Users can enjoy features like generative fill and expand in Photoshop without the need for beta installations. [Read more]
- Getty Images has partnered with NVIDIA to offer a commercially safe content creation solution. It's trained exclusively on Getty Images' high-quality content, allowing users to download and license visuals without IP concerns. It compensates creators and doesn't add generated images to the library for others to use. [Read more]
GenAI tools: LLM models
- Llama 2 Long - Meta's latest AI model outperforms competitors like GPT-3.5 Turbo and Claude 2 in handling long user prompts. This achievement highlights the viability of Meta's open-source AI approach. [Read more]
- Mistral 7B - Launched by Mistral AI, Mistral 7B is a high-performance 7.3 billion-parameter language model released under the Apache 2.0 license. It outperforms Llama 2 13B on multiple benchmarks, excels in code and reasoning tasks, and offers easy fine-tuning for different applications, making it a versatile language model choice. [Read more]
- DreamLLM - Synergistic multimodal comprehension and creation. DreamLLM is a framework that enhances Multimodal LLMs by improving the synergy between language and image understanding and generation. It achieves this by sampling directly in the raw multimodal space, enabling the generation of raw, interleaved documents that combine text and images. Benefiting from this improved synergy, DreamLLM performs strongly as a zero-shot multimodal generalist. [Read more]
- LongLoRA - Efficiently extends context for Large Language Models. LongLoRA is an efficient fine-tuning method to extend the context of LLMs without significant computational overhead. By employing sparse local attention during training, it extends LLaMA2 7B from a 4k to a 100k context, or LLaMA2 70B to 32k, on a single 8x A100 machine. [Read more]
- RAIN - Enables LLMs to align with human preferences without fine-tuning or extra data. It uses self-evaluation and rewind mechanisms to improve LLM responses in terms of safety. Experimental results show its effectiveness in enhancing LLM alignment with human values. [Read more]
GenAI tools: Visual models
- AnthroNet - A novel human body model based on anthropometric measurements. It uses deep generative techniques and is trained solely on synthetic data, enabling the generation of diverse human body shapes and poses. The model was trained on 100k synthetic posed human meshes and corresponding measurements, making it valuable for non-commercial academic research purposes. [Read more]
- DivSem - A new framework by Bilkent University that enhances semantic image editing. DivSem addresses the complex task of inpainting pixels while ensuring contextual harmony and adherence to semantic maps. It differentiates styles for visible and partially visible objects, improving image quality and diversity. Extensive testing shows DivSem's superiority over existing methods. Demo. [Read more]
- Emu - A text-to-image model that improves image quality through quality-tuning. It's pre-trained on a vast dataset and fine-tuned with highly appealing images. Emu outperforms pre-trained models, achieving an 82.9% win rate in visual appeal and surpassing existing models in real-world use cases. [Read more]
GenAI tools: Everything else models
- Haystack - A versatile end-to-end NLP framework that facilitates the development of applications for tasks like question answering and semantic document search. It offers modular components, supports various NLP models, and allows customisation for diverse use cases, making it a scalable and open solution. [Read more]
- Efficient Vision Transformer Fine-Tuning with SCT - Introduces "Salient Channel Tuning" (SCT), a method for fine-tuning pre-trained vision transformers. SCT efficiently utilizes task-specific information by tuning only 1/8 of the channels, achieving impressive performance gains across various tasks with minimal additional parameters, making it suitable for low-data scenarios. [Read more]
- DeepEval - DeepEval is a Python tool for evaluating LLMs in CI/CD pipelines, offering various metrics and a user-friendly platform for tracking and improving model performance. [Read more]
- Optimising Vision - Researchers have introduced a method to fine-tune Vision-Language Models (VLMs) using chat-based LLMs as black-box optimisers. This approach outperforms white-box methods, such as OpenAI's prompts, and is more efficient than other black-box approaches. It generates interpretable and transferable prompts for diverse CLIP architectures. [Read more]
- AnomalyGPT - A cutting-edge Large Vision-Language Model (LVLM) for Industrial Anomaly Detection (IAD). Unlike traditional methods, it doesn't require manual threshold settings. AnomalyGPT excels in identifying anomalies, achieving state-of-the-art performance on the MVTec-AD dataset, and provides detailed image information by aligning text descriptions with images using LVLMs and image encoders. [Read more]
- LLaVA-RLHF - An open-source multimodal model aligned with Factually Augmented RLHF, which improves performance and reduces hallucination. It excels in visual reasoning, sets new benchmarks, and offers public access to its data and codebase. [Read more]
- Kosmos-2.5 - A multimodal literate model designed for text-intensive images. It excels in generating text blocks with spatial coordinates and structured markdown format text. The model is adaptable for various text-rich image tasks, making it a versatile tool for real-world applications. [Read more]
- Yasa-1 - Introduced by Reka, Yasa-1 is a versatile multimodal AI assistant with text, image, audio, and video processing capabilities, including code execution and dataset integration. While the assistant is promising, Reka acknowledges its accuracy limitations and emphasises responsible usage. Future improvements are expected. [Read more]
- EnCodecMAE - A novel approach to universal audio representation learning. It utilizes EnCodec, a neural audio codec, to generate discrete targets for training a universal audio model with a masked autoencoder (MAE). This model demonstrates competitive or superior performance across a wide range of audio tasks, including speech, music, and environmental sounds. [Read more]
- NExT-GPT - An any-to-any multimodal LLM that accepts and delivers content in various modalities, including text, images, videos, and audio. Leveraging well-trained encoders and decoders, it offers efficient and cost-effective training, while modality-switching instruction tuning enhances cross-modal understanding. [Read more]
- Stable Audio - A latent diffusion model architecture for audio. Conditioned on text metadata, audio duration, and start time, it gives precise control over the content and length of generated audio. Unlike conventional models, Stable Audio can generate audio of varying lengths, such as full songs, and does so quickly, rendering 95 seconds of stereo audio in under one second on an NVIDIA A100 GPU thanks to a downsized latent representation of audio. [Read more]
Let us solve your impossible problem
Speak to one of our industry specialists about how Artificial Intelligence can help solve your impossible problem