AI Language Processing

Explore top LinkedIn content from expert professionals.

Sebastian Raschka, PhD Sebastian Raschka, PhD is an Influencer

ML/AI research engineer. Author of Build a Large Language Model From Scratch (amzn.to/4fqvn0D) and Ahead of AI (magazine.sebastianraschka.com), on how LLMs work and the latest developments in the field.

246,153 followers 1y
Report this post
I shared a new tutorial + experiments on finetuning LLMs for classification efficiently. In this video, I explain how to convert a decoder-style LLM into a classifier. Many business problems are text classification problems, and if classification is all we need for a given task, using "smaller" and cheaper LLMs makes a lot of sense! (But, of course, also always run a simple logistic regression or naive Bayes baseline to determine if you even need a small LLM.) 🧪 In addition, I also ran a series of 19 experiments to answer some "what if" questions around finetuning pretrained LLMs for classification. Here, I kept things simple and small (e.g., GPT-2 on a toy binary classification task): Here's a snapshot summary of some of the interesting ones: 1) As would be expected, training on the last token yields much better performance than the first 2) Training the last transformer block is way better than just the last layer 3) LoRA performs on par or better than full finetuning—while being faster and more memory-efficient 4) Padding to full context length hurts performance 5) No padding or smart position selection leads to consistently higher accuracy 6) Surprisingly, training from random weights isn't much worse than using pretrained 7) Averaging embeddings over all tokens can improve performance slightly with little cost The full video is available here: https://lnkd.in/gcfqR2mH PS: If you are wondering why GPT instead of BERT? Well, you can of course also use BERT. Based on experiments on the 50k Movie Review dataset It's interesting though that this 3x smaller LLM performs on par (actually slightly better) than BERT. (ModernBERT then again is 2% better.)
No more previous content

No more next content
76 Comments
Like Comment
Matt Wood Matt Wood is an Influencer

Chief AI & Technology Officer, AWS

85,721 followers 1y
Report this post
New! We’ve published a new set of automated evaluations and benchmarks for RAG - a critical component of Gen AI used by most successful customers today. Sweet. Retrieval-Augmented Generation lets you take general-purpose foundation models - like those from Anthropic, Meta, and Mistral - and “ground” their responses in specific target areas or domains using information which the models haven’t seen before (maybe confidential, private info, new or real-time data, etc). This lets gen AI apps generate responses which are targeted to that domain with better accuracy, context, reasoning, and depth of knowledge than the model provides off the shelf. In this new paper, we describe a way to evaluate task-specific RAG approaches such that they can be benchmarked and compared against real-world uses, automatically. It’s an entirely novel approach, and one we think will help customers tune and improve their AI apps much more quickly, and efficiently. Driving up accuracy, while driving down the time it takes to build a reliable, coherent system. 🔎 The evaluation is tailored to a particular knowledge domain or subject area. For example, the paper describes tasks related to DevOps troubleshooting, scientific research (ArXiv abstracts), technical Q&A (StackExchange), and financial reporting (SEC filings). 📝 Each task is defined by a specific corpus of documents relevant to that domain. The evaluation questions are generated from and grounded in this corpus. 📊 The evaluation assesses the RAG system's ability to perform specific functions within that domain, such as answering questions, solving problems, or providing relevant information based on the given corpus. 🌎 The tasks are designed to mirror real-world scenarios and questions that might be encountered when using a RAG system in practical applications within that domain. 🔬 Unlike general language model benchmarks, these task-specific evaluations focus on the RAG system's performance in retrieving and applying information from the given corpus to answer domain-specific questions. ✍️ The approach allows for creating evaluations for any task that can be defined by a corpus of relevant documents, making it adaptable to a wide range of specific use cases and industries. Really interesting work from the Amazon science team, and a new totem of evaluation for customers choosing and tuning their RAG systems. Very cool. Paper linked below.
No more previous content

No more next content
32 Comments
Like Comment
Ahsen Khaliq

ML @ Hugging Face

36,045 followers 2y
Report this post
To Believe or Not to Believe Your LLM We explore uncertainty quantification in large language models (LLMs), with the goal to identify when uncertainty in responses given a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers). In particular, we derive an information-theoretic metric that allows to reliably detect when only epistemic uncertainty is large, in which case the output of the model is unreliable. This condition can be computed based solely on the output of the model obtained simply by some special iterative prompting based on the previous responses. Such quantification, for instance, allows to detect hallucinations (cases when epistemic uncertainty is high) in both single- and multi-answer responses. This is in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response) where hallucinations in the multi-answer case cannot be detected. We conduct a series of experiments which demonstrate the advantage of our formulation. Further, our investigations shed some light on how the probabilities assigned to a given output by an LLM can be amplified by iterative prompting, which might be of independent interest.
No more previous content

No more next content
25 Comments
Like Comment
Eduardo Ordax

🤖 AI GTM Lead @ AWS ☁️ (200k+) | Startup Advisor | Public Speaker | AI Outsider | Founder Thinkfluencer AI | Book Author

241,541 followers 5mo
Report this post
If you think voice AI is still locked behind expensive APIs, this changes everything. Alibaba’s Qwen team just dropped Qwen3-TTS, an open-source, real-time text-to-speech family that runs on ridiculously small hardware (down to ~2 GB VRAM) while delivering expressive, controllable voices. This is not just another offline TTS model. Qwen3-TTS is built for live agents, assistants, dubbing and in-app narration with: 🔸Instant voice cloning from a few seconds of audio 🔸Emotion & style control via prompts 🔸Multilingual speech across ~10 languages 🔸Streaming output with ~100 ms first-packet latency 💣 𝗔𝗻𝗱 𝘁𝗵𝗲 𝗿𝗲𝗮𝗹 𝗯𝗼𝗺𝗯: 𝗔𝗽𝗮𝗰𝗵𝗲 𝟮.𝟬 𝗹𝗶𝗰𝗲𝗻𝘀𝗲. No per-character fees. No vendor lock-in. No black box. The smallest model runs in ~2 GB VRAM, while larger variants still fit on consumer GPUs meaning you can now run a fully local voice agent stack (ASR + LLM + TTS) on your own machine. The AI community is already calling this a “Whisper moment for TTS.” And honestly… they’re not wrong. Open voice is here. And it’s moving fast. #ai #voice #opensource

140 Comments
Like Comment
Brij Kishore Pandey Brij Kishore Pandey is an Influencer

AI Architect & AI Engineer | Building Agentic Systems & Scalable AI Solutions

730,874 followers 1y
Report this post
Training a Large Language Model (LLM) involves more than just scaling up data and compute. It requires a disciplined approach across multiple layers of the ML lifecycle to ensure performance, efficiency, safety, and adaptability. This visual framework outlines eight critical pillars necessary for successful LLM training, each with a defined workflow to guide implementation: 𝟭. 𝗛𝗶𝗴𝗵-𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮 𝗖𝘂𝗿𝗮𝘁𝗶𝗼𝗻: Use diverse, clean, and domain-relevant datasets. Deduplicate, normalize, filter low-quality samples, and tokenize effectively before formatting for training. 𝟮. 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲 𝗗𝗮𝘁𝗮 𝗣𝗿𝗲𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴: Design efficient preprocessing pipelines—tokenization consistency, padding, caching, and batch streaming to GPU must be optimized for scale. 𝟯. 𝗠𝗼𝗱𝗲𝗹 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗗𝗲𝘀𝗶𝗴𝗻: Select architectures based on task requirements. Configure embeddings, attention heads, and regularization, and then conduct mock tests to validate the architectural choices. 𝟰. 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗦𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 and 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Ensure convergence using techniques such as FP16 precision, gradient clipping, batch size tuning, and adaptive learning rate scheduling. Loss monitoring and checkpointing are crucial for long-running processes. 𝟱. 𝗖𝗼𝗺𝗽𝘂𝘁𝗲 & 𝗠𝗲𝗺𝗼𝗿𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Leverage distributed training, efficient attention mechanisms, and pipeline parallelism. Profile usage, compress checkpoints, and enable auto-resume for robustness. 𝟲. 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 & 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: Regularly evaluate using defined metrics and baseline comparisons. Test with few-shot prompts, review model outputs, and track performance metrics to prevent drift and overfitting. 𝟳. 𝗘𝘁𝗵𝗶𝗰𝗮𝗹 𝗮𝗻𝗱 𝗦𝗮𝗳𝗲𝘁𝘆 𝗖𝗵𝗲𝗰𝗸𝘀: Mitigate model risks by applying adversarial testing, output filtering, decoding constraints, and incorporating user feedback. Audit results to ensure responsible outputs. 🔸 𝟴. 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴 & 𝗗𝗼𝗺𝗮𝗶𝗻 𝗔𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻: Adapt models for specific domains using techniques like LoRA/PEFT and controlled learning rates. Monitor overfitting, evaluate continuously, and deploy with confidence. These principles form a unified blueprint for building robust, efficient, and production-ready LLMs—whether training from scratch or adapting pre-trained models.
No more previous content

No more next content
27 Comments
Like Comment
Greg Coquillo

AI Infrastructure Product Leader | Scaling GPU Clusters for Frontier Models | Microsoft Azure AI & HPC | Former AWS, Amazon | Startup Investor | Linkedin Top Voice | I build the infrastructure that allows AI to scale

233,010 followers 5mo
Report this post
Shipping AI agents into production without governance is like deploying software without security, logs, or controls. It might work at first. But sooner or later, something breaks - silently. As AI agents move from experiments to real decision-makers, governance becomes infrastructure. This framework breaks AI Governance into the core functions every production-grade agent system needs: - Policy Rules Turn business and regulatory expectations into enforceable agent behavior - defining what agents can do, must avoid, and how they respond in restricted scenarios. - Access Control Limits agents to approved tools, datasets, and systems using identity verification, RBAC, and permission boundaries — preventing accidental or malicious misuse. - Audit Logs Create a full activity trail of agent decisions: what data was accessed, which tools were called, and why actions were taken — making every outcome traceable. - Risk Scoring Evaluates agent actions before execution, assigns risk levels, detects sensitive operations, and blocks unsafe decisions through thresholds and safety scoring. - Data Privacy Protects confidential information using PII detection, encryption, consent management, and retention policies — ensuring agents don’t leak regulated data. - Model Monitoring Tracks real-world agent performance: accuracy, drift, hallucinations, latency, and cost - keeping systems reliable after deployment. - Human Approvals Adds human-in-the-loop controls for high-impact actions, enabling escalation, overrides, and sign-offs when automation alone isn’t enough. - Incident Response Detects failures early and enables rapid containment through alerts, rollbacks, kill switches, and post-incident reporting to prevent repeat issues. The takeaway: AI agents don’t just need intelligence. They need guardrails. Without governance, agents become unpredictable. With governance, they become enterprise-ready. This is how organizations move from experimental AI to trustworthy, compliant, production systems. Save this if you’re building agentic systems. Share it with your platform or ML teams.
No more previous content

No more next content
83 Comments
Like Comment
Kris Kimmerle Kris Kimmerle is an Influencer

Vice President, AI Risk & Governance @ RealPage

3,921 followers 1y
Report this post
HiddenLayer just released research on a “Policy Puppetry” jailbreak that slips past model-side guardrails from OpenAI (ChatGPT 4o, 4o-mini, 4.1, 4.5, o3-mini, and o1), Google (Gemini 1.5 and 2 Flash, and 2.5 Pro), Microsoft (Copilot), Anthropic (Claude 3.5 and 3.7 Sonnet), Meta (Llama 3 and 4 families), DeepSeek AI (V3 and R1), Alibaba Group's Qwen (2.5 72B) and Mistral AI (Mixtral 8x22B). The novelty of this jailbreak lies in how four familiar techniques, namely policy-file disguise, persona override, refusal blocking, and leetspeak obfuscation, are stacked into one compact prompt that, in its distilled form, is roughly two hundred tokens. 𝐖𝐡𝐲 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬: 1 / Wrap the request in fake XML configuration so the model treats it as official policy. 2 / Adopt a Dr House persona so user instructions outrank system rules. 3 / Ban phrases such as “I’m sorry” or “I cannot comply” to block safe-completion escapes. 4 / Spell sensitive keywords in leetspeak to slip past simple pattern filters. Surprisingly, that recipe still walks through the tougher instruction hierarchy defenses vendors shipped in 2024 and 2025. 𝐖𝐡𝐚𝐭 𝐀𝐈 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬/𝐝𝐞𝐟𝐞𝐧𝐝𝐞𝐫𝐬 𝐜𝐚𝐧 𝐝𝐨: This shows that modest prompt engineering can still break the most recent built-in content moderation / model-side guardrails. 1 / Keep user text out of privileged prompts. Use structured fields, tool calls, or separate chains so the model never interprets raw user content as policy. 2 / Alignment tuning and keyword filters slow attackers but do not stop them. Wrap the LLM with input and output classifiers, content filters, and a policy enforcement layer that can veto or redact unsafe responses. 3 / For high-risk actions such as payments, code pushes, or cloud changes, require a second approval or run them in a sandbox with minimal permissions. 4 / Add Policy Puppetry style prompts to your red-team suites and refresh the set often. Track bypass rates over time to spot regressions. Keep controls lean. Every extra layer adds latency and cost, the alignment tax that pushes frustrated teams toward unsanctioned shadow AI. Safety only works when people keep using the approved system. Great work by Conor McCauley, Kenneth Yeung, Jason Martin, Kasimir Schulz at HiddenLayer! Read the full write-up: https://lnkd.in/diUTmhUW
No more previous content

No more next content
10 Comments
Like Comment
Aishwarya Srinivasan Aishwarya Srinivasan is an Influencer

641,207 followers 1y
Report this post
If you’re an AI engineer, understanding how LLMs are trained and aligned is essential for building high-performance, reliable AI systems. Most large language models follow a 3-step training procedure: Step 1: Pretraining → Goal: Learn general-purpose language representations. → Method: Self-supervised learning on massive unlabeled text corpora (e.g., next-token prediction). → Output: A pretrained LLM, rich in linguistic and factual knowledge but not grounded in human preferences. → Cost: Extremely high (billions of tokens, trillions of FLOPs). → Pretraining is still centralized within a few labs due to the scale required (e.g., Meta, Google DeepMind, OpenAI), but open-weight models like LLaMA 4, DeepSeek V3, and Qwen 3 are making this more accessible. Step 2: Finetuning (Two Common Approaches) → 2a: Full-Parameter Finetuning - Updates all weights of the pretrained model. - Requires significant GPU memory and compute. - Best for scenarios where the model needs deep adaptation to a new domain or task. - Used for: Instruction-following, multilingual adaptation, industry-specific models. - Cons: Expensive, storage-heavy. → 2b: Parameter-Efficient Finetuning (PEFT) - Only a small subset of parameters is added and updated (e.g., via LoRA, Adapters, or IA³). - Base model remains frozen. - Much cheaper, ideal for rapid iteration and deployment. - Multi-LoRA architectures (e.g., used in Fireworks AI, Hugging Face PEFT) allow hosting multiple finetuned adapters on the same base model, drastically reducing cost and latency for serving. Step 3: Alignment (Usually via RLHF) Pretrained and task-tuned models can still produce unsafe or incoherent outputs. Alignment ensures they follow human intent. Alignment via RLHF (Reinforcement Learning from Human Feedback) involves: → Step 1: Supervised Fine-Tuning (SFT) - Human labelers craft ideal responses to prompts. - Model is fine-tuned on this dataset to mimic helpful behavior. - Limitation: Costly and not scalable alone. → Step 2: Reward Modeling (RM) - Humans rank multiple model outputs per prompt. - A reward model is trained to predict human preferences. - This provides a scalable, learnable signal of what “good” looks like. → Step 3: Reinforcement Learning (e.g., PPO, DPO) - The LLM is trained using the reward model’s feedback. - Algorithms like Proximal Policy Optimization (PPO) or newer Direct Preference Optimization (DPO) are used to iteratively improve model behavior. - DPO is gaining popularity over PPO for being simpler and more stable without needing sampled trajectories. Key Takeaways: → Pretraining = general knowledge (expensive) → Finetuning = domain or task adaptation (customize cheaply via PEFT) → Alignment = make it safe, helpful, and human-aligned (still labor-intensive but improving) Save the visual reference, and follow me (Aishwarya Srinivasan) for more no-fluff AI insights ❤️ PS: Visual inspiration: Sebastian Raschka, PhD
No more previous content

No more next content
33 Comments
Like Comment
Marie Stephen Leo

Data & AI Director | Scaled customer facing Agentic AI @ Sephora | AI Coding | RecSys | NLP | CV | MLOps | LLMOps | GCP | AWS

16,174 followers 2y
Report this post
Few-shot Text Classification predicts the label of a given text after training with just a handful of labeled data. It's a powerful technique for overcoming real-world situations with scarce labeled data. SetFit is a fast, accurate few-shot NLP classification model perfect for intent detection in GenAI chatbots. In the pre-ChatGPT era, Intent Detection was an essential aspect of chatbots like Dialogflow. Chatbots would only respond to intents or topics that the developers explicitly programmed, ensuring they would stick closely to their intended use and prevent prompt injections. OpenAI's ChatGPT changed that with its incredible reasoning abilities, which allowed an LLM to decide how to answer users' questions on various topics without explicitly programming a flow for handling each topic. You just "prompt" the LLM on which topics to respond to and which to decline and let the LLM decide. However, numerous examples in the post-ChatGPT era have repeatedly shown how finicky a pure "prompt" based approach is. In my journey working with LLMs over the past year+, one of the most reliable methods I've found to restrict LLMs to a desired domain is to follow a 2-step approach that I've spoken about in the past: https://lnkd.in/g6cvAW-T 1. Preprocessing guardrail: An LLM call and heuristical rules to decide if the user's input is from an allowed topic. 2. LLM call: The chatbot logic, such as Retrieval Augmented Generation. The downside of this approach is the significant latency added by the additional LLM call in step 1. The solution is simple: replace the LLM call with a lightweight model that detects if the user's input is from an allowed topic. In other words, good old Intent Detection! With SetFit, you can build a highly accurate multi-label text classifier with as few as 10-15 examples per topic, making it an excellent choice for label-scarce intent detection problems. Following the documentation from the links below, I could train a SetFit model in seconds and have an inference time of <50ms on the CPU! If you're using an LLM as a few- or zero-shot classifier, I recommend checking out SetFit instead! 📝 SetFit Paper: https://lnkd.in/gy88XD3b 🌟 SetFit Github: https://lnkd.in/gC8br-EJ 🤗 SetFit Few Shot Learning Blog on Huggingface: https://lnkd.in/gaab_tvJ 🤗 SetFit Multi-Label Classification: https://lnkd.in/gz9mw4ey 🗣️ Intents in DialogFlow: https://lnkd.in/ggNbzxH6 Follow me for more tips on building successful ML and LLM products! Medium: https://lnkd.in/g2jAJn5 X: https://lnkd.in/g_JbKEkM #generativeai #llm #nlp #artificialintelligence #mlops #llmops
No more previous content

No more next content
10 Comments
Like Comment
Himanshu Joshi

Building Aligned, Safe and Secure AI

30,205 followers 6mo
Report this post
The last week was full of learning and discussions with the AI research community at NeurIPS, where Prof Shivani Shukla and I presented two papers that challenge how we think about deploying Gen and agentic AI systems in a secure and safe manner. After months of rigorous research and experimentation, our research group was delighted to have shared findings that bridge critical gaps in our understanding of LLM behavior and human-AI collaboration for the following two papers/posters:- 1. Security Knowledge Dilution in Large Language Models Paper:- https://lnkd.in/dPkPtCRD for workshop Deep Learning for Code in Agentic Era (https://lnkd.in/eMpGGAwg) Our controlled study of 400 experiments revealed a striking finding:- LLMs experience a 47% degradation in security expertise when exposed to large volumes of irrelevant context. This has profound implications for deploying AI systems in security-critical environments where context windows are flooded with operational data. 2. A Stochastic Differential Equation Framework for Multi-Objective LLM Interactions Paper:- https://lnkd.in/dQEPpGmV for workshop DynaFront : Dynamics at the Frontiers of Optimization, Sampling, and Games (https://lnkd.in/eAJK52Bb) Presenting our mathematical framework for understanding how language models navigate competing objectives in real-time interactions, essential for building robust agentic AI systems that can balance multiple constraints simultaneously. These aren't just academic exercises. As we deploy increasingly autonomous AI agents in enterprise environments, understanding how context affects domain expertise and how models reconcile competing objectives becomes mission-critical for responsible AI deployment. The conversations at NeurIPS pushed us to think harder about building systems that are not just powerful, but reliably safe and effective at scale. Grateful to everyone who engaged with our work and challenged our assumptions, that's where the real breakthroughs happen. For those building agentic AI solutions:- How are you addressing context management and multi-objective optimization in your deployments? These challenges are only growing as we scale. #NeurIPS2025 #AIResearch #AgenticAI #AIGovernance #MachineLearning #ResponsibleAI
No more previous content

No more next content
4 Comments
Like Comment

LinkedIn respects your privacy

AI Language Processing

Explore categories

AI Language Processing

More in AI Language Processing

More Technology topics

Explore categories