Starter Notes
Overview
AI has moved from handling structured tasks to understanding and generating human language. This shift led to Large Language Models (LLMs), which can read, write, and converse like humans.
A Large Language Model is mainly defined by two things:
- The training algorithm
- The model parameters (weights)
The training algorithm is the code created by engineers to train the model. The result of that training process is a set of model parameters, also called weights.
Together, these determine how the model behaves and generates responses.
Training Algorithm
The training algorithm controls how the model learns from data.
- Engineers create the training process
- The model learns patterns from large amounts of text
- The final result is a set of weights
These weights store what the model has learned. When people talk about a "600-billion parameter model", they are referring to the number of weights inside the model.

The training process creates the model, but the weights are what make the model useful after training.
LLMs are called “large” not only because of their parameter count but also because of their ability to understand and produce complex text.
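To make the idea of a parameter count concrete, here is a minimal sketch that counts the weights in a tiny stack of fully connected layers. The layer sizes and the function name `count_dense_parameters` are assumptions made up for illustration; real LLMs use far larger and more varied layers, but the counting principle is the same.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    # Pair each layer's input size with the next layer's output size.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output connection
        total += n_out         # one bias per output unit
    return total


# Hypothetical toy network: 8 inputs, 16 hidden units, 4 outputs.
toy_layer_sizes = [8, 16, 4]
toy_parameter_count = count_dense_parameters(toy_layer_sizes)
print(toy_parameter_count)  # (8*16 + 16) + (16*4 + 4) = 212
```

A "600-billion parameter model" is the same kind of total, only summed over vastly more and larger layers.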
Models
A model learns patterns and structures from data to make predictions or generate new results.

How LLMs Generate Text
An LLM works by breaking your input into smaller pieces called tokens and then predicting what token should come next.
- Tokens are small pieces of text
- Each token has a unique ID
- The model predicts the next token based on the input
- Multiple token candidates are considered
- Each candidate is assigned a probability
- The most likely token is usually selected
For example, if the input is:
The sky is
The model might generate the following candidates:
| Token | Probability |
|---|---|
| blue | 45% |
| clear | 12% |
| visible | 21% |
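Since blue has the highest probability, it would most likely be selected as the next token. The listed probabilities do not add up to 100% because the remaining probability is spread across other candidates that are not shown.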
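The selection step can be sketched in a few lines. The probabilities below are copied from the table; the function names are hypothetical. The sketch shows two common ways to choose: always taking the most likely token, or sampling in proportion to the probabilities, which is one reason the most likely token is "usually" rather than "always" selected.

```python
import random

# Candidate tokens and probabilities taken from the table above.
candidates = {"blue": 0.45, "clear": 0.12, "visible": 0.21}


def pick_most_likely(probabilities):
    """Greedy choice: return the token with the highest probability."""
    return max(probabilities, key=probabilities.get)


def sample_token(probabilities, rng):
    """Sampled choice: draw one token in proportion to its probability."""
    tokens = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


greedy_choice = pick_most_likely(candidates)
sampled_choice = sample_token(candidates, random.Random(0))
print(greedy_choice)   # blue
print(sampled_choice)  # one of the three candidates
```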

The model then repeats the process using both the original input and the newly generated token. By predicting one token at a time, it gradually builds complete sentences and responses. The probabilities used for these predictions are calculated using the billions of parameters learned during training.
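The loop described above can be sketched with a toy stand-in for the model. Everything here is an assumption for illustration: the six-token vocabulary, the probability table `NEXT_TOKEN_PROBS`, and the function names. A real LLM computes probabilities from its learned parameters and looks at the whole sequence so far; this toy only looks at the last token.

```python
# Hypothetical vocabulary: each token gets a unique integer ID.
VOCAB = ["<end>", "the", "sky", "is", "blue", "today"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Hypothetical stand-in for a trained model:
# maps the last token to probabilities for the next token.
NEXT_TOKEN_PROBS = {
    "the": {"sky": 0.9, "blue": 0.1},
    "sky": {"is": 0.8, "today": 0.2},
    "is": {"blue": 0.45, "today": 0.34, "<end>": 0.21},
    "blue": {"today": 0.6, "<end>": 0.4},
    "today": {"<end>": 1.0},
}


def tokenize(text):
    """Split text on whitespace and map each word to its token ID."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def predict_next(token_ids):
    """Return the ID of the most likely next token."""
    last_token = VOCAB[token_ids[-1]]
    probs = NEXT_TOKEN_PROBS[last_token]
    best = max(probs, key=probs.get)
    return TOKEN_TO_ID[best]


def generate(prompt, max_new_tokens=5):
    """Append one predicted token at a time until <end> or the limit."""
    token_ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        next_id = predict_next(token_ids)
        if VOCAB[next_id] == "<end>":
            break
        token_ids.append(next_id)  # the new token becomes part of the input
    return " ".join(VOCAB[i] for i in token_ids)


print(generate("The sky is"))  # the sky is blue today
```

Each pass through the loop feeds the original input plus every token generated so far back into the prediction step, which is how a full response is built one token at a time.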
Language Generators
Several powerful language generators exist today, each with unique strengths.
- The GPT series became popular for human-like responses
- LexaAI and NovaChat emerged with similar capabilities
- Open-source alternatives continue to grow rapidly
Real-world Applications
Business Opportunities
LLMs offer many ways to improve industries, automate tasks, and create new value.
- Automate repetitive or complex tasks
- Improve efficiency in operations
- Generate new products, services, or revenue streams
- Help companies discover innovative solutions
Finance
LLMs can process financial information that is often unstructured, such as reports, news, or social media posts.
- Analyze unstructured text like annual reports or market news
- Identify trends and investment opportunities
- Assist in managing portfolios and risk
By processing large amounts of complex text, LLMs can surface actionable insights that help people make financial decisions faster and on better information.

Healthcare
LLMs can help healthcare providers analyze data efficiently while protecting patient privacy, which supports more personalized care.
- Process medical records, lab results, and imaging reports
- Provide personalized treatment suggestions
- Follow privacy laws to protect sensitive patient information

Education
LLMs can act as interactive tutors that offer personalized learning experiences for students of different levels.
- Answer questions and provide explanations tailored to learners
- Adjust teaching style based on understanding and progress
- Generate custom exercises and examples for practice

Multimodal Applications
LLMs can also handle multiple types of data, not just text.
- Process text, audio, video, and images
- Perform tasks like visual question answering
- Understand relationships between different types of input
Example: A model can look at a photo of a zebra and answer questions about it, describe the scene, or even add context or humor.

Challenges
Language models need to grasp how words connect and relate to capture meaning accurately.
- Sequence matters
  - Word order changes meaning
  - Example: "I only follow a healthy lifestyle" vs "Only I follow a healthy lifestyle"
- Context is key
  - Words have different meanings depending on surrounding text
  - Models use nearby words to decide the correct meaning
  - Example: "run" can mean jogging, managing, or operating a machine
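The "sequence matters" point can be checked with a small sketch. It compares the two example sentences first as unordered bags of words and then as ordered sequences. The helper name `to_words` is made up for illustration, and whitespace splitting is a simplification of real tokenization.

```python
from collections import Counter

sentence_a = "I only follow a healthy lifestyle"
sentence_b = "Only I follow a healthy lifestyle"


def to_words(sentence):
    """Lowercase the sentence and split it on whitespace."""
    return sentence.lower().split()


# Ignoring order, both sentences contain exactly the same words.
same_words = Counter(to_words(sentence_a)) == Counter(to_words(sentence_b))

# Keeping order, the sequences differ, and so does the meaning.
same_order = to_words(sentence_a) == to_words(sentence_b)

print(same_words)  # True
print(same_order)  # False
```

A representation that ignores word order cannot tell these two sentences apart, which is why language models work with ordered sequences of tokens.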

- Long-range dependencies
  - Some sentences have information spread far apart, which makes connections harder to track
  - Models need to link the distant words
  - Example: when "book" and "was quite heavy" sit far apart in a sentence, understanding it requires connecting the two

Single-Task vs Multi-Task Learning
Traditional models often focus on one task at a time, while LLMs can handle many tasks together.
- Single-task learning
  - Trains separate models for each task
  - Examples: question answering, summarization, translation
  - Requires more resources and limits flexibility
- Multi-task learning
  - One model learns multiple tasks at once
  - Improves learning from shared data
  - Can handle unseen data better but may trade some efficiency