LLAMA3 on Replicate
What You’ll Learn
How to configure Foyle to use LLAMA3 hosted on Replicate
Prerequisites
- You need a Replicate account
Set Up Foyle to Use LLAMA3 on Replicate
Get an API token from Replicate and save it to a file
Configure Foyle to use this API token file
foyle config set replicate.apiKeyFile=/path/to/your/key/file
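The two steps above can be sketched as follows; the file path and token value here are placeholders for illustration, not values Foyle requires:

```shell
# Store the Replicate API token in a file readable only by you.
# KEY_FILE is an example location; any path works as long as you
# pass the same path to `foyle config set replicate.apiKeyFile=...`.
KEY_FILE="${TMPDIR:-/tmp}/replicate_api_key"
printf '%s' "r8_example_token" > "$KEY_FILE"
chmod 600 "$KEY_FILE"
```

You would then point Foyle at that file with `foyle config set replicate.apiKeyFile=$KEY_FILE`.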
Configure Foyle to use LLAMA3 hosted on Replicate
Configure Foyle to use the appropriate model deployments
foyle config set agent.model=meta/meta-llama-3-8b-instruct
foyle config set agent.modelProvider=replicate
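For reference, the three settings above correspond to a configuration file along these lines (the exact file location and schema are assumptions; `foyle config set` manages the file for you):

```yaml
agent:
  model: meta/meta-llama-3-8b-instruct
  modelProvider: replicate
replicate:
  apiKeyFile: /path/to/your/key/file
```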
How It Works
Foyle uses two models:
- A chat model to generate completions
- An embedding model to compute embeddings for RAG (retrieval-augmented generation)
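The embedding model's role can be sketched as follows: documents are embedded into vectors, and at completion time the query embedding is compared against them to retrieve relevant context for the chat model. The toy documents and three-dimensional embeddings below are purely illustrative; Foyle's real index uses OpenAI's embedding model and far larger vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, index, top_k=1):
    """Return the top_k document texts closest to the query embedding."""
    scored = sorted(
        index,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return [d["text"] for d in scored[:top_k]]

# Toy index: in practice each embedding comes from the embedding model.
index = [
    {"text": "kubectl get pods", "embedding": [0.9, 0.1, 0.0]},
    {"text": "gcloud auth login", "embedding": [0.1, 0.9, 0.0]},
]

# A query embedding close to the first document retrieves it.
print(retrieve([0.8, 0.2, 0.0], index))  # ['kubectl get pods']
```

The retrieved text is then included in the prompt sent to the chat model, which is why Foyle needs both model types.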
Replicate provides hosted versions of meta/meta-llama-3-8b-instruct and meta/meta-llama-3-70b-instruct, which are chat models. Notably, these models are kept warm, so Replicate doesn’t need to boot up new instances when you send predictions. Replicate also provides an OpenAI-compatible proxy, so Foyle can use the OpenAI APIs to generate responses.
Unfortunately, Replicate doesn’t provide hosted, always-warm versions of embedding models, so Foyle continues to use OpenAI for embeddings.