Image Command
The image
command allows you to analyze images using OpenAI's vision-capable models (like GPT-4o). This feature enables you to ask questions about images, extract text from images, describe visual content, and more.
Overview
The image subcommand automatically ensures you're using a vision-capable model and will switch to gpt-4o
if your current model doesn't support vision. It supports multiple image formats and provides flexible prompting options.
Basic Usage
cgip image --file photo.jpg "What do you see in this image?"
Command Help
Analyze an image using vision models. Use --file to specify the image path
Usage: cgip image [OPTIONS] --file <FILE> [PROMPT]
Arguments:
[PROMPT] Additional prompt text to include with the image analysis
Options:
-f, --file <FILE> Path to the image file to analyze
-m, --max-tokens <MAX_TOKENS> Maximum number of tokens in the response [default: 300]
-h, --help Print help
-V, --version Print version
Supported Image Formats
The image command supports the following formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
Basic Examples
Simple Image Description
cgip image --file photo.jpg
# Uses default prompt: "What is in this image?"
Custom Analysis Prompt
cgip image --file screenshot.png "Extract all text from this image"
Detailed Analysis
cgip image --file diagram.jpg "Explain this diagram in detail" --max-tokens 500
Common Use Cases
Text Extraction (OCR)
Extract text from screenshots, documents, or signs:
cgip image --file receipt.jpg "Extract all text from this receipt and format it as a list"
cgip image --file whiteboard.png "What does the text on this whiteboard say?"
Document Analysis
Analyze documents, receipts, forms, and other text-heavy images:
cgip image --file receipt.jpg "What items are on this receipt and what's the total?"
cgip image --file form.png "Help me fill out this form - what information is required?"
Code Analysis
Analyze screenshots of code or development environments:
cgip image --file code_screenshot.png "What does this code do? Are there any potential issues?"
cgip image --file error_screen.png "What error is shown here and how can I fix it?"
Visual Content Description
Describe photos, artwork, charts, and other visual content:
cgip image --file chart.png "Describe the data shown in this chart"
cgip image --file artwork.jpg "Describe the style and composition of this artwork"
Technical Diagrams
Analyze technical diagrams, flowcharts, and system architectures:
cgip image --file architecture.png "Explain this system architecture diagram"
cgip image --file flowchart.jpg "Walk me through this process flowchart"
Advanced Usage
Combining with Other Commands
Process image analysis results with other tools:
# Save analysis to file
cgip image --file diagram.png "Explain this" --max-tokens 1000 > analysis.txt
# Convert analysis to speech
cgip image --file chart.jpg "Describe this data" | cgip tts --voice nova
Multiple Images Workflow
While the command handles one image at a time, you can process multiple images:
# Process multiple screenshots
for img in screenshots/*.png; do
echo "=== Analyzing $img ==="
cgip image --file "$img" "What error or issue is shown here?"
done
Context-Aware Analysis
Combine image analysis with additional context:
# Analyze error with additional context
cgip image --file error.png "Given that this is a React application, what's causing this error?"
Response Length Control
Default Token Limit
The default max-tokens is 300, suitable for brief descriptions:
cgip image --file photo.jpg "Briefly describe this image"
Extended Analysis
For detailed analysis, increase the token limit:
cgip image --file complex_diagram.png "Provide a comprehensive analysis" --max-tokens 1000
Concise Responses
For very brief responses, use focused prompts:
cgip image --file text.png "Extract only the main heading" --max-tokens 50
Model Selection
Automatic Model Selection
The image command automatically uses vision-capable models:
- If your configured model supports vision, it uses that model
- If your configured model doesn't support vision, it switches to
gpt-4o
Supported Vision Models
Currently supported vision-capable models include:
gpt-4o
gpt-4-vision-preview
gpt-4-turbo
gpt-4
Checking Your Model
Verify your current model configuration:
cgip config --get model
Set a vision-capable model as default:
cgip config --set model=gpt-4o
Image Processing Details
Automatic Processing
Images are automatically:
- Read from the file system
- Encoded as base64
- Sent to the API with appropriate MIME type based on file extension
- Combined with your text prompt for analysis
File Size Considerations
- Large images may take longer to process
- Very large images might hit API limits
- Consider resizing extremely large images for better performance
Quality Recommendations
- Higher resolution images generally produce better analysis
- Ensure text in images is clearly readable
- Good lighting and contrast improve results
Error Handling
Common Issues and Solutions
File not found:
cgip image --file nonexistent.jpg "analyze this"
# Error: No such file or directory
Check the file path and ensure the file exists.
Unsupported format:
cgip image --file document.pdf "analyze this"
# May not work - PDF is not a supported image format
Convert PDFs to images first or use supported formats.
Network errors: Check your API key and network connection.
Model not available: Ensure you have access to vision-capable models in your API account.
Best Practices
Prompt Engineering
Be specific about what you want:
# Good
cgip image --file chart.png "What are the main trends shown in this sales data chart?"
# Less effective
cgip image --file chart.png "What is this?"
Ask for structured output when needed:
cgip image --file receipt.png "Extract the items and prices as a formatted list"
Image Quality
- Use clear, well-lit images
- Ensure text is readable
- Avoid blurry or low-contrast images
Token Management
- Use appropriate max-tokens for your needs
- Brief descriptions: 100-300 tokens
- Detailed analysis: 500-1000 tokens
- Comprehensive reports: 1000+ tokens
Cost Considerations
Vision requests consume more tokens than text-only requests:
- Monitor your usage
- Use appropriate token limits
- Consider image size and complexity
Troubleshooting
Image Not Loading
Check file permissions and path:
ls -la path/to/image.jpg
Poor Analysis Quality
- Try a more specific prompt
- Ensure image quality is good
- Increase max-tokens for more detailed responses
Model Errors
If you get model-related errors:
# Check current model
cgip config --get model
# Set to a known vision model
cgip config --set model=gpt-4o
Integration Examples
Development Workflow
# Analyze UI mockups
cgip image --file mockup.png "What UI elements are shown and how should they be implemented?"
# Debug visual issues
cgip image --file bug_screenshot.png "What's wrong with this UI layout?"
Documentation
# Generate image descriptions for documentation
cgip image --file feature_screenshot.png "Write a caption for this screenshot" --max-tokens 100
Data Analysis
# Analyze charts and graphs
cgip image --file sales_chart.png "What insights can you derive from this sales data?"
Next Steps
- Learn about TTS command to convert image analysis to speech
- Explore agent command for automated workflows involving images
- Try combining with other features for powerful multi-modal workflows