Loading Content Into SaaSy Content Guardian

This guide explains the three methods for importing content into SaaSy Content Guardian: website crawling, PDF uploads, and CSV bulk uploads.

Updated over a week ago

1. Overview of Content Loading Methods

SaaSy Content Guardian supports three ways to load content for analysis:

1. Website Crawling

- Single page crawl (1 credit per page)

- Deep crawl of up to 10 pages (1 credit per page)

2. PDF Upload

- Upload PDF documents

- Automatic text extraction

- Supports files up to 20MB

3. CSV Bulk Upload

- Import multiple articles at once

- Structured data import

- Template provided

2. Website Crawling

2.1 Single Page Crawl

How to crawl a single webpage:

1. Navigate to the "Load Content" section

2. Click "Crawl Website"

3. Enter the full URL of the page you want to analyse

4. Select "Single Page" as the crawl type

5. Click "Start Crawl"

6. Wait for the crawl to complete (usually 10-30 seconds)

7. Review the extracted content

8. The page will be added to your content library

9. 1 credit will be deducted from your account

TIP: Make sure to include the full URL including "https://" or "http://" for successful crawling.
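
If you are preparing a list of pages to crawl, you can check each URL before entering it. The following is a minimal Python sketch of the rule in the tip above. The function name is_crawlable_url is hypothetical and is not part of SaaSy Content Guardian; it only tests that the URL starts with "http://" or "https://" and names a host, not that the page is reachable.

```python
from urllib.parse import urlparse


def is_crawlable_url(url: str) -> bool:
    """Return True if the URL has an http/https scheme and a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_crawlable_url("https://example.com/blog/post"))  # True
print(is_crawlable_url("example.com/blog/post"))          # False (no scheme)
```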

2.2 What Gets Crawled?

When crawling a webpage, SaaSy Content Guardian extracts:

- Page title

- Main content text

- Meta descriptions

- Headings (H1, H2, H3, etc.)

- Paragraph content

- List items

- Article structure

Excluded from crawling:

- Navigation menus

- Footers

- Sidebar content

- Advertisements

- Cookie banners

- Comments sections

2.3 Deep Website Crawl

Deep website crawling allows you to crawl multiple pages from a website at once, up to a maximum of 10 pages.

How to perform a deep crawl:

1. Navigate to the "Content" section

2. Click "Add Content" or "Load Website"

3. Select "Deep Crawl"

4. Enter the root URL or starting page

5. Specify how many pages to crawl (maximum 10)

6. Optional: Enable "Same domain only" to restrict crawling to the same website

7. Click "Start Deep Crawl"

8. Monitor the crawl progress (may take 1-3 minutes)

9. Review the list of discovered pages

10. Select which pages to include in your content library

11. Confirm the crawl

12. Credits will be deducted based on the number of pages crawled

Credit cost: 1 credit per page crawled (e.g., 10 pages = 10 credits)
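
To budget credits before a deep crawl, the arithmetic above can be sketched in Python. The constants come from this guide; the function name deep_crawl_cost is hypothetical and used only for illustration.

```python
MAX_DEEP_CRAWL_PAGES = 10  # maximum pages per deep crawl, as stated in this guide
CREDITS_PER_PAGE = 1       # cost per crawled page, as stated in this guide


def deep_crawl_cost(pages: int) -> int:
    """Return the credits a deep crawl of `pages` pages would use."""
    if not 1 <= pages <= MAX_DEEP_CRAWL_PAGES:
        raise ValueError(f"pages must be between 1 and {MAX_DEEP_CRAWL_PAGES}")
    return pages * CREDITS_PER_PAGE


print(deep_crawl_cost(10))  # 10
print(deep_crawl_cost(4))   # 4
```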

2.4 Deep Crawl Configuration Options

- Maximum pages: Up to 10 pages per crawl

- Same domain only: Only crawl pages from the same domain

- Follow links: Automatically discover linked pages

- Respect robots.txt: Honour website crawling preferences

TIP: Use deep crawling to analyse an entire blog category, product range, or section of your website in one go.
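
Because the crawler respects robots.txt, a page that your site disallows may be skipped. The sketch below shows one way to test a page against robots.txt rules using Python's standard library. The helper name is_allowed and the sample rules are illustrative, and the user agent that SaaSy Content Guardian sends is not documented here, so the wildcard "*" is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only; paste your own site's robots.txt here.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


print(is_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post"))       # True
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/report"))  # False
```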

2.5 Troubleshooting Website Crawls

Common issues and solutions:

Issue: "Failed to crawl website"

Solution: Check that the URL is correct and publicly accessible. Some websites block automated crawling.

Issue: "No content extracted"

Solution: The page may be JavaScript-heavy or behind a login. Try copying the content manually and using CSV upload instead.

Issue: "Crawl timed out"

Solution: The website may be slow to respond. Try again later or use a more specific URL.

Issue: "403 Forbidden error"

Solution: The website is blocking automated access. You may need to contact the site owner or use manual content copying.

3. PDF Upload

3.1 Uploading PDF Files

PDF upload allows you to analyse content from PDF documents, such as reports, articles, whitepapers, and ebooks.

How to upload a PDF:

1. Navigate to the "Content" section

2. Click "Add Content" or "Upload PDF"

3. Click "Choose File" or drag and drop your PDF

4. Wait for the file to upload (progress bar will appear)

5. The system will automatically extract text from the PDF

6. Review the extracted content

7. Give the document a descriptive name (optional)

8. Click "Save to Library"

9. The PDF content is now available for analysis

Maximum file size: 20MB per PDF

3.2 Supported PDF Formats

- PDF version 1.0 to 1.7

- Text-based PDFs (with selectable text)

- Scanned PDFs with OCR support (where possible)

- Multi-page documents

TIP: PDFs with images and complex layouts may have varying text extraction quality. For best results, use text-based PDFs.

3.3 Text Extraction Process

The system automatically:

1. Uploads the PDF to secure storage

2. Extracts all readable text content

3. Preserves paragraph structure where possible

4. Removes headers, footers, and page numbers

5. Cleans up formatting artefacts

6. Stores the extracted text in your content library

3.4 PDF Upload Limitations

- Maximum file size: 20MB

- Maximum pages: No hard limit, but very large PDFs may take longer to process

- Scanned PDFs: OCR is attempted but may not work on all files

- Encrypted PDFs: Password-protected PDFs cannot be processed

WARNING: If your PDF is password-protected, you'll need to remove the password before uploading.
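
You can run a rough check on a PDF before uploading it. This Python sketch tests the 20MB limit and looks for signs of encryption. The function name pdf_problems is hypothetical, the limit is assumed to mean 20 x 1024 x 1024 bytes, and the "/Encrypt" search is only a heuristic, so treat a clean result as a hint rather than a guarantee that the upload will succeed.

```python
MAX_PDF_BYTES = 20 * 1024 * 1024  # 20MB limit from this guide (byte interpretation assumed)


def pdf_problems(data: bytes) -> list[str]:
    """Return a list of likely problems with the raw bytes of a PDF file."""
    problems = []
    if len(data) > MAX_PDF_BYTES:
        problems.append("file is larger than 20MB")
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with a PDF header")
    # Heuristic: encrypted PDFs normally carry an /Encrypt entry in the trailer.
    if b"/Encrypt" in data:
        problems.append("file appears to be encrypted or password-protected")
    return problems
```

For a file on disk, one way to call it is pdf_problems(pathlib.Path("report.pdf").read_bytes()), where report.pdf stands in for your own file. An empty list means none of these checks found a problem.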

4. CSV Upload

4.1 CSV Upload Overview

CSV (Comma-Separated Values) bulk upload allows you to import multiple articles or content pieces at once using a structured spreadsheet format.

Benefits of CSV upload:

- Import dozens or hundreds of articles simultaneously

- Prepare content offline in spreadsheet software

- Ideal for analysing large content libraries

- Maintain consistent structure across all articles

4.2 CSV File Format Requirements

Your CSV file MUST include these two columns (case-sensitive):

1. title - The title or headline of each article

2. content - The full text content of the article

Optional columns (currently ignored but may be supported in future):

- author

- date_published

- url

- category

4.3 CSV Template Download

To make CSV creation easier, we provide a template:

1. Navigate to the "Content" section

2. Click "Upload CSV"

3. Click "Download CSV Template"

4. Open the template in Microsoft Excel, Google Sheets, or similar

5. Fill in your content following the template structure

6. Save as CSV format

Template structure:

```
title,content
"Your Article Title","Your article content goes here..."
"Another Article","More content here..."
```

4.4 Preparing Your CSV File

Guidelines for creating a valid CSV:

1. Use UTF-8 encoding to support special characters

2. Enclose text fields in double quotes

3. Escape internal quotes by doubling them ("")

4. Each row represents one article

5. Do not include blank rows

6. Keep the header row (title,content) intact

7. Limit file size to 10MB

Example CSV content:

```
title,content
"10 Tips for Better Writing","Writing well is a skill that can be learned. Here are our top 10 tips: 1. Keep sentences short..."
"The Complete Guide to SEO","Search Engine Optimisation is crucial for online visibility. This guide covers everything you need to know..."
```
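
If your articles live in another system, generating the CSV with a script avoids quoting mistakes. The following Python sketch applies the guidelines above: it quotes every text field, doubles internal quotes, and keeps the header row as title,content. The function name build_articles_csv and the sample article are illustrative only.

```python
import csv
import io


def build_articles_csv(articles: list[tuple[str, str]]) -> str:
    """Return CSV text with a title,content header and one quoted row per article."""
    buffer = io.StringIO()
    buffer.write("title,content\n")  # header kept exactly as in the template
    # QUOTE_ALL wraps every field in double quotes; internal quotes are doubled by default.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for title, content in articles:
        writer.writerow([title, content])
    return buffer.getvalue()


csv_text = build_articles_csv([('Say "Hello"', "A short greeting article.")])
print(csv_text)
```

To save the result with UTF-8 encoding, write csv_text to a file opened with open("articles.csv", "w", encoding="utf-8", newline=""), where articles.csv is a placeholder name.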

4.5 Uploading Your CSV File

How to upload a CSV:

1. Navigate to the "Content" section

2. Click "Add Content" or "Upload CSV"

3. Click "Choose File" or drag and drop your CSV

4. Wait for file validation (system checks format)

5. Review the detected articles and count

6. If errors are found, correct them and re-upload

7. Click "Import Articles"

8. Wait for the import to complete

9. All articles are now in your content library

TIP: The import process may take 1-2 minutes for large CSV files with many articles.

4.6 CSV Validation And Error Handling

The system validates your CSV and checks for these common errors:

- Missing required columns (title, content)

- Empty title or content fields

- Incorrect file encoding

- Malformed CSV structure

- File too large (>10MB)

If errors are detected, you'll see:

- Error message describing the issue

- Row numbers where errors occurred

- Suggestions for fixing the problems

Fix errors in your CSV and re-upload.
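
You can catch most of these errors locally before uploading. This Python sketch mirrors the checks listed above. It is not the validator SaaSy Content Guardian runs; the function name validate_csv_bytes, the error wording, and the reading of 10MB as 10 x 1024 x 1024 bytes are all assumptions.

```python
import csv
import io

MAX_CSV_BYTES = 10 * 1024 * 1024         # 10MB limit from this guide (byte interpretation assumed)
REQUIRED_COLUMNS = ("title", "content")  # case-sensitive, as stated in this guide


def validate_csv_bytes(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a CSV file."""
    errors = []
    if len(data) > MAX_CSV_BYTES:
        errors.append("file is larger than 10MB")
    try:
        text = data.decode("utf-8-sig")  # accepts UTF-8 with or without a byte order mark
    except UnicodeDecodeError:
        return errors + ["file is not valid UTF-8"]
    reader = csv.DictReader(io.StringIO(text, newline=""))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return errors + ["missing required column(s): " + ", ".join(missing)]
    for row in reader:
        for name in REQUIRED_COLUMNS:
            # Short rows yield None for absent fields, so fall back to an empty string.
            if not (row[name] or "").strip():
                errors.append(f"line {reader.line_num}: empty {name}")
    return errors
```

An empty list means none of these checks failed. Line numbers count the header as line 1, so they should match what you see in a text editor for single-line rows.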

5. Managing Loaded Content

After loading content using any of the three methods, your content appears in the Content Library. See the "Managing Your Content Library" guide for details on:

- Viewing all loaded content

- Enabling/disabling content for analysis

- Editing content names

- Deleting content

- Viewing website pages

6. Content Loading Best Practices

6.1 Choose The Right Method

- Single page crawl: Best for individual blog posts, articles, landing pages

- Deep crawl: Best for analysing a section of your website or multiple related pages

- PDF upload: Best for reports, whitepapers, ebooks, downloaded articles

- CSV upload: Best for bulk imports, migrating from other tools, analysing large content archives

6.2 Optimising Credit Usage

- Review loaded content before running analysis to avoid wasting credits on irrelevant pages

- Use the enable/disable toggle to exclude pages you don't want to analyse

- Combine related content into a single analysis run

- Schedule regular analysis instead of manual runs

6.3 Content Quality Tips

- Ensure URLs are publicly accessible (not behind logins)

- For PDFs, use text-based documents rather than scanned images

- For CSVs, ensure content is properly formatted and complete

- Remove duplicate content before uploading

- Use descriptive titles for easy identification
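
For CSV imports, duplicates can be removed with a short script before you build the file. The sketch below treats two articles as duplicates when their content matches after ignoring case and extra whitespace. That matching rule and the function name drop_duplicate_articles are assumptions chosen for illustration, not behaviour of SaaSy Content Guardian.

```python
def drop_duplicate_articles(articles: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (title, content) pair for each distinct piece of content."""
    seen = set()
    unique = []
    for title, content in articles:
        # Normalise whitespace and case so trivially different copies compare equal.
        key = " ".join(content.split()).casefold()
        if key not in seen:
            seen.add(key)
            unique.append((title, content))
    return unique


articles = [("A", "Hello  world"), ("B", "hello world"), ("C", "Other text")]
print(drop_duplicate_articles(articles))  # keeps articles "A" and "C"
```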

7. Troubleshooting Content Loading

Issue: "Invalid file format"

Solution: Ensure your file is a valid PDF or CSV. Check the file extension and contents.

Issue: "Content extraction failed"

Solution: The file may be corrupted or incompatible. Try re-downloading or re-creating the file.

Issue: "Upload failed"

Solution: Check your internet connection. Ensure the file size is within the limit (20MB for PDFs, 10MB for CSVs), then try again.

Issue: "No content detected in CSV"

Solution: Verify that your CSV has 'title' and 'content' columns and contains data rows.
