What's actually inside a PDF
A PDF isn't one blob of data — it's a small filesystem of numbered objects, each holding a piece of the document. Some objects describe pages, some are fonts, some are images, and a lot of them are content streams full of drawing instructions. When you 'compress a PDF', what you're really doing is going object by object and asking 'can this one be smaller?'
Two 10-page PDFs can weigh 200 KB and 80 MB respectively, and the difference is almost entirely about what's in those objects. A PDF that is 10 pages of body text from a word processor is mostly compressed text plus subsetted fonts, so there's almost nothing to shrink. A PDF that is 10 scanned book pages saved at 600 DPI as full-color JPEGs is 99% image data, and that image data is where nearly every megabyte you recover will come from.
Understanding which kind of PDF you have on screen is the single most important step. Compression that works miracles on one is a waste of CPU on the other.
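To make the "filesystem of numbered objects" idea concrete, here is a minimal Python sketch that tallies bytes per object kind. It is an illustration only: `SAMPLE` is a hand-written fragment in PDF object syntax (not a complete, openable file), `bytes_by_kind` is a hypothetical helper, and regex scanning like this would miss objects packed inside compressed object streams in real files.

```python
import re
from collections import Counter

# Hand-written fragment in PDF object syntax, for illustration only.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 44 >>\nstream\n"
    b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n"
    b"endstream\nendobj\n"
)

# "N G obj ... endobj" delimits one numbered object.
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\b(.*?)endobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")


def bytes_by_kind(data: bytes) -> Counter:
    """Sum the raw size of each object, grouped by its /Type (if any)."""
    totals: Counter = Counter()
    for match in OBJ_RE.finditer(data):
        body = match.group(3)
        kind = TYPE_RE.search(body)
        if kind:
            label = kind.group(1).decode("ascii")
        elif b"stream" in body:
            label = "stream"  # untyped stream, e.g. page drawing instructions
        else:
            label = "other"
        totals[label] += len(match.group(0))
    return totals


for label, size in bytes_by_kind(SAMPLE).most_common():
    print(f"{label:8s} {size} bytes")
```

On a real scanned document, a tally like this would be dominated by image streams; on a word-processor export, by content streams and fonts.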
The four compression filters that do all the work
PDF has a handful of named compression filters, but four of them account for almost every byte saved in practice. FlateDecode is the same DEFLATE algorithm used in zip and gzip; it's applied to text content streams and to PNG-like images. It's lossless and fast, and it's usually already on by default — re-compressing a FlateDecode stream won't shrink it further.
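The claim that deflating twice gains nothing can be checked with Python's standard `zlib` module, which implements the same DEFLATE algorithm. This is a small sketch: `make_content_stream` is a hypothetical helper that fabricates text-drawing operators resembling a page content stream, and the exact byte counts will depend on the zlib build.

```python
import random
import zlib


def make_content_stream(lines: int = 500, seed: int = 0) -> bytes:
    """Fabricate a plausible-looking content stream of text operators."""
    rng = random.Random(seed)
    parts = []
    for i in range(lines):
        x, y = rng.randint(36, 576), rng.randint(36, 756)
        parts.append(f"BT /F1 12 Tf {x} {y} Td (Line {i} of the report) Tj ET\n")
    return "".join(parts).encode("ascii")


raw = make_content_stream()
once = zlib.compress(raw, 9)    # what FlateDecode data looks like on disk
twice = zlib.compress(once, 9)  # compressing the already-compressed bytes

print(f"raw: {len(raw)}  deflated: {len(once)}  deflated twice: {len(twice)}")
assert zlib.decompress(once) == raw  # lossless round trip
```

The first pass removes most of the redundancy; the second pass has nothing left to find and typically comes out a few bytes larger because of container overhead.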
DCTDecode is JPEG. It's what an embedded photograph almost always uses. Lossy, very effective on photos, terrible on screenshots of text. The big lever here is quality: a Q90 photo and a Q70 photo can differ in file size by 3-4x while being visually indistinguishable on a phone screen. Re-encoding an existing DCT image at lower quality is one of the highest-leverage operations a compressor performs.
JBIG2 is the format that makes scanned documents shrinkable. It treats the page as a bitmap of marks (letters, mostly), finds visually-identical marks, and stores each unique mark once with a lookup table of where it appears. A 50-page scanned legal brief that is 80 MB with naive JPEG can drop to 4-5 MB with JBIG2 and still be perfectly readable.
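The "store each unique mark once" idea can be sketched in a few lines of Python. This is a toy model, not a JBIG2 encoder: it only merges marks whose bitmaps are exactly identical, whereas real encoders also match marks that are merely similar and then entropy-code the result. The names `Glyph` and `build_symbol_dictionary` are made up for this illustration.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # a tiny bitmap: one string of '#'/'.' per row


def build_symbol_dictionary(
    marks: List[Tuple[int, int, Glyph]],
) -> Tuple[List[Glyph], List[Tuple[int, int, int]]]:
    """Split (x, y, bitmap) marks into unique symbols plus placements.

    Returns (symbols, placements), where each placement is
    (symbol_index, x, y).
    """
    symbols: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    placements: List[Tuple[int, int, int]] = []
    for x, y, bitmap in marks:
        if bitmap not in index_of:
            index_of[bitmap] = len(symbols)
            symbols.append(bitmap)  # store the bitmap only once
        placements.append((index_of[bitmap], x, y))
    return symbols, placements


E = ("###", "#..", "###", "#..", "###")
T = ("###", ".#.", ".#.", ".#.", ".#.")
page = [(0, 0, E), (8, 0, T), (16, 0, E), (24, 0, E)]

symbols, placements = build_symbol_dictionary(page)
print(len(symbols), "unique symbols for", len(page), "marks")
print(placements)
```

On a real scanned page the same few dozen letter shapes recur thousands of times, which is why the dictionary plus placement list ends up so much smaller than the raw bitmap.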
CCITT (Group 4 fax encoding) is the old workhorse for pure black-and-white scans. If a page is genuinely 1-bit (no greyscale, no color, no anti-aliasing), CCITT typically beats Flate for that page and stays strictly lossless, although JBIG2 will usually come out smaller still. SnapToolz Compress PDF picks among these per-image automatically based on what the source looks like.
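One way a per-image choice among these filters could work is sketched below. This is an assumed heuristic for illustration, not SnapToolz's actual logic: `suggest_filter` and its `max_flat_colors` threshold are hypothetical, and a production tool would likely also look at image size, noise, and anti-aliasing.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

BLACK_AND_WHITE: Set[Pixel] = {(0, 0, 0), (255, 255, 255)}


def suggest_filter(pixels: Iterable[Pixel], max_flat_colors: int = 64) -> str:
    """Guess a suitable PDF filter from the colors an image contains."""
    colors: Set[Pixel] = set()
    for pixel in pixels:
        colors.add(pixel)
        if len(colors) > max_flat_colors:
            # Many distinct colors: treat as a photograph, use JPEG.
            return "DCTDecode"
    if colors <= BLACK_AND_WHITE:
        # Genuinely 1-bit content: the bilevel codecs apply.
        return "CCITTFaxDecode or JBIG2Decode"
    # Few flat colors (screenshots, diagrams): stay lossless.
    return "FlateDecode"


scan = [(0, 0, 0), (255, 255, 255)] * 50
screenshot = [(30, 30, 30), (240, 240, 240), (0, 120, 215)] * 50
photo = [(r, (r * 7) % 256, (r * 13) % 256) for r in range(256)]

print(suggest_filter(scan))
print(suggest_filter(screenshot))
print(suggest_filter(photo))
```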
Image downsampling — the biggest lever you have
If a PDF was generated from a scanner or from a phone camera, the embedded images are almost certainly higher resolution than they need to be. A letter-size page scanned at 300 DPI is 2550 x 3300 pixels. If you're going to view that PDF on a 1080p laptop screen at full-page zoom, you only need ~1100 x 1400 pixels (about 130 DPI). That is less than a fifth of the original pixel count, so downsampling the embedded image to that target can cut a file by more than half before any re-encoding happens.
The trade-off is sharpness at high zoom. If your reader will pinch-zoom into the page to read footnotes or numbers in a graph, downsampling below 200 DPI will visibly hurt. For documents that will be read at full-page zoom and never printed, 100-150 DPI is usually invisible. SnapToolz exposes this as a single 'target use' setting (Email, Web, Archive, Print) rather than asking you to pick DPI numbers, because DPI is the wrong unit for the decision.
Downsampling is also the operation most likely to be permanent — once you've thrown pixels away you cannot put them back. Keep your original.
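The pixel arithmetic behind the letter-paper example is easy to reproduce. The sketch below uses two hypothetical helpers, `pixels_at` and `pixel_ratio`; note that the ratio describes pixel counts, and the saving in bytes after encoding will follow it only approximately.

```python
from typing import Tuple


def pixels_at(dpi: float, width_in: float = 8.5, height_in: float = 11.0) -> Tuple[int, int]:
    """Pixel dimensions of a page (US letter by default) at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_ratio(source_dpi: float, target_dpi: float) -> float:
    """How many times fewer pixels the target has than the source."""
    return (source_dpi / target_dpi) ** 2


print(pixels_at(300))         # (2550, 3300)
print(pixels_at(130))         # (1105, 1430)
print(pixel_ratio(300, 150))  # 4.0
```

Because the ratio is squared, halving the DPI quarters the pixel count, which is why downsampling pays off so quickly.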
Font subsetting and content-stream cleanup
A PDF that embeds a complete font carries 70 KB or more for each weight and style of every typeface. If the document only uses 200 distinct characters, you don't need the other 600 glyphs in the file. Font subsetting walks the document, finds which glyphs are actually used, and rewrites the embedded font to contain just those. For documents with several embedded fonts this can cut hundreds of KB.
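The core of subsetting, "find the glyphs actually used and keep only those", can be sketched as below. This is a deliberately simplified model: the font is represented as a plain dictionary from character to outline bytes, and `subset_glyphs` is a hypothetical helper. A real subsetter also has to rewrite the font's internal tables and keep any component glyphs that composite glyphs depend on.

```python
from typing import Dict, List


def subset_glyphs(font_glyphs: Dict[str, bytes], text_runs: List[str]) -> Dict[str, bytes]:
    """Keep only the glyphs for characters that appear in the document."""
    used = {ch for run in text_runs for ch in run}
    return {ch: outline for ch, outline in font_glyphs.items() if ch in used}


# Toy font: 800 glyphs of 90 bytes each, about 70 KB in total.
full_font = {chr(cp): bytes(90) for cp in range(32, 832)}
document_text = ["Compress this PDF", "this PDF is smaller"]

subset = subset_glyphs(full_font, document_text)
full_size = sum(len(g) for g in full_font.values())
subset_size = sum(len(g) for g in subset.values())
print(f"{len(full_font)} glyphs, {full_size} bytes -> {len(subset)} glyphs, {subset_size} bytes")
```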
Content-stream cleanup is the catch-all for everything else: removing unused objects, dropping editing history that some PDF generators leave behind, removing thumbnails the viewer will recreate on its own, and rewriting cross-reference tables in a more compact form. Together these usually account for 5-15% of the final saving, but they're free: there's no quality loss whatsoever.
How to hit a target size
If you have a specific ceiling — 100 KB to attach to a job application, 200 KB for a visa form, 1 MB for most email systems, 5 MB for most government uploads — the right strategy depends on what's eating the bytes. SnapToolz Compress PDF lets you set the target directly; under the hood it tries progressively more aggressive image downsampling until the output fits, then stops.
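The "try progressively more aggressive settings until the output fits" strategy amounts to a simple loop. The sketch below is an assumption about the general shape, not SnapToolz's actual code: `fit_to_target` is a hypothetical function, `compress` stands in for whatever engine re-renders the PDF at a given DPI, and the DPI ladder is invented for the example.

```python
from typing import Callable, Optional, Sequence, Tuple


def fit_to_target(
    compress: Callable[[int], bytes],
    target_bytes: int,
    dpi_steps: Sequence[int] = (300, 200, 150, 100, 72),
) -> Optional[Tuple[int, bytes]]:
    """Return (dpi, output) for the gentlest setting that fits, else None."""
    for dpi in dpi_steps:  # ordered from least to most aggressive
        output = compress(dpi)
        if len(output) <= target_bytes:
            return dpi, output  # stop at the first setting that fits
    return None  # even the most aggressive setting was too large


def fake_compress(dpi: int) -> bytes:
    """Stand-in compressor whose output size grows with the square of DPI."""
    return bytes(dpi * dpi)


result = fit_to_target(fake_compress, target_bytes=25_000)
print(result[0] if result else "does not fit")  # 150
```

Stopping at the first fit matters: it keeps as much image quality as the size ceiling allows instead of always jumping to the smallest possible output.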
A realistic example: a 50 MB scanned PDF (full color, 300 DPI, JPEG quality 95). Re-encode the JPEGs at Q75: drops to ~18 MB. Downsample to 200 DPI: drops to ~9 MB. Convert text-only pages to JBIG2: drops to ~5 MB. Strip embedded thumbnails and subset fonts: drops to ~4.6 MB. That's roughly an 11x reduction with no visible loss for normal reading.
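The arithmetic in that example can be laid out step by step to show where the reduction comes from. The sizes below are the approximate figures quoted above; `step_report` is a hypothetical helper written for this illustration.

```python
from typing import List, Tuple

# (step, resulting size in MB), using the approximate figures from the example.
STEPS: List[Tuple[str, float]] = [
    ("original scan", 50.0),
    ("re-encode JPEGs at Q75", 18.0),
    ("downsample to 200 DPI", 9.0),
    ("JBIG2 for text-only pages", 5.0),
    ("strip thumbnails, subset fonts", 4.6),
]


def step_report(steps: List[Tuple[str, float]]) -> List[Tuple[str, float, float, float]]:
    """For each step: (name, size, gain from this step, cumulative gain)."""
    rows = []
    start = previous = steps[0][1]
    for name, size in steps:
        rows.append((name, size, round(previous / size, 2), round(start / size, 2)))
        previous = size
    return rows


for name, size, step_gain, total_gain in step_report(STEPS):
    print(f"{name:32s} {size:5.1f} MB  x{step_gain:<5} total x{total_gain}")
```

The image steps do nearly all of the work; the final cleanup step contributes less than 10% on its own.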
If you need to go below ~2 MB on a heavy scanned document, you're usually trading some sharpness or going from color to grayscale. The tool will tell you when it had to do that.
When you can't compress further
Three honest cases where no tool can help much. First, a PDF that's already been heavily compressed once. Re-encoding a JPEG that's already at Q60 down to Q40 gives you about 15% more saving and a noticeably worse image, which is usually not worth it.
Second, a PDF that's mostly vector content (think CAD drawings, scientific charts, beautifully typeset technical papers). Vector data is already compact, so there's little to squeeze: a 12 MB CAD drawing has 12 MB of actual geometry.
Third, encrypted or signed PDFs. Some operations break the signature; some encryption modes prevent rewriting the file at all. SnapToolz will warn you and offer to remove the protection first if you have the password.
FAQ
- Why did my compressed PDF come out almost the same size as the original?
- Almost always because the file was already optimized — either it was generated by a careful tool, or someone else already ran it through a compressor. If it's already 95% JBIG2 or Flate, there's nothing left to squeeze without losing real quality.
- Will compression hurt the searchable text layer?
- No. Text content streams are compressed losslessly with FlateDecode, and the searchable text layer behind a scanned page (if there is one) is preserved. The only operation that can affect text quality is re-rasterizing a page to an image, which SnapToolz only does if you explicitly ask for it.
- Can I batch-compress a folder of PDFs?
- Yes — open the tool, drop multiple files, pick a target setting, and let it run. Each file is processed in sequence in a Web Worker so the UI stays responsive. Output files are offered as a single zip at the end.
- How small can I realistically get a scanned 50-page document?
- For a typical office scan (color, 300 DPI), expect ~10x reduction without visible loss, so a 50 MB file lands around 5 MB. With grayscale conversion and JBIG2 it can drop to 1-2 MB. Below that you're sacrificing readability, and the tool will warn you before doing it.