WordCram

open-source word clouds for Processing

PDFTextStream + WordCram

Snowtide Informatics has announced that their PDF text-extraction software, PDFTextStream, is now free for use in single-threaded applications.

This means you could take any PDF, extract its text, and make a word cloud from it, or even from piles of PDFs. If you don’t have piles of PDFs, you can find plenty available for free through The Hacker Shelf.

For instance, here’s a word cloud I made from Dick Gabriel’s Patterns of Software, which he makes freely available on his site:

(As good as the wordcram is, I highly recommend reading the book in its entirety, with all the words in their proper order.)

Here’s the code I used to create it:

import com.snowtide.pdf.fonts.*;
import com.snowtide.util.*;
import com.snowtide.pdf.afm.*;
import com.snowtide.commons_logging.*;
import com.snowtide.pdf.forms.*;
import com.snowtide.pdf.parser.*;
import com.snowtide.pdf.annot.*;
import com.snowtide.pdf.util.*;
import com.snowtide.pdf.*;
import pdfts.examples.*;
import com.snowtide.pdf.lucene.*;
import com.snowtide.util.logging.*;
import com.snowtide.pdf.layout.*;
import com.snowtide.io.*;
import com.snowtide.commons_logging.impl.*;

import wordcram.*;

void setup() {
  size(1100, 600);
  background(255);

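  String text = "";
  try {
    // Java does not expand "~" in file paths, so build the path from the user's home directory.
    text = loadPdf(System.getProperty("user.home") + "/reading/PatternsOfSoftware.pdf");
  }
  catch (Exception x) {
    println(x);
    exit();
    return;  // exit() may not stop setup() right away, so skip the rest explicitly
  }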

  new WordCram(this)
    .fromTextString(text)
    .withFont("DevanagariMT")
    .withWordPadding(1)
    .angledBetween(radians(10), radians(-10))
    .drawAll();

  save("patterns-of-software.png");
}

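// Extracts all of the text from the PDF at pdfFilePath.
String loadPdf(String pdfFilePath) throws IOException {
  PDFTextStream pdfts = new PDFTextStream(pdfFilePath);
  StringBuilder text = new StringBuilder(1024);
  // Pipe the PDF's text into the StringBuilder, then release the file.
  pdfts.pipe(new OutputTarget(text));
  pdfts.close();
  // Print the extracted text to the console, for inspection.
  System.out.printf("The text extracted from %s is:", pdfFilePath);
  System.out.println(text);

  return text.toString();
}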
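
If you do have a pile of PDFs, here is a rough sketch of how you might gather their text into one string before handing it to fromTextString. It is plain Java rather than part of the sketch above, and everything in it is my own assumption: the PdfPile class, the listPdfs and combineText helpers, and the extractor parameter, which stands in for something like loadPdf.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
class PdfPile {

  // Returns the paths of all files in `folder` whose names end in ".pdf"
  // (case-insensitive), sorted by file name. Returns an empty list if
  // `folder` is not a readable directory.
  static List<String> listPdfs(File folder) {
    List<String> paths = new ArrayList<>();
    File[] files = folder.listFiles();
    if (files == null) {
      return paths;
    }
    Arrays.sort(files);
    for (File f : files) {
      if (f.isFile() && f.getName().toLowerCase().endsWith(".pdf")) {
        paths.add(f.getPath());
      }
    }
    return paths;
  }

  // Runs `extractor` (a stand-in for a PDF text extractor such as loadPdf)
  // on each path, and joins the results, one newline after each document.
  static String combineText(List<String> pdfPaths, Function<String, String> extractor) {
    StringBuilder all = new StringBuilder();
    for (String path : pdfPaths) {
      all.append(extractor.apply(path)).append('\n');
    }
    return all.toString();
  }
}
```

To use it with the sketch above, the extractor would need to wrap loadPdf and handle its IOException, since a Function cannot throw checked exceptions. Depending on your Processing version, you may also need an anonymous class in place of a lambda.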
