THIS IS HISTORY! Check fabien.benetou.fr for news.
See also AImatrix.
Collect the key repositories of interesting, accurate, large and easy-to-exploit datasets.
PS: individual specialized datasets should not be listed here, only repositories.
Name | Specificity | Institution | Size | Last update |
---|---|---|---|---|
CKAN | Handles datasets as packages and has its own client, datapkg* | Open Knowledge Foundation (irc) | 860 registered packages | live |
The Map of Data by Sindice | ? | DERI | ? | ? |
Linked Data Sets as RDF Dumps | from the ESW Wiki | W3C | ? | ? |
Public Datasets | Dedicated to EC2 usage | Amazon AWS | ? | ? |
Free Redistributable Rich Data Sets | not always easy to use because of scarce data or "old" formats | InfoChimps.org | ? | ? |
data sets | "for people with large data sets" | theinfo.org | ? | ? |
datasets from ManyEyes | ? | IBM | ? | ? |
Data.gov | restricted to the USA, no mention of DARPA data (as of June 2009) | US government | ? | ? |
data.gov.uk | restricted to the UK | ? | ? | ? |
Twine | ? | Radar Networks | ? | ? |
Freebase | ? | Metaweb | ? | ? |
Worldometers | ? | ? | ? | ? |
Community Open Data Tables | prepared for Yahoo's YQL | Yahoo/community-driven | ? | ? |
Datasets from Programming For Peace | Datasets specialized in political conflicts | Multiple research groups | <10 | ? |
NYTimes Linked Open Data | SKOS File + API | New York Times | ? | live |
Concept Web | relying on wiki structure | WikiProfessional | 1 million | ? |
LinkedData.org | provides a (non-working...) RSS feed | administered by Tom Heath | ? | ? |
ScraperWiki | wiki with scrapers to configure | ? | ? | ? |
* Systems installed on Seedea.
Most (if not all) of these repositories are tied to one specific set of well-formatted datasets, so first check whether the data you want is there, or is easy to convert.
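As a minimal sketch of that check for a CKAN-based repository, the Python below queries the CKAN action API (`package_search` under `/api/3/action/`). This assumes a current CKAN install, since the API has changed since this page was written. It keeps only packages offering at least one resource in a format assumed to be easy to convert. The `EASY_FORMATS` set and the helper names are hypothetical, chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

# Formats assumed "easy to convert"; this is a local choice, not a CKAN notion.
EASY_FORMATS = {"CSV", "JSON", "TSV", "XLS", "XLSX"}


def build_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN action API (v3) package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def easy_packages(response: dict) -> list[tuple[str, list[str]]]:
    """Return (package name, easy formats) for packages with at least one easy format."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    found = []
    for pkg in response["result"]["results"]:
        # Resource formats are free text in CKAN, so normalize case before comparing.
        formats = {(res.get("format") or "").upper() for res in pkg.get("resources", [])}
        easy = sorted(formats & EASY_FORMATS)
        if easy:
            found.append((pkg["name"], easy))
    return found
```

For example, `easy_packages(fetch_json(build_search_url("https://demo.ckan.org", "population")))` should list matching packages on the CKAN demo instance, assuming it is online. Normalizing the `format` field matters because publishers write it inconsistently ("csv", "CSV", ".csv"). A stricter check would also inspect the resource URLs themselves.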