Understanding Indie Search Engines

At Mojeek we like to do things differently; that's why we're building a search engine that respects your privacy whilst providing unique and unbiased results.

Archived Read

Ecosia - the search engine that plants trees

8/18/2022

Ecosia uses the ad revenue from your searches to plant trees where they are needed the most. By searching with Ecosia, you’re not only reforesting our planet, but you’re also empowering the communities around our planting projects to build a better future for themselves. Give it a try!

ecosia, green, search, engine

Archived Read

Spot

0/24/2023

spot ecloud global, powered by searx

spot, ecloud, searx, search, search engine, metasearch, meta search

Archived Read

Presearch - the Community-powered, Decentralized Search Engine

7/12/2023

Presearch is a decentralized search engine that provides search choice, quality results, privacy and rewards to those who want to end the search monopoly and take back the web.

Decentralized, Search Engine, Search, PRE

Read Archived

Tools

GitHub - adbar/trafilatura: Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

2/23/2022

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments) - GitHub - adbar/trafilatura: Web scraping library and command-line tool for text dis...

Read Archived

GitHub - puppeteer/puppeteer: Headless Chrome Node.js API

2/23/2022

Headless Chrome Node.js API. Contribute to puppeteer/puppeteer development by creating an account on GitHub.

Read Archived

GitHub - mozilla/readability: A standalone version of the readability lib

2/23/2022

A standalone version of the readability lib. Contribute to mozilla/readability development by creating an account on GitHub.

Read Archived

Typesense

2/23/2022

Lightning-fast, open source search engine for everyone

typesense, search engine, fuzzy search, typo tolerance, faceting, filtering, app search, site search, search bar, algolia, elasticsearch

Archived Read

SentenceTransformers Documentation — Sentence-Transformers documentation

2/23/2022

You can install it using pip:

Archived Read

FastAPI

2/23/2022

FastAPI framework, high performance, easy to learn, fast to code, ready for production

Archived Read

google-research/scann at master · google-research/google-research

2/23/2022

Google Research. Contribute to google-research/google-research development by creating an account on GitHub.

Read Archived

MEMEX - marginalia.nu goes open source [ 2022-05-27 ]

5/27/2022

A motivating factor is the search engine has sort of grown to a scale where it's becoming increasingly difficult to productively work on as a personal solo project. It needs more structure. What's kept me from open sourcing it so far has also been the need for more structure. The needs of the marginalia project, and the needs of an open source project have effectively aligned.

Archived Read

Home - YaCy

Michael Christen

7/12/2023

YaCy P2P - Decentralized Search Engine

YaCy Suchmaschine search engine spider harvester indexer p2p peer network open free download software development

Archived Read

GitHub - N0taN3rd/node-warc: Parse And Create Web ARChive (WARC) files with node.js

7/12/2023

Parse And Create Web ARChive (WARC) files with node.js - GitHub - N0taN3rd/node-warc: Parse And Create Web ARChive (WARC) files with node.js

Archived Read

Enterprise Tools

Enterprise Search Engine - Amazon Kendra - AWS

1/10/2024

Amazon Kendra offers an intelligent enterprise search solution that increases employee productivity and improves customer satisfaction.

Read Archived

AI search that understands

1/10/2024

Enterprises and developers use Algolia’s AI search infrastructure to understand users and show them what they’re looking for.

Read Archived

Search for Static Sites

Elasticlunr.js, lightweight full-text search engine in Javascript for browser search and offline search.

Wei Song

1/15/2023

Elasticlunr.js is a lightweight full-text search engine in JavaScript for browser search and offline search. It is based on Lunr.js but is more flexible, providing query-time boosting and field search. A bit like Solr, but much smaller and not as bright, it also offers flexible configuration.

elasticlunr, full-text search, information retrieval, offline search

Archived Read

Lunr: A bit like Solr, but much smaller and not as bright

1/15/2023

Lunr: Search made simple

Archived Read

Pagefind | Pagefind

4/31/2022

Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure.

Archived Read

Stork Search

1/15/2023

Impossibly fast web search, built for static sites.

Archived Read

Specific Search & Recommendation Platforms

Blog Surf | Blog Search Engine

2/23/2022

Blog Surf is the internet's only search engine for blogs. Explore the best writing on the internet.

Archived Read

@braedon

2/23/2022

An open index of well-known resources.

Read Archived

TinyGem.org - bookmarking and content recommendations for people who love to read Hacker News.

2/23/2022

TinyGem is a bookmarking service that automatically uses the links you save to surface other related content from manually curated sources. If you are intellectually curious, have a selective news diet and enjoy reading places like Hacker News, TinyGem might be for you.

Read Archived

Archie www interface

4/28/2025

This ArchiePlex form can locate files on Anonymous FTP sites in the Internet.

Read Archived

Corpuses

Common Crawl

6/5/2022

Archived Read

HTTP Archive

4/7/2018

The HTTP Archive tracks how the web is built by periodically crawling the top sites on the web and recording detailed information about fetched resources, web platform APIs and features used, and execution traces of each page.

Archived Read

Crawl Techniques

puppeteer-extra-plugin-stealth

10/9/2022

Stealth mode: Applies various techniques to make detection of headless puppeteer harder. Latest version: 2.11.1, last published: 3 months ago. Start using puppeteer-extra-plugin-stealth in your project by running `npm i puppeteer-extra-plugin-stealth`. There are 334 other projects in the npm registry using puppeteer-extra-plugin-stealth.

puppeteer, puppeteer-extra, puppeteer-extra-plugin, stealth, stealth-mode, detection-evasion, crawler, chrome, headless, pupeteer

Archived Read

Day 2: Building a tool to generate context pages

Aram Zucker-Scharff

0/8/2022

I want to share lists of links, but make them readable and archived

posts, projects, 11ty, Node, WiP, fetch, Context Pages

Archived Read

Wikidata:Data access - Wikidata

10/9/2022

Other languages:

Archived Read

Notes on responsible web crawling | James' Coffee Blog

6/5/2024

In my blog post brainstorming a new indie web search engine, I noted that running a web search engine is hard. With that in mind, I started to think that I haven't written too much about what I learned about web crawling when running IndieWeb Search, a search engine for the indie web. IndieWeb Search crawled a whitelist of websites, searching for pages, and indexed them for use in the search engine.

Read Archived

Search Techniques

A look at search engines with their own indexes

Seirdy

8/14/2024

A cursory review of all the non-metasearch, indexing search engines I have been able to find.

Read Archived

How To Search The Internet

Wouter Groeneveld

1/26/2024

Thanks to the multi billion dollar advertisement industry, searching for something on the internet …

indieweb, search engines, How To Search The Internet, post

Read Archived

Code

e / infra / spot · GitLab

0/24/2023

GitLab Enterprise Edition

Read Archived

Welcome to searx — Searx Documentation (Searx-1.1.0.tex)

0/24/2023

Search without being tracked.

Archived Read

Build your own Search Engine

10/7/2022

The source code and instructions to create your own version of Wiby.

Archived Read

GitHub - cblgh/lieu: community search engine

7/12/2023

community search engine. Contribute to cblgh/lieu development by creating an account on GitHub.

Archived Read

Download and install Searchlab and YaCy Grid - YaCy Searchlab

Michael Christen

7/12/2023

Search as a service with YaCy Searchlab: Web Crawling and Data Science Apps for Web Content

YaCy Suchmaschine search engine spider harvester indexer p2p peer network open free download software development

Read Archived

https://pagefind.app/

The lifecycle of a search query on my blog | James' Coffee Blog

4/20/2025

Suppose you are looking for my Aeropress recipe. To find this information, you could turn to my blog search engine. This search engine indexes all of my blog posts. The search engine is powered by JameSQL, a NoSQL document database.

Read Archived

Why Should We Care?

The Tragedy of Google Search

Charlie Warzel

9/22/2023

With a landmark antitrust trial under way, a giant of the modern web is buckling under its own weight.

Google Search, Google Search feels, SEO expert, endless libraries of online information, physical world, last year, part encyclopedia, predictive engine, antitrust laws, company command, product placement, user data, Yahoo CEO Marissa Mayer, modern web, charitable explanation, midlife crisis, Google, opening days, company, U.S. search-engine market, search engine, online-search business, open secret of Google Search, longtime CEO of Waze, Noam Bardin, search quality, importance of data, Marie Haynes, efficient former self, Internal Google emails, Silicon Valley, own success, start-ups, Last fall, most helpful results, recent years, Justice Department, early years, good use, ever-evolving internet, generic mailers, Google’s mission statement, default browser, simple question, real lesson of Google, scale, unfathomable amounts of information, tacit admission, heart of the case, past searches, technology

Read Archived

Maybe someday we'll actually be able to search the Web privately

@ekr____

10/3/2023

A look at the new Tiptoe encrypted search system

Archived Read

How Google Alters Search Queries to Get at Your Wallet

Condé Nast

10/2/2023

Testimony during Google’s antitrust case revealed that the company may be altering billions of queries a day to generate search results that will get you to buy more stuff.

ideas, search, google, antitrust, algorithms, advertising, web, tags

Archived Read

The Three-Legged Stool: A Manifesto for a Smaller, Denser Internet - Initiative for Digital Public Infrastructure at UMass Amherst

idpiumass

10/9/2024

“The Three-Legged Stool” is the Initiative for Digital Public Infrastructure’s banner white paper: the culmination of our work here at the lab so far and our roadmap for our efforts in the coming years. It was written primarily by Chand Rajendra-Nicolucci and Michael Sugarman under the editorial direction of Ethan Zuckerman. Access “The Three-Legged Stool” […]

Read Archived

Re-Organizing the World’s Information: Why we need more Boutique…

4/20/2025

For most queries, Google search is pretty underwhelming these days. Google is great at answering questions with an objective answer, like “# of billionaires in the world” or “What is the population of Iceland”. It’s pretty bad at answering questions that require judgment and context like “What do NFT collectors think about NFTs?”.

Archived Read

UK’s CMA slaps Google Search and its 90%+ market share with an antitrust investigation | TechCrunch

Ingrid Lunden

1/14/2025

The Competition and Markets Authority — the U.K.’s antitrust watchdog — is wasting no time in lodging its first official investigation of 2025 under its

Read Archived

The 'bias machine': How Google tells you what you want to hear

Thomas Germain

11/1/2024

"We're at the mercy of Google." Undecided voters in the US who turn to Google may see dramatically different views of the world – even when they're asking the exact same question.

Archived Read

Talking About Search

.@vladquant has a new noncommercial search engine that looks pretty neat. I like this trend. https://t.co/6Y5UpmpaYj
— Ernie Smith (@ShortFormErnie) March 23, 2022

I chatted with him back in January: https://t.co/nWWxHPadRb
— Ernie Smith (@ShortFormErnie) March 23, 2022

Is there an open source object definition for search engine indexes that are willing to work together like Teclis, Kagi and Marginalia seem to do? Could I build my own index and federate in?
— Aram Zucker-Scharff (@Chronotope) March 23, 2022

Let’s invent a term: “The Oggoverse”

The idea behind said term is that it’s basically the opposite of what Google does, so rather than starting with goog, it starts with oggo.
— Ernie Smith (@ShortFormErnie) March 23, 2022

@robinberjon & @braedon were discussing this very thing and now it's in the back of my brain b/c I find it fascinating. Looking at the bottom of page documentation it looks like something I could roll into another effort I'm working on...
— Aram Zucker-Scharff (@Chronotope) March 23, 2022

I'm sick of dead links so building a tool to basically create an index of things I link to in a blog and archive them. I hadn't thought to make it searchable, but now I'm thinking it might make sense to AND it might be something I could offer up to use by indie search engines.
— Aram Zucker-Scharff (@Chronotope) March 23, 2022

Since I'm building an accompanying plugin that lets it work together w/11ty sites (& potentially other static site builders later) I think it would be cool to create a system where passionate bloggers build indexes that they can offer to search engines. https://t.co/LgIeYtsV26
— Aram Zucker-Scharff (@Chronotope) March 23, 2022

And @braedon's work indexing .well-known could be used as a ranking factor if he wanted to share it. It would be easy to see how these things could become mutually beneficial with partnered search engines providing functionality back to replace "Search with Google" embeds.
— Aram Zucker-Scharff (@Chronotope) March 23, 2022

I've been thinking about that a lot and I reckon we can fix it. Building a good index and building a good search UI are two very different things (as Google keeps demonstrating). The only reason to have them together is that the ads in the UI pay for the index. But... https://t.co/UloXHlkWle
— Robin Berjon (@robinberjon) March 18, 2022

...we can split that. There are many proposals for interoperability in social, which is useful but hard, we need to look at interoperability in search, which is a lot easier and comes with great benefits.
— Robin Berjon (@robinberjon) March 18, 2022

Competition in UI, built-in multihoming, integration into browsers and more, diversity of business models (ads or pay), build your own, merge results, no AMP, a return to media pluralism...

It's pure upside. We are *choosing* to live with shitty search by not doing this.
— Robin Berjon (@robinberjon) March 18, 2022

I'm very interested in how Github Actions (and the like) can make building rich indexes cheap and easy if handled properly. It's interesting to think how that might come together with an federation model and a Wikipedia editors approach to topical maintenance and interests.
— Aram Zucker-Scharff (@Chronotope) March 18, 2022

I've been wondering about distributed indices but I don't know if it can be done with potentially adversarial participants?
— Robin Berjon (@robinberjon) March 18, 2022

I think there's a good model in how Wikipedia editors handle refining and working on a piece and while, mechanically it is adversarial, I don't think it is philosophically. A mechanism for merging indices with particular priorities or outcome data seems like it would be useful.
— Aram Zucker-Scharff (@Chronotope) March 18, 2022

Maybe I don't have a clear view of what you have in mind, but that's not the adversarial I have in mind. I'm worried about malicious actors deliberately decreasing the value.

Indexing has to be automated, there's too much drudgery. How do you only get good indexing?
— Robin Berjon (@robinberjon) March 18, 2022

That's what I was thinking, you need good indexes and sure they have to be automated for collection, but if you have people engaged with topical expertise, you can still have editors who own and maintain indexes of subsets of the web with topical focuses. Then, join em together.
— Aram Zucker-Scharff (@Chronotope) March 18, 2022

Thinking along Mastodon lines here, for example, if you want to search for code tips you federate with the folks who have expertise in code and have selected the best sites to index in that space. It could incentivize good citizens to act well, like with Wikipedia editors.
— Aram Zucker-Scharff (@Chronotope) March 18, 2022

Hmm, an interesting idea. I wonder how effectively you could integrate heterogeneous indices into a cohesive general search experience?

Wikipedia and Mastodon bring together content from many different sources, but all following a very restricted form.
— Braedon (@braedon) March 19, 2022

But I suspect there's a lot of potential variation in how different indices are best structured, and queried.

I run a very niche index in Well-Known, and would love to contribute that to a project like this, but it's built very differently to a general purpose search engine.
— Braedon (@braedon) March 19, 2022

If you're meaning people only contribute on *what* gets indexed, with most of the technical indexing implementation common/fixed, that seems more clearly doable. But also suggests to me a centralised platform like Wikipedia, which needs a large entity to host it.
— Braedon (@braedon) March 19, 2022

I think it's possible to be more distributed than Wikipedia, but I agree that providing infrastructure is key. You need a lot of common interfaces and the ability to develop bureaucracy where needed.
— Robin Berjon (@robinberjon) March 19, 2022

Yeah, for sure! I think core concepts include being able to join & participate in an indexing project in some way, being able to request & refuse peering, & being able to merge similar structured indices w/different data, & allowing any indexing project to choose merge priorities
— Aram Zucker-Scharff (@Chronotope) March 19, 2022
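The idea of merging similarly structured indices under per-project merge priorities can be sketched roughly as follows. This is only an illustration, not anything proposed in the thread: the `Entry` shape, the `merge_indices` function, the index names, and the rule "scale each score by the source's priority and keep the best per URL" are all assumptions. Leaving an index out of `priorities` stands in for refusing to peer with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One indexed page as published by a single (hypothetical) index."""
    url: str
    title: str
    score: float  # relevance assigned by the publishing index, 0.0 to 1.0


def merge_indices(indices, priorities):
    """Merge same-shaped indices, scaling each score by its source priority.

    indices:    {index_name: [Entry, ...]}
    priorities: {index_name: weight}; an index with no weight is skipped,
                which models refusing to peer with it (an assumption).
    """
    best = {}
    for name, entries in indices.items():
        weight = priorities.get(name)
        if weight is None:
            continue  # not a peer, ignore everything it publishes
        for entry in entries:
            weighted = entry.score * weight
            current = best.get(entry.url)
            # Keep only the highest weighted score seen for each URL.
            if current is None or weighted > current[0]:
                best[entry.url] = (weighted, name, entry)
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)
    return [(entry.url, source, round(weighted, 3)) for weighted, source, entry in ranked]


if __name__ == "__main__":
    indices = {
        "code-tips": [Entry("https://example.com/a", "A", 0.9),
                      Entry("https://example.com/b", "B", 0.4)],
        "cooking": [Entry("https://example.com/b", "B", 0.9)],
        "unknown": [Entry("https://example.com/z", "Z", 1.0)],
    }
    priorities = {"code-tips": 1.0, "cooking": 0.5}
    for row in merge_indices(indices, priorities):
        print(row)
```

Each indexing project would supply its own `priorities`, so two projects could merge the same peers and still rank results differently.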

You hire librarians.
— melody joy kramer (@mkramer) March 18, 2022

Yeah, this too! It isn't accidental that Wikipedia editors and librarians tend to overlap. I think the thing we need is a way that empowers people with passion and expertise to make the best indexes and represent themselves as candidates to join your search tool to.
— Aram Zucker-Scharff (@Chronotope) March 18, 2022

I love this model (and librarians)! So, this would need shared access to crawl (so as not to duplicate infrastructure, à la common crawl), an API to query federated indices, and ILP so people can be paid and infra costs covered. It's not crazy hard!
— Robin Berjon (@robinberjon) March 18, 2022

I don't know if you'd need a shared crawler, crawling is easy and these days fairly cheap. You just need a shared crawler standard and an agreement to use a common user agent I think?
— Aram Zucker-Scharff (@Chronotope) March 19, 2022
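One way to picture a shared crawler standard with a common user agent is a single agreed name that every participating crawler checks against a site's robots.txt before fetching. The sketch below uses Python's standard `urllib.robotparser`; the agent name `IndieFederatedCrawler` and the helper `may_crawl` are hypothetical, and the robots.txt text is passed in directly so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical shared name that all participating crawlers would agree to send.
SHARED_USER_AGENT = "IndieFederatedCrawler"


def may_crawl(robots_txt: str, url: str, user_agent: str = SHARED_USER_AGENT) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = """\
User-agent: IndieFederatedCrawler
Disallow: /private/

User-agent: *
Disallow: /
"""
    print(may_crawl(robots, "https://example.com/posts/1"))
    print(may_crawl(robots, "https://example.com/private/notes"))
    print(may_crawl(robots, "https://example.com/posts/1", "SomeOtherBot"))
```

With one shared name, a site owner can allow or block the whole federation with a single robots.txt rule instead of one rule per crawler.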

I think this could get really interesting if it came together with Hyperdrive, your browsing than stands in for the crawler and your storage hosts an index which gets auto joined via connections, with larger servers using Hyper agents to pull federation into the standard web.
— Aram Zucker-Scharff (@Chronotope) March 19, 2022

I considered that, but how do you prevent this from including super private stuff that shows up in page?
— Robin Berjon (@robinberjon) March 19, 2022

Yeah that part is a little messy, but maybe start with 'click to add', build a blocklist, evolve the process. Private URLs are a relatively small set compared to the rest of the web I'd bet, and many have common characteristics.
— Aram Zucker-Scharff (@Chronotope) March 19, 2022

I don't know, that would not be foolproof enough. Like, your name would be caught up with news content if you're signed in.
— Robin Berjon (@robinberjon) March 19, 2022

How about: sites *push* their content to be indexed, along with IP terms (to prevent the hostile mining that Google does but also independent repub), and Hyper agents are used to verify that real site content matches the index, to catch cheaters.
— Robin Berjon (@robinberjon) March 19, 2022

Plus, detection of interstitials, notification prompts, bad cookie banners, perf...
— Robin Berjon (@robinberjon) March 19, 2022
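The push model suggested above, where sites submit their own content and an independent agent checks that the live page matches what was pushed, could be approximated by comparing content fingerprints. This is a minimal sketch under stated assumptions: `fingerprint` and `matches_pushed` are hypothetical helpers, and exact matching after whitespace and case normalization is a deliberately simple stand-in for whatever similarity measure a real verifier would need.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash text after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches_pushed(pushed_text: str, observed_text: str) -> bool:
    """True if what the site pushed matches what a verifying agent observed."""
    return fingerprint(pushed_text) == fingerprint(observed_text)


if __name__ == "__main__":
    pushed = "My Aeropress   recipe\n"
    print(matches_pushed(pushed, "my aeropress recipe"))
    print(matches_pushed(pushed, "Totally different page content"))
```

Exact hashing would flag any small edit as a mismatch, so a practical verifier would likely need a tolerance for minor differences; that choice is left open here.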

How to Surf the Web

5/23/2022

A guide for how to discover cool things on the internet.

Archived Read

How to Surf the Web

@bradenslen

5/23/2022

Hello. I was going to write a post about how to surf the web only I remembered it had already been written, in a far more comprehensive format, by another person. So I'm just going to link to it and…

Archived Read

Search engines | Everything I know

5/27/2022

Meilisearch is neat together with their tokenizer lib they use. More practically DocSearch is great for plug and use solution. Tantivy, Quickwit & Edgesearch are interesting too.

Archived Read

IndieWeb Search - IndieWeb

5/27/2022

This article is a stub. You can help the IndieWeb wiki by expanding it.

Archived Read

A Search Engine Designed To Surprise You - OneZero

Clive Thompson

8/16/2021

Hey nerds: I recently stumbled across “Marginalia Search”. It’s a search engine with a fascinating design — rather than give you exactly what you’re looking for, it tries to surprise you.

Archived Read

Indie Map: Docs

6/5/2022

Indie Map is a complete crawl of 2300 of the most active IndieWeb sites as of June 2017, sliced and diced and rolled up in a few useful ways:

Archived Read

Providing Meaningful Search Results Without Own Index?- Björn Wärmedal

6/7/2022

🍵️

Archived Read

The Future of Search Is Boutique

5/2/2022

The way to improve search is not to mimic Google, but instead to build boutique search engines that index, curate, and organize things in new ways.

Archived Read

Federated Search

6/7/2022

bookmark

Archived Read

What Google Search Isn’t Showing You

Condé Nast

2/10/2022

Kyle Chayka writes about the evolution of Google Search, which has become the runaway favorite Internet search engine despite many users’ misgivings about how the company monetizes the data it collects and how its algorithms determine the search results that a user is shown.

infinite scroll, new yorker favorites, google, search engines, internet, digital technology, algorithmic bias, textbelowcenterfullbleednocontributor, web, tags

Archived Read

Brainstorming a new indie web search engine | James' Coffee Blog

6/5/2024

I would like for there to be more tiny search engines that are focused on a particular topic. It would be cool if I could type in

Read Archived